Cloud Networking Engineer - GPU

Apple Inc

Seattle, WA

Job posting number: #7293311 (Ref:apl-200577801)

Posted: November 6, 2024

Job Description

Summary
Imagine what you could do here. At Apple, new ideas have a way of becoming extraordinary products very quickly. Bring passion and dedication to your job, and there's no telling what we can accomplish together. We're looking for a hardworking and passionate person to join this amazing team, and if you feel this is you, we'd love to hear from you!

The Apple Services Engineering (ASE) organization is responsible for building powerful platforms that enable engineers to deliver incredible experiences to customers.

Join this team, and you'll help us create and deploy systems that support Apple’s world-renowned hardware and software architecture.

Our compute team is responsible for designing and building the foundational pieces of our in-house cloud technologies. In this role, you will collaborate with teams across Apple to deliver forward-looking, high-performance virtual networking technologies for various cloud platforms supporting AI/ML workloads. The successful candidate is a highly motivated individual with strong technical, communication, and project management skills who can create intuitive user experiences, is passionate about quality, and is meticulous about the details that surprise and delight our customers.
Description
In this role, you will be responsible for developing, debugging, and maintaining virtual networking software solutions for GPUs on various cloud platforms. You will:

- Drive ideas from inception to implementation, building a reputation as someone sought out across the organization for advice and consultation by setting standards, process, and technical direction

- Design, implement, code, review, and debug software components and drivers

- Generate and review design documentation

- Participate in qualifications and rollouts of software to production clusters

- Benchmark, analyze, and improve scale, performance, and resiliency

- Hold yourself and others to the high quality standard expected of Apple products
Minimum Qualifications
  • Bachelor’s Degree in Computer Science, or equivalent related experience.
  • Experience qualifying and configuring multiple versions of GPU hardware, NVIDIA drivers, and CUDA, plus their interaction with well-known frameworks (TensorFlow, PyTorch, Keras, Horovod, etc.).
  • Experience with advanced high-speed, low-latency Kubernetes networking stacks, e.g., RoCE (RDMA over Converged Ethernet).
  • Experience in software development and deployment of networking technologies (SDN, Open vSwitch).
  • Experience in using and building cloud technologies such as CloudStack, OpenStack, AWS, GCP, etc.
  • A critical eye for code correctness
  • Proficiency in testing your software
  • Ability to parse user requirements and develop production-quality solutions.
Preferred Qualifications
  • Experience running large NVIDIA GPU fleets at scale. NVIDIA GPU nodes are less stable than CPU-only nodes, so this requires knowledge of Linux kernel tuning for GPUs, common HW/SW issues, and implementing preemptive health checks.
  • Experience qualifying and configuring environments where containers run a different GPU driver version from the one running on the host (over Docker/Kubernetes or plain LXC containers).
  • Ability to effectively communicate within a team, with project collaborators, and project partners
  • Experience with automation frameworks and CI/CD systems.
  • Excellent problem solving and analytical thinking skills
  • Enthusiastic about quality, design, and user experience
More Info

Application Deadline: Open Until Filled
Employer Location: Apple Inc
Jacksonville, Florida
United States