Service Reliability Engineer, G&A Solutions Engineering
Explicitly mentions vibe coding experience as a preferred qualification.
About the Role
Join Apple's G&A Solutions Engineering team as a Service Reliability Engineer to ensure the reliability, scalability, and performance of global, mission-critical production services. You will apply SRE principles, automate operational tasks, lead incident response, and collaborate across engineering, data, and network teams to improve service resilience.
Job Description
Role
As a Service Reliability Engineer on Apple’s G&A Solutions Engineering team, you will maintain the health, stability, and efficiency of global production services. The role focuses on applying Site Reliability Engineering (SRE) principles to automate operations, optimize performance, and ensure services are designed and operated for high reliability and scalability.
Key Responsibilities
- Proactively monitor service performance and identify bottlenecks.
- Lead incident response and conduct thorough root cause analyses (RCAs).
- Develop and implement automation strategies to reduce manual intervention and improve resilience.
- Apply SRE principles to maintain reliable, scalable infrastructure.
- Collaborate with developers, data engineers, DBAs, and network specialists to design services for operational excellence (monitoring, alerting, scalability).
- Create and maintain documentation, runbooks, and service level objectives (SLOs).
- Participate in on-call rotations and provide 24/7 support for critical services.
- Define and track service level indicators (SLIs), and drive process improvement initiatives.
- Promote continuous learning and knowledge sharing within the team.
Preferred Qualifications
- Familiarity with CI/CD pipelines and DevOps practices.
- Experience with database technologies such as MySQL, PostgreSQL, and NoSQL databases.
- Knowledge of ITIL frameworks and incident management processes.
- Experience with “vibe coding”.
- Understanding of Linux/Unix system administration.
- Experience with configuration management tools (Ansible, Chef, Puppet).
Minimum Qualifications
- 4+ years of experience in Site Reliability Engineering, DevOps, or related roles supporting large-scale, enterprise services.
- Strong proficiency in at least one programming language (examples: Python, Java, Go) and scripting languages (examples: Bash, PowerShell).
- Experience with cloud platforms (examples: AWS, Azure, GCP) and cloud-native technologies (examples: Kubernetes, Docker).
- Hands-on experience with monitoring and alerting tools (examples: Prometheus, Grafana, Splunk, Datadog).
- Bachelor’s degree in Computer Science or equivalent work experience.