Job Description
As a Site Reliability Engineer, you will be a core contributor on the Juniper Cloud platform team. Your core responsibility is to provide operational support for cloud-based SaaS applications running on cloud infrastructure, with an emphasis on deployment, scalability, and reliability.
We are looking for a highly motivated, self-driven, and dedicated Site Reliability Engineer who brings the following:
- Experience building and running large-scale, fault-tolerant production cloud systems on AWS and/or GCP.
- Experience coding infrastructure automation with Terraform, Packer, and Ansible.
- Experience with Linux/Unix operating system internals, file systems, system tuning, administration, and networking.
- Deep experience in microservice technologies, container orchestration and continuous deployment (Kubernetes, Docker, Helm, GitOps with Flux CD/Argo CD).
- Experience designing, building, and maintaining production services, and troubleshooting large-scale distributed systems.
- Experience with technologies like Apache Kafka, Redis/Valkey, Postgres, Elasticsearch.
- Experience with observability tools and methodology (monitoring, logging, tracing, SLOs/SLIs) for detecting and diagnosing issues before they cause customer impact or performance degradation.
- Strong software development skills in Python.
- A drive to deliver quickly and effectively.
- Strong problem solving and debugging skills with a high sense of ownership.
Responsibilities
- Engage in and improve the whole lifecycle of services, from inception and design through to deployment, operation, and refinement.
- Support the development of services from the planning phase until they go live, through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
- Provide technical leadership and guidance to other team members on managing the availability and performance of mission-critical services, on building automation to prevent problem recurrence, and on building automated responses for non-exceptional service conditions.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
- Plan capacity for the growth of cloud infrastructure.
- Improve operational processes such as deployments and upgrades.
- Manage execution of project priorities, deadlines, and deliverables.
- Be on an on-call rotation to respond to incidents that impact platform availability.
- Use your on-call shift to prevent incidents from ever happening.
- Take part in incident response, including conducting post-mortems and implementing lessons learned to enhance system reliability.
Preferred Qualifications
- 8+ years of engineering or systems experience.
- Experience leveraging cloud architecture, applying site reliability principles, and/or demonstrating sensitivity to operational concerns.
- Strong understanding of network design and architecture.
- Experience scaling and managing distributed systems.
- Significant experience with monitoring and observability platforms.
- Demonstrated ability to debug, fix, and optimize code.
- Troubleshooting skills across network, application, and distributed services layers.
- The ability to learn quickly and adapt to new technologies is essential.
- Excellent communication skills, both verbal and written.
ABOUT JUNIPER NETWORKS
Juniper Networks is in the business of network innovation. From devices to data centers, from consumers to cloud providers, Juniper Networks delivers the software, silicon, and systems that transform the experience and economics of networking. Our products and technology run the world's largest and most demanding networks today, enabling service providers, enterprises, and governments to create value and accelerate business success. Every day our 9,000+ colleagues come together across 46 countries to realize our company vision: Connect Everything, Empower Everyone. We are innovating in ways that empower our customers, our partners, and ultimately everyone in a connected world. These customers include the top 130 global service providers, 96 of the Fortune 100, and hundreds of public sector organizations.
WHERE WILL YOU DO YOUR BEST WORK?
Wherever you are in the world, whether it's downtown Sunnyvale or London, Westford or Bangalore, Juniper is a place that was founded on disruptive thinking, where colleague innovation is not only valued but expected. We believe that the great task of delivering a new network for the next decade is accomplished through the creativity and commitment of our people. The Juniper Way is the commitment to all our colleagues that the culture and company inspire their best work, their life's work. At Juniper we believe this is more than a job: it's an opportunity to help change the world...