Job Description
Job Summary:
We are seeking a highly skilled and motivated Infrastructure Manager to lead our Infrastructure Operations (InfraOps), Application Operations (AppOps), and Site Reliability Engineering (SRE) teams. This pivotal role within our engineering department is responsible for our platform's reliability, scalability, and security, and for fostering a high-performing team culture.
The ideal candidate will bring a strong technical background in infrastructure and operations management, coupled with exceptional leadership and organizational skills.
Main Areas of Responsibility:
1. Team Leadership and Development
- Lead and mentor three teams: InfraOps, AppOps, and SRE.
- Recruit, develop, and retain top talent to build and sustain a high-performing team.
- Foster a collaborative culture with a strong focus on accountability, innovation, and continuous improvement.
- Define team goals and KPIs aligned with organizational objectives.
2. Infrastructure Management
- Oversee the design, deployment, and maintenance of scalable, reliable, and secure infrastructure.
- Ensure compliance with uptime SLAs (99.99%) through proactive monitoring and incident management.
- Drive automation initiatives to reduce manual work and improve efficiency.
- Manage capacity planning and cost optimization strategies.
3. Application Operations (AppOps)
- Ensure the seamless operation of deployed applications and services.
- Optimize application performance and reliability, working closely with engineering teams.
- Oversee release management processes to minimize downtime and ensure smooth rollouts.
4. Site Reliability Engineering (SRE)
- Implement and uphold SRE practices to enhance platform reliability and scalability.
- Oversee observability initiatives, including logging, monitoring, and alerting frameworks.
- Drive post-incident reviews to identify root causes and implement preventive measures.
5. Security and Compliance
- Collaborate with security teams to enforce best practices across infrastructure and applications.
- Ensure compliance with industry standards and regulations (e.g., ISO 27001, GDPR).
6. Cross-functional Collaboration
- Work closely with engineering, product, and business stakeholders to align infrastructure initiatives with organizational goals.
- Serve as a point of escalation for critical infrastructure and operational issues.