We are seeking a Site Reliability Engineer (SRE) to support Digital and Commercial applications. This role is part of IT Operations and is responsible for ensuring the availability, stability, and performance of production systems through proactive monitoring, incident response, and operational support. The role reports to the Director of IT and works closely with application, infrastructure, and security teams.
Key Responsibilities
- Provide 24x5 operational support for Digital and Commercial applications and their underlying infrastructure.
- Act as a primary responder for P1/P2 production incidents, performing triage, recovery, and coordination until service is restored.
- Monitor application and infrastructure health using monitoring and alerting tools; take action based on defined runbooks.
- Execute standard recovery procedures such as service restarts, scaling, failover, and rollback activities.
- Support change, release, and deployment activities, ensuring operational readiness and minimal business impact.
- Perform initial root cause analysis (RCA) and provide logs, metrics, and incident details to L3 or engineering teams.
- Maintain and update runbooks, SOPs, and operational documentation.
- Support disaster recovery (DR) and backup processes, including testing and validation activities.
- Ensure adherence to security, compliance, and vulnerability remediation requirements for supported systems.
- Work closely with L3 SREs, application teams, and vendors for issue resolution and continuous improvement.
Qualifications
YOU MUST HAVE
- Experience in production support, SRE, or IT Operations roles.
- Hands‑on knowledge of Linux/Windows, cloud platforms (AWS), and basic networking concepts.
- Experience with monitoring, alerting, incident management, and ticketing tools.
- Ability to follow runbooks and operate in a high‑availability, SLA‑driven environment.
- Exposure to ITIL processes (Incident, Problem, Change).
WE VALUE
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- Relevant industry certifications such as TOGAF, AWS Certified Solutions Architect, or similar.
- Strong communication and interpersonal skills, including the ability to explain complex technical concepts clearly to non-technical stakeholders.
- Experience collaborating with cross-functional teams, including business partners, service owners, and technical teams.
- Excellent problem-solving and analytical skills with strong attention to detail.
- Demonstrated success in driving digital transformation initiatives.
- Strong project management and leadership skills.
- A passion for staying current with emerging industry trends and best practices.