
Cloud Site Reliability Engineer 2
Trivandrum
in 4 days
Brief Description
Lead and manage the resolution of complex technical issues involving Zafin’s products and Azure cloud environment.
Design and implement strategic, operational enhancements to improve resiliency and system reliability.
Conduct in-depth Root Cause Analysis (RCA) for high-severity incidents and drive initiatives to reduce error recurrence.
Represent the organisation in external client escalation calls, providing expert guidance and solutions.
Architect and optimise cloud infrastructure for high performance, scalability, and cost-effectiveness.
Provide thought leadership in managing and scaling container orchestration platforms such as AKS and OpenShift.
Oversee the implementation of advanced monitoring solutions and integrate predictive analytics for proactive issue resolution.
Develop and execute automation strategies to streamline operational workflows and incident responses.
Create and maintain comprehensive documentation of cloud architectures, processes, and incident management strategies.
Mentor and coach junior engineers, fostering a culture of continuous learning and innovation.
Drive strategic initiatives, collaborating with cross-functional teams to achieve organisational objectives.

Preferred Skills
Bachelor’s degree in Computer Science, Engineering, or a related field (Master’s degree preferred).
12+ years of experience in cloud support, operations, or a related role.
Advanced expertise in Microsoft Azure (preferred) or equivalent cloud platforms.
Demonstrated experience in designing and scaling container orchestration systems like AKS or OpenShift.
Proven leadership in managing automated deployment pipelines, including Azure DevOps.
Mastery of enterprise monitoring platforms (e.g., Azure Insights, Grafana) and predictive analytics tools.
Advanced scripting skills with PowerShell, Python, or similar languages.
Extensive experience in incident management and defining SLAs for global production environments.
In-depth knowledge of database management, particularly Postgres.

Preferred Qualifications
Advanced certifications in cloud platforms (e.g., Azure Solutions Architect Expert).
Experience with ITSM tools and processes (e.g., ServiceNow).
Comprehensive understanding of security and compliance in cloud environments.

Soft Skills
Exceptional analytical and problem-solving abilities.
Strong leadership and mentoring skills.
Advanced communication and collaboration capabilities.
Visionary approach to operational innovation and strategic planning.