Job Description: Summary
The Site Reliability Engineer 3 (SRE) will develop, manage, and optimize cloud-based services on AWS and Azure. This position will play a key role in ensuring the reliability, availability, and performance of our services in cloud environments, focusing on automation, scalability, and observability. Additionally, this role will provide technical expertise in cloud architecture, system design, and DevOps practices, and will lead and mentor a team of Engineers.
Essential Job Functions
- Design cloud-based infrastructure and services in AWS and Azure, adhering to SRE best practices. Ensure high availability, scalability, and security of cloud environments. Collaborate with architecture and development teams to design cloud solutions that meet business requirements. (20%)
- Work with infrastructure and development teams to ensure seamless integration of new applications and services into the cloud infrastructure following SRE best practices. Mentor junior Engineers and provide technical guidance to the team. Collaborate with cross-functional teams to design and implement disaster recovery and business continuity plans. (20%)
- Help implement and refine monitoring, logging, and alerting systems to detect and respond to incidents proactively. Develop and enforce SRE best practices, including automation of repetitive tasks and incident response processes. Manage capacity planning, performance tuning, and cost optimization strategies for cloud resources. Build processes and tools to enable application teams to be SREs. (20%)
- Proactively identify and address availability and performance issues, and partner with teams to find solutions by fixing code, building tools, or refining processes. (20%)
- Identify and implement process improvements to enhance the reliability, scalability, and efficiency of cloud operations. Stay current with emerging cloud technologies and industry trends and recommend their adoption where appropriate. Advocate for SRE practices and drive adoption across the organization. (20%)
Minimum Qualifications
- Bachelor’s Degree in Information Technology or related field.
- 8+ years of experience working in IT.
- 5+ years of experience working in SRE/DevOps.
- 3+ years of experience in development.
Skills
- Application Troubleshooting
- Root Cause Analysis (RCA)
- Collaborative Mindset
- Operational Excellence (OpEx)
- Team Mentorship
- Solution-Oriented Approach
- Organized Thinking
Experience: 12