The SRE will be part of the central DevOps and SRE team, with responsibilities across different products at Magnitude. The SRE team handles production deployments for various enterprise customers and their connectivity to the cloud stack, so that the other teams within the department can focus on application development.
The Senior SRE has a strong affinity for monitoring and observability tools, troubleshooting incidents, driving automation, and implementing preventative actions.
Be analytical by nature, with an innate ability to understand and solve failures caused by complex system interactions.
Have strong skills in scripting, debugging cloud connectivity, and troubleshooting the functionality of complex applications deployed on AWS/Azure.
Be passionate about keeping our customers happy with our product!
Must-have skills:
Proven work experience as an SRE or DevOps engineer in on-premises, cloud, and hybrid environments
Strong AWS knowledge: AWS services, network configuration, application security, and infrastructure security
Experience with SRE tooling for monitoring, profiling, troubleshooting, and patching; ability to prescribe proactive, automated ways to measure production availability, uptime, outages, etc.
SRE operational processes such as incident response, root cause analysis (RCA), and runbook preparation
Technical skills: AWS, and monitoring and alerting tools such as Dynatrace and PagerDuty; strong troubleshooting skills
Proven work experience in maintaining Dev, Staging and Production environments
Work experience with Windows/Linux administration and with scripting (e.g., PowerShell) to automate administrative tasks
Good-to-have skills:
Work experience with orchestration and IaC tools such as Ansible, Terraform, and similar technologies
Work experience with Git and other SCM tools
Work experience with Python, PowerShell, and/or other scripting technologies, and with automated software installs and upgrades
Work experience with desktop and web technologies and service APIs
Work experience with Docker, Kubernetes, or other containerization technologies
Responsibilities:
Support our customers by responding to escalations from our support and professional consulting departments, so that customers can use our product fully at all times.
Suggest and prioritize new services and tools to streamline our customer support processes and meet our Service Level Agreements sustainably.
Contribute to defining Service Level Agreements for critical services, drive their adoption, and help the team meet them sustainably.
Spin up stacks in the cloud using existing deployment automation, establish connectivity to customer systems (if applicable), and perform stack hardening and bootstrapping, including security configuration, before handing the stack over to customer support.
Monitor and maintain OS updates, and address vulnerabilities reported by tools such as Qualys and SentinelOne.
Align with application infrastructure needs and work with our customer support department to understand customer needs and set appropriate expectations.
Promote processes and practices that strengthen and enable a culture of continuous improvement.
Implement and improve the security posture of our cloud environments
Quickly identify and solve problems detected in production
Participate in an on-call rotation
7+ years of experience running applications in the cloud, especially on AWS, with strong cloud networking skills.
3+ years of experience scripting and automating infrastructure creation and modification; proficient in automating repeatable maintenance tasks.
Passionate about observability
Customer empathy and an automation mindset...