· Solid understanding of application monitoring and troubleshooting
· Experience with Datadog
· Experience with APIs and with alerts, dashboards, or data trend analysis in a performance monitoring tool
· Experience with recommending baseline monitoring thresholds and performance monitoring KPIs and SLAs.
· Ability to provide monitoring tool infrastructure recommendations.
· Hands-on and technically savvy, with experience helping teams launch applications into a complex production environment.
· Demonstrated ability to work collaboratively across the organization; strong technical and leadership skills; experience building and fostering strong working relationships
· Solid communication skills, attention to detail, strong presentation skills
Roles & Responsibilities:
· Ensure that the application monitoring team is prepared, scheduled, equipped and coordinated to manage highly available systems.
· Monitor application functioning, uptime, and issues in a 24x7 environment
· Monitor integrated third-party applications and ensure they function smoothly
· Investigate issues found while monitoring and fix them
· Coordinate with Engineering, DevOps, and Customer Teams to resolve issues
· Partner with Product, Operations, and Infrastructure teams on Datadog to understand how the applications are deployed, so we can effectively scale, evolve, and support a broad range of Datadog deployment use cases.
· Create high-scale, highly performant interactive visualizations (graphs, maps, charts) that help Operations better understand the story and health of our infrastructure.
· Employ expertise in monitoring tool alerts, dashboards, and data trend analysis to provide AlwaysOn alerting, metrics visualization, logs, and application tracing.
· Provide technical solutions to a wide range of difficult problems. Provide on-the-job training to application POCs and event management.
· Build rock-solid libraries to trace requests as they flow across servers, databases, caches, and microservices.
· Ensure that our high-volume, low-latency environments continue to perform around the clock.
· Experience interacting with cloud infrastructure (AWS, Azure)
· Solve scaling bottlenecks in critical services
· Identify opportunities and implement improvements in monitoring processes.
· Lead talented engineers in solving problems
Work Location: Noida
Shift: Rotational...