- Location: Tamil Nadu, India
- Industry: Information Technology and Services
Location: Open (must be able to work flexibly with the Korea time zone)
Total Experience: 8+ Years
Notice Period: Immediate to 30 Days Preferred
Our client is looking for a skilled Observability & Site Reliability Engineer to join their team supporting large-scale, enterprise-grade infrastructure. The ideal candidate will have deep experience with observability tools (especially Grafana, Loki, Mimir, and Kubernetes metrics/logs) and a passion for performance, scale, and uptime.
Key Must-Have Skills:
- 5+ years in Observability Engineering
- Expertise in Grafana, Loki, Mimir, and the Alloy agent
- Strong understanding of infrastructure metrics (GPU/CPU/K8s)
- Familiarity with scripting (Python, Go, Bash)
- Prior exposure to Prometheus, ELK, Docker, Terraform
- Flexibility to work with Korean stakeholders & time zones
Role Highlights:
- Design and manage the observability stack across large datacentre infrastructure
- Build scalable telemetry systems, dashboards, alerts & reports
- Apply SRE practices to ensure system reliability and performance
- Troubleshoot real-time issues and support ongoing optimisation
Good to Have:
- Prior experience working with Korean stakeholders
- Knowledge of cloud platforms such as AWS, GCP, or Azure