- Location: Cairo, Egypt
- Industry: Computer & Network Security
*Responsibilities:
1) Hands-on experience in designing and implementing an observability platform/
framework for a complex setup.
2) Should build and maintain solutions for gaining insights into infrastructure and services.
3) Supporting applications with a focus on logs, metrics, and application traces that improve
observability.
4) Should have experience in implementing effective monitoring, metrics, logging, and
visualization systems to enable an end-to-end observability platform.
5) Should have extensive experience in working with time series data, telemetry, log parsing/processing/analytics, or data engineering.
6) Should be a highly technical, hands-on engineer who has built and developed highly
scalable, mission-critical observability platforms for traditional compute and
cloud-native servers.
7) Should focus on the adaptability of monitoring tools and observability platforms.
8) Providing health and performance reports, developing AIOps rules, creating alerts,
and creating custom dashboards.
9) Should think about the problem end-to-end: automating data collection from common
data sources, storing data efficiently in a data lake or monitoring tool, and rendering this
information for the user based on the defined SLOs and SLIs.
10) Collaborate on engineering with MSPs, application owners, management, and infrastructure
teams to fulfil the monitoring deliverables.
*Qualifications:
1) 6-8 years of experience with monitoring and observability tools and methodologies,
including products such as ELK, Splunk, Elastic APM, AppDynamics, Dynatrace,
SolarWinds/SevOne, ThousandEyes, Grafana, Prometheus, BigPanda, ServiceNow ITOM,
etc.
2) Solid understanding of performance metrics, KPIs, statistical calculations, machine
learning, and correlation.
3) Experience in using Elastic tools: Beats, Logstash, Elasticsearch, Enterprise Search,
Elastic Agent, Kibana, and X-Pack.
4) Experience in setting up an APM server and client.
5) Experience in creating insightful dashboards in Grafana with multiple data sources.
6) APM monitoring (Java, .NET, PHP, Ruby, Python, Go, C SDK, Node.js monitoring, APM
transactions, logs, etc.).
7) 3-5 years of experience in understanding and maintaining APM, RUM, and distributed
tracing capabilities for an observability stack.
8) At least 4 years of experience instrumenting applications for OpenTelemetry data.
9) Must have strong automation/scripting skills; proficiency in Python or Ruby is a plus.
10) 3-5 years of experience working with various agents and collectors
(Beats, Telegraf, OpenTelemetry, Splunk, etc.).
