Information Technology & Software
8 Jun
Senior Engineer, Cloud Infrastructure
Senior Engineer – Cloud Infrastructure (AWS, Automation, AI-Driven Operations)
Skills: Cloud Infrastructure | AWS | Infrastructure as Code | DevOps | Automation | Distributed Systems | AI/ML in Infra | Observability | Security & Compliance
Department: Cloud Infrastructure / Platform Engineering
Employment Type: Full Time
Work Mode: Onsite / Hybrid / Remote
Experience: 3–8 Years
About the Role
We are seeking a highly skilled Senior Engineer – Cloud Infrastructure to design, build, and operate scalable, secure, and highly available cloud platforms, primarily on AWS.
This role focuses on automation-first infrastructure engineering, enabling teams to build and deploy applications efficiently while maintaining high standards of security, reliability, and cost optimization.
A key part of this role is using AI/ML and agent-based systems to automate infrastructure workflows, incident response, and operational processes, reducing manual intervention.
Key Responsibilities
Cloud Infrastructure Engineering (AWS)
Design, implement, and manage highly available and scalable AWS infrastructure
Work with core services including:
VPC, networking, and routing
EC2, Auto Scaling, Load Balancers
S3, EBS/EFS/FSx storage systems
IAM, KMS, and security services
Ensure infrastructure is secure, resilient, and optimized for performance and cost
Infrastructure as Code & Automation
Develop and maintain Infrastructure-as-Code (IaC) using tools such as:
AWS CDK
CloudFormation
Terraform
Build automation tools using Python or TypeScript
Eliminate manual processes through automation of provisioning, patching, compliance, and reporting
AI-Driven Infrastructure & Agentic Systems
Identify opportunities to automate infrastructure workflows using AI/ML and agent-based systems
Design and implement single-agent and multi-agent workflows for:
Incident triage
Runbook automation
Change impact analysis
Cost and capacity optimization
Integrate AI agents with:
Cloud APIs
Monitoring and observability tools
Ticketing systems and workflows
Implement guardrails for safety, compliance, and reliability in AI-driven operations
Reliability, Monitoring & Operations
Own infrastructure reliability across environments
Implement monitoring and observability using tools such as:
CloudWatch
Datadog
Splunk
Define and manage SLOs, SLAs, and alerting systems
Participate in on-call rotations and incident management
Conduct root cause analysis and drive continuous improvement initiatives
Security & Compliance
Implement security best practices across cloud infrastructure
Work with IAM policies, encryption (KMS), and network security controls
Ensure compliance with organizational and regulatory standards
Collaborate with security teams for audits and governance
Collaboration & Architecture
Collaborate with:
SRE teams
Security engineering
Product engineering teams
Participate in architecture discussions, design reviews, and technical planning
Contribute to standards, best practices, and reusable infrastructure patterns
Mentoring & Knowledge Sharing
Mentor junior engineers on:
Cloud infrastructure fundamentals
Automation best practices
AI-driven operations
Contribute to:
Documentation
Runbooks
Knowledge base articles
Lead internal training sessions on cloud and automation practices
Required Qualifications
3–8 years of experience in Cloud Infrastructure / DevOps / Platform Engineering
Strong hands-on experience with AWS cloud services
Deep understanding of:
Networking (VPC, subnets, routing, VPN, security groups)
Compute and storage services
Identity and access management (IAM)
Experience with Infrastructure-as-Code tools (CDK, Terraform, CloudFormation)
Strong programming skills in Python or TypeScript
Experience building and managing production-grade cloud environments
Knowledge of monitoring, logging, and observability practices
AI / Automation Skills (Required)
Experience working with LLM-based or AI-driven automation systems
Hands-on exposure to:
AI agent frameworks or orchestration tools
Multi-step workflow automation using APIs and function calling
Understanding of:
Prompt engineering
Retrieval-Augmented Generation (RAG)
AI safety and output validation
Technical Skills
Cloud Platforms
AWS (primary)
Exposure to Azure or GCP is a plus
Infrastructure & DevOps
Terraform, AWS CDK, CloudFormation
CI/CD pipelines, Git workflows
Programming
Python
TypeScript / Node.js
Observability
CloudWatch, Datadog, Splunk
Logging, metrics, tracing, alerting
Security
IAM, encryption, network security
Compliance and governance frameworks
Good-to-Have
Experience in large-scale SaaS or multi-tenant environments
Knowledge of FinOps and cost optimization strategies
Experience integrating AI agents with:
Ticketing systems (Jira, ServiceNow)
Collaboration tools (Slack, Teams)
AWS certifications (Solutions Architect, Security, Networking)
Professional Competencies
Strong problem-solving and analytical skills
Ability to manage complex infrastructure projects end-to-end
Strong collaboration and stakeholder management
Leadership and mentoring capabilities
Adaptability in fast-changing environments
Focus on innovation, automation, and continuous improvement
Why This Role Is High Impact
Build and scale enterprise-grade cloud infrastructure platforms
Drive automation-first and AI-driven infrastructure operations
Work on high-availability, large-scale distributed systems
Influence cloud architecture and engineering best practices
Contribute to next-generation infrastructure innovation