Location: Bucharest, Romania
Industry: Information Technology
Project Description
ML/cloud-based system that efficiently analyzes collected data to predict, prevent, and troubleshoot system failures and performance issues in smart devices. The system is multi-tenant and handles medium-to-high data volumes.

Data collected from smart devices is accessed from cloud (AWS) storage and undergoes translation from device-specific schemas, file formats, etc., and transformations such as selection of relevant data and features, before being fed to an ML model training subsystem; qualified models are then pushed to the production environment for prediction and execution. Data handling employs scalable Spark-based access. The entire processing workflow is kept in sync via pipelines defined in Airflow, and the state of the data engineering work (as well as ML models, training, and execution) is available via a dashboard UI.
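As a rough illustration only (the function and field names below are hypothetical and not taken from the actual project), the translate-then-transform stages described above could be sketched in plain Python:

```python
# Illustrative sketch of the pipeline stages: schema translation followed by
# feature selection. In the real system these steps run on Spark and are
# orchestrated by Airflow; here they are plain functions for clarity.

def translate(record: dict) -> dict:
    """Translate a device-specific record into a common schema."""
    return {"device_id": record["dev"], "temp_c": record["t"], "ts": record["ts"]}

def select_features(record: dict) -> dict:
    """Keep only the features relevant for model training."""
    return {k: record[k] for k in ("device_id", "temp_c")}

def run_pipeline(raw_records: list[dict]) -> list[dict]:
    """Apply translation then feature selection to each record, in order."""
    return [select_features(translate(r)) for r in raw_records]

raw = [{"dev": "sensor-1", "t": 71.3, "ts": "2024-01-01T00:00:00Z"}]
print(run_pipeline(raw))  # -> [{'device_id': 'sensor-1', 'temp_c': 71.3}]
```

In the production setup each stage would be a task in an Airflow DAG operating on Spark DataFrames rather than Python dicts.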
Responsibilities
• Work closely with team members in a friendly, agile, highly collaborative, and supportive professional environment.
• Work with scrum master(s), business analyst(s), tech lead(s) to analyze and understand user stories in each sprint.
• Create design documents or make changes to existing ones.
• Complete coding & unit testing for the allotted stories.
• Participate in code reviews.
• Participate in progress reviews.
Skills
Must have
- At least 5 years of experience developing microservices with Java (version 8 or higher required) and Spring Boot.
- Experience with relational databases (Postgres, MySQL) and NoSQL databases (MongoDB).
- Experience with AWS Cloud technologies: S3, EMR, EC2, Glue, Athena.
- Experience with Maven, Gradle, and Jenkins CI/CD.
- Experience building data pipelines, CI/CD pipelines, and fit-for-purpose data stores.
Nice to have
- Experience with Python
- Experience with Apache Spark (Java or Python API)
- Experience with Apache Airflow
- Experience with Dimensional Data Modeling
- Experience building data pipelines that process more than 1 TB in both streaming and batch mode (Kubeflow, Spark on Kubernetes).
