- Location: Bucharest, Romania
Project Description:
ML/Cloud-based system that efficiently analyzes collected data to predict, prevent, and troubleshoot system failures and performance issues in smart devices. The system is multi-tenant and handles medium-to-high data volumes. Data collected from smart devices is accessed from cloud (AWS) storage and undergoes translation from the device-specific schema, file formats, etc., plus transformations such as selection of relevant data and features, before being fed to an ML model training subsystem; qualified models are then pushed to the production environment for prediction/execution. Data handling employs scalable Spark-based access. The entire processing workflow is kept in sync via pipelines defined in Airflow. The state of the entire data engineering stack (including ML models, training, and execution) is available via a Dashboard UI.
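The workflow above (translate device-specific data, select features, train, then promote only qualified models) can be sketched as a few plain-Python stages. This is a hypothetical, library-free illustration only: all field names, the scoring metric, and the qualification threshold are invented for the example; in the real system these stages would run as Spark jobs orchestrated by Airflow tasks.

```python
# Hypothetical sketch of the described workflow; all names and
# thresholds are illustrative, not the actual system's schema.

def translate(record: dict) -> dict:
    """Map a device-specific payload onto a common schema."""
    return {
        "device_id": record["dev"],           # device-specific key -> common key
        "temperature": float(record["t_c"]),  # normalize types during translation
        "error_count": int(record["errs"]),
    }

def select_features(record: dict) -> dict:
    """Keep only the features the training subsystem consumes."""
    return {k: record[k] for k in ("temperature", "error_count")}

def train(rows: list) -> dict:
    """Stand-in for the ML training subsystem: returns a toy 'model'."""
    avg_errors = sum(r["error_count"] for r in rows) / len(rows)
    return {"threshold": avg_errors, "score": 0.9}  # placeholder quality metric

def promote_if_qualified(model: dict, min_score: float = 0.8) -> bool:
    """Only qualified models are pushed to the production environment."""
    return model["score"] >= min_score

raw = [{"dev": "a1", "t_c": "71.5", "errs": "3"},
       {"dev": "a2", "t_c": "69.0", "errs": "1"}]
prepared = [select_features(translate(r)) for r in raw]
model = train(prepared)
print(promote_if_qualified(model))  # True for this toy data
```

In the production setting each function would correspond to a pipeline stage (Spark transformation or Airflow task) rather than an in-process call, but the data flow is the same.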
Responsibilities:
• Work closely with team members in a friendly, agile, highly collaborative, and supportive professional environment
• Work with scrum master(s), business analyst(s), tech lead(s) to analyze and understand user stories in each sprint
• Create design documents or make changes to existing ones
• Complete coding & unit testing for the allotted stories
• Participate in code reviews
• Participate in progress reviews
Skills
Must have
- At least 5 years of experience developing microservices with Java (version 8 or higher required) and Spring Boot.
- Experience with relational databases (Postgres, MySQL) and NoSQL databases (MongoDB).
- Experience with AWS Cloud technologies: S3, EMR, EC2, Glue, Athena.
- Experience with Maven, Gradle, and Jenkins CI/CD.
- Experience building data pipelines, CI/CD pipelines, and fit-for-purpose data stores.
Nice to have
- Experience with Python.
- Experience with Apache Spark (Java or Python API).
- Experience with Apache Airflow.
- Experience with Dimensional Data Modeling.
- Experience building data pipelines that process more than 1 TB in both streaming and batch mode (Kubeflow, Spark on Kubernetes).
