Location: Bangalore, India
Industry: Information Technology
➢ Good knowledge of GCP Dataflow, BigQuery, Google Cloud Storage, Cloud Functions, Cloud Composer, etc.
➢ Good knowledge of working with access permissions, such as IAM.
➢ Ability to understand upstream data, design efficient data pipelines, build data marts in BigQuery, and expose the transformed data to downstream applications (see the BigQuery sketch after this list).
➢ Comfortable building and optimizing performant data pipelines in PySpark, including data ingestion, cleansing, and curation into a data warehouse, database, or any other data platform (see the PySpark pipeline sketch after this list).
➢ Experience with distributed computing environments and Spark architecture.
➢ Ability to optimize performance for data access requirements by choosing appropriate file formats (Avro, Parquet, ORC, etc.) and compression codecs (see the file-format sketch after this list).
➢ Experience writing and testing production-ready code in Python, and participating in code reviews to maintain and improve code quality, stability, and supportability.
➢ Experience in designing data warehouses and data marts.
➢ Experience with any RDBMS (preferably SQL Server) and the ability to write complex SQL queries.
➢ Expertise in requirements gathering and in preparing technical design and functional documents.
➢ Experience in Agile/Scrum practices.
➢ Experience in leading other developers and guiding them technically.
➢ Experience deploying data pipelines using an automated CI/CD approach.
➢ Ability to write modular, reusable code components.
➢ Proficient in identifying data issues and anomalies during analysis.
➢ Strong analytical and logical skills.
➢ Must be able to comfortably tackle new challenges and learn.
➢ Must have strong verbal and written communication skills.
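
As a rough illustration of the BigQuery data-mart responsibility, the minimal sketch below aggregates a hypothetical upstream table into a mart table using the google-cloud-bigquery client; all project, dataset, and table names are placeholders, not details from this posting.

```python
# Minimal sketch: build a BigQuery data mart table from an upstream table.
# The project, dataset, and table names are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # placeholder project id

# Aggregate the upstream fact table into a downstream-friendly mart table.
sql = """
CREATE OR REPLACE TABLE `my-gcp-project.analytics_mart.daily_sales` AS
SELECT
  order_date,
  region,
  SUM(amount) AS total_amount,
  COUNT(*)    AS order_count
FROM `my-gcp-project.raw_zone.orders`
GROUP BY order_date, region
"""

client.query(sql).result()  # blocks until the mart table is (re)built
```

Downstream applications can then read the pre-aggregated mart table instead of scanning the raw upstream data.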
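As a rough illustration of the PySpark pipeline work listed above, here is a minimal ingest-cleanse-curate sketch; the bucket paths, column names, and app name are assumptions made for the example, and a GCS connector on the Spark classpath is assumed.

```python
# Minimal sketch: ingest raw CSV from Cloud Storage, cleanse it, and curate it
# into a partitioned Parquet layer. Paths and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_curation").getOrCreate()

# Ingest: read the raw landing zone.
raw = spark.read.option("header", True).csv("gs://example-bucket/raw/orders/")

# Cleanse: de-duplicate, drop rows without a key, and normalise types.
clean = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("order_id").isNotNull())
       .withColumn("amount", F.col("amount").cast("double"))
       .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
)

# Curate: write partitioned Parquet into the curated zone of the warehouse.
(clean.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("gs://example-bucket/curated/orders/"))
```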
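On the file-format point, the sketch below shows how a format and compression codec are typically chosen at write time in PySpark; the paths and codec pairings are illustrative assumptions rather than a recommendation from this posting.

```python
# Minimal sketch: write the same DataFrame in different formats with explicit
# compression codecs. Paths are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("format_comparison").getOrCreate()
df = spark.read.parquet("gs://example-bucket/curated/orders/")  # placeholder input

# Parquet + Snappy: a common default for analytical, scan-heavy workloads.
df.write.mode("overwrite").option("compression", "snappy") \
    .parquet("gs://example-bucket/bench/orders_parquet/")

# ORC + zlib: higher compression ratio at some extra CPU cost.
df.write.mode("overwrite").option("compression", "zlib") \
    .orc("gs://example-bucket/bench/orders_orc/")

# Avro + deflate: row-oriented, suited to ingestion/streaming hand-offs
# (requires the spark-avro package on the classpath).
df.write.mode("overwrite").format("avro").option("compression", "deflate") \
    .save("gs://example-bucket/bench/orders_avro/")
```

In practice the choice depends on read patterns: columnar formats such as Parquet and ORC favour analytical scans, row-oriented Avro suits record-at-a-time ingestion, and the codec trades file size against CPU.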
