Location: Bangalore, India
Industry: Information Technology
➢ Good knowledge of GCP services such as Dataflow, BigQuery, Cloud Storage, Cloud Functions, Cloud Composer, etc.
➢ Comfortable building and optimizing performant data pipelines, covering data ingestion, cleansing, and curation into a data warehouse, database, or other data platform using PySpark (see the sketch after this list).
➢ Experience with distributed computing environments and Spark architecture.
➢ Ability to optimize performance for data access requirements by choosing appropriate file formats (Avro, Parquet, ORC, etc.) and compression codecs, as illustrated in the sketch below.
➢ Experience writing production-ready Python code and tests; participation in code reviews to maintain and improve code quality, stability, and supportability.
➢ Ability to write modular, reusable code components.
➢ Ability to mentor new team members during onboarding to the project.
➢ Experience designing data warehouses and data marts.
➢ Ability to lead client calls to flag delays, blockers, and escalations, and to collate requirements.
➢ Experience with any RDBMS, preferably SQL Server; must be able to write complex SQL queries.
➢ Proficient in identifying data issues and anomalies during analysis.
➢ Strong analytical and logical skills.
➢ Must be comfortable tackling new challenges and learning.
➢ Must have strong verbal and written communication skills.
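For illustration, here is a minimal PySpark sketch of the kind of pipeline the bullets above describe: ingest raw files, cleanse them, and curate the result into a columnar format with an explicit compression codec. The bucket paths, the orders dataset, and the column names (order_id, amount, order_date) are hypothetical placeholders, not part of the role description.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_pipeline").getOrCreate()

# Ingestion: read raw CSV files from cloud storage (path is a placeholder).
raw = spark.read.option("header", True).csv("gs://raw-bucket/orders/")

# Cleansing: drop duplicates, discard rows missing the key, normalize types.
clean = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("order_id").isNotNull())
       .withColumn("amount", F.col("amount").cast("double"))
)

# Curation: write columnar Parquet with Snappy compression, partitioned
# for downstream query performance (see the file-format bullet above).
(clean.write
      .mode("overwrite")
      .option("compression", "snappy")
      .partitionBy("order_date")
      .parquet("gs://curated-bucket/orders/"))
```

Parquet with Snappy is a common default for analytic workloads; Avro tends to suit row-oriented ingestion, and ORC is another columnar option, which is why the role calls for judgment in matching format and codec to the access pattern.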
