The person in this position will be responsible for maintaining and improving ETL applications in environments built on Microsoft Azure and Google Cloud components. We are looking for someone who will be part of the operations (support) team but is also willing to take on development activities. Development tasks will account for at most 40% of the work.
The main goal of the operations team is to supervise the processing of data supplied by external sources. Every day we receive approximately 2.5 TB of data, delivered as roughly 35,000 flat files or downloaded via APIs from over 40 different data providers. The collected data is processed on Databricks / Dataproc clusters or in BigQuery and loaded into external and native tables. The output data is used by a small group of Data Scientists. Because of the very wide range of supported systems, well-organized people who can divide their attention and switch easily between topics will fit into the team much better.
- Supervision of the processing of data supplied by external sources on Microsoft Azure or Google Cloud Platform.
- Incident investigation and problem analysis (technical issues related to data processing on the platform as well as data quality issues).
- Communication and close cooperation with external data providers, platform support teams, client representatives, key users, and data architects.
- In addition to typical application-maintenance tasks, there will be a need to adapt applications to changing customer expectations and to changes on the data providers' side.
Who are we looking for?
- Strong analytical skills, abstract thinking, and comfort working with many types and large volumes of data.
- Theoretical knowledge of data ingestion, ETL, CI/CD, and GitHub.
- Basic knowledge of topics related to cloud services, Data Engineering, and Big Data:
o Experience working with cloud object/blob storage (AWS S3, Azure Blob Storage, or GCP Cloud Storage).
o Experience working with various file formats (validation, loading, editing): CSV, JSON, Parquet, XML, YAML.
o Knowledge of Spark: a basic understanding and the ability to use it.
- SQL knowledge at an intermediate level: the ability to write queries, analyse data, change schemas, and modify data.
- Python knowledge at an intermediate level: the ability to write scripts independently and to modify existing ones.
- Familiarity with Airflow (the DAG concept).
- Experience using REST APIs.
- Good command of English and Polish.
- Experience with Jira, Confluence, and Azure DevOps.
- Knowledge of various Google Cloud Platform components, especially Dataproc, BigQuery, Composer, Cloud Storage, Storage Transfer Service, and Pub/Sub.
- Knowledge of various Microsoft Azure components, especially Azure Data Factory, Azure Data Lake Storage, and Azure Databricks.
- Experience in log and error analysis.
- Experience with ETL processes: developing data processing end to end, anticipating potential problems, and handling them.
- Experience in operations (support) work (communication with end users, Incident Management, Problem Management, ITIL).