An ML/cloud-based system that efficiently analyzes collected data to predict, prevent, and troubleshoot system failures and performance issues in smart devices.
Multi-tenancy and medium-to-high data volume processing.
Data collected from smart devices is accessed from cloud (AWS) storage, where it undergoes translation from device-specific schemas and file formats, followed by transformations such as selection of relevant data and features, before being fed to an ML model training subsystem. Qualified models are then pushed to the production environment for prediction and execution. Data handling employs scalable Spark-based access, and the entire processing workflow is kept in sync via pipelines defined in Airflow.
The state of the entire data engineering workflow (including ML models, training, and execution) is available via a dashboard UI.
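The workflow above can be sketched as a chain of stages that the Airflow pipeline would orchestrate. This is a minimal, illustrative sketch: the stage names, field mappings, feature list, and qualification threshold are assumptions, and a real implementation would read via Spark and train with an ML framework such as PyTorch or scikit-learn.

```python
# Hypothetical stages of the data-engineering workflow (illustrative only).
# In production each stage would be an Airflow task reading/writing via Spark.

def translate(raw_records):
    # Translate device-specific schemas to a common one (assumed field map).
    field_map = {"temp": "temperature", "temperature_c": "temperature"}
    return [{field_map.get(k, k): v for k, v in rec.items()} for rec in raw_records]

def select_features(records, features=("temperature", "uptime_hours")):
    # Keep only the features assumed relevant to failure prediction.
    return [{f: rec[f] for f in features if f in rec} for rec in records]

def train(records):
    # Stand-in "model": mean temperature as an anomaly baseline.
    temps = [r["temperature"] for r in records if "temperature" in r]
    return {"baseline_temp": sum(temps) / len(temps), "n_samples": len(temps)}

def qualify(model, min_samples=2):
    # Gate: only models trained on enough samples are pushed to production.
    return model["n_samples"] >= min_samples

def run_pipeline(raw_records):
    # Sequential driver standing in for the Airflow DAG's task ordering.
    records = select_features(translate(raw_records))
    model = train(records)
    return model if qualify(model) else None
```

Each function corresponds to one task node; Airflow's role is to schedule them, retry failures, and keep runs over new data batches in sync.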
- Bring data science/machine learning expertise to a local Scrum team, providing qualified deliverables and services on schedule.
- Participate in progress reviews
- Work with scrum master(s), tech lead(s) to analyze and understand user stories in each sprint
- Complete coding & unit testing for the allocated stories
- Create design documents or make changes to existing ones
- Complete code reviews
Who we're looking for
- At least 4 years of experience working on data science/machine learning problems
- Knowledge of a variety of machine learning techniques (clustering, random forests, neural networks, etc.) and their real-world advantages/drawbacks.
- Experience working with ML frameworks (PyTorch, TensorFlow, etc.) and libraries (scikit-learn, pandas)
- Experience working in Python
- Experience working with data frameworks such as Spark, Hadoop, and HDFS
- Experience with OOP software engineering practices
- Experience with Kaggle competitions (https://www.kaggle.com/competitions)
- Knowledge of Apache Airflow
- Knowledge of / experience with AWS tools
- Experience working in the IoT or predictive maintenance domain
- Experience visualizing/presenting data for stakeholders using D3, Qlik Sense, or ggplot