We are looking for a DevOps software engineer with in-depth knowledge of the software infrastructure that supports machine learning (ML) projects. In this position, you will manage infrastructure projects and processes and develop the tools needed to support that infrastructure. Keen attention to detail, strong problem-solving abilities, and a solid technical foundation are essential.
What you'll do
- Design and implement software infrastructure for build, deployment, and configuration management in a distributed environment
- Automate build and test processes
- Develop and manage continuous integration (CI) and continuous delivery (CD) tools
- Design and document engineering processes
- Evaluate new software infrastructure techniques and technologies
Who we're looking for
- Experience with, and in-depth knowledge of, Linux infrastructure
- Experience with job schedulers such as SLURM or PBS
- Experience implementing and maintaining provisioning software such as Ansible
- Experience working with Docker containers
- Basic networking skills, such as configuring routes and tunnels, and an understanding of firewalls
- In-depth understanding of relational database systems
- Working knowledge of CI/CD systems, such as Jenkins
- Working knowledge of Grafana and Prometheus
- Working knowledge of Git and GitHub
- Knowledge of shell scripting, Python, Groovy and C++
- Understanding of testing frameworks and infrastructure (e.g., CTest, GoogleTest)
- Good interpersonal skills and the ability to communicate with all levels of management
- Previous experience in development and operations, or in a related IT, computing, or operations field, is a plus
- Understanding of agile development methodologies such as Scrum and Kanban
- Keeps up to date with the latest industry trends and can articulate them clearly and confidently
- Experience with build systems such as CMake and Bazel is a huge plus