Intermediate Site Reliability Engineer, Environment Automation
GitLab is an open core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. Our mission is to enable everyone to contribute to and co-create the software that powers our world.
As a Site Reliability Engineer at GitLab, you are responsible for keeping all user-facing services and other GitLab production systems running smoothly. Our SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to our operating environments and the GitLab codebase.
What you’ll do
- Automate operational tasks within the Environment Automation SRE team's scope.
- Develop an effective warning system so that maintenance tasks can be carried out reliably.
- Plan monitoring and alerting systems based on customer usage patterns.
- Respond to user emergencies and support requests.
- Implement new security measures for GitLab infrastructure.
- Collaborate with engineering stakeholders to resolve architectural bottlenecks.
What you’ll bring
- Experience running and operating production workloads.
- Strong programming skills, preferably in Ruby and/or Go.
- Strong background in Infrastructure as Code tools such as Terraform and Ansible.
- Ability to reason about large systems and how they operate.
- Enjoy working collaboratively across teams.