Middle Site Reliability Engineer, Online Retailer

Online interview
Wrocław, Lublin

Project description

About the vacancy

Our client is one of the biggest online retailers worldwide, with annual revenue of £1 billion. Over the years we have helped the client develop web portals, mobile apps, delivery control systems, staff management tools, data storage, and much more. The systems we’ve built together are in operation 24/7, contributing to the client’s success.

Site Reliability Engineering is a new role, first introduced by Google, that combines the skills of developers and ops to deliver more reliable, scalable software. The goal is to analyze a diverse set of applications (primarily built using Java, Oracle, AWS, Google Cloud services, and several other technologies) and bind them into a reliable, self-healing suite that works within defined reliability requirements. This requires proactive work to ensure observability, analyze potential bottlenecks, and suggest fixes before they turn into production incidents.

This position may be of interest to DevOps engineers who would like to get closer to the code or gain a valuable specialization with a focus on the JVM stack. The position may also appeal to developers who are interested in how large-scale systems operate and what happens to the code after it is compiled.


  • Analyze and improve the availability, latency, performance, and efficiency of the applications
  • Proactive support of production applications (both during and outside office hours) across a range of domains; the applications are mainly written in Java and use Oracle databases
  • Improve the monitoring and alerting of the applications
  • Capacity planning and provisioning
  • Improve and standardize build pipelines; identify and reduce areas of manual toil through automation
  • Consult in areas of reliability and scalability for the development of new applications
  • Work together with teams in other departments to find solutions
  • Conduct periodic on-call duties

Who we're looking for?

Must have

  • Experience in analyzing and troubleshooting production systems
  • Experience with modern software development, preferably in Java
  • Deep understanding of Linux and UNIX-based systems
  • Familiarity with Agile software development practices
  • Understanding of TDD principles
  • Solid knowledge of SQL and modern databases
  • Experience with CI/CD systems
  • Experience with networking (TCP/UDP, ICMP, DNS, etc.), OSI layers, infrastructure services, and security
  • Experience with software monitoring and alerting systems
  • Good English communication and problem-solving skills

Would be a plus

  • Familiarity with cloud technologies
  • Experience with Docker and Kubernetes
  • Experience with NoSQL databases

  • Healthcare package
  • Healthcare package for families
  • Leisure package
  • Leisure package for families
  • Cold beverages
  • Hot beverages
  • Fruits
  • Snacks
  • Conferences
  • Trainings
  • Car parking
  • Bicycle parking
  • Shower
  • Chill room
  • Integration events

Our company


Wrocław, Lublin 3000+
Tech skills
  • JavaScript
  • .NET
  • Java
