Production Engineer (SRE)
Client Overview: Our client is modernizing the brokerage ecosystem. They are a diversified financial services firm replacing the legacy infrastructure used across capital markets.
They started from scratch by building a completely cloud-native clearing and custody system designed for today’s complex, global market. Their proprietary prime brokerage platform adds significant efficiency to the market while focusing on minimizing risk, redundancy, and cost for clients. Their goal is to create a single source-of-truth platform for every asset class, in every country, and in any currency.
By combining highly skilled product and engineering talent with seasoned finance professionals, our client is building the essentials to compete in today’s fast-paced markets.
Role Overview:
- Our client is seeking a passionate Site Reliability Engineer (SRE) eager to learn and adopt new technologies in a fast-paced environment.
- The ideal candidate has demonstrated strong problem-solving and analytical skills. You must be a quick learner with a proven record of innovation and continuous learning in your previous roles, and you must have demonstrated the ability to develop tooling and automation for recovery and diagnostics. You should be a self-starter and an independent team member who can take initiative and work with minimal guidance.
- In this role, you will spend half of your time understanding the current challenges and opportunities in the technology platform by supporting the Operations and Business teams as they run the daily business. The other half will be dedicated to building innovative, out-of-the-box solutions that minimize human intervention and improve the monitoring, observability, and recovery of the platform.
Responsibilities:
- Innovation and Improvement: Continuously innovate and enhance the stability, resiliency, and recovery of production systems.
- Multi-team Collaboration: Work effectively with multiple teams and manage context switching to support a multitude of systems.
- Analytical Thinking: Employ strong analytical skills to devise out-of-the-box solutions.
- Monitoring and Observability: Envision, design, and implement comprehensive monitoring and observability solutions.
- Incident Management: Analyze incidents, identify trends, and develop solutions to prevent future occurrences.
- Tooling and Automation: Develop and implement tooling and automation to diagnose issues and improve recovery times.
- Daily Production Support: Spend time in daily production support to understand challenges and propose improvements.
- Subject Matter Expertise: Become a subject matter expert for various applications, supporting our business and technology needs.
- Independent Work: Demonstrate the ability to work independently with minimal supervision.
- Architectural Understanding: Understand common design patterns and develop an understanding of application architecture to support various teams.
- Communication: Maintain excellent communication skills and a flexible attitude.
Requirements:
The ideal candidate will have experience working with various cloud-native technologies. A background in the financial industry is an added advantage.
- 5+ years of experience as an SRE
- Strong hands-on knowledge of Python is preferred
- Exposure to cloud-native platforms
- Experience with common cloud platform tooling such as Datadog, Kubernetes, Argo, and Terraform
- Familiarity with Java, Golang, Kafka, Redis, Snowflake, and Postgres
Educational Background:
Bachelor’s degree in Computer Science, Engineering, or a related field.