Senior Site Reliability Engineer
An industry-leading global investment firm is seeking a Senior Site Reliability Engineer to join its core engineering platform team. This is a high‑impact role where you will help shape the reliability, observability, and operational excellence of a rapidly scaling technology environment that underpins cutting‑edge research and trading systems.
If you are passionate about building resilient platforms, automating everything, and influencing engineering standards across an organisation, this role offers exceptional scope and technical challenge.
As a Senior SRE, you will:
- Lead the development and evolution of the firm's observability stack, ensuring high‑quality metrics, alert fidelity, and scalable system health monitoring.
- Build reliable, low‑noise dashboards and alerting using modern tooling across metrics and logs.
- Improve incident detection, response, and post‑incident processes through automation, configuration improvements, and engineering changes.
- Define and apply SLIs/SLOs to support operational and strategic decision‑making.
- Enhance reliability, scalability, and operability of core services through hands‑on development work.
- Reduce manual operational tasks by identifying recurring issues and implementing automation.
- Apply Infrastructure as Code principles across observability and platform components.
- Develop tooling and automation primarily in Go (preferred) or Python.
- Shape engineering standards by introducing best‑practice patterns, documentation, and platform defaults.
- Collaborate with service‑owning teams to deliver measurable, sustained platform reliability improvements.
What You'll Bring
- Strong, practical SRE and SWE experience within production environments.
- Hands‑on experience operating containerised workloads (Docker or Podman).
- Development experience in Go (preferred) or Python (essential).
- Experience with Grafana (dashboards and alerting).
- Strong Infrastructure as Code experience across Terraform and/or Ansible.
- Familiarity with OpenTelemetry: metrics, logs, and tracing.
- Kubernetes and cloud-native engineering experience.
- Exposure to datacentre compute platforms and hardware-backed services.
- AWS configuration and deployment experience.
FAQs
Congratulations! We understand that taking the time to apply is a big step. When you apply, your details go directly to the consultant who is sourcing talent. Due to demand, we may not be able to respond to every applicant. However, we always keep your resume and details on file, so when we see similar roles or skillsets that drive growth in organizations, we will reach out to discuss opportunities.
Yes. Even if this role isn’t a perfect match, applying allows us to understand your expertise and ambitions, so that you are on our radar for the right opportunity when it arises.
We work in several ways. First, we advertise available roles on our site; however, due to confidentiality, we often cannot post all of them. We also work with clients who are more focused on skills and on understanding what is required to future-proof their business.
That's why we recommend registering your resume so you can be considered for roles that have yet to be created.
Yes, we help with resume and interview preparation. From customized advice on optimizing your resume to interview preparation and compensation negotiations, we advocate for you throughout your next career move.