Site Reliability Engineer, Core Data Services
We are seeking a Site Reliability Engineer to help shape, refine, and scale the SRE function within a growing Core Data Services organization. In this role, you will develop and promote modern reliability practices, partner closely with engineering teams, and ensure the stability and performance of complex distributed systems across both cloud and on‑premises environments.
What You'll Do
- Establish and continuously improve SRE best practices, reliability standards, and operational processes that support a rapidly expanding infrastructure ecosystem.
- Implement and mature end‑to‑end observability using tools such as Prometheus, Grafana, Loki, and Tempo to ensure deep visibility into system performance and health.
- Participate in an equitable on‑call rotation, proactively responding to incidents and supporting operational excellence across the platform.
- Define and maintain application reliability requirements and performance objectives for Kubernetes‑based environments, optimizing deployments for cost, performance, and resilience.
- Develop automation and internal tooling to improve deployment pipelines, system health checks, incident recovery procedures, and operational efficiency.
- Collaborate with development teams to improve service stability and scalability, and support blameless post‑mortems and defined service level objectives.
What You Bring
- Five or more years of experience in SRE or related roles working with large, complex distributed systems.
- A background in engineering, computer science, information systems, or equivalent applied experience.
- Expertise with observability tools such as Prometheus, Grafana, Loki, and Tempo.
- Strong hands‑on experience with Kubernetes and containerization using Docker.
- Experience operating across both cloud and on‑premises platforms, preferably with AWS.
- Proficiency in scripting languages such as Python, Bash, or Go for automation and tooling.
- A solid understanding of CI/CD concepts, Agile development methods, and DevOps culture.
- Strong communication abilities and a detail‑oriented, reliability‑focused mindset.
Nice to Have
- Exposure to databases such as PostgreSQL, Redis, or Snowflake.
- Experience with event‑streaming platforms like Kafka or Solace.
- Familiarity with orchestration tools such as Airflow.
Why This Role
This position offers the chance to build SRE foundations from the ground up, influence engineering culture, and operate across a diverse set of modern infrastructure components. The team maintains a fast‑moving environment where reliability, automation, and cross‑functional collaboration are central to success. It is an ideal fit for someone who enjoys designing scalable systems, driving technical standards, and making meaningful improvements to core data‑driven platforms.
FAQs
Congratulations. We understand that taking the time to apply is a big step. When you apply, your details go directly to the consultant who is sourcing talent. Due to demand, we may not be able to respond to every applicant. However, we always keep your CV and details on file, so when we see similar roles or skill sets that drive growth in organisations, we will reach out to discuss opportunities.
Even if this role isn’t a perfect match, applying allows us to understand your expertise and ambitions, ensuring you're on our radar for the right opportunity when it arises.
We work in several ways. First, we advertise available roles on our site, although confidentiality often means we cannot post all of them. We also work with clients who are more focused on skills and on understanding what is required to future-proof their business.
That's why we recommend registering your CV so you can be considered for roles that have yet to be created.
Yes, we help with CV and interview preparation. From customised support on how to optimise your CV to interview preparation and compensation negotiations, we advocate for you throughout your next career move.