Site Reliability Engineer
Our client, a leading proprietary trading firm specialising in both systematic and discretionary strategies, is seeking a Site Reliability Engineer to join their Zurich office. This is a unique opportunity to evolve and enhance a highly sophisticated production trading environment, ensuring exceptional uptime and performance. The role focuses on delivering code-driven solutions while partnering closely with developers and traders to strengthen reliability, observability, and overall operational maturity within a low-latency, high-performance ecosystem.
The ideal candidate will bring deep experience supporting highly available, performance-critical, latency-sensitive systems, alongside a strong understanding of Linux internals and networking. A solid background in reliability engineering is essential, with a clear automation-first mindset and hands-on experience with containerisation technologies.
Key responsibilities:
* Reliability & Production Ownership: Own the availability, stability, and performance of Linux-based trading systems (Red Hat, Rocky, Ubuntu).
* Incident Response: Lead incident management, on-call rotations, and blameless post-mortems, driving automation to prevent recurrence.
* Operational Processes: Maintain runbooks, documentation, and standards for consistent production support.
* Production Readiness: Partner with developers and traders to ensure reliable, high-performance system design and deployment.
* Linux Systems & Performance: Perform low-level tuning (CPU, IRQ, memory, networking) for latency-sensitive workloads.
* Performance Diagnostics: Troubleshoot performance issues using perf, ftrace, tcpdump, and eBPF.
* Automation & Infrastructure: Deliver infrastructure as code with Ansible, Terraform, Python, and shell scripting.
Required qualifications:
* Background: Experience in Site Reliability Engineering, Linux engineering, DevOps, or other infrastructure-focused roles.
* Production Systems: Proven experience supporting highly available, performance-sensitive production environments.
* Linux Expertise: Deep knowledge of Linux internals, including scheduling, memory management, interrupts, filesystems, and storage.
* Networking: Strong understanding of TCP/IP, UDP, multicast, and distributed systems networking.
* Automation & Tooling: Proficiency with Ansible, Terraform, Python, shell scripting, YAML/JSON, and Git-based workflows.
* Containers & Observability: Experience with Docker (or similar) and familiarity with observability tools such as Prometheus, Grafana, ELK, or equivalent.
FAQs
Congratulations on taking this step; we understand that applying takes time and effort. When you apply, your details go directly to the consultant who is sourcing talent. Due to high demand, we may not be able to respond to every applicant. However, we keep your CV and details on file, so when we see similar roles, or opportunities where your skills could drive growth for an organisation, we will reach out to discuss them.
Yes. Even if this role isn’t a perfect match, applying allows us to understand your expertise and ambitions, ensuring you're on our radar for the right opportunity when it arises.
We also work in several ways. We advertise available roles on our site, but for confidentiality reasons we may not post all of them. We also work with clients who focus on skills and on understanding what is required to future-proof their business.
That's why we recommend registering your CV so you can be considered for roles that have yet to be created.
Yes. From tailored advice on optimising your CV to interview preparation and compensation negotiation, we advocate for you throughout your next career move.
