Senior Engineer, Advanced Computing - HPC/Kubernetes
A leading global investment firm is seeking a highly skilled and innovative Senior Infrastructure Engineer to join its elite technology team. This role offers the opportunity to work on cutting-edge infrastructure projects that directly support quantitative research and trading strategies. The ideal candidate will be passionate about high-performance computing (HPC), cloud-native technologies, and building scalable, resilient systems across hybrid environments.
Key Responsibilities
As a Senior Infrastructure Engineer, you will play a pivotal role in designing, building, and maintaining the firm's next-generation compute and storage platforms.
Your responsibilities will include:
- HPC Platform Development: Architect and enhance a high-performance computing environment tailored for research workloads, ensuring optimal performance and scalability.
- Global Storage Systems: Design, implement, and manage a robust, distributed storage infrastructure across multiple data centers and cloud environments.
- Linux Systems Engineering: Maintain and optimize a large fleet of Linux servers, focusing on performance tuning, kernel-level debugging, and system reliability.
- Kubernetes Operations: Build and support a hybrid Kubernetes estate, spanning both on-premises and AWS environments, with a focus on automation, observability, and resilience.
- Task Scheduling Infrastructure: Contribute to the development and refinement of a proprietary task scheduling system that orchestrates complex compute workflows across the firm's infrastructure.
Desired Attributes
- Curiosity & Passion: A deep interest in emerging technologies and a commitment to continuous learning.
- Team-Oriented Mindset: Excellent communication skills and a collaborative approach to problem-solving.
- Creative Engineering: The ability to design elegant, original solutions to complex technical challenges, rather than relying solely on existing tools or frameworks.
- Automation-First Philosophy: A strong belief in infrastructure-as-code and the automation of all aspects of system management.
Required Technical Expertise
To be successful in this role, candidates should bring a strong foundation in systems engineering and software development, including:
- Linux Mastery: Deep expertise in Linux internals, including kernel operations, performance tuning, and advanced troubleshooting.
- Software & Systems Design: Proven experience designing and implementing complex systems at scale.
- Programming Proficiency: Strong coding skills in at least one compiled language (C++ or Go preferred) and one interpreted language (such as Python, Ruby, or Perl).
- Kubernetes Architecture: In-depth knowledge of Kubernetes internals, including custom operators, advanced deployment strategies, and performance optimization.
- Low-Level Debugging: Ability to diagnose and resolve intricate system-level issues using methodical analysis and creative thinking.
- AI/ML Infrastructure: Familiarity with the infrastructure demands of machine learning workloads, including GPU scheduling, data pipelines, and model training environments.
- HPC Experience: Hands-on experience with modern HPC environments, including workload orchestration, parallel computing, and performance engineering.
Preferred Qualifications (Bonus Points)
- Enterprise Storage Systems: Experience with high-performance storage platforms and parallel file systems such as VAST Data, Weka, NetApp, Pure Storage, Lustre, or GPFS.
- Infrastructure as Code: Proficiency with tools like Terraform, Ansible, or Puppet for managing infrastructure declaratively.
- Observability & Monitoring: Familiarity with observability stacks such as Prometheus, Grafana, and related tools for system monitoring and alerting.
- Networking Expertise: Solid understanding of networking protocols and the ability to troubleshoot issues in distributed systems.
- Advanced I/O Hardware: Experience configuring and optimizing systems with high-performance I/O hardware, including NVMe, RDMA, and specialized interconnects.
FAQs
We understand that taking the time to apply is a big step. When you apply, your details go directly to the consultant sourcing talent for this role. Due to high demand, we may not be able to respond to every applicant. However, we keep your resume and details on file, so when similar roles come up, or when we see demand for skill sets that drive growth in organizations, we will reach out to discuss opportunities.
Even if this role isn't a perfect match, we encourage you to apply. Applying helps us understand your expertise and ambitions, so you stay on our radar when the right opportunity arises.
We work in several ways. We advertise available roles on our site, although for confidentiality reasons we may not post all of them. We also work with clients who focus on skills and on understanding what they need to future-proof their business.
That's why we recommend registering your resume so you can be considered for roles that have yet to be created.
We also help with resume and interview preparation. From customized advice on optimizing your resume to interview preparation and compensation negotiations, we advocate for you throughout your next career move.