Lead Engineer for HPC Hardware
We're seeking a Lead Engineer for High-Performance Compute Hardware to take ownership of a large-scale compute infrastructure built on GPU and CPU systems at this leading trading firm. In this role, you'll be responsible for the design, expansion, and fine-tuning of a platform that powers intensive computational workloads across research and engineering teams.
You'll work in close partnership with colleagues across infrastructure, data center operations, AI/ML, security, and software development to ensure the platform is robust, efficient, and ready to scale. This position emphasizes automation, hardware performance, and operational excellence, with a strong focus on mentorship and long-term infrastructure strategy.
What You'll Be Doing
- Architect and oversee a high-throughput compute environment
- Expand and optimize infrastructure to support growing technical demands
- Manage a bare-metal provisioning stack, with emphasis on OpenStack Ironic
- Continuously monitor system health and implement performance improvements
- Establish and refine operational procedures to reduce downtime and hardware faults
- Conduct diagnostics, performance tuning, and capacity forecasting
- Review and enhance hardware lifecycle workflows
- Collaborate across teams to align infrastructure with broader technical goals
- Apply security best practices to hardware and platform-level systems
- Guide and mentor junior team members, fostering a culture of technical growth
What You Bring
- Hands-on experience managing complex HPC environments at scale
- In-depth understanding of server architecture, including compute, memory, storage, and networking components
- Strong background in bare-metal provisioning and infrastructure-as-code practices
- Proven ability to troubleshoot and resolve hardware issues in production environments
- Familiarity with automation frameworks such as Ansible, Puppet, or Chef
- Experience with out-of-band management tools and APIs (e.g., Redfish, iDRAC, iLO, BMC, IPMI)
- Skills in system tuning, diagnostics, and capacity planning
- Knowledge of thermal and power efficiency in data center environments
- Awareness of hardware-level security practices
- Strong analytical and communication skills, with a collaborative mindset
Bonus Points For
- Experience in hyperscale or large compute cluster environments
- Knowledge of high-speed networking technologies (e.g., InfiniBand, 100GbE)
- Familiarity with Linux systems and scripting languages (Python, Bash, PowerShell)
- Exposure to OpenStack or similar cloud infrastructure platforms
- Experience with GPU management tools and debugging (e.g., nvidia-smi)
- Prior leadership experience in mentoring or managing technical teams
FAQs
Thank you for applying; we understand that taking the time to apply is a big step. When you apply, your details go directly to the consultant who is sourcing talent for the role. Due to high demand, we may not be able to respond to every applicant. However, we keep your CV and details on file, so when similar roles come up, or when we see demand for skill sets like yours, we will reach out to discuss opportunities.
Even if this role isn’t a perfect match, we encourage you to apply. Applying allows us to understand your expertise and ambitions and keeps you on our radar for the right opportunity when it arises.
We work in several ways. First, we advertise available roles on our site, although for confidentiality reasons we may not post all of them. We also work with clients who focus on skills and on understanding what is required to future-proof their business.
That's why we recommend registering your CV so you can be considered for roles that have yet to be created.
We also help with CV and interview preparation. From customised advice on optimising your CV to interview preparation and compensation negotiations, we advocate for you throughout your next career move.