ML / AI Engineer
Responsibilities
- Lead the design and development of a scalable, reliable, and reproducible machine learning research platform.
- Build infrastructure to support large-scale experimentation, model training, and simulation across both on‑premises high‑performance compute environments and multi‑cloud setups.
- Work closely with researchers to understand evolving workflows and translate those needs into robust platform capabilities.
- Architect and optimize distributed training pipelines for high-throughput, GPU‑accelerated workloads.
- Enhance experiment management, model versioning, artifact tracking, and data lineage to ensure transparent and repeatable research processes.
- Develop tools and frameworks that improve feature engineering, dataset creation, and large-scale backtesting.
- Drive initiatives to improve compute efficiency, resource allocation, and workload isolation across heterogeneous environments.
- Enhance platform observability with improved metrics, logging, tracing, and debugging capabilities tailored to ML and distributed systems.
- Support rapid iteration by delivering features and fixes quickly while maintaining strong engineering standards.
- Contribute to long-term architectural planning to ensure the platform scales with growing data volumes and model complexity.
Qualifications
- 2+ years of experience designing and building distributed systems at scale, ideally supporting research or data-heavy workloads.
- Strong programming skills in Python with a focus on clean, maintainable, high-performance code.
- Experience running applications on Linux-based HPC clusters and/or cloud computing platforms.
- Solid understanding of distributed computing, parallel processing, and resource management.
- Hands-on experience with GPU workloads and familiarity with modern ML frameworks such as PyTorch, TensorFlow, or JAX.
- Experience optimizing data pipelines and handling large structured and unstructured datasets.
- Strong debugging skills with the ability to diagnose issues across multiple layers of the stack.
- Comfortable working independently in a fast-paced, research-oriented environment.
- Strong communication skills and experience collaborating directly with researchers or data-focused teams.
Preferred Attributes
- Experience building internal ML platforms or research tooling at scale.
- Familiarity with experiment‑tracking tools, workflow orchestration systems, and model lifecycle management.
- Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).
- Exposure to high-performance or latency-sensitive domains such as quantitative research, simulation systems, or large‑scale distributed compute.
FAQs
What happens after I apply?
We understand that taking the time to apply is a big step. When you apply, your details go directly to the consultant sourcing talent for the role. Due to demand, we may not be able to get back to every applicant. However, we always keep your CV and details on file, so when we see similar roles, or skillsets that drive growth in organisations, we will reach out to discuss opportunities.
Should I apply even if I don't meet every requirement?
Yes. Even if this role isn't a perfect match, applying allows us to understand your expertise and ambitions, ensuring you're on our radar when the right opportunity arises.
Do you advertise all of your roles?
We work in several ways. We advertise available roles on our site, although due to confidentiality we may not post all of them. We also work with clients who are more focused on skills and on understanding what is required to future-proof their business. That's why we recommend registering your CV, so you can be considered for roles that have yet to be created.
Do you offer support during the hiring process?
Yes, we help with CV and interview preparation. From customised support on how to optimise your CV to interview preparation and compensation negotiation, we advocate for you throughout your next career move.