HPC Production Engineer


Chicago
Permanent
USD 250,000 - USD 350,000
Quantitative Analytics Research and Trading
PR/551887_1751900960

Position Title: HPC Production Engineer - High-Performance Infrastructure Operations

A globally recognized, technology-driven trading firm specializing in high-frequency strategies is seeking a highly motivated and technically adept HPC Production Engineer to join its dynamic infrastructure team. This role is ideal for someone who thrives in fast-paced, high-stakes environments and is passionate about solving complex technical challenges at scale. You'll be part of a collaborative, cross-functional engineering group that supports the backbone of a global trading operation, ensuring systems are robust, scalable, and optimized for performance.

About the Role

As an HPC Production Engineer, you will be responsible for maintaining and enhancing the firm's high-performance computing infrastructure. This includes working on a wide array of global projects that span multiple technical domains, from systems engineering and automation to performance tuning and incident response. You'll be expected to take ownership of critical systems, proactively identify areas for improvement, and contribute to the development of tools and processes that enhance operational efficiency and reliability.

This position is not confined to a narrow set of responsibilities. Instead, it offers the opportunity to engage with a broad spectrum of engineering challenges. Whether it's debugging a low-level kernel issue, optimizing a distributed system, or automating infrastructure deployment, you'll be empowered to make a meaningful impact. The firm values engineers who are curious, adaptable, and eager to learn, who are not afraid to dive deep into unfamiliar territory, and who take pride in delivering high-quality solutions.

Key Responsibilities

  • Global Infrastructure Projects: Collaborate with teams across multiple regions to design, implement, and support infrastructure solutions that meet the demanding requirements of high-frequency trading systems.
  • Complex Problem Solving: Investigate and resolve high-impact technical issues with a focus on root-cause analysis. You'll be expected to handle incidents with a high degree of technical complexity and urgency.
  • Tool Development: Build and maintain internal tools that assist in diagnosing, triaging, and resolving production issues. These tools will be critical in improving visibility, reducing response times, and automating repetitive tasks.
  • Cross-Functional Engineering: Participate in a wide range of engineering initiatives beyond traditional infrastructure work. This includes contributing to software development, performance engineering, and systems architecture.
  • Non-Standard Work Schedule: Support a modified work week that includes weekend shifts. This schedule is designed to ensure continuous coverage and support for critical systems while offering flexibility during the traditional workweek.

What You'll Bring

  • Passion for Learning: A relentless curiosity and a drive to continuously expand your technical knowledge. You should be excited by the opportunity to work on new technologies and solve novel problems.
  • Linux Expertise: Deep understanding of Linux internals, system administration, and performance tuning. You should be comfortable working at both the user and kernel levels.
  • Networking Fundamentals: A solid grasp of networking concepts and protocols, with a willingness to deepen your expertise in areas such as low-latency networking, packet capture, and traffic analysis.
  • Scripting and Automation: Proficiency in at least one programming or scripting language, preferably Go or Python. You should be comfortable writing clean, maintainable code and open to learning new languages as needed.
  • Debugging and Profiling: Extensive experience with debugging tools, performance profilers, and system monitoring utilities. You should be adept at identifying bottlenecks and optimizing system performance.
  • Systems Design: Proven ability to design, build, and maintain complex, distributed systems. You should have a strong understanding of scalability, fault tolerance, and system reliability.
  • Configuration Management: Familiarity with tools such as Salt, Ansible, or Puppet. You should be able to manage infrastructure as code and automate system provisioning and configuration.
  • Analytical Mindset: A methodical approach to problem-solving, with a strong emphasis on root-cause analysis and long-term solutions rather than quick fixes.
  • Dependability: A high level of reliability and accountability. You should be someone your team can count on during critical incidents and high-pressure situations.

FAQs

Congratulations, we understand that taking the time to apply is a big step. When you apply, your details go directly to the consultant who is sourcing talent. Due to demand, we may not be able to respond to every applicant. However, we always keep your resume and details on file, so when we see similar roles or skillsets that drive growth in organizations, we will reach out to discuss opportunities.

Yes. Even if this role isn’t a perfect match, applying allows us to understand your expertise and ambitions, ensuring you're on our radar for the right opportunity when it arises.

We work in several ways. First, we advertise available roles on our site; however, due to confidentiality, we often cannot post all of them. We also work with clients who are more focused on skills and on understanding what is required to future-proof their business.

That's why we recommend registering your resume so you can be considered for roles that have yet to be created. 

Yes, we help with resume and interview preparation. From customized support on how to optimize your resume to interview preparation and compensation negotiations, we advocate for you throughout your next career move.
