ML Systems Architect

Design and implement LLM inference systems with expandable distributed memory.

Role Details

Location: Boston

In this role, you will

  • prototype and optimize emerging ML inference systems.
  • develop novel memory models for expandable VRAM.
  • perform design-space exploration, implementation, and benchmarking of inference engines, both in simulations and on real hardware.

Role Requirements

  • MS or PhD in computer systems, ideally with a focus on LLM inference and/or distributed systems.
  • familiarity with high-performance data exchange systems: RDMA, NCCL, MPI, etc.
  • proficiency in Python, PyTorch, C/C++.
  • knowledge of SOTA inference engines and their extensions (such as vLLM, LMCache) is a strong plus.

Compensation & Benefits (US)

  • Annual salary ranges from $180K to $300K
  • Equity awards granting you ownership in the company, in addition to salary
  • Full health, dental, vision, disability, and life insurance
  • 401(k) with company matching
  • 100% charity donation matching
  • Visa sponsorship and relocation support for moving to our HQ in Boston
  • Flexible PTO and remote work options