ML Systems Architect
Design and implement LLM inference systems with expandable distributed memory.
Role Details
Location: Boston
In this role, you will
- prototype and optimize emerging ML inference systems.
- develop novel memory models for expandable VRAM.
- perform design-space exploration, implementation, and benchmarking of inference engines, both in simulation and on real hardware.
Role requirements
- MS or PhD in computer systems, ideally with a focus on LLM inference and/or distributed systems.
- familiarity with high-performance data-exchange technologies such as RDMA, NCCL, and MPI.
- proficiency in Python, PyTorch, and C/C++.
- knowledge of state-of-the-art inference engines and their extensions (such as vLLM and LMCache) is a strong plus.
Compensation & Benefits (US)
- Annual salary ranges from $180K to $300K
- Equity awards granting you ownership in the company, in addition to salary
- Full health, dental, vision, disability, and life insurance
- 401(k) with company matching
- 100% charity donation matching
- Visa sponsorship and relocation support for your move to our HQ in Boston
- Flexible PTO and remote work options