ML Systems Architect
Design and implement LLM inference systems with expandable distributed memory.
Role Details
Location: Boston
In this role, you will
- prototype and optimize emerging ML inference systems.
- develop novel memory models for expandable VRAM.
- perform design-space exploration, implementation, and benchmarking of inference engines, both in simulation and on real hardware.
Role requirements
- MS or PhD in computer systems, ideally with a focus on LLM inference and/or distributed systems.
- familiarity with high-performance data-exchange technologies such as RDMA, NCCL, and MPI.
- proficiency in Python, PyTorch, and C/C++.
- knowledge of state-of-the-art inference engines and their extensions (such as vLLM and LMCache) is a strong plus.
Compensation & Benefits (US)
- Annual salary ranges from $180K to $300K
- Equity awards granting you ownership in the company, in addition to salary
- Full health, dental, vision, disability, and life insurance
- 401(k) with company matching
- 100% charity donation matching
- Visa sponsorship and relocation support for your move to our HQ in Boston
- Flexible PTO and remote work options