## Job Description
### About the Role
The vLLM and LLM-D Engineering team at Red Hat is seeking a customer-focused developer to join as a **Forward Deployed Engineer**. In this role, you will bridge the gap between our cutting-edge inference platform (LLM-D and vLLM) and our customers' critical production environments.
### Responsibilities
- **Orchestrate Distributed Inference**: Deploy and configure LLM-D and vLLM on Kubernetes clusters, setting up distributed serving topologies such as tensor-parallel deployments to maximize accelerator utilization (see the sketch after this list).
- **Optimize for Production**: Run performance benchmarks, tune vLLM parameters, and configure intelligent inference routing policies to meet SLOs for latency and throughput.
- **Code Side-by-Side**: Collaborate with customer engineers to write production-quality code (Python/Go/YAML) that integrates our inference engine into their Kubernetes ecosystem.
- **Solve the "Unsolvable"**: Debug complex interactions between model architectures, hardware accelerators, and Kubernetes networking.
- **Feedback Loop**: Act as the "Customer Zero" for our engineering teams, channeling field learnings back to product development.
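To give a concrete flavor of the first two responsibilities, here is a minimal sketch of the kind of Kubernetes manifest this role works with day to day. The Deployment name, model, replica and GPU counts, and flag values are illustrative assumptions rather than a reference configuration; the container image and flags shown are vLLM's standard OpenAI-compatible server options.

```yaml
# Minimal sketch only: names, counts, and flag values are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-example                   # hypothetical deployment name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vllm-example
  template:
    metadata:
      labels:
        app: vllm-example
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest                  # official vLLM serving image
          args:
            - "--model=meta-llama/Llama-3.1-8B-Instruct"  # example model
            - "--tensor-parallel-size=2"                  # shard weights across 2 GPUs
            - "--gpu-memory-utilization=0.90"             # leave headroom for KV-cache growth
            - "--max-model-len=8192"                      # cap context to fit the memory budget
          ports:
            - containerPort: 8000                         # OpenAI-compatible API endpoint
          resources:
            limits:
              nvidia.com/gpu: 2                           # must match tensor-parallel size
```

Much of the optimization work described above is iterating on exactly these knobs, rerunning benchmarks after each change until latency and throughput SLOs are met.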
### Requirements
- **Experience**: 8+ years in backend systems, SRE, or infrastructure engineering.
- **Skills**: Deep Kubernetes expertise, a working understanding of AI inference, and coding proficiency in a language such as Python or Go.
- **Customer Fluency**: Ability to communicate fluently with both engineers and business stakeholders, translating systems-level detail into customer value.
- **Bias for Action**: Preference for rapid prototyping and iteration.
### Travel
Travel is required only as needed to present, demo, or help execute proofs of concept.