## Job Description
### About the Role
The vLLM and LLM-D Engineering team at Red Hat is seeking a customer-focused developer to join as a **Forward Deployed Engineer**. In this role, you will bridge the gap between our cutting-edge inference platform (LLM-D and vLLM) and our customers' critical production environments.
### Responsibilities
- **Orchestrate Distributed Inference**: Deploy and configure LLM-D and vLLM on Kubernetes clusters, setting up distributed serving topologies such as tensor-parallel deployments to maximize accelerator utilization (see the sketch after this list).
- **Optimize for Production**: Run performance benchmarks, tune vLLM parameters, and configure intelligent inference routing policies to meet SLOs for latency and throughput.
- **Code Side-by-Side**: Collaborate with customer engineers to write production-quality code (Python/Go/YAML) that integrates our inference engine into their Kubernetes ecosystem.
- **Solve the "Unsolvable"**: Debug complex interactions between model architectures, hardware accelerators, and Kubernetes networking.
- **Feedback Loop**: Act as the "Customer Zero" for our engineering teams, channeling field learnings back to product development.
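To give a concrete flavor of the first two responsibilities, here is a minimal sketch of the kind of Kubernetes manifest this role works with day to day. The Deployment name, model, replica and GPU counts, and flag values are illustrative assumptions rather than a reference configuration; the container image and flags shown are vLLM's standard OpenAI-compatible server options.

```yaml
# Minimal sketch only: names, counts, and flag values are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-example                   # hypothetical deployment name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vllm-example
  template:
    metadata:
      labels:
        app: vllm-example
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest                  # official vLLM serving image
          args:
            - "--model=meta-llama/Llama-3.1-8B-Instruct"  # example model
            - "--tensor-parallel-size=2"                  # shard weights across 2 GPUs
            - "--gpu-memory-utilization=0.90"             # leave headroom for KV-cache growth
            - "--max-model-len=8192"                      # cap context to fit the memory budget
          ports:
            - containerPort: 8000                         # OpenAI-compatible API endpoint
          resources:
            limits:
              nvidia.com/gpu: 2                           # must match tensor-parallel size
```

Much of the optimization work described above is iterating on exactly these knobs, rerunning benchmarks after each change until latency and throughput SLOs are met.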
### Requirements
- **Experience**: 8+ years in backend systems, SRE, or infrastructure engineering.
- **Skills**: Deep Kubernetes expertise, a working understanding of AI inference, and coding proficiency in a language such as Python or Go.
- **Customer Fluency**: Ability to communicate fluently with both engineers and business stakeholders, translating systems-level detail into customer value.
- **Bias for Action**: Preference for rapid prototyping and iteration.
### Travel
Travel is required only as needed to present, demo, or help execute proofs of concept.