<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>vllm on Home</title>
    <link>https://www.stephan.michard.io/tags/vllm/</link>
    <description>Recent content in vllm on Home</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <lastBuildDate>Sat, 07 Mar 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://www.stephan.michard.io/tags/vllm/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Extending the Local AI Stack with On-Demand GPU Inference on RunPod</title>
      <link>https://www.stephan.michard.io/2026/extending-the-local-ai-stack-with-on-demand-gpu-inference-on-runpod/</link>
      <pubDate>Sat, 07 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://www.stephan.michard.io/2026/extending-the-local-ai-stack-with-on-demand-gpu-inference-on-runpod/</guid>
      <description>In this post, I want to describe how I extended the local AI stack I built in my homelab with on-demand GPU-backed model inference, without adding any GPU hardware to the lab itself.
The two previous posts in this series provide the context for what follows. The homelab post covers the base infrastructure: thin clients, Docker Compose, Traefik, and internal DNS.</description>
    </item>
    
  </channel>
</rss>
