<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>vllm on Home</title>
    <link>https://www.stephan.michard.io/tags/vllm/</link>
    <description>Recent content in vllm on Home</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <lastBuildDate>Sat, 07 Mar 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://www.stephan.michard.io/tags/vllm/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Extending the Local AI Stack with On-Demand GPU Inference on RunPod</title>
      <link>https://www.stephan.michard.io/2026/extending-the-local-ai-stack-with-on-demand-gpu-inference-on-runpod/</link>
      <pubDate>Sat, 07 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://www.stephan.michard.io/2026/extending-the-local-ai-stack-with-on-demand-gpu-inference-on-runpod/</guid>
      <description>In this post, I want to describe how I extended the local AI stack I built in my homelab with on-demand GPU-backed model inference, without adding any GPU hardware to the lab itself.
The two previous posts in this series provide the context for what follows. The homelab post covers the base infrastructure: thin clients, Docker Compose, Traefik, and internal DNS.</description>
    </item>
    
  </channel>
</rss>
