<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Posts on Home</title><link>/posts/</link><description>Recent content in Posts on Home</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Sun, 17 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="/posts/" rel="self" type="application/rss+xml"/><item><title>Running the Red Hat AI Inference Server on OpenShift</title><link>/2026/running-the-red-hat-ai-inference-server-on-openshift/</link><pubDate>Sun, 17 May 2026 00:00:00 +0000</pubDate><guid>/2026/running-the-red-hat-ai-inference-server-on-openshift/</guid><description>&lt;figure&gt;&lt;img src="/images/posts/post_32/overview.png"data-src="/images/posts/post_32/overview.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;Drop-in OpenAI-compatible inference on OpenShift — RHAIIS packages vLLM for production, with hardware flexibility and a secure external endpoint out of the box - AI generated&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In this post, I want to describe how to deploy the &lt;strong&gt;Red Hat AI Inference Server (RHAIIS)&lt;/strong&gt; on OpenShift and expose it as an OpenAI-compatible API endpoint. This post builds on &lt;a href="/2026/deploying-openshift-on-aws-with-automated-cluster-provisioning/"&gt;Deploying OpenShift on AWS with Automated Cluster Provisioning&lt;/a&gt;, which covers getting a working OpenShift cluster into place. If you already have a cluster running, you can skip directly to the deployment steps.&lt;/p&gt;
&lt;p&gt;The inference server will load a model from Hugging Face Hub and expose a &lt;code&gt;/v1/chat/completions&lt;/code&gt; endpoint that any OpenAI-compatible client can talk to. At the end, I show how to connect the endpoint to the &lt;a href="https://openwebui.com/"&gt;Open WebUI&lt;/a&gt; setup described in &lt;a href="/2026/my-local-ai-stack-open-webui-litellm-searxng-and-docling/"&gt;My Local AI Stack&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="what-is-red-hat-ai-inference-server"&gt;What is Red Hat AI Inference Server&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;vLLM&lt;/em&gt; is an open-source inference engine designed for high-throughput LLM serving. It handles memory-efficient attention via &lt;em&gt;PagedAttention&lt;/em&gt;, continuous batching, and GPU-optimized execution, and it exposes an OpenAI-compatible HTTP API out of the box. I covered how to run vLLM on the GPU cloud provider RunPod in a &lt;a href="/2026/extending-the-local-ai-stack-with-on-demand-gpu-inference-on-runpod/"&gt;previous post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;Red Hat AI Inference Server&lt;/strong&gt; is the supported, enterprise-packaged distribution of vLLM. Red Hat provides a hardened container image distributed through &lt;code&gt;registry.redhat.io&lt;/code&gt;, tested against specific GPU driver and CUDA versions and with a defined support lifecycle. The API surface is identical to upstream vLLM. Any client that works against a plain vLLM inference server works against RHAIIS without modification.&lt;/p&gt;
&lt;p&gt;Deploying RHAIIS directly on OpenShift is one way to reach a running inference endpoint through Red Hat technology. Red Hat OpenShift AI offers other paths, e.g. model serving through KServe, where OpenShift AI manages the deployment lifecycle via a web dashboard and exposes RHAIIS through a &lt;code&gt;ServingRuntime&lt;/code&gt;, or a &lt;a href="https://github.com/opendatahub-io/models-as-a-service"&gt;Model as a Service&lt;/a&gt; approach that provisions shared inference endpoints across a cluster, so teams can consume models without operating their own deployment. The approach in this post is the most direct option, suited for cases where you want a single inference endpoint.&lt;/p&gt;
&lt;h2 id="prerequisites"&gt;Prerequisites&lt;/h2&gt;
&lt;p&gt;This setup requires the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A running OpenShift cluster with at least one GPU-enabled worker node. The post &lt;a href="/2026/deploying-openshift-on-aws-with-automated-cluster-provisioning/"&gt;Deploying OpenShift on AWS&lt;/a&gt; covers one way to get there.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html/specialized_hardware_and_driver_enablement/psap-node-feature-discovery-operator"&gt;&lt;strong&gt;Node Feature Discovery (NFD) Operator&lt;/strong&gt;&lt;/a&gt; installed and running to detect GPU hardware on the node.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html"&gt;&lt;strong&gt;NVIDIA GPU Operator&lt;/strong&gt;&lt;/a&gt; installed to provide the CUDA runtime and device plugin.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/cli_tools/openshift-cli-oc#cli-getting-started"&gt;&lt;strong&gt;OpenShift CLI (oc)&lt;/strong&gt;&lt;/a&gt; – required to interact with the OpenShift cluster, installed and logged into the cluster.&lt;/li&gt;
&lt;li&gt;A Hugging Face access token if you intend to use a gated model. Publicly available models like &lt;a href="https://huggingface.co/ibm-granite/collections"&gt;Granite&lt;/a&gt; do not require one.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="deploying-the-red-hat-ai-inference-server"&gt;Deploying the Red Hat AI Inference Server&lt;/h2&gt;
&lt;p&gt;The deployment consists of a namespace, two secrets, a PersistentVolumeClaim for model caching, a Deployment, a Service, and a Route. All deployment files are available in the &lt;a href="https://github.com/smichard/agent_on_ocp"&gt;smichard/agent_on_ocp&lt;/a&gt; GitHub repository. The steps below apply them in sequence.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Clone the repository:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;git clone https://github.com/smichard/agent_on_ocp.git
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; rhaiis
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="2"&gt;
&lt;li&gt;Create a Namespace&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc new-project rhaiis
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="3"&gt;
&lt;li&gt;Create the required Secrets&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Hugging Face access token:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc create secret generic hf-secret &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --from-literal&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;HF_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;your_huggingface_token&amp;gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -n rhaiis
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;API key for the inference endpoint:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The server requires clients to present an API key as a bearer token. Storing it as a secret keeps it out of the Deployment spec.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc create secret generic vllm-api-key-secret &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --from-literal&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;VLLM_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;$(&lt;/span&gt;openssl rand -hex 32&lt;span class="k"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -n rhaiis
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="4"&gt;
&lt;li&gt;Create the ConfigMap&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Set the Hugging Face model ID you want to serve. Research which model fits your use case before settling on one, the only hard requirement is that the model is supported by the vLLM inference server. The ConfigMap also carries the tool call parser name, which the deployment references to set the correct parsing mode for the chosen model.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;apiVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;v1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;ConfigMap&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;vllm-config&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;rhaiis&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;Qwen/Qwen3-Coder-30B-A3B-Instruct&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;TOOL_CALL_PARSER&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;qwen3_coder&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Apply the file to create the ConfigMap:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc apply -f configmap.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="5"&gt;
&lt;li&gt;Create a PersistentVolumeClaim&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The model weights are downloaded once on first startup and cached on a persistent volume. This avoids re-downloading the model on every pod restart.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;apiVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;v1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;PersistentVolumeClaim&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;model-cache&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;rhaiis&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;accessModes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;ReadWriteOnce&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;150Gi&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Apply the file to create the PVC:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc apply -f pvc.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="6"&gt;
&lt;li&gt;Deploy the Inference Server&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The Deployment below references the RHAIIS container image and pulls the model ID from the ConfigMap created in step 4. To serve a different model, update the ConfigMap rather than editing the Deployment spec. The &lt;code&gt;HF_TOKEN&lt;/code&gt; and &lt;code&gt;VLLM_API_KEY&lt;/code&gt; values are injected from the secrets created in step 3.&lt;/p&gt;
&lt;style type="text/css"&gt;.notice{--root-color:#444;--root-background:#eff;--title-color:#fff;--title-background:#7bd;--warning-title:#c33;--warning-content:#fee;--info-title:#fb7;--info-content:#fec;--note-title:#6be;--note-content:#e7f2fa;--tip-title:#5a5;--tip-content:#efe}@media (prefers-color-scheme:dark){.notice{--root-color:#ddd;--root-background:#eff;--title-color:#fff;--title-background:#7bd;--warning-title:#800;--warning-content:#400;--info-title:#a50;--info-content:#420;--note-title:#069;--note-content:#023;--tip-title:#363;--tip-content:#121}}body.dark .notice{--root-color:#ddd;--root-background:#eff;--title-color:#fff;--title-background:#7bd;--warning-title:#800;--warning-content:#400;--info-title:#a50;--info-content:#420;--note-title:#069;--note-content:#023;--tip-title:#363;--tip-content:#121}.notice{line-height:24px;margin-bottom:24px;border-radius:4px;color:var(--root-color);background:var(--root-background)}.notice p:last-child{margin-bottom:0; padding: .5rem 1.2rem 1rem;}.notice-title{margin:-18px -18px 12px;padding:4px 18px;border-radius:4px 4px 0 0;font-weight:700;color:var(--title-color);background:var(--title-background)}.notice.warning .notice-title{background:var(--warning-title)}.notice.warning{background:var(--warning-content)}.notice.info .notice-title{background:var(--info-title)}.notice.info{background:var(--info-content)}.notice.note .notice-title{background:var(--note-title)}.notice.note{background:var(--note-content)}.notice.tip .notice-title{background:var(--tip-title)}.notice.tip{background:var(--tip-content)}.icon-notice{display:inline-flex;align-self:center;margin-right:8px}.icon-notice img,.icon-notice svg{height:1em;width:1em;fill:currentColor}.icon-notice img,.icon-notice.baseline svg{top:.125em;position:relative}&lt;/style&gt;
&lt;div&gt;&lt;svg width="0" height="0" display="none" xmlns="http://www.w3.org/2000/svg"&gt;&lt;symbol id="tip-notice" viewBox="0 0 512 512" preserveAspectRatio="xMidYMid meet"&gt;&lt;path d="M504 256c0 136.967-111.033 248-248 248S8 392.967 8 256 119.033 8 256 8s248 111.033 248 248zM227.314 387.314l184-184c6.248-6.248 6.248-16.379 0-22.627l-22.627-22.627c-6.248-6.249-16.379-6.249-22.628 0L216 308.118l-70.059-70.059c-6.248-6.248-16.379-6.248-22.628 0l-22.627 22.627c-6.248 6.248-6.248 16.379 0 22.627l104 104c6.249 6.249 16.379 6.249 22.628.001z"/&gt;&lt;/symbol&gt;&lt;symbol id="note-notice" viewBox="0 0 512 512" preserveAspectRatio="xMidYMid meet"&gt;&lt;path d="M504 256c0 136.997-111.043 248-248 248S8 392.997 8 256C8 119.083 119.043 8 256 8s248 111.083 248 248zm-248 50c-25.405 0-46 20.595-46 46s20.595 46 46 46 46-20.595 46-46-20.595-46-46-46zm-43.673-165.346l7.418 136c.347 6.364 5.609 11.346 11.982 11.346h48.546c6.373 0 11.635-4.982 11.982-11.346l7.418-136c.375-6.874-5.098-12.654-11.982-12.654h-63.383c-6.884 0-12.356 5.78-11.981 12.654z"/&gt;&lt;/symbol&gt;&lt;symbol id="warning-notice" viewBox="0 0 576 512" preserveAspectRatio="xMidYMid meet"&gt;&lt;path d="M569.517 440.013C587.975 472.007 564.806 512 527.94 512H48.054c-36.937 0-59.999-40.055-41.577-71.987L246.423 23.985c18.467-32.009 64.72-31.951 83.154 0l239.94 416.028zM288 354c-25.405 0-46 20.595-46 46s20.595 46 46 46 46-20.595 46-46-20.595-46-46-46zm-43.673-165.346l7.418 136c.347 6.364 5.609 11.346 11.982 11.346h48.546c6.373 0 11.635-4.982 11.982-11.346l7.418-136c.375-6.874-5.098-12.654-11.982-12.654h-63.383c-6.884 0-12.356 5.78-11.981 12.654z"/&gt;&lt;/symbol&gt;&lt;symbol id="info-notice" viewBox="0 0 512 512" preserveAspectRatio="xMidYMid meet"&gt;&lt;path d="M256 8C119.043 8 8 119.083 8 256c0 136.997 111.043 248 248 248s248-111.003 248-248C504 119.083 392.957 8 256 8zm0 110c23.196 0 42 18.804 42 42s-18.804 42-42 42-42-18.804-42-42 18.804-42 42-42zm56 254c0 6.627-5.373 12-12 12h-88c-6.627 0-12-5.373-12-12v-24c0-6.627 5.373-12 12-12h12v-64h-12c-6.627 0-12-5.373-12-12v-24c0-6.627 5.373-12 12-12h64c6.627 0 12 5.373 12 12v100h12c6.627 0 12 5.373 12 12v24z"/&gt;&lt;/symbol&gt;&lt;/svg&gt;&lt;/div&gt;&lt;div class="notice note" &gt;
&lt;p class="first notice-title"&gt;&lt;span class="icon-notice baseline"&gt;&lt;svg&gt;&lt;use href="#note-notice"&gt;&lt;/use&gt;&lt;/svg&gt;&lt;/span&gt;Note&lt;/p&gt;&lt;p&gt;Depending on the model size, the number of GPUs and the CPU and memory allocations will need to be adjusted. The example below was tested on an AWS &lt;code&gt;g5.12xlarge&lt;/code&gt; node (4x NVIDIA A10G, 24 GB VRAM per GPU) and uses all four GPUs via tensor parallelism.&lt;/p&gt;&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;apiVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;apps/v1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Deployment&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;rhaiis-vllm&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;rhaiis&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;app&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;rhaiis-vllm&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;replicas&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;selector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;matchLabels&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;app&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;rhaiis-vllm&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;app&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;rhaiis-vllm&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;tolerations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;effect&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;NoSchedule&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Exists&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;serviceAccountName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;default&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;volumes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;model-cache&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;persistentVolumeClaim&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;claimName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;model-cache&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;shm&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;emptyDir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;medium&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Memory&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;sizeLimit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;16Gi&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;containers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;vllm&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.3.1-1775680192&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;imagePullPolicy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Always&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;HF_TOKEN&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;valueFrom&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;secretKeyRef&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;hf-secret&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;HF_TOKEN&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;VLLM_API_KEY&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;valueFrom&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;secretKeyRef&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;vllm-api-key-secret&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;VLLM_API_KEY&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;valueFrom&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;configMapKeyRef&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;vllm-config&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;HF_HOME&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;/cache&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;HF_HUB_OFFLINE&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;VLLM_ALLOW_LONG_MAX_MODEL_LEN&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;TOOL_CALL_PARSER&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;valueFrom&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;configMapKeyRef&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;vllm-config&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;TOOL_CALL_PARSER&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;python&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s1"&gt;&amp;#39;-m&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;vllm.entrypoints.openai.api_server&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s1"&gt;&amp;#39;--port=8000&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s1"&gt;&amp;#39;--model=$(MODEL_NAME)&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s1"&gt;&amp;#39;--served-model-name=$(MODEL_NAME)&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s1"&gt;&amp;#39;--tensor-parallel-size=4&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s1"&gt;&amp;#39;--gpu-memory-utilization=0.85&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s1"&gt;&amp;#39;--max-model-len=65536&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s1"&gt;&amp;#39;--enable-auto-tool-choice&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s1"&gt;&amp;#39;--tool-call-parser=$(TOOL_CALL_PARSER)&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;limits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;cpu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;10&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;4&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;128Gi&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;cpu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;2&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;32Gi&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;4&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;volumeMounts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;model-cache&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;mountPath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;/cache&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;shm&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;mountPath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;/dev/shm&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;restartPolicy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Always&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Apply the file to create the deployment:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc apply -f deployment.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The container reads the model ID from the ConfigMap at startup and downloads it from HuggingFace into &lt;code&gt;/cache&lt;/code&gt; (backed by the PVC). Initial startup takes several minutes depending on model size and network speed.
Follow the progress with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc logs -f deployment/rhaiis-vllm -n rhaiis
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The server is ready when the log shows &lt;em&gt;Application startup complete&lt;/em&gt;.&lt;/p&gt;
&lt;figure&gt;&lt;img src="/images/posts/post_32/vllm_startup.png"data-src="/images/posts/post_32/vllm_startup.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;vLLM server log output on startup, showing all registered API routes and the final Application startup complete confirmation&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Once the pod is running, you can verify GPU access from the pod terminal with &lt;code&gt;nvidia-smi&lt;/code&gt;. All four GPUs should be visible, each running a tensor-parallel worker process.&lt;/p&gt;
&lt;figure&gt;&lt;img src="/images/posts/post_32/nvidia_smi.png"data-src="/images/posts/post_32/nvidia_smi.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;nvidia-smi output from inside the vLLM pod, confirming all four A10G GPUs are visible and each tensor-parallel worker has allocated approximately 20 GB of VRAM&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;ol start="7"&gt;
&lt;li&gt;Create a Service and Route&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Create a Service that maps port 80 to port 8000 on the pod:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;apiVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;v1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Service&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;rhaiis-vllm&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;rhaiis&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;app&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;rhaiis-vllm&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;selector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;app&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;rhaiis-vllm&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;ports&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;http&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;protocol&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;TCP&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8000&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;targetPort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8000&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Create a TLS-terminated Route if you want to expose the endpoint outside the cluster:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;apiVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;route.openshift.io/v1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Route&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;rhaiis-vllm&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;rhaiis&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;app&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;rhaiis-vllm&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Service&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;rhaiis-vllm&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;targetPort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;http&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;tls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;termination&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;edge&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;insecureEdgeTerminationPolicy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Redirect&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Apply both and retrieve the assigned hostname:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc apply -f service.yaml
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc apply -f route.yaml
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc get route rhaiis-vllm -n rhaii-namespace -o &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;{.spec.host}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;OpenShift builds the hostname from the route and namespace names following the pattern &lt;code&gt;&amp;lt;route-name&amp;gt;-&amp;lt;namespace&amp;gt;.apps.&amp;lt;cluster-domain&amp;gt;&lt;/code&gt;. The result looks something like &lt;code&gt;rhaiis-vllm-rhaiis-namespace.apps.ocp.example.com&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id="testing-the-endpoint"&gt;Testing the Endpoint&lt;/h2&gt;
&lt;p&gt;Store the hostname and API key in shell variables to keep the commands readable:&lt;/p&gt;
&lt;p&gt;Set environment variables once:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;export&lt;/span&gt; &lt;span class="nv"&gt;RHAIIS_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;$(&lt;/span&gt;oc get route rhaiis-vllm -n rhaiis -o &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;{.spec.host}&amp;#39;&lt;/span&gt;&lt;span class="k"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;export&lt;/span&gt; &lt;span class="nv"&gt;RHAIIS_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;$(&lt;/span&gt;oc get secret vllm-api-key-secret -n rhaiis &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -o &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;{.data.VLLM_API_KEY}&amp;#39;&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt; base64 -d&lt;span class="k"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;export&lt;/span&gt; &lt;span class="nv"&gt;MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;$(&lt;/span&gt;oc get configmap vllm-config -n rhaiis &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -o &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;{.data.MODEL_NAME}&amp;#39;&lt;/span&gt;&lt;span class="k"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Verify all three are populated before proceeding:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;RHAIIS_HOST : &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RHAIIS_HOST&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;RHAIIS_API_KEY : &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RHAIIS_API_KEY&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;Model: &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;MODEL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;**List available models:**
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sb"&gt;```&lt;/span&gt;bash
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;curl -s https://&lt;span class="nv"&gt;$RHAIIS_HOST&lt;/span&gt;/v1/models &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -H &lt;span class="s2"&gt;&amp;#34;Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$RHAIIS_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt; jq .
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Send a chat completion request:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;curl -sS &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;https://&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RHAIIS_HOST&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/v1/chat/completions&amp;#34;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -H &lt;span class="s2"&gt;&amp;#34;Authorization: Bearer &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RHAIIS_API_KEY&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -H &lt;span class="s2"&gt;&amp;#34;Content-Type: application/json&amp;#34;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -d &lt;span class="s1"&gt;&amp;#39;{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s1"&gt; &amp;#34;model&amp;#34;: &amp;#34;&amp;#39;&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;MODEL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s1"&gt; &amp;#34;messages&amp;#34;: [{&amp;#34;role&amp;#34;: &amp;#34;user&amp;#34;, &amp;#34;content&amp;#34;: &amp;#34;What is OpenShift?&amp;#34;}],
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s1"&gt; &amp;#34;temperature&amp;#34;: 0.1,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s1"&gt; &amp;#34;max_tokens&amp;#34;: 200
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s1"&gt; }&amp;#39;&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt; jq -r &lt;span class="s1"&gt;&amp;#39;.choices[0].message.content&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A successful response confirms the server is running, the model is loaded, and the API key authentication is working.&lt;/p&gt;
&lt;h2 id="connecting-to-open-webui"&gt;Connecting to Open WebUI&lt;/h2&gt;
&lt;p&gt;The inference server exposes a standard OpenAI-compatible API, which means &lt;em&gt;Open WebUI&lt;/em&gt; can connect to it directly as an external provider. The setup in &lt;a href="/2026/my-local-ai-stack-open-webui-litellm-searxng-and-docling/"&gt;My Local AI Stack&lt;/a&gt; already runs Open WebUI. Adding the RHAIIS endpoint as a direct external connection requires no changes to the existing stack.&lt;/p&gt;
&lt;p&gt;In Open WebUI, go to &lt;strong&gt;Settings &amp;gt; Connections&lt;/strong&gt; and add a new external connection. Set the URL to the route hostname with the &lt;code&gt;/v1&lt;/code&gt; suffix, add the API key created in step 3 as a bearer token, set the provider type to &lt;strong&gt;OpenAI&lt;/strong&gt;, and the API type to &lt;strong&gt;Chat Completions&lt;/strong&gt;. Leave the model ID field empty so Open WebUI queries the &lt;code&gt;/v1/models&lt;/code&gt; endpoint and discovers available models automatically.&lt;/p&gt;
&lt;figure&gt;&lt;img src="/images/posts/post_32/open_webui.png"data-src="/images/posts/post_32/open_webui.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;Open WebUI external connection configured against the Red Hat AI Inference Server endpoint&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Once saved, the deployed model appears in the model selector alongside any other configured providers.&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The Red Hat AI Inference Server puts the vLLM engine into OpenShift, or any other supported platform, with a supported container image and a deployment pattern that fits standard Kubernetes workflows. The outcome is an OpenAI-compatible endpoint running on your own cluster, backed by a model from Hugging Face Hub, secured with an API key, and accessible over a TLS-terminated OpenShift Route. Any client that speaks the OpenAI Chat Completions format can talk to it, including Open WebUI, which connects to it the same way it connects to any other provider.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;GitHub repository with eployment files - &lt;a href="https://github.com/smichard/agent_on_ocp"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Deploying OpenShift on AWS with Automated Cluster Provisioning - &lt;a href="/2026/deploying-openshift-on-aws-with-automated-cluster-provisioning/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;My Local AI Stack: Open WebUI, LiteLLM, SearXNG, and Docling - &lt;a href="/2026/my-local-ai-stack-open-webui-litellm-searxng-and-docling/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Extending the Local AI Stack with On-Demand GPU Inference on RunPod - &lt;a href="/2026/extending-the-local-ai-stack-with-on-demand-gpu-inference-on-runpod/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Model as a Service GitHub repository - &lt;a href="https://github.com/opendatahub-io/models-as-a-service"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Node Feature Discovery Operator - &lt;a href="https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html/specialized_hardware_and_driver_enablement/psap-node-feature-discovery-operator"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;NVIDIA GPU Operator - &lt;a href="https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OpenShift CLI (oc) - &lt;a href="https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/cli_tools/openshift-cli-oc#cli-getting-started"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Granite family of models on Hugging Face - &lt;a href="https://huggingface.co/ibm-granite/collections"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;smichard/agent_on_ocp - GitHub repository - &lt;a href="https://github.com/smichard/agent_on_ocp"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Red Hat AI Inference Server - Documentation - &lt;a href="https://docs.redhat.com/en/documentation/red_hat_ai_inference_server/3.4"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Deploying Red Hat AI Inference Server on OpenShift - &lt;a href="https://docs.redhat.com/en/documentation/red_hat_ai_inference_server/3.4/html-single/deploying_red_hat_ai_inference_server_in_openshift_container_platform/index"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;vLLM - upstream project - &lt;a href="https://github.com/vllm-project/vllm"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;vLLM - OpenAI-compatible server documentation - &lt;a href="https://docs.vllm.ai/en/stable/serving/openai_compatible_server/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Open WebUI - project site - &lt;a href="https://openwebui.com/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Installing OpenShift AI on OpenShift</title><link>/2026/installing-openshift-ai-on-openshift/</link><pubDate>Thu, 14 May 2026 00:00:00 +0000</pubDate><guid>/2026/installing-openshift-ai-on-openshift/</guid><description>&lt;figure&gt;&lt;img src="/images/posts/post_21/overview.png"data-src="/images/posts/post_21/overview.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;From GitOps repo to OpenShift AI deployment with verified GPU access in minutes - AI generated]&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In this post, I want to describe how to install &lt;strong&gt;Red Hat OpenShift AI&lt;/strong&gt; on an existing OpenShift cluster and configure it to run GPU-accelerated workloads. The approach uses the &lt;a href="https://github.com/alvarolop/rhoai-gitops"&gt;rhoai-gitops&lt;/a&gt; repository, created and maintained by my team mate &lt;strong&gt;Álvaro López Medina&lt;/strong&gt;, which automates the installation of OpenShift AI, the required operators, and the NVIDIA GPU stack through a single script backed by a &lt;em&gt;GitOps&lt;/em&gt; approach.&lt;/p&gt;
&lt;p&gt;If you do not have an OpenShift cluster available yet and want to provision one on AWS, a previous post &lt;a href="/2026/deploying-openshift-on-aws-with-automated-cluster-provisioning/"&gt;Deploying OpenShift on AWS with Automated Cluster Provisioning&lt;/a&gt; covers exactly that. The steps below pick up where that post leaves off, though they apply equally to any running OpenShift cluster.&lt;/p&gt;
&lt;h2 id="prerequisites"&gt;Prerequisites&lt;/h2&gt;
&lt;p&gt;Before proceeding, ensure the following are in place:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A running OpenShift cluster with sufficient compute capacity&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/cli_tools/openshift-cli-oc#cli-getting-started"&gt;OpenShift CLI (oc)&lt;/a&gt; installed and available on your workstation&lt;/li&gt;
&lt;li&gt;Cluster-admin access&lt;/li&gt;
&lt;li&gt;If GPU support is needed: sufficient AWS quota for GPU instance types&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="selecting-the-correct-gpu-instance-node-type"&gt;Selecting the correct GPU instance node type&lt;/h2&gt;
&lt;p&gt;Selecting the right GPU instance type for your workload is a decision that is worth getting right before you provision anything, the instance family determines not just raw performance but also memory capacity, which directly constrains which models you can load and at what precision. Undersizing leads to out-of-memory failures, oversizing means paying for capacity you do not use.&lt;/p&gt;
&lt;p&gt;Consult the &lt;a href="https://docs.aws.amazon.com/dlami/latest/devguide/gpu.html"&gt;AWS recommended GPU instances for deep learning&lt;/a&gt; to identify instance families suited to your workload, then cross-reference with the &lt;a href="https://docs.aws.amazon.com/ec2/latest/instancetypes/ec2-instance-regions.html"&gt;EC2 instance type availability by region&lt;/a&gt; to confirm that your target region actually offers the instance type you need. GPU instance availability varies significantly across regions and is a common source of unexpected quota errors at deployment time.&lt;/p&gt;
&lt;p&gt;The following AWS instance types are commonly used in OpenShift AI GPU deployments:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instance Name&lt;/th&gt;
&lt;th&gt;GPU&lt;/th&gt;
&lt;th&gt;GPU RAM&lt;/th&gt;
&lt;th&gt;vCPUs&lt;/th&gt;
&lt;th&gt;RAM&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;g5.4xlarge&lt;/td&gt;
&lt;td&gt;1x NVIDIA A10G&lt;/td&gt;
&lt;td&gt;24 GiB&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;64 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;g5.12xlarge&lt;/td&gt;
&lt;td&gt;4x NVIDIA A10G&lt;/td&gt;
&lt;td&gt;96 GiB&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;192 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;g5.24xlarge&lt;/td&gt;
&lt;td&gt;4x NVIDIA A10G&lt;/td&gt;
&lt;td&gt;96 GiB&lt;/td&gt;
&lt;td&gt;96&lt;/td&gt;
&lt;td&gt;384 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;g5.48xlarge&lt;/td&gt;
&lt;td&gt;8x NVIDIA A10G&lt;/td&gt;
&lt;td&gt;192 GiB&lt;/td&gt;
&lt;td&gt;192&lt;/td&gt;
&lt;td&gt;768 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p4d.24xlarge&lt;/td&gt;
&lt;td&gt;8x NVIDIA A100&lt;/td&gt;
&lt;td&gt;320 GiB&lt;/td&gt;
&lt;td&gt;96&lt;/td&gt;
&lt;td&gt;1,152 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="installing-openshift-ai"&gt;Installing OpenShift AI&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Clone the &lt;a href="https://github.com/alvarolop/rhoai-gitops"&gt;rhoai-gitops&lt;/a&gt; repository:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;git clone https://github.com/alvarolop/rhoai-gitops
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; rhoai-gitops
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="2"&gt;
&lt;li&gt;Open the installation script and review the GPU-related configuration:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;vi auto-install.sh
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The three parameters that matter most for GPU-enabled deployments:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;CREATE_GPU_MACHINESETS&lt;/code&gt; (Line 9):&lt;/strong&gt; When set to &lt;code&gt;true&lt;/code&gt;, the script automatically creates &lt;em&gt;MachineSets&lt;/em&gt; for GPU nodes. Set to &lt;code&gt;false&lt;/code&gt; if you do not need GPU support initially.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;GPU_NODE_COUNT&lt;/code&gt; (Line 10):&lt;/strong&gt; Total number of GPU nodes to provision. The nodes are distributed across Availability Zones a, b, and c for resilience.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;AWS_GPU_INSTANCE&lt;/code&gt; (Line 18):&lt;/strong&gt; Defaults to &lt;code&gt;g5.4xlarge&lt;/code&gt;, which provides an NVIDIA A10G GPU per node. Adjust based on the workload requirements and available quota.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Throughout the following steps, any value written in &lt;code&gt;&amp;lt;angle brackets&amp;gt;&lt;/code&gt; is a placeholder and must be replaced with your actual value before running the command.&lt;/p&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Log in to the OpenShift cluster:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc login -u &amp;lt;user_name&amp;gt; &amp;lt;cluster_api_url&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="4"&gt;
&lt;li&gt;Run the installation script:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;./auto-install.sh
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The script installs the required operators — including the &lt;em&gt;OpenShift AI Operator&lt;/em&gt;, the &lt;em&gt;Node Feature Discovery Operator&lt;/em&gt;, and the &lt;em&gt;NVIDIA GPU Operator&lt;/em&gt; — and provisions GPU MachineSets if configured to do so. Depending on node provisioning times, the complete process takes 15 to 30 minutes.&lt;/p&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Confirm that the GPU worker nodes have joined the cluster:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc get machineset -n openshift-machine-api
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc get machine -n openshift-machine-api
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc get nodes
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="6"&gt;
&lt;li&gt;Verify that the NVIDIA driver is loaded and that the GPU is accessible:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc &lt;span class="nb"&gt;exec&lt;/span&gt; -it -n nvidia-gpu-operator &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;$(&lt;/span&gt;oc get pod -o wide -l openshift.driver-toolkit&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -o &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;{.items[0].metadata.name}&amp;#34;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -n nvidia-gpu-operator&lt;span class="k"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -- nvidia-smi
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;figure&gt;&lt;img src="/images/posts/post_21/nvidia_smi.png"data-src="/images/posts/post_21/nvidia_smi.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;nvidia-smi output confirming GPU access from within the NVIDIA GPU Operator pod&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;ol start="7"&gt;
&lt;li&gt;Check the &lt;em&gt;Argo CD&lt;/em&gt; applications deployed as part of the GitOps installation:&lt;/li&gt;
&lt;/ol&gt;
&lt;figure&gt;&lt;img src="/images/posts/post_21/argo_cd.png"data-src="/images/posts/post_21/argo_cd.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;Argo CD application overview after the rhoai-gitops installation completes&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;All applications should be in a healthy and synced state before proceeding to configuration.&lt;/p&gt;
&lt;h2 id="configuring-openshift-ai-for-gpu-workloads"&gt;Configuring OpenShift AI for GPU Workloads&lt;/h2&gt;
&lt;p&gt;With OpenShift AI installed, a small amount of configuration is needed to allow workbenches to schedule onto the GPU nodes. GPU nodes in OpenShift are typically tainted with &lt;code&gt;nvidia.com/gpu:NoSchedule&lt;/code&gt; to prevent standard workloads from landing on them accidentally. Workbenches that need GPU access must be configured with a matching toleration.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Check the taints applied to the GPU nodes:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc get nodes
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc describe node &amp;lt;gpu_node_name&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The relevant taint will appear as &lt;code&gt;nvidia.com/gpu=:NoSchedule&lt;/code&gt; in the node description.&lt;/p&gt;
&lt;ol start="2"&gt;
&lt;li&gt;
&lt;p&gt;In the OpenShift AI console, navigate to &lt;strong&gt;Settings &amp;gt; Hardware Profiles&lt;/strong&gt; and create a new profile (for example, &lt;code&gt;nvidia-gpu&lt;/code&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Add a &lt;strong&gt;Toleration&lt;/strong&gt; with the following values:&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Key&lt;/td&gt;
&lt;td&gt;&lt;code&gt;nvidia.com/gpu&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Effect&lt;/td&gt;
&lt;td&gt;&lt;code&gt;NoSchedule&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operator&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Exists&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figure&gt;&lt;img src="/images/posts/post_21/toleration.png"data-src="/images/posts/post_21/toleration.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;Configuring a toleration for the NVIDIA GPU taint in the Hardware Profile&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This toleration allows workbenches assigned to this profile to be scheduled onto GPU nodes while keeping those nodes unavailable to other workloads.&lt;/p&gt;
&lt;ol start="4"&gt;
&lt;li&gt;
&lt;p&gt;Create a new workbench and select the &lt;code&gt;nvidia-gpu&lt;/code&gt; hardware profile. The workbench pod will be scheduled on a GPU node.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Once the workbench is running, open a terminal and confirm GPU access:&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;nvidia-smi
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;figure&gt;&lt;img src="/images/posts/post_21/nvidia_smi_2.png"data-src="/images/posts/post_21/nvidia_smi_2.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;nvidia-smi output from inside an OpenShift AI workbench, confirming direct access to the NVIDIA A10G GPU&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;For a complete reference on hardware profiles and toleration configuration, the &lt;a href="https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2.16/html/managing_openshift_ai/managing-hardware-profiles"&gt;Red Hat OpenShift AI documentation&lt;/a&gt; covers the options in detail.&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;rhoai-gitops&lt;/code&gt; repository makes the Red Hat OpenShift AI installation genuinely straightforward: one script handles the operator stack, the GPU node provisioning, and the GitOps wiring. The manual steps that remain — creating the hardware profile and configuring the workbench — are minimal and need to be done only once per cluster.&lt;/p&gt;
&lt;p&gt;The end result is an OpenShift AI environment with full GPU access, ready for running Jupyter notebooks, training jobs, or serving models. If you provisioned the underlying cluster using the approach described in &lt;a href="/2026/deploying-openshift-on-aws-with-automated-cluster-provisioning/"&gt;Deploying OpenShift on AWS with Automated Cluster Provisioning&lt;/a&gt;, the two repositories together cover the entire path from a blank AWS account to a working AI platform within a short timeframe of approximately two hours.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;rhoai-gitops - GitHub repository by Álvaro López Medina - &lt;a href="https://github.com/alvarolop/rhoai-gitops"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;ocp-on-aws - GitHub repository by Álvaro López Medina - &lt;a href="https://github.com/alvarolop/ocp-on-aws"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Red Hat OpenShift AI - Managing Hardware Profiles - &lt;a href="https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2.16/html/managing_openshift_ai/managing-hardware-profiles"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OpenShift AI - Product documentation - &lt;a href="https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OpenShift CLI (oc) - Getting started - &lt;a href="https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/cli_tools/openshift-cli-oc#cli-getting-started"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;NVIDIA GPU Operator documentation - &lt;a href="https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;AWS EC2 instance type availability by region - &lt;a href="https://docs.aws.amazon.com/ec2/latest/instancetypes/ec2-instance-regions.html"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;AWS recommended GPU instances for deep learning - &lt;a href="https://docs.aws.amazon.com/dlami/latest/devguide/gpu.html"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;G5-Instances von Amazon EC2 - &lt;a href="https://aws.amazon.com/de/ec2/instance-types/g5/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Amazon-EC2-P4-Instances - &lt;a href="https://aws.amazon.com/de/ec2/instance-types/p4/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Deploying OpenShift on AWS with Automated Cluster Provisioning</title><link>/2026/deploying-openshift-on-aws-with-automated-cluster-provisioning/</link><pubDate>Sat, 09 May 2026 00:00:00 +0000</pubDate><guid>/2026/deploying-openshift-on-aws-with-automated-cluster-provisioning/</guid><description>&lt;figure&gt;&lt;img src="/images/posts/post_20/overview.png"data-src="/images/posts/post_20/overview.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;The full provisioning pipeline: CLI setup, ocp-on-aws config, and a single script that spins up VPCs, EC2 instances, DNS records, and an Argo CD baseline - AI generated&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In this post, I want to describe how to deploy &lt;strong&gt;Red Hat OpenShift&lt;/strong&gt; in a blank Amazon Web Services (AWS) environment using a fully automated and repeatable approach. This post is part of a series of two posts: 1. This post covers the cluster provisioning step. 2. The installation of OpenShift AI on top of the running OpenShift cluster is covered in a separate post: &lt;a href="/2026/installing-openshift-ai-on-openshift/"&gt;Install OpenShift AI on OpenShift&lt;/a&gt;. If you already have an OpenShift cluster available, feel free to jump straight to that post.
Both workflows build on two GitHub repositories that cover both infrastructure provisioning and the installation of the AI platform components, and they reduce what could easily be a multi-hour manual effort to a handful of shell commands.&lt;/p&gt;
&lt;p&gt;I should be upfront: one purpose of this post is also to serve as a personal reference for future me, who will inevitably return here after six months asking &amp;ldquo;wait, what was the exact command again?&amp;rdquo; Consider this the written documentation I should have filed away the first time.&lt;/p&gt;
&lt;p&gt;A special thanks goes to my team mate &lt;a href="https://github.com/alvarolop"&gt;&lt;strong&gt;Álvaro López Medina&lt;/strong&gt;&lt;/a&gt;, who created and maintains the &lt;a href="https://github.com/alvarolop/ocp-on-aws"&gt;ocp-on-aws&lt;/a&gt; and &lt;a href="https://github.com/alvarolop/rhoai-gitops"&gt;rhoai-gitops&lt;/a&gt; repositories. Without his work and support, setting up this environment would have been significantly more involved.&lt;/p&gt;
&lt;h2 id="prerequisites"&gt;Prerequisites&lt;/h2&gt;
&lt;p&gt;Before starting, a Linux workstation or jump host is recommended for running the commands. The following command line tools must be installed and configured:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/cli_tools/openshift-cli-oc#cli-getting-started"&gt;&lt;strong&gt;OpenShift CLI (oc)&lt;/strong&gt;&lt;/a&gt; – required to interact with the OpenShift cluster&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html"&gt;&lt;strong&gt;AWS CLI&lt;/strong&gt;&lt;/a&gt; – required to provision and manage AWS infrastructure&lt;/li&gt;
&lt;li&gt;&lt;a href="https://httpd.apache.org/docs/current/programs/htpasswd.html"&gt;&lt;strong&gt;htpasswd&lt;/strong&gt;&lt;/a&gt; – required to generate user credentials for the cluster&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are fundamental prerequisites. The installation scripts will fail or behave unexpectedly without them.&lt;/p&gt;
&lt;h2 id="ordering-an-aws-blank-environment"&gt;Ordering an AWS Blank Environment&lt;/h2&gt;
&lt;p&gt;For Red Hat employees and Red Hat partners, the easiest starting point is an &lt;a href="https://catalog.demo.redhat.com/catalog?item=babylon-catalog-prod/sandboxes-gpte.sandbox-open.prod&amp;amp;utm_source=webapp&amp;amp;utm_medium=share-link"&gt;AWS Blank Open Environment&lt;/a&gt; from the &lt;a href="https://catalog.demo.redhat.com/catalog"&gt;Red Hat Demo Platform (RHDP)&lt;/a&gt;. Otherwise, an existing AWS account accessed through the &lt;a href="https://aws.amazon.com/"&gt;AWS Web Console&lt;/a&gt; works just as well.&lt;/p&gt;
&lt;p&gt;This tutorial was validated against eu-west-1. The blank environment provides a clean, ephemeral AWS account with the necessary IAM permissions and service quotas to support an &lt;em&gt;Installer-Provisioned Infrastructure (IPI)&lt;/em&gt; deployment of OpenShift.&lt;/p&gt;
&lt;p&gt;Once the environment is provisioned, the service overview page contains the AWS access credentials and the base DNS zone that will be needed in the configuration step below.&lt;/p&gt;
&lt;h2 id="deploying-openshift-on-aws"&gt;Deploying OpenShift on AWS&lt;/h2&gt;
&lt;p&gt;With the AWS environment in place, the &lt;a href="https://github.com/alvarolop/ocp-on-aws"&gt;ocp-on-aws&lt;/a&gt; repository handles the rest of the cluster provisioning. The repository wraps the OpenShift IPI installer in a shell script and manages user creation, cluster-admin group configuration, and the pull secret in a structured, repeatable way.&lt;/p&gt;
&lt;h3 id="preparing-the-repository"&gt;Preparing the repository&lt;/h3&gt;
&lt;p&gt;Throughout the following steps, any value written in &lt;code&gt;&amp;lt;angle brackets&amp;gt;&lt;/code&gt; is a placeholder and must be replaced with your actual value before running the command.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Clone the repository:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;git clone https://github.com/alvarolop/ocp-on-aws
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ocp-on-aws
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="2"&gt;
&lt;li&gt;Copy the authentication file templates:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cp auth/users.htpasswd.example auth/users.htpasswd
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cp auth/group-cluster-admins.yaml.example auth/group-cluster-admins.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="3"&gt;
&lt;li&gt;Generate a password hash for your user:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;htpasswd -b -B auth/users.htpasswd &amp;lt;user_name&amp;gt; &amp;lt;password&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="4"&gt;
&lt;li&gt;Adjust &lt;code&gt;auth/group-cluster-admins.yaml&lt;/code&gt; to list the users that should receive &lt;code&gt;cluster-admin&lt;/code&gt; privileges:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;apiVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;user.openshift.io/v1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Group&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;cluster-admins&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;users&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;redhat&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;&amp;lt;user_name&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="configuring-the-installation"&gt;Configuring the installation&lt;/h3&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Copy the configuration template:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cp aws-ocp4-config aws-ocp4-config-labs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="6"&gt;
&lt;li&gt;Open the configuration file and adjust the following parameters:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;vi aws-ocp4-config-labs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The key values to review:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;OPENSHIFT_VERSION&lt;/code&gt; (Line 6):&lt;/strong&gt; Set this to match your local &lt;code&gt;oc&lt;/code&gt; client version for maximum compatibility.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;RHPDS_TOP_LEVEL_ROUTE53_DOMAIN&lt;/code&gt; (Line 9):&lt;/strong&gt; The base DNS zone for your cluster; find this in the RHDP service overview.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt; and &lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt; (Lines 16–18):&lt;/strong&gt; The programmatic access credentials from the RHDP environment, required to create the VPC and EC2 instances.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;RHOCM_PULL_SECRET&lt;/code&gt; (Line 31):&lt;/strong&gt; Retrieve this from the &lt;a href="https://console.redhat.com/openshift/install/pull-secret"&gt;Hybrid Cloud Console&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;WORKER_REPLICAS&lt;/code&gt; (Line 47):&lt;/strong&gt; Set to the number of worker nodes required for your workload.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="running-the-installation"&gt;Running the installation&lt;/h3&gt;
&lt;ol start="7"&gt;
&lt;li&gt;Start the cluster installation:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;./aws-ocp4-install.sh aws-ocp4-config-labs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The script invokes the OpenShift IPI installer and creates all required AWS infrastructure: VPC, subnets, EC2 instances, Elastic Load Balancers, and Route53 DNS records. The process typically takes 30 to 45 minutes. It is worth monitoring the AWS console in the corresponding region during this time to observe the resources coming up.&lt;/p&gt;
&lt;figure&gt;&lt;img src="/images/posts/post_20/aws_console.png"data-src="/images/posts/post_20/aws_console.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;EC2 instances and load balancers provisioned in AWS after the installation completes&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Once the installer finishes, the cluster API and console URLs, along with the &lt;code&gt;kubeconfig&lt;/code&gt; file, will be available in the output and in the &lt;code&gt;auth/&lt;/code&gt; directory of the repository.&lt;/p&gt;
&lt;figure&gt;&lt;img src="/images/posts/post_20/argo_cd.png"data-src="/images/posts/post_20/argo_cd.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;Argo CD applications deployed as part of the cluster bootstrap&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The installation script also bootstraps a set of &lt;em&gt;Argo CD&lt;/em&gt; applications that manage cluster-level configurations through GitOps from the start. This gives the cluster a solid, declarative baseline before any additional workloads are installed.&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The combination of the AWS blank environment and the &lt;code&gt;ocp-on-aws&lt;/code&gt; repository makes it straightforward to spin up a fully functional OpenShift cluster in under an hour with minimal manual intervention. The IPI installer handles the infrastructure details, and the GitOps bootstrap ensures a consistent cluster configuration from the first login.&lt;/p&gt;
&lt;p&gt;With the cluster in place, the next step is installing OpenShift AI and enabling GPU support, which is covered in the follow-up post: &lt;a href="/2026/installing-openshift-ai-on-openshift/"&gt;Install OpenShift AI on OpenShift&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;ocp-on-aws - GitHub repository by Álvaro López Medina - &lt;a href="https://github.com/alvarolop/ocp-on-aws"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;rhoai-gitops - GitHub repository by Álvaro López Medina - &lt;a href="https://github.com/alvarolop/rhoai-gitops"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Red Hat Demo Platform - &lt;a href="https://catalog.demo.redhat.com/catalog"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OpenShift CLI - Getting started - &lt;a href="https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/cli_tools/openshift-cli-oc#cli-getting-started"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;AWS CLI - Installation guide - &lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;htpasswd - &lt;a href="https://httpd.apache.org/docs/current/programs/htpasswd.html"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Red Hat Hybrid Cloud Console - Pull Secret - &lt;a href="https://console.redhat.com/openshift/install/pull-secret"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Hermes Agent: A Personal AI That Gets More Useful Over Time</title><link>/2026/hermes-agent-a-personal-ai-that-gets-more-useful-over-time/</link><pubDate>Sat, 02 May 2026 00:00:00 +0000</pubDate><guid>/2026/hermes-agent-a-personal-ai-that-gets-more-useful-over-time/</guid><description>&lt;figure&gt;&lt;img src="/images/posts/post_28/overview.png"data-src="/images/posts/post_28/overview.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;How Hermes Agent Works: From Closed-Loop Learning to Multi-Platform Deployment - AI generated&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;I came across the &lt;a href="https://github.com/nousresearch/hermes-agent"&gt;&lt;em&gt;Hermes Agent project&lt;/em&gt;&lt;/a&gt; in early March 2026 and deployed it a couple of days later. A couple of weeks in I am still using it daily, and the use cases keep expanding rather than converging. Most tools settle into a narrow routine or fall off altogether. What keeps this one going is that the agent gets more useful the longer you run it. The project is young and moving fast, with new releases every few days. The initial setup requires patience: getting the configuration to a point where it actually saves time takes effort, and the frequent updates occasionally introduce breaking changes. That said, it is genuinely fun to use, and you learn a fair amount along the way.&lt;/p&gt;
&lt;p&gt;Hermes Agent is an open-source, self-hosted AI agent framework built by &lt;a href="https://nousresearch.com/"&gt;&lt;em&gt;Nous Research&lt;/em&gt;&lt;/a&gt;, an independent AI research lab based in New York. Nous Research is best known for the Hermes model family, a series of open-weight models fine-tuned on Llama that are used widely in the open-source AI community. The agent framework shares the name but is a separate project. It is MIT-licensed, model-agnostic, and runs on your own infrastructure, either as a self-hosted Python service or as a containerized deployment.&lt;/p&gt;
&lt;h2 id="how-it-works"&gt;How It Works&lt;/h2&gt;
&lt;p&gt;The part that makes Hermes Agent different from most agent frameworks is the skill system. The agent ships with a set of preconfigured skills covering common tasks. Beyond that, you can ask it to create a skill from something it just did: it writes a structured Markdown document capturing the approach, what worked, and describes possible edge cases. The next time a similar task appears, the agent loads the relevant skill rather than starting from scratch. Skills can be triggered directly by asking Hermes to run one, or set on a schedule and executed automatically at defined intervals. Over time this turns completed work into a growing library of reusable operating knowledge. Version v0.12.0 added an Autonomous Curator to keep that library from growing unwieldy. It runs on a seven-day cycle by default, grades skills by usage, consolidates overlapping ones, and removes those that have stopped being useful. A short report is written after each run, so you can see what changed and why.&lt;/p&gt;
&lt;p&gt;Alongside the skill system, the agent maintains three layers of memory: a persistent store for completed tasks and notes, a full-text search index across prior sessions, and a user model that accumulates preferences over time, coding style, communication tone, timezone, tools. The idea is that the agent gets more useful the longer you run it, not just better at individual tasks in isolation.&lt;/p&gt;
&lt;h2 id="my-setup"&gt;My Setup&lt;/h2&gt;
&lt;p&gt;Hermes Agent runs in my &lt;a href="/2026/my-homelab-a-traefik-centered-self-hosting-setup/"&gt;homelab&lt;/a&gt; as a service on a dedicated Linux host. Keeping it on a separate machine gives me direct control over what the agent has access to. Incoming traffic is routed through Traefik. I access it through three entry points depending on where I am and what I am doing. The primary interface is the &lt;a href="https://matrix.org/"&gt;&lt;em&gt;Matrix&lt;/em&gt;&lt;/a&gt; chat protocol, which means I can reach the agent from any Matrix client on any device. I also connected it to a dedicated email inbox, so it can handle certain tasks asynchronously. For longer sessions at my desk I use &lt;em&gt;Open WebUI&lt;/em&gt;, which gives a more comfortable interface for extended conversations.&lt;/p&gt;
&lt;p&gt;The model configuration is versatile: the agent supports various AI services and model providers.&lt;/p&gt;
&lt;h2 id="what-i-gave-it-access-to"&gt;What I Gave It Access To&lt;/h2&gt;
&lt;p&gt;I gave the agent access to three local knowledge sources: my bookmarks, a structured knowledge base, and a local mirror of Red Hat&amp;rsquo;s product documentation.&lt;/p&gt;
&lt;p&gt;The first is my bookmarks folder. I have been saving links as Markdown files in Obsidian for several years. The agent can search and cross-reference that collection when doing research, which means it draws on context I actually care about rather than training data alone.&lt;/p&gt;
&lt;p&gt;The second is a knowledge base built on the &lt;a href="https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f"&gt;LLM Wiki&lt;/a&gt; principle described by Andrej Karpathy. The idea is to maintain a curated set of structured Markdown files that an AI agent helps write and update over time. Topics, entities, comparisons, each in its own file. The agent both contributes to this knowledge base and draws from it when working on research tasks.&lt;/p&gt;
&lt;p&gt;The third is a local mirror of Red Hat&amp;rsquo;s product documentation. A team mate built a tool called &lt;em&gt;rh-mastery&lt;/em&gt; that pulls documentation from &lt;em&gt;docs.redhat.com&lt;/em&gt;, converts it to Markdown, and stores it in a structured local directory. Pointed at that directory, Hermes can query accurate, version-tracked product documentation without touching the internet. For someone who spends a lot of time with Red Hat products, this closes a gap that is easy to overlook until you actually need it. More on rh-mastery in an upcomming post.&lt;/p&gt;
&lt;h2 id="practical-uses"&gt;Practical Uses&lt;/h2&gt;
&lt;p&gt;The combination of bookmarks, structured knowledge, Red Hat&amp;rsquo;s product documentation, and the skill system makes the agent genuinely useful for research. When I ask it to investigate a topic, it starts with what I have already collected: prior notes, bookmarks, and documentation. If that is not enough, and when asked, it reaches out to the web to fill the gaps. The result is something grounded in material I collected and curated myself, which makes the output in most cases very useful.&lt;/p&gt;
&lt;p&gt;One use I did not expect to find as useful: slide generation. I integrated &lt;em&gt;Marp&lt;/em&gt;, a Markdown-based presentation framework, into the workflow. When I need to put together a presentation and am staring at a blank file, I can ask the agent to draft an initial structure. Getting past that first empty screen is often the hardest part. Whether I keep most of what it produces is a different question, but having something to react to is worth more than nothing to start from.&lt;/p&gt;
&lt;h2 id="skills-and-subagents"&gt;Skills and Subagents&lt;/h2&gt;
&lt;p&gt;The agent can develop and add skills on its own as it works, but skills can also be added manually or loaded from the community hub at &lt;a href="https://agentskills.io"&gt;agentskills.io&lt;/a&gt;. More interesting to me is the subagent capability: the agent can delegate tasks to specialized subagents, each backed by a specific AI service or holding a particular context. This makes it possible to compose workflows where different parts of a task go to the most appropriate model.&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Several weeks in is not a long track record, and the project is still moving fast enough that some things will break between releases. That said, the architecture is sound and the development pace is truly impressive. Whether I will keep running it long-term, I genuinely do not know. For now, it is pulling its weight. For anyone already running a homelab and looking for a self-hosted agent that gets more useful over time rather than staying flat, Hermes Agent is worth the setup time.&lt;/p&gt;
&lt;p&gt;Peter Steinberger, the creator of &lt;em&gt;OpenClaw&lt;/em&gt;, another widely-used AI agent framework, put it well in a recent &lt;a href="https://www.youtube.com/watch?v=7rzYDM6vMtI"&gt;TED talk&lt;/a&gt;: &amp;ldquo;The bottleneck is no longer typing. It&amp;rsquo;s thinking.&amp;rdquo; That observation fits. The agent handles the mechanical parts of research and structuring. The judgment about what matters and what to do with it still has to come from someone. For now, a human in the loop is still necessary.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Hermes Agent on GitHub - &lt;a href="https://github.com/nousresearch/hermes-agent"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hermes Agent Documentation - &lt;a href="https://hermes-agent.nousresearch.com/docs/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Nous Research - &lt;a href="https://nousresearch.com/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Matrix - &lt;a href="https://matrix.org/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OpenRouter - &lt;a href="https://openrouter.ai/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Andrej Karpathy LLM Wiki concept - &lt;a href="https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Marp - &lt;a href="https://marp.app/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;agentskills.io - &lt;a href="https://agentskills.io/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Peter Steinberger TED talk - &lt;a href="https://www.youtube.com/watch?v=7rzYDM6vMtI"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Extending the Local AI Stack with On-Demand GPU Inference on RunPod</title><link>/2026/extending-the-local-ai-stack-with-on-demand-gpu-inference-on-runpod/</link><pubDate>Sat, 07 Mar 2026 00:00:00 +0000</pubDate><guid>/2026/extending-the-local-ai-stack-with-on-demand-gpu-inference-on-runpod/</guid><description>&lt;figure&gt;&lt;img src="/images/posts/post_24/overview.png"data-src="/images/posts/post_24/overview.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;Conceptual illustration of the extended AI stack with elastic cloud GPU resources for running large language models on demand - AI generated&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In this post, I want to describe how I extended the local AI stack I built in my homelab with on-demand GPU-backed model inference, without adding any GPU hardware to the lab itself.&lt;/p&gt;
&lt;p&gt;The two previous posts in this series provide the context for what follows. The &lt;a href="/2026/my-homelab-a-traefik-centered-self-hosting-setup/"&gt;homelab post&lt;/a&gt; covers the base infrastructure: thin clients, Docker Compose, Traefik, and internal DNS. The &lt;a href="/2026/my-local-ai-stack-open-webui-litellm-searxng-and-docling/"&gt;local AI stack post&lt;/a&gt; describes how &lt;em&gt;Open WebUI&lt;/em&gt;, &lt;em&gt;LiteLLM&lt;/em&gt;, &lt;em&gt;SearXNG&lt;/em&gt;, and &lt;em&gt;Docling&lt;/em&gt; sit on top of that infrastructure to form a self-hosted AI environment. That stack works well, and I have been using it for a while. Keeping the lab CPU-only is a deliberate choice. For orchestration, document workflows, and routing requests to publicly available AI services, dedicated GPU hardware at home is simply not necessary. When I want to try a particular model that is not available through a managed API, or experiment with something freshly released on Hugging Face, I rent the compute on demand rather than maintain it permanently.&lt;/p&gt;
&lt;p&gt;The solution is straightforward: rent GPU capacity on demand from a specialized cloud provider, expose it as an OpenAI-compatible endpoint, and wire it into the existing stack. No new hardware, no permanent cost, no changes to the tools I already use.&lt;/p&gt;
&lt;h2 id="a-note-on-neo-clouds"&gt;A Note on Neo Clouds&lt;/h2&gt;
&lt;p&gt;The providers that specialize in this type of GPU-first infrastructure are sometimes called &lt;em&gt;Neo Clouds&lt;/em&gt;. The term emerged around 2024 to distinguish GPU-specialist vendors such as RunPod, CoreWeave and others from traditional hyperscalers. In practice, I am not sure the new term adds much. For me these are specialized cloud providers focused on GPU compute and AI workloads. Useful services, somewhat unnecessary branding.&lt;/p&gt;
&lt;h2 id="why-runpod"&gt;Why RunPod&lt;/h2&gt;
&lt;p&gt;I use &lt;a href="https://www.runpod.io/"&gt;RunPod&lt;/a&gt; for this setup for a few practical reasons. The interface is intuitive, the deployment path from template to running pod is short, and the GPU catalog is broad enough to cover most use cases. Pricing is per second with no ingress or egress fees, which makes on-demand experimentation economical. RunPod also exposes an API for its core operations, so deployments can be automated rather than driven entirely through the UI.&lt;/p&gt;
&lt;p&gt;A detailed description of all RunPod services is out of scope for this post. The focus here is on one specific workflow: deploying a &lt;em&gt;vLLM&lt;/em&gt; inference server with a model loaded from &lt;em&gt;Hugging Face&lt;/em&gt;, and connecting the resulting endpoint to Open WebUI.&lt;/p&gt;
&lt;h2 id="deploying-a-vllm-inference-server-on-runpod"&gt;Deploying a vLLM Inference Server on RunPod&lt;/h2&gt;
&lt;p&gt;RunPod uses templates to save pod configurations for reuse. A template defines the container image, the start command, the storage allocation, and other runtime parameters. I maintain a small collection of private templates, each configured for a different model.&lt;/p&gt;
&lt;figure&gt;&lt;img src="/images/posts/post_24/list_of_private_templates.png"data-src="/images/posts/post_24/list_of_private_templates.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;A selection of saved vLLM templates on RunPod, each using to a different model from Hugging Face&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The container image for all of these templates is &lt;code&gt;vllm/vllm-openai:latest&lt;/code&gt;, which bundles &lt;em&gt;vLLM&lt;/em&gt; with an OpenAI-compatible API server. The model itself is specified in the container start command, which means swapping models is a matter of editing a single line.&lt;/p&gt;
&lt;h2 id="creating-a-template"&gt;Creating a Template&lt;/h2&gt;
&lt;p&gt;When creating or editing a template, the key fields are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Type:&lt;/strong&gt; Pod&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compute type:&lt;/strong&gt; Nvidia GPU&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Container image:&lt;/strong&gt; &lt;code&gt;vllm/vllm-openai:latest&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Container start command:&lt;/strong&gt; the vLLM arguments, including the model reference&lt;/li&gt;
&lt;/ul&gt;
&lt;figure&gt;&lt;img src="/images/posts/post_24/vllm_start_cmd.png"data-src="/images/posts/post_24/vllm_start_cmd.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;Template configuration for the vllm_gemma-3-12b template, showing the container image and start command&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Throughout the following steps, any value written in &lt;code&gt;&amp;lt;angle brackets&amp;gt;&lt;/code&gt; is a placeholder and must be replaced with your actual value before running the command.&lt;/p&gt;
&lt;p&gt;A start command for deploying the Red Hat&amp;rsquo;s validated &lt;code&gt;RedHatAI/Qwen3-8B-FP8-dynamic&lt;/code&gt; model looks like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;--host 0.0.0.0 --port &lt;span class="m"&gt;8000&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --model RedHatAI/Qwen3-8B-FP8-dynamic &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --dtype bfloat16 &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --enforce-eager &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --gpu-memory-utilization 0.95 &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --api-key &amp;lt;api_key&amp;gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --max-model-len &lt;span class="m"&gt;8128&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The parameters worth noting:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;--model&lt;/code&gt;&lt;/strong&gt;: any model available on Hugging Face can be referenced here by its repository path&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;--dtype bfloat16&lt;/code&gt;&lt;/strong&gt;: sets the compute dtype; &lt;code&gt;bfloat16&lt;/code&gt; is a good default for inference on NVIDIA hardware&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;--enforce-eager&lt;/code&gt;&lt;/strong&gt;: disables CUDA graph capture, which reduces memory overhead at the cost of some throughput; useful when fitting larger models on a single GPU&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;--gpu-memory-utilization 0.95&lt;/code&gt;&lt;/strong&gt;: allows vLLM to use up to 95% of available GPU memory for the KV cache&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;--api-key&lt;/code&gt;&lt;/strong&gt;: sets a bearer token for the OpenAI-compatible endpoint; always set this when deploying a public endpoint&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;--max-model-len&lt;/code&gt;&lt;/strong&gt;: caps the maximum sequence length; reducing this frees memory and allows larger models to fit on smaller GPUs&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="selecting-a-gpu-and-deploying"&gt;Selecting a GPU and Deploying&lt;/h2&gt;
&lt;p&gt;Once the template is configured, deploying it requires selecting a GPU and clicking deploy. RunPod shows available hardware with current pricing.&lt;/p&gt;
&lt;figure&gt;&lt;img src="/images/posts/post_24/gpu_selection.png"data-src="/images/posts/post_24/gpu_selection.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;GPU selection on RunPod, ranging from *RTX 2000 Ada* class cards to *H200* and *B200* datacenter accelerators&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;For most inference workloads with 8 to 12 billion parameter models, an RTX 4090 or L4 is a practical and cost-effective choice. Larger models with higher memory requirements will need 48 GB or 80 GB class cards. The per-hour pricing shown in the interface makes it easy to estimate cost for a session before committing.&lt;/p&gt;
&lt;p&gt;After deployment, RunPod assigns a public HTTPS endpoint to the pod. The vLLM server is reachable at that endpoint on port 8000, with the path structure matching the OpenAI API.&lt;/p&gt;
&lt;h2 id="connecting-the-endpoint-to-open-webui"&gt;Connecting the Endpoint to Open WebUI&lt;/h2&gt;
&lt;p&gt;With the pod running and the model loaded, the endpoint can be added to Open WebUI as an external connection. In Open WebUI, navigate to &lt;strong&gt;Admin Panel&lt;/strong&gt; then &lt;strong&gt;Settings&lt;/strong&gt; and add a new connection with the following values:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Connection type:&lt;/strong&gt; External&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;URL:&lt;/strong&gt; &lt;code&gt;https://&amp;lt;runpod_endpoint&amp;gt;/v1&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Auth:&lt;/strong&gt; API key set in the vLLM start command&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Provider type:&lt;/strong&gt; OpenAI&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;API type:&lt;/strong&gt; Chat Completions&lt;/li&gt;
&lt;/ul&gt;
&lt;figure&gt;&lt;img src="/images/posts/post_24/open_webui_configuration.png"data-src="/images/posts/post_24/open_webui_configuration.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;Adding the RunPod vLLM endpoint as an external OpenAI-compatible connection in Open WebUI&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Once saved, the model served by vLLM on RunPod appears in the model selector alongside any other configured backends. From a user perspective, the interface is identical to any other configured model, whether local or a commercial API.&lt;/p&gt;
&lt;p&gt;Alternatively, the endpoint can be added to LiteLLM as a named model alias. This is the better option if you want centralized credential management or want to expose the RunPod model alongside other backends under a consistent naming scheme across the stack.&lt;/p&gt;
&lt;h2 id="why-this-setup-works-well"&gt;Why This Setup Works Well&lt;/h2&gt;
&lt;p&gt;The combination of a self-hosted orchestration stack and on-demand GPU inference fits well with a homelab where tooling and workflows are in place but on-premises compute is intentionally kept lean.&lt;/p&gt;
&lt;p&gt;A few things make this pattern practical:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Low cost for experimentation.&lt;/strong&gt; Models run only when needed. A session of an hour or two to test a new model costs a few dollars at most.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Access to current models.&lt;/strong&gt; Many of the recently published models available on Hugging Face can be loaded into vLLM, which means it is straightforward to test recently released models without waiting for them to appear in a managed API.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No changes to the existing stack.&lt;/strong&gt; Open WebUI, LiteLLM, SearXNG, and Docling continue to work exactly as before. The RunPod endpoint is just another backend.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automatable.&lt;/strong&gt; RunPod exposes an API for managing pods, so deployments can be triggered programmatically. Combined with LiteLLM&amp;rsquo;s routing, it becomes possible to bring a model endpoint up on demand and tear it down again when it is no longer needed.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Adding RunPod as an on-demand GPU backend closes the main gap in a CPU-only homelab AI stack. The setup requires no changes to the existing infrastructure and takes only a few minutes from template to running endpoint. The result is the ability to experiment with current, capable models at low cost, using the same interface and workflows already in place.&lt;/p&gt;
&lt;p&gt;For on-demand model access that does not warrant the cost of persistent GPU hardware, this pattern is worth considering.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;My Homelab: A Traefik-centered Self-hosting Setup - &lt;a href="/2026/my-homelab-a-traefik-centered-self-hosting-setup/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;My Local AI Stack: Open WebUI, LiteLLM, SearXNG, and Docling - &lt;a href="/2026/my-local-ai-stack-open-webui-litellm-searxng-and-docling/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;RunPod - project site - &lt;a href="https://www.runpod.io/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;RunPod - documentation - &lt;a href="https://docs.runpod.io/overview"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;vLLM - project site - &lt;a href="https://docs.vllm.ai/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hugging Face - model hub - &lt;a href="https://huggingface.co/models"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;RedHatAI models on Hugging Face - &lt;a href="https://huggingface.co/RedHatAI"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>My Local AI Stack: Open WebUI, LiteLLM, SearXNG, and Docling</title><link>/2026/my-local-ai-stack-open-webui-litellm-searxng-and-docling/</link><pubDate>Sat, 14 Feb 2026 00:00:00 +0000</pubDate><guid>/2026/my-local-ai-stack-open-webui-litellm-searxng-and-docling/</guid><description>&lt;figure&gt;&lt;img src="/images/posts/post_19/overview.png"data-src="/images/posts/post_19/overview.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;Overview of the modular self-hosted AI stack - AI generated&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In my previous post about my &lt;a href="/2026/my-homelab-a-traefik-centered-self-hosting-setup/"&gt;homelab&lt;/a&gt;, I described the foundation I use for self-hosted services: a small set of low-power machines, Docker Compose for deployment, Traefik as the reverse proxy, and internal DNS to expose services with clean HTTPS hostnames. I have been running this setup for several years with very little maintenance overhead. That setup turned out to be a good base not only for classic self-hosting, but also for local AI workloads. Over the past two year or so, I started extending it with tools to use and experiment with AI services.&lt;/p&gt;
&lt;p&gt;Over time, I wanted more than a single chat UI connected to a single model provider. I wanted a setup that would let me experiment with different models, keep sensitive data inside my own network, enrich prompts with live web results, and work with local documents in a structured way. I also wanted to reuse the same operational patterns I already trusted in the rest of the homelab.&lt;/p&gt;
&lt;p&gt;The result is a local AI stack built from four components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Open WebUI as the browser-based user interface&lt;/li&gt;
&lt;li&gt;LiteLLM as the OpenAI-compatible model gateway&lt;/li&gt;
&lt;li&gt;SearXNG as the privacy-friendly web search backend&lt;/li&gt;
&lt;li&gt;Docling as the document parsing layer for file-based workflows&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Individually, each of these tools is useful. Combined, they form a practical self-hosted AI environment that fits neatly into the same Traefik-centered architecture as the rest of my homelab.&lt;/p&gt;
&lt;h2 id="base-platform-and-prerequisites"&gt;Base platform and prerequisites&lt;/h2&gt;
&lt;p&gt;The AI stack runs on the same infrastructure described in the &lt;a href="/2026/my-homelab-a-traefik-centered-self-hosting-setup/"&gt;previous post&lt;/a&gt;: refurbished thin clients running CentOS Stream 9, Docker and Docker Compose, Traefik as the reverse proxy, and internal DNS for clean HTTPS hostnames. The key design principle carries over as well: every externally reachable service joins the &lt;code&gt;external&lt;/code&gt; Docker network and is exposed through Traefik using labels, giving a consistent way to publish services under HTTPS without managing ports or certificates per application.&lt;/p&gt;
&lt;p&gt;My current setup is CPU-only. That matters. It is perfectly usable for orchestration, document processing, and web-augmented prompting, but it is not the right environment for large, latency-sensitive inference workloads. In practice, that constraint pushed me toward an architecture where the user interface, routing, tools, and document workflows run locally, while the model backend remains flexible enough to use either local or remote providers.&lt;/p&gt;
&lt;h2 id="architecture-overview"&gt;Architecture overview&lt;/h2&gt;
&lt;p&gt;At a high level, the request flow looks like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A user opens Open WebUI in the browser.&lt;/li&gt;
&lt;li&gt;Open WebUI sends model requests to LiteLLM through its OpenAI-compatible API.&lt;/li&gt;
&lt;li&gt;LiteLLM routes the request to the selected backend model.&lt;/li&gt;
&lt;li&gt;If a prompt requires live information, Open WebUI can use SearXNG as a search tool.&lt;/li&gt;
&lt;li&gt;If a prompt requires document context, uploaded files are parsed with Docling and converted into Markdown.&lt;/li&gt;
&lt;li&gt;The model response is returned to Open WebUI and displayed to the user.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This separation of concerns is what makes the stack useful:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Open WebUI handles the human interaction layer&lt;/li&gt;
&lt;li&gt;LiteLLM abstracts model backends and credentials&lt;/li&gt;
&lt;li&gt;SearXNG provides fresh web context&lt;/li&gt;
&lt;li&gt;Docling turns messy source documents into structured text&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Traefik remains the single public entry point. From an operations perspective, that is valuable because the AI stack behaves like any other part of the homelab.&lt;/p&gt;
&lt;h2 id="open-webui-as-the-central-interface"&gt;Open WebUI as the central interface&lt;/h2&gt;
&lt;p&gt;Open WebUI is the part of the stack I interact with every day. It provides the browser-based interface for conversations, model selection, file uploads, and tool-assisted prompting. The important point is that Open WebUI does not need to know anything about individual model providers. It only needs a single OpenAI-compatible endpoint, which in this setup is LiteLLM.&lt;/p&gt;
&lt;p&gt;That keeps the client configuration simple. If I want to add a new provider, swap one model for another, or change credentials, I do it behind the scenes in LiteLLM without having to reconfigure the user interface. Open WebUI also supports user and group management, making it straightforward to grant access to specific models or restrict certain users to a defined set of backends. A particularly useful feature is the ability to send a single prompt to multiple AI services simultaneously, which makes side-by-side model comparison a natural part of the workflow.&lt;/p&gt;
&lt;p&gt;A simplified Docker Compose service definition for Open WebUI in this setup looks like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;services&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;open-webui&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;ghcr.io/open-webui/open-webui:main&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;container_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;open-webui&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;restart&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;unless-stopped&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;OPENAI_API_BASE_URL=http://litellm:4000/v1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;OPENAI_API_KEY=${LITELLM_MASTER_KEY}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;volumes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;./data/open-webui:/app/backend/data&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;external&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;internal&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.enable=true&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.docker.network=external&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.openwebui.rule=Host(`ai.home.example.com`)&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.openwebui.entrypoints=https&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.openwebui.tls.certresolver=cloudflare&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.services.openwebui.loadbalancer.server.port=8080&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The exact image tag and environment variables may differ depending on the release and your setup, but the pattern stays the same: persistent storage for state, Traefik labels for routing, and a backend API endpoint that points to LiteLLM.&lt;/p&gt;
&lt;h2 id="litellm-as-the-model-gateway"&gt;LiteLLM as the model gateway&lt;/h2&gt;
&lt;p&gt;LiteLLM is the glue that makes the rest of the system flexible. It exposes a single OpenAI-style API while allowing multiple backends underneath. That means I can define logical model names and map them to either local inference backends or remote providers.&lt;/p&gt;
&lt;p&gt;This is useful for several reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Open WebUI only has to speak to few API endpoints&lt;/li&gt;
&lt;li&gt;I can standardize naming across models&lt;/li&gt;
&lt;li&gt;Provider credentials stay centralized&lt;/li&gt;
&lt;li&gt;Swapping backends becomes operationally cheap&lt;/li&gt;
&lt;li&gt;Logging and usage controls are easier to centralize&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The Compose service definition for LiteLLM follows the same pattern:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;services&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;litellm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;litellm/litellm:main-v1.83.14-stable.patch.3&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;container_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;litellm&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;restart&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;unless-stopped&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;--config&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;/app/config.yaml&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;--port&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;4000&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;LITELLM_MASTER_KEY=${LITELLM_MASTER_KEY}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;OPENAI_API_KEY=${OPENAI_API_KEY}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;volumes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;./litellm/config.yaml:/app/config.yaml:ro&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;internal&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;external&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.enable=true&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.docker.network=external&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.litellm.rule=Host(`litellm.home.example.com`)&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.litellm.entrypoints=https&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.litellm.tls.certresolver=cloudflare&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.services.litellm.loadbalancer.server.port=4000&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;style type="text/css"&gt;.notice{--root-color:#444;--root-background:#eff;--title-color:#fff;--title-background:#7bd;--warning-title:#c33;--warning-content:#fee;--info-title:#fb7;--info-content:#fec;--note-title:#6be;--note-content:#e7f2fa;--tip-title:#5a5;--tip-content:#efe}@media (prefers-color-scheme:dark){.notice{--root-color:#ddd;--root-background:#eff;--title-color:#fff;--title-background:#7bd;--warning-title:#800;--warning-content:#400;--info-title:#a50;--info-content:#420;--note-title:#069;--note-content:#023;--tip-title:#363;--tip-content:#121}}body.dark .notice{--root-color:#ddd;--root-background:#eff;--title-color:#fff;--title-background:#7bd;--warning-title:#800;--warning-content:#400;--info-title:#a50;--info-content:#420;--note-title:#069;--note-content:#023;--tip-title:#363;--tip-content:#121}.notice{line-height:24px;margin-bottom:24px;border-radius:4px;color:var(--root-color);background:var(--root-background)}.notice p:last-child{margin-bottom:0; padding: .5rem 1.2rem 1rem;}.notice-title{margin:-18px -18px 12px;padding:4px 18px;border-radius:4px 4px 0 0;font-weight:700;color:var(--title-color);background:var(--title-background)}.notice.warning .notice-title{background:var(--warning-title)}.notice.warning{background:var(--warning-content)}.notice.info .notice-title{background:var(--info-title)}.notice.info{background:var(--info-content)}.notice.note .notice-title{background:var(--note-title)}.notice.note{background:var(--note-content)}.notice.tip .notice-title{background:var(--tip-title)}.notice.tip{background:var(--tip-content)}.icon-notice{display:inline-flex;align-self:center;margin-right:8px}.icon-notice img,.icon-notice svg{height:1em;width:1em;fill:currentColor}.icon-notice img,.icon-notice.baseline svg{top:.125em;position:relative}&lt;/style&gt;
&lt;div&gt;&lt;svg width="0" height="0" display="none" xmlns="http://www.w3.org/2000/svg"&gt;&lt;symbol id="tip-notice" viewBox="0 0 512 512" preserveAspectRatio="xMidYMid meet"&gt;&lt;path d="M504 256c0 136.967-111.033 248-248 248S8 392.967 8 256 119.033 8 256 8s248 111.033 248 248zM227.314 387.314l184-184c6.248-6.248 6.248-16.379 0-22.627l-22.627-22.627c-6.248-6.249-16.379-6.249-22.628 0L216 308.118l-70.059-70.059c-6.248-6.248-16.379-6.248-22.628 0l-22.627 22.627c-6.248 6.248-6.248 16.379 0 22.627l104 104c6.249 6.249 16.379 6.249 22.628.001z"/&gt;&lt;/symbol&gt;&lt;symbol id="note-notice" viewBox="0 0 512 512" preserveAspectRatio="xMidYMid meet"&gt;&lt;path d="M504 256c0 136.997-111.043 248-248 248S8 392.997 8 256C8 119.083 119.043 8 256 8s248 111.083 248 248zm-248 50c-25.405 0-46 20.595-46 46s20.595 46 46 46 46-20.595 46-46-20.595-46-46-46zm-43.673-165.346l7.418 136c.347 6.364 5.609 11.346 11.982 11.346h48.546c6.373 0 11.635-4.982 11.982-11.346l7.418-136c.375-6.874-5.098-12.654-11.982-12.654h-63.383c-6.884 0-12.356 5.78-11.981 12.654z"/&gt;&lt;/symbol&gt;&lt;symbol id="warning-notice" viewBox="0 0 576 512" preserveAspectRatio="xMidYMid meet"&gt;&lt;path d="M569.517 440.013C587.975 472.007 564.806 512 527.94 512H48.054c-36.937 0-59.999-40.055-41.577-71.987L246.423 23.985c18.467-32.009 64.72-31.951 83.154 0l239.94 416.028zM288 354c-25.405 0-46 20.595-46 46s20.595 46 46 46 46-20.595 46-46-20.595-46-46-46zm-43.673-165.346l7.418 136c.347 6.364 5.609 11.346 11.982 11.346h48.546c6.373 0 11.635-4.982 11.982-11.346l7.418-136c.375-6.874-5.098-12.654-11.982-12.654h-63.383c-6.884 0-12.356 5.78-11.981 12.654z"/&gt;&lt;/symbol&gt;&lt;symbol id="info-notice" viewBox="0 0 512 512" preserveAspectRatio="xMidYMid meet"&gt;&lt;path d="M256 8C119.043 8 8 119.083 8 256c0 136.997 111.043 248 248 248s248-111.003 248-248C504 119.083 392.957 8 256 8zm0 110c23.196 0 42 18.804 42 42s-18.804 42-42 42-42-18.804-42-42 18.804-42 42-42zm56 254c0 6.627-5.373 12-12 12h-88c-6.627 0-12-5.373-12-12v-24c0-6.627 5.373-12 12-12h12v-64h-12c-6.627 0-12-5.373-12-12v-24c0-6.627 5.373-12 12-12h64c6.627 0 12 5.373 12 12v100h12c6.627 0 12 5.373 12 12v24z"/&gt;&lt;/symbol&gt;&lt;/svg&gt;&lt;/div&gt;&lt;div class="notice warning" &gt;
&lt;p class="first notice-title"&gt;&lt;span class="icon-notice baseline"&gt;&lt;svg&gt;&lt;use href="#warning-notice"&gt;&lt;/use&gt;&lt;/svg&gt;&lt;/span&gt;Warning&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Security note:&lt;/strong&gt;&lt;br&gt;
In March 2026, LiteLLM was subject to a suspected supply chain attack in which versions v1.82.7 and v1.82.8 on PyPI contained a malicious payload designed to harvest credentials and exfiltrate them to an external domain. Users running the official LiteLLM Docker image were not affected, as that deployment path pins dependencies and does not rely on the compromised PyPI packages. If you installed LiteLLM via &lt;code&gt;pip&lt;/code&gt; during the affected window, treat any secrets on that system as compromised and rotate them immediately. See the official incident report for full details and verified safe versions.&lt;/p&gt;&lt;/div&gt;
&lt;h2 id="searxng-for-live-privacy-friendly-search"&gt;SearXNG for live, privacy-friendly search&lt;/h2&gt;
&lt;p&gt;One of the biggest limitations of a plain chat interface is the lack of current information. SearXNG solves that problem cleanly. It is a self-hosted metasearch engine that aggregates results from multiple sources and gives me a search API under my own control.&lt;/p&gt;
&lt;p&gt;Even outside the AI stack, SearXNG is useful as a search engine. Inside the stack, it becomes more interesting because it can be exposed as a tool for prompts that need fresh information.&lt;/p&gt;
&lt;p&gt;A minimal Compose service might look like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;services&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;searxng&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;docker.io/searxng/searxng:latest&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;container_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;searxng&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;restart&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;unless-stopped&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;volumes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;./searxng:/etc/searxng&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;external&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.enable=true&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.docker.network=external&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.searxng.rule=Host(`search.home.example.com`)&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.searxng.entrypoints=https&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.searxng.tls.certresolver=cloudflare&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.services.searxng.loadbalancer.server.port=8080&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Once connected to Open WebUI as a tool, the flow is straightforward:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The user asks a question that requires current information.&lt;/li&gt;
&lt;li&gt;The model decides to call the search tool.&lt;/li&gt;
&lt;li&gt;SearXNG performs the search.&lt;/li&gt;
&lt;li&gt;Titles, snippets, and URLs are returned as context.&lt;/li&gt;
&lt;li&gt;The model synthesizes an answer grounded in current results.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="docling-for-document-parsing"&gt;Docling for document parsing&lt;/h2&gt;
&lt;p&gt;The fourth component, Docling, addresses a different problem. Large language models work best with clean text, but many real documents are messy. PDFs, slide decks, and office files often contain broken text flows, layout artifacts, or table structures that are not useful when passed to a model as-is.&lt;/p&gt;
&lt;p&gt;Docling converts these documents into a Markdown representation that is much easier to use as model context. That sounds small, but it is a major quality improvement for local document workflows.&lt;/p&gt;
&lt;p&gt;The Docling service definition is straightforward:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;services&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;docling&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;quay.io/docling-project/docling-serve:latest&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;container_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;docling&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;restart&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;unless-stopped&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;internal&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;external&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.enable=true&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.docker.network=external&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.docling.rule=Host(`docling.home.example.com`)&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.docling.entrypoints=https&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.docling.tls.certresolver=cloudflare&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.services.docling.loadbalancer.server.port=5001&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The typical usage pattern is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Upload a document in Open WebUI.&lt;/li&gt;
&lt;li&gt;Docling parses the file and converts it to Markdown.&lt;/li&gt;
&lt;li&gt;Feed that Markdown into the model as structured prompt context.&lt;/li&gt;
&lt;li&gt;Ask targeted questions against the extracted content.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is especially useful for technical notes, whitepapers, internal PDFs, or vendor documentation where the raw file format is not suitable for direct prompting.&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This stack did not start as an attempt to build a local alternative to a commercial AI product. It emerged naturally from an existing homelab that already had strong building blocks: containerized services, Traefik, DNS-based routing, and a bias toward self-hosting.&lt;/p&gt;
&lt;p&gt;Adding Open WebUI, LiteLLM, SearXNG, and Docling turned that base into a practical local AI environment. It gives me a single interface for model interaction, the ability to swap backends without changing clients, a way to enrich prompts with live web data, and a better workflow for document-driven tasks.&lt;/p&gt;
&lt;p&gt;Just as important, it stays operationally consistent with the rest of the homelab. That keeps the setup understandable, maintainable, and worth using day to day.&lt;/p&gt;
&lt;p&gt;Future extensions are obvious: adding a vector database, introducing GPU-backed local inference, routing requests to model endpoints running on specialized inference platforms, or using Open WebUI as a gateway to interact with AI agents. But even without those additions, this combination already covers a large share of the AI workflows I actually care about.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;My Homelab: A Traefik-centered Self-hosting Setup - &lt;a href="/2026/my-homelab-a-traefik-centered-self-hosting-setup/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Open WebUI - project site - &lt;a href="https://openwebui.com/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Open WebUI - GitHub - &lt;a href="https://github.com/open-webui/open-webui"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;LiteLLM - project site - &lt;a href="https://www.litellm.ai/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;LiteLLM - GitHub - &lt;a href="https://github.com/BerriAI/litellm"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;LiteLLM - Security incident report, March 2026 - &lt;a href="https://docs.litellm.ai/blog/security-update-march-2026"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;SearXNG - documentation - &lt;a href="https://docs.searxng.org/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;SearXNG - GitHub - &lt;a href="https://github.com/searxng/searxng"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Docling - documentation - &lt;a href="https://docling-project.github.io/docling/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Docling - GitHub - &lt;a href="https://github.com/docling-project/docling"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>My Homelab: A Traefik-centered Self-hosting Setup</title><link>/2026/my-homelab-a-traefik-centered-self-hosting-setup/</link><pubDate>Sat, 24 Jan 2026 00:00:00 +0000</pubDate><guid>/2026/my-homelab-a-traefik-centered-self-hosting-setup/</guid><description>&lt;figure&gt;&lt;img src="/images/posts/homelab.png"data-src="/images/posts/homelab.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;Summary of Homelab services - AI generated&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Several years ago, I began building a small homelab with two primary objectives in mind: gaining hands-on experience with containers and modern application deployment, and running selected services locally to avoid storing certain data in public cloud environments. In hindsight, this environment evolved into a solid foundation for a local AI stack as well, which I now operate alongside the rest of my setup and will detail in a future post. Although the focus here is on a homelab, the technical stack described can be deployed just as easily in any cloud environment, e.g. a VPS or or any hyperscaler, all that is required is a virtual machine running a Linux distribution of your choice and a container engine.&lt;/p&gt;
&lt;p&gt;What began as an experiment has turned into a stable setup that I use every day. At the center of this setup is Traefik, which handles all incoming HTTP and HTTPS traffic and lets me access every service over SSL with clean domains like &lt;em&gt;service-name.home.example.com&lt;/em&gt; instead of a collection of raw IP addresses and ports.&lt;/p&gt;
&lt;p&gt;In this post I will walk through how I structure this homelab, explain how Traefik ties everything together, and outline a selection of the services currently running in my lab.&lt;/p&gt;
&lt;h2 id="hardware-and-base-platform"&gt;Hardware and base platform&lt;/h2&gt;
&lt;p&gt;The homelab does not run on high-end servers. Most of the hosts are refurbished x86 thin clients with the following specifications:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;16 to 32 GB of RAM per node&lt;/li&gt;
&lt;li&gt;A modest amount of storage for container images, configuration files, and selected data&lt;/li&gt;
&lt;li&gt;Low power consumption, which is important for a system that runs 24/7&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The environment uses CentOS Stream 9 as the operating system. On top of that, I run Docker and Docker Compose. Nearly every component in the homelab is containerized, with Traefik positioned in front of these containers as a reverse proxy and routing layer.&lt;/p&gt;
&lt;h2 id="architecture-overview"&gt;Architecture overview&lt;/h2&gt;
&lt;p&gt;At a high level, the architecture looks like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Several containers run on the hosts&lt;/li&gt;
&lt;li&gt;A dedicated container network called &lt;code&gt;external&lt;/code&gt;, where Traefik and all services that are exposed to the home network reside&lt;/li&gt;
&lt;li&gt;An internal DNS setup and a private domain, such as &lt;code&gt;home.example.com&lt;/code&gt;, where services are exposed as subdomains like:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;https://pihole.home.example.com&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;https://ntfy.home.example.com&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Clients on the home network resolve these hostnames to the internal IP address of the homelab host, ensuring that traffic remains entirely within the local network. The local DNS server is automatically assigned to clients connected to the internal network, making all services immediately accessible to any device on the same network.&lt;br&gt;
Traefik acts as the single entry point for HTTP and HTTPS. It terminates TLS, routes requests to the appropriate container based on the hostname, and applies middlewares such as redirects and authentication where required.&lt;/p&gt;
&lt;h2 id="traefik-as-the-center-of-the-homelab"&gt;Traefik as the center of the homelab&lt;/h2&gt;
&lt;p&gt;Traefik is an open-source reverse proxy and edge router that integrates well with containerized environments. It monitors the container socket, automatically discovers running containers, and uses labels defined on those containers to configure routing.&lt;/p&gt;
&lt;p&gt;In my setup, Traefik provides three main benefits:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Automatic TLS for everything&lt;br&gt;
Traefik uses the DNS challenge with my DNS provider to request certificates from Let’s Encrypt. I can issue a wildcard certificate for &lt;code&gt;*.home.example.com&lt;/code&gt;, so every internal service gets proper HTTPS without having to manage individual certificates.&lt;/li&gt;
&lt;li&gt;Clean hostnames instead of ports&lt;br&gt;
Every service gets its own subdomain, such as &lt;code&gt;pihole.home.example.com&lt;/code&gt; or &lt;code&gt;ntfy.home.example.com&lt;/code&gt;. This means I do not have to remember that one service is on port 8080, another on 9090, and so on.&lt;/li&gt;
&lt;li&gt;Centralized routing and security&lt;br&gt;
Since everything goes through Traefik, I can:
&lt;ul&gt;
&lt;li&gt;Redirect all HTTP traffic to HTTPS&lt;/li&gt;
&lt;li&gt;Protect specific endpoints with basic auth or other middleware&lt;/li&gt;
&lt;li&gt;Inspect and debug routes using the Traefik dashboard&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="traefik-docker-compose-configuration"&gt;Traefik Docker Compose configuration&lt;/h2&gt;
&lt;p&gt;Here is a simplified version of the Traefik &lt;code&gt;docker-compose.yml&lt;/code&gt; I use:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-YAML" data-lang="YAML"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;3&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;services&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;traefik&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;traefik:latest&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;container_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;traefik&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;restart&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;unless-stopped&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;security_opt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="kc"&gt;no&lt;/span&gt;-&lt;span class="l"&gt;new-privileges:true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;external&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;ports&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="m"&gt;80&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="m"&gt;80&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="m"&gt;443&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="m"&gt;443&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;CF_API_EMAIL=${CF_API_EMAIL}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;CF_DNS_API_TOKEN=${CF_DNS_API_TOKEN}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;volumes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;/etc/localtime:/etc/localtime:ro&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;/var/run/docker.sock:/var/run/docker.sock:ro&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;./data/traefik.yml:/traefik.yml:ro&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;./data/acme.json:/acme.json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;./data/config.yml:/config.yml:ro&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.enable=true&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c"&gt;# HTTP router for Traefik dashboard&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.traefik.entrypoints=http&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.traefik.rule=Host(`traefik.home.example.com`)&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c"&gt;# Redirect HTTP to HTTPS&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.middlewares.traefik-https-redirect.redirectscheme.scheme=https&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.middlewares.sslheader.headers.customrequestheaders.X-Forwarded-Proto=https&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.traefik.middlewares=traefik-https-redirect&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c"&gt;# Basic auth for the secure dashboard&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.middlewares.traefik-auth.basicauth.users=user:hashed-password&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c"&gt;# HTTPS router for Traefik dashboard&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.traefik-secure.entrypoints=https&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.traefik-secure.rule=Host(`traefik.home.example.com`)&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.traefik-secure.middlewares=traefik-auth&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.traefik-secure.tls=true&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.traefik-secure.tls.certresolver=cloudflare&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.traefik-secure.tls.domains[0].main=home.example.com&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.traefik-secure.tls.domains[0].sans=*.home.example.com&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.traefik-secure.service=api@internal&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;external&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;external&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The important ideas are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Traefik listens on ports 80 and 443 and is connected to the &lt;code&gt;external&lt;/code&gt; network.&lt;/li&gt;
&lt;li&gt;It uses environment variables to access the DNS provider so it can request certificates from Let’s Encrypt.&lt;/li&gt;
&lt;li&gt;The dashboard is exposed at &lt;code&gt;https://traefik.home.example.com&lt;/code&gt;, protected by basic auth.&lt;/li&gt;
&lt;li&gt;The TLS configuration issues a wildcard certificate for &lt;code&gt;*.home.example.com&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Other services join the same &lt;code&gt;external&lt;/code&gt; network and define their own labels, for example:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-YAML" data-lang="YAML"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;services&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;ntfy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;binwiederhier/ntfy&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;external&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.enable=true&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.ntfy.entrypoints=https&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.ntfy.rule=Host(`ntfy.home.example.com`)&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="s2"&gt;&amp;#34;traefik.http.routers.ntfy.tls.certresolver=cloudflare&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With this pattern, every service becomes available over HTTPS under its own subdomain without additional manual configuration in Traefik.&lt;/p&gt;
&lt;h2 id="core-services-in-my-homelab"&gt;Core services in my homelab&lt;/h2&gt;
&lt;p&gt;On top of Traefik, I run a set of core services that provide DNS, monitoring, automation, messaging, logging, and secrets management. The key components are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pi-hole – DNS:&lt;/strong&gt; Provides network-wide DNS resolution and ad-blocking, and handles internal DNS for homelab hostnames such as &lt;code&gt;*.home.example.com&lt;/code&gt;. Blocking unwanted domains for devices on the network.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mafl – Dashboard:&lt;/strong&gt; A minimalistic and flexible homepage for organizing service links, grouping categories, and providing quick navigation. Mafl can perform health checks on linked services, is configured through a simple YAML file, and offers a Progressive Web App for mobile devices. Since each service sits behind Traefik with its own hostname, Mafl serves as a curated entry point to the entire environment.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ntfy – Messaging / Pub-Sub:&lt;/strong&gt; A lightweight HTTP-based publish/subscribe notification service used for event-driven messaging across the environment. Typical use cases include sending alerts when backups complete and receiving notifications when containers restart unexpectedly. Ntfy provides mobile and desktop apps, allowing access from phones and laptops both inside and outside the home network, depending on firewall and VPN settings.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Doozle – Container Logs:&lt;/strong&gt; A simple web-based UI for viewing Docker logs in real time. Logs are accessible through a browser, it is possible to filter by container, and tail logs as they update. This is particularly useful when testing new services or debugging automation workflows.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Beszel – Resource Monitoring:&lt;/strong&gt; A lightweight monitoring tool for tracking system metrics and container statistics across multiple machines. It provides CPU, memory, and disk usage insights, making it easy to identify overloaded or misbehaving nodes and maintain visibility into the health of thin clients and other devices.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Uptime Kuma – Service Monitoring:&lt;/strong&gt; A dashboard for monitoring the availability of both internal and external services. It checks defined endpoints, as well as public websites and APIs. If a service becomes unreachable, Uptime Kuma sends alerts, e.g. via Ntfy or other services, providing an early warning system for issues in the homelab.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;n8n – Automation Engine:&lt;/strong&gt; A workflow automation platform used to orchestrate tasks, trigger scripts or containers, and integrate events across services. Typical use cases include reacting to webhooks or scheduled triggers, executing scripts or container actions, and sending notifications through Ntfy when certain conditions are met. Instead of implementing automation logic in custom code, workflows can be modeled visually and integrated directly with containers and external services.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vaultwarden – Secrets Management:&lt;/strong&gt; A self-hosted Bitwarden-compatible server for securely managing passwords and sensitive information within the homelab. It stores credentials and secrets for services and accounts, enables secure sharing across devices.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;What began as a simple playground for learning containers and avoiding public cloud services for certain use cases has evolved into a practical, resilient platform for running everyday services at home. Centering the setup around Traefik, standardizing on containerized services, and using a wildcard domain with automated TLS have kept the architecture both manageable and extensible. The use of modest, low-power refurbished thin clients has also proven effective in keeping costs and energy consumption low while still offering sufficient resources.&lt;/p&gt;
&lt;p&gt;Over time, the homelab has also turned out to be a solid foundation for hosting local AI services, content of a future post. Depending on the criticality of individual services and one’s tolerance for risk, it can be worthwhile to distribute components across independent hosts, monitor services across nodes, or run certain workloads in parallel for redundancy. It is equally important to think carefully about backups to avoid losing data or configurations during failures or experiments. That said, this remains a homelab project rather than a production environment governed by strict service-level agreements; temporary outages are acceptable, and part of the experimentation process.&lt;/p&gt;
&lt;p&gt;With these principles such as simple routing, consistent domains and TLS, lightweight hardware, and containerized services, one can build a flexible environment that supports DNS, monitoring, automation, messaging, secrets management, and more, tailored to one&amp;rsquo;s own needs.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;CentOS Stream - &lt;a href="https://www.centos.org/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Traefik - reverse proxy - &lt;a href="https://github.com/traefik/traefik"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Pi-hole - network-wide ad blocking and DNS - &lt;a href="https://github.com/pi-hole/pi-hole"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Mafl - dashboard for homelab services - &lt;a href="https://github.com/hywax/mafl"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;ntfy - publish/subscribe push notifications - &lt;a href="https://github.com/binwiederhier/ntfy"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Doozle - web based interface to monitor logs - &lt;a href="https://github.com/amir20/dozzle"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Beszel - resource monitoring for multiple clients - &lt;a href="https://github.com/henrygd/beszel"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Uptime Kuma - monitoring tool - &lt;a href="https://github.com/louislam/uptime-kuma"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;n8n - workflow automation - &lt;a href="https://github.com/n8n-io/n8n"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Vaultwarden - Bitwarden-compatible server - &lt;a href="https://github.com/dani-garcia/vaultwarden"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Youtube Video - Techno Tim: Put Wildcard Certificates and SSL on EVERYTHING - &lt;a href="https://www.youtube.com/watch?v=liV3c9m_OX8"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Building Better Ideas: Leveraging Lego Serious Play</title><link>/2025/building-better-ideas-leveraging-lego-serious-play/</link><pubDate>Sat, 20 Dec 2025 00:00:00 +0000</pubDate><guid>/2025/building-better-ideas-leveraging-lego-serious-play/</guid><description>&lt;figure&gt;&lt;img src="/images/posts/post_17/lsp.jpg"data-src="/images/posts/post_17/lsp.jpg"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;From Complexity to Clarity: Building Shared Understanding - AI generated&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="introduction-from-scepticism-to-breakthroughs"&gt;Introduction: From Scepticism to Breakthroughs&lt;/h2&gt;
&lt;p&gt;The first time I introduced &lt;strong&gt;LEGO® SERIOUS PLAY® (LSP)&lt;/strong&gt; in an IT strategy workshop, I was met with polite smiles and raised eyebrows.&lt;br&gt;
“LEGO and enterprise IT? Really?”&lt;/p&gt;
&lt;p&gt;But 15 minutes later, the energy in the room had shifted completely. Participants were leaning forward, hands busy building, voices animated. Skepticism turned into curiosity, and then into creativity and collaboration.&lt;/p&gt;
&lt;p&gt;Last year, I took the step to become a &lt;strong&gt;Certified Facilitator for the LEGO® SERIOUS PLAY® Method and Materials&lt;/strong&gt;. Since then, I’ve facilitated several workshops with customers, partners, and colleagues. One pattern stands out every time: once people allow themselves to engage, the method unlocks ideas fast and brings complex topics into focus.&lt;/p&gt;
&lt;p&gt;LSP is not a playful gimmick. It’s a structured facilitation method that helps teams surface hidden knowledge, build shared understanding, and make strategic decisions with clarity.&lt;/p&gt;
&lt;h2 id="why-lego-serious-play-works-in-it"&gt;Why LEGO SERIOUS PLAY Works in IT&lt;/h2&gt;
&lt;p&gt;LSP is powerful because it combines &lt;strong&gt;hands-on building&lt;/strong&gt;, &lt;strong&gt;metaphorical thinking&lt;/strong&gt;, and &lt;strong&gt;structured facilitation&lt;/strong&gt;. In IT, where teams deal with complexity, silos, and competing perspectives, this combination is uniquely effective:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Hand-Brain Principle:&lt;/strong&gt;&lt;br&gt;
Building with your hands activates parts of the brain often left idle in typical meetings. When people build a model to express an idea, they speak more openly, they’re more creative, and they explain their thinking with more depth than slides or spreadsheets ever could.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Radical Simplification:&lt;/strong&gt;&lt;br&gt;
IT systems and strategies are complex by nature. LSP forces participants to distill ideas into their essence. The LEGO models don’t make the problem simpler—they make it understandable, visible, and discussable.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A Universal Language:&lt;/strong&gt;&lt;br&gt;
LEGO bricks cut through jargon and hierarchy. Whether it’s a solution architect, a product manager, or an operations engineer, everyone can build and everyone can contribute. It levels the playing field and gives each voice equal weight.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="how-the-process-works"&gt;How the Process Works&lt;/h2&gt;
&lt;p&gt;LSP isn’t free play—it’s a facilitated, structured method. As a facilitator, I guide participants through a clear process:&lt;/p&gt;
&lt;h3 id="core-steps-of-the-lego-serious-play-process"&gt;Core Steps of the LEGO SERIOUS PLAY Process&lt;/h3&gt;
&lt;p&gt;The LSP method follows a structured process that ensures all participants are actively engaged and can contribute meaningfully. Each step builds on the previous one, moving from individual ideas to a shared understanding of complex systems.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Skill Building – Getting Comfortable with the Medium:&lt;/strong&gt;&lt;br&gt;
Every session begins with a warm-up phase designed to help participants get familiar with the materials and the idea of building metaphors.&lt;br&gt;
Through simple, low-stakes exercises—such as building a tower to symbolize resilience or a bridge to represent connection—participants learn to translate abstract concepts into tangible models. &lt;br&gt;
This step lowers inhibitions, builds trust in the method, and gives everyone a shared visual language. Once participants accept and internalize this new way of communicating, the dynamic in the room shifts: conversations become more open, focused, and surprisingly productive.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Setting the Challenge – Framing the Right Question:&lt;/strong&gt;&lt;br&gt;
Next, the facilitator introduces a focused, meaningful question—such as &lt;em&gt;“What’s blocking us from scaling this platform?”&lt;/em&gt; or &lt;em&gt;“What should our future IT landscape look like?”&lt;/em&gt;.&lt;br&gt;
This prompt defines the scope and direction of the session and ensures everyone builds toward a shared objective.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Build → Share → Reflect – Unlocking Insights:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Build:&lt;/strong&gt; Each participant constructs a model that represents their perspective or answer to the challenge.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Share:&lt;/strong&gt; Every person tells the story behind their model, ensuring equal voice and deeper understanding.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reflect:&lt;/strong&gt; As a group, participants identify patterns, contradictions, gaps, and opportunities.
This structured storytelling cycle drives richer conversations and helps surface insights that often remain hidden in traditional workshop formats.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;From Individual to Shared Model – Creating Shared Understanding:&lt;/strong&gt;&lt;br&gt;
Once the team is confident and engaged, individual models are combined into a shared model that reflects the collective view.&lt;br&gt;
This step often exposes interdependencies, tensions, and opportunities that no single perspective could have revealed on its own. It’s where alignment, clarity, and actionable strategy start to take shape.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Example from practice:&lt;/strong&gt;.
In a recent architecture strategy workshop, one participant used a single red brick to symbolize a “single point of failure.” That simple metaphor shifted the discussion from vague risk statements to a concrete redesign strategy. This kind of clarity is hard to achieve just with slides.&lt;/p&gt;
&lt;h2 id="where-lego-serious-play-creates-real-value-in-it"&gt;Where LEGO SERIOUS PLAY Creates Real Value in IT&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Strategic Planning and Architecture:&lt;/strong&gt;&lt;br&gt;
When building roadmaps for complex IT transformations, LSP makes hidden assumptions visible. Business and technical perspectives align more quickly because everyone can literally see the future state in front of them.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Breaking Down Silos:&lt;/strong&gt;&lt;br&gt;
IT organizations often suffer from fragmented communication. LSP gives everyone a seat at the table. Equal speaking time ensures even quieter voices are heard—often surfacing crucial insights.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Solution Design and RFPs:&lt;/strong&gt;&lt;br&gt;
When responding to complex solution requirements, LSP allows teams to quickly prototype, test ideas, and align on the best approach. It accelerates clarity.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Defining OKRs (Objectives and Key Results):&lt;/strong&gt;&lt;br&gt;
Instead of vague PowerPoint bullets, participants build tangible representations of goals, key results, and dependencies. The visual, tactile nature of these models makes alignment much more concrete.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="conclusion---a-facilitators-perspective"&gt;Conclusion - A Facilitator’s Perspective&lt;/h2&gt;
&lt;p&gt;What I find most fascinating as a facilitator is the moment of collective “aha”, when a group that began the session with crossed arms and quiet skepticism suddenly leans in. Once the first models are on the table, the conversation accelerates: people start building, storytelling, and connecting dots in ways that traditional workshops rarely achieve.&lt;/p&gt;
&lt;p&gt;More than once, I’ve heard participants say, &lt;em&gt;“I didn’t think this would work for us—but now we actually see the problem and the solution.”&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;In IT, where complexity and competing perspectives are the daily reality, these moments of shared clarity are game-changing.&lt;/p&gt;
&lt;p&gt;To summarize: LSP blends the &lt;strong&gt;hand-brain principle&lt;/strong&gt;, &lt;strong&gt;radical simplification&lt;/strong&gt;, and a &lt;strong&gt;universal medium&lt;/strong&gt; into a structured process that turns abstract concepts into tangible shared understanding. &lt;br&gt;
It helps teams make assumptions visible, unlock hidden knowledge, align on what truly matters, and move forward with confidence. It levels the playing field, fosters equal participation, and turns passive discussions into active co-creation.&lt;/p&gt;
&lt;p&gt;And that skeptical start I mentioned at the beginning? It’s now one of my favorite moments—because I know what comes next.&lt;/p&gt;
&lt;p&gt;If you’re curious how LSP could help your team tackle complex IT challenges, &lt;a href="https://red.ht/meet_stephan"&gt;get in touch&lt;/a&gt; or connect with me on LinkedIn.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Per Kristiansen &amp;amp; Robert Rasmussen: Building a Better Business Using the Lego Serious Play Method - Wiley, 2014&lt;/li&gt;
&lt;li&gt;David Hillmer: PLAY! Der unverzichtbare LEGO® SERIOUS PLAY® Praxis-Guide für Workshops, Coachings und Moderation - Hanser, 2023&lt;/li&gt;
&lt;li&gt;Hello Agile - Academy and Consultancy &lt;a href="https://www.helloagile.de/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Compound Simulation – Exploring Portfolio Uncertainty</title><link>/2025/compound-simulation-exploring-portfolio-uncertainty/</link><pubDate>Sat, 22 Nov 2025 00:00:00 +0000</pubDate><guid>/2025/compound-simulation-exploring-portfolio-uncertainty/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Financial planning is often built on a deterministic story: &lt;em&gt;“If I invest X € each month at 5 % per year, I’ll have Y € in 20 years.”&lt;/em&gt; But real markets are anything but deterministic. Price fluctuations, volatility, and unexpected shocks can significantly change outcomes.&lt;/p&gt;
&lt;p&gt;This new tool builds on the foundation of the &lt;a href="https://michard.io/2025/compound-simulation-exploring-portfolio-uncertainty/"&gt;Compound Interest Calculator&lt;/a&gt;, which takes a deterministic view of capital growth. This new tool introduces a probabilistic perspective by using Monte Carlo simulation to explore a spectrum of possible portfolio trajectories based on the users assumptions. Instead of a single projected curve, it generates a fan chart that visualizes uncertainty bands, the likelihood of reaching specific targets, and how sensitive outcomes are to your savings rate.&lt;/p&gt;
&lt;p&gt;This is not a crystal ball. It’s a scenario explorer — a way to understand how uncertain markets shape financial trajectories.&lt;/p&gt;
&lt;p&gt;You can try the web tool here:&lt;br&gt;
&lt;a href="https://compound-simulation.michard.io/" target="_blank" rel="noreferrer" class="download"&gt;
&lt;svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512" fill="currentcolor" class="clip" width="13" height="13" style="vertical-align: middle; margin-right: .3rem;"&gt;
&lt;path d="M352 0c-12.9 0-24.6 7.8-29.6 19.8s-2.2 25.7 6.9 34.9L370.7 96 201.4 265.4c-12.5 12.5-12.5 32.8 0 45.3s32.8 12.5 45.3 0L416 141.3l41.4 41.4c9.2 9.2 22.9 11.9 34.9 6.9s19.8-16.6 19.8-29.6V32c0-17.7-14.3-32-32-32H352zM80 32C35.8 32 0 67.8 0 112V432c0 44.2 35.8 80 80 80H400c44.2 0 80-35.8 80-80V320c0-17.7-14.3-32-32-32s-32 14.3-32 32V432c0 8.8-7.2 16-16 16H80c-8.8 0-16-7.2-16-16V112c0-8.8 7.2-16 16-16H192c17.7 0 32-14.3 32-32s-14.3-32-32-32H80z"&gt;&lt;/path&gt;
&lt;/svg&gt;Open the Compound Simulation Tool&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="what-the-tool-does"&gt;What the Tool Does&lt;/h2&gt;
&lt;p&gt;The simulation is based on a small set of input parameters — initial capital, monthly contributions, expected return (μ), volatility (σ), investment horizon, and optionally a target value.&lt;/p&gt;
&lt;p&gt;Using these assumptions, the app runs multiple simulation paths and provides:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Fan chart of portfolio trajectories – median, expected path, and uncertainty bands (percentiles).&lt;/li&gt;
&lt;li&gt;Distribution of end values – showing the spread of possible outcomes at the horizon.&lt;/li&gt;
&lt;li&gt;Target probability – the likelihood of reaching (or exceeding) your goal.&lt;/li&gt;
&lt;li&gt;Stress test – a downside scenario with halved returns and doubled volatility.&lt;/li&gt;
&lt;li&gt;Savings elasticity – the effect on median outcomes from marginally increasing monthly contributions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This shifts the focus from a single deterministic projection to a probabilistic view of potential futures.&lt;/p&gt;
&lt;h2 id="how-to-use-it-online"&gt;How to Use It Online&lt;/h2&gt;
&lt;p&gt;Running the hosted app is straightforward:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Open the simulation tool.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Enter your core parameters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Initial Capital [€]&lt;/li&gt;
&lt;li&gt;Monthly Savings [€]&lt;/li&gt;
&lt;li&gt;Annual Return μ&lt;/li&gt;
&lt;li&gt;Volatility σ&lt;/li&gt;
&lt;li&gt;Time Horizon (years)&lt;/li&gt;
&lt;li&gt;(Optional) Define a target and target date.&lt;/li&gt;
&lt;li&gt;(Optional) Enable the Stress Test to explore adverse scenarios.&lt;/li&gt;
&lt;li&gt;(Optional) Add a Savings Elasticity Increment (e.g. +€50/month) to assess sensitivity.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The output includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A fan chart showing uncertainty over time.&lt;/li&gt;
&lt;li&gt;A distribution histogram of end values.&lt;/li&gt;
&lt;li&gt;A target probability indicator.&lt;/li&gt;
&lt;li&gt;A sensitivity summary for additional contributions.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="run-locally"&gt;Run Locally&lt;/h2&gt;
&lt;p&gt;If you want to host or modify the simulation app yourself:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Clone the repository&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;git clone https://github.com/smichard/compound_simulation
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="2"&gt;
&lt;li&gt;Navigate to the project directory&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; compound_simulation
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="3"&gt;
&lt;li&gt;Build the container image
Run the following command to build an image named compound_simulation_app (or choose any name you prefer):&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;podman build -t compound_simulation_app -f Containerfile .
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This command uses the provided Containerfile to set up the environment, including all required R packages for running the Shiny app.&lt;/p&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Start the app locally&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;podman run --rm -p 3838:3838 compound_simulation_app
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This launches a container and maps port 3838 inside the container to the same port on your host system.&lt;/p&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Access the app in your browser&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;http://localhost:3838/
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You should now see the Compound Simulation app running locally.&lt;/p&gt;
&lt;h2 id="why-this-matters"&gt;Why This Matters&lt;/h2&gt;
&lt;p&gt;Uncertainty is real — any deterministic projection hides the range of plausible outcomes. Markets fluctuate, assumptions shift, and unexpected events can have a lasting impact. Probabilistic thinking helps make better decisions by accounting for both upside and downside scenarios instead of focusing on a single expected path.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Goal probability provides a tangible measure: “What are the chances I’ll reach €X by year Y?”&lt;/li&gt;
&lt;li&gt;Savings elasticity reveals whether increasing contributions might be more effective than simply chasing higher returns.&lt;/li&gt;
&lt;li&gt;For investors, educators, or anyone exploring financial planning under uncertainty, this tool complements the &lt;a href="https://michard.io/2025/compound-simulation-exploring-portfolio-uncertainty/"&gt;Compound Interest Calculator&lt;/a&gt; by adding a probabilistic layer to previously deterministic projections.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Compound Simulation brings uncertainty to the forefront. By combining Monte Carlo simulation, sensitivity analysis, and clear visualizations, it highlights that financial projections aren’t fixed—they’re distributions. The tool helps explore not only expected growth but also the range of potential outcomes and their probabilities.&lt;/p&gt;
&lt;p&gt;It can be used as a teaching aid, a scenario testing environment, or a personal planning companion. And since it’s open source, you can easily adapt it to your own assumptions, risk parameters, or visualization preferences.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Related Post - Compound Interest Calculator - &lt;a href="https://michard.io/2025/compound-simulation-exploring-portfolio-uncertainty/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Web App - &lt;a href="https://compound-simulation.michard.io/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;GitHub Repository - &lt;a href="https://github.com/smichard/compound_simulation"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Compound Interest Calculator – Visualizing Capital Growth</title><link>/2025/compound-interest-calculator-visualizing-capital-growth/</link><pubDate>Sat, 25 Oct 2025 00:00:00 +0000</pubDate><guid>/2025/compound-interest-calculator-visualizing-capital-growth/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Understanding how capital develops over time is a cornerstone of financial planning. While compound interest formulas are straightforward on paper, the interplay between savings rate, interest, and time is often less intuitive. To address this, I built the Compound Interest Calculator – a Shiny app that visualizes how capital grows based on different input parameters.&lt;/p&gt;
&lt;p&gt;The tool illustrates not only the raw numbers but also the dynamics of savings and interest accumulation. It allows you to model different scenarios, compare strategies, and identify milestones such as when your savings generate more returns than your yearly contributions.&lt;/p&gt;
&lt;p&gt;The idea for this tool was sparked after watching a &lt;a href="https://www.youtube.com/watch?v=F3Q-1W4QEVI"&gt;YouTube video&lt;/a&gt; that explains why the first €100,000 is such a critical milestone in building wealth. There are many excellent videos and articles that explore this concept in depth. But to make it truly tangible — and to experiment interactively with savings rates, interest assumptions, and time horizons — I decided to build a tool of my own. The result: a simple, hands-on way to see compound interest in action and explore how various strategies may impact the growth of your capital over time.&lt;/p&gt;
&lt;p&gt;You can try the web tool here:&lt;br&gt;
&lt;a href="https://compound-calculator.michard.io" target="_blank" rel="noreferrer" class="download"&gt;
&lt;svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512" fill="currentcolor" class="clip" width="13" height="13" style="vertical-align: middle; margin-right: .3rem;"&gt;
&lt;path d="M352 0c-12.9 0-24.6 7.8-29.6 19.8s-2.2 25.7 6.9 34.9L370.7 96 201.4 265.4c-12.5 12.5-12.5 32.8 0 45.3s32.8 12.5 45.3 0L416 141.3l41.4 41.4c9.2 9.2 22.9 11.9 34.9 6.9s19.8-16.6 19.8-29.6V32c0-17.7-14.3-32-32-32H352zM80 32C35.8 32 0 67.8 0 112V432c0 44.2 35.8 80 80 80H400c44.2 0 80-35.8 80-80V320c0-17.7-14.3-32-32-32s-32 14.3-32 32V432c0 8.8-7.2 16-16 16H80c-8.8 0-16-7.2-16-16V112c0-8.8 7.2-16 16-16H192c17.7 0 32-14.3 32-32s-14.3-32-32-32H80z"&gt;&lt;/path&gt;
&lt;/svg&gt; Open the Compound Interest Calculator &lt;/a&gt;&lt;/p&gt;
&lt;h2 id="prerequisites"&gt;Prerequisites&lt;/h2&gt;
&lt;p&gt;The easiest way to use the calculator is online (see above).&lt;br&gt;
If you want to run it locally, you’ll need:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An environment capable of running containers, e.g. Podman, or&lt;/li&gt;
&lt;li&gt;R with Shiny installed.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="getting-started"&gt;Getting Started&lt;/h2&gt;
&lt;p&gt;The online version is straightforward:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Open the Compound Interest Calculator.&lt;/li&gt;
&lt;li&gt;Enter your investment parameters, e.g. start year, savings rate, interest rate, investment period.&lt;/li&gt;
&lt;li&gt;Click Calculate and explore the generated charts and tables.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="input-parameters"&gt;Input Parameters&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Start Year:&lt;/strong&gt; The year when the investment begins.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Initial Capital:&lt;/strong&gt; The amount of money you start with.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Savings Rate:&lt;/strong&gt; The amount of money you plan to save regularly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Savings Interval:&lt;/strong&gt; The frequency at which you save the specified savings rate (either monthly or yearly).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Investment Period:&lt;/strong&gt; The total number of years you plan to invest.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Interest Rate:&lt;/strong&gt; The annual interest rate (as a percentage) that your capital will earn.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Adjustment Rate:&lt;/strong&gt; The annual rate (as a percentage) at which your savings rate will increase.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Savings Suspension:&lt;/strong&gt; The number of years after which you plan to stop saving money.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Target Value:&lt;/strong&gt; A specific capital value you aim to achieve. The app will indicate when (or if) this value is reached.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="generated-diagrams"&gt;Generated Diagrams&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Overview:&lt;/strong&gt; Shows the growth of accumulated savings and total capital over time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Distribution:&lt;/strong&gt; Displays a pie chart showing the distribution between total savings and total interest earned.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Savings Rate:&lt;/strong&gt; Represents the annual savings rate in relation to the value of the generated interest each year. This visualization illustrates the development of both the savings rate and the generated interest over time. Additionally, it highlights the year when the generated interest surpasses the annual savings rate.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Normalized Values:&lt;/strong&gt; Displays the values of the savings rate and generated interests, both normalized based on the annual growth comprised of the savings rate and yearly interests. This provides a clearer perspective on how each component contributes to the overall growth each year.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Goals:&lt;/strong&gt; Displays the development of total capital and highlights specific milestones, such as when the capital doubles from the initial investment. It also indicates when the user-defined target value is achieved.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Values:&lt;/strong&gt; A table that provides a detailed breakdown of the capital at the beginning of the year, savings amount per year, generated interest per year, and capital at the end of the year.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="run-locally"&gt;Run Locally&lt;/h2&gt;
&lt;p&gt;If you want to host the calculator yourself:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Clone this repository&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;git clone https://github.com/smichard/compound_interest_calculator.git
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="2"&gt;
&lt;li&gt;Navigate to the project directory:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; compound_interest_calculator
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="3"&gt;
&lt;li&gt;Build the container image:
Run the following command to build a Docker image. Replace &lt;code&gt;my_app&lt;/code&gt; with a name of your choice for the image.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;podman build -t my_app -f Containerfile
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This command will use the provided Containerfile to build an image named &lt;code&gt;my_app&lt;/code&gt;. The process will install the necessary R packages and set up the environment for the Shiny app.&lt;/p&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Run the Shiny app locally:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;After building the image, you can run the Shiny app locally using the following command:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;podman run --rm -p 3838:3838 my_app
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This command will start a container from the &lt;code&gt;my_app&lt;/code&gt; image and map port 3838 of the container to port 3838 of your local machine.&lt;/p&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Access the Shiny app in a browser
Open a web browser and navigate to:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;http://localhost:3838/
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You should now see your Shiny app running!&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The Compound Interest Calculator helps bridge the gap between abstract formulas and practical insights. It turns the often-theoretical concept of compound growth into something tangible and interactive. By visualizing how capital evolves over time, it allows users to experiment with different savings rates, investment horizons, and interest assumptions — and to see immediately how these variables influence the trajectory of their capital.&lt;/p&gt;
&lt;p&gt;Whether used for personal financial planning, educational purposes, or illustrating investment concepts, the tool provides a clear and structured way to explore “what-if” scenarios. It highlights key inflection points — such as when generated interest surpasses annual savings — making the dynamics of compounding easier to grasp and communicate.&lt;/p&gt;
&lt;p&gt;Ultimately, the calculator is designed to make complex relationships between time, capital, and interest transparent, empowering users to make more informed, data-driven decisions about their long-term financial strategies.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;YouTube Video - Nischa: Why Net Worth Skyrockets After $100K - &lt;a href="https://www.youtube.com/watch?v=F3Q-1W4QEVI"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Web App - &lt;a href="https://compound-calculator.michard.io/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;GitHub Repository - &lt;a href="https://github.com/smichard/compound_interest_calculator"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Learning, Building, Growing: My Red Hat Journey So Far</title><link>/2025/learning-building-growing-my-red-hat-journey-so-far/</link><pubDate>Thu, 02 Oct 2025 00:00:00 +0000</pubDate><guid>/2025/learning-building-growing-my-red-hat-journey-so-far/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;On October 1st, I was promoted to Associate Principal Solution Architect at Red Hat. This milestone marks not just a new title, but also an opportunity to reflect on the journey of the past two years.&lt;/p&gt;
&lt;h2 id="reflections"&gt;Reflections&lt;/h2&gt;
&lt;p&gt;Time has passed remarkably quickly since I joined Red Hat in mid-2023. In that period, I’ve had the privilege to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Work with highly skilled and deeply technical colleagues across Europe.&lt;/li&gt;
&lt;li&gt;Engage in diverse and challenging projects, many of which operate at the forefront of technology.&lt;/li&gt;
&lt;li&gt;Learn about, conduct POCs, and deliver workshops in the exciting and fast-moving field of AI.&lt;/li&gt;
&lt;li&gt;Deepen my own expertise by pursuing and completing several Red Hat certifications.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It has been a journey of constant learning, collaboration, and growth. There were moments of trial and error, but each step forward brought valuable insights.&lt;/p&gt;
&lt;h2 id="looking-ahead"&gt;Looking Ahead&lt;/h2&gt;
&lt;p&gt;This promotion is not a finish line but rather a signal to run a bit faster, continue learning, and contribute more. I am grateful to my mentor, my manager, and my colleagues at Red Hat who have guided, challenged, and supported me throughout this journey.&lt;/p&gt;
&lt;p&gt;The path ahead is dynamic, and I look forward to building further, learning more, and — hopefully — still keeping a smile on the way.&lt;/p&gt;
&lt;figure&gt;&lt;img src="/images/posts/post_14/image.png"data-src="/images/posts/post_14/image.png"
/&gt;
&lt;/figure&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;LinkedIn Post - &lt;a href="https://www.linkedin.com/posts/stephanmichard_redhat-levelup-continuouslearning-activity-7379556031388884992-kWWV?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAAAtEiQcBEy8d8vSnm8NBZWZ0faicZji_MK0"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Remembering Prof. Dr. Hans-Joachim Queisser</title><link>/2025/remembering-prof.-dr.-hans-joachim-queisser/</link><pubDate>Thu, 10 Jul 2025 00:00:00 +0000</pubDate><guid>/2025/remembering-prof.-dr.-hans-joachim-queisser/</guid><description>&lt;figure&gt;&lt;img src="/images/posts/post_13/hans_queisser.jpg"data-src="/images/posts/post_13/hans_queisser.jpg"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;Hans-Joachim Queisser&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Sometimes, you meet someone without knowing the lasting impact they’ll have on your life.&lt;/p&gt;
&lt;p&gt;I had the privilege of meeting Prof. Queisser during my time at the University of New South Wales, where I was working as a student researcher at the ARC Photovoltaics Centre of Excellence around 2009/2010. I still vividly remember sitting at my desk when Gavin Conibeer, behind me, said, “Hans, may I introduce you to Stephan Michard.” I hadn&amp;rsquo;t known Prof. Queisser was visiting the institute —and there I was, wearing shorts and flip flops in proper Aussie style, suddenly face to face with a true legend in our field.&lt;/p&gt;
&lt;p&gt;Prof. Queisser stayed for about two weeks. During that time, we had several lunches together and some truly memorable conversations. I was deeply impressed by his openness and curiosity — he didn’t care that I was “just a student”. He took real interest in who I was and what I was working on, offering encouragement and thoughtful advice.&lt;/p&gt;
&lt;p&gt;Later, he played a pivotal role in helping me secure a PhD position at the Forschungszentrum Jülich, supporting my application with a letter of recommendation.
In a private message, he wrote:&lt;br&gt;
&lt;strong&gt;“Now make something of it.”&lt;/strong&gt;&lt;br&gt;
I took that as a challenge, a mission and a source of motivation.&lt;/p&gt;
&lt;p&gt;Prof. Queisser’s guidance opened doors for me, and his example continues to inspire me to this day. I will remember him with deep gratitude — as a towering figure in science, and as someone who took the time to care.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Max-Planck Institute for Solid State Research LinkedIn Page - &lt;a href="https://www.linkedin.com/posts/max-planck-institute-for-solid-state-research-stuttgart-germany_we-mourn-the-passing-of-prof-dr-hans-joachim-activity-7348623097874255873-2tm6?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAAAtEiQcBEy8d8vSnm8NBZWZ0faicZji_MK0"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Book Review: The Start-Up of You by Reid Hoffman and Ben Casnocha</title><link>/2025/book-review-the-start-up-of-you-by-reid-hoffman-and-ben-casnocha/</link><pubDate>Sat, 12 Apr 2025 00:00:00 +0000</pubDate><guid>/2025/book-review-the-start-up-of-you-by-reid-hoffman-and-ben-casnocha/</guid><description>&lt;figure&gt;&lt;img src="/images/posts/books/startup_of_you.jpg"data-src="/images/posts/books/startup_of_you.jpg"
/&gt;
&lt;/figure&gt;
&lt;p&gt;This post is a little different from the usual technical content on this blog. I came across this book through Scott Galloway&amp;rsquo;s podcast, where Hoffman was a guest. Galloway has a way of making you want to read things immediately, and this was one of those cases. I picked it up while I was at a point in my career where I was actively looking for a change, which in hindsight was probably the ideal moment to read it.&lt;/p&gt;
&lt;h2 id="the-authors"&gt;The Authors&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Reid Hoffman&lt;/strong&gt; is the co-founder of LinkedIn and one of the more prominent figures in the Silicon Valley ecosystem. Before LinkedIn, he worked at Apple and PayPal and co-founded SocialNet. After LinkedIn, he became a partner at Greylock and an early investor in companies like Airbnb and Dropbox. He is also the author of &lt;em&gt;Blitzscaling&lt;/em&gt;. Whatever one thinks of his platform, his career is a credible basis for writing about professional strategy.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ben Casnocha&lt;/strong&gt; is an entrepreneur and author who co-founded Comcate, a software company for local governments, and has worked extensively in the startup and venture capital world. He brings a complementary angle to Hoffman&amp;rsquo;s perspective, and the collaboration shows in how the book is structured.&lt;/p&gt;
&lt;h2 id="what-the-book-argues"&gt;What the Book Argues&lt;/h2&gt;
&lt;p&gt;The central idea is straightforward: in a world where stable, lifelong career paths have largely disappeared, the most useful mental model for managing your career is the one used by startups. That means staying in permanent beta rather than assuming you are finished developing, investing in competitive differentiation, building a strong network as a strategic asset, and maintaining the flexibility to adapt when circumstances change.&lt;/p&gt;
&lt;p&gt;One of the more interesting threads in the book is the treatment of serendipity. Hoffman does not dismiss luck as a factor in career outcomes. Instead, he argues that you can meaningfully increase your exposure to fortunate encounters and unexpected opportunities. The way to do that is to be in motion: attend things, meet people, pursue adjacent interests, build genuine relationships rather than transactional ones. You cannot manufacture luck, but you can increase the surface area for it. That is a more honest and useful framing than the usual advice to simply &amp;ldquo;network more.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="my-take"&gt;My Take&lt;/h2&gt;
&lt;p&gt;I read this book during a period when I was actively thinking about a career change. I did not implement all of its suggestions, but I took some of them seriously. I attended two conferences I might otherwise have skipped, and I doubled down on personal learning. Eventually I landed a new job.&lt;/p&gt;
&lt;p&gt;Whether the book caused any of that is genuinely unclear to me. It may have been timing, or accumulated momentum that was already building, or simply good luck. But I find it hard to believe that none of it had any influence. The mindset shift the book advocates — from passive career management to something more deliberate and active — is useful, and it was a useful nudge at a specific moment.&lt;/p&gt;
&lt;p&gt;For anyone early in their career, in the middle of a transition, or just feeling stuck, this book is worth the few hours it takes to read. It is not a work of profound original thought, but it is honest, practical, and — at the right moment — it lands.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The Start-Up of You - &lt;a href="https://brockmann-buecher.buchhandlung.de/shop/article/15354375/reid_hoffman_ben_casnocha_the_startup_of_you_revised_and_updated_.html"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Scott Galloway - Prof G Pod - &lt;a href="https://podcasts.apple.com/us/podcast/civility-in-tech-centrists-and-advice-from-a/id1498802610?i=1000582505419"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Why Software Carbon Intensity Matters: An Introduction to the SCI Framework</title><link>/2024/why-software-carbon-intensity-matters-an-introduction-to-the-sci-framework/</link><pubDate>Mon, 16 Dec 2024 00:00:00 +0000</pubDate><guid>/2024/why-software-carbon-intensity-matters-an-introduction-to-the-sci-framework/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;The digital revolution has transformed our world, but at what cost to our environment? Greenhouse gas (GHG) emissions from data centers have already surpassed those of the global airline industry and are expected to continue rising, highlighting the urgent need to address the carbon footprint of software. This article explores the Software Carbon Intensity (SCI) framework, an approach to measuring the environmental impact of software applications. The components of the SCI, its practical applications, and its role in enabling developers, architects, and organizations to create more sustainable software solutions will be explored.&lt;/p&gt;
&lt;h2 id="from-application-to-energy-sourcing"&gt;From Application to energy sourcing&lt;/h2&gt;
&lt;p&gt;Applications are deployed to fulfill specific business needs, and their operation requires careful consideration of availability, reliability, and efficiency. Decisions regarding high availability, backup solutions, and the geographic location of data centers significantly influence the environmental impact of software systems. The underlying infrastructure supporting these applications — comprising servers, storage, and networking components — consumes energy and resources not only during operational use but also throughout their production and manufacturing lifecycle.&lt;/p&gt;
&lt;p&gt;The energy consumed for operating data centers is directly linked to the choices made in infrastructure deployment. Whether the hardware is utilized in an on-premises data center, a co-location facility, or within the public cloud, these decisions affect the overall energy demand and resource utilization. Furthermore, the origin of the electricity powering these systems plays a crucial role in determining their energy footprint and GHG emissions.&lt;/p&gt;
&lt;p&gt;Understanding the comprehensive impact of these factors — ranging from application design to data center infrastructure and energy sourcing — allows for a detailed assessment of the energy and GHG footprint associated with software operations. This holistic view enables organizations to make informed decisions aimed at reducing their environmental impact.&lt;/p&gt;
&lt;figure&gt;&lt;img src="/images/posts/post_12/sci_1.jpg"data-src="/images/posts/post_12/sci_1.jpg"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;This diagram outlines the various components influencing the energy and greenhouse gas footprint of software operations, including applications, data center infrastructure, operational energy consumption, and energy utilities.&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="introducing-the-software-carbon-intensity-framework"&gt;Introducing the Software Carbon Intensity Framework&lt;/h2&gt;
&lt;p&gt;To address the pressing need for measuring software&amp;rsquo;s environmental impact, the Green Software Foundation (GSF) developed the &lt;strong&gt;Software Carbon Intensity (SCI) framework&lt;/strong&gt;, now recognized as an ISO standard. The SCI framework provides a standardized method to calculate the carbon emissions associated with software applications, helping organizations quantify and reduce their environmental footprint.&lt;/p&gt;
&lt;p&gt;Although the formula might appear complex at first glance, its components are straightforward:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;E (Energy Consumption)&lt;/strong&gt;: The total energy used to operate the software.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;I (Carbon Intensity of Energy Source)&lt;/strong&gt;: The amount of CO₂ emitted per kilowatt-hour during the generation of electricity.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;M (Embodied Carbon)&lt;/strong&gt;: The CO₂ emissions resulting from manufacturing the hardware that runs the software.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;R (Rate of Use)&lt;/strong&gt;: How the software scales—this could be per user, per API call, or any other relevant unit.&lt;/li&gt;
&lt;/ul&gt;
&lt;figure&gt;&lt;img src="/images/posts/post_12/sci_2.jpg"data-src="/images/posts/post_12/sci_2.jpg"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;Components of the Software Carbon Intensity (SCI) Framework&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The SCI formula helps organizations derive a carbon footprint for their software applications by considering both operational and embodied emissions relative to their usage scale. It is important to recognize that the SCI framework is designed to monitor an application&amp;rsquo;s environmental impact during its ongoing operation, rather than to compare different applications. Such comparisons would require standardized testing procedures and uniform hardware—conditions typically feasible only under controlled laboratory settings, which are unlikely in realistic, real-world scenarios.&lt;/p&gt;
&lt;p&gt;The following image illustrates how the key metrics for calculating the SCI value can be derived:
&lt;figure&gt;&lt;img src="/images/posts/post_12/sci_3.jpg"data-src="/images/posts/post_12/sci_3.jpg"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;Deriving the Key Components of the Software Carbon Intensity Framework&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;&lt;/p&gt;
&lt;h2 id="why-the-sci-framework-matters"&gt;Why the SCI Framework Matters&lt;/h2&gt;
&lt;p&gt;Abhishek Gupta of Microsoft, Co-Chair of the SCI Specification Project, emphasizes the practical significance of the SCI framework: &amp;ldquo;The Software Carbon Intensity specification is exciting because it is a concrete manifestation of broad—and very important!—ideas of how we measure the carbon impacts of software systems. But, more importantly, it is about what we can do to mitigate those impacts,&amp;rdquo; he explains.&lt;/p&gt;
&lt;p&gt;By providing an actionable approach, the SCI framework empowers developers, architects, and organizations to make informed decisions that reduce carbon emissions. The framework focuses on the direct elimination of emissions by encouraging modifications to software systems that use less physical hardware, consume less energy, or leverage lower-carbon energy sources. Neutralization or avoidance offsets are not considered in reducing an SCI score, emphasizing the importance of tangible emission reductions. The SCI score offers a consistent and fair measure of a software system&amp;rsquo;s carbon footprint, enhancing awareness and transparency of its sustainability credentials. This enables practitioners to set clear targets during development, make evidence-based decisions in design and deployment, and track progress over time.&lt;/p&gt;
&lt;p&gt;By systematically applying the SCI framework across their application landscape, organizations can accurately compute the carbon intensity of their software systems. This comprehensive approach enables them to identify key areas where energy efficiency can be enhanced and empowers them to make informed decisions to reduce their overall environmental impact.&lt;/p&gt;
&lt;p&gt;The GSF conducts its work openly, following open-source principles, with all discussions, meeting notes, and agenda items publicly accessible on their GitHub repository. This transparent approach allows anyone — not just GSF members — to contribute ideas and participate in discussions.&lt;/p&gt;
&lt;h2 id="challenges-and-moving-forward"&gt;Challenges and Moving Forward&lt;/h2&gt;
&lt;p&gt;One of the main challenges in adopting the SCI framework is obtaining accurate and granular data, particularly regarding energy consumption and embodied carbon. Collaboration with hardware manufacturers, data center operators, and energy providers is crucial to gather this information.&lt;/p&gt;
&lt;p&gt;The GSF is actively working on case studies to demonstrate the SCI framework&amp;rsquo;s application in real-world scenarios. These examples aim to refine the framework further and encourage widespread adoption.&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The SCI framework represents a significant step forward in promoting transparency by enabling organizations to monitor their software&amp;rsquo;s carbon emissions. This standardized method for measuring and understanding the carbon footprint associated with software allows companies to see the tangible consequences of their actions. As a result, organizations can make informed decisions and take meaningful steps to reduce their environmental impact.&lt;/p&gt;
&lt;p&gt;As our reliance on software continues to grow, integrating sustainability into software development is not just beneficial—it&amp;rsquo;s imperative. The SCI framework offers a clear path for organizations committed to making a positive environmental difference.&lt;/p&gt;
&lt;p&gt;A future blog post will explore practical methods for measuring and reducing the energy and carbon footprint of software applications, utilizing open-source tools and projects from the CNCF ecosystem.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Measuring greenhouse gas emissions in data centres: the environmental impact of cloud computing - &lt;a href="https://www.climatiq.io/blog/measure-greenhouse-gas-emissions-carbon-data-centres-cloud-computing"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Software Carbon Intensity (SCI) Specification - &lt;a href="https://sci.greensoftware.foundation/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;GitHub repository of the Green Software Foundation - &lt;a href="https://github.com/Green-Software-Foundation"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Software Carbon Intensity (SCI) Specification Achieves ISO Standard Status, Advancing Green Software Development - &lt;a href="https://greensoftware.foundation/articles/sci-specification-achieves-iso-standard-status"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Interview with the Co-Chairs of the SCI Specification Project - &lt;a href="https://greensoftware.foundation/articles/software-carbon-intensity-sci-specification-project"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;Note: All links were accessed and verified as of the date of this post.&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Enhancing Code Project Documentation through Automated Changelogs</title><link>/2024/enhancing-code-project-documentation-through-automated-changelogs/</link><pubDate>Tue, 26 Mar 2024 00:00:00 +0000</pubDate><guid>/2024/enhancing-code-project-documentation-through-automated-changelogs/</guid><description>&lt;p&gt;This article was published on March 25, 2024, on &lt;a href="https://www.opensourcerers.org/2024/03/25/enhancing-code-project-documentation-through-automated-changelogs/"&gt;opensourcerers.org&lt;/a&gt;:&lt;/p&gt;
&lt;h2 id="abstract"&gt;Abstract&lt;/h2&gt;
&lt;p&gt;In the rapidly evolving landscape of software development, documentation of modifications and updates is crucial for maintaining project continuity and ensuring team alignment. This blog article introduces &lt;strong&gt;Conventional Changelog&lt;/strong&gt;, a tool developed to address this very challenge. The tool transforms a project’s commit history into a detailed, readable changelog. Its adherence to the Conventional Commits and Semantic Versioning practices fosters a well-structured documentation that enhances transparency for users and contributors alike. Versatile by design, it integrates seamlessly into various deployment environments, from local IDEs to continuous integration pipelines like GitHub Actions and Tekton Tasks.&lt;/p&gt;
&lt;h2 id="motivation"&gt;Motivation&lt;/h2&gt;
&lt;p&gt;As projects evolve, maintaining a clear history of changes becomes a challenge. Traditional methods often fall short, leading to overlooked updates or a cluttered changelog. The need for a solution that not only automates this process but also aligns with best practices in software development — such as Semantic Versioning and Conventional Commits — sparked the idea to develop the proposed tool. &lt;strong&gt;Conventional Changelog&lt;/strong&gt; addresses this gap, offering a solution that is both comprehensive and easy to adopt, ensuring no code commit goes unrecorded.&lt;/p&gt;
&lt;p&gt;The proposed approach integrates three foundational best practices to enhance a project’s documentation:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://semver.org/"&gt;&lt;strong&gt;Semantic Versioning:&lt;/strong&gt;&lt;/a&gt; This practice involves structuring version numbers as MAJOR.MINOR.PATCH. Each segment signifies the nature of changes: MAJOR versions indicate incompatible API changes, MINOR versions add features in a backward-compatible manner, and PATCH versions address backward-compatible bug fixes. This method provides a clear, incremental structure for versioning that reflects the scope and impact of changes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.conventionalcommits.org/en/v1.0.0/#specification"&gt;&lt;strong&gt;Conventional Commits:&lt;/strong&gt;&lt;/a&gt; Building on the idea of structured commit messages, this practice categorizes code changes to clearly communicate their intent. Based on the &lt;a href="https://github.com/angular/angular/blob/22b96b9/CONTRIBUTING.md#-commit-message-guidelines"&gt;Angular Convention&lt;/a&gt; for code commits, valid categories are: feat:, fix:, build:, chore:, ci:, docs:, style:, refactor:, perf:, and test:. The proposed tool introduces additional categories such as deploy:, gitops:, and demo:. The motivation is to cover code changes of deployment files (e.g. Kubernetes manifests), code changes which trigger automated GitOps-driven deployments, code changes which are motivated by demonstration purposes. This ensures a well-organized commit history.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://keepachangelog.com/en/1.1.0/"&gt;&lt;strong&gt;Keep a Changelog:&lt;/strong&gt;&lt;/a&gt; Advocates for maintaining a changelog as a curated list of notable changes for each project version. It emphasizes structuring the changelog in a way that is accessible and informative for users, grouping changes by type and listing them chronologically. Including an “Unreleased” section helps to offer visibility into the latest code commit which might be part of upcoming software releases.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Together, these practices offer a comprehensive framework for managing software versioning, commit documentation, and changelog maintenance, making it easier for teams to navigate the complexities of project development and for users to stay informed about significant updates.&lt;/p&gt;
&lt;h2 id="executing-the-script-a-multifaceted-approach"&gt;Executing the Script: A Multifaceted Approach&lt;/h2&gt;
&lt;p&gt;The tool can be operated in various ways. These methods are explained in more detail below. To avoid exceeding the scope, minimal examples for the individual options will be used. This flexibility allows developers to choose the best approach for their individual workflow, enhancing productivity and ensuring accurate documentation of project evolution.&lt;/p&gt;
&lt;h3 id="local-execution"&gt;Local execution&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Conventional Changelog&lt;/strong&gt; stands out for its adaptability, easily incorporating into the local development environment. Developers can execute the script directly, ensuring their changelog remains up-to-date with every commit.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;./generate_changelog_local.sh
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Alternatively, utilizing a container engine like Podman or Docker offers an isolated setup, guaranteeing consistent execution across different environments Independent of the underlying operating system.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;First, build the container image using the provided Dockerfile. This step creates an image with the necessary environment to run the script:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;podman build -t &amp;lt;image-name&amp;gt; -f Dockerfile
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="2"&gt;
&lt;li&gt;After building the image, run the container. This step mounts the current working directory into the container, allowing the script to access and update the changelog file within the project directory:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;podman run -it --rm -v &lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;&lt;span class="k"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="k"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;:/repo&amp;#34;&lt;/span&gt; &amp;lt;image-name&amp;gt; sh
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="3"&gt;
&lt;li&gt;Inside the container, navigate to the mounted repository directory and execute the script. This process generates the changelog within the containerized environment, reflecting the changes back to the local repository:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; repo
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;./generate_changelog_local.sh
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="github-action"&gt;GitHub Action&lt;/h3&gt;
&lt;p&gt;Integrating &lt;strong&gt;Conventional Changelog&lt;/strong&gt; into a CI/CD pipeline as a GitHub Action streamlines the process of keeping your changelog current and comprehensive. The configuration of the GitHub Actions workflow allows for the changelog generation to be initiated based on certain git operations, targeted branches, or through workflow dispatch, providing flexibility in how and when updates are documented.&lt;br&gt;
The following GitHub Actions workflow example is designed to trigger the automatic generation of an updated changelog with every code push to the main branch. For this functionality to operate correctly, it’s necessary to adjust the GitHub workflow permissions to have both read and write access in the repository settings (Settings -&amp;gt; Actions -&amp;gt; General -&amp;gt; Workflow permissions).&lt;/p&gt;
&lt;div class="collapsable-code"&gt;
&lt;input id="863157942" class="toggle" type="checkbox"checked /&gt;
&lt;label for="863157942" class="lbl-toggle"&gt;
&lt;span class="collapsable-code__language"&gt;YAML&lt;/span&gt;
&lt;span class="collapsable-code__title"&gt;GitHub Action Workflow&lt;/span&gt;
&lt;span class="collapsable-code__toggle" data-label-expand="△" data-label-collapse="▽"&gt;&lt;/span&gt;
&lt;/label&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-YAML" data-lang="YAML"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Generate Changelog&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;on&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;push&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;branches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;main ]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;changelog&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;runs-on&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;ubuntu-latest&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Generate and Commit Changelog&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;steps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Checkout Repository&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;uses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;actions/checkout@v4&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Generate Changelog&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;uses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;smichard/conventional_changelog@2.0.0&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;with&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;GITHUB_TOKEN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;${{ secrets.GITHUB_TOKEN }}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Set Git User Info&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="sd"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sd"&gt; git config user.name &amp;#39;GitHub Actions Bot&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sd"&gt; git config user.email &amp;#39;actions@github.com&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Commit Changelog&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="sd"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sd"&gt; git add CHANGELOG.md
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sd"&gt; git commit -m &amp;#34;docs: :robot: changelog file generated&amp;#34; || echo &amp;#34;No changes to commit&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sd"&gt; git push&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This automation streamlines the maintenance of the project’s documentation, ensuring a real-time, accurate account of changes, fixes, and new features. It’s a seamless process that saves time and improves accuracy, crucial for projects with frequent updates.&lt;/p&gt;
&lt;h3 id="tekton-task"&gt;Tekton Task&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Conventional Changelog&lt;/strong&gt; extends its versatility by offering seamless integration as a task within &lt;a href="https://tekton.dev/docs/"&gt;Tekton pipelines&lt;/a&gt;. This feature is particularly beneficial for users operating in Kubernetes and OpenShift environments, allowing for the automation of changelog generation as part of a deployment workflow.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Begin by applying the provided &lt;code&gt;tekton/task_generate_changelog.yml&lt;/code&gt; configuration. This step enables using the provided Task as part of a Tekton Pipeline. Make sure to have the git-clone Task installed in your &lt;a href="https://hub.tekton.dev/tekton/task/git-clone"&gt;cluster&lt;/a&gt;:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc apply -f tekton/task_generate_changelog.yml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="2"&gt;
&lt;li&gt;Integrate the provided task into a Tekton pipeline. Find below a minimal pipeline configuration. This pipeline illustrates a minimal configuration which retrieves a Git repository and generates the changelog. However, the provided pipeline can serve as a blueprint to be adopted in a larger context. If the generated changelog file needs to be committed back to the repository, additional steps are required to handle the commit process:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="collapsable-code"&gt;
&lt;input id="183924576" class="toggle" type="checkbox"checked /&gt;
&lt;label for="183924576" class="lbl-toggle"&gt;
&lt;span class="collapsable-code__language"&gt;yaml&lt;/span&gt;
&lt;span class="collapsable-code__title"&gt;Minimal Tekton Task&lt;/span&gt;
&lt;span class="collapsable-code__toggle" data-label-expand="△" data-label-collapse="▽"&gt;&lt;/span&gt;
&lt;/label&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;apiVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;tekton.dev/v1beta1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Pipeline&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;minimal-pipeline&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;workspaces&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;source&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;git-url&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;string&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;URL of the git repository&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;fetch-repository&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;taskRef&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;git-clone&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Task&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;workspaces&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;output&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;workspace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;source&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;url&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;$(params.git-url)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;revision&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;main&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;generate-changelog&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;taskRef&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;generate-changelog&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;workspaces&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;source&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;workspace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;source&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;runAfter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;fetch-repository&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Apply the pipeline:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc apply -f tekton/pipeline
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Integrating the solution as part of a Tekton pipeline, just as with a GitHub Action workflow, demonstrates the solution’s flexibility and ensures a timely and accurate record of changes, bug fixes, and new features.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;In a dynamic software development world, maintaining an accurate and comprehensive project history is pivotal for team alignment and project continuity. The introduction of &lt;strong&gt;Conventional Changelog&lt;/strong&gt; offers a robust solution to this challenge, transforming commit histories into detailed, structured changelogs. This tool marries the principles of Conventional Commits and Semantic Versioning with the best practices of changelog maintenance, ensuring a transparent and accessible documentation process. Versatile enough to integrate with local IDEs, containerized environments, GitHub Actions, and Tekton Tasks, &lt;strong&gt;Conventional Changelog&lt;/strong&gt; streamlines documentation workflows, making it an essential tool for developers seeking to automate and enhance their project documentation practices. This post presented the motivation behind &lt;strong&gt;Conventional Changelog&lt;/strong&gt;, outlined its background, and provided practical guidance on its multifaceted execution strategies, demonstrating its utility in modern software development environments.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;GitHub Repository of Conventional Changelog - &lt;a href="https://github.com/smichard/conventional_changelog"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;GitHub Action Marketplace - &lt;a href="https://github.com/marketplace/actions/generate-changelog-based-on-conventional-commits"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Semantic Versioning Specification - &lt;a href="https://semver.org/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Conventional Commits Specification - &lt;a href="https://www.conventionalcommits.org/en/v1.0.0/#specification"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Angular Commit Message Guidelines - &lt;a href="https://github.com/angular/angular/blob/22b96b9/CONTRIBUTING.md#-commit-message-guidelines"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Keep A Changelog Specification - &lt;a href="https://keepachangelog.com/en/1.1.0/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Tekton Documentation - &lt;a href="https://tekton.dev/docs/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Documentation for the git clone Tekton Task - &lt;a href="https://hub.tekton.dev/tekton/task/git-clone"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;</description></item></channel></rss>