
Running the Red Hat AI Inference Server on OpenShift

Sun, 17 May 2026 00:00:00 +0000

Drop-in OpenAI-compatible inference on OpenShift — RHAIIS packages vLLM for production, with hardware flexibility and a secure external endpoint out of the box - AI generated

Introduction

In this post, I want to describe how to deploy the Red Hat AI Inference Server (RHAIIS) on OpenShift and expose it as an OpenAI-compatible API endpoint. This post builds on Deploying OpenShift on AWS with Automated Cluster Provisioning, which covers getting a working OpenShift cluster into place. If you already have a cluster running, you can skip directly to the deployment steps.

The inference server will load a model from Hugging Face Hub and expose a /v1/chat/completions endpoint that any OpenAI-compatible client can talk to. At the end, I show how to connect the endpoint to the Open WebUI setup described in My Local AI Stack.

What Is the Red Hat AI Inference Server

vLLM is an open-source inference engine designed for high-throughput LLM serving. It combines PagedAttention for memory-efficient KV cache management, continuous batching of incoming requests, and GPU-optimized execution, and it exposes an OpenAI-compatible HTTP API out of the box. I covered how to run vLLM on the GPU cloud provider RunPod in a previous post.

The Red Hat AI Inference Server is the supported, enterprise-packaged distribution of vLLM. Red Hat provides a hardened container image through registry.redhat.io that is tested against specific GPU driver and CUDA versions and comes with a defined support lifecycle. The API surface is identical to upstream vLLM, so any client that works against a plain vLLM inference server works against RHAIIS without modification.

Deploying RHAIIS directly on OpenShift is one way to get a running inference endpoint on Red Hat technology. Red Hat OpenShift AI offers other paths. One is model serving through KServe, where OpenShift AI manages the deployment lifecycle from a web dashboard and exposes RHAIIS through a ServingRuntime. Another is a Model as a Service approach that provisions shared inference endpoints across a cluster, so teams can consume models without operating their own deployment. The approach in this post is the most direct option and suits cases where you need a single inference endpoint.

Prerequisites

This setup requires the following:

A running OpenShift cluster with at least one GPU-enabled worker node. The post Deploying OpenShift on AWS covers one way to get there.
Node Feature Discovery (NFD) Operator installed and running to detect GPU hardware on the node.
NVIDIA GPU Operator installed to provide the CUDA runtime and device plugin.
OpenShift CLI (oc), installed and logged in to the cluster.
A Hugging Face access token if you intend to use a gated model. Publicly available models like Granite do not require one.

Deploying the Red Hat AI Inference Server

The deployment consists of a namespace, two secrets, a PersistentVolumeClaim for model caching, a Deployment, a Service, and a Route. All deployment files are available in the smichard/agent_on_ocp GitHub repository. The steps below apply them in sequence.

Clone the repository:

git clone https://github.com/smichard/agent_on_ocp.git
cd agent_on_ocp/rhaiis

Create a Namespace

oc new-project rhaiis

Create the required Secrets

Hugging Face access token:

oc create secret generic hf-secret \
 --from-literal=HF_TOKEN=<your_huggingface_token> \
 -n rhaiis

API key for the inference endpoint:

The server requires clients to present an API key as a bearer token. Storing it as a secret keeps it out of the Deployment spec.

oc create secret generic vllm-api-key-secret \
 --from-literal=VLLM_API_KEY=$(openssl rand -hex 32) \
 -n rhaiis

Create the ConfigMap

Set the Hugging Face model ID you want to serve. Research which model fits your use case before settling on one; the only hard requirement is that vLLM supports the model. The ConfigMap also carries the tool call parser name, which the Deployment passes to --tool-call-parser so that tool calls are parsed in the format the chosen model emits.

apiVersion: v1
kind: ConfigMap
metadata:
  name: vllm-config
  namespace: rhaiis
data:
  MODEL_NAME: "Qwen/Qwen3-Coder-30B-A3B-Instruct"
  TOOL_CALL_PARSER: "qwen3_coder"

Apply the file to create the ConfigMap:

oc apply -f configmap.yaml

Create a PersistentVolumeClaim

The model weights are downloaded once on first startup and cached on a persistent volume. This avoids re-downloading the model on every pod restart.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache
  namespace: rhaiis
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 150Gi

Apply the file to create the PVC:

oc apply -f pvc.yaml

Deploy the Inference Server

The Deployment below references the RHAIIS container image and reads the model ID from the vllm-config ConfigMap created earlier. To serve a different model, update the ConfigMap rather than editing the Deployment spec. The HF_TOKEN and VLLM_API_KEY values are injected from the hf-secret and vllm-api-key-secret secrets created earlier.

Note

Depending on the model size, the number of GPUs and the CPU and memory allocations will need to be adjusted. The example below was tested on an AWS g5.12xlarge node (4x NVIDIA A10G, 24 GB VRAM per GPU) and uses all four GPUs via tensor parallelism.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rhaiis-vllm
  namespace: rhaiis
  labels:
    app: rhaiis-vllm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rhaiis-vllm
  template:
    metadata:
      labels:
        app: rhaiis-vllm
    spec:
      tolerations:
        - key: nvidia.com/gpu
          effect: NoSchedule
          operator: Exists
      serviceAccountName: default
      volumes:
        - name: model-cache
          persistentVolumeClaim:
            claimName: model-cache
        - name: shm
          emptyDir:
            medium: Memory
            sizeLimit: "16Gi"
      containers:
        - name: vllm
          image: registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.3.1-1775680192
          imagePullPolicy: Always
          env:
            - name: HF_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-secret
                  key: HF_TOKEN
            - name: VLLM_API_KEY
              valueFrom:
                secretKeyRef:
                  name: vllm-api-key-secret
                  key: VLLM_API_KEY
            - name: MODEL_NAME
              valueFrom:
                configMapKeyRef:
                  name: vllm-config
                  key: MODEL_NAME
            - name: HF_HOME
              value: /cache
            - name: HF_HUB_OFFLINE
              value: '0'
            - name: VLLM_ALLOW_LONG_MAX_MODEL_LEN
              value: '1'
            - name: TOOL_CALL_PARSER
              valueFrom:
                configMapKeyRef:
                  name: vllm-config
                  key: TOOL_CALL_PARSER
          command:
            - python
            - '-m'
            - vllm.entrypoints.openai.api_server
          args:
            - '--port=8000'
            - '--model=$(MODEL_NAME)'
            - '--served-model-name=$(MODEL_NAME)'
            - '--tensor-parallel-size=4'
            - '--gpu-memory-utilization=0.85'
            - '--max-model-len=65536'
            - '--enable-auto-tool-choice'
            - '--tool-call-parser=$(TOOL_CALL_PARSER)'
          resources:
            limits:
              cpu: '10'
              nvidia.com/gpu: '4'
              memory: 128Gi
            requests:
              cpu: '2'
              memory: 32Gi
              nvidia.com/gpu: '4'
          volumeMounts:
            - name: model-cache
              mountPath: /cache
            - name: shm
              mountPath: /dev/shm
      restartPolicy: Always

Apply the file to create the deployment:

oc apply -f deployment.yaml

The container reads the model ID from the ConfigMap at startup and downloads the model from Hugging Face into /cache (backed by the PVC). Initial startup takes several minutes depending on model size and network speed. Follow the progress with:

oc logs -f deployment/rhaiis-vllm -n rhaiis

The server is ready when the log shows Application startup complete.

vLLM server log output on startup, showing all registered API routes and the final Application startup complete confirmation

Once the pod is running, you can verify GPU access from the pod terminal with nvidia-smi. All four GPUs should be visible, each running a tensor-parallel worker process.

nvidia-smi output from inside the vLLM pod, confirming all four A10G GPUs are visible and each tensor-parallel worker has allocated approximately 20 GB of VRAM
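Some rough arithmetic connects the roughly 20 GB per GPU in that output to the Deployment flags. The sketch below uses a hypothetical helper that is not part of RHAIIS or vLLM. It assumes about 30 billion parameters stored in BF16 (2 bytes each) and an even split of the weights across the tensor-parallel GPUs, and it treats the A10G's 24 GiB as 24 GB. It also ignores CUDA context and activation memory, so read the result as an optimistic estimate.

```bash
# per_gpu_budget <params_billion> <tensor_parallel_size> <vram_gb_per_gpu> <gpu_mem_util_pct>
# Hypothetical helper: rough per-GPU memory budget in whole GB.
# Assumes BF16 weights (2 bytes per parameter) split evenly by tensor parallelism.
per_gpu_budget() {
  local params_billion=$1 tp=$2 vram_gb=$3 util_pct=$4
  local weights_gb per_gpu_weights per_gpu_limit headroom
  weights_gb=$(( params_billion * 2 ))            # total weight size
  per_gpu_weights=$(( weights_gb / tp ))          # shard held by each GPU
  per_gpu_limit=$(( vram_gb * util_pct / 100 ))   # cap from --gpu-memory-utilization
  headroom=$(( per_gpu_limit - per_gpu_weights )) # left for KV cache and activations
  echo "weights/GPU=${per_gpu_weights}GB limit/GPU=${per_gpu_limit}GB headroom=${headroom}GB"
}

# Values from the Deployment: 4 GPUs, 24 GB each, --gpu-memory-utilization=0.85
per_gpu_budget 30 4 24 85
```

With four GPUs, each GPU holds about 15 GB of weights inside a cap of about 20 GB, which leaves roughly 5 GB per GPU for the KV cache. With a tensor-parallel size of 1, the same calculation gives a negative headroom, which is why a single A10G cannot hold this model.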

Create a Service and Route

Create a Service that exposes port 8000 and forwards it to port 8000 on the pod:

apiVersion: v1
kind: Service
metadata:
  name: rhaiis-vllm
  namespace: rhaiis
  labels:
    app: rhaiis-vllm
spec:
  selector:
    app: rhaiis-vllm
  ports:
    - name: http
      protocol: TCP
      port: 8000
      targetPort: 8000

Create a TLS-terminated Route if you want to expose the endpoint outside the cluster:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: rhaiis-vllm
  namespace: rhaiis
  labels:
    app: rhaiis-vllm
spec:
  to:
    kind: Service
    name: rhaiis-vllm
  port:
    targetPort: http
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect

Apply both and retrieve the assigned hostname:

oc apply -f service.yaml
oc apply -f route.yaml
oc get route rhaiis-vllm -n rhaiis -o jsonpath='{.spec.host}'

OpenShift builds the default hostname from the route and namespace names following the pattern <route-name>-<namespace>.apps.<cluster-domain>. The result looks something like rhaiis-vllm-rhaiis.apps.ocp.example.com.
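Because the default hostname follows a fixed pattern, you can predict it before the Route exists. The helper below is a hypothetical sketch. It assumes the cluster's default ingress domain, which already includes the apps. prefix and can be read from the cluster ingress configuration. It does not apply to Routes that set a custom spec.host.

```bash
# expected_route_host <route_name> <namespace> <ingress_domain>
# Hypothetical helper: rebuilds the default <route-name>-<namespace>.<ingress domain>
# hostname. The ingress domain already contains the "apps." prefix.
expected_route_host() {
  printf '%s-%s.%s\n' "$1" "$2" "$3"
}

# Usage against a live cluster (requires oc and a logged-in session):
#   domain=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
#   expected_route_host rhaiis-vllm rhaiis "$domain"
```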

Testing the Endpoint

Store the hostname, API key, and model name in shell variables once to keep the following commands readable:

export RHAIIS_HOST=$(oc get route rhaiis-vllm -n rhaiis -o jsonpath='{.spec.host}')
export RHAIIS_API_KEY=$(oc get secret vllm-api-key-secret -n rhaiis \
 -o jsonpath='{.data.VLLM_API_KEY}' | base64 -d)
export MODEL=$(oc get configmap vllm-config -n rhaiis \
 -o jsonpath='{.data.MODEL_NAME}')

Verify all three are populated before proceeding:

echo "RHAIIS_HOST:    ${RHAIIS_HOST}"
echo "RHAIIS_API_KEY: ${RHAIIS_API_KEY}"
echo "MODEL:          ${MODEL}"
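The Deployment defines no readiness probe, so the Route may start forwarding traffic before vLLM has finished loading the model, and requests fail until it has. A small polling loop avoids guessing how long to wait. This is a sketch: wait_until and rhaiis_healthy are hypothetical helper names, and the sketch assumes RHAIIS keeps upstream vLLM's behavior of serving GET /health without requiring the API key.

```bash
# wait_until <attempts> <interval_seconds> <command...>
# Runs the command repeatedly until it succeeds; returns 1 if it never does.
wait_until() {
  local attempts=$1 interval=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$interval"
    i=$(( i + 1 ))
  done
  return 1
}

# Succeeds once the Route answers GET /health with a success status;
# -f makes curl fail on HTTP errors such as a 503 from the router.
rhaiis_healthy() {
  curl -sf -o /dev/null "https://${RHAIIS_HOST}/health"
}

# Usage: poll every 10 seconds for up to 10 minutes.
#   wait_until 60 10 rhaiis_healthy && echo "RHAIIS is ready"
```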

List available models:

curl -s "https://${RHAIIS_HOST}/v1/models" \
 -H "Authorization: Bearer ${RHAIIS_API_KEY}" | jq .

Send a chat completion request:

curl -sS \
 "https://${RHAIIS_HOST}/v1/chat/completions" \
 -H "Authorization: Bearer ${RHAIIS_API_KEY}" \
 -H "Content-Type: application/json" \
 -d '{
 "model": "'"${MODEL}"'",
 "messages": [{"role": "user", "content": "What is OpenShift?"}],
 "temperature": 0.1,
 "max_tokens": 200
 }' | jq -r '.choices[0].message.content'

A successful response confirms the server is running, the model is loaded, and the API key authentication is working.
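The Deployment starts the server with --enable-auto-tool-choice and a tool call parser, so the endpoint also accepts requests that include an OpenAI-style tools array. The sketch below defines a hypothetical get_current_time function. Nothing implements it; the model only decides whether to call it and with which arguments. The sketch then prints the tool_calls field of the response. It reuses RHAIIS_HOST, RHAIIS_API_KEY, and MODEL from above and assumes the model name contains no double quotes.

```bash
# Build a chat completion request that offers one hypothetical tool.
tool_call_payload() {
  local model=$1
  cat <<EOF
{
  "model": "${model}",
  "messages": [{"role": "user", "content": "What time is it in Berlin?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_current_time",
      "description": "Return the current time for an IANA timezone name",
      "parameters": {
        "type": "object",
        "properties": {"timezone": {"type": "string"}},
        "required": ["timezone"]
      }
    }
  }],
  "tool_choice": "auto"
}
EOF
}

# Send the request from stdin; a tool-capable model is expected to answer
# with tool_calls instead of plain text content.
post_tool_call() {
  tool_call_payload "$MODEL" | curl -sS "https://${RHAIIS_HOST}/v1/chat/completions" \
    -H "Authorization: Bearer ${RHAIIS_API_KEY}" \
    -H "Content-Type: application/json" \
    --data-binary @- | jq '.choices[0].message.tool_calls'
}
```

Running post_tool_call should print an array containing a get_current_time call. A null result means the model answered in plain text instead.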

Connecting to Open WebUI

The inference server exposes a standard OpenAI-compatible API, which means Open WebUI can connect to it directly as an external provider. The setup in My Local AI Stack already runs Open WebUI. Adding the RHAIIS endpoint as a direct external connection requires no changes to the existing stack.

In Open WebUI, go to Settings > Connections and add a new external connection. Set the URL to the route hostname with the /v1 suffix (https://<route_hostname>/v1). Add the API key stored in the vllm-api-key-secret secret (the value of RHAIIS_API_KEY from the testing section) as a bearer token, set the provider type to OpenAI, and set the API type to Chat Completions. Leave the model ID field empty so Open WebUI queries the /v1/models endpoint and discovers available models automatically.

Open WebUI external connection configured against the Red Hat AI Inference Server endpoint

Once saved, the deployed model appears in the model selector alongside any other configured providers.
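To try a different model later, the deployment section recommends updating the vllm-config ConfigMap rather than the Deployment spec. Environment variables sourced from a ConfigMap are only read when the container starts, so the pod also has to be replaced. The helpers below are a hypothetical sketch of that procedure. They assume the single GPU node from this post, where the running pod holds all four GPUs. They also assume you pick a model and tool call parser that vLLM supports.

```bash
# Hypothetical helpers for swapping the served model via the vllm-config ConfigMap.

# Print a JSON merge patch that sets both ConfigMap keys.
model_patch_json() {
  printf '{"data":{"MODEL_NAME":"%s","TOOL_CALL_PARSER":"%s"}}' "$1" "$2"
}

# switch_model <hf_model_id> <tool_call_parser> [namespace]
switch_model() {
  local ns=${3:-rhaiis}
  oc patch configmap vllm-config -n "$ns" --type merge \
    -p "$(model_patch_json "$1" "$2")"
  # Scale to zero first so the old pod releases its GPUs. The default
  # RollingUpdate strategy would start the new pod before stopping the old
  # one, and the new pod cannot be scheduled while all GPUs are in use.
  oc scale deployment/rhaiis-vllm -n "$ns" --replicas=0
  oc scale deployment/rhaiis-vllm -n "$ns" --replicas=1
  oc rollout status deployment/rhaiis-vllm -n "$ns" --timeout=30m
}

# Usage, then follow the model download in the logs:
#   switch_model <model_id> <tool_call_parser>
#   oc logs -f deployment/rhaiis-vllm -n rhaiis
```

Because the Deployment sets --served-model-name to the model ID, re-export MODEL afterwards. Open WebUI then lists the new name the next time it queries /v1/models.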

Conclusion

The Red Hat AI Inference Server brings the vLLM engine to OpenShift, or any other supported platform, as a hardened container image with a deployment pattern that fits standard Kubernetes workflows. The outcome is an OpenAI-compatible endpoint running on your own cluster, backed by a model from the Hugging Face Hub, secured with an API key, and accessible over a TLS-terminated OpenShift Route. Any client that speaks the OpenAI Chat Completions format can talk to it, including Open WebUI, which connects to it the same way it connects to any other provider.

References

GitHub repository with deployment files - link
Deploying OpenShift on AWS with Automated Cluster Provisioning - link
My Local AI Stack: Open WebUI, LiteLLM, SearXNG, and Docling - link
Extending the Local AI Stack with On-Demand GPU Inference on RunPod - link
Model as a Service GitHub repository - link
Node Feature Discovery Operator - link
NVIDIA GPU Operator - link
OpenShift CLI (oc) - link
Granite family of models on Hugging Face - link
smichard/agent_on_ocp - GitHub repository - link
Red Hat AI Inference Server - Documentation - link
Deploying Red Hat AI Inference Server on OpenShift - link
vLLM - upstream project - link
vLLM - OpenAI-compatible server documentation - link
Open WebUI - project site - link

Installing OpenShift AI on OpenShift

Thu, 14 May 2026 00:00:00 +0000

From GitOps repo to OpenShift AI deployment with verified GPU access in minutes - AI generated

Introduction

In this post, I want to describe how to install Red Hat OpenShift AI on an existing OpenShift cluster and configure it to run GPU-accelerated workloads. The approach uses the rhoai-gitops repository, created and maintained by my team mate Álvaro López Medina, which automates the installation of OpenShift AI, the required operators, and the NVIDIA GPU stack through a single script backed by a GitOps approach.

If you do not have an OpenShift cluster available yet and want to provision one on AWS, the previous post, Deploying OpenShift on AWS with Automated Cluster Provisioning, covers exactly that. The steps below pick up where that post leaves off, though they apply equally to any running OpenShift cluster.

Prerequisites

Before proceeding, ensure the following are in place:

A running OpenShift cluster with sufficient compute capacity
The OpenShift CLI (oc) installed and available on your workstation
Cluster-admin access
If GPU support is needed: sufficient AWS quota for GPU instance types

Selecting the correct GPU instance type

The GPU instance type is a decision worth getting right before you provision anything. The instance family determines not just raw performance but also GPU memory capacity, which directly constrains which models you can load and at what precision. Undersizing leads to out-of-memory failures; oversizing means paying for capacity you do not use.

Consult the AWS recommended GPU instances for deep learning to identify instance families suited to your workload, then cross-reference with the EC2 instance type availability by region to confirm that your target region actually offers the instance type you need. GPU instance availability varies significantly across regions and is a common source of unexpected quota errors at deployment time.

The following AWS instance types are commonly used in OpenShift AI GPU deployments:

Instance Type	GPUs	GPU Memory (total)	vCPUs	System Memory
g5.4xlarge	1x NVIDIA A10G	24 GiB	16	64 GiB
g5.12xlarge	4x NVIDIA A10G	96 GiB	48	192 GiB
g5.24xlarge	4x NVIDIA A10G	96 GiB	96	384 GiB
g5.48xlarge	8x NVIDIA A10G	192 GiB	192	768 GiB
p4d.24xlarge	8x NVIDIA A100	320 GiB	96	1,152 GiB
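A quick way to map a model onto this table is to estimate how much memory its weights need: parameters times bytes per parameter. The sketch below is a rule of thumb rather than a sizing tool. weights_gb and fits_instance are hypothetical helpers, and the 20 percent reserve for KV cache and runtime overhead is an assumption. The GiB figures from the table are treated as GB, which errs slightly on the safe side.

```bash
# weights_gb <params_billion> <bits_per_param>
# Approximate size of the model weights alone in GB (1 GB = 1e9 bytes):
# 16 bits for BF16/FP16, 8 for FP8/INT8, 4 for 4-bit quantization.
weights_gb() {
  echo $(( $1 * $2 / 8 ))
}

# fits_instance <weights_gb> <total_gpu_memory_from_table>
# Succeeds if the weights fit while keeping ~20% of GPU memory in reserve.
fits_instance() {
  local usable=$(( $2 * 80 / 100 ))
  [ "$1" -le "$usable" ]
}

# Example: a 70B model at 16 bits against g5.12xlarge (96) and g5.48xlarge (192).
for total in 96 192; do
  if fits_instance "$(weights_gb 70 16)" "$total"; then
    echo "70B @ 16-bit fits in ${total} GiB of GPU memory"
  else
    echo "70B @ 16-bit does not fit in ${total} GiB of GPU memory"
  fi
done
```

On multi-GPU instances, the total in the table is only usable by a single model when the inference server splits the model across GPUs, for example with tensor parallelism.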

Installing OpenShift AI

Clone the rhoai-gitops repository:

git clone https://github.com/alvarolop/rhoai-gitops
cd rhoai-gitops

Open the installation script and review the GPU-related configuration:

vi auto-install.sh

The three parameters that matter most for GPU-enabled deployments:

CREATE_GPU_MACHINESETS (Line 9): When set to true, the script automatically creates MachineSets for GPU nodes. Set to false if you do not need GPU support initially.
GPU_NODE_COUNT (Line 10): Total number of GPU nodes to provision. The nodes are distributed across Availability Zones a, b, and c for resilience.
AWS_GPU_INSTANCE (Line 18): Defaults to g5.4xlarge, which provides an NVIDIA A10G GPU per node. Adjust based on the workload requirements and available quota.
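If you prefer not to edit the script interactively, you can also set these parameters with sed. The sketch below assumes each parameter appears in auto-install.sh as a plain KEY=value assignment at the start of a line, so confirm that in your copy first. set_param is a hypothetical helper and relies on GNU sed's in-place option. Values written this way are unquoted, which is fine for simple values like the ones below.

```bash
# set_param <file> <key> <value>
# Hypothetical helper: replaces a line "KEY=..." with "KEY=value" in place.
# Assumes the parameter is a plain assignment at the start of a line;
# uses GNU sed's -i (BSD/macOS sed needs -i '').
set_param() {
  sed -i "s|^$2=.*|$2=$3|" "$1"
}

# Usage, followed by a check of the result:
#   set_param auto-install.sh CREATE_GPU_MACHINESETS true
#   set_param auto-install.sh GPU_NODE_COUNT 1
#   set_param auto-install.sh AWS_GPU_INSTANCE g5.12xlarge
#   grep -E '^(CREATE_GPU_MACHINESETS|GPU_NODE_COUNT|AWS_GPU_INSTANCE)=' auto-install.sh
```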

Throughout the following steps, any value written in <angle brackets> is a placeholder and must be replaced with your actual value before running the command. Log in to the cluster with a user that has cluster-admin privileges:

oc login -u <user_name> <cluster_api_url>

Run the installation script:

./auto-install.sh

The script installs the required operators — including the OpenShift AI Operator, the Node Feature Discovery Operator, and the NVIDIA GPU Operator — and provisions GPU MachineSets if configured to do so. Depending on node provisioning times, the complete process takes 15 to 30 minutes.

Confirm that the GPU worker nodes have joined the cluster:

oc get machineset -n openshift-machine-api
oc get machine -n openshift-machine-api
oc get nodes

Verify that the NVIDIA driver is loaded and that the GPU is accessible:

oc exec -it -n nvidia-gpu-operator \
 $(oc get pod -l openshift.driver-toolkit=true \
 -o jsonpath="{.items[0].metadata.name}" \
 -n nvidia-gpu-operator) \
 -- nvidia-smi

nvidia-smi output confirming GPU access from within the NVIDIA GPU Operator pod

Check the Argo CD applications deployed as part of the GitOps installation:

Argo CD application overview after the rhoai-gitops installation completes

All applications should be in a healthy and synced state before proceeding to configuration.

Configuring OpenShift AI for GPU Workloads

With OpenShift AI installed, a small amount of configuration is needed to allow workbenches to schedule onto the GPU nodes. GPU nodes in OpenShift are typically tainted with nvidia.com/gpu:NoSchedule to prevent standard workloads from landing on them accidentally. Workbenches that need GPU access must be configured with a matching toleration.

Check the taints applied to the GPU nodes:

oc get nodes
oc describe node <gpu_node_name>

The relevant taint will appear as nvidia.com/gpu=:NoSchedule in the node description.

In the OpenShift AI console, navigate to Settings > Hardware Profiles and create a new profile (for example, nvidia-gpu).
Add a Toleration with the following values:

Field	Value
Key	`nvidia.com/gpu`
Effect	`NoSchedule`
Operator	`Exists`

Configuring a toleration for the NVIDIA GPU taint in the Hardware Profile

This toleration allows workbenches assigned to this profile to be scheduled onto GPU nodes while keeping those nodes unavailable to other workloads.

Create a new workbench and select the nvidia-gpu hardware profile. The workbench pod will be scheduled on a GPU node.
Once the workbench is running, open a terminal and confirm GPU access:

nvidia-smi

nvidia-smi output from inside an OpenShift AI workbench, confirming direct access to the NVIDIA A10G GPU

For a complete reference on hardware profiles and toleration configuration, the Red Hat OpenShift AI documentation covers the options in detail.

Conclusion

The rhoai-gitops repository makes the Red Hat OpenShift AI installation genuinely straightforward: one script handles the operator stack, the GPU node provisioning, and the GitOps wiring. The manual steps that remain — creating the hardware profile and configuring the workbench — are minimal and need to be done only once per cluster.

The end result is an OpenShift AI environment with full GPU access, ready for running Jupyter notebooks, training jobs, or serving models. If you provisioned the underlying cluster using the approach described in Deploying OpenShift on AWS with Automated Cluster Provisioning, the two repositories together cover the entire path from a blank AWS account to a working AI platform in roughly two hours.

References

rhoai-gitops - GitHub repository by Álvaro López Medina - link
ocp-on-aws - GitHub repository by Álvaro López Medina - link
Red Hat OpenShift AI - Managing Hardware Profiles - link
OpenShift AI - Product documentation - link
OpenShift CLI (oc) - Getting started - link
NVIDIA GPU Operator documentation - link
AWS EC2 instance type availability by region - link
AWS recommended GPU instances for deep learning - link
Amazon EC2 G5 instances - link
Amazon EC2 P4 instances - link

Deploying OpenShift on AWS with Automated Cluster Provisioning

Sat, 09 May 2026 00:00:00 +0000

The full provisioning pipeline: CLI setup, ocp-on-aws config, and a single script that spins up VPCs, EC2 instances, DNS records, and an Argo CD baseline - AI generated

Introduction

In this post, I want to describe how to deploy Red Hat OpenShift in a blank Amazon Web Services (AWS) environment using a fully automated and repeatable approach. This post is the first of a two-part series: it covers provisioning the cluster, while the follow-up post, Installing OpenShift AI on OpenShift, covers installing OpenShift AI on top of the running cluster. If you already have an OpenShift cluster available, feel free to jump straight to that post. The two workflows build on two GitHub repositories, one for infrastructure provisioning and one for installing the AI platform components. Together they reduce what could easily be a multi-hour manual effort to a handful of shell commands.

I should be upfront: one purpose of this post is also to serve as a personal reference for future me, who will inevitably return here after six months asking “wait, what was the exact command again?” Consider this the written documentation I should have filed away the first time.

A special thanks goes to my team mate Álvaro López Medina, who created and maintains the ocp-on-aws and rhoai-gitops repositories. Without his work and support, setting up this environment would have been significantly more involved.

Prerequisites

Before starting, a Linux workstation or jump host is recommended for running the commands. The following command line tools must be installed and configured:

OpenShift CLI (oc) – required to interact with the OpenShift cluster
AWS CLI – required to provision and manage AWS infrastructure
htpasswd – required to generate user credentials for the cluster

These are fundamental prerequisites. The installation scripts will fail or behave unexpectedly without them.
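A small check like the one below confirms that the tools are on your PATH before you run anything. check_tools is a hypothetical helper. It only checks that the commands exist, not that the AWS CLI is configured with credentials or that oc matches the cluster version.

```bash
# check_tools <tool...>
# Prints each missing command to stderr and returns non-zero if any are missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_tools oc aws htpasswd && echo "all prerequisites found"
```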

Ordering an AWS Blank Environment

For Red Hat employees and Red Hat partners, the easiest starting point is an AWS Blank Open Environment from the Red Hat Demo Platform (RHDP). Otherwise, an existing AWS account accessed through the AWS Web Console works just as well.

This tutorial was validated against eu-west-1. The blank environment provides a clean, ephemeral AWS account with the necessary IAM permissions and service quotas to support an Installer-Provisioned Infrastructure (IPI) deployment of OpenShift.

Once the environment is provisioned, the service overview page contains the AWS access credentials and the base DNS zone that will be needed in the configuration step below.

Deploying OpenShift on AWS

With the AWS environment in place, the ocp-on-aws repository handles the rest of the cluster provisioning. The repository wraps the OpenShift IPI installer in a shell script and manages user creation, cluster-admin group configuration, and the pull secret in a structured, repeatable way.

Preparing the repository

Throughout the following steps, any value written in <angle brackets> is a placeholder and must be replaced with your actual value before running the command.

Clone the repository:

git clone https://github.com/alvarolop/ocp-on-aws
cd ocp-on-aws

Copy the authentication file templates:

cp auth/users.htpasswd.example auth/users.htpasswd
cp auth/group-cluster-admins.yaml.example auth/group-cluster-admins.yaml

Generate a password hash for your user:

htpasswd -b -B auth/users.htpasswd <user_name> <password>

Adjust auth/group-cluster-admins.yaml to list the users that should receive cluster-admin privileges:

apiVersion: user.openshift.io/v1
kind: Group
metadata:
  name: cluster-admins
users:
  - redhat
  - <user_name>

Configuring the installation

Copy the configuration template:

cp aws-ocp4-config aws-ocp4-config-labs

Open the configuration file and adjust the following parameters:

vi aws-ocp4-config-labs

The key values to review:

OPENSHIFT_VERSION (Line 6): Set this to match your local oc client version for maximum compatibility.
RHPDS_TOP_LEVEL_ROUTE53_DOMAIN (Line 9): The base DNS zone for your cluster; find this in the RHDP service overview.
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (Lines 16–18): The programmatic access credentials from the RHDP environment, required to create the VPC and EC2 instances.
RHOCM_PULL_SECRET (Line 31): Retrieve this from the Hybrid Cloud Console.
WORKER_REPLICAS (Line 47): Set to the number of worker nodes required for your workload.
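To find the value for OPENSHIFT_VERSION, check the version of your local oc client. The helpers below are a hypothetical sketch. They parse the Client Version line that oc version --client prints and compare versions at the major.minor level. That comparison is an assumption about what "matching" needs to mean here, not a rule from the ocp-on-aws repository.

```bash
# client_version: read `oc version --client` output on stdin and print
# the value of its "Client Version:" line.
client_version() {
  sed -n 's/^Client Version: *//p'
}

# major_minor <version>: reduce a version such as 4.18.12 to 4.18.
major_minor() {
  echo "$1" | cut -d. -f1,2
}

# same_minor <version_a> <version_b>: succeeds if both share major.minor.
same_minor() {
  [ "$(major_minor "$1")" = "$(major_minor "$2")" ]
}

# Usage:
#   oc_version=$(oc version --client | client_version)
#   same_minor "$oc_version" <openshift_version> && echo "versions match"
```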

Running the installation

Start the cluster installation:

./aws-ocp4-install.sh aws-ocp4-config-labs

The script invokes the OpenShift IPI installer and creates all required AWS infrastructure: VPC, subnets, EC2 instances, Elastic Load Balancers, and Route53 DNS records. The process typically takes 30 to 45 minutes. It is worth monitoring the AWS console in the corresponding region during this time to observe the resources coming up.

EC2 instances and load balancers provisioned in AWS after the installation completes

Once the installer finishes, the cluster API and console URLs, along with the kubeconfig file, will be available in the output and in the auth/ directory of the repository.

Argo CD applications deployed as part of the cluster bootstrap

The installation script also bootstraps a set of Argo CD applications that manage cluster-level configurations through GitOps from the start. This gives the cluster a solid, declarative baseline before any additional workloads are installed.

Conclusion

The combination of the AWS blank environment and the ocp-on-aws repository makes it straightforward to spin up a fully functional OpenShift cluster in under an hour with minimal manual intervention. The IPI installer handles the infrastructure details, and the GitOps bootstrap ensures a consistent cluster configuration from the first login.

With the cluster in place, the next step is installing OpenShift AI and enabling GPU support, which is covered in the follow-up post: Installing OpenShift AI on OpenShift.

References

ocp-on-aws - GitHub repository by Álvaro López Medina - link
rhoai-gitops - GitHub repository by Álvaro López Medina - link
Red Hat Demo Platform - link
OpenShift CLI - Getting started - link
AWS CLI - Installation guide - link
htpasswd - link
Red Hat Hybrid Cloud Console - Pull Secret - link

Hermes Agent: A Personal AI That Gets More Useful Over Time

Sat, 02 May 2026 00:00:00 +0000

How Hermes Agent Works: From Closed-Loop Learning to Multi-Platform Deployment - AI generated

Introduction

I came across the Hermes Agent project in early March 2026 and deployed it a couple of days later. A couple of weeks in I am still using it daily, and the use cases keep expanding rather than converging. Most tools settle into a narrow routine or fall off altogether. What keeps this one going is that the agent gets more useful the longer you run it. The project is young and moving fast, with new releases every few days. The initial setup requires patience: getting the configuration to a point where it actually saves time takes effort, and the frequent updates occasionally introduce breaking changes. That said, it is genuinely fun to use, and you learn a fair amount along the way.

Hermes Agent is an open-source, self-hosted AI agent framework built by Nous Research, an independent AI research lab based in New York. Nous Research is best known for the Hermes model family, a series of open-weight models fine-tuned on Llama that are used widely in the open-source AI community. The agent framework shares the name but is a separate project. It is MIT-licensed, model-agnostic, and runs on your own infrastructure, either as a self-hosted Python service or as a containerized deployment.

How It Works

The part that makes Hermes Agent different from most agent frameworks is the skill system. The agent ships with a set of preconfigured skills covering common tasks. Beyond that, you can ask it to create a skill from something it just did: it writes a structured Markdown document capturing the approach, what worked, and possible edge cases. The next time a similar task appears, the agent loads the relevant skill rather than starting from scratch. Skills can be triggered directly by asking Hermes to run one, or set on a schedule and executed automatically at defined intervals. Over time this turns completed work into a growing library of reusable operating knowledge. Version 0.12.0 added an Autonomous Curator to keep that library from growing unwieldy. It runs on a seven-day cycle by default, grades skills by usage, consolidates overlapping ones, and removes those that have stopped being useful. A short report is written after each run, so you can see what changed and why.

Alongside the skill system, the agent maintains three layers of memory: a persistent store for completed tasks and notes, a full-text search index across prior sessions, and a user model that accumulates preferences over time, such as coding style, communication tone, timezone, and preferred tools. The aim is an agent that improves across everything you do with it, not just at individual tasks in isolation.

My Setup

Hermes Agent runs in my homelab as a service on a dedicated Linux host. Keeping it on a separate machine gives me direct control over what the agent has access to. Incoming traffic is routed through Traefik. I access it through three entry points depending on where I am and what I am doing. The primary interface is the Matrix chat protocol, which means I can reach the agent from any Matrix client on any device. I also connected it to a dedicated email inbox, so it can handle certain tasks asynchronously. For longer sessions at my desk I use Open WebUI, which gives a more comfortable interface for extended conversations.

The model configuration is flexible: the agent can work with a range of AI services and model providers, for example OpenRouter.

What I Gave It Access To

I gave the agent access to three local knowledge sources: my bookmarks, a structured knowledge base, and a local mirror of Red Hat’s product documentation.

The first is my bookmarks folder. I have been saving links as Markdown files in Obsidian for several years. The agent can search and cross-reference that collection when doing research, which means it draws on context I actually care about rather than training data alone.

The second is a knowledge base built on the LLM Wiki principle described by Andrej Karpathy. The idea is to maintain a curated set of structured Markdown files that an AI agent helps write and update over time. Topics, entities, comparisons, each in its own file. The agent both contributes to this knowledge base and draws from it when working on research tasks.

The third is a local mirror of Red Hat’s product documentation. A team mate built a tool called rh-mastery that pulls documentation from docs.redhat.com, converts it to Markdown, and stores it in a structured local directory. Pointed at that directory, Hermes can query accurate, version-tracked product documentation without touching the internet. For someone who spends a lot of time with Red Hat products, this closes a gap that is easy to overlook until you actually need it. More on rh-mastery in an upcoming post.

Practical Uses

The combination of bookmarks, structured knowledge, Red Hat’s product documentation, and the skill system makes the agent genuinely useful for research. When I ask it to investigate a topic, it starts with what I have already collected: prior notes, bookmarks, and documentation. If that is not enough, and when asked, it reaches out to the web to fill the gaps. The result is grounded in material I collected and curated myself, which in most cases makes the output very useful.

One use I did not expect to find as useful: slide generation. I integrated Marp, a Markdown-based presentation framework, into the workflow. When I need to put together a presentation and am staring at a blank file, I can ask the agent to draft an initial structure. Getting past that first empty screen is often the hardest part. Whether I keep most of what it produces is a different question, but having something to react to is worth more than nothing to start from.

Skills and Subagents

The agent can develop and add skills on its own as it works, but skills can also be added manually or loaded from the community hub at agentskills.io. More interesting to me is the subagent capability: the agent can delegate tasks to specialized subagents, each backed by a specific AI service or holding a particular context. This makes it possible to compose workflows where different parts of a task go to the most appropriate model.

Conclusion

Several weeks in is not a long track record, and the project is still moving fast enough that some things will break between releases. That said, the architecture is sound and the development pace is truly impressive. Whether I will keep running it long-term, I genuinely do not know. For now, it is pulling its weight. For anyone already running a homelab and looking for a self-hosted agent that gets more useful over time rather than staying flat, Hermes Agent is worth the setup time.

Peter Steinberger, creator of OpenClaw (another widely used AI agent framework), put it well in a recent TED talk: “The bottleneck is no longer typing. It’s thinking.” That observation fits. The agent handles the mechanical parts of research and structuring. The judgment about what matters and what to do with it still has to come from a person. For now, a human in the loop is still necessary.

References

Hermes Agent on GitHub - link
Hermes Agent Documentation - link
Nous Research - link
Matrix - link
OpenRouter - link
Andrej Karpathy LLM Wiki concept - link
Marp - link
agentskills.io - link
Peter Steinberger TED talk - link

Extending the Local AI Stack with On-Demand GPU Inference on RunPod

Sat, 07 Mar 2026 00:00:00 +0000

Conceptual illustration of the extended AI stack with elastic cloud GPU resources for running large language models on demand - AI generated

Introduction

In this post, I want to describe how I extended the local AI stack I built in my homelab with on-demand GPU-backed model inference, without adding any GPU hardware to the lab itself.

The two previous posts in this series provide the context for what follows. The homelab post covers the base infrastructure: thin clients, Docker Compose, Traefik, and internal DNS. The local AI stack post describes how Open WebUI, LiteLLM, SearXNG, and Docling sit on top of that infrastructure to form a self-hosted AI environment. That stack works well, and I have been using it for a while. Keeping the lab CPU-only is a deliberate choice. For orchestration, document workflows, and routing requests to publicly available AI services, dedicated GPU hardware at home is simply not necessary. When I want to try a particular model that is not available through a managed API, or experiment with something freshly released on Hugging Face, I rent the compute on demand rather than maintain it permanently.

The solution is straightforward: rent GPU capacity on demand from a specialized cloud provider, expose it as an OpenAI-compatible endpoint, and wire it into the existing stack. No new hardware, no permanent cost, no changes to the tools I already use.

A Note on Neo Clouds

The providers that specialize in this type of GPU-first infrastructure are sometimes called Neo Clouds. The term emerged around 2024 to distinguish GPU-specialist vendors such as RunPod, CoreWeave and others from traditional hyperscalers. In practice, I am not sure the new term adds much. For me these are specialized cloud providers focused on GPU compute and AI workloads. Useful services, somewhat unnecessary branding.

Why RunPod

I use RunPod for this setup for a few practical reasons. The interface is intuitive, the deployment path from template to running pod is short, and the GPU catalog is broad enough to cover most use cases. Pricing is per second with no ingress or egress fees, which makes on-demand experimentation economical. RunPod also exposes an API for its core operations, so deployments can be automated rather than driven entirely through the UI.

A detailed description of all RunPod services is out of scope for this post. The focus here is on one specific workflow: deploying a vLLM inference server with a model loaded from Hugging Face, and connecting the resulting endpoint to Open WebUI.

Deploying a vLLM Inference Server on RunPod

RunPod uses templates to save pod configurations for reuse. A template defines the container image, the start command, the storage allocation, and other runtime parameters. I maintain a small collection of private templates, each configured for a different model.

A selection of saved vLLM templates on RunPod, each using a different model from Hugging Face

The container image for all of these templates is vllm/vllm-openai:latest, which bundles vLLM with an OpenAI-compatible API server. The model itself is specified in the container start command, which means swapping models is a matter of editing a single line.

Creating a Template

When creating or editing a template, the key fields are:

Type: Pod
Compute type: Nvidia GPU
Container image: vllm/vllm-openai:latest
Container start command: the vLLM arguments, including the model reference

Template configuration for the vllm_gemma-3-12b template, showing the container image and start command

Throughout the following steps, any value written in <angle brackets> is a placeholder and must be replaced with your actual value before running the command.

A start command for deploying Red Hat’s validated RedHatAI/Qwen3-8B-FP8-dynamic model looks like this:

--host 0.0.0.0 --port 8000 \
 --model RedHatAI/Qwen3-8B-FP8-dynamic \
 --dtype bfloat16 \
 --enforce-eager \
 --gpu-memory-utilization 0.95 \
 --api-key <api_key> \
 --max-model-len 8128

The parameters worth noting:

--model: any model available on Hugging Face can be referenced here by its repository path
--dtype bfloat16: sets the compute dtype; bfloat16 is a good default for inference on NVIDIA hardware
--enforce-eager: disables CUDA graph capture, which reduces memory overhead at the cost of some throughput; useful when fitting larger models on a single GPU
--gpu-memory-utilization 0.95: allows vLLM to use up to 95% of the GPU’s memory in total (model weights, activations, and KV cache); whatever is left after loading the weights goes to the KV cache
--api-key: sets a bearer token for the OpenAI-compatible endpoint; always set this when deploying a public endpoint
--max-model-len: caps the maximum sequence length; reducing this frees memory and allows larger models to fit on smaller GPUs
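To see why lowering --max-model-len frees memory, it helps to estimate how much KV cache a single full-length sequence needs. The sketch below uses the standard per-token formula (a key and a value vector per KV head in every layer). The layer count, KV head count, and head dimension are assumptions taken from Qwen3-8B’s published configuration and should be checked against the model’s config.json; it also assumes the KV cache stays in bfloat16 (no --kv-cache-dtype override) and ignores vLLM’s block-allocation overhead, so treat the numbers as ballpark figures.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache one token occupies: a key and a value vector
    (factor 2) per KV head in every layer."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def kv_cache_gib(max_model_len: int, **model_shape) -> float:
    """KV cache for one sequence of max_model_len tokens, in GiB."""
    return max_model_len * kv_cache_bytes_per_token(**model_shape) / 2**30


# Assumed Qwen3-8B shape (verify in the model's config.json):
# 36 layers, 8 KV heads (grouped-query attention), head_dim 128,
# KV cache kept in bfloat16 (2 bytes per element).
QWEN3_8B = dict(num_layers=36, num_kv_heads=8, head_dim=128, bytes_per_elem=2)

for length in (8128, 32768):
    gib = kv_cache_gib(length, **QWEN3_8B)
    print(f"--max-model-len {length}: ~{gib:.2f} GiB of KV cache per sequence")
```

Under these assumptions, one sequence at 8128 tokens needs roughly 1.1 GiB of KV cache, while 32768 tokens would need 4.5 GiB. vLLM refuses to start if not even one maximum-length sequence fits into the memory left for the KV cache, which is why a smaller --max-model-len lets a model start on a smaller GPU.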

Selecting a GPU and Deploying

Once the template is configured, deploying it requires selecting a GPU and clicking deploy. RunPod shows available hardware with current pricing.

GPU selection on RunPod, ranging from RTX 2000 Ada class cards to H200 and B200 datacenter accelerators

For most inference workloads with 8 to 12 billion parameter models, an RTX 4090 or L4 is a practical and cost-effective choice. Larger models with higher memory requirements will need 48 GB or 80 GB class cards. The per-hour pricing shown in the interface makes it easy to estimate cost for a session before committing.
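A quick way to check whether a model fits a card is to compare the weight footprint (parameter count times bytes per parameter) with the share of memory vLLM is allowed to use. The sketch below is deliberately rough: it counts weights only, and whatever remains of the budget is what vLLM can spend on activations and the KV cache. The 24 GB card size and the parameter counts are illustrative assumptions, not measured values.

```python
# Approximate storage per parameter for the weight formats discussed here.
BYTES_PER_PARAM = {"bfloat16": 2, "fp8": 1}


def weight_gb(params_billion: float, dtype: str) -> float:
    """Approximate weight footprint in GB (10**9 bytes)."""
    return params_billion * BYTES_PER_PARAM[dtype]


def headroom_gb(card_gb: float, params_billion: float, dtype: str,
                gpu_memory_utilization: float = 0.95) -> float:
    """Memory left for activations and KV cache after loading the weights.
    A negative result means the weights alone exceed vLLM's budget."""
    budget = card_gb * gpu_memory_utilization
    return budget - weight_gb(params_billion, dtype)


# Illustrative 24 GB card (RTX 4090 or L4 class).
for name, params, dtype in [("8B FP8", 8, "fp8"),
                            ("12B bfloat16", 12, "bfloat16")]:
    left = headroom_gb(24, params, dtype)
    print(f"{name}: {left:+.1f} GB left for KV cache and activations")
```

With these assumptions, an 8B FP8 model leaves around 15 GB of headroom on a 24 GB card, while a 12B model in bfloat16 would need a quantized variant or one of the larger cards.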

After deployment, RunPod assigns a public HTTPS endpoint to the pod. The vLLM server is reachable at that endpoint on port 8000, with the path structure matching the OpenAI API.
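Because the paths match the OpenAI API, any HTTP client can talk to the pod. The standard-library sketch below only builds a POST /v1/chat/completions request with the bearer token set via --api-key; it does not send it. The endpoint URL is a placeholder and the helper name is hypothetical.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str,
                       prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request.

    base_url is the endpoint including /v1, e.g. https://<runpod_endpoint>/v1.
    """
    body = {
        "model": model,  # must match the --model value vLLM was started with
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # value passed to --api-key
        },
        method="POST",
    )


req = build_chat_request("https://example-pod.invalid/v1", "<api_key>",
                         "RedHatAI/Qwen3-8B-FP8-dynamic", "Hello!")
print(req.get_method(), req.full_url)
# To actually call the pod, pass req to urllib.request.urlopen() and parse
# the JSON response; the reply text is in choices[0]["message"]["content"].
```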

Connecting the Endpoint to Open WebUI

With the pod running and the model loaded, the endpoint can be added to Open WebUI as an external connection. In Open WebUI, navigate to Admin Panel then Settings and add a new connection with the following values:

Connection type: External
URL: https://<runpod_endpoint>/v1
Auth: API key set in the vLLM start command
Provider type: OpenAI
API type: Chat Completions

Adding the RunPod vLLM endpoint as an external OpenAI-compatible connection in Open WebUI

Once saved, the model served by vLLM on RunPod appears in the model selector alongside any other configured backends. From a user perspective, the interface is identical to any other configured model, whether local or a commercial API.

Alternatively, the endpoint can be added to LiteLLM as a named model alias. This is the better option if you want centralized credential management or want to expose the RunPod model alongside other backends under a consistent naming scheme across the stack.

Why This Setup Works Well

The combination of a self-hosted orchestration stack and on-demand GPU inference fits well with a homelab where tooling and workflows are in place but on-premises compute is intentionally kept lean.

A few things make this pattern practical:

Low cost for experimentation. Models run only when needed. A session of an hour or two to test a new model costs a few dollars at most.
Access to current models. Many recently published models on Hugging Face can be loaded into vLLM directly, so new releases can be tested without waiting for them to appear in a managed API.
No changes to the existing stack. Open WebUI, LiteLLM, SearXNG, and Docling continue to work exactly as before. The RunPod endpoint is just another backend.
Automatable. RunPod exposes an API for managing pods, so deployments can be triggered programmatically. Combined with LiteLLM’s routing, it becomes possible to bring a model endpoint up on demand and tear it down again when it is no longer needed.

Conclusion

Adding RunPod as an on-demand GPU backend closes the main gap in a CPU-only homelab AI stack. The setup requires no changes to the existing infrastructure and takes only a few minutes from template to running endpoint. The result is the ability to experiment with current, capable models at low cost, using the same interface and workflows already in place.

For on-demand model access that does not warrant the cost of persistent GPU hardware, this pattern is worth considering.

References

My Homelab: A Traefik-centered Self-hosting Setup - link
My Local AI Stack: Open WebUI, LiteLLM, SearXNG, and Docling - link
RunPod - project site - link
RunPod - documentation - link
vLLM - project site - link
Hugging Face - model hub - link
RedHatAI models on Hugging Face - link

My Local AI Stack: Open WebUI, LiteLLM, SearXNG, and Docling

Sat, 14 Feb 2026 00:00:00 +0000

Overview of the modular self-hosted AI stack - AI generated

Introduction

In my previous post about my homelab, I described the foundation I use for self-hosted services: a small set of low-power machines, Docker Compose for deployment, Traefik as the reverse proxy, and internal DNS to expose services with clean HTTPS hostnames. I have been running this setup for several years with very little maintenance overhead. That setup turned out to be a good base not only for classic self-hosting, but also for local AI workloads. Over the past two years or so, I have been extending it with tools to use and experiment with AI services.

Over time, I wanted more than a single chat UI connected to a single model provider. I wanted a setup that would let me experiment with different models, keep sensitive data inside my own network, enrich prompts with live web results, and work with local documents in a structured way. I also wanted to reuse the same operational patterns I already trusted in the rest of the homelab.

The result is a local AI stack built from four components:

Open WebUI as the browser-based user interface
LiteLLM as the OpenAI-compatible model gateway
SearXNG as the privacy-friendly web search backend
Docling as the document parsing layer for file-based workflows

Individually, each of these tools is useful. Combined, they form a practical self-hosted AI environment that fits neatly into the same Traefik-centered architecture as the rest of my homelab.

Base platform and prerequisites

The AI stack runs on the same infrastructure described in the previous post: refurbished thin clients running CentOS Stream 9, Docker and Docker Compose, Traefik as the reverse proxy, and internal DNS for clean HTTPS hostnames. The key design principle carries over as well: every externally reachable service joins the external Docker network and is exposed through Traefik using labels, giving a consistent way to publish services under HTTPS without managing ports or certificates per application.

My current setup is CPU-only. That matters. It is perfectly usable for orchestration, document processing, and web-augmented prompting, but it is not the right environment for large, latency-sensitive inference workloads. In practice, that constraint pushed me toward an architecture where the user interface, routing, tools, and document workflows run locally, while the model backend remains flexible enough to use either local or remote providers.

Architecture overview

At a high level, the request flow looks like this:

A user opens Open WebUI in the browser.
Open WebUI sends model requests to LiteLLM through its OpenAI-compatible API.
LiteLLM routes the request to the selected backend model.
If a prompt requires live information, Open WebUI can use SearXNG as a search tool.
If a prompt requires document context, uploaded files are parsed with Docling and converted into Markdown.
The model response is returned to Open WebUI and displayed to the user.

This separation of concerns is what makes the stack useful:

Open WebUI handles the human interaction layer
LiteLLM abstracts model backends and credentials
SearXNG provides fresh web context
Docling turns messy source documents into structured text

Traefik remains the single public entry point. From an operations perspective, that is valuable because the AI stack behaves like any other part of the homelab.

Open WebUI as the central interface

Open WebUI is the part of the stack I interact with every day. It provides the browser-based interface for conversations, model selection, file uploads, and tool-assisted prompting. The important point is that Open WebUI does not need to know anything about individual model providers. It only needs a single OpenAI-compatible endpoint, which in this setup is LiteLLM.

That keeps the client configuration simple. If I want to add a new provider, swap one model for another, or change credentials, I do it behind the scenes in LiteLLM without having to reconfigure the user interface. Open WebUI also supports user and group management, making it straightforward to grant access to specific models or restrict certain users to a defined set of backends. A particularly useful feature is the ability to send a single prompt to multiple AI services simultaneously, which makes side-by-side model comparison a natural part of the workflow.

A simplified Docker Compose service definition for Open WebUI in this setup looks like this:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    environment:
      - OPENAI_API_BASE_URL=http://litellm:4000/v1
      - OPENAI_API_KEY=${LITELLM_MASTER_KEY}
    volumes:
      - ./data/open-webui:/app/backend/data
    networks:
      - external
      - internal
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=external"
      - "traefik.http.routers.openwebui.rule=Host(`ai.home.example.com`)"
      - "traefik.http.routers.openwebui.entrypoints=https"
      - "traefik.http.routers.openwebui.tls.certresolver=cloudflare"
      - "traefik.http.services.openwebui.loadbalancer.server.port=8080"

The exact image tag and environment variables may differ depending on the release and your setup, but the pattern stays the same: persistent storage for state, Traefik labels for routing, and a backend API endpoint that points to LiteLLM.

LiteLLM as the model gateway

LiteLLM is the glue that makes the rest of the system flexible. It exposes a single OpenAI-style API while allowing multiple backends underneath. That means I can define logical model names and map them to either local inference backends or remote providers.

This is useful for several reasons:

Open WebUI only has to talk to a single API endpoint
I can standardize naming across models
Provider credentials stay centralized
Swapping backends becomes operationally cheap
Logging and usage controls are easier to centralize
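The idea of logical model names can be illustrated with a small lookup table. This is not LiteLLM’s implementation or its configuration format; it is a hypothetical sketch of the concept: clients request an alias, and the gateway resolves it to a backend URL, the model id that backend expects, and a credential it holds centrally. All aliases, URLs, model names, and environment variable names below are made up for illustration.

```python
import os
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Backend:
    api_base: str     # where the gateway forwards requests
    model_id: str     # model name the backend itself expects
    key_env_var: str  # credential is held by the gateway, never by clients


# Hypothetical alias table: clients such as Open WebUI only see the keys.
ROUTES = {
    "chat-default": Backend("https://api.example-provider.invalid/v1",
                            "provider-model-name", "PROVIDER_API_KEY"),
    "qwen3-8b": Backend("https://<runpod_endpoint>/v1",
                        "RedHatAI/Qwen3-8B-FP8-dynamic", "RUNPOD_VLLM_KEY"),
}


def resolve(alias: str) -> Tuple[str, str, Optional[str]]:
    """Map a client-facing alias to (api_base, model_id, api_key)."""
    try:
        backend = ROUTES[alias]
    except KeyError:
        raise ValueError(f"unknown model alias: {alias!r}") from None
    return backend.api_base, backend.model_id, os.environ.get(backend.key_env_var)
```

Swapping a backend then means editing one table entry while clients keep requesting the same alias, which is the property that makes backend changes operationally cheap.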

The Compose service definition for LiteLLM follows the same pattern:

services:
  litellm:
    image: litellm/litellm:main-v1.83.14-stable.patch.3
    container_name: litellm
    restart: unless-stopped
    command: ["--config", "/app/config.yaml", "--port", "4000"]
    environment:
      - LITELLM_MASTER_KEY=${LITELLM_MASTER_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    volumes:
      - ./litellm/config.yaml:/app/config.yaml:ro
    networks:
      - internal
      - external
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=external"
      - "traefik.http.routers.litellm.rule=Host(`litellm.home.example.com`)"
      - "traefik.http.routers.litellm.entrypoints=https"
      - "traefik.http.routers.litellm.tls.certresolver=cloudflare"
      - "traefik.http.services.litellm.loadbalancer.server.port=4000"

Warning

Security note:
In March 2026, LiteLLM was subject to a suspected supply chain attack in which versions v1.82.7 and v1.82.8 on PyPI contained a malicious payload designed to harvest credentials and exfiltrate them to an external domain. Users running the official LiteLLM Docker image were not affected, as that deployment path pins dependencies and does not rely on the compromised PyPI packages. If you installed LiteLLM via pip during the affected window, treat any secrets on that system as compromised and rotate them immediately. See the official incident report for full details and verified safe versions.

SearXNG for live, privacy-friendly search

One of the biggest limitations of a plain chat interface is the lack of current information. SearXNG solves that problem cleanly. It is a self-hosted metasearch engine that aggregates results from multiple sources and gives me a search API under my own control.

Even outside the AI stack, SearXNG is useful as a search engine. Inside the stack, it becomes more interesting because it can be exposed as a tool for prompts that need fresh information.

A minimal Compose service might look like this:

services:
 searxng:
 image: docker.io/searxng/searxng:latest
 container_name: searxng
 restart: unless-stopped
 volumes:
 - ./searxng:/etc/searxng
 networks:
 - external
 labels:
 - "traefik.enable=true"
 - "traefik.docker.network=external"
 - "traefik.http.routers.searxng.rule=Host(`search.home.example.com`)"
 - "traefik.http.routers.searxng.entrypoints=https"
 - "traefik.http.routers.searxng.tls.certresolver=cloudflare"
 - "traefik.http.services.searxng.loadbalancer.server.port=8080"

Once connected to Open WebUI as a tool, the flow is straightforward:

The user asks a question that requires current information.
The model decides to call the search tool.
SearXNG performs the search.
Titles, snippets, and URLs are returned as context.
The model synthesizes an answer grounded in current results.
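The last two steps of this flow can be sketched against SearXNG’s JSON output. This assumes the json format has been enabled under search.formats in SearXNG’s settings.yml (the default settings only enable html), so that /search?q=...&format=json returns a results list whose entries carry title, url, and content (the snippet). Both helper functions are hypothetical; Open WebUI’s own web search integration handles this internally.

```python
from urllib.parse import urlencode


def searxng_search_url(base_url: str, query: str) -> str:
    """URL for a JSON search request against a SearXNG instance."""
    return f"{base_url.rstrip('/')}/search?" + urlencode({"q": query, "format": "json"})


def results_to_context(payload: dict, limit: int = 3) -> str:
    """Turn SearXNG JSON results into numbered context for the model:
    title, snippet, and URL per hit, so the answer can cite its sources."""
    lines = []
    for i, hit in enumerate(payload.get("results", [])[:limit], start=1):
        lines.append(f"[{i}] {hit.get('title', '')}\n"
                     f"{hit.get('content', '')}\nSource: {hit.get('url', '')}")
    return "\n\n".join(lines)


# Made-up sample payload in the shape of a SearXNG JSON response.
sample = {"results": [{"title": "vLLM", "url": "https://docs.vllm.ai",
                       "content": "Fast LLM serving."}]}
print(searxng_search_url("https://search.home.example.com", "vllm release"))
print(results_to_context(sample))
```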

Docling for document parsing

The fourth component, Docling, addresses a different problem. Large language models work best with clean text, but many real documents are messy. PDFs, slide decks, and office files often contain broken text flows, layout artifacts, or table structures that are not useful when passed to a model as-is.

Docling converts these documents into a Markdown representation that is much easier to use as model context. That sounds small, but it is a major quality improvement for local document workflows.

The Docling service definition is straightforward:

services:
 docling:
 image: quay.io/docling-project/docling-serve:latest
 container_name: docling
 restart: unless-stopped
 networks:
 - internal
 - external
 labels:
 - "traefik.enable=true"
 - "traefik.docker.network=external"
 - "traefik.http.routers.docling.rule=Host(`docling.home.example.com`)"
 - "traefik.http.routers.docling.entrypoints=https"
 - "traefik.http.routers.docling.tls.certresolver=cloudflare"
 - "traefik.http.services.docling.loadbalancer.server.port=5001"

The typical usage pattern is:

Upload a document in Open WebUI.
Docling parses the file and converts it to Markdown.
Feed that Markdown into the model as structured prompt context.
Ask targeted questions against the extracted content.
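Feeding the extracted Markdown to the model mostly comes down to placing it in the prompt with clear boundaries. A minimal sketch, assuming the Markdown has already been produced by Docling; the function name, the system prompt wording, and the character limit are illustrative choices, not Open WebUI’s actual behavior.

```python
from typing import Dict, List


def document_messages(markdown: str, question: str,
                      max_chars: int = 20_000) -> List[Dict[str, str]]:
    """Build OpenAI-style chat messages that ground a question in a
    Docling-converted document. Long documents are truncated crudely here;
    a real setup would chunk and retrieve the relevant parts instead."""
    excerpt = markdown[:max_chars]
    return [
        {"role": "system",
         "content": "Answer using only the document below. "
                    "Say so if the answer is not in it."},
        {"role": "user",
         "content": f"<document>\n{excerpt}\n</document>\n\nQuestion: {question}"},
    ]


# Made-up Markdown of the kind Docling produces for a table in a PDF.
md = "# Release notes\n\n| Version | Date |\n|---|---|\n| 1.2 | 2026-01 |"
msgs = document_messages(md, "When was version 1.2 released?")
```

Keeping tables as Markdown, rather than as the broken text flow a raw PDF extraction often yields, is what lets the model answer questions like the one above reliably.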

This is especially useful for technical notes, whitepapers, internal PDFs, or vendor documentation where the raw file format is not suitable for direct prompting.

Conclusion

This stack did not start as an attempt to build a local alternative to a commercial AI product. It emerged naturally from an existing homelab that already had strong building blocks: containerized services, Traefik, DNS-based routing, and a bias toward self-hosting.

Adding Open WebUI, LiteLLM, SearXNG, and Docling turned that base into a practical local AI environment. It gives me a single interface for model interaction, the ability to swap backends without changing clients, a way to enrich prompts with live web data, and a better workflow for document-driven tasks.

Just as important, it stays operationally consistent with the rest of the homelab. That keeps the setup understandable, maintainable, and worth using day to day.

Future extensions are obvious: adding a vector database, introducing GPU-backed local inference, routing requests to model endpoints running on specialized inference platforms, or using Open WebUI as a gateway to interact with AI agents. But even without those additions, this combination already covers a large share of the AI workflows I actually care about.

References

My Homelab: A Traefik-centered Self-hosting Setup - link
Open WebUI - project site - link
Open WebUI - GitHub - link
LiteLLM - project site - link
LiteLLM - GitHub - link
LiteLLM - Security incident report, March 2026 - link
SearXNG - documentation - link
SearXNG - GitHub - link
Docling - documentation - link
Docling - GitHub - link

My Homelab: A Traefik-centered Self-hosting Setup

Sat, 24 Jan 2026 00:00:00 +0000

Summary of Homelab services - AI generated

Introduction

Several years ago, I began building a small homelab with two primary objectives in mind: gaining hands-on experience with containers and modern application deployment, and running selected services locally to avoid storing certain data in public cloud environments. In hindsight, this environment evolved into a solid foundation for a local AI stack as well, which I now operate alongside the rest of my setup and will detail in a future post. Although the focus here is on a homelab, the technical stack described can be deployed just as easily in any cloud environment, e.g. on a VPS or with any hyperscaler. All that is required is a virtual machine running a Linux distribution of your choice and a container engine.

What began as an experiment has turned into a stable setup that I use every day. At the center of this setup is Traefik, which handles all incoming HTTP and HTTPS traffic and lets me access every service over SSL with clean domains like service-name.home.example.com instead of a collection of raw IP addresses and ports.

In this post I will walk through how I structure this homelab, explain how Traefik ties everything together, and outline a selection of the services currently running in my lab.

Hardware and base platform

The homelab does not run on high-end servers. Most of the hosts are refurbished x86 thin clients with the following specifications:

16 to 32 GB of RAM per node
A modest amount of storage for container images, configuration files, and selected data
Low power consumption, which is important for a system that runs 24/7

The environment uses CentOS Stream 9 as the operating system. On top of that, I run Docker and Docker Compose. Nearly every component in the homelab is containerized, with Traefik positioned in front of these containers as a reverse proxy and routing layer.

Architecture overview

At a high level, the architecture looks like this:

Several containers run on the hosts
A dedicated container network called external, where Traefik and all services that are exposed to the home network reside
An internal DNS setup and a private domain, such as home.example.com, where services are exposed as subdomains like:
- https://pihole.home.example.com
- https://ntfy.home.example.com

Clients on the home network resolve these hostnames to the internal IP address of the homelab host, ensuring that traffic remains entirely within the local network. The local DNS server is automatically assigned to clients connected to the internal network, making all services immediately accessible to any device on the same network.
Traefik acts as the single entry point for HTTP and HTTPS. It terminates TLS, routes requests to the appropriate container based on the hostname, and applies middlewares such as redirects and authentication where required.

Traefik as the center of the homelab

Traefik is an open-source reverse proxy and edge router that integrates well with containerized environments. It monitors the container socket, automatically discovers running containers, and uses labels defined on those containers to configure routing.

In my setup, Traefik provides three main benefits:

Automatic TLS for everything
Traefik uses the DNS challenge with my DNS provider to request certificates from Let’s Encrypt. I can issue a wildcard certificate for *.home.example.com, so every internal service gets proper HTTPS without having to manage individual certificates.
Clean hostnames instead of ports
Every service gets its own subdomain, such as pihole.home.example.com or ntfy.home.example.com. This means I do not have to remember that one service is on port 8080, another on 9090, and so on.
Centralized routing and security
Since everything goes through Traefik, I can:
- Redirect all HTTP traffic to HTTPS
- Protect specific endpoints with basic auth or other middleware
- Inspect and debug routes using the Traefik dashboard
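One detail of the wildcard approach is worth spelling out: a wildcard such as *.home.example.com covers exactly one extra label on the left, so it matches pihole.home.example.com but neither home.example.com itself nor a.b.home.example.com. That is why the Traefik configuration requests the bare domain as main and the wildcard as a SAN. The function below is a simplified illustration of that matching rule, not Traefik’s or a TLS library’s code.

```python
from typing import Iterable


def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Simplified certificate name check: a leading '*.' matches exactly
    one left-most DNS label; anything else must match literally."""
    pattern, hostname = pattern.lower(), hostname.lower().rstrip(".")
    if not pattern.startswith("*."):
        return pattern == hostname
    first_label, _, rest = hostname.partition(".")
    return bool(first_label) and rest == pattern[2:]


def covered_by(hostname: str, cert_names: Iterable[str]) -> bool:
    """True if any name on the certificate covers the hostname."""
    return any(wildcard_matches(name, hostname) for name in cert_names)


# Names requested in the Traefik labels (main domain plus wildcard SAN).
CERT_NAMES = ["home.example.com", "*.home.example.com"]
```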

Traefik Docker Compose configuration

Here is a simplified version of the Traefik docker-compose.yml I use:

version: "3"

services:
  traefik:
    image: traefik:latest
    container_name: traefik
    restart: unless-stopped
    security_opt:
      - no-new-privileges:true
    networks:
      - external
    ports:
      - 80:80
      - 443:443
    environment:
      - CF_API_EMAIL=${CF_API_EMAIL}
      - CF_DNS_API_TOKEN=${CF_DNS_API_TOKEN}
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./data/traefik.yml:/traefik.yml:ro
      - ./data/acme.json:/acme.json
      - ./data/config.yml:/config.yml:ro
    labels:
      - "traefik.enable=true"

      # HTTP router for Traefik dashboard
      - "traefik.http.routers.traefik.entrypoints=http"
      - "traefik.http.routers.traefik.rule=Host(`traefik.home.example.com`)"

      # Redirect HTTP to HTTPS
      - "traefik.http.middlewares.traefik-https-redirect.redirectscheme.scheme=https"
      - "traefik.http.middlewares.sslheader.headers.customrequestheaders.X-Forwarded-Proto=https"
      - "traefik.http.routers.traefik.middlewares=traefik-https-redirect"

      # Basic auth for the secure dashboard
      - "traefik.http.middlewares.traefik-auth.basicauth.users=user:hashed-password"

      # HTTPS router for Traefik dashboard
      - "traefik.http.routers.traefik-secure.entrypoints=https"
      - "traefik.http.routers.traefik-secure.rule=Host(`traefik.home.example.com`)"
      - "traefik.http.routers.traefik-secure.middlewares=traefik-auth"
      - "traefik.http.routers.traefik-secure.tls=true"
      - "traefik.http.routers.traefik-secure.tls.certresolver=cloudflare"
      - "traefik.http.routers.traefik-secure.tls.domains[0].main=home.example.com"
      - "traefik.http.routers.traefik-secure.tls.domains[0].sans=*.home.example.com"
      - "traefik.http.routers.traefik-secure.service=api@internal"

networks:
  external:
    external: true

The important ideas are:

Traefik listens on ports 80 and 443 and is connected to the external network.
It uses environment variables to access the DNS provider so it can request certificates from Let’s Encrypt.
The dashboard is exposed at https://traefik.home.example.com, protected by basic auth.
The TLS configuration issues a wildcard certificate for *.home.example.com.
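The basicauth.users label expects user:hash with an MD5, SHA1, or BCrypt hash, which can be generated with htpasswd -nB user from the Apache utilities. Inside a Compose file, every $ in that hash must be doubled, otherwise Compose treats it as variable interpolation. A tiny helper for that step; the function name is hypothetical and the sample hash is a made-up placeholder, not a real credential.

```python
def compose_escape(htpasswd_line: str) -> str:
    """Double every '$' so Docker Compose does not interpolate it."""
    return htpasswd_line.replace("$", "$$")


line = "user:$2y$05$abcdefghijklmnopqrstuv"  # placeholder, not a real bcrypt hash
label = "traefik.http.middlewares.traefik-auth.basicauth.users=" + compose_escape(line)
print(label)
```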

Other services join the same external network and define their own labels, for example:

services:
  ntfy:
    image: binwiederhier/ntfy
    command: serve
    networks:
      - external
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.ntfy.entrypoints=https"
      - "traefik.http.routers.ntfy.rule=Host(`ntfy.home.example.com`)"
      - "traefik.http.routers.ntfy.tls.certresolver=cloudflare"

networks:
  external:
    external: true

With this pattern, every service becomes available over HTTPS under its own subdomain without additional manual configuration in Traefik.
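To make the label pattern concrete, the sketch below reads a container’s Traefik labels and derives which router a request for a given hostname would hit. It is a simplified, hypothetical model of what Traefik’s Docker provider does: it only understands single Host(`...`) rules and keeps one router per hostname, whereas real Traefik supports other matchers, priorities, and middlewares.

```python
import re
from typing import Dict, List

HOST_RULE = re.compile(r"^Host\(`([^`]+)`\)$")


def routes_from_labels(labels: List[str]) -> Dict[str, Dict[str, str]]:
    """Map hostname -> router settings for one container's Traefik labels.

    Only handles labels of the form traefik.http.routers.<name>.<key>=<value>
    together with single Host(`...`) rules.
    """
    settings = dict(label.split("=", 1) for label in labels if "=" in label)
    if settings.get("traefik.enable") != "true":
        return {}  # Traefik ignores containers without traefik.enable=true
    routers: Dict[str, Dict[str, str]] = {}
    for key, value in settings.items():
        parts = key.split(".")
        if parts[:3] == ["traefik", "http", "routers"] and len(parts) >= 5:
            routers.setdefault(parts[3], {})[".".join(parts[4:])] = value
    table = {}
    for name, cfg in routers.items():
        match = HOST_RULE.match(cfg.get("rule", ""))
        if match:
            table[match.group(1)] = {"router": name, **cfg}
    return table


ntfy_labels = [
    "traefik.enable=true",
    "traefik.http.routers.ntfy.entrypoints=https",
    "traefik.http.routers.ntfy.rule=Host(`ntfy.home.example.com`)",
    "traefik.http.routers.ntfy.tls.certresolver=cloudflare",
]
print(routes_from_labels(ntfy_labels))
```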

Core services in my homelab

On top of Traefik, I run a set of core services that provide DNS, monitoring, automation, messaging, logging, and secrets management. The key components are:

Pi-hole – DNS: Provides network-wide DNS resolution and ad-blocking by blocking unwanted domains for all devices on the network, and handles internal DNS for homelab hostnames such as *.home.example.com.
Mafl – Dashboard: A minimalistic and flexible homepage for organizing service links, grouping categories, and providing quick navigation. Mafl can perform health checks on linked services, is configured through a simple YAML file, and offers a Progressive Web App for mobile devices. Since each service sits behind Traefik with its own hostname, Mafl serves as a curated entry point to the entire environment.
Ntfy – Messaging / Pub-Sub: A lightweight HTTP-based publish/subscribe notification service used for event-driven messaging across the environment. Typical use cases include sending alerts when backups complete and receiving notifications when containers restart unexpectedly. Ntfy provides mobile and desktop apps, allowing access from phones and laptops both inside and outside the home network, depending on firewall and VPN settings.
Dozzle – Container Logs: A simple web-based UI for viewing Docker logs in real time. Logs are accessible through a browser, can be filtered by container, and can be tailed as they update. This is particularly useful when testing new services or debugging automation workflows.
Beszel – Resource Monitoring: A lightweight monitoring tool for tracking system metrics and container statistics across multiple machines. It provides CPU, memory, and disk usage insights, making it easy to identify overloaded or misbehaving nodes and maintain visibility into the health of thin clients and other devices.
Uptime Kuma – Service Monitoring: A dashboard for monitoring the availability of both internal and external services. It checks defined endpoints, as well as public websites and APIs. If a service becomes unreachable, Uptime Kuma sends alerts, e.g. via Ntfy or other services, providing an early warning system for issues in the homelab.
n8n – Automation Engine: A workflow automation platform used to orchestrate tasks, trigger scripts or containers, and integrate events across services. Typical use cases include reacting to webhooks or scheduled triggers, executing scripts or container actions, and sending notifications through Ntfy when certain conditions are met. Instead of implementing automation logic in custom code, workflows can be modeled visually and integrated directly with containers and external services.
Vaultwarden – Secrets Management: A self-hosted Bitwarden-compatible server for securely managing passwords and sensitive information within the homelab. It stores credentials and secrets for services and accounts and enables secure sharing across devices.
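Publishing to Ntfy is a plain HTTP request: the topic is the URL path, the message is the request body, and optional headers such as Title and Priority add metadata. A standard-library sketch of the backup alert mentioned above; the helper name and the topic homelab-backups are hypothetical examples, and the request is built but not sent.

```python
import urllib.request


def ntfy_request(server: str, topic: str, message: str,
                 title: str = "", priority: str = "default") -> urllib.request.Request:
    """Build (not send) a publish request for an ntfy topic."""
    headers = {"Priority": priority}  # min, low, default, high, or max
    if title:
        headers["Title"] = title
    return urllib.request.Request(
        url=f"{server.rstrip('/')}/{topic}",
        data=message.encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = ntfy_request("https://ntfy.home.example.com", "homelab-backups",
                   "Nightly backup finished", title="Backup OK")
# Passing req to urllib.request.urlopen() would publish the message.
```

A backup script or an n8n workflow can send the same request, which is what makes Ntfy a convenient shared notification channel for the other services.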

Conclusion

What began as a simple playground for learning containers and avoiding public cloud services for certain use cases has evolved into a practical, resilient platform for running everyday services at home. Centering the setup around Traefik, standardizing on containerized services, and using a wildcard domain with automated TLS have kept the architecture both manageable and extensible. The use of modest, low-power refurbished thin clients has also proven effective in keeping costs and energy consumption low while still offering sufficient resources.

Over time, the homelab has also turned out to be a solid foundation for hosting local AI services, the subject of a future post. Depending on the criticality of individual services and one’s tolerance for risk, it can be worthwhile to distribute components across independent hosts, monitor services across nodes, or run certain workloads in parallel for redundancy. It is equally important to think carefully about backups to avoid losing data or configurations during failures or experiments. That said, this remains a homelab project rather than a production environment governed by strict service-level agreements; temporary outages are acceptable, and part of the experimentation process.

With principles such as simple routing, consistent domains and TLS, lightweight hardware, and containerized services, one can build a flexible environment that supports DNS, monitoring, automation, messaging, secrets management, and more, tailored to one’s own needs.

References

CentOS Stream - link
Traefik - reverse proxy - link
Pi-hole - network-wide ad blocking and DNS - link
Mafl - dashboard for homelab services - link
ntfy - publish/subscribe push notifications - link
Dozzle - web-based interface to monitor logs - link
Beszel - resource monitoring for multiple clients - link
Uptime Kuma - monitoring tool - link
n8n - workflow automation - link
Vaultwarden - Bitwarden-compatible server - link
YouTube Video - Techno Tim: Put Wildcard Certificates and SSL on EVERYTHING - link

Building Better Ideas: Leveraging Lego Serious Play

Sat, 20 Dec 2025 00:00:00 +0000

From Complexity to Clarity: Building Shared Understanding - AI generated

Introduction: From Skepticism to Breakthroughs

The first time I introduced LEGO® SERIOUS PLAY® (LSP) in an IT strategy workshop, I was met with polite smiles and raised eyebrows.
“LEGO and enterprise IT? Really?”

But 15 minutes later, the energy in the room had shifted completely. Participants were leaning forward, hands busy building, voices animated. Skepticism turned into curiosity, and then into creativity and collaboration.

Last year, I took the step to become a Certified Facilitator for the LEGO® SERIOUS PLAY® Method and Materials. Since then, I’ve facilitated several workshops with customers, partners, and colleagues. One pattern stands out every time: once people allow themselves to engage, the method unlocks ideas fast and brings complex topics into focus.

LSP is not a playful gimmick. It’s a structured facilitation method that helps teams surface hidden knowledge, build shared understanding, and make strategic decisions with clarity.

Why LEGO SERIOUS PLAY Works in IT

LSP is powerful because it combines hands-on building, metaphorical thinking, and structured facilitation. In IT, where teams deal with complexity, silos, and competing perspectives, this combination is uniquely effective:

Hand-Brain Principle:
Building with your hands activates parts of the brain often left idle in typical meetings. When people build a model to express an idea, they speak more openly, they’re more creative, and they explain their thinking with more depth than slides or spreadsheets ever could.
Radical Simplification:
IT systems and strategies are complex by nature. LSP forces participants to distill ideas into their essence. The LEGO models don’t make the problem simpler—they make it understandable, visible, and discussable.
A Universal Language:
LEGO bricks cut through jargon and hierarchy. Whether it’s a solution architect, a product manager, or an operations engineer, everyone can build and everyone can contribute. It levels the playing field and gives each voice equal weight.

How the Process Works

LSP isn’t free play—it’s a facilitated, structured method. As a facilitator, I guide participants through a clear process:

Core Steps of the LEGO SERIOUS PLAY Process

The LSP method follows a structured process that ensures all participants are actively engaged and can contribute meaningfully. Each step builds on the previous one, moving from individual ideas to a shared understanding of complex systems.

Skill Building – Getting Comfortable with the Medium:
Every session begins with a warm-up phase designed to help participants get familiar with the materials and the idea of building metaphors.
Through simple, low-stakes exercises—such as building a tower to symbolize resilience or a bridge to represent connection—participants learn to translate abstract concepts into tangible models.
This step lowers inhibitions, builds trust in the method, and gives everyone a shared visual language. Once participants accept and internalize this new way of communicating, the dynamic in the room shifts: conversations become more open, focused, and surprisingly productive.
Setting the Challenge – Framing the Right Question:
Next, the facilitator introduces a focused, meaningful question, such as “What’s blocking us from scaling this platform?” or “What should our future IT landscape look like?”
This prompt defines the scope and direction of the session and ensures everyone builds toward a shared objective.
Build → Share → Reflect – Unlocking Insights:
- Build: Each participant constructs a model that represents their perspective or answer to the challenge.
- Share: Every person tells the story behind their model, ensuring equal voice and deeper understanding.
- Reflect: As a group, participants identify patterns, contradictions, gaps, and opportunities. This structured storytelling cycle drives richer conversations and helps surface insights that often remain hidden in traditional workshop formats.
From Individual to Shared Model – Creating Shared Understanding:
Once the team is confident and engaged, individual models are combined into a shared model that reflects the collective view.
This step often exposes interdependencies, tensions, and opportunities that no single perspective could have revealed on its own. It’s where alignment, clarity, and actionable strategy start to take shape.

Example from practice: In a recent architecture strategy workshop, one participant used a single red brick to symbolize a “single point of failure.” That simple metaphor shifted the discussion from vague risk statements to a concrete redesign strategy. This kind of clarity is hard to achieve with slides alone.

Where LEGO SERIOUS PLAY Creates Real Value in IT

Strategic Planning and Architecture:
When building roadmaps for complex IT transformations, LSP makes hidden assumptions visible. Business and technical perspectives align more quickly because everyone can literally see the future state in front of them.
Breaking Down Silos:
IT organizations often suffer from fragmented communication. LSP gives everyone a seat at the table. Equal speaking time ensures even quieter voices are heard—often surfacing crucial insights.
Solution Design and RFPs:
When responding to complex solution requirements, LSP allows teams to quickly prototype, test ideas, and align on the best approach. It accelerates clarity.
Defining OKRs (Objectives and Key Results):
Instead of vague PowerPoint bullets, participants build tangible representations of goals, key results, and dependencies. The visual, tactile nature of these models makes alignment much more concrete.

Conclusion - A Facilitator’s Perspective

What I find most fascinating as a facilitator is the moment of collective “aha”, when a group that began the session with crossed arms and quiet skepticism suddenly leans in. Once the first models are on the table, the conversation accelerates: people start building, storytelling, and connecting dots in ways that traditional workshops rarely achieve.

More than once, I’ve heard participants say, “I didn’t think this would work for us, but now we actually see the problem and the solution.”

In IT, where complexity and competing perspectives are the daily reality, these moments of shared clarity are game-changing.

To summarize: LSP blends the hand-brain principle, radical simplification, and a universal medium into a structured process that turns abstract concepts into tangible shared understanding.
It helps teams make assumptions visible, unlock hidden knowledge, align on what truly matters, and move forward with confidence. It levels the playing field, fosters equal participation, and turns passive discussions into active co-creation.

And that skeptical start I mentioned at the beginning? It’s now one of my favorite moments—because I know what comes next.

If you’re curious how LSP could help your team tackle complex IT challenges, get in touch or connect with me on LinkedIn.

References

Per Kristiansen & Robert Rasmussen: Building a Better Business Using the Lego Serious Play Method - Wiley, 2014
David Hillmer: PLAY! Der unverzichtbare LEGO® SERIOUS PLAY® Praxis-Guide für Workshops, Coachings und Moderation - Hanser, 2023
Hello Agile - Academy and Consultancy link

Compound Simulation – Exploring Portfolio Uncertainty

Sat, 22 Nov 2025 00:00:00 +0000

Introduction

Financial planning is often built on a deterministic story: “If I invest X € each month at 5 % per year, I’ll have Y € in 20 years.” But real markets are anything but deterministic. Price fluctuations, volatility, and unexpected shocks can significantly change outcomes.

This new tool builds on the foundation of the Compound Interest Calculator, which takes a deterministic view of capital growth, and adds a probabilistic perspective: it uses Monte Carlo simulation to explore a spectrum of possible portfolio trajectories based on the user’s assumptions. Instead of a single projected curve, it generates a fan chart of uncertainty bands and shows the likelihood of reaching specific targets as well as how sensitive outcomes are to your savings rate.

This is not a crystal ball. It’s a scenario explorer — a way to understand how uncertain markets shape financial trajectories.

You can try the web tool here:
Open the Compound Simulation Tool

What the Tool Does

The simulation is based on a small set of input parameters — initial capital, monthly contributions, expected return (μ), volatility (σ), investment horizon, and optionally a target value.

Using these assumptions, the app runs multiple simulation paths and provides:

Fan chart of portfolio trajectories – median, expected path, and uncertainty bands (percentiles).
Distribution of end values – showing the spread of possible outcomes at the horizon.
Target probability – the likelihood of reaching (or exceeding) your goal.
Stress test – a downside scenario with halved returns and doubled volatility.
Savings elasticity – the effect on median outcomes from marginally increasing monthly contributions.

This shifts the focus from a single deterministic projection to a probabilistic view of potential futures.
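To make the mechanics behind these outputs more concrete, here is a minimal Python sketch of a Monte Carlo portfolio simulation. It is not the app's R/Shiny code; it assumes monthly steps of a geometric Brownian motion derived from the annual μ and σ, contributions added at the end of each month, and simple rank-based percentiles without interpolation. The function names and all example values are hypothetical.

```python
import math
import random

def simulate_paths(initial, monthly, mu, sigma, years, n_paths=2000, seed=42):
    """Simulate portfolio values at every month end (assumed GBM, end-of-month contributions)."""
    rng = random.Random(seed)
    dt = 1.0 / 12.0
    drift = (mu - 0.5 * sigma ** 2) * dt  # monthly log-drift of a geometric Brownian motion
    vol = sigma * math.sqrt(dt)           # monthly log-volatility
    paths = []
    for _ in range(n_paths):
        value = initial
        path = [value]
        for _ in range(int(years * 12)):
            # Grow the portfolio by one random monthly return, then add the contribution
            value = value * math.exp(rng.gauss(drift, vol)) + monthly
            path.append(value)
        paths.append(path)
    return paths

def percentile_band(paths, month, probs=(0.05, 0.25, 0.50, 0.75, 0.95)):
    """Rank-based percentiles of all paths at one month: one vertical slice of a fan chart."""
    values = sorted(path[month] for path in paths)
    return [values[min(int(q * len(values)), len(values) - 1)] for q in probs]

def target_probability(paths, target):
    """Share of paths whose end value reaches or exceeds the target."""
    return sum(path[-1] >= target for path in paths) / len(paths)

if __name__ == "__main__":
    # Hypothetical inputs: 10,000 EUR start, 500 EUR/month, mu = 5 %, sigma = 15 %, 20 years
    paths = simulate_paths(10_000, 500, mu=0.05, sigma=0.15, years=20)
    print("P5/P25/P50/P75/P95 after 20 years:", [round(v) for v in percentile_band(paths, 240)])
    print("P(end value >= 250,000 EUR):", target_probability(paths, 250_000))
    # Stress test as described above: halved return and doubled volatility
    stressed = simulate_paths(10_000, 500, mu=0.025, sigma=0.30, years=20)
    print("Stressed median after 20 years:", round(percentile_band(stressed, 240)[2]))
```

Evaluating percentile_band for every month yields the full set of fan-chart bands, and the final slice is the end-value distribution. Calling simulate_paths again with halved μ and doubled σ mirrors the stress test listed above.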

How to Use It Online

Running the hosted app is straightforward:

Open the simulation tool.
Enter your core parameters:
- Initial Capital [€]
- Monthly Savings [€]
- Annual Return μ
- Volatility σ
- Time Horizon (years)
Optionally, define a target and target date.
Optionally, enable the Stress Test to explore adverse scenarios.
Optionally, add a Savings Elasticity Increment (e.g. +€50/month) to assess sensitivity.

The output includes:

A fan chart showing uncertainty over time.
A distribution histogram of end values.
A target probability indicator.
A sensitivity summary for additional contributions.

Run Locally

If you want to host or modify the simulation app yourself:

Clone the repository

git clone https://github.com/smichard/compound_simulation

Navigate to the project directory

cd compound_simulation

Build the container image: run the following command to build an image named compound_simulation_app (or choose any name you prefer):

podman build -t compound_simulation_app -f Containerfile .

This command uses the provided Containerfile to set up the environment, including all required R packages for running the Shiny app.

Start the app locally

podman run --rm -p 3838:3838 compound_simulation_app

This launches a container and maps port 3838 inside the container to the same port on your host system.

Access the app in your browser

http://localhost:3838/

You should now see the Compound Simulation app running locally.

Why This Matters

Uncertainty is real — any deterministic projection hides the range of plausible outcomes. Markets fluctuate, assumptions shift, and unexpected events can have a lasting impact. Probabilistic thinking helps make better decisions by accounting for both upside and downside scenarios instead of focusing on a single expected path.

Goal probability provides a tangible measure: “What are the chances I’ll reach €X by year Y?”
Savings elasticity reveals whether increasing contributions might be more effective than simply chasing higher returns.
For investors, educators, or anyone exploring financial planning under uncertainty, this tool complements the Compound Interest Calculator by adding a probabilistic layer to previously deterministic projections.
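Savings elasticity can be illustrated by comparing median end values under small changes to the inputs. The sketch below is a hypothetical illustration, not the tool's implementation: it assumes the same monthly geometric Brownian motion with end-of-month contributions and reuses identical random shocks for every scenario (common random numbers), so that differences come from the inputs rather than from sampling noise.

```python
import math
import random
import statistics

def median_end_value(initial, monthly, mu, sigma, years, n_paths=1000, seed=7):
    """Median end value under an assumed GBM model with end-of-month contributions."""
    rng = random.Random(seed)  # identical seed => identical shocks for every scenario
    dt = 1.0 / 12.0
    ends = []
    for _ in range(n_paths):
        value = initial
        for _ in range(int(years * 12)):
            z = rng.gauss(0.0, 1.0)  # standard normal shock, scaled per scenario below
            value = value * math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z) + monthly
        ends.append(value)
    return statistics.median(ends)

if __name__ == "__main__":
    # Hypothetical baseline: 10,000 EUR start, 500 EUR/month, mu = 5 %, sigma = 15 %, 20 years
    base = median_end_value(10_000, 500, 0.05, 0.15, 20)
    more_savings = median_end_value(10_000, 550, 0.05, 0.15, 20)   # +50 EUR per month
    higher_return = median_end_value(10_000, 500, 0.06, 0.15, 20)  # +1 percentage point return
    print(f"+50 EUR/month changes the median by {more_savings - base:+,.0f} EUR")
    print(f"+1 pp return changes the median by  {higher_return - base:+,.0f} EUR")
```

Because all scenarios share the same random draws, the two differences can be compared directly; which lever matters more depends on the horizon and the assumed parameters.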

Summary

Compound Simulation brings uncertainty to the forefront. By combining Monte Carlo simulation, sensitivity analysis, and clear visualizations, it highlights that financial projections aren’t fixed—they’re distributions. The tool helps explore not only expected growth but also the range of potential outcomes and their probabilities.

It can be used as a teaching aid, a scenario testing environment, or a personal planning companion. And since it’s open source, you can easily adapt it to your own assumptions, risk parameters, or visualization preferences.

References

Related Post - Compound Interest Calculator - link
Web App - link
GitHub Repository - link

Compound Interest Calculator – Visualizing Capital Growth

Sat, 25 Oct 2025 00:00:00 +0000

Introduction

Understanding how capital develops over time is a cornerstone of financial planning. While compound interest formulas are straightforward on paper, the interplay between savings rate, interest, and time is often less intuitive. To address this, I built the Compound Interest Calculator – a Shiny app that visualizes how capital grows based on different input parameters.

The tool illustrates not only the raw numbers but also the dynamics of savings and interest accumulation. It allows you to model different scenarios, compare strategies, and identify milestones such as when your savings generate more returns than your yearly contributions.

The idea for this tool was sparked after watching a YouTube video that explains why the first €100,000 is such a critical milestone in building wealth. There are many excellent videos and articles that explore this concept in depth. But to make it truly tangible — and to experiment interactively with savings rates, interest assumptions, and time horizons — I decided to build a tool of my own. The result: a simple, hands-on way to see compound interest in action and explore how various strategies may impact the growth of your capital over time.

You can try the web tool here:
Open the Compound Interest Calculator

Prerequisites

The easiest way to use the calculator is online (see above).
If you want to run it locally, you’ll need:

An environment capable of running containers, e.g. Podman, or
R with Shiny installed.

Getting Started

The online version is straightforward:

Open the Compound Interest Calculator.
Enter your investment parameters, e.g. start year, savings rate, interest rate, investment period.
Click Calculate and explore the generated charts and tables.

Input Parameters

Start Year: The year when the investment begins.
Initial Capital: The amount of money you start with.
Savings Rate: The amount of money you plan to save regularly.
Savings Interval: The frequency at which you save the specified savings rate (either monthly or yearly).
Investment Period: The total number of years you plan to invest.
Interest Rate: The annual interest rate (as a percentage) that your capital will earn.
Adjustment Rate: The annual rate (as a percentage) at which your savings rate will increase.
Savings Suspension: The number of years after which you plan to stop saving money.
Target Value: A specific capital value you aim to achieve. The app will indicate when (or if) this value is reached.
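The interplay of these parameters can be sketched as a simple yearly loop. The Python example below is a simplified model under explicit assumptions, not the calculator's actual R code: monthly savings are summed into one yearly amount, interest is compounded once per year on the capital held at the start of the year (savings paid in during a year earn interest only from the next year on), rates are entered as percentages, and the adjustment rate raises the savings amount from one year to the next. The names compound_schedule and YearRow are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class YearRow:
    year: int
    capital_start: float
    savings: float
    interest: float
    capital_end: float

def compound_schedule(start_year, initial_capital, savings_rate, interval, years,
                      interest_rate, adjustment_rate=0.0, suspension_after=None):
    """Yearly capital breakdown under simplified assumptions (rates given in percent)."""
    # Assumption: monthly savings are simply summed into one yearly amount
    yearly_savings = savings_rate * (12 if interval == "monthly" else 1)
    capital = float(initial_capital)
    rows = []
    for i in range(years):
        # Savings stop once the suspension period (in years) has passed
        suspended = suspension_after is not None and i >= suspension_after
        saving = 0.0 if suspended else yearly_savings
        # Assumption: interest is earned only on the capital held at the start of the year
        interest = capital * interest_rate / 100
        capital_end = capital + saving + interest
        rows.append(YearRow(start_year + i, capital, saving, interest, capital_end))
        capital = capital_end
        # The savings amount grows by the adjustment rate from one year to the next
        yearly_savings *= 1 + adjustment_rate / 100
    return rows

if __name__ == "__main__":
    # Hypothetical scenario: 10,000 EUR start, 200 EUR/month, 5 % interest, +2 % savings per year
    for row in compound_schedule(2025, 10_000, 200, "monthly", 5, 5.0, adjustment_rate=2.0):
        print(f"{row.year}: start {row.capital_start:>10,.2f}  savings {row.savings:>9,.2f}  "
              f"interest {row.interest:>9,.2f}  end {row.capital_end:>10,.2f}")
```

Each YearRow holds the capital at the start of the year, the savings and interest of that year, and the capital at the end of the year.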

Generated Diagrams

Overview: Shows the growth of accumulated savings and total capital over time.
Distribution: Displays a pie chart showing the distribution between total savings and total interest earned.
Savings Rate: Represents the annual savings rate in relation to the value of the generated interest each year. This visualization illustrates the development of both the savings rate and the generated interest over time. Additionally, it highlights the year when the generated interest surpasses the annual savings rate.
Normalized Values: Displays the savings rate and the generated interest, each normalized by the total growth of that year (savings plus interest). This provides a clearer perspective on how each component contributes to the overall growth each year.
Goals: Displays the development of total capital and highlights specific milestones, such as when the capital doubles from the initial investment. It also indicates when the user-defined target value is achieved.
Values: A table that provides a detailed breakdown of the capital at the beginning of the year, savings amount per year, generated interest per year, and capital at the end of the year.
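Most of the milestones highlighted in these diagrams reduce to finding the first year in which a condition holds. The following Python sketch assumes yearly lists of savings, interest, and end-of-year capital of equal length; the function names are hypothetical, and the calculator may define the milestones slightly differently.

```python
def first_index(items, predicate):
    """Index of the first item satisfying predicate, or None if there is none."""
    return next((i for i, item in enumerate(items) if predicate(item)), None)

def milestones(initial, savings, interest, capital_end, target=None):
    """Year offsets (0 = first year) of key milestones in yearly series of equal length."""
    result = {
        # First year in which the generated interest exceeds that year's savings
        "interest_exceeds_savings": first_index(zip(savings, interest), lambda si: si[1] > si[0]),
        # First year in which the capital is at least twice the initial investment
        "capital_doubled": first_index(capital_end, lambda c: c >= 2 * initial) if initial > 0 else None,
        # Normalized view: share of each year's growth that comes from savings
        "savings_share": [s / (s + r) if s + r else 0.0 for s, r in zip(savings, interest)],
    }
    if target is not None:
        # First year in which the user-defined target value is reached
        result["target_reached"] = first_index(capital_end, lambda c: c >= target)
    return result

if __name__ == "__main__":
    # Hypothetical series: 10,000 EUR start, 2,400 EUR saved per year, 5 % yearly interest
    capital, savings, interest, ends = 10_000.0, [], [], []
    for _ in range(30):
        earned = capital * 0.05
        savings.append(2_400.0)
        interest.append(earned)
        capital += 2_400.0 + earned
        ends.append(capital)
    m = milestones(10_000.0, savings, interest, ends, target=100_000)
    for key in ("interest_exceeds_savings", "capital_doubled", "target_reached"):
        print(key, "-> year offset", m[key])
```

Adding the start year to a returned offset gives the calendar year in which the milestone is reached; None means it is not reached within the investment period.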

Run Locally

If you want to host the calculator yourself:

Clone the repository

git clone https://github.com/smichard/compound_interest_calculator.git

Navigate to the project directory:

cd compound_interest_calculator

Build the container image: run the following command to build a container image. Replace my_app with a name of your choice for the image.

podman build -t my_app -f Containerfile .

This command will use the provided Containerfile to build an image named my_app. The process will install the necessary R packages and set up the environment for the Shiny app.

Run the Shiny app locally:

After building the image, you can run the Shiny app locally using the following command:

podman run --rm -p 3838:3838 my_app

This command will start a container from the my_app image and map port 3838 of the container to port 3838 of your local machine.

Access the Shiny app in a browser: open a web browser and navigate to:

http://localhost:3838/

You should now see your Shiny app running!

Summary

The Compound Interest Calculator helps bridge the gap between abstract formulas and practical insights. It turns the often-theoretical concept of compound growth into something tangible and interactive. By visualizing how capital evolves over time, it allows users to experiment with different savings rates, investment horizons, and interest assumptions — and to see immediately how these variables influence the trajectory of their capital.

Whether used for personal financial planning, educational purposes, or illustrating investment concepts, the tool provides a clear and structured way to explore “what-if” scenarios. It highlights key inflection points — such as when generated interest surpasses annual savings — making the dynamics of compounding easier to grasp and communicate.

Ultimately, the calculator is designed to make complex relationships between time, capital, and interest transparent, empowering users to make more informed, data-driven decisions about their long-term financial strategies.

References

YouTube Video - Nischa: Why Net Worth Skyrockets After $100K - link
Web App - link
GitHub Repository - link

Learning, Building, Growing: My Red Hat Journey So Far

Thu, 02 Oct 2025 00:00:00 +0000

Introduction

On October 1st, I was promoted to Associate Principal Solution Architect at Red Hat. This milestone marks not just a new title, but also an opportunity to reflect on the journey of the past two years.

Reflections

Time has passed remarkably quickly since I joined Red Hat in mid-2023. In that period, I’ve had the privilege to:

Work with highly skilled and deeply technical colleagues across Europe.
Engage in diverse and challenging projects, many of which operate at the forefront of technology.
Learn about, run POCs in, and deliver workshops on the exciting and fast-moving field of AI.
Deepen my own expertise by pursuing and completing several Red Hat certifications.

It has been a journey of constant learning, collaboration, and growth. There were moments of trial and error, but each step forward brought valuable insights.

Looking Ahead

This promotion is not a finish line but rather a signal to run a bit faster, continue learning, and contribute more. I am grateful to my mentor, my manager, and my colleagues at Red Hat who have guided, challenged, and supported me throughout this journey.

The path ahead is dynamic, and I look forward to building further, learning more, and — hopefully — still keeping a smile on the way.

References

LinkedIn Post - link

Remembering Prof. Dr. Hans-Joachim Queisser

Thu, 10 Jul 2025 00:00:00 +0000

Hans-Joachim Queisser

Sometimes, you meet someone without knowing the lasting impact they’ll have on your life.

I had the privilege of meeting Prof. Queisser during my time at the University of New South Wales, where I was working as a student researcher at the ARC Photovoltaics Centre of Excellence around 2009/2010. I still vividly remember sitting at my desk when Gavin Conibeer, behind me, said, “Hans, may I introduce you to Stephan Michard.” I hadn’t known Prof. Queisser was visiting the institute, and there I was, wearing shorts and flip-flops in proper Aussie style, suddenly face to face with a true legend in our field.

Prof. Queisser stayed for about two weeks. During that time, we had several lunches together and some truly memorable conversations. I was deeply impressed by his openness and curiosity — he didn’t care that I was “just a student”. He took real interest in who I was and what I was working on, offering encouragement and thoughtful advice.

Later, he played a pivotal role in helping me secure a PhD position at the Forschungszentrum Jülich, supporting my application with a letter of recommendation. In a private message, he wrote:
“Now make something of it.”
I took that as a challenge, a mission and a source of motivation.

Prof. Queisser’s guidance opened doors for me, and his example continues to inspire me to this day. I will remember him with deep gratitude — as a towering figure in science, and as someone who took the time to care.

References

Max-Planck Institute for Solid State Research LinkedIn Page - link

Book Review: The Start-Up of You by Reid Hoffman and Ben Casnocha

Sat, 12 Apr 2025 00:00:00 +0000

This post is a little different from the usual technical content on this blog. I came across this book through Scott Galloway’s podcast, where Hoffman was a guest. Galloway has a way of making you want to read things immediately, and this was one of those cases. I picked it up while I was at a point in my career where I was actively looking for a change, which in hindsight was probably the ideal moment to read it.

The Authors

Reid Hoffman is the co-founder of LinkedIn and one of the more prominent figures in the Silicon Valley ecosystem. Before LinkedIn, he worked at Apple and PayPal and co-founded SocialNet. After LinkedIn, he became a partner at Greylock and an early investor in companies like Airbnb and Dropbox. He is also the author of Blitzscaling. Whatever one thinks of his platform, his career is a credible basis for writing about professional strategy.

Ben Casnocha is an entrepreneur and author who co-founded Comcate, a software company for local governments, and has worked extensively in the startup and venture capital world. He brings a complementary angle to Hoffman’s perspective, and the collaboration shows in how the book is structured.

What the Book Argues

The central idea is straightforward: in a world where stable, lifelong career paths have largely disappeared, the most useful mental model for managing your career is the one used by startups. That means staying in permanent beta rather than assuming you are finished developing, investing in competitive differentiation, building a strong network as a strategic asset, and maintaining the flexibility to adapt when circumstances change.

One of the more interesting threads in the book is the treatment of serendipity. Hoffman does not dismiss luck as a factor in career outcomes. Instead, he argues that you can meaningfully increase your exposure to fortunate encounters and unexpected opportunities. The way to do that is to be in motion: attend things, meet people, pursue adjacent interests, build genuine relationships rather than transactional ones. You cannot manufacture luck, but you can increase the surface area for it. That is a more honest and useful framing than the usual advice to simply “network more.”

My Take

I read this book during a period when I was actively thinking about a career change. I did not implement all of its suggestions, but I took some of them seriously. I attended two conferences I might otherwise have skipped, and I doubled down on personal learning. Eventually I landed a new job.

Whether the book caused any of that is genuinely unclear to me. It may have been timing, or accumulated momentum that was already building, or simply good luck. But I find it hard to believe that none of it had any influence. The mindset shift the book advocates — from passive career management to something more deliberate and active — is useful, and it was a useful nudge at a specific moment.

For anyone early in their career, in the middle of a transition, or just feeling stuck, this book is worth the few hours it takes to read. It is not a work of profound original thought, but it is honest, practical, and — at the right moment — it lands.

References

The Start-Up of You - link
Scott Galloway - Prof G Pod - link

Why Software Carbon Intensity Matters: An Introduction to the SCI Framework

Mon, 16 Dec 2024 00:00:00 +0000

Introduction

The digital revolution has transformed our world, but at what cost to our environment? Greenhouse gas (GHG) emissions from data centers have already surpassed those of the global airline industry and are expected to continue rising, highlighting the urgent need to address the carbon footprint of software. This article explores the Software Carbon Intensity (SCI) framework, an approach to measuring the environmental impact of software applications. It covers the components of the SCI, its practical applications, and its role in enabling developers, architects, and organizations to create more sustainable software solutions.

From Application to Energy Sourcing

Applications are deployed to fulfill specific business needs, and their operation requires careful consideration of availability, reliability, and efficiency. Decisions regarding high availability, backup solutions, and the geographic location of data centers significantly influence the environmental impact of software systems. The underlying infrastructure supporting these applications — comprising servers, storage, and networking components — consumes energy and resources not only during operational use but also throughout their production and manufacturing lifecycle.

The energy consumed for operating data centers is directly linked to the choices made in infrastructure deployment. Whether the hardware is utilized in an on-premises data center, a co-location facility, or within the public cloud, these decisions affect the overall energy demand and resource utilization. Furthermore, the origin of the electricity powering these systems plays a crucial role in determining their energy footprint and GHG emissions.

Understanding the comprehensive impact of these factors — ranging from application design to data center infrastructure and energy sourcing — allows for a detailed assessment of the energy and GHG footprint associated with software operations. This holistic view enables organizations to make informed decisions aimed at reducing their environmental impact.

This diagram outlines the various components influencing the energy and greenhouse gas footprint of software operations, including applications, data center infrastructure, operational energy consumption, and energy utilities.

Introducing the Software Carbon Intensity Framework

To address the pressing need for measuring software’s environmental impact, the Green Software Foundation (GSF) developed the Software Carbon Intensity (SCI) framework, now recognized as an ISO standard. The SCI framework provides a standardized method to calculate the carbon emissions associated with software applications, helping organizations quantify and reduce their environmental footprint.

The SCI score is calculated as SCI = ((E × I) + M) per R. Although the formula might appear complex at first glance, its components are straightforward:

E (Energy Consumption): The total energy used to operate the software.
I (Carbon Intensity of Energy Source): The amount of CO₂ emitted per kilowatt-hour during the generation of electricity.
M (Embodied Carbon): The share of the CO₂ emissions from manufacturing the hardware that is attributed to running the software.
R (Rate of Use): How the software scales—this could be per user, per API call, or any other relevant unit.

Components of the Software Carbon Intensity (SCI) Framework

The SCI formula helps organizations derive a carbon footprint for their software applications by considering both operational and embodied emissions relative to their usage scale. It is important to recognize that the SCI framework is designed to monitor an application’s environmental impact during its ongoing operation, rather than to compare different applications. Such comparisons would require standardized testing procedures and uniform hardware—conditions typically feasible only under controlled laboratory settings, which are unlikely in realistic, real-world scenarios.
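Expressed in code, the calculation is short. The Python sketch below computes SCI = ((E × I) + M) / R in gCO2eq per functional unit; the function name is hypothetical, and all input numbers in the usage example are made up and serve only to show the arithmetic.

```python
def sci(energy_kwh, intensity_g_per_kwh, embodied_g, functional_units):
    """Software Carbon Intensity: ((E * I) + M) per R, in gCO2eq per functional unit."""
    if functional_units <= 0:
        raise ValueError("R must be a positive number of functional units")
    operational_g = energy_kwh * intensity_g_per_kwh  # operational emissions O = E * I
    return (operational_g + embodied_g) / functional_units

if __name__ == "__main__":
    # Hypothetical day of operation: 5 kWh at 400 gCO2eq/kWh, 500 g embodied share,
    # 100,000 API calls as the functional unit R
    print(f"{sci(5, 400, 500, 100_000):.3f} gCO2eq per API call")  # prints "0.025 gCO2eq per API call"
```

Tracking this value over time for the same application, rather than comparing it across different applications, matches the intended use described above.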

The following image illustrates how the key metrics for calculating the SCI value can be derived:

Deriving the Key Components of the Software Carbon Intensity Framework
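One of these derivations, the embodied component M, is usually not the full manufacturing footprint of a server but the share attributable to the software. The SCI specification allocates it by time and by resources, M = TE × (TiR / EL) × (RR / ToR), where TE is the total embodied emissions of the hardware, TiR the time reserved for the software, EL the expected lifespan, RR the resources reserved, and ToR the total resources. A minimal Python sketch with a hypothetical function name and hypothetical numbers:

```python
def embodied_share(total_embodied_g, time_reserved_h, expected_lifespan_h,
                   resources_reserved, total_resources):
    """Embodied emissions allocated to the software: M = TE * (TiR / EL) * (RR / ToR)."""
    time_share = time_reserved_h / expected_lifespan_h       # TS = TiR / EL
    resource_share = resources_reserved / total_resources    # RS = RR / ToR
    return total_embodied_g * time_share * resource_share

if __name__ == "__main__":
    # Hypothetical server: 1,000 kgCO2eq embodied, 4-year lifespan (35,040 h),
    # software reserves 4 of 32 cores for 24 hours
    m = embodied_share(1_000_000, 24, 35_040, 4, 32)
    print(f"M = {m:.1f} gCO2eq")
```

For these assumed values the result is roughly 86 gCO2eq, which would then enter the SCI formula as M for that 24-hour window.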

Why the SCI Framework Matters

Abhishek Gupta of Microsoft, Co-Chair of the SCI Specification Project, emphasizes the practical significance of the SCI framework: “The Software Carbon Intensity specification is exciting because it is a concrete manifestation of broad—and very important!—ideas of how we measure the carbon impacts of software systems. But, more importantly, it is about what we can do to mitigate those impacts,” he explains.

By providing an actionable approach, the SCI framework empowers developers, architects, and organizations to make informed decisions that reduce carbon emissions. The framework focuses on the direct elimination of emissions by encouraging modifications to software systems that use less physical hardware, consume less energy, or leverage lower-carbon energy sources. Neutralization or avoidance offsets are not considered in reducing an SCI score, emphasizing the importance of tangible emission reductions. The SCI score offers a consistent and fair measure of a software system’s carbon footprint, enhancing awareness and transparency of its sustainability credentials. This enables practitioners to set clear targets during development, make evidence-based decisions in design and deployment, and track progress over time.

By systematically applying the SCI framework across their application landscape, organizations can accurately compute the carbon intensity of their software systems. This comprehensive approach enables them to identify key areas where energy efficiency can be enhanced and empowers them to make informed decisions to reduce their overall environmental impact.

The GSF conducts its work openly, following open-source principles, with all discussions, meeting notes, and agenda items publicly accessible on their GitHub repository. This transparent approach allows anyone — not just GSF members — to contribute ideas and participate in discussions.

Challenges and Moving Forward

One of the main challenges in adopting the SCI framework is obtaining accurate and granular data, particularly regarding energy consumption and embodied carbon. Collaboration with hardware manufacturers, data center operators, and energy providers is crucial to gather this information.

The GSF is actively working on case studies to demonstrate the SCI framework’s application in real-world scenarios. These examples aim to refine the framework further and encourage widespread adoption.

Conclusion

The SCI framework represents a significant step forward in promoting transparency by enabling organizations to monitor their software’s carbon emissions. This standardized method for measuring and understanding the carbon footprint associated with software allows companies to see the tangible consequences of their actions. As a result, organizations can make informed decisions and take meaningful steps to reduce their environmental impact.

As our reliance on software continues to grow, integrating sustainability into software development is not just beneficial—it’s imperative. The SCI framework offers a clear path for organizations committed to making a positive environmental difference.

A future blog post will explore practical methods for measuring and reducing the energy and carbon footprint of software applications, utilizing open-source tools and projects from the CNCF ecosystem.

References

Measuring greenhouse gas emissions in data centres: the environmental impact of cloud computing - link
Software Carbon Intensity (SCI) Specification - link
GitHub repository of the Green Software Foundation - link
Software Carbon Intensity (SCI) Specification Achieves ISO Standard Status, Advancing Green Software Development - link
Interview with the Co-Chairs of the SCI Specification Project - link

Note: All links were accessed and verified as of the date of this post.

Enhancing Code Project Documentation through Automated Changelogs

Tue, 26 Mar 2024 00:00:00 +0000

This article was published on March 25, 2024, on opensourcerers.org:

Abstract

In the rapidly evolving landscape of software development, documenting modifications and updates is crucial for maintaining project continuity and ensuring team alignment. This blog article introduces Conventional Changelog, a tool developed to address this challenge. The tool transforms a project’s commit history into a detailed, readable changelog. Because it follows the Conventional Commits and Semantic Versioning practices, it produces well-structured documentation that enhances transparency for users and contributors alike. Versatile by design, it integrates into various environments, from local IDEs to continuous integration pipelines such as GitHub Actions and Tekton Tasks.

Motivation

As projects evolve, maintaining a clear history of changes becomes a challenge. Traditional methods often fall short, leading to overlooked updates or a cluttered changelog. The need for a solution that not only automates this process but also aligns with best practices in software development — such as Semantic Versioning and Conventional Commits — sparked the idea to develop the proposed tool. Conventional Changelog addresses this gap, offering a solution that is both comprehensive and easy to adopt, ensuring no code commit goes unrecorded.

The proposed approach integrates three foundational best practices to enhance a project’s documentation:

Semantic Versioning: This practice involves structuring version numbers as MAJOR.MINOR.PATCH. Each segment signifies the nature of changes: MAJOR versions indicate incompatible API changes, MINOR versions add features in a backward-compatible manner, and PATCH versions address backward-compatible bug fixes. This method provides a clear, incremental structure for versioning that reflects the scope and impact of changes.
Conventional Commits: Building on the idea of structured commit messages, this practice categorizes code changes to clearly communicate their intent. Based on the Angular convention for commit messages, valid categories are: feat:, fix:, build:, chore:, ci:, docs:, style:, refactor:, perf:, and test:. The proposed tool introduces additional categories such as deploy:, gitops:, and demo:. These cover changes to deployment files (e.g. Kubernetes manifests), changes that trigger automated GitOps-driven deployments, and changes made for demonstration purposes. This ensures a well-organized commit history.
Keep a Changelog: This practice advocates maintaining a changelog as a curated list of notable changes for each project version. It emphasizes structuring the changelog so that it is accessible and informative for users, grouping changes by type and listing versions in reverse chronological order, with the latest version first. An “Unreleased” section at the top gives visibility into recent changes that will be part of upcoming releases.
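How the first two practices work together can be illustrated with a short Python sketch: it parses a commit header of the form type(scope)!: description, accepting the Angular types plus the tool's additional categories, and derives the next version number from the parsed commits (a breaking change bumps MAJOR, feat: bumps MINOR, fix: bumps PATCH). This is a simplified illustration, not Conventional Changelog's implementation; the function names are hypothetical, and treating all other commit types as leaving the version unchanged is an assumption of this sketch.

```python
import re
from typing import List, NamedTuple, Optional

# Angular-convention types plus the tool's additional categories
COMMIT_TYPES = {
    "feat", "fix", "build", "chore", "ci", "docs", "style", "refactor",
    "perf", "test", "deploy", "gitops", "demo",
}

# <type>(<optional scope>)<optional !>: <description>
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<description>.+)$"
)


class Commit(NamedTuple):
    type: str
    scope: Optional[str]
    breaking: bool
    description: str


def parse_commit(message: str) -> Optional[Commit]:
    """Parse a commit message; return None if its header is not a Conventional Commit."""
    header, _, body = message.partition("\n")
    match = HEADER_RE.match(header.strip())
    if match is None or match["type"] not in COMMIT_TYPES:
        return None
    # A "!" after the type/scope or a BREAKING CHANGE footer marks an incompatible change
    breaking = match["breaking"] is not None or "BREAKING CHANGE:" in body
    return Commit(match["type"], match["scope"], breaking, match["description"])


def next_version(current: str, commits: List[Commit]) -> str:
    """Bump MAJOR.MINOR.PATCH according to the most significant change in commits."""
    major, minor, patch = (int(part) for part in current.split("."))
    if any(c.breaking for c in commits):
        return f"{major + 1}.0.0"
    if any(c.type == "feat" for c in commits):
        return f"{major}.{minor + 1}.0"
    if any(c.type == "fix" for c in commits):
        return f"{major}.{minor}.{patch + 1}"
    return current  # docs:, chore:, deploy:, ... leave the version unchanged here


if __name__ == "__main__":
    messages = ["feat(api): add export endpoint", "docs: fix typo in README",
                "gitops: bump image tag", "update stuff"]
    # "update stuff" does not follow the convention and is filtered out
    commits = [c for c in map(parse_commit, messages) if c is not None]
    print(next_version("1.4.2", commits))  # prints: 1.5.0
```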

Together, these practices offer a comprehensive framework for managing software versioning, commit documentation, and changelog maintenance, making it easier for teams to navigate the complexities of project development and for users to stay informed about significant updates.
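The Keep a Changelog layout can likewise be sketched in Python: commits that have already been classified by type are grouped under per-type headings, with an Unreleased section at the top and releases listed newest first. The section titles and the mapping from commit types to them are assumptions made for this illustration; the CHANGELOG.md that Conventional Changelog actually produces may be laid out differently.

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

# Hypothetical section titles per commit type, in the order they should appear
SECTION_TITLES = {
    "feat": "Features",
    "fix": "Bug Fixes",
    "perf": "Performance",
    "refactor": "Refactoring",
    "docs": "Documentation",
    "deploy": "Deployment",
    "gitops": "GitOps",
    "demo": "Demo",
}

# (version or None for "Unreleased", [(commit_type, description), ...])
Release = Tuple[Optional[str], List[Tuple[str, str]]]


def render_changelog(releases: Iterable[Release]) -> str:
    """Render releases, ordered newest first, as Keep a Changelog-style Markdown."""
    lines = ["# Changelog", ""]
    for version, changes in releases:
        lines.append(f"## [{version or 'Unreleased'}]")
        grouped = defaultdict(list)
        for commit_type, description in changes:
            grouped[SECTION_TITLES.get(commit_type, "Other")].append(description)
        # Emit groups in a fixed order so every release reads the same way
        for title in [*SECTION_TITLES.values(), "Other"]:
            if title in grouped:
                lines.append(f"### {title}")
                lines.extend(f"- {text}" for text in grouped[title])
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_changelog([
        (None, [("gitops", "bump image tag")]),
        ("1.5.0", [("feat", "add export endpoint"),
                   ("fix", "handle empty history"),
                   ("chore", "update dependencies")]),
    ]))
```

Keeping the section order fixed, rather than following commit order, is what makes consecutive releases easy to compare at a glance.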

Executing the Script: A Multifaceted Approach

The tool can be run in several ways, each explained in more detail below. To keep the scope manageable, each option is illustrated with a minimal example. This flexibility allows developers to choose the approach that best fits their workflow while keeping the documentation of the project’s evolution accurate.

Local execution

Conventional Changelog stands out for its adaptability and fits easily into the local development environment. Developers can execute the script directly to bring the changelog up to date with the latest commits.

./generate_changelog_local.sh

Alternatively, using a container engine like Podman or Docker offers an isolated setup that ensures consistent execution across environments, independent of the underlying operating system.

First, build the container image using the provided Dockerfile. This step creates an image with the necessary environment to run the script:

podman build -t <image-name> -f Dockerfile .

After building the image, run the container. This step mounts the current working directory into the container, allowing the script to access and update the changelog file within the project directory:

podman run -it --rm -v "$(pwd):/repo" <image-name> sh

Inside the container, navigate to the mounted repository directory and execute the script. This process generates the changelog within the containerized environment, reflecting the changes back to the local repository:

cd /repo
./generate_changelog_local.sh

GitHub Action

Integrating Conventional Changelog into a CI/CD pipeline as a GitHub Action keeps the changelog current and comprehensive. The GitHub Actions workflow can be configured to start the changelog generation on specific git operations, on targeted branches, or manually through workflow dispatch, providing flexibility in how and when updates are documented.

The following GitHub Actions workflow generates an updated changelog with every push to the main branch. For this to work, the workflow needs both read and write access to the repository, which is configured in the repository settings (Settings -> Actions -> General -> Workflow permissions).

YAML GitHub Action Workflow


name: Generate Changelog

on:
  push:
    branches: [ main ]

jobs:
  changelog:
    runs-on: ubuntu-latest
    name: Generate and Commit Changelog

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4

      - name: Generate Changelog
        uses: smichard/conventional_changelog@2.0.0
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Set Git User Info
        run: |
          git config user.name 'GitHub Actions Bot'
          git config user.email 'actions@github.com'

      - name: Commit Changelog
        run: |
          git add CHANGELOG.md
          git commit -m "docs: :robot: changelog file generated" || echo "No changes to commit"
          git push

This automation keeps the project’s documentation up to date, recording changes, fixes, and new features with every push to main. Because pushes authenticated with the GITHUB_TOKEN do not trigger new workflow runs, committing the changelog back to main does not start the workflow again. The process saves time and improves accuracy, which is crucial for projects with frequent updates.

Tekton Task

Conventional Changelog extends its versatility by offering seamless integration as a task within Tekton pipelines. This feature is particularly beneficial for users operating in Kubernetes and OpenShift environments, allowing for the automation of changelog generation as part of a deployment workflow.

Begin by applying the provided tekton/task_generate_changelog.yml configuration. This step enables using the provided Task as part of a Tekton Pipeline. Make sure to have the git-clone Task installed in your cluster:

oc apply -f tekton/task_generate_changelog.yml

Next, integrate the provided task into a Tekton pipeline. The minimal pipeline configuration below clones a Git repository and then generates the changelog. It can serve as a blueprint for use in a larger context. If the generated changelog file needs to be committed back to the repository, additional steps are required to handle the commit process:

YAML Minimal Tekton Pipeline

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: minimal-pipeline
spec:
  workspaces:
    - name: source
  params:
    - name: git-url
      type: string
      description: "URL of the git repository"
  tasks:
    - name: fetch-repository
      taskRef:
        name: git-clone
        kind: Task
      workspaces:
        - name: output
          workspace: source
      params:
        - name: url
          value: $(params.git-url)
        - name: revision
          value: "main"
    - name: generate-changelog
      taskRef:
        name: generate-changelog
      workspaces:
        - name: source
          workspace: source
      runAfter:
        - fetch-repository

Apply the pipeline:

oc apply -f tekton/pipeline

Integrating the solution as part of a Tekton pipeline, just as with a GitHub Action workflow, demonstrates the solution’s flexibility and ensures a timely and accurate record of changes, bug fixes, and new features.

Summary

In a dynamic software development world, maintaining an accurate and comprehensive project history is pivotal for team alignment and project continuity. The introduction of Conventional Changelog offers a robust solution to this challenge, transforming commit histories into detailed, structured changelogs. This tool marries the principles of Conventional Commits and Semantic Versioning with the best practices of changelog maintenance, ensuring a transparent and accessible documentation process. Versatile enough to integrate with local IDEs, containerized environments, GitHub Actions, and Tekton Tasks, Conventional Changelog streamlines documentation workflows, making it an essential tool for developers seeking to automate and enhance their project documentation practices. This post presented the motivation behind Conventional Changelog, outlined its background, and provided practical guidance on its multifaceted execution strategies, demonstrating its utility in modern software development environments.

References

GitHub Repository of Conventional Changelog - link
GitHub Action Marketplace - link
Semantic Versioning Specification - link
Conventional Commits Specification - link
Angular Commit Message Guidelines - link
Keep A Changelog Specification - link
Tekton Documentation - link
Documentation for the git clone Tekton Task - link
