Building a Self-Hosted AI Assistant with VergeOS, Kubernetes, and Cloudflare
Running AI models in your infrastructure is exciting, but providing easy access to internal customers through a beautiful, secure web interface takes it to the next level. In this post, I'll walk you through how HappyNoises.work (a fictional IT company) built a complete AI assistant web application to provide AI inference capabilities to internal users. The solution connects to AI models running on VergeOS infrastructure, deploys to Kubernetes, and is secured with Cloudflare Tunnel and Access.
The Goal
At HappyNoises.work, we wanted to give our internal customers easy access to AI inference capabilities. The goal was a ChatGPT-like interface for our self-hosted AI models, with these requirements:
- Beautiful UI: Modern, responsive design with real-time streaming responses
- Self-hosted Backend: Connect to AI models running on our VergeOS infrastructure
- Kubernetes Native: Deploy as a containerized application in our cluster
- Secure Access: Protect with Cloudflare Access authentication for internal users
- Zero Trust: No exposed ports, everything through Cloudflare Tunnel
- Internal Access: Provide a simple, familiar interface for employees to leverage AI capabilities
Architecture Overview
Here's the complete architecture:
User Browser
↓
https://ai.happynoises.work (Cloudflare Access)
↓
Cloudflare Tunnel (0c70ff17-acd4-4f9f-bfc4-ac9563a09d4f)
↓
Kubernetes Service (vergeos-ai.vergeos-ai:3001)
↓
Node.js/Express Backend
↓
VergeOS AI API (https://192.168.1.111/v1)
↓
SmolM3 Model (Self-hosted)
Key Components
- Frontend: HTML/CSS/JavaScript with streaming support
- Backend: Node.js with Express and OpenAI SDK
- AI Models: Self-hosted on VergeOS (SmolM3, qwen3-coder-14B)
- Deployment: Kubernetes with ConfigMaps for code
- Security: Cloudflare Tunnel + Access for Zero Trust authentication
VergeOS: The AI Backend
What is VergeOS?
VergeOS is a hyperconverged infrastructure platform that I'm using to run my AI models. With the latest version of VergeOS, they've added exciting new AI capabilities that provide an OpenAI-compatible API endpoint. This means I can use the standard OpenAI SDK to interact with my self-hosted models without any modifications.
Note: These AI features are brand new in the latest VergeOS release. For more details on VergeOS AI capabilities, check out the official documentation at docs.verge.io.
OpenAI-Compatible API
The beauty of VergeOS is that it exposes an OpenAI-compatible endpoint:
const client = new OpenAI({
baseURL: 'https://192.168.1.111/v1',
apiKey: 'your-api-key'
});
const response = await client.chat.completions.create({
model: 'SmolM3',
messages: [
{ role: 'user', content: 'Hello!' }
]
});
This compatibility means I can use the official OpenAI SDK without modifications, making development much easier.
Models Available
I'm currently running:
- SmolM3: A compact, efficient model great for general tasks
- qwen3-coder-14B: A 14B parameter model specialized for coding tasks
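If the endpoint follows the OpenAI spec closely, the available models should also be discoverable via `GET /v1/models`. A quick check from the CLI (the `-k` flag skips certificate verification, since my instance uses a self-signed certificate; this assumes VergeOS implements the models endpoint):

```shell
# List models exposed by the OpenAI-compatible endpoint (hedged: assumes
# VergeOS implements GET /v1/models like the OpenAI spec).
curl -k -H "Authorization: Bearer your-api-key" \
  https://192.168.1.111/v1/models
```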
Building the Web Interface
Frontend Design
I wanted a modern, ChatGPT-like interface with these features:
- Gradient Design: Purple/blue gradient theme
- Real-time Streaming: Word-by-word responses using Server-Sent Events (SSE)
- Conversation History: Saved in browser localStorage
- Code Highlighting: Automatic syntax highlighting for code blocks
- Quick Actions: Pre-configured prompts for common tasks
- Mobile Responsive: Works on all devices
Here's a snippet of the streaming implementation:
async function sendStreamingRequest() {
const response = await fetch('/api/chat/stream', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ messages: conversationHistory })
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
const lines = chunk.split('\n').filter(line => line.trim() !== '');
for (const line of lines) {
if (line.startsWith('data: ')) {
const data = line.slice(6);
if (data === '[DONE]') continue;
const parsed = JSON.parse(data);
if (parsed.content) {
// Display content in real-time
contentDiv.innerHTML += parsed.content;
}
}
}
}
}
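One caveat with the loop above: `chunk.split('\n')` assumes every network read contains whole SSE lines, but an event can be split across two reads, which makes `JSON.parse` throw on a half-received payload. A small buffering parser avoids that; this is a hedged sketch, and `parseSSEChunk` is my own name, not part of the original code:

```javascript
// Accumulates partial SSE data across reads and returns only complete events.
// parseSSEChunk is a hypothetical helper name for this sketch.
function parseSSEChunk(buffer, chunk) {
  buffer += chunk;
  const events = [];
  let index;
  // A complete SSE event ends with a blank line ("\n\n").
  while ((index = buffer.indexOf('\n\n')) !== -1) {
    const rawEvent = buffer.slice(0, index);
    buffer = buffer.slice(index + 2);
    for (const line of rawEvent.split('\n')) {
      if (line.startsWith('data: ')) {
        events.push(line.slice(6));
      }
    }
  }
  // Return complete events plus the unconsumed remainder for the next read.
  return { events, buffer };
}

// Usage inside the read loop:
// let buf = '';
// const result = parseSSEChunk(buf, decoder.decode(value));
// buf = result.buffer;
// for (const data of result.events) {
//   if (data !== '[DONE]') { /* JSON.parse(data) is now safe */ }
// }
```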
Backend Implementation
The backend is a simple Express.js server that acts as a proxy between the frontend and VergeOS:
const express = require('express');
const { OpenAI } = require('openai');
const https = require('https');
const app = express();
// Initialize OpenAI client with VergeOS endpoint
const httpsAgent = new https.Agent({ rejectUnauthorized: false });
const client = new OpenAI({
baseURL: process.env.VERGEOS_BASE_URL,
apiKey: process.env.VERGEOS_API_KEY,
httpAgent: httpsAgent,
httpsAgent: httpsAgent
});
// Streaming endpoint
app.post('/api/chat/stream', async (req, res) => {
const { messages } = req.body;
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
res.setHeader('Connection', 'keep-alive');
const stream = await client.chat.completions.create({
model: 'SmolM3',
messages: messages,
stream: true
});
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content || '';
if (content) {
res.write(`data: ${JSON.stringify({ content })}\n\n`);
}
}
res.write('data: [DONE]\n\n');
res.end();
});
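If the VergeOS call fails mid-stream or the browser disconnects, the loop above leaves the response hanging. One way to harden it is to wrap the iteration and emit an error event before closing; a minimal sketch, where `pipeCompletionStream` is my own helper name and `writeFn` stands in for `res.write`:

```javascript
// Forwards OpenAI-style streaming chunks as SSE frames, with basic error
// handling. pipeCompletionStream is a hypothetical helper for this sketch.
async function pipeCompletionStream(stream, writeFn) {
  try {
    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content || '';
      if (content) {
        writeFn(`data: ${JSON.stringify({ content })}\n\n`);
      }
    }
  } catch (err) {
    // Surface the failure to the client instead of silently dropping the stream.
    writeFn(`data: ${JSON.stringify({ error: String(err.message || err) })}\n\n`);
  } finally {
    // Always terminate the stream so the frontend loop exits cleanly.
    writeFn('data: [DONE]\n\n');
  }
}
```

In the Express handler this becomes `await pipeCompletionStream(stream, (s) => res.write(s)); res.end();`, ideally with a `req.on('close', ...)` listener to abort the upstream request when the browser goes away.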
Handling Self-Signed Certificates
Since my VergeOS instance uses a self-signed certificate, I needed to disable SSL verification:
// Environment variable approach (recommended)
process.env.NODE_TLS_REJECT_UNAUTHORIZED = '0';
// Or via HTTPS agent
const httpsAgent = new https.Agent({
rejectUnauthorized: false
});
In production, you'd want to use proper certificates, but for a homelab, this works perfectly.
Kubernetes Deployment
Why Kubernetes?
Deploying to Kubernetes gives me:
- High Availability: Automatic restarts if the pod crashes
- Easy Updates: Rolling deployments with zero downtime
- Resource Management: CPU and memory limits
- Service Discovery: Internal DNS for service communication
- Scalability: Easy to add more replicas if needed
Deployment Strategy
I used an interesting approach: deploying code via ConfigMaps instead of building Docker images. This makes updates incredibly fast:
apiVersion: v1
kind: ConfigMap
metadata:
name: vergeos-ai-server
namespace: vergeos-ai
data:
index.js: |
const express = require('express');
// ... entire server code here ...
The deployment then mounts this ConfigMap:
apiVersion: apps/v1
kind: Deployment
metadata:
name: vergeos-ai
spec:
template:
spec:
containers:
- name: vergeos-ai
image: node:18-alpine
command:
- sh
- -c
- |
cd /app
npm install express cors openai dotenv
node server/index.js
volumeMounts:
- name: server-code
mountPath: /app/server
- name: public-files
mountPath: /app/public
volumes:
- name: server-code
configMap:
name: vergeos-ai-server
- name: public-files
configMap:
name: vergeos-ai-public
Benefits of This Approach
- Fast Updates: Just update the ConfigMap and restart the pod
- No Docker Build: No need to build and push images
- Version Control: Code is in Git, ConfigMaps generated from files
- Easy Debugging: Can exec into pod and edit code live
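To keep the ConfigMaps in sync with the files in Git, they can be regenerated idempotently with a dry-run/apply pipeline (file paths here are illustrative):

```shell
# Regenerate the server-code ConfigMap from the checked-in file.
# --dry-run=client renders the manifest locally; piping to apply makes it idempotent.
kubectl create configmap vergeos-ai-server \
  --from-file=index.js=server/index.js \
  --namespace=vergeos-ai \
  --dry-run=client -o yaml | kubectl apply -f -

# Restart the pod so it picks up the new code.
kubectl rollout restart deployment/vergeos-ai -n vergeos-ai
```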
Secrets Management
API keys and sensitive data are stored in Kubernetes Secrets:
apiVersion: v1
kind: Secret
metadata:
name: vergeos-ai-config
type: Opaque
stringData:
VERGEOS_BASE_URL: "https://192.168.1.111/v1"
VERGEOS_API_KEY: "your-api-key"
VERGEOS_MODEL: "SmolM3"
NODE_TLS_REJECT_UNAUTHORIZED: "0"
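The Deployment consumes this Secret as environment variables. A typical wiring inside the container spec (standard Kubernetes `envFrom`):

```yaml
# Inside the Deployment's container spec: load every key of the Secret
# as an environment variable (VERGEOS_BASE_URL, VERGEOS_API_KEY, ...).
containers:
  - name: vergeos-ai
    envFrom:
      - secretRef:
          name: vergeos-ai-config
```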
Cloudflare Tunnel & Access
Why Cloudflare Tunnel?
Traditional approaches to exposing homelab services involve:
- Port forwarding (security risk)
- VPNs (inconvenient)
- Reverse proxies with dynamic DNS (complex)
Cloudflare Tunnel solves all of this:
- No Open Ports: Outbound connection from your network
- Automatic HTTPS: SSL/TLS handled by Cloudflare
- DDoS Protection: Cloudflare's network protects you
- Easy Setup: Just run cloudflared in your cluster
Setting Up the Tunnel
First, I added the route to my existing tunnel configuration:
tunnel: 0c70ff17-acd4-4f9f-bfc4-ac9563a09d4f
credentials-file: /etc/cloudflared/creds/credentials.json
ingress:
- hostname: ai.happynoises.work
service: http://vergeos-ai.vergeos-ai.svc.cluster.local:3001
- hostname: dashboard.happynoises.work
service: http://k8s-dashboard.k8s-dashboard.svc.cluster.local:3000
# ... other routes ...
- service: http_status:404
Then created the DNS record:
# CNAME record pointing to tunnel
ai.happynoises.work -> 0c70ff17-acd4-4f9f-bfc4-ac9563a09d4f.cfargotunnel.com
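If you have cloudflared available locally, the CNAME can also be created with the CLI instead of the dashboard:

```shell
# Creates the CNAME to <tunnel-id>.cfargotunnel.com for you.
cloudflared tunnel route dns 0c70ff17-acd4-4f9f-bfc4-ac9563a09d4f ai.happynoises.work
```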
Cloudflare Access: Zero Trust Authentication
Instead of managing passwords or API keys, I used Cloudflare Access for email-based authentication:
# Create Access Application
curl -X POST "https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/access/apps" \
-H "X-Auth-Email: ${CF_EMAIL}" \
-H "X-Auth-Key: ${CF_API_KEY}" \
--data '{
"name": "VergeOS AI Assistant",
"domain": "ai.happynoises.work",
"type": "self_hosted",
"session_duration": "24h",
"auto_redirect_to_identity": true
}'
# Create Access Policy
curl -X POST "https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/access/apps/${APP_ID}/policies" \
--data '{
"name": "Allow Admin Email",
"decision": "allow",
"include": [
{
"email": {
"email": "[email protected]"
}
}
]
}'
Now when I visit https://ai.happynoises.work:
- Cloudflare intercepts the request
- Prompts for my email
- Sends a one-time code to my email
- I enter the code and get 24 hours of access
No passwords to remember, no VPN to connect to!
Deployment Process
Here's my complete deployment script:
#!/bin/bash
# Create namespace
kubectl create namespace vergeos-ai
# Create ConfigMaps from files
kubectl create configmap vergeos-ai-public \
--from-file=public/ \
--namespace=vergeos-ai
# Copy TLS secret
kubectl get secret happynoises-wildcard-tls -n default -o yaml | \
sed 's/namespace: default/namespace: vergeos-ai/' | \
kubectl apply -f -
# Apply manifests
kubectl apply -f k8s-deployment.yaml
# Wait for deployment
kubectl wait --for=condition=available --timeout=120s \
deployment/vergeos-ai -n vergeos-ai
# Setup Cloudflare Tunnel & Access
./setup-cloudflare-tunnel.sh
echo "Deployment complete!"
echo "Access: https://ai.happynoises.work"
Monitoring & Observability
Health Checks
The backend includes a health endpoint:
app.get('/api/health', (req, res) => {
res.json({
status: 'ok',
vergeosUrl: process.env.VERGEOS_BASE_URL,
model: MODEL
});
});
Kubernetes uses this for liveness and readiness probes:
livenessProbe:
httpGet:
path: /api/health
port: 3001
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /api/health
port: 3001
initialDelaySeconds: 10
periodSeconds: 5
Logging
All requests are logged with timestamps:
console.log(`[${new Date().toISOString()}] Chat request with ${messages.length} messages`);
View logs with:
kubectl logs -f -n vergeos-ai -l app=vergeos-ai
Resource Usage
Monitor resource consumption:
kubectl top pod -n vergeos-ai
Current allocation:
- Requests: 256Mi RAM, 100m CPU
- Limits: 512Mi RAM, 500m CPU
Features & User Experience
Streaming Responses
The most impressive feature is real-time streaming. As the AI generates text, it appears word-by-word in the interface, just like ChatGPT:
// Frontend handles streaming chunks
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content || '';
if (content) {
fullContent += content;
contentDiv.innerHTML = formatMessage(fullContent);
scrollToBottom();
}
}
Code Formatting
The interface automatically detects and formats code blocks:
function formatMessage(content) {
let formatted = content; // start from the raw content ('formatted' was previously undefined)
// Format code blocks (```language\ncode\n```)
formatted = formatted.replace(/```(\w+)?\n([\s\S]*?)```/g,
(match, lang, code) => {
return `<pre><code>${code.trim()}</code></pre>`;
}
);
// Format inline code (`code`)
formatted = formatted.replace(/`([^`]+)`/g, '<code>$1</code>');
return formatted;
}
Conversation History
Conversations are automatically saved to localStorage:
function saveConversationHistory() {
localStorage.setItem('vergeosConversation',
JSON.stringify(conversationHistory));
}
function loadConversationHistory() {
const saved = localStorage.getItem('vergeosConversation');
if (saved) {
conversationHistory = JSON.parse(saved);
// Restore messages to UI
conversationHistory.forEach(msg => {
addMessage(msg.role, msg.content);
});
}
}
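localStorage has a size quota (typically around 5MB), so a long-running conversation can eventually fail to save and will also blow past the model's context window. A hedged sketch of capping history before persisting; the `trimHistory` helper is my own addition, not part of the original code:

```javascript
// Keep only the most recent messages so localStorage stays under quota
// and the request payload stays inside the model's context window.
// trimHistory is a hypothetical helper for this sketch.
function trimHistory(history, maxMessages = 50) {
  if (history.length <= maxMessages) return history;
  return history.slice(history.length - maxMessages);
}

// Before saving:
// conversationHistory = trimHistory(conversationHistory);
// localStorage.setItem('vergeosConversation', JSON.stringify(conversationHistory));
```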
Quick Actions
Pre-configured prompts help users get started:
const quickActions = [
"Explain what VergeOS is",
"Write a Python script to list files",
"Explain Kubernetes pods",
"Write a bash script for backups"
];
Troubleshooting & Lessons Learned
Issue 1: Static Files Not Serving
Problem: Getting "Cannot GET /" when accessing the site.
Cause: When using ConfigMaps to mount code, __dirname resolves differently than expected. The path path.join(__dirname, '../public') was looking in the wrong location.
Solution: Use absolute paths instead:
const publicPath = path.resolve('/app/public');
app.use(express.static(publicPath));
Issue 2: Self-Signed Certificate Errors
Problem: APIConnectionError: Connection error with DEPTH_ZERO_SELF_SIGNED_CERT.
Cause: The OpenAI SDK uses fetch (undici) internally, which doesn't respect httpAgent options for SSL verification.
Solution: Set the Node.js environment variable globally:
env:
- name: NODE_TLS_REJECT_UNAUTHORIZED
value: "0"
Issue 3: DNS Propagation
Problem: DNS not resolving immediately after creation.
Solution: Wait 1-5 minutes for DNS propagation. Use nslookup to verify:
nslookup ai.happynoises.work 8.8.8.8
Performance & Scalability
Current Performance
- Response Time: ~60-120 seconds to first token (streaming)
- Throughput: Handles multiple concurrent users, though responses slow noticeably under load
- Memory Usage: ~150-200Mi per pod
- CPU Usage: Elevated, since inference runs on CPU rather than GPU
Horizontal Scaling Options
If I need more capacity:
# Horizontal scaling
kubectl scale deployment vergeos-ai --replicas=3 -n vergeos-ai
# Vertical scaling (edit deployment)
resources:
requests:
memory: "512Mi"
cpu: "200m"
limits:
memory: "1Gi"
cpu: "1000m"
Caching Considerations
For frequently asked questions, I could add Redis caching:
const redis = require('redis');
const client = redis.createClient();
client.connect(); // node-redis v4+ requires an explicit connect()
app.post('/api/chat', async (req, res) => {
const cacheKey = `chat:${JSON.stringify(req.body.messages)}`;
// Check cache
const cached = await client.get(cacheKey);
if (cached) {
return res.json(JSON.parse(cached));
}
// Make API call
const response = await makeAIRequest(req.body.messages);
// Cache for 1 hour
await client.setEx(cacheKey, 3600, JSON.stringify(response));
res.json(response);
});
Future Enhancements (as API coverage permits)
Planned Features
- Multi-Model Support: Switch between different AI models in the UI
- File Uploads: Support for document analysis and image understanding
- Conversation Export: Download chat history as markdown
- Custom System Prompts: User-defined AI personalities
- RAG Integration: Connect to vector databases for knowledge retrieval
- API Rate Limiting: Prevent abuse with rate limiting middleware
- Usage Analytics: Track token usage and costs
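The rate limiting item above doesn't need an external dependency for a first pass. A minimal in-memory sliding-window sketch (the names are mine, it only works per pod, and a production setup would more likely use something like express-rate-limit):

```javascript
// Minimal per-key sliding-window rate limiter (in-memory, single pod only).
// rateLimiter is a hypothetical helper, not part of the original code.
function rateLimiter({ windowMs = 60000, max = 30 } = {}) {
  const hits = new Map(); // key -> array of request timestamps

  return function isAllowed(key, now = Date.now()) {
    // Drop timestamps that have aged out of the window.
    const recent = (hits.get(key) || []).filter((t) => now - t < windowMs);
    if (recent.length >= max) {
      hits.set(key, recent);
      return false;
    }
    recent.push(now);
    hits.set(key, recent);
    return true;
  };
}

// As Express middleware this check would run against req.ip and
// respond with HTTP 429 when isAllowed returns false.
```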
Advanced Features
- Multi-User Support: User accounts with separate conversation histories
- Team Workspaces: Shared conversations for collaboration
- Prompt Templates: Library of reusable prompts
- Integration Webhooks: Connect to other services (Slack, Discord, etc.)
Key Takeaways
What Worked Well
- OpenAI SDK Compatibility: Using the standard SDK made development easy
- ConfigMap Deployment: Fast iteration without Docker builds
- Cloudflare Tunnel: Zero-hassle secure access without port forwarding
- Streaming Responses: Great user experience with real-time feedback
- Zero Trust Security: Email-based auth is simple and secure
What I'd Do Differently
- Use Proper Certificates: Set up Let's Encrypt for VergeOS
- Add Monitoring: Prometheus metrics for better observability
- Better Error Handling: More graceful degradation on API failures
- User Management: Proper multi-user support from the start
Resources & References
Documentation
- VergeOS Documentation
- OpenAI API Reference
- Cloudflare Tunnel Docs
- Cloudflare Access Docs
- Kubernetes ConfigMaps
Conclusion
Building a self-hosted AI assistant has been an incredible learning experience. Combining VergeOS for AI model hosting, Kubernetes for orchestration, and Cloudflare for security and access produced a polished, ChatGPT-like tool built entirely from self-hosted parts.
The best part? Everything runs in my homelab, giving me complete control over my data and infrastructure. No monthly subscriptions, no data leaving my network (except through the encrypted tunnel), and the ability to customize every aspect of the experience. The same approach works just as well in your datacentre!
If you're running a homelab and want to experiment with AI, I highly recommend this approach. The combination of modern tools and best practices makes it surprisingly straightforward to build something truly impressive.
The scripts are on GitHub:
https://github.com/dvvincent/vergeos-ai-interface
Questions?
Feel free to reach out if you have questions about the setup or want to discuss homelab AI deployments. I'm always happy to help fellow homelabbers!
Tags: #homelab #kubernetes #ai #vergeos #cloudflare #self-hosted #openai #rke2 #docker #devops
Published: November 29, 2025
Last Updated: November 29, 2025