Building a Self-Hosted AI Assistant with VergeOS, Kubernetes, and Cloudflare
Running AI models in your infrastructure is exciting, but providing easy access to internal customers through a beautiful, secure web interface takes it to the next level. In this post, I'll walk you through how HappyNoises.work (a fictional IT company) built a complete AI assistant web application to provide AI inference capabilities to internal users. The solution connects to AI models running on VergeOS infrastructure, deploys to Kubernetes, and is secured with Cloudflare Tunnel and Access.
The Goal
At HappyNoises.work, we wanted to give our internal customers easy access to AI inference capabilities. The goal was a ChatGPT-like interface for our self-hosted AI models, with these requirements:
- Beautiful UI: Modern, responsive design with real-time streaming responses
- Self-hosted Backend: Connect to AI models running on our VergeOS infrastructure
- Kubernetes Native: Deploy as a containerized application in our cluster
- Secure Access: Protect with Cloudflare Access authentication for internal users
- Zero Trust: No exposed ports, everything through Cloudflare Tunnel
- Internal Access: Provide a simple, familiar interface for employees to leverage AI capabilities
Architecture Overview
Here's the complete architecture:
User Browser
↓
https://ai.happynoises.work (Cloudflare Access)
↓
Cloudflare Tunnel (0c70ff17-acd4-4f9f-bfc4-ac9563a09d4f)
↓
Kubernetes Service (vergeos-ai.vergeos-ai:3001)
↓
Node.js/Express Backend
↓
VergeOS AI API (https://192.168.1.111/v1)
↓
SmolM3 Model (Self-hosted)
Key Components
- Frontend: HTML/CSS/JavaScript with streaming support
- Backend: Node.js with Express and OpenAI SDK
- AI Models: Self-hosted on VergeOS (SmolM3, qwen3-coder-14B)
- Deployment: Kubernetes with ConfigMaps for code
- Security: Cloudflare Tunnel + Access for Zero Trust authentication
VergeOS: The AI Backend
What is VergeOS?
VergeOS is a hyperconverged infrastructure platform that I'm using to run my AI models. With the latest version of VergeOS, they've added exciting new AI capabilities that provide an OpenAI-compatible API endpoint. This means I can use the standard OpenAI SDK to interact with my self-hosted models without any modifications.
Note: These AI features are brand new in the latest VergeOS release. For more details on VergeOS AI capabilities, check out the official documentation at docs.verge.io.
OpenAI-Compatible API
The beauty of VergeOS is that it exposes an OpenAI-compatible endpoint:
const client = new OpenAI({
baseURL: 'https://192.168.1.111/v1',
apiKey: 'your-api-key'
});
const response = await client.chat.completions.create({
model: 'SmolM3',
messages: [
{ role: 'user', content: 'Hello!' }
]
});
This compatibility means I can use the official OpenAI SDK without modifications, making development much easier.
Models Available
I'm currently running:
- SmolM3: A compact, efficient model great for general tasks
- qwen3-coder-14B: A 14B parameter model specialized for coding tasks
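If the endpoint follows the OpenAI spec closely, the available models should also be discoverable via `GET /v1/models`. A quick check from the CLI (the `-k` flag skips certificate verification, since my instance uses a self-signed certificate; this assumes VergeOS implements the models endpoint):

```shell
# List models exposed by the OpenAI-compatible endpoint (hedged: assumes
# VergeOS implements GET /v1/models like the OpenAI spec).
curl -k -H "Authorization: Bearer your-api-key" \
  https://192.168.1.111/v1/models
```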
Building the Web Interface
Frontend Design
I wanted a modern, ChatGPT-like interface with these features:
- Gradient Design: Purple/blue gradient theme
- Real-time Streaming: Word-by-word responses using Server-Sent Events (SSE)
- Conversation History: Saved in browser localStorage
- Code Highlighting: Automatic syntax highlighting for code blocks
- Quick Actions: Pre-configured prompts for common tasks
- Mobile Responsive: Works on all devices
Here's a snippet of the streaming implementation:
async function sendStreamingRequest() {
const response = await fetch('/api/chat/stream', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ messages: conversationHistory })
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
const lines = chunk.split('\n').filter(line => line.trim() !== '');
for (const line of lines) {
if (line.startsWith('data: ')) {
const data = line.slice(6);
if (data === '[DONE]') continue;
const parsed = JSON.parse(data);
if (parsed.content) {
// Display content in real-time
contentDiv.innerHTML += parsed.content;
}
}
}
}
}
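One caveat with the loop above: `chunk.split('\n')` assumes every network read contains whole SSE lines, but an event can be split across two reads, which makes `JSON.parse` throw on a half-received payload. A small buffering parser avoids that; this is a hedged sketch, and `parseSSEChunk` is my own name, not part of the original code:

```javascript
// Accumulates partial SSE data across reads and returns only complete events.
// parseSSEChunk is a hypothetical helper name for this sketch.
function parseSSEChunk(buffer, chunk) {
  buffer += chunk;
  const events = [];
  let index;
  // A complete SSE event ends with a blank line ("\n\n").
  while ((index = buffer.indexOf('\n\n')) !== -1) {
    const rawEvent = buffer.slice(0, index);
    buffer = buffer.slice(index + 2);
    for (const line of rawEvent.split('\n')) {
      if (line.startsWith('data: ')) {
        events.push(line.slice(6));
      }
    }
  }
  // Return complete events plus the unconsumed remainder for the next read.
  return { events, buffer };
}

// Usage inside the read loop:
// let buf = '';
// const result = parseSSEChunk(buf, decoder.decode(value));
// buf = result.buffer;
// for (const data of result.events) {
//   if (data !== '[DONE]') { /* JSON.parse(data) is now safe */ }
// }
```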
Backend Implementation
The backend is a simple Express.js server that acts as a proxy between the frontend and VergeOS:
const express = require('express');
const { OpenAI } = require('openai');
const https = require('https');
const app = express();
// Initialize OpenAI client with VergeOS endpoint
const httpsAgent = new https.Agent({ rejectUnauthorized: false });
const client = new OpenAI({
baseURL: process.env.VERGEOS_BASE_URL,
apiKey: process.env.VERGEOS_API_KEY,
httpAgent: httpsAgent,
httpsAgent: httpsAgent
});
// Streaming endpoint
app.post('/api/chat/stream', async (req, res) => {
const { messages } = req.body;
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
res.setHeader('Connection', 'keep-alive');
const stream = await client.chat.completions.create({
model: 'SmolM3',
messages: messages,
stream: true
});
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content || '';
if (content) {
res.write(`data: ${JSON.stringify({ content })}\n\n`);
}
}
res.write('data: [DONE]\n\n');
res.end();
});
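If the VergeOS call fails mid-stream or the browser disconnects, the loop above leaves the response hanging. One way to harden it is to wrap the iteration and emit an error event before closing; a minimal sketch, where `pipeCompletionStream` is my own helper name and `writeFn` stands in for `res.write`:

```javascript
// Forwards OpenAI-style streaming chunks as SSE frames, with basic error
// handling. pipeCompletionStream is a hypothetical helper for this sketch.
async function pipeCompletionStream(stream, writeFn) {
  try {
    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content || '';
      if (content) {
        writeFn(`data: ${JSON.stringify({ content })}\n\n`);
      }
    }
  } catch (err) {
    // Surface the failure to the client instead of silently dropping the stream.
    writeFn(`data: ${JSON.stringify({ error: String(err.message || err) })}\n\n`);
  } finally {
    // Always terminate the stream so the frontend loop exits cleanly.
    writeFn('data: [DONE]\n\n');
  }
}
```

In the Express handler this becomes `await pipeCompletionStream(stream, (s) => res.write(s)); res.end();`, ideally with a `req.on('close', ...)` listener to abort the upstream request when the browser goes away.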
Handling Self-Signed Certificates
Since my VergeOS instance uses a self-signed certificate, I needed to disable SSL verification:
// Environment variable approach (recommended)
process.env.NODE_TLS_REJECT_UNAUTHORIZED = '0';
// Or via HTTPS agent
const httpsAgent = new https.Agent({
rejectUnauthorized: false
});
In production, you'd want to use proper certificates, but for a homelab, this works perfectly.
Kubernetes Deployment
Why Kubernetes?
Deploying to Kubernetes gives me:
- High Availability: Automatic restarts if the pod crashes
- Easy Updates: Rolling deployments with zero downtime
- Resource Management: CPU and memory limits
- Service Discovery: Internal DNS for service communication
- Scalability: Easy to add more replicas if needed
Deployment Strategy
I used an interesting approach: deploying code via ConfigMaps instead of building Docker images. This makes updates incredibly fast:
apiVersion: v1
kind: ConfigMap
metadata:
name: vergeos-ai-server
namespace: vergeos-ai
data:
index.js: |
const express = require('express');
// ... entire server code here ...
The deployment then mounts this ConfigMap:
apiVersion: apps/v1
kind: Deployment
metadata:
name: vergeos-ai
spec:
template:
spec:
containers:
- name: vergeos-ai
image: node:18-alpine
command:
- sh
- -c
- |
cd /app
npm install express cors openai dotenv
node server/index.js
volumeMounts:
- name: server-code
mountPath: /app/server
- name: public-files
mountPath: /app/public
volumes:
- name: server-code
configMap:
name: vergeos-ai-server
- name: public-files
configMap:
name: vergeos-ai-public
Benefits of This Approach
- Fast Updates: Just update the ConfigMap and restart the pod
- No Docker Build: No need to build and push images
- Version Control: Code is in Git, ConfigMaps generated from files
- Easy Debugging: Can exec into pod and edit code live
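To keep the ConfigMaps in sync with the files in Git, they can be regenerated idempotently with a dry-run/apply pipeline (file paths here are illustrative):

```shell
# Regenerate the server-code ConfigMap from the checked-in file.
# --dry-run=client renders the manifest locally; piping to apply makes it idempotent.
kubectl create configmap vergeos-ai-server \
  --from-file=index.js=server/index.js \
  --namespace=vergeos-ai \
  --dry-run=client -o yaml | kubectl apply -f -

# Restart the pod so it picks up the new code.
kubectl rollout restart deployment/vergeos-ai -n vergeos-ai
```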
Secrets Management
API keys and sensitive data are stored in Kubernetes Secrets:
apiVersion: v1
kind: Secret
metadata:
name: vergeos-ai-config
type: Opaque
stringData:
VERGEOS_BASE_URL: "https://192.168.1.111/v1"
VERGEOS_API_KEY: "your-api-key"
VERGEOS_MODEL: "SmolM3"
NODE_TLS_REJECT_UNAUTHORIZED: "0"
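The Deployment consumes this Secret as environment variables. A typical wiring inside the container spec (standard Kubernetes `envFrom`):

```yaml
# Inside the Deployment's container spec: load every key of the Secret
# as an environment variable (VERGEOS_BASE_URL, VERGEOS_API_KEY, ...).
containers:
  - name: vergeos-ai
    envFrom:
      - secretRef:
          name: vergeos-ai-config
```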
Cloudflare Tunnel & Access
Why Cloudflare Tunnel?
Traditional approaches to exposing homelab services involve:
- Port forwarding (security risk)
- VPNs (inconvenient)
- Reverse proxies with dynamic DNS (complex)
Cloudflare Tunnel solves all of this:
- No Open Ports: Outbound connection from your network
- Automatic HTTPS: SSL/TLS handled by Cloudflare
- DDoS Protection: Cloudflare's network protects you
- Easy Setup: Just run cloudflared in your cluster
Setting Up the Tunnel
First, I added the route to my existing tunnel configuration:
tunnel: 0c70ff17-acd4-4f9f-bfc4-ac9563a09d4f
credentials-file: /etc/cloudflared/creds/credentials.json
ingress:
- hostname: ai.happynoises.work
service: http://vergeos-ai.vergeos-ai.svc.cluster.local:3001
- hostname: dashboard.happynoises.work
service: http://k8s-dashboard.k8s-dashboard.svc.cluster.local:3000
# ... other routes ...
- service: http_status:404
Then created the DNS record:
# CNAME record pointing to tunnel
ai.happynoises.work -> 0c70ff17-acd4-4f9f-bfc4-ac9563a09d4f.cfargotunnel.com
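If you have cloudflared available locally, the CNAME can also be created with the CLI instead of the dashboard:

```shell
# Creates the CNAME to <tunnel-id>.cfargotunnel.com for you.
cloudflared tunnel route dns 0c70ff17-acd4-4f9f-bfc4-ac9563a09d4f ai.happynoises.work
```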
Cloudflare Access: Zero Trust Authentication
Instead of managing passwords or API keys, I used Cloudflare Access for email-based authentication:
# Create Access Application
curl -X POST "https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/access/apps" \
-H "X-Auth-Email: ${CF_EMAIL}" \
-H "X-Auth-Key: ${CF_API_KEY}" \
--data '{
"name": "VergeOS AI Assistant",
"domain": "ai.happynoises.work",
"type": "self_hosted",
"session_duration": "24h",
"auto_redirect_to_identity": true
}'
# Create Access Policy
curl -X POST "https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/access/apps/${APP_ID}/policies" \
--data '{
"name": "Allow Admin Email",
"decision": "allow",
"include": [
{
"email": {
"email": "[email protected]"
}
}
]
}'
Now when I visit https://ai.happynoises.work:
- Cloudflare intercepts the request
- Prompts for my email
- Sends a one-time code to my email
- I enter the code and get 24 hours of access
No passwords to remember, no VPN to connect to!
Deployment Process
Here's my complete deployment script:
#!/bin/bash
# Create namespace
kubectl create namespace vergeos-ai
# Create ConfigMaps from files
kubectl create configmap vergeos-ai-public \
--from-file=public/ \
--namespace=vergeos-ai
# Copy TLS secret
kubectl get secret happynoises-wildcard-tls -n default -o yaml | \
sed 's/namespace: default/namespace: vergeos-ai/' | \
kubectl apply -f -
# Apply manifests
kubectl apply -f k8s-deployment.yaml
# Wait for deployment
kubectl wait --for=condition=available --timeout=120s \
deployment/vergeos-ai -n vergeos-ai
# Setup Cloudflare Tunnel & Access
./setup-cloudflare-tunnel.sh
echo "Deployment complete!"
echo "Access: https://ai.happynoises.work"
Monitoring & Observability
Health Checks
The backend includes a health endpoint:
app.get('/api/health', (req, res) => {
res.json({
status: 'ok',
vergeosUrl: process.env.VERGEOS_BASE_URL,
model: MODEL
});
});
Kubernetes uses this for liveness and readiness probes:
livenessProbe:
httpGet:
path: /api/health
port: 3001
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /api/health
port: 3001
initialDelaySeconds: 10
periodSeconds: 5
Logging
All requests are logged with timestamps:
console.log(`[${new Date().toISOString()}] Chat request with ${messages.length} messages`);
View logs with:
kubectl logs -f -n vergeos-ai -l app=vergeos-ai
Resource Usage
Monitor resource consumption:
kubectl top pod -n vergeos-ai
Current allocation:
- Requests: 256Mi RAM, 100m CPU
- Limits: 512Mi RAM, 500m CPU
Features & User Experience
Streaming Responses
The most impressive feature is real-time streaming. As the AI generates text, it appears word-by-word in the interface, just like ChatGPT:
// Frontend handles streaming chunks
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content || '';
if (content) {
fullContent += content;
contentDiv.innerHTML = formatMessage(fullContent);
scrollToBottom();
}
}
Code Formatting
The interface automatically detects and formats code blocks:
function formatMessage(content) {
let formatted = content; // start from the raw content ('formatted' was previously undefined)
// Format code blocks (```language\ncode\n```)
formatted = formatted.replace(/```(\w+)?\n([\s\S]*?)```/g,
(match, lang, code) => {
return `<pre><code>${code.trim()}</code></pre>`;
}
);
// Format inline code (`code`)
formatted = formatted.replace(/`([^`]+)`/g, '<code>$1</code>');
return formatted;
}
Conversation History
Conversations are automatically saved to localStorage:
function saveConversationHistory() {
localStorage.setItem('vergeosConversation',
JSON.stringify(conversationHistory));
}
function loadConversationHistory() {
const saved = localStorage.getItem('vergeosConversation');
if (saved) {
conversationHistory = JSON.parse(saved);
// Restore messages to UI
conversationHistory.forEach(msg => {
addMessage(msg.role, msg.content);
});
}
}
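localStorage has a size quota (typically around 5MB), so a long-running conversation can eventually fail to save and will also blow past the model's context window. A hedged sketch of capping history before persisting; the `trimHistory` helper is my own addition, not part of the original code:

```javascript
// Keep only the most recent messages so localStorage stays under quota
// and the request payload stays inside the model's context window.
// trimHistory is a hypothetical helper for this sketch.
function trimHistory(history, maxMessages = 50) {
  if (history.length <= maxMessages) return history;
  return history.slice(history.length - maxMessages);
}

// Before saving:
// conversationHistory = trimHistory(conversationHistory);
// localStorage.setItem('vergeosConversation', JSON.stringify(conversationHistory));
```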
Quick Actions
Pre-configured prompts help users get started:
const quickActions = [
"Explain what VergeOS is",
"Write a Python script to list files",
"Explain Kubernetes pods",
"Write a bash script for backups"
];
Troubleshooting & Lessons Learned
Issue 1: Static Files Not Serving
Problem: Getting "Cannot GET /" when accessing the site.
Cause: When using ConfigMaps to mount code, __dirname resolves differently than expected. The path path.join(__dirname, '../public') was looking in the wrong location.
Solution: Use absolute paths instead:
const publicPath = path.resolve('/app/public');
app.use(express.static(publicPath));
Issue 2: Self-Signed Certificate Errors
Problem: APIConnectionError: Connection error with DEPTH_ZERO_SELF_SIGNED_CERT.
Cause: The OpenAI SDK uses fetch (undici) internally, which doesn't respect httpAgent options for SSL verification.
Solution: Set the Node.js environment variable globally:
env:
- name: NODE_TLS_REJECT_UNAUTHORIZED
value: "0"
Issue 3: DNS Propagation
Problem: DNS not resolving immediately after creation.
Solution: Wait 1-5 minutes for DNS propagation. Use nslookup to verify:
nslookup ai.happynoises.work 8.8.8.8
Performance & Scalability
Current Performance
- Response Time: ~60-120 seconds to first token (streaming)
- Throughput: Handles multiple concurrent users, though responses slow noticeably under load
- Memory Usage: ~150-200Mi per pod
- CPU Usage: Elevated, since inference runs on CPU rather than GPU
Horizontal Scaling Options
If I need more capacity:
# Horizontal scaling
kubectl scale deployment vergeos-ai --replicas=3 -n vergeos-ai
# Vertical scaling (edit deployment)
resources:
requests:
memory: "512Mi"
cpu: "200m"
limits:
memory: "1Gi"
cpu: "1000m"
Caching Considerations
For frequently asked questions, I could add Redis caching:
const redis = require('redis');
const client = redis.createClient();
client.connect(); // node-redis v4+ requires an explicit connect()
app.post('/api/chat', async (req, res) => {
const cacheKey = `chat:${JSON.stringify(req.body.messages)}`;
// Check cache
const cached = await client.get(cacheKey);
if (cached) {
return res.json(JSON.parse(cached));
}
// Make API call
const response = await makeAIRequest(req.body.messages);
// Cache for 1 hour
await client.setEx(cacheKey, 3600, JSON.stringify(response));
res.json(response);
});
Future Enhancements (as API coverage permits)
Planned Features
- Multi-Model Support: Switch between different AI models in the UI
- File Uploads: Support for document analysis and image understanding
- Conversation Export: Download chat history as markdown
- Custom System Prompts: User-defined AI personalities
- RAG Integration: Connect to vector databases for knowledge retrieval
- API Rate Limiting: Prevent abuse with rate limiting middleware
- Usage Analytics: Track token usage and costs
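The rate limiting item above doesn't need an external dependency for a first pass. A minimal in-memory sliding-window sketch (the names are mine, it only works per pod, and a production setup would more likely use something like express-rate-limit):

```javascript
// Minimal per-key sliding-window rate limiter (in-memory, single pod only).
// rateLimiter is a hypothetical helper, not part of the original code.
function rateLimiter({ windowMs = 60000, max = 30 } = {}) {
  const hits = new Map(); // key -> array of request timestamps

  return function isAllowed(key, now = Date.now()) {
    // Drop timestamps that have aged out of the window.
    const recent = (hits.get(key) || []).filter((t) => now - t < windowMs);
    if (recent.length >= max) {
      hits.set(key, recent);
      return false;
    }
    recent.push(now);
    hits.set(key, recent);
    return true;
  };
}

// As Express middleware this check would run against req.ip and
// respond with HTTP 429 when isAllowed returns false.
```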
Advanced Features
- Multi-User Support: User accounts with separate conversation histories
- Team Workspaces: Shared conversations for collaboration
- Prompt Templates: Library of reusable prompts
- Integration Webhooks: Connect to other services (Slack, Discord, etc.)
Key Takeaways
What Worked Well
- OpenAI SDK Compatibility: Using the standard SDK made development easy
- ConfigMap Deployment: Fast iteration without Docker builds
- Cloudflare Tunnel: Zero-hassle secure access without port forwarding
- Streaming Responses: Great user experience with real-time feedback
- Zero Trust Security: Email-based auth is simple and secure
What I'd Do Differently
- Use Proper Certificates: Set up Let's Encrypt for VergeOS
- Add Monitoring: Prometheus metrics for better observability
- Better Error Handling: More graceful degradation on API failures
- User Management: Proper multi-user support from the start
Resources & References
Documentation
- VergeOS Documentation
- OpenAI API Reference
- Cloudflare Tunnel Docs
- Cloudflare Access Docs
- Kubernetes ConfigMaps
Conclusion
Building a self-hosted AI assistant has been an incredible learning experience. Combining VergeOS for AI model hosting, Kubernetes for orchestration, and Cloudflare for security and access produced a polished, ChatGPT-like tool built entirely from self-hosted parts.
The best part? Everything runs in my homelab, giving me complete control over my data and infrastructure. No monthly subscriptions, no data leaving my network (except through the encrypted tunnel), and the ability to customize every aspect of the experience. The same approach works just as well in your datacentre!
If you're running a homelab and want to experiment with AI, I highly recommend this approach. The combination of modern tools and best practices makes it surprisingly straightforward to build something truly impressive.
The scripts are on GitHub:
https://github.com/dvvincent/vergeos-ai-interface
Questions?
Feel free to reach out if you have questions about the setup or want to discuss homelab AI deployments. I'm always happy to help fellow homelabbers!
Tags: #homelab #kubernetes #ai #vergeos #cloudflare #self-hosted #openai #rke2 #docker #devops
Published: November 29, 2025
Last Updated: November 29, 2025