121 Seconds to Kubernetes: Automating Talos Linux on VergeOS with Terraform
What if I told you that you could deploy a production-grade Kubernetes cluster in just over 2 minutes—with zero SSH access, no package managers, and a completely immutable OS?
Most Kubernetes deployments involve hours of configuration, security hardening, and manual intervention. You're juggling Ansible playbooks, fighting with systemd, and praying your SSH keys don't get compromised. There's a better way.
Immutability isn't just a buzzword; it's a security posture. And when you combine Talos Linux—a Kubernetes-only OS with no shell access—with VergeOS's API-driven infrastructure, you get something remarkable: a fully automated, security-hardened cluster that goes from zero to production in the time it takes to grab a coffee.
In our previous article, we explored how to break free from the GUI and master VergeOS using the verge-cli. Today, we're taking that automation to the next level by deploying Talos Linux on VergeOS using Terraform.
This isn't a theoretical guide. This is the complete, battle-tested workflow including every wait time, every troubleshooting step, and every detail you need to successfully deploy production-grade Kubernetes clusters on VergeOS. We're going from VM creation to kubectl get nodes in 121 seconds.
🚀 Why Talos on VergeOS?
Talos Linux removes the complexity of traditional Linux distributions. There is no SSH, no shells, and the entire filesystem is read-only. Management is handled exclusively via a gRPC API (talosctl).
When paired with VergeOS's high-performance VSAN and native API, you get a "Mini-Cloud" that is:
- Immutable: OS configuration is defined in a single YAML file
- Self-Healing: Talos manages the K8s lifecycle automatically
- Atomic: Updates are handled via image swaps, not package managers
- API-Driven: Everything from VM provisioning to cluster bootstrapping is automated
📋 Prerequisites
Before starting, ensure you have:
- VergeOS Instance: Running and accessible (e.g., 192.168.1.111)
- Admin Credentials: Username and password for API access
- Terraform: Installed with the VergeOS provider configured
- talosctl: v1.12.2 or compatible version
- kubectl: For Kubernetes cluster management
- Python 3: For the IP discovery script
- Network: A VergeOS vnet configured for your VMs (e.g., vnet 17)
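A quick pre-flight check can confirm that the command-line tools from the list above are on your PATH before you start. The following is a minimal sketch, not part of the original repository; the `missing_tools` helper and the `REQUIRED_TOOLS` list are illustrative names chosen here.

```python
import shutil

# Command-line tools named in the prerequisites list.
REQUIRED_TOOLS = ["terraform", "talosctl", "kubectl", "python3"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH.

    `which` is injectable so the check can be exercised without the tools installed.
    """
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Missing tools: " + ", ".join(missing))
else:
    print("All required tools found.")
```

This only checks that each binary exists; it does not verify versions such as talosctl v1.12.2.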
🏗️ The Complete Deployment Architecture
Our workflow consists of these phases:
- VM Provisioning: Deploy VMs with Terraform
- IP Discovery: Automatically find VM IP addresses
- Configuration Generation: Create Talos configs for control plane and workers
- Cluster Initialization: Apply configs and bootstrap the cluster
- Verification: Confirm cluster health and readiness
📦 Phase 1: VM Provisioning with Terraform
Directory Structure
talos-vergeos-automation/
├── main.tf # VM resource definitions
├── provider.tf # VergeOS provider configuration
├── variables.tf # Input variables
├── scripts/
│ └── get_verge_ip.py # IP discovery automation
└── README.md
Terraform Configuration
provider.tf:
terraform {
  required_providers {
    vergeio = {
      source = "verge-io/vergeio"
    }
  }
}

provider "vergeio" {
  host     = var.vergeos_host
  username = var.vergeos_user
  password = var.vergeos_pass
  insecure = true
}
variables.tf:
variable "vergeos_host" {
  type        = string
  description = "VergeOS Host URL/IP"
  default     = "192.168.1.111"
}

variable "vergeos_user" {
  type        = string
  description = "VergeOS Username"
}

variable "vergeos_pass" {
  type        = string
  description = "VergeOS Password"
  sensitive   = true
}

variable "talos_image_id" {
  type        = string
  description = "ID of the Talos ISO image in VergeOS"
  default     = "107" # Update with your ISO ID
}
main.tf:
resource "vergeio_vm" "talos_cp" {
  name       = "talos-cp-02"
  cpu_cores  = 4
  ram        = 8192
  powerstate = true
  boot_order = "c" # Prioritize CD-ROM

  vergeio_drive {
    name      = "OS Disk"
    disksize  = 50
    interface = "virtio-scsi"
    media     = "disk"
  }

  vergeio_drive {
    name         = "Talos ISO"
    media        = "cdrom"
    media_source = var.talos_image_id
    interface    = "ide"
  }

  vergeio_nic {
    name = "eth0"
    vnet = "17" # Your Kubernetes network
  }
}

resource "vergeio_vm" "talos_worker" {
  name       = "talos-worker-02"
  cpu_cores  = 4
  ram        = 8192
  powerstate = true
  boot_order = "c"

  vergeio_drive {
    name      = "OS Disk"
    disksize  = 50
    interface = "virtio-scsi"
    media     = "disk"
  }

  vergeio_drive {
    name         = "Talos ISO"
    media        = "cdrom"
    media_source = var.talos_image_id
    interface    = "ide"
  }

  vergeio_nic {
    name = "eth0"
    vnet = "17"
  }
}

output "talos_cp_ip" {
  value = vergeio_vm.talos_cp.vergeio_nic[0].ipaddress
}

output "talos_worker_ip" {
  value = vergeio_vm.talos_worker.vergeio_nic[0].ipaddress
}
Deploy the VMs
# Initialize Terraform
terraform init
# Deploy the infrastructure
terraform apply -var="vergeos_user=admin" -var="vergeos_pass=YourPassword" -auto-approve
Expected Output:
vergeio_vm.talos_cp: Creating...
vergeio_vm.talos_worker: Creating...
vergeio_vm.talos_cp: Creation complete after 11s [id=72]
vergeio_vm.talos_worker: Creation complete after 11s [id=73]
Apply complete! Resources: 2 added, 0 changed, 0 destroyed.
⏱️ Timing Note: VM creation takes approximately 10-15 seconds.
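The two `output` blocks in main.tf can be read back with `terraform output -json`, which wraps each output in an object holding `sensitive`, `type`, and `value` keys. The sketch below parses that structure. The `parse_outputs` and `read_outputs` helpers are hypothetical, and it is an assumption that the provider has populated `ipaddress` by the time you run them; because the VMs get their addresses over DHCP after boot, the values may not be usable immediately after apply.

```python
import json
import subprocess


def parse_outputs(raw_json):
    """Map each Terraform output name to its value.

    `terraform output -json` emits {"name": {"sensitive": ..., "type": ..., "value": ...}}.
    """
    return {name: entry.get("value") for name, entry in json.loads(raw_json).items()}


def read_outputs(workdir="."):
    """Run `terraform output -json` in `workdir` and return the parsed values."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_outputs(result.stdout)
```

Calling `read_outputs()` from the project directory would return a dictionary keyed by `talos_cp_ip` and `talos_worker_ip`.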
🔍 Phase 2: Automated IP Discovery
The VMs boot from the Talos ISO and obtain DHCP addresses. We use a custom Python script to discover these IPs automatically.
The IP Discovery Script
scripts/get_verge_ip.py:
#!/usr/bin/env python3
"""Print the IP address of a VergeOS VM, polling until one is assigned."""
import argparse
import os
import sys
import time

import requests
import urllib3

# Requests below use verify=False, so silence the resulting InsecureRequestWarning.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


def get_env_var(var_name, default=None):
    return os.environ.get(var_name, default)


VERGE_HOST = get_env_var("VERGEOS_HOST", "192.168.1.111")
VERGE_USER = get_env_var("VERGEOS_USER", "admin")
VERGE_PASS = get_env_var("VERGEOS_PASS")


def api_request(method, endpoint, auth=None, params=None):
    """Call the VergeOS API and return the decoded JSON body; exit on any request error."""
    url = f"https://{VERGE_HOST}/api{endpoint}"
    try:
        response = requests.request(
            method, url, auth=auth, params=params, verify=False, timeout=30
        )
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)


def main():
    parser = argparse.ArgumentParser(description="Get VergeOS VM IP Address")
    parser.add_argument("--machine-name", type=str, required=True, help="Machine Name")
    parser.add_argument("--timeout", type=int, default=300, help="Wait timeout in seconds")
    args = parser.parse_args()

    if not VERGE_PASS:
        print("VERGEOS_PASS is not set.", file=sys.stderr)
        sys.exit(1)
    auth = (VERGE_USER, VERGE_PASS)

    # Resolve machine name to ID
    params = {"filter": f"name eq '{args.machine_name}'"}
    vms = api_request("GET", "/v4/vms", auth=auth, params=params)
    if not vms:
        print(f"Machine '{args.machine_name}' not found.", file=sys.stderr)
        sys.exit(1)
    machine_id = vms[0]['machine']
    print(f"Resolved '{args.machine_name}' to Machine ID: {machine_id}", file=sys.stderr)

    # Wait for IP address: look up the VM's NICs, then the address bound to each MAC
    start_time = time.time()
    while True:
        params = {"filter": f"machine eq {machine_id}"}
        nics = api_request("GET", "/v4/machine_nics", auth=auth, params=params)
        for nic in nics:
            mac = nic.get('macaddress')
            if not mac:
                continue
            params = {"filter": f"mac eq '{mac}'"}
            addrs = api_request("GET", "/v4/vnet_addresses", auth=auth, params=params)
            for addr in addrs:
                ip = addr.get('ip')
                if ip:
                    # Only the IP goes to stdout so callers can capture it with $(...)
                    print(ip)
                    sys.exit(0)

        elapsed = int(time.time() - start_time)
        if elapsed > args.timeout:
            print("Timeout waiting for IP", file=sys.stderr)
            sys.exit(1)
        print(f"Waiting for IP... ({elapsed}/{args.timeout}s)", file=sys.stderr)
        time.sleep(5)


if __name__ == "__main__":
    main()
Discover VM IP Addresses
# Set credentials
export VERGEOS_USER="admin"
export VERGEOS_PASS="YourPassword"
# Discover control plane IP (waits up to 5 minutes)
CP_IP=$(python3 scripts/get_verge_ip.py --machine-name talos-cp-02 --timeout 300)
echo "Control Plane IP: $CP_IP"
# Discover worker IP
WORKER_IP=$(python3 scripts/get_verge_ip.py --machine-name talos-worker-02 --timeout 300)
echo "Worker IP: $WORKER_IP"
Expected Output:
Resolved 'talos-cp-02' to Machine ID: 94
Waiting for IP... (0/300s)
10.0.6.166
Resolved 'talos-worker-02' to Machine ID: 95
10.0.6.165
⏱️ Timing Note: IP discovery typically takes 5-30 seconds after VM boot, depending on DHCP response time.
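The discovery script writes progress messages to stderr and only the address to stdout, which is why `$(...)` captures a clean value. Before handing that value to talosctl, it can be worth confirming that it really is an IP address. The sketch below is one way to do that with the standard library; `parse_discovered_ip` is a hypothetical helper, not part of the original script.

```python
import ipaddress


def parse_discovered_ip(stdout):
    """Return the last non-empty stdout line if it is a valid IP address.

    Raises ValueError when the output is empty or is not an IP address.
    """
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("discovery script printed nothing on stdout")
    # ipaddress.ip_address raises ValueError for malformed input.
    return str(ipaddress.ip_address(lines[-1]))
```

A wrapper could run `get_verge_ip.py` with `subprocess.run(..., capture_output=True, text=True)` and pass `result.stdout` to this function.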
⚙️ Phase 3: Generate Talos Configuration
With the IP addresses discovered, generate the Talos cluster configuration.
# Generate configuration for the cluster
talosctl gen config talos-cluster-2 https://$CP_IP:6443
Expected Output:
generating PKI and tokens
Created /path/to/controlplane.yaml
Created /path/to/worker.yaml
Created /path/to/talosconfig
This creates three files:
- controlplane.yaml: Configuration for control plane nodes
- worker.yaml: Configuration for worker nodes
- talosconfig: Client configuration for talosctl
⏱️ Timing Note: Configuration generation is instant (< 1 second).
🚀 Phase 4: Apply Configuration and Bootstrap
Step 1: Apply Configuration to Nodes
# Apply control plane configuration
talosctl apply-config --nodes $CP_IP --file controlplane.yaml --insecure
# Apply worker configuration
talosctl apply-config --nodes $WORKER_IP --file worker.yaml --insecure
⏱️ Timing Note: Each apply-config command completes in 1-2 seconds, but the nodes need time to process the configuration.
⚠️ CRITICAL WAIT TIME: After applying configurations, wait 30-60 seconds before bootstrapping to allow:
- Talos to process the configuration
- etcd to initialize
- Network stack to stabilize
- Talos API to become available
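A fixed wait works, but polling for readiness can be more robust on slower hardware. The sketch below shows a generic polling helper plus a TCP port check. The helper names are hypothetical. Port 50000 is the default Talos API port; an open port is only a rough signal, since it does not prove the node has finished processing its configuration, so treat this as a starting point rather than a guarantee.

```python
import socket
import time


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until(check, timeout=120.0, interval=5.0, sleep=time.sleep, clock=time.monotonic):
    """Call `check` every `interval` seconds until it returns True.

    Returns True on success, or False once `timeout` seconds have elapsed.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

For example, `wait_until(lambda: port_open(cp_ip, 50000))` would block until the control plane node accepts connections or two minutes pass, where `cp_ip` holds the address discovered in Phase 2.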
Step 2: Configure talosctl Client
# Set the endpoint and node for talosctl
talosctl --talosconfig talosconfig config endpoint $CP_IP
talosctl --talosconfig talosconfig config node $CP_IP
Step 3: Bootstrap the Cluster
# Wait for Talos API to be ready
sleep 30
# Bootstrap the cluster
talosctl --talosconfig talosconfig bootstrap
Expected Output:
(no output means success)
⏱️ Timing Note: Bootstrap command completes in 1-2 seconds, but cluster initialization continues in the background.
⚠️ CRITICAL WAIT TIME: After bootstrapping, wait 30-60 seconds for:
- Kubernetes control plane components to start
- CoreDNS to initialize
- CNI (Flannel) to configure networking
- Nodes to register with the API server
🔍 Phase 5: Retrieve kubeconfig and Verify
Get kubeconfig
# Retrieve the kubeconfig
talosctl --talosconfig talosconfig kubeconfig .
This creates a kubeconfig file in the current directory.
Verify Cluster Status
# Check nodes (may show NotReady initially)
kubectl --kubeconfig kubeconfig get nodes -o wide
First Check (immediately after bootstrap):
NAME STATUS ROLES AGE VERSION
talos-8p4-ekr NotReady control-plane 1s v1.35.0
⏱️ CRITICAL WAIT TIME: Wait 20-30 seconds for nodes to become Ready.
Second Check (after 30 seconds):
kubectl --kubeconfig kubeconfig get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP OS-IMAGE
talos-8p4-ekr Ready control-plane 32s v1.35.0 10.0.6.166 Talos (v1.12.2)
talos-645-r38 Ready <none> 13s v1.35.0 10.0.6.165 Talos (v1.12.2)
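Rather than waiting a fixed interval between the two checks, the Ready condition can be read from `kubectl get nodes -o json`. The sketch below is illustrative: `all_nodes_ready` and `fetch_nodes` are hypothetical helpers, and the `kubeconfig` path matches the file retrieved earlier in this phase.

```python
import json
import subprocess


def all_nodes_ready(node_list):
    """Return True when every node in a `kubectl get nodes -o json` result is Ready."""
    items = node_list.get("items", [])
    if not items:
        return False
    for node in items:
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
        )
        if not ready:
            return False
    return True


def fetch_nodes(kubeconfig="kubeconfig"):
    """Run kubectl and return the decoded node list."""
    result = subprocess.run(
        ["kubectl", "--kubeconfig", kubeconfig, "get", "nodes", "-o", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

If you prefer to stay in the shell, `kubectl wait --for=condition=Ready nodes --all --timeout=120s` covers the same ground.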
Verify System Pods
kubectl --kubeconfig kubeconfig get pods -A
Expected Output:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-7859998f6-jzp4v 1/1 Running 0 46s
kube-system coredns-7859998f6-sz8x6 1/1 Running 0 46s
kube-system kube-apiserver-talos-8p4-ekr 1/1 Running 0 39s
kube-system kube-controller-manager-talos-8p4-ekr 1/1 Running 2 39s
kube-system kube-flannel-g6kzp 1/1 Running 0 41s
kube-system kube-flannel-mhxnk 1/1 Running 0 22s
kube-system kube-proxy-8xnxj 1/1 Running 0 22s
kube-system kube-proxy-npwjm 1/1 Running 0 41s
kube-system kube-scheduler-talos-8p4-ekr 1/1 Running 0 39s
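The commands from Phases 3 to 5 can be collected into a single ordered list, which makes it easier to script the whole run. The sketch below only builds the argument lists using the commands shown in this article; `talos_commands` is a hypothetical helper, and the waits between steps are deliberately left to the caller.

```python
def talos_commands(cp_ip, worker_ip, cluster_name="talos-cluster-2", talosconfig="talosconfig"):
    """Return the Phase 3 to 5 commands, in order, as argv lists."""
    tc = ["talosctl", "--talosconfig", talosconfig]
    return [
        ["talosctl", "gen", "config", cluster_name, f"https://{cp_ip}:6443"],
        ["talosctl", "apply-config", "--nodes", cp_ip, "--file", "controlplane.yaml", "--insecure"],
        ["talosctl", "apply-config", "--nodes", worker_ip, "--file", "worker.yaml", "--insecure"],
        tc + ["config", "endpoint", cp_ip],
        tc + ["config", "node", cp_ip],
        tc + ["bootstrap"],
        tc + ["kubeconfig", "."],
        ["kubectl", "--kubeconfig", "kubeconfig", "get", "nodes", "-o", "wide"],
    ]
```

Each entry can be passed to `subprocess.run(cmd, check=True)`, with the waits described above inserted after the apply-config and bootstrap steps.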
⏱️ Complete Timing Breakdown
Here's the complete timeline from start to finish:
| Phase | Step | Time | Cumulative |
|---|---|---|---|
| 1 | Terraform init | 5s | 5s |
| 1 | Terraform apply (VM creation) | 11s | 16s |
| 2 | IP discovery (control plane) | 10s | 26s |
| 2 | IP discovery (worker) | 5s | 31s |
| 3 | Generate Talos config | 1s | 32s |
| 4 | Apply control plane config | 2s | 34s |
| 4 | Apply worker config | 2s | 36s |
| 4 | WAIT for Talos API | 30s | 66s |
| 4 | Bootstrap cluster | 2s | 68s |
| 4 | WAIT for K8s initialization | 30s | 98s |
| 5 | Retrieve kubeconfig | 1s | 99s |
| 5 | WAIT for nodes Ready | 20s | 119s |
| 5 | Verify cluster | 2s | 121s |
Total Time: ~2 minutes from zero to fully operational cluster
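The cumulative column can be reproduced from the per-step durations, which also shows how much of the total is spent in fixed waits. This is a small arithmetic check using the numbers from the table above.

```python
from itertools import accumulate

# (step, seconds) pairs copied from the timing table.
steps = [
    ("Terraform init", 5),
    ("Terraform apply (VM creation)", 11),
    ("IP discovery (control plane)", 10),
    ("IP discovery (worker)", 5),
    ("Generate Talos config", 1),
    ("Apply control plane config", 2),
    ("Apply worker config", 2),
    ("WAIT for Talos API", 30),
    ("Bootstrap cluster", 2),
    ("WAIT for K8s initialization", 30),
    ("Retrieve kubeconfig", 1),
    ("WAIT for nodes Ready", 20),
    ("Verify cluster", 2),
]

cumulative = list(accumulate(seconds for _, seconds in steps))
total = cumulative[-1]
waits = sum(seconds for name, seconds in steps if name.startswith("WAIT"))

print(f"Total: {total}s, of which fixed waits: {waits}s")
# prints "Total: 121s, of which fixed waits: 80s"
```

Roughly two thirds of the run is waiting, so replacing fixed sleeps with readiness checks is where most of the remaining time could be recovered.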
🛠️ Managing Existing Clusters
Shutting Down VMs
When you need to shut down your cluster VMs:
# Using the VergeOS API (correct endpoint)
curl -k -u "admin:YourPassword" -X POST \
"https://192.168.1.111/api/v4/vms/72/poweroff"
curl -k -u "admin:YourPassword" -X POST \
"https://192.168.1.111/api/v4/vms/73/poweroff"
⏱️ Timing Note: Wait 15-20 seconds for graceful shutdown to complete before proceeding with other operations.
Cloning for New Deployments
To create a new cluster from an existing configuration:
# Clone the directory
cp -r talos-vergeos-automation talos-cluster-2
# Clean old state
cd talos-cluster-2
rm -f terraform.tfstate terraform.tfstate.backup kubeconfig talosconfig *.yaml
# Update VM names in main.tf (adjust the old and new names to match your own main.tf)
sed -i 's/talos-cp-01/talos-cp-02/g' main.tf
sed -i 's/talos-worker-01/talos-worker-02/g' main.tf
# Deploy new cluster
terraform init
terraform apply -var="vergeos_user=admin" -var="vergeos_pass=YourPassword" -auto-approve
⚠️ Troubleshooting Guide
Issue: Terraform Shows "No Changes" But VMs Are Running
Symptom: terraform apply reports no changes needed, but VMs are still powered on.
Cause: Terraform state is out of sync with actual VM power state.
Solution: Use the VergeOS API directly to manage power state:
curl -k -u "admin:password" -X POST \
"https://192.168.1.111/api/v4/vms/{vm_id}/poweroff"
Issue: "Connection Refused" During Bootstrap
Symptom: talosctl bootstrap fails with "connection refused"
Cause: Talos API not ready yet after apply-config.
Solution: Wait 30-60 seconds after applying configuration before bootstrapping:
talosctl apply-config --nodes $CP_IP --file controlplane.yaml --insecure
sleep 30
talosctl --talosconfig talosconfig bootstrap
Issue: Nodes Show "NotReady"
Symptom: kubectl get nodes shows nodes in NotReady state.
Cause: CNI (Flannel) or CoreDNS still initializing.
Solution: Wait 20-30 seconds. Check pod status:
kubectl --kubeconfig kubeconfig get pods -n kube-system
Ensure kube-flannel and coredns pods are Running.
Issue: IP Discovery Times Out
Symptom: get_verge_ip.py times out waiting for IP.
Causes:
- VM hasn't booted yet
- DHCP server not responding
- Network misconfiguration
Solution:
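# Check VM status in VergeOS
# (-G with --data-urlencode percent-encodes the spaces and quotes in the filter)
curl -k -u "admin:password" -G \
--data-urlencode "filter=name eq 'talos-cp-02'" \
"https://192.168.1.111/api/v4/vms" | jq
# Verify VM is powered on and check console for boot errors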
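The filter value contains spaces and single quotes, which need to be percent-encoded when they appear in a URL. The sketch below shows what the encoded query looks like using the standard library; `vm_filter_url` is a hypothetical helper, and the endpoint and filter syntax are taken from the discovery script above.

```python
from urllib.parse import quote, urlencode


def vm_filter_url(host, vm_name):
    """Build the /v4/vms lookup URL with a percent-encoded name filter."""
    # quote_via=quote encodes spaces as %20 instead of '+'.
    query = urlencode({"filter": f"name eq '{vm_name}'"}, quote_via=quote)
    return f"https://{host}/api/v4/vms?{query}"


print(vm_filter_url("192.168.1.111", "talos-cp-02"))
# prints https://192.168.1.111/api/v4/vms?filter=name%20eq%20%27talos-cp-02%27
```

The `requests` library performs this encoding automatically when the filter is passed through `params`, which is why the discovery script does not need to do it by hand.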
Issue: Terraform CD-ROM Sync Errors
Symptom: Terraform fails with "Error syncing disks" and 404 on CD-ROM drive.
Cause: CD-ROM drive was removed manually but still exists in Terraform state.
Solution: Either:
- Keep CD-ROM in Terraform config (recommended)
- Or remove from both actual VM and Terraform config together
🎯 Production Considerations
Security Hardening
- Remove CD-ROM After Install:
# Get CD-ROM drive ID
curl -k -u "admin:password" \
"https://192.168.1.111/api/v4/machine_drives" | \
jq '.[] | select(.machine == 94 and .media == "cdrom")'
# Remove it
curl -k -u "admin:password" -X DELETE \
"https://192.168.1.111/api/v4/machine_drives/{drive_id}"
- Change Boot Order:
Update boot_order in Terraform from "c" to "d" after installation.
- Enable Talos RBAC:
Talos v1.12.2 enables RBAC by default. Confirm that the API accepts your client credentials:
talosctl --talosconfig talosconfig version
High Availability
For production, deploy 3 control plane nodes:
resource "vergeio_vm" "talos_cp" {
count = 3
name = "talos-cp-${count.index + 1}"
# ... rest of configuration
}
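Three is the usual minimum because etcd, which runs on the control plane nodes, needs a majority of its members to stay available. The short calculation below illustrates the quorum arithmetic; the function names are chosen here for illustration.

```python
def etcd_quorum(members):
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def failures_tolerated(members):
    """How many members can fail while a majority remains."""
    return members - etcd_quorum(members)


for n in (1, 2, 3, 5):
    print(f"{n} members: quorum {etcd_quorum(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

A two-node control plane tolerates no failures at all, which is why the step up from one node goes straight to three.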
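talosctl gen config accepts a single cluster endpoint, so point it at one stable address that fronts all three control plane nodes, such as a load balancer or virtual IP (the address below is a placeholder):
talosctl gen config talos-cluster https://<LOAD_BALANCER_OR_VIP>:6443
Then register every control plane IP as a talosctl endpoint:
talosctl --talosconfig talosconfig config endpoint 10.0.6.166 10.0.6.167 10.0.6.168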
Backup and Recovery
- Backup etcd:
talosctl --talosconfig talosconfig etcd snapshot /tmp/etcd-backup.db
- Backup Talos configs:
Store controlplane.yaml, worker.yaml, and talosconfig in secure version control. These files contain cluster secrets, so keep the repository private or encrypt them.
🏁 Conclusion
You now have a complete, production-ready workflow for deploying Talos Linux Kubernetes clusters on VergeOS. The key takeaways:
- Automation is achievable - From VM provisioning to cluster bootstrap in ~2 minutes
- Timing matters - Wait times are critical for stability (30s after config apply, 30s after bootstrap)
- IP discovery is essential - The get_verge_ip.py script bridges the gap between Terraform and Talos
- API-first approach - Both VergeOS and Talos are fully API-driven, enabling complete automation
The combination of VergeOS's high-performance infrastructure and Talos's immutable, security-focused design creates a powerful platform for modern Kubernetes deployments.
📚 Additional Resources
Ready to deploy your own immutable Kubernetes infrastructure? Clone the repository and get started:
git clone https://github.com/dvvincent/talos-vergeos-automation2
cd talos-vergeos-automation2
terraform init
# Follow the steps above!