ControlOS™

Transform Your Data Center into a Cloud

Enterprise Infrastructure Platform

Executive Summary

ControlOS: The Data Center Operating System

ControlOS™ is an enterprise-grade platform that transforms bare metal data centers into fully-automated, multi-tenant cloud infrastructure. Built on proven open source technologies and designed for the demands of modern workloads including AI/ML, ControlOS delivers the agility of public cloud with the control, security, and economics of on-premises infrastructure.

Key Value Propositions

Cloud Experience

Self-service VM provisioning in minutes, not weeks

GPU-Ready

Native support for NVIDIA GPU passthrough for AI/ML workloads

Multi-Tenant

Securely serve multiple customers from shared infrastructure

Control Plane Native

Seamless integration with Control Plane's global platform

Open Architecture

Built on proven open source, no vendor lock-in

Enterprise Security

Architecture designed for SOC 2, ISO 27001, HIPAA, and GDPR compliance

By the Numbers

< 60s
VM Provisioning
99.99%
API Availability SLA
< 100ms
Live Migration Downtime
100%
Tenant Isolation
NVIDIA
GPU Passthrough Support
1,000+
Tested Cluster Nodes

The Challenge

Data Centers Face a Critical Inflection Point

Operational Complexity

  • Manual provisioning takes days or weeks
  • Lack of self-service for tenants
  • Fragmented tooling across compute, network, storage
  • Difficult to scale operations

Competitive Pressure

  • Public cloud offers instant provisioning
  • Tenants expect cloud-like experiences
  • AI/ML requires specialized GPU infrastructure
  • Pressure to reduce costs

Technical Debt

  • Expensive legacy virtualization licensing
  • Proprietary vendor lock-in
  • Difficult DevOps integration
  • Limited API capabilities

Platform Architecture

Five-Layer Design for Enterprise Reliability

Control Plane Cloud

Optional Global Integration

ControlOS™ API

REST API • CLI • SDKs • Terraform
COMPUTE
  • VMs
  • GPU Passthrough
  • Live Migration
  • Scheduling
🌐
NETWORK
  • VPC
  • Security Groups
  • Load Balancers
  • Floating IPs
💾
STORAGE
  • Block Storage
  • Object Storage
  • Snapshots
  • Backup
🔐
SECURITY
  • Identity
  • Encryption
  • Audit Logs
  • Compliance

BARE METAL INFRASTRUCTURE

Node Lifecycle • Discovery • Provisioning • BMC Management

Architecture Layers

Layer 5: Control Plane & API

Orchestration, REST API, multi-tenancy, integration. Envoy Gateway, PostgreSQL, NATS messaging, Vault secrets.

Layer 4: Software-Defined Storage

Ceph distributed storage providing block (RBD), object (S3-compatible RGW), and optional file (CephFS). Self-healing, multi-tier.

Layer 3: Software-Defined Networking

OVN/OVS providing tenant isolation, security groups, distributed routing, NAT, floating IPs, and load balancing.

Layer 2: Virtualization

KVM hypervisor with libvirt management, QEMU emulation, VFIO for GPU passthrough. Near-native performance.

Layer 1: Physical Infrastructure

Tinkerbell bare metal provisioning, automatic hardware discovery, BMC management, PXE boot with secure boot chain.

Technical Foundation

Production-Grade Components, Precisely Integrated

ControlOS is built on battle-tested open source components that power the world's largest infrastructure deployments. Each component was selected for its proven reliability at scale, and integrated with careful attention to failure modes, performance characteristics, and operational requirements.

⚙️

Component Inventory

Every component pinned to specific versions with documented upgrade paths

Component | Version | Role | HA Model
Tinkerbell | Latest | Bare metal provisioning via PXE/iPXE | Active-Passive
KVM/QEMU | Kernel | Type-1 hypervisor (hardware-assisted) | Per-node (stateless)
libvirt | 10.x+ | VM lifecycle, domain XML, migration | Per-node (state in PostgreSQL)
OVN | 24.x+ | SDN control plane (logical networks) | Active-Active
Open vSwitch | 3.3+ | Virtual switch with OpenFlow | Per-node (local state)
Ceph | Squid (19.x) | Distributed block/object storage | Active-Active
PostgreSQL | 17.x | Primary state database | Patroni (Leader/Replica)
NATS | 2.11+ | Message bus (pub/sub, request/reply) | Clustered
Keycloak | 26.x+ | Identity (OIDC/SAML provider) | Active-Active
Vault | 1.18+ | Secrets management with auto-unseal | HA with Raft
🖥️

Virtualization: KVM + libvirt + QEMU

The Linux-native hypervisor stack that powers Google Compute Engine, DigitalOcean, Linode, and most OpenStack deployments

How VMs Actually Run
KVM (Kernel-based Virtual Machine)

A Linux kernel module that turns the kernel into a Type-1 hypervisor. Uses VT-x/AMD-V hardware extensions for near-native CPU performance. VMs run as regular Linux processes.

QEMU (Quick Emulator)

Provides device emulation: virtio-blk for disks, virtio-net for networking, Q35 machine type for modern PCIe topology. Handles UEFI boot via OVMF firmware.

libvirt

Manages VM lifecycle via domain XML. Handles creation, migration, snapshots, and resource limits. Exposes unified API regardless of underlying hypervisor.

VFIO (Virtual Function I/O)

Provides direct PCI passthrough via IOMMU groups. GPUs are bound to the vfio-pci driver, giving VMs direct hardware access with negligible hypervisor overhead.

GPU Passthrough Configuration

Full GPU passthrough via VFIO requires proper IOMMU configuration and driver binding. Here's how ControlOS configures GPU nodes:

libvirt-domain.xml (GPU VM)
<!-- GPU Passthrough via VFIO -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x41' slot='0x00' function='0x0'/>
  </source>
</hostdev>

<!-- 1GB Huge Pages + Memory Locking -->
<memoryBacking>
  <hugepages>
    <page size='1048576' unit='KiB' nodeset='0'/>
  </hugepages>
  <locked/>
</memoryBacking>

<!-- Pin to same NUMA node as GPU -->
<numatune>
  <memory mode='strict' nodeset='0'/>
</numatune>

<!-- Expose host CPU features -->
<cpu mode='host-passthrough'>
  <topology sockets='1' cores='16' threads='1'/>
</cpu>
Why This Matters

GPU workloads require careful attention to memory locality. By pinning VM memory and CPUs to the same NUMA node as the GPU, we eliminate cross-socket DMA transfers that can reduce training throughput by 15-30%. Huge pages prevent TLB misses during large tensor operations.
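The NUMA pinning above is ultimately arithmetic: the VM's memory must fit in whole 1 GiB pages available on the GPU's NUMA node. A minimal Python sketch of that check (function names and the 96 GiB example are illustrative, not ControlOS internals):

```python
# Sketch: size the 1 GiB huge-page reservation for a GPU VM, mirroring
# the <memoryBacking> and <numatune> stanzas above. Illustrative only.

GIB = 1024 ** 3

def hugepages_needed(vm_memory_bytes: int, page_size_bytes: int = GIB) -> int:
    """Round the VM's memory up to whole huge pages (ceiling division)."""
    return -(-vm_memory_bytes // page_size_bytes)

def fits_numa_node(pages: int, node_free_bytes: int, page_size_bytes: int = GIB) -> bool:
    """With mode='strict', every page must come from the GPU's node."""
    return pages * page_size_bytes <= node_free_bytes

# A 96 GiB VM needs 96 one-gigabyte pages:
pages = hugepages_needed(96 * GIB)
print(pages)  # 96

# A node with 128 GiB free on NUMA node 0 can host it; 64 GiB cannot:
print(fits_numa_node(pages, 128 * GIB))  # True
print(fits_numa_node(pages, 64 * GIB))   # False
```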

Live Migration Implementation

ControlOS uses hybrid pre-copy/post-copy migration with auto-converge for large-memory VMs:

1
Pre-copy
Iteratively copy dirty pages
2
Auto-converge
Throttle vCPU if needed
3
Switchover
<100ms pause
4
Post-copy
Demand-page remainder

Migration flags: VIR_MIGRATE_LIVE | VIR_MIGRATE_TUNNELLED | VIR_MIGRATE_POSTCOPY | VIR_MIGRATE_AUTO_CONVERGE
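The four phases above can be sketched as a convergence check. This toy model (all numbers and names are illustrative) shows why auto-converge and post-copy exist: each pre-copy pass must re-send the pages the guest dirtied during the previous pass, so a write-heavy guest can outrun the link indefinitely:

```python
# Sketch: does pre-copy converge? Each iteration copies the pages
# dirtied during the previous copy; the remaining set shrinks only
# when the link is faster than the guest's dirty rate.

def precopy_converges(ram_gb: float, dirty_rate_gbps: float,
                      link_gbps: float, max_iters: int = 30,
                      target_gb: float = 0.1) -> bool:
    """True if the dirty set drops below the switchover target
    (~100 ms of transfer) within max_iters passes."""
    remaining = ram_gb
    for _ in range(max_iters):
        copy_time_s = remaining * 8 / link_gbps        # seconds for this pass
        remaining = dirty_rate_gbps / 8 * copy_time_s  # GB dirtied meanwhile
        if remaining <= target_gb:
            return True
    return False

# Idle-ish 64 GB VM on a 25 Gbps link: converges, brief switchover pause.
print(precopy_converges(ram_gb=64, dirty_rate_gbps=5, link_gbps=25))   # True
# Guest dirtying faster than the link: never converges. Auto-converge
# throttles vCPUs to lower the dirty rate; post-copy demand-pages the rest.
print(precopy_converges(ram_gb=64, dirty_rate_gbps=30, link_gbps=25))  # False
```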

🌐

Networking: OVN + Open vSwitch

Distributed SDN from the Open vSwitch project, proven at Red Hat, IBM, and in OpenStack deployments worldwide

OVN Architecture
Northbound Database

Stores logical topology: switches, routers, ports, ACLs, NAT rules. This is the API-facing layer where ControlOS configures tenant networks.

OVSDB Protocol (TCP 6641)
ovn-northd

Translates logical topology into physical flows. Computes shortest paths, generates logical flows, handles distributed routing decisions.

Active-Active with clustered DB
Southbound Database

Contains compiled flow rules and chassis bindings. Each compute node's ovn-controller subscribes to relevant entries.

OVSDB Protocol (TCP 6642)
ovn-controller (per node)

Runs on every compute node. Reads Southbound DB, programs local OVS with OpenFlow rules. Handles BFD for tunnel health monitoring.

Geneve Encapsulation
Security Group Implementation

Security groups are implemented as OVN ACLs with connection tracking for statefulness:

ovn-nbctl commands
# Create logical switch for tenant network
ovn-nbctl ls-add tenant-a-network

# Create port with MAC/IP binding (anti-spoofing)
ovn-nbctl lsp-add tenant-a-network vm-001-port
ovn-nbctl lsp-set-addresses vm-001-port "fa:16:3e:aa:bb:cc 10.100.0.10"
ovn-nbctl lsp-set-port-security vm-001-port "fa:16:3e:aa:bb:cc 10.100.0.10"

# Allow inbound SSH (to-lport = traffic TO the port)
ovn-nbctl acl-add tenant-a-network to-lport 1000 \
  'outport == "vm-001-port" && ip4 && tcp.dst == 22' allow-related

# Allow inbound HTTPS
ovn-nbctl acl-add tenant-a-network to-lport 1000 \
  'outport == "vm-001-port" && ip4 && tcp.dst == 443' allow-related

# Allow all outbound (from-lport = traffic FROM the port)
ovn-nbctl acl-add tenant-a-network from-lport 1000 \
  'inport == "vm-001-port" && ip4' allow-related

# Default deny inbound (lower priority)
ovn-nbctl acl-add tenant-a-network to-lport 900 \
  'outport == "vm-001-port"' drop
Why This Matters

OVN ACLs use Linux kernel connection tracking (conntrack) for stateful filtering at line rate. Rules are compiled to OpenFlow and executed in the kernel datapath—no userspace packet processing. Port security bindings prevent MAC/IP spoofing at the hypervisor level, providing defense-in-depth even if a VM is compromised.
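To make the priority semantics of the ACLs above concrete, here is a toy evaluator in Python. It models only the fields the example uses; real OVN matching happens in compiled OpenFlow, and the explicit priority-900 rule supplies the default:

```python
# Sketch: highest-priority matching ACL wins. Mirrors the four
# ovn-nbctl rules above; field names are simplified for illustration.

def evaluate(acls, packet):
    """Return the verdict of the highest-priority ACL whose match fits,
    or None (the example relies on its priority-900 rule as the default)."""
    for prio, match, verdict in sorted(acls, key=lambda a: -a[0]):
        if match(packet):
            return verdict
    return None

# (priority, match expression, verdict) for the example's ACLs:
acls = [
    (1000, lambda p: p["dir"] == "to" and p["proto"] == "tcp" and p["dst"] == 22,  "allow-related"),
    (1000, lambda p: p["dir"] == "to" and p["proto"] == "tcp" and p["dst"] == 443, "allow-related"),
    (1000, lambda p: p["dir"] == "from",                                           "allow-related"),
    (900,  lambda p: p["dir"] == "to",                                             "drop"),
]

print(evaluate(acls, {"dir": "to", "proto": "tcp", "dst": 22}))    # allow-related (SSH in)
print(evaluate(acls, {"dir": "to", "proto": "tcp", "dst": 8080}))  # drop (default deny)
print(evaluate(acls, {"dir": "from", "proto": "udp", "dst": 53}))  # allow-related (all outbound)
```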

💾

Storage: Ceph Distributed Storage

Exabyte-scale storage proven at CERN, Bloomberg, and major telecommunications providers

Ceph Architecture
Monitors (MON)

Maintain cluster maps using Paxos consensus. Store OSD map, CRUSH map, MDS map. Require quorum (N/2+1) for cluster operations.

3 or 5 per cluster
Object Storage Daemons (OSD)

One per physical disk. Handle replication, recovery, rebalancing. Use BlueStore backend for direct disk access (no filesystem overhead).

1 GB RAM per TB recommended
CRUSH Algorithm

Controlled Replication Under Scalable Hashing. Determines placement without central lookup. Clients compute locations directly from CRUSH map.

Pseudo-random, deterministic
RBD (RADOS Block Device)

Thin-provisioned block devices striped across OSDs. Supports snapshots, cloning, online resize. Directly accessed by QEMU via librbd.

Native QEMU integration
CRUSH Map for Failure Domain Isolation

The CRUSH map defines your failure domains. ControlOS configures rack-level isolation by default:

crush-map-configuration.sh
# Define physical hierarchy
ceph osd crush add-bucket dc1-row1-rack1 rack
ceph osd crush add-bucket dc1-row1-rack2 rack
ceph osd crush add-bucket dc1-row1-rack3 rack
ceph osd crush add-bucket dc1-row1 row
ceph osd crush add-bucket dc1 datacenter

# Build the tree
ceph osd crush move dc1-row1-rack1 row=dc1-row1
ceph osd crush move dc1-row1-rack2 row=dc1-row1
ceph osd crush move dc1-row1-rack3 row=dc1-row1
ceph osd crush move dc1-row1 datacenter=dc1

# Create CRUSH rule for rack-level failure domain
ceph osd crush rule create-replicated replicated_rack default rack

# Create pool (PG autoscaling is enabled by default)
ceph osd pool create vms replicated replicated_rack
ceph osd pool set vms size 3        # 3 replicas
ceph osd pool set vms min_size 2    # Operate with 2 (degraded)

# Enable RBD application tag
ceph osd pool application enable vms rbd

# Create isolated namespace per tenant
rbd namespace create vms/tenant-a
rbd namespace create vms/tenant-b
Why This Matters

With rack-level failure domains, losing an entire rack (power, ToR switch failure) never loses data—replicas exist in other racks. Tenant namespaces provide storage isolation: Tenant A cannot enumerate or access Tenant B's volumes, enforced at the RADOS layer. The min_size=2 setting allows continued I/O during single-replica failures while maintaining durability.
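The CRUSH property worth internalizing is that placement is a pure, deterministic function of the object name and the map. This Python toy uses rendezvous-style hashing to pick three distinct racks; real CRUSH uses straw2 buckets, so treat this as an analogy, not the algorithm:

```python
# Sketch: deterministic placement with no lookup service. Every client
# that holds the same rack list computes the same replica locations.
import hashlib

RACKS = ["dc1-row1-rack1", "dc1-row1-rack2", "dc1-row1-rack3", "dc1-row1-rack4"]

def place(object_name: str, replicas: int = 3):
    """Rank racks by a stable per-object weight; take the top N."""
    def weight(rack: str) -> int:
        return int(hashlib.sha256(f"{object_name}/{rack}".encode()).hexdigest(), 16)
    return sorted(RACKS, key=weight)[:replicas]

mapping = place("rbd_data.vm-001.0000000000000042")
print(mapping == place("rbd_data.vm-001.0000000000000042"))  # True: same answer everywhere
print(len(set(mapping)))  # 3 distinct racks = 3 failure domains
```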

Storage Performance Tiers
100K+
IOPS
NVMe Tier (4KB random)
3+
GB/s
NVMe Sequential
<0.5
ms
NVMe Latency (avg)
3x
Replication
Default Durability
🔧

Bare Metal Provisioning: Tinkerbell

CNCF project for declarative, workflow-based bare metal provisioning

Provisioning Stack
Smee (DHCP/TFTP)

Serves iPXE binary to PXE-booting servers. Provides boot script URL pointing to Tinkerbell workflow. DHCP (port 67), TFTP (port 69), and HTTP for iPXE scripts.

Hegel (Metadata)

Cloud-init compatible metadata service. Provides instance identity, network configuration, SSH keys. Servers query during boot like EC2 metadata service.

Tink (Workflow Engine)

Stores hardware definitions and workflow templates. Tracks action execution state. Provides gRPC API for workflow management.

Tink-Worker

Runs in-memory on target hardware. Executes containerized workflow actions (disk wipe, partition, image write). Reports status back to Tink server.

Provisioning Workflow
compute-node-workflow.yaml
# Tinkerbell Template (Kubernetes CRD format)
apiVersion: tinkerbell.org/v1alpha1
kind: Template
metadata:
  name: compute-node
spec:
  data: |
    version: "0.1"
    name: compute-node
    global_timeout: 3600
    tasks:
      - name: "os-installation"
        worker: "{{.device_1}}"
        actions:
          - name: "stream-image"
            image: quay.io/tinkerbell/actions/image2disk:v1.0.0
            timeout: 600
            environment:
              IMG_URL: http://images.controlos.local/ubuntu-22.04.raw.zst
              DEST_DISK: /dev/sda
              COMPRESSED: "true"
          - name: "write-netplan"
            image: quay.io/tinkerbell/actions/writefile:v1.0.0
            timeout: 90
            environment:
              DEST_DISK: /dev/sda3
              DEST_PATH: /etc/netplan/config.yaml
              CONTENTS: {{.netplan_config}}
          - name: "install-agent"
            image: controlos/node-bootstrap:v1.0.0
            timeout: 300
            environment:
              API_ENDPOINT: https://api.controlos.local
              NODE_TOKEN: {{.node_token}}
🛡️

High Availability Implementation

No single point of failure at any layer

Control Plane HA
API Servers (Stateless)

Multiple instances behind load balancer. Health checks on /healthz. Any instance can serve any request. Failed instance removed from pool in <10s.

PostgreSQL (Patroni)

Leader election via etcd. Synchronous replication to standby. Automatic failover in <30 seconds. Point-in-time recovery with WAL archiving.

etcd (Raft Consensus)

3 or 5 node cluster. Tolerates (N-1)/2 failures. Used by Patroni for PostgreSQL leader election. OVN uses its own clustered OVSDB protocol.

Vault (Integrated Storage)

Raft-based HA with auto-unseal via cloud KMS or HSM. Secrets remain available with N/2+1 nodes. Audit logging to immutable storage.
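The "3 or 5 node" guidance above follows directly from majority-quorum arithmetic, sketched here in plain Python (no ControlOS APIs involved):

```python
# Sketch: majority-quorum math for etcd, Vault Raft, and Ceph MONs.
# A cluster of N nodes needs floor(N/2)+1 alive, so it tolerates
# floor((N-1)/2) failures.

def quorum(n: int) -> int:
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    return (n - 1) // 2

for n in (3, 5, 7):
    print(n, quorum(n), tolerated_failures(n))
# 3 nodes: quorum 2, tolerates 1 failure
# 5 nodes: quorum 3, tolerates 2 failures
# Even sizes buy nothing: 4 nodes tolerate 1 failure, same as 3.
print(tolerated_failures(4) == tolerated_failures(3))  # True
```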

VM High Availability

VMs with HA enabled are automatically restarted on healthy hosts if their current host fails:

1
Detect
Host heartbeat missed (30s)
2
Fence
IPMI power off (prevent split-brain)
3
Schedule
Select new host
4
Restart
Boot from Ceph storage

Recovery time: typically <60 seconds from detection to VM running on new host

Failure Scenarios
Failure | Impact | Recovery
Single API node | None (load balanced) | Automatic (<10s)
PostgreSQL primary | Brief API unavailability | Patroni failover (<30s)
Single Ceph OSD | None (replicated) | Automatic recovery
Entire storage node | Degraded (still available) | Automatic rebalancing
Compute node | VMs on that node | HA restart (<60s)
Full rack loss | Degraded storage, some VMs | Automatic (CRUSH isolation)
📊

Measured Performance Characteristics

Benchmarks from production deployments, not theoretical maximums

8-15
seconds
VM Boot (API to SSH)
<100
ms
Live Migration Downtime
25
Gbps
Single VM Network
<100
μs
Network Latency (same rack)
Scalability Tested Limits
Resource | Tested Limit | Notes
Nodes per cluster | 1,000+ | Linear scaling with proper network
VMs per node | 200+ | Depends on VM size and resources
VMs per cluster | 100,000+ | With proper PostgreSQL sizing
Networks per tenant | 100 | OVN logical switches
Security group rules | 1,000 | Per security group
Volumes per tenant | 10,000 | Ceph RBD images

Core Capabilities

Compute

Virtual Machine Management

Instant Provisioning: VMs ready in < 60 seconds
Flexible Sizing: Custom vCPU, memory, and disk configurations
Live Migration: Move VMs between hosts with < 100ms downtime
GPU Passthrough: Full NVIDIA GPU access via VFIO
Nested Virtualization: Run VMs inside VMs for testing
Cloud-Init: Automated VM configuration on first boot
Snapshots: Point-in-time VM state capture
Templates: Golden images for rapid deployment

Supported GPU Models

GPU Model | Memory | Use Case
NVIDIA A100 | 40/80 GB | Large model training
NVIDIA H100 | 80 GB | Next-gen AI workloads
NVIDIA L40S | 48 GB | Inference, graphics
NVIDIA A10 | 24 GB | Inference, VDI
NVIDIA T4 | 16 GB | Cost-effective inference

Networking

Private Networks

Isolated L2/L3 networks per tenant

Security Groups

Stateful firewall rules

Floating IPs

Public IP addresses for VMs

Load Balancers

L4 load balancing with health checks

VPN Gateway

Site-to-site VPN connectivity

DNS Integration

Automatic DNS for VMs

Network Performance

Internal Bandwidth: 25-100 Gbps per node
Latency (same rack): < 100 μs
Latency (cross-rack): < 500 μs
Security Group Throughput: Line rate
Overlay Protocol: Geneve (RFC 8926)

Storage

Storage Performance Tiers

Tier | IOPS | Throughput | Latency
NVMe | 100,000+ | 3 GB/s | < 0.5 ms
SSD | 50,000 | 1 GB/s | < 1 ms
HDD | 5,000 | 200 MB/s | < 10 ms

Object Storage (S3-Compatible)

Full S3 API

AWS S3 API compatibility

Unlimited Buckets

Per tenant bucket creation

Versioning

Object version history

Lifecycle Policies

Automatic expiration and transitions

Security & Compliance

Defense in Depth Architecture

Layer 5: Application Security

API Authentication • RBAC • Audit Logging

Layer 4: Data Security

Encryption at Rest • Encryption in Transit • Key Management

Layer 3: Tenant Isolation

Network Isolation • Storage Isolation • Compute Isolation

Layer 2: Network Security

Segmentation • Firewalls • IDS/IPS • DDoS Protection

Layer 1: Infrastructure Security

Secure Boot • TPM • Host Hardening • Physical Security

Compliance Certifications

SOC 2 Type II Ready • ISO 27001 Ready • HIPAA Ready (with BAA) • GDPR Compliant • PCI DSS Ready

Encryption Standards

In Transit (API/Control): TLS 1.3
In Transit (Overlay): IPsec (OVN native)
At Rest (Storage): AES-256 dm-crypt
At Rest (Database): AES-256 (encrypted volume)
Secrets: Vault with auto-unseal
🔐

Authorization: OPA + Keycloak

Policy-based access control with external authorization

Authentication Flow
1
Login
OIDC/SAML to Keycloak
2
JWT Issued
Claims include tenant, roles
3
API Request
JWT in Authorization header
4
OPA Check
Policy decision
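What the gateway sees at step 3 is a JWT whose payload segment is base64url-encoded JSON. A minimal decoding sketch (claim names are illustrative; real verification would also check the signature against Keycloak's published JWKS keys, omitted here):

```python
# Sketch: peek at the tenant and role claims inside a JWT. This only
# decodes the payload; it does NOT verify the signature.
import base64
import json

def decode_claims(jwt: str) -> dict:
    """Decode the (unverified) payload segment of a JWT."""
    payload_b64 = jwt.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Build a toy token shaped like the one Keycloak would issue:
claims = {"sub": "alice", "tenant": "tenant-a", "roles": ["tenant_admin"]}
payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode().rstrip("=")
token = f"header.{payload}.signature"

print(decode_claims(token)["tenant"])  # tenant-a
```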
OPA Policy Implementation

Fine-grained authorization enforced at the API gateway using Open Policy Agent:

authz.rego
package controlos.authz

import future.keywords.in

default allow = false

# Tenant admin can manage their tenant's resources
allow {
    input.user.roles[_] == "tenant_admin"
    input.resource.tenant == input.user.tenant
}

# Users can manage VMs in their tenant
allow {
    input.user.roles[_] == "user"
    input.resource.tenant == input.user.tenant
    allowed_user_actions[input.action]
}

allowed_user_actions := {
    "vm:create", "vm:read", "vm:delete",
    "vm:start", "vm:stop", "volume:create",
    "volume:read", "network:read"
}

# Quota enforcement
deny[msg] {
    input.action == "vm:create"
    tenant_vms := count([vm |
        vm := data.vms[_]
        vm.tenant == input.user.tenant
    ])
    tenant_vms >= data.quotas[input.user.tenant].max_vms
    msg := "Tenant VM quota exceeded"
}
Why This Matters

OPA policies are evaluated in <1ms. They're version-controlled, testable, and auditable. Unlike embedded authorization code, OPA policies can be updated without redeploying services. Every API decision is logged with the policy version that made it—critical for compliance audits.
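For readers unfamiliar with Rego, the quota rule above translates to a few lines of Python. The data shapes mirror the policy's inputs and are illustrative:

```python
# Sketch: the deny[msg] quota rule in plain Python. A vm:create request
# is rejected once the tenant's current VM count reaches its quota.

def check_quota(action, user, vms, quotas):
    """Return a denial message, or None if the request may proceed."""
    if action != "vm:create":
        return None
    tenant_vms = sum(1 for vm in vms if vm["tenant"] == user["tenant"])
    if tenant_vms >= quotas[user["tenant"]]["max_vms"]:
        return "Tenant VM quota exceeded"
    return None

vms = [{"tenant": "tenant-a"}] * 10
quotas = {"tenant-a": {"max_vms": 10}}

print(check_quota("vm:create", {"tenant": "tenant-a"}, vms, quotas))  # Tenant VM quota exceeded
print(check_quota("vm:create", {"tenant": "tenant-a"}, [], quotas))   # None
```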

Multi-Tenancy

Complete Isolation at Every Layer

Tenant A

Isolated Environment
🖥️
VM 1
🖥️
VM 2
🌐
Network 10.A.0.0/16
Private VPC with isolation
💾
Storage Pool (RBD)
Dedicated Ceph namespace
🔒

Tenant B

Isolated Environment
🖥️
VM 1
🖥️
VM 2
🌐
Network 10.B.0.0/16
Private VPC with isolation
💾
Storage Pool (RBD)
Dedicated Ceph namespace

No communication possible between tenants

Isolation Guarantees

Compute: Separate QEMU processes, no shared memory
Network: OVN logical switches, VNI segmentation
Storage: Ceph namespaces, separate pools
API: Tenant-scoped tokens, RBAC
Secrets: Vault namespaces per tenant
🔒

How Isolation Is Enforced

Technical mechanisms at each layer

Network: OVN Logical Switches

Each tenant gets dedicated OVN logical switches with unique VNI (Virtual Network Identifier) tags. Geneve encapsulation ensures traffic never mixes at L2. OVN port-security binds MAC/IP to prevent spoofing.

VNI space: 16 million networks
Storage: Ceph RADOS Namespaces

Each tenant's volumes exist in a dedicated RBD namespace. Ceph capabilities (cephx) are scoped per-tenant—Tenant A's credentials cannot access Tenant B's namespace at the RADOS protocol level.

Enforced by Ceph MONs
Compute: Process Isolation

Each VM runs as a separate QEMU process with dedicated memory. No shared memory regions between VMs. SELinux/AppArmor provides mandatory access control on hypervisor hosts.

sVirt labeling per VM
Secrets: Vault Namespaces

Each tenant has a dedicated Vault namespace. Policies are scoped per-namespace—a token for Tenant A cannot access secrets in Tenant B's namespace, even with an admin role.

Hierarchical namespaces
Defense in Depth

Tenant isolation isn't a single barrier—it's enforced independently at every layer. A bug in one component (say, the API) cannot bypass network isolation (OVN) or storage isolation (Ceph namespaces). Each layer authenticates and authorizes independently.

Deployment Options

Small

10-50 VMs

6
Nodes Minimum
~200
vCPU Capacity
~50 TB
Storage
25 GbE
Network

3 converged control nodes + 3+ compute nodes
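The "~200 vCPU" small-tier figure can be reproduced with a simple sizing calculation. The core counts, host reservation, and 2:1 overcommit below are assumptions for illustration, not ControlOS defaults:

```python
# Sketch: vCPU capacity for a compute tier. Assumes each node reserves
# a couple of cores for the hypervisor and host agents, with the rest
# overcommitted at a conservative ratio.

def vcpu_capacity(nodes: int, cores_per_node: int,
                  overcommit: float = 2.0, host_reserved: int = 2) -> int:
    usable = cores_per_node - host_reserved
    return int(nodes * usable * overcommit)

# 3 compute nodes with 32 cores each (e.g. dual 16-core CPUs):
print(vcpu_capacity(nodes=3, cores_per_node=32))  # 180, i.e. roughly the ~200 vCPU tier
```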

Large

500+ VMs

100+
Nodes
10,000+
vCPU Capacity
5+ PB
Storage
100 GbE
Network

Multi-AZ with 6+ control, 30+ storage, 50+ compute

Deployment Timeline

Planning

1-2 weeks — Requirements, design, procurement

Infrastructure

1-2 weeks — Rack, power, network cabling

Deployment

1 week — ControlOS installation

Configuration

1 week — Customization, integration

Testing & Go-Live

1 week — Validation, performance tuning, production cutover

Total: 4-8 weeks from planning to production

Operational Excellence

Monitoring & Alerting

  • Pre-built Grafana dashboards
  • 100+ pre-configured alert rules
  • SLO tracking
  • Capacity planning
  • Centralized logging with Loki
  • Distributed tracing with Jaeger

Automated Operations

  • Auto-healing VM restart on failure
  • Automatic workload balancing
  • Ceph automatic data redistribution
  • Automated certificate renewal
  • Scheduled backups with retention

Zero-Downtime Maintenance

  • Rolling node updates with migration
  • Blue-green control plane updates
  • Online storage expansion
  • Online compute expansion

Support Tiers

Tier | Response Time | Coverage | Includes
Standard | < 4 hours | Business hours | Email, portal
Premium | < 1 hour | 24x7 | Email, portal, phone
Enterprise | < 15 minutes | 24x7 | Dedicated TAM, on-site option

Infrastructure as Code

Terraform Provider

main.tf
# Example: Complete application stack
terraform {
  required_providers {
    controlos = {
      source  = "controlplane/controlos"
      version = "~> 1.0"
    }
  }
}

provider "controlos" {
  endpoint = "https://api.controlos.example.com"
  api_key  = var.api_key
}

# Network
resource "controlos_network" "app" {
  name = "production-network"
  cidr = "10.100.0.0/24"
}

# Load Balanced Web Servers
resource "controlos_vm" "web" {
  count  = 3
  name   = "web-${count.index + 1}"
  image  = "ubuntu-22.04"
  flavor = "m1.medium"
  network_id = controlos_network.app.id
}

# GPU ML Server
resource "controlos_vm" "ml" {
  name   = "ml-training"
  image  = "ubuntu-22.04-cuda"
  flavor = "g1.xlarge"

  gpu {
    count = 2
    type  = "nvidia-a100"
  }
}

CI/CD Integration

GitHub Actions

Native integration with examples

GitLab CI

Native integration with examples

Jenkins

Pipeline examples, plugins

ArgoCD

GitOps integration

Comparison with Alternatives

ControlOS vs. VMware vSphere

Capability | VMware vSphere | ControlOS™
Licensing | Per-socket, expensive | Simple, predictable
GPU Passthrough | vGPU licensed separately | Included (VFIO)
Multi-Tenancy | Complex (NSX-T extra) | Native, included
API | SOAP-based, complex | REST, OpenAPI
Cloud Integration | VMware Cloud only | Control Plane + open
Open Source | Proprietary | Open core

Total Cost of Ownership

Cost Factor | Legacy Platforms | ControlOS™
Software Licensing | $$$$ | $$
GPU Licensing | $$$ | Included
Hardware Lock-in | $$$ (vendor premium) | $0 (open hardware)
Operations Staff | $$$ (specialists) | $$ (simplified ops)
3-Year TCO | $$$$$ | $$

Customer Success

Use Case

AI/ML Infrastructure Provider

A growing AI startup deployed ControlOS on 50 GPU nodes (200 NVIDIA A100 GPUs) with instant GPU VM provisioning, multi-tenant isolation for enterprise customers, and S3-compatible storage for datasets.

70%
Cost reduction vs. public cloud
Minutes
VM provisioning (was days)
100%
GPU utilization
Zero
Security incidents

Use Case

Managed Services Provider

An MSP deployed ControlOS across 3 data centers with per-tenant networks, quotas, and billing, white-label self-service portal, and SOC 2 Type II compliant architecture.

500+
Tenants on shared infrastructure
99.99%
Uptime achieved
SOC 2
Certification obtained
40%
Margin improvement

Pricing

Simple, Predictable Pricing Based on Managed Node Capacity

Tier | Nodes | Support | Price
Starter | Up to 10 | Standard | Contact Sales
Professional | Up to 50 | Premium | Contact Sales
Enterprise | Unlimited | Enterprise | Contact Sales

All Tiers Include

Full Platform

Complete ControlOS platform

GPU Support

GPU passthrough included

Multi-Tenancy

Full tenant isolation

API Access

REST API, CLI, SDKs

Terraform

Full Terraform provider

Updates

All software updates

Ready to Transform Your Data Center?

Schedule a demo to see ControlOS in action