Cloud

Overview

Cocos AI deploys confidential computing workloads across multi-cloud environments using purpose-built Confidential Virtual Machines (CVMs) that provide hardware-level memory encryption and integrity guarantees. The cloud infrastructure implements a defense-in-depth security model combining customer-managed encryption, confidential computing primitives, and runtime integrity monitoring to establish a verifiable trusted execution environment.

Architecture Components

Agent Runtime Environment

The Cocos agent operates as the core computational engine within each CVM, providing a secure execution context for collaborative confidential computing algorithms. The agent supports multiple execution runtimes:

Docker Containers: Containerized workloads with hardware-encrypted memory isolation
WebAssembly (Wasm) Modules: Lightweight, sandboxed execution via WasmEdge runtime
Python Scripts: Native Python execution with confidential computing guarantees
ELF Binaries: Direct native code execution within the trusted environment

Multi-Cloud Deployment Strategy

Microsoft Azure Implementation

Azure deployments use Confidential VM SKUs (Standard_DC*ads_v5) backed by AMD SEV-SNP (Secure Encrypted Virtualization with Secure Nested Paging) for memory encryption and integrity protection. The OS disk is configured with VMGuestStateOnly security encryption and a customer-managed disk encryption set (DES).

Key Vault Configuration:

resource "azurerm_key_vault" "encryption_vault" {
  sku_name                   = "premium"          # FIPS 140-2 Level 2 HSMs
  purge_protection_enabled   = true               # Prevents key destruction
  soft_delete_retention_days = 7                  # Recovery window
}

Confidential VM Specification:

resource "azurerm_linux_virtual_machine" "confidential" {
  size = "Standard_DC${var.vcpu}ads_v5"           # Confidential compute SKU
  
  os_disk {
    security_encryption_type = "VMGuestStateOnly"  # Guest state encryption
    disk_encryption_set_id   = var.disk_encryption_id
  }
  
  vtpm_enabled        = true                      # Virtual TPM for attestation
  secure_boot_enabled = true                      # Verified boot chain
}

Google Cloud Platform Implementation

GCP deployments use AMD Milan-based N2D instances with SEV-SNP for memory encryption and integrity protection.

Confidential Computing Configuration:

resource "google_compute_instance" "confidential" {
  machine_type     = "n2d-standard-${var.vcpu}"
  min_cpu_platform = "AMD Milan"                  # SEV-SNP requirement
  
  confidential_instance_config {
    enable_confidential_compute = true
    confidential_instance_type  = "SEV_SNP"       # Hardware memory encryption
  }
  
  shielded_instance_config {
    enable_integrity_monitoring = true            # Boot integrity verification
    enable_secure_boot          = true            # Verified boot process
    enable_vtpm                 = true            # Virtual TPM
  }
  
  scheduling {
    on_host_maintenance = "TERMINATE"             # Prevents live migration
  }
}

Security Architecture

Memory Protection Mechanisms

Azure VMGuestStateOnly Encryption: Encrypts only the VM guest state (the vTPM and firmware state), leaving the OS disk contents to the customer-managed disk encryption set. Memory and CPU state are protected at runtime by SEV-SNP on the confidential VM hardware.

GCP SEV-SNP Protection: Provides comprehensive memory encryption with integrity guarantees, protecting against both passive memory snooping and active memory corruption attacks from the hypervisor layer.

Customer-Managed Key Infrastructure

Both platforms implement customer-controlled encryption keys with hardware security module (HSM) backing:

Azure Key Vault Premium: FIPS 140-2 Level 2 validated HSMs with comprehensive key lifecycle management and access policies restricted to cryptographic operations (decrypt, encrypt, wrapKey, unwrapKey).

GCP Cloud KMS: Customer-managed encryption keys (CMEK) with automated 90-day rotation policies and global key distribution for multi-region deployments.

Integrity Measurement Architecture

The deployment integrates Linux Integrity Measurement Architecture (IMA) with hardware-based attestation:

# IMA policy, set as a kernel command-line parameter
ima_policy=tcb    # Trusted Computing Base measurement

IMA Implementation Details:

Boot-time Measurement: All critical system components are measured during boot sequence
Runtime Monitoring: Continuous measurement of executed files and loaded libraries
Attestation Support: Generates cryptographic proofs of system integrity state
Cache Optimization: Pre-measurement of frequently accessed files to minimize runtime overhead
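
As a rough illustration of the cache optimization, the sketch below reads every regular file under a directory once. It assumes that, under the tcb policy, a read by root is enough to trigger measurement. The function name warm_ima_cache is hypothetical and this is not the Cocos warm-up script itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cocos): read each regular file under a
# directory once so IMA can measure it ahead of time.
# Prints the number of files that were read successfully.
warm_ima_cache() {
    dir=$1
    # find emits one path per line; paths containing newlines are not handled.
    find "$dir" -xdev -type f 2>/dev/null | {
        count=0
        while IFS= read -r f; do
            # Reading the file is what triggers measurement (assumption).
            if cat "$f" > /dev/null 2>&1; then
                count=$((count + 1))
            fi
        done
        echo "$count"
    }
}
```

On a running CVM, the resulting measurement list can be inspected in /sys/kernel/security/ima/ascii_runtime_measurements.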

Agent Provisioning Pipeline

Cloud-Init Orchestration

The agent deployment utilizes a multi-stage Cloud-Init configuration that implements security hardening alongside functional provisioning.

Stage 1: Base System Hardening

# Disable remote access vectors
- systemctl disable ssh.service sshd.service
- systemctl stop ssh.service sshd.service

# Network interface validation (single interface enforcement)
if [ "$NUM_OF_IFACE" -gt "$NUM_OF_PERMITED_IFACE" ]; then
    exit 1  # Fail deployment when more interfaces are present than permitted
fi

Stage 2: Runtime Environment Setup

# WasmEdge Runtime Installation
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- -v 0.13.5

# Docker configuration with ramdisk (ephemeral container storage).
# systemd does not support trailing comments, so the directive stands alone.
Environment=DOCKER_RAMDISK=true

Stage 3: Agent Service Configuration

[Service]
Type=simple
User=root
WorkingDirectory=/cocos
EnvironmentFile=/etc/cocos/environment
ExecStartPre=/cocos_init/agent_setup.sh
ExecStart=/cocos_init/agent_start_script.sh
Restart=always
StartLimitInterval=300
StartLimitBurst=5

Certificate and Credential Management

The agent authenticates over mutual TLS, using a certificate, private key, and CA certificate written to disk during provisioning:

write_files:
  - path: /etc/cocos/certs/cert.pem
    permissions: "0644"
  - path: /etc/cocos/certs/key.pem  
    permissions: "0600"              # Restricted key access
  - path: /etc/cocos/certs/ca.pem
    permissions: "0644"
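
A pre-flight check along the following lines could confirm that the private key keeps its restricted mode before the agent starts. This is a minimal sketch. The function name key_mode_ok is hypothetical and does not come from the Cocos scripts.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if the path is a regular file
# whose permission bits are exactly 0600.
key_mode_ok() {
    # find -perm 600 (no leading dash) matches on an exact mode only.
    [ -f "$1" ] && [ -n "$(find "$1" -perm 600 2>/dev/null)" ]
}

# Example use (commented out so the sketch has no side effects):
# key_mode_ok /etc/cocos/certs/key.pem || echo "key.pem has unexpected permissions" >&2
```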

Network Security Configuration

Firewall and Access Control

Network access is restricted to essential agent communication channels:

GCP Firewall Rule:

resource "google_compute_firewall" "allow-agent" {
  name = "allow-agent-${var.vm_name}"
  
  allow {
    protocol = "tcp"
    ports    = ["7002"]              # Agent gRPC endpoint only
  }
  
  source_ranges = ["0.0.0.0/0"]     # Consider IP restriction for production
  target_tags   = [var.vm_name]
}

Interface Validation and Host Discovery

The agent startup script enforces network topology constraints:

# Single interface enforcement
NUM_OF_IFACE=$(ip route | grep -Eo 'dev [a-z0-9]+' | awk '{ print $2 }' | 
               grep -v '^docker' | sort | uniq | wc -l)

# Dynamic host configuration: bind gRPC to the default interface's IPv4 address
DEFAULT_IFACE=$(route | grep '^default' | grep -o '[^ ]*$')
AGENT_GRPC_HOST=$(ip -4 addr show "$DEFAULT_IFACE" | grep inet | 
                  awk '{print $2}' | cut -d/ -f1)
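
The same pipelines can be wrapped in functions that read captured command output from standard input, which makes the logic testable without a live network stack. This is a sketch. The function names count_ifaces and first_ipv4 are illustrative and do not appear in the Cocos scripts.

```shell
#!/bin/sh
# Count distinct non-Docker interfaces named in `ip route` output (stdin).
count_ifaces() {
    grep -Eo 'dev [a-z0-9]+' | awk '{ print $2 }' |
        grep -v '^docker' | sort -u | wc -l | tr -d ' '
}

# Print the first IPv4 address in `ip -4 addr show <iface>` output (stdin),
# with the prefix length removed.
first_ipv4() {
    awk '$1 == "inet" { split($2, a, "/"); print a[1]; exit }'
}

# Example use on a live system (commented out to avoid side effects):
# NUM_OF_IFACE=$(ip route | count_ifaces)
# AGENT_GRPC_HOST=$(ip -4 addr show "$DEFAULT_IFACE" | first_ipv4)
```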

Operational Resilience

Service Recovery and State Management

The agent implements automatic recovery mechanisms with state preservation:

Automatic Restart: Restart=always, bounded by systemd start rate limiting of at most 5 start attempts per 300-second interval (StartLimitBurst=5, StartLimitInterval=300)
Post-Reboot Recovery: Dedicated service handles IMA cache warming and service restoration
Docker Daemon Management: Ramdisk configuration with dependency-aware startup sequencing

Monitoring and Observability

The systemd unit redirects the agent's standard output and standard error to files, and the provisioning scripts write their own logs alongside them:

StandardOutput=file:/var/log/cocos/agent.stdout
StandardError=file:/var/log/cocos/agent.stderr

Log Categories:

Setup Logs: /var/log/cocos/setup.log - Initial provisioning status
IMA Logs: /var/log/cocos/ima_setup.log - Integrity measurement configuration
Agent Logs: /var/log/cocos/agent.log - Runtime operation and gRPC communication
Verification Logs: /var/log/cocos/verification.log - Component validation results
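
When diagnosing a failed deployment, a quick check of which of these logs exist can narrow down the stage that failed. The sketch below uses the file names listed above. The function name missing_logs is hypothetical.

```shell
#!/bin/sh
# Print each expected log file that is missing or empty.
# The directory defaults to /var/log/cocos and can be overridden for testing.
missing_logs() {
    log_dir=${1:-/var/log/cocos}
    for name in setup.log ima_setup.log agent.log verification.log; do
        [ -s "$log_dir/$name" ] || echo "$log_dir/$name"
    done
}
```

An empty result means all four logs are present and non-empty. It says nothing about whether their contents report success.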

Performance Optimizations

Memory and Storage Efficiency

Docker Ramdisk: Eliminates persistent container storage overhead and forensic artifacts
IMA Cache Warming: Pre-measurement of system files reduces runtime measurement latency
Filesystem Expansion: Automatic root partition resizing maximizes available compute storage

Boot Sequence Optimization

Conditional Reboot Logic: IMA policy activation triggers single reboot when required
Dependency Management: Service ordering ensures network and Docker availability before agent startup
Resource Validation: Pre-flight checks prevent failed deployments due to missing dependencies
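
The conditional reboot logic can be sketched as a check on the kernel command line, assuming the IMA policy is activated through the ima_policy=tcb boot parameter. The function name cmdline_has_ima_tcb is hypothetical, and the actual Cocos scripts may decide differently.

```shell
#!/bin/sh
# Succeed if the given kernel command line contains ima_policy=tcb as a
# whole, space-delimited parameter.
cmdline_has_ima_tcb() {
    case " $1 " in
        *" ima_policy=tcb "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use (commented out so the sketch never triggers a reboot):
# if ! cmdline_has_ima_tcb "$(cat /proc/cmdline)"; then
#     echo "IMA policy not active yet; a single reboot is required"
# fi
```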

This cloud configuration establishes a hardened, verifiable execution environment for Cocos AI's confidential computing workloads, combining hardware-level security primitives with comprehensive software-based integrity monitoring and access controls.
