Cloud

Overview

Cocos AI deploys confidential computing workloads across multi-cloud environments using purpose-built Confidential Virtual Machines (CVMs) that provide hardware-level memory encryption and integrity guarantees. The cloud infrastructure implements a defense-in-depth security model combining customer-managed encryption, confidential computing primitives, and runtime integrity monitoring to establish a verifiable trusted execution environment.

Architecture Components

Agent Runtime Environment

The Cocos agent operates as the core computational engine within each CVM, providing a secure execution context for collaborative confidential computing algorithms. The agent supports multiple execution runtimes:

  • Docker Containers: Containerized workloads with hardware-encrypted memory isolation
  • WebAssembly (Wasm) Modules: Lightweight, sandboxed execution via WasmEdge runtime
  • Python Scripts: Native Python execution with confidential computing guarantees
  • ELF Binaries: Direct native code execution within the trusted environment

Multi-Cloud Deployment Strategy

Microsoft Azure Implementation

Azure deployments use Confidential VM SKUs (Standard_DC*ads_v5), which rely on AMD SEV-SNP (Secure Encrypted Virtualization with Secure Nested Paging) for memory encryption and integrity protection. The OS disk is configured with VMGuestStateOnly security encryption and a customer-managed disk encryption set (DES).

Key Vault Configuration:

resource "azurerm_key_vault" "encryption_vault" {
sku_name = "premium" # FIPS 140-2 Level 2 HSMs
purge_protection_enabled = true # Prevents key destruction
soft_delete_retention_days = 7 # Recovery window
}

Confidential VM Specification:

resource "azurerm_linux_virtual_machine" "confidential" {
size = "Standard_DC${var.vcpu}ads_v5" # Confidential compute SKU

os_disk {
security_encryption_type = "VMGuestStateOnly" # Guest state encryption
disk_encryption_set_id = var.disk_encryption_id
}

vtpm_enabled = true # Virtual TPM for attestation
secure_boot_enabled = true # Verified boot chain
}

Google Cloud Platform Implementation

GCP deployment utilizes AMD Milan-based N2D instances with SEV-SNP (Secure Encrypted Virtualization with Secure Nested Paging) for memory encryption and integrity protection.

Confidential Computing Configuration:

resource "google_compute_instance" "confidential" {
machine_type = "n2d-standard-${var.vcpu}"
min_cpu_platform = "AMD Milan" # SEV-SNP requirement

confidential_instance_config {
enable_confidential_compute = true
confidential_instance_type = "SEV_SNP" # Hardware memory encryption
}

shielded_instance_config {
enable_integrity_monitoring = true # Boot integrity verification
enable_secure_boot = true # Verified boot process
enable_vtpm = true # Virtual TPM
}

scheduling {
on_host_maintenance = "TERMINATE" # Prevents live migration
}
}

Security Architecture

Memory Protection Mechanisms

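Azure VMGuestStateOnly Encryption: Encrypts the VM guest state (the vTPM and firmware state stored with the VM) rather than the full OS disk; the OS disk is encrypted at rest through the customer-managed disk encryption set. Runtime memory and CPU register state are protected by SEV-SNP on the Confidential VM SKU.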

GCP SEV-SNP Protection: Provides comprehensive memory encryption with integrity guarantees, protecting against both passive memory snooping and active memory corruption attacks from the hypervisor layer.
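
One way to confirm from inside a guest that SEV-SNP is in effect is to look for the kernel's memory-encryption banner in the boot log. The sketch below is illustrative only: the function name `sev_snp_active` is hypothetical, and the banner wording ("Memory Encryption Features active: ...") is an assumption about recent Linux kernels that may differ between versions.

```shell
#!/usr/bin/env bash
# Succeed if the kernel log text on stdin reports SEV-SNP as an active
# memory-encryption feature. Assumed banner (varies by kernel version):
#   "Memory Encryption Features active: AMD SEV SEV-ES SEV-SNP"
sev_snp_active() {
    grep -Eq 'Memory Encryption Features active:.*SEV-SNP'
}

# Demo with a sample log line; on a real guest, pipe in `dmesg` instead
# (reading the kernel log usually requires root).
sample='[    0.512345] Memory Encryption Features active: AMD SEV SEV-ES SEV-SNP'
if printf '%s\n' "$sample" | sev_snp_active; then
    echo "SEV-SNP active"
else
    echo "SEV-SNP not reported"
fi
```

A log check like this is only a quick sanity test; it does not replace hardware attestation, which is what actually proves the guest's state to a remote party.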

Customer-Managed Key Infrastructure

Both platforms implement customer-controlled encryption keys with hardware security module (HSM) backing:

Azure Key Vault Premium: FIPS 140-2 Level 2 validated HSMs with comprehensive key lifecycle management and access policies restricted to cryptographic operations (decrypt, encrypt, wrapKey, unwrapKey).

GCP Cloud KMS: Customer-managed encryption keys (CMEK) with automated 90-day rotation policies and global key distribution for multi-region deployments.
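
Cloud KMS expresses a rotation period as a number of seconds followed by `s` (for example in the `rotation_period` argument of the Terraform `google_kms_crypto_key` resource). The helper below, whose name `days_to_rotation_period` is hypothetical, shows the arithmetic behind a 90-day policy; it is a sketch, not part of the Cocos provisioning scripts.

```shell
#!/usr/bin/env bash
# Convert a rotation interval in days to the "<seconds>s" form used by Cloud KMS.
days_to_rotation_period() {
    local days=$1
    echo "$(( days * 24 * 60 * 60 ))s"
}

days_to_rotation_period 90   # prints 7776000s
```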

Integrity Measurement Architecture

The deployment integrates Linux Integrity Measurement Architecture (IMA) with hardware-based attestation:

# IMA Policy Configuration
ima_policy=tcb # Trusted Computing Base measurement

IMA Implementation Details:

  • Boot-time Measurement: All critical system components are measured during boot sequence
  • Runtime Monitoring: Continuous measurement of executed files and loaded libraries
  • Attestation Support: Generates cryptographic proofs of system integrity state
  • Cache Optimization: Pre-measurement of frequently accessed files to minimize runtime overhead
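
The measurements described above can be inspected from inside the VM. The following sketch assumes the `ima-ng` template, where each line of the ASCII measurement list has the layout `<PCR> <template-hash> <template> <algo:digest> <path>`; the function names `has_ima_tcb_policy` and `measured_paths` are hypothetical helpers, not part of the agent.

```shell
#!/usr/bin/env bash
# Succeed if a kernel command line (passed as $1) contains ima_policy=tcb.
has_ima_tcb_policy() {
    case " $1 " in
        *" ima_policy=tcb "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the path column of an IMA ASCII measurement list read from stdin.
# Everything from the fifth field onward is treated as the path.
measured_paths() {
    awk 'NF >= 5 {
        out = $5
        for (i = 6; i <= NF; i++) out = out " " $i
        print out
    }'
}

# Demo with a sample command line.
cmdline='BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro ima_policy=tcb'
has_ima_tcb_policy "$cmdline" && echo "tcb policy requested"
```

On a running system the inputs would typically come from `/proc/cmdline` and `/sys/kernel/security/ima/ascii_runtime_measurements` (the latter is normally readable only by root), for example `measured_paths < /sys/kernel/security/ima/ascii_runtime_measurements`.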

Agent Provisioning Pipeline

Cloud-Init Orchestration

The agent deployment utilizes a multi-stage Cloud-Init configuration that implements security hardening alongside functional provisioning.

Stage 1: Base System Hardening

# Disable remote access vectors
- systemctl disable ssh.service sshd.service
- systemctl stop ssh.service sshd.service

# Network interface validation (single interface enforcement)
if [ "$NUM_OF_IFACE" -gt "$NUM_OF_PERMITED_IFACE" ]; then
    exit 1 # Fail deployment on multiple interfaces
fi

Stage 2: Runtime Environment Setup

# WasmEdge Runtime Installation
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- -v 0.13.5

# Docker configuration with ramdisk (ephemeral container storage).
# systemd does not support trailing comments on Environment= lines.
Environment=DOCKER_RAMDISK=true

Stage 3: Agent Service Configuration

[Service]
Type=simple
User=root
WorkingDirectory=/cocos
EnvironmentFile=/etc/cocos/environment
ExecStartPre=/cocos_init/agent_setup.sh
ExecStart=/cocos_init/agent_start_script.sh
Restart=always
StartLimitInterval=300
StartLimitBurst=5

Certificate and Credential Management

The agent authenticates over mutual TLS when communicating with the cloud. Cloud-Init writes the certificate, private key, and CA certificate with the following permissions:

write_files:
- path: /etc/cocos/certs/cert.pem
permissions: "0644"
- path: /etc/cocos/certs/key.pem
permissions: "0600" # Restricted key access
- path: /etc/cocos/certs/ca.pem
permissions: "0644"
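
A post-provisioning check can confirm that the files ended up with the intended modes. This is a minimal sketch: `check_mode` and `check_cert_dir` are hypothetical helpers, and `stat -c '%a'` assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Succeed if a file has exactly the expected octal mode (GNU stat).
# Returns 2 if the file cannot be inspected.
check_mode() {
    local path=$1 expected=$2 actual
    actual=$(stat -c '%a' "$path") || return 2
    [ "$actual" = "$expected" ]
}

# Verify the three credential files under a certificate directory.
# The default directory matches the paths in the Cloud-Init snippet above.
check_cert_dir() {
    local dir=${1:-/etc/cocos/certs}
    check_mode "$dir/cert.pem" 644 &&
    check_mode "$dir/key.pem"  600 &&
    check_mode "$dir/ca.pem"   644
}
```

Calling `check_cert_dir` with no arguments inspects `/etc/cocos/certs`; a non-zero exit status means at least one file is missing or has a different mode.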

Network Security Configuration

Firewall and Access Control

Network access is restricted to essential agent communication channels:

GCP Firewall Rule:

resource "google_compute_firewall" "allow-agent" {
name = "allow-agent-${var.vm_name}"

allow {
protocol = "tcp"
ports = ["7002"] # Agent gRPC endpoint only
}

source_ranges = ["0.0.0.0/0"] # Consider IP restriction for production
target_tags = [var.vm_name]
}
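
Because the rule above admits traffic from any address, a deployment pipeline may want to flag it before a production rollout. The sketch below shows one possible pre-flight check; `has_world_open_range` is a hypothetical helper and is not part of the Cocos tooling.

```shell
#!/usr/bin/env bash
# Succeed if any CIDR argument admits every IPv4 or IPv6 source address.
has_world_open_range() {
    local cidr
    for cidr in "$@"; do
        case "$cidr" in
            0.0.0.0/0|::/0) return 0 ;;
        esac
    done
    return 1
}

# Demo: the source range used in the firewall rule above.
if has_world_open_range "0.0.0.0/0"; then
    echo "warning: agent port is reachable from any address"
fi
```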

Interface Validation and Host Discovery

The agent startup script enforces network topology constraints:

# Single interface enforcement: count distinct non-Docker interfaces in the routing table
NUM_OF_IFACE=$(ip route | grep -Eo 'dev [a-z0-9]+' | awk '{ print $2 }' |
    grep -v '^docker' | sort -u | wc -l)

# Dynamic host configuration: derive the gRPC host from the IPv4 address
# of the default-route interface
DEFAULT_IFACE=$(route | grep '^default' | grep -o '[^ ]*$')
AGENT_GRPC_HOST=$(ip -4 addr show "$DEFAULT_IFACE" | grep inet |
    awk '{print $2}' | cut -d/ -f1)
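
The interface count above can be tried against sample routing output without touching a live system. The sketch below is an awk-based variant rather than the script's own pipeline: `count_ifaces` is a hypothetical name, the sample routes are invented, and the limit of 1 is an assumption (the real value of `NUM_OF_PERMITED_IFACE` is set elsewhere in the provisioning scripts).

```shell
#!/usr/bin/env bash
# Count distinct interfaces named after "dev" in `ip route` output on stdin,
# ignoring interfaces whose name starts with "docker".
count_ifaces() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "dev" && $(i + 1) !~ /^docker/) seen[$(i + 1)] = 1
    }
    END { n = 0; for (k in seen) n++; print n }'
}

# Invented sample: one physical interface plus the Docker bridge.
sample_routes='default via 10.0.0.1 dev eth0 proto dhcp metric 100
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.5
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown'

NUM_OF_IFACE=$(printf '%s\n' "$sample_routes" | count_ifaces)
NUM_OF_PERMITED_IFACE=1   # assumed limit for this demo
if [ "$NUM_OF_IFACE" -gt "$NUM_OF_PERMITED_IFACE" ]; then
    echo "too many interfaces: $NUM_OF_IFACE"
else
    echo "interface check passed: $NUM_OF_IFACE"
fi
```

On a real host the input would be `ip route | count_ifaces`. Unlike the `[a-z0-9]+` pattern in the original pipeline, this variant keeps the full interface name, so names containing a hyphen or dot are not truncated.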

Operational Resilience

Service Recovery and State Management

The agent implements automatic recovery mechanisms with state preservation:

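  • Automatic Restart: Restart=always with a 10s restart delay; systemd start rate limiting permits at most 5 starts within a 300-second window (StartLimitBurst=5, StartLimitInterval=300), after which the unit is left in a failed state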
  • Post-Reboot Recovery: Dedicated service handles IMA cache warming and service restoration
  • Docker Daemon Management: Ramdisk configuration with dependency-aware startup sequencing

Monitoring and Observability

Comprehensive logging infrastructure captures system and agent state:

StandardOutput=file:/var/log/cocos/agent.stdout
StandardError=file:/var/log/cocos/agent.stderr

Log Categories:

  • Setup Logs: /var/log/cocos/setup.log - Initial provisioning status
  • IMA Logs: /var/log/cocos/ima_setup.log - Integrity measurement configuration
  • Agent Logs: /var/log/cocos/agent.log - Runtime operation and gRPC communication
  • Verification Logs: /var/log/cocos/verification.log - Component validation results
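
When troubleshooting a deployment, a quick first step is to see which of these logs exist. The helper below is a sketch with a hypothetical name (`missing_logs`); the optional root prefix is only there so the check can be run against a mounted image or a test directory.

```shell
#!/usr/bin/env bash
# Print the name of each expected Cocos log file that is missing.
# $1 is an optional root prefix (empty means the live filesystem).
missing_logs() {
    local root=${1:-} name
    for name in setup.log ima_setup.log agent.log verification.log; do
        [ -f "$root/var/log/cocos/$name" ] || echo "$name"
    done
}
```

Running `missing_logs` with no arguments checks `/var/log/cocos` on the current machine; empty output means all four files are present.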

Performance Optimizations

Memory and Storage Efficiency

  • Docker Ramdisk: Eliminates persistent container storage overhead and forensic artifacts
  • IMA Cache Warming: Pre-measurement of system files reduces runtime measurement latency
  • Filesystem Expansion: Automatic root partition resizing maximizes available compute storage

Boot Sequence Optimization

  • Conditional Reboot Logic: IMA policy activation triggers single reboot when required
  • Dependency Management: Service ordering ensures network and Docker availability before agent startup
  • Resource Validation: Pre-flight checks prevent failed deployments due to missing dependencies

This cloud configuration establishes a hardened, verifiable execution environment for Cocos AI's confidential computing workloads, combining hardware-level security primitives with comprehensive software-based integrity monitoring and access controls.