Docker Deployment

Note

This guide covers running TunnelMesh in Docker containers for development, testing, and production deployments. Containers need the NET_ADMIN capability and access to /dev/net/tun to create TUN interfaces.

Quick Start

# Start the full mesh stack
cd docker
docker compose up -d

# View logs
docker compose logs -f

# Run connectivity tests
make docker-test

# Stop
docker compose down

Building the Image

Using Make

make docker-build

Manual Build

docker build -t tunnelmesh:latest -f docker/Dockerfile .

Docker Compose Stack

The included docker-compose.yml sets up a complete mesh environment with scalable coordinators for testing chunk-level replication:

Service        Description
coordinator    Coordinator peer with admin UI (2 replicas by default)
client         Mesh peer (5 replicas by default)
prometheus     Metrics collection (one per coordinator)
grafana        Dashboards and visualisation (one per coordinator)
loki           Log aggregation (one per coordinator)
sd-generator   Prometheus service discovery
benchmarker    Automated performance testing

Note

Multi-coordinator architecture: Each coordinator replica runs its own monitoring stack and uses tmpfs for ephemeral storage. Coordinators discover each other via peer registration and replicate S3 chunks peer-to-peer. All coordinators are equal peers with no primary/replica distinction.

Starting the Stack

cd docker

# Start all services (2 coordinators by default)
docker compose up -d

# Start with custom number of coordinators
docker compose up -d --scale coordinator=3

# Scale coordinators after startup
make docker-scale-coords

# Scale clients
docker compose up -d --scale client=10

Viewing Logs

# All services
docker compose logs -f

# Coordinator logs only
make docker-logs-coords

# Coordinator service via Compose (all replicas; recent Compose versions accept --index N to select one)
docker compose logs -f coordinator

# View replication activity
docker compose logs coordinator | grep -i replication

Accessing Services

Service            URL                                    Notes
Coordination API   http://localhost:8081                  Load-balanced across all coordinators
Admin Dashboard    https://coordinator-node.tunnelmesh/   Mesh-only (any coordinator)
Grafana            http://localhost:3000                  Metrics dashboards (first coordinator)
Prometheus         http://localhost:9090                  Raw metrics (first coordinator)

Note

The admin panel is accessible from any coordinator node within the mesh. Monitoring stacks (Grafana/Prometheus) are exposed from the first coordinator replica via port mapping. Each coordinator runs its own isolated monitoring stack with tmpfs storage.

Container Requirements

Warning

Elevated privileges required: TunnelMesh containers need NET_ADMIN capability and access to /dev/net/tun to create network interfaces. This is unavoidable for VPN/tunnel software.

The corresponding Compose settings are:

cap_add:
  - NET_ADMIN
devices:
  - /dev/net/tun:/dev/net/tun
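
Before starting containers, it can help to confirm that the host actually exposes the TUN device that the devices mapping refers to. The following is a minimal pre-flight sketch; the function name check_tun is hypothetical and not part of TunnelMesh.

```shell
# Hypothetical pre-flight helper (not part of TunnelMesh):
# succeeds only if the given path is a character device, as /dev/net/tun should be.
check_tun() {
  tun_path="${1:-/dev/net/tun}"
  if [ -c "$tun_path" ]; then
    echo "ok: $tun_path is a character device"
  else
    echo "missing: $tun_path (on Linux, 'modprobe tun' may load the driver)" >&2
    return 1
  fi
}
```

One possible use is to gate startup on it, for example: check_tun && docker compose up -d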

Minimal Container Configuration

services:
  tunnelmesh:
    image: ghcr.io/zombar/tunnelmesh:latest
    cap_add:
      - NET_ADMIN
    devices:
      - /dev/net/tun:/dev/net/tun
    volumes:
      - ./config.yaml:/etc/tunnelmesh/config.yaml:ro
    command: ["join", "--config", "/etc/tunnelmesh/config.yaml"]

Configuration

Note

Context management (tunnelmesh context) is designed for host-based installations where you run multiple meshes from one machine. In Docker deployments, each container is typically dedicated to a single mesh and receives its config directly via volume mount or environment variables.

Coordinator Configuration

Start a coordinator (it bootstraps automatically when no server URL is given):

docker run --cap-add NET_ADMIN --device /dev/net/tun tunnelmesh join --token your-secure-token

Optional coordinator.yaml for custom settings:

name: "coordinator"

# Coordinator services (auto-enabled when no server URL provided)
# Admin panel, relay, and S3 are always enabled (ports: 443, 9000)
coordinator:
  listen: ":8080"  # Coordination API (default: ":8443")
  data_dir: "/var/lib/tunnelmesh"

Peer Configuration

Create docker/config/peer.yaml:

name: "peer-1"

# DNS is always enabled
dns:
  listen: "127.0.0.53:5353"

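Start the peer, mounting the config file into the container (run from the repository root):

docker run --cap-add NET_ADMIN --device /dev/net/tun -v "$PWD/docker/config/peer.yaml:/etc/tunnelmesh/peer.yaml:ro" \
  tunnelmesh join coordinator:8080 --token your-secure-token --config /etc/tunnelmesh/peer.yaml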

Network Modes

Bridge Network (Default)

Services communicate over a Docker bridge network:

networks:
  mesh-control:
    driver: bridge
    ipam:
      config:
        - subnet: 172.28.0.0/24
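
When scaling clients, the size of this subnet bounds how many containers fit on the bridge. As a rough worked example: a /24 has 2^(32-24) = 256 addresses, of which 254 remain after the network and broadcast addresses are excluded, and Docker typically uses one more for the bridge gateway. The helper name below is hypothetical.

```shell
# Hypothetical helper: usable IPv4 host addresses for a prefix length (valid for /1 to /30).
usable_hosts() {
  prefix="$1"
  echo $(( (1 << (32 - prefix)) - 2 ))
}

usable_hosts 24   # prints 254
```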

Host Network

For production deployments requiring full network access:

services:
  tunnelmesh:
    network_mode: host
    # Note: No port mapping needed with host network

Shared Network Namespace

Monitoring services share the coordinator's network to access mesh IPs:

services:
  prometheus:
    network_mode: "service:coordinator"

Note

Each coordinator replica runs its own monitoring stack. All monitoring containers share their respective coordinator's network namespace.

Volumes

Coordinator Storage

Coordinators use tmpfs (ephemeral RAM-based storage) for testing replication:

volumes:
  - type: tmpfs
    target: /var/lib/tunnelmesh
    tmpfs:
      size: 268435456  # 256MB
  - type: tmpfs
    target: /root/.tunnelmesh

Warning

Ephemeral storage: All coordinator data (S3 chunks, SSH keys, metrics) is lost on container restart. This is intentional for replication testing where you want a clean slate. For production deployments, use named volumes or host mounts instead of tmpfs.
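
The size value in the tmpfs entry is given in bytes. To derive a different size, the conversion is MiB x 1024 x 1024. A small sketch of that arithmetic, with a hypothetical helper name:

```shell
# Hypothetical helper: convert MiB to the byte count used by the tmpfs "size" field.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 256   # prints 268435456
mib_to_bytes 512   # prints 536870912
```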

Monitoring Stack Volumes

Each monitoring service has persistent storage:

volumes:
  prometheus-data:     # Prometheus TSDB
  grafana-data:        # Grafana configuration
  loki-data:           # Log storage
  benchmark-results:   # Benchmark JSON output

Accessing Benchmark Results

# List results
docker compose exec benchmarker ls -la /results/

# Copy to host (find the benchmarker container ID first)
docker ps | grep benchmarker
docker cp <container-id>:/results ./benchmark-results/

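# View latest result (the glob must expand inside the container, hence sh -c; -T disables the TTY so the output can be piped)
docker compose exec -T benchmarker sh -c 'cat "$(ls -t /results/benchmark_*.json | head -n 1)"' | jq . | tail -50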

Health Checks

All services include health checks:

healthcheck:
  test: ["CMD", "curl", "-sf", "http://localhost:8080/health"]
  interval: 5s
  timeout: 3s
  retries: 5
  start_period: 3s

Check service health:

docker compose ps
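
Scripts that depend on the stack being up can poll the same endpoint the health check uses. The sketch below assumes the health URL you pass in is reachable from wherever the script runs; inside the container it is http://localhost:8080/health, while reachability from the host depends on your port mappings. The function name and its defaults are hypothetical.

```shell
# Hypothetical helper: poll a health URL until it answers or attempts run out.
# Usage: wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_health() {
  url="$1"
  attempts="${2:-5}"
  delay="${3:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s silences progress output, -f makes curl fail on HTTP error statuses
    if curl -sf "$url" >/dev/null; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    # Only wait if another attempt will follow
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "not healthy after $attempts attempts" >&2
  return 1
}
```

For example, wait_for_health "$URL" 5 5 && make docker-test would run the connectivity tests only once the endpoint responds, where URL is whichever health address is reachable in your setup.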

Running Tests

Connectivity Tests

# From host
make docker-test

# Manual ping test (pick any coordinator)
docker compose exec coordinator ping -c 3 client-1.tunnelmesh

Benchmark Tests

# Run single benchmark (use benchmarker container)
docker compose exec benchmarker tunnelmesh benchmark client-1 --size 50MB

# View automated benchmark results
docker compose logs benchmarker

Replication Tests

Test chunk-level replication between coordinators:

# 1. Create bucket with replication factor 2
export TUNNELMESH_TOKEN=$(openssl rand -hex 32)
curl -X PUT http://localhost:8081/test-bucket \
  -H "Authorization: Bearer $TUNNELMESH_TOKEN" \
  -H "X-Replication-Factor: 2"

# 2. Upload test file (will be chunked and replicated)
echo "test data" > test.txt
aws s3 cp test.txt s3://test-bucket/test.txt \
  --endpoint-url http://localhost:9000

# 3. Check replication in coordinator logs
docker compose logs coordinator | grep -iE "replication|chunk"

# 4. Verify chunk distribution
curl -H "Authorization: Bearer $TUNNELMESH_TOKEN" \
  http://localhost:8081/api/admin/buckets/test-bucket

# 5. Test distributed reads (stop one coordinator)
docker ps | grep coordinator  # Note one coordinator ID
docker stop <coordinator-id>
aws s3 cp s3://test-bucket/test.txt - --endpoint-url http://localhost:9000
# Should succeed if replication worked
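
Step 5 can be scripted rather than copying a container ID by hand. This is a sketch that assumes the Compose service is named coordinator, as in the service table, and that it is run from the docker directory. The helper name is hypothetical.

```shell
# Hypothetical helper: print the ID of one coordinator container.
# `docker compose ps -q SERVICE` lists that service's container IDs, one per line.
pick_coordinator() {
  docker compose ps -q coordinator | head -n 1
}

# Example (commented out so that sourcing this snippet has no side effects):
# docker stop "$(pick_coordinator)"
```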

Troubleshooting

TUN Device Issues

Error: cannot create TUN device

Ensure the container has proper privileges:

# Check capabilities (pick any coordinator container)
docker ps | grep coordinator
docker inspect <coordinator-container-id> | jq '.[0].HostConfig.CapAdd'

# Should include: ["NET_ADMIN"]
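
If jq is not installed, the --format flag of docker inspect can print the same field. The sketch below uses a hypothetical helper name and matches on the substring NET_ADMIN, so it would also accept a CAP_-prefixed spelling if your Docker version reports one.

```shell
# Hypothetical helper: succeed if the container was started with NET_ADMIN added.
# Usage: has_net_admin CONTAINER_ID
has_net_admin() {
  docker inspect --format '{{.HostConfig.CapAdd}}' "$1" | grep -q 'NET_ADMIN'
}
```

A call such as has_net_admin <coordinator-container-id> && echo "capability present" gives a quick yes/no answer.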

DNS Resolution

Error: cannot resolve peer

Check mesh DNS is working:

docker compose exec coordinator dig peer-1.tunnelmesh @localhost

Container Networking

# Check container IPs (pick any coordinator)
docker compose exec coordinator ip addr

# Check routes
docker compose exec coordinator ip route

# Check mesh connectivity
docker compose exec coordinator tunnelmesh status

Coordinator Discovery

Verify coordinators can discover each other:

# View coordinator registration logs
docker compose logs coordinator | grep -i "coordinator.*registered"

# Should show each coordinator discovering the others

Logs

# View coordinator logs
make docker-logs-coords

# View peer discovery
docker compose logs coordinator 2>&1 | grep -i peer

# View replication activity
docker compose logs coordinator 2>&1 | grep -i replication

Production Considerations

Tip

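Production checklist: Use resource limits, restart policies, log rotation, secrets management, and health checks. Avoid running containers as root where possible, bearing in mind that a process usually has to run as root inside the container for the added NET_ADMIN capability to take effect.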

Resource Limits

services:
  coordinator:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 256M

Restart Policy

services:
  coordinator:
    restart: unless-stopped

Logging

services:
  coordinator:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

Caution

Security best practices: Never hardcode auth tokens in compose files. Use Docker secrets or environment files with restricted permissions. Limit port exposure to only what's necessary.

Security

  • Use secrets management for auth_token
  • Run containers as non-root where possible
  • Use a read-only root filesystem, with static configs mounted read-only
  • Limit network exposure with explicit port mappings
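
One way to keep the token out of the compose file is an environment file that only its owner can read. The following is a minimal sketch; the file name, the helper name, and the assumption that the container reads TUNNELMESH_TOKEN from its environment are illustrative rather than confirmed TunnelMesh behaviour.

```shell
# Hypothetical helper: write a random token to an env file readable only by its owner.
# Usage: write_token_env [FILE]
# The body runs in a subshell so the umask change does not leak into the caller.
write_token_env() (
  env_file="${1:-.env.tunnelmesh}"
  umask 077                                   # newly created files get mode 0600
  token="$(openssl rand -hex 32)" || exit 1   # same generator the replication test uses
  printf 'TUNNELMESH_TOKEN=%s\n' "$token" > "$env_file"
  chmod 600 "$env_file"                       # also tighten a pre-existing file
)
```

The file could then be referenced through the env_file key of a Compose service or passed to docker run with --env-file, and should be kept out of version control.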

Example: Minimal Production Setup

services:
  coordinator:
    image: ghcr.io/zombar/tunnelmesh:latest
    restart: unless-stopped
    cap_add:
      - NET_ADMIN
    devices:
      - /dev/net/tun:/dev/net/tun
    ports:
      - "8080:8080"
    volumes:
      - ./coordinator.yaml:/etc/tunnelmesh/coordinator.yaml:ro
    command: ["join", "--config", "/etc/tunnelmesh/coordinator.yaml"]
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:8080/health"]
      interval: 30s
      timeout: 5s
      retries: 3

TunnelMesh is released under the AGPL-3.0 License.