The Runtime
When you deploy an app, it runs as its own process, not inside a shared server. A control-plane orchestrator (supervisord) spawns one detached tabbify-runner per app, and each runner joins the private mesh as a peer hosting exactly one workload on a deterministic address. Because runners are separate, detached processes, apps keep serving when the supervisor crashes.
Supervisor and runners
supervisord hosts nothing itself. It is pure control plane: it spawns runners, persists their records to disk, monitors liveness, and re-adopts them on restart. Each runner hosts one app, builds the runtime (WASM, Firecracker, or Docker), and serves on [app_ula]:8730.
# Orchestrator; --no-mesh binds loopback for local dev (no root/TUN)
supervisord --runner-bin ./tabbify-runner --no-mesh
# A runner is normally spawned for you, derived from the app UUID
tabbify-runner --uuid 0191e7c2-1111-7222-8333-444455556666 --no-mesh
A supervisor advertises only what it can run: it joins with tag [supervisor], adds firecracker if /dev/kvm is present, and docker if the daemon is reachable.
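The advertising rule above can be sketched as a small function. This is an illustration, not the supervisor's actual code: the function names are hypothetical, and treating "the daemon is reachable" as "`docker info` exits with status 0" is an assumption.

```python
import os
import shutil
import subprocess


def capability_tags(kvm_present: bool, docker_reachable: bool) -> list[str]:
    """Build the tag list from the two host checks described above."""
    tags = ["supervisor"]
    if kvm_present:
        tags.append("firecracker")
    if docker_reachable:
        tags.append("docker")
    return tags


def probe_host() -> list[str]:
    """Run the host checks and return the resulting tags."""
    kvm = os.path.exists("/dev/kvm")
    docker = False
    # Assumption: reachability is approximated by `docker info` succeeding.
    if shutil.which("docker"):
        try:
            result = subprocess.run(
                ["docker", "info"], capture_output=True, timeout=5
            )
            docker = result.returncode == 0
        except (subprocess.TimeoutExpired, OSError):
            docker = False
    return capability_tags(kvm, docker)
```

Keeping the tag logic separate from the host probing makes the rule easy to check without a real /dev/kvm or Docker daemon.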
Lifecycle
You drive apps through the control API on the supervisor's mesh ULA (port 8730), or via the node:
POST /v1/apps/{uuid}/start   spawn the runner (idempotent if alive), wait up to 30s for healthy, return app_ula
POST /v1/apps/{uuid}/stop    shut down the runner, forget its record, keep cached artifacts
POST /v1/apps/{uuid}/purge   shut down the runner, clear the artifact cache and Docker image
GET  /v1/apps                list runner records on disk, with a live health probe for each
start replies with the deterministic address and restart state:
{"state":"running","app_ula":"fd5a:1f02:44a5:240b:121a::1","restart_status":"running","restart_count":0}
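A minimal client sketch for the start call, using only the Python standard library. It assumes the control API is plain HTTP without authentication, which the text above does not state; the function names and the supervisor address you pass in are placeholders.

```python
import json
import urllib.request


def start_url(supervisor_ula: str, app_uuid: str, port: int = 8730) -> str:
    # IPv6 literals must be wrapped in brackets inside a URL.
    return f"http://[{supervisor_ula}]:{port}/v1/apps/{app_uuid}/start"


def parse_start_reply(body: str) -> tuple[str, str, int]:
    """Extract (app_ula, restart_status, restart_count) from a start reply."""
    reply = json.loads(body)
    return reply["app_ula"], reply["restart_status"], reply["restart_count"]


def start_app(supervisor_ula: str, app_uuid: str) -> tuple[str, str, int]:
    """POST to the start endpoint and return the parsed reply."""
    req = urllib.request.Request(
        start_url(supervisor_ula, app_uuid), data=b"", method="POST"
    )
    # start may wait up to 30s for the runner to become healthy,
    # so the client timeout is set slightly above that.
    with urllib.request.urlopen(req, timeout=35) as resp:
        return parse_start_reply(resp.read().decode("utf-8"))
```

Because start is idempotent for a live runner, calling `start_app` again for the same UUID should be safe as a way to look up the address.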
Each runner record (RunnerHandle) is JSON on disk — uuid, pid, control_sock, app_ula, restart state. That file is the single source of truth, which is what enables re-adoption.
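To make the record shape concrete, here is a sketch of loading and saving such a file. The fields uuid, pid, control_sock, and app_ula come from the description above; `restart_status` and `restart_count` are assumed names borrowed from the start reply, and the real on-disk schema may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RunnerHandle:
    uuid: str
    pid: int
    control_sock: str
    app_ula: str
    restart_status: str  # assumed field name, taken from the start reply
    restart_count: int   # assumed field name, taken from the start reply


def load_handle(text: str) -> RunnerHandle:
    """Parse one on-disk record; raises KeyError if a field is missing."""
    raw = json.loads(text)
    return RunnerHandle(
        uuid=raw["uuid"],
        pid=int(raw["pid"]),
        control_sock=raw["control_sock"],
        app_ula=raw["app_ula"],
        restart_status=raw["restart_status"],
        restart_count=int(raw["restart_count"]),
    )


def dump_handle(handle: RunnerHandle) -> str:
    """Serialize a record back to JSON text."""
    return json.dumps(asdict(handle))
```

A record that round-trips through `dump_handle` and `load_handle` unchanged is what lets a restarted orchestrator pick up where the previous one left off.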
Health and crash survival
Runners expose a Unix-socket control channel (JSON-lines, one command per connection):
-> {"cmd":"health"}
<- {"reply":"health","state":"running","app_ula":"fd5a:1f02:…::1","pid":12345,"app_health":"serving"}
Because runners are detached, killing supervisord does not kill them; they keep serving. On restart the orchestrator scans the on-disk records and probes each runner's pid and control socket. Live runners are re-adopted without a respawn, so there is no interruption. A runner whose process is dead, or whose socket is hung, is force-killed if it is still present and then respawned. A monitor loop repeats this reconciliation every 5 seconds.
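The probe-and-decide step can be sketched as follows. This is one reading of the rule above, not the orchestrator's code: the names are hypothetical, the health command format follows the exchange shown earlier, and treating `"state": "running"` as the healthy condition is an assumption.

```python
import json
import os
import socket
from enum import Enum


class Action(Enum):
    ADOPT = "adopt"
    RESPAWN = "respawn"
    KILL_AND_RESPAWN = "kill_and_respawn"


def pid_alive(pid: int) -> bool:
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def encode_command(cmd: str) -> bytes:
    # JSON-lines framing: one JSON object terminated by a newline.
    return (json.dumps({"cmd": cmd}) + "\n").encode("utf-8")


def socket_healthy(control_sock: str, timeout: float = 2.0) -> bool:
    """True if the runner answers a health command with state 'running'."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect(control_sock)
            s.sendall(encode_command("health"))
            buf = b""
            while not buf.endswith(b"\n"):
                chunk = s.recv(4096)
                if not chunk:
                    break
                buf += chunk
        return json.loads(buf).get("state") == "running"
    except (OSError, ValueError, AttributeError):
        # Missing socket, timeout (hung runner), or an unparsable reply.
        return False


def decide(pid_is_alive: bool, sock_is_healthy: bool) -> Action:
    if pid_is_alive and sock_is_healthy:
        return Action.ADOPT
    if pid_is_alive:
        return Action.KILL_AND_RESPAWN  # process exists but socket is hung
    return Action.RESPAWN  # nothing left to kill
```

A monitor loop would call `decide(pid_alive(h.pid), socket_healthy(h.control_sock))` for each record on every pass.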
Restart supervision
App-level (L2) restart is the supervisor's job: per-runner exponential backoff (base 10s, cap 300s, doubling per failure, reset after 60s stable) with a crash-loop threshold of 5 failures. Keeping supervisord itself alive (L1) belongs to the launch backend — a container --restart policy or systemd.
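The backoff parameters above translate into a short calculation. This sketch assumes the first failure waits the base delay and that the crash-loop threshold is inclusive; neither detail is spelled out above, and the function names are illustrative.

```python
BASE_S = 10               # base delay
CAP_S = 300               # maximum delay
STABLE_RESET_S = 60       # uptime after which the failure count resets
CRASH_LOOP_THRESHOLD = 5  # failures before the app counts as crash-looping


def backoff_delay(failures: int) -> int:
    """Seconds to wait before restarting after `failures` consecutive failures."""
    if failures < 1:
        return 0
    # Double per failure, starting from the base, and clamp at the cap.
    return min(CAP_S, BASE_S * 2 ** (failures - 1))


def next_failure_count(failures: int, uptime_s: float) -> int:
    """Update the consecutive-failure count when a runner exits."""
    # A run that stayed up long enough counts as stable, so start over.
    if uptime_s >= STABLE_RESET_S:
        return 1
    return failures + 1


def is_crash_looping(failures: int) -> bool:
    return failures >= CRASH_LOOP_THRESHOLD
```

Under these assumptions the delays for the first six consecutive failures are 10, 20, 40, 80, 160, and 300 seconds, with every later failure also waiting 300 seconds.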
docker run -d --restart on-failure --device /dev/net/tun --cap-add NET_ADMIN \
-v tbf-state:/var/lib/tabbify tabbify-supervisor
Warm start
Runners cache artifacts on disk, so a stop followed by a start skips the fetch. Warm start is verified for Firecracker (snapshot/restore mmaps snap.mem) and Docker (image cache); WASM pooling is staged. Note: sticky identity lives under SUPERVISOR_DATA_DIR. If you drop that volume, the app rejoins as a fresh peer with a new address.