Özgür Işık Damar
10 min read

Running five production apps on a $21 a month Hetzner box

The Vercel bill was $312. The Hetzner bill was €11.66. Six months later, this is what's actually on the box.

infrastructure · self-hosted · hetzner · cost-engineering

My last Vercel bill was $312. The one before that, $287. Three Next.js sites, a Postgres database I had long outgrown the free tier on, a Redis instance that charged me by the connection. In a single weekend I moved everything onto one Hetzner CAX21 ARM server. The next month's bill was €11.66, plus €4.49 for a smaller staging box. Six months on, I run five things across those two boxes and pay €16.15 a month. Here is what is actually on them.

The boxes

  • Box A — CAX21. 4 vCPU ARM, 8 GB RAM, 80 GB NVMe, Falkenstein. €11.66/month.
  • Box B — CAX11. 2 vCPU ARM, 4 GB RAM, 40 GB NVMe, Helsinki. €4.49/month.

That is the whole inventory. No load balancer, no managed Postgres, no Redis-as-a-service. Two Ampere ARM boxes wired together over a Hetzner private network, both running plain Debian 12.

I went ARM because a friend who runs a small agency was on the x86 CPX line and paying more for the same RAM. The ARM line is roughly 40 percent cheaper for what you get. Nothing I run cares about the architecture, except a Sharp build that wanted --platform=linux/arm64 in its Dockerfile.

What runs on Box A

In one docker-compose, in one project directory, owned by one non-root user:

  1. Postgres 17. Single instance, four databases, isolation by role. stork_prod, nova_prod, portfolio, analytics. Each role owns its own database and has no rights anywhere else. I spent two days trying the multi-instance approach and it turned silly. Postgres handles multi-tenancy cleanly the moment you stop fighting it.
  2. Redis 7. Three logical databases on three indexes. App cache on 0, job queue on 1, sessions on 2. And a requirepass that I now know, to the character, is exactly 64 characters long, for reasons I will get to.
  3. Caddy 2. Five domains, automatic HTTPS, one Caddyfile. About a third the length of the equivalent Nginx config.
  4. Stork warehouse-test. Go API plus the stork-admin-panel build, both behind Caddy on subdomains.
  5. nova-api. Two Go microservices in the same compose file, talking to each other over the internal Docker network.
  6. Prometheus + Grafana. Yes, on the same box. The Prometheus binary sits at 60 MB at idle, Grafana around 35 MB. The whole observability stack uses one percent of the available RAM. I will move it the day I need to.
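
The one-role-per-database layout from item 1 can be sketched as a small generator that prints the SQL for each database. This is only a sketch under assumptions: the role names mirroring the database names and the psql variable names for the passwords are hypothetical, and the real grants on the box may differ.

```shell
#!/usr/bin/env bash
# Sketch: print SQL that gives each database its own owning role and
# removes the default PUBLIC connect privilege. Role names and the
# :'<db>_password' psql variables are assumptions, not the actual setup.
set -euo pipefail

role_sql() {
  local db="$1"
  cat <<SQL
CREATE ROLE ${db} LOGIN PASSWORD :'${db}_password';
CREATE DATABASE ${db} OWNER ${db};
REVOKE CONNECT ON DATABASE ${db} FROM PUBLIC;
SQL
}

for db in stork_prod nova_prod portfolio analytics; do
  role_sql "$db"
done
```

The output is meant to be piped into psql as the superuser, with each password supplied through -v (for example -v analytics_password=...), so no secret ever sits in the script itself.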

Box A's load average sits around 0.6 on a normal afternoon and climbs to 1.8 when the nightly Postgres dump runs. CPU steal is zero — ARM dedicated cores are not shared. I used to have a shared VPS where the st column in vmstat 1 flickered like a nervous animal; here I watched it for an hour and it never moved. That matters, because predicting how long Postgres autovacuum will take is only meaningful when nobody else is stealing my CPU.
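
For anyone who would rather measure steal than watch vmstat scroll, here is a minimal sketch that derives the same number from two samples of /proc/stat. It assumes a Linux host; the one-second sampling window is arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: percentage of CPU time stolen by the hypervisor between two
# samples of the aggregate "cpu" line in /proc/stat (Linux only).
set -euo pipefail

# steal_pct "<cpu line before>" "<cpu line after>"
steal_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, " "); split(b, y, " ")
    # fields 2..9: user nice system idle iowait irq softirq steal
    for (i = 2; i <= 9; i++) total += y[i] - x[i]
    if (total <= 0) { print "0.00"; exit }
    printf "%.2f\n", 100 * (y[9] - x[9]) / total
  }'
}

if [ -r /proc/stat ]; then
  before="$(grep '^cpu ' /proc/stat)"
  sleep 1
  after="$(grep '^cpu ' /proc/stat)"
  echo "steal over 1s: $(steal_pct "$before" "$after")%"
fi
```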

What runs on Box B

  1. Postgres 17 — test instance. A sample of prod data, refreshed every Sunday at 03:00 by a script that pulls a pg_dump from Box A over the private network and restores it under a fresh password and an anonymised users table.
  2. Membrane AI prototype. A Python FastAPI service for a side project. It lives here because I do not want a half-finished prototype anywhere near the prod database.

Box B is reachable from Box A over the 10.0.0.0/16 private network and from nowhere else. Its public IP exists, but the firewall rejects everything except SSH from my home IP and ICMP.

Why not Kubernetes

I priced out the operator time. K8s would have meant roughly four hours of maintenance a month — upgrades, cert rotation, the inevitable kubectl describe pod archaeology when a pod restarts at three in the morning. Five apps fit inside docker-compose up -d. Compared to compose, I might claw back thirty minutes a month in deployment ergonomics. The trade is bad.

A colleague who tried K8s first for a similar shape of project burned three weekends on it and went back to compose. That story made the decision easy.

Why Caddy, not Nginx

Two reasons. The Caddyfile reads the way a config file should read. And ACME comes built in. I have never written a certbot cron in my life, and I do not intend to start.

# /etc/caddy/Caddyfile — five domains, one file
{
    email isikozgur35@gmail.com
}
 
ozgurdamar.dev {
    reverse_proxy portfolio:3000
    encode zstd gzip
}
 
api.stork-test.dev {
    # internal Docker DNS — Caddy and stork-api share the same compose network
    reverse_proxy stork-api:8080
}
 
admin.stork-test.dev {
    reverse_proxy stork-admin:3000
}
 
nova-test.dev, *.nova-test.dev {
    reverse_proxy nova-api:8081
}
 
grafana.ozgurdamar.dev {
    # only my home IP can reach the dashboard
    @home client_ip 92.45.xx.0/24
    handle @home {
        reverse_proxy grafana:3000
    }
    handle {
        respond 403
    }
}

Caddy renews the certificates without me. I only noticed it had renewed everything last week because a Slack reminder I had set six months ago fired, and I went to check.
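
Trusting the renewals is easier with a cheap way to check them. The sketch below reports how many days a served certificate has left; it assumes openssl and GNU date are installed, and the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: days until a TLS certificate expires. Needs openssl and GNU date.
set -euo pipefail

# days_between "<now>" "<notAfter>": whole days from now until notAfter.
days_between() {
  local now_s end_s
  now_s="$(date -u -d "$1" +%s)"
  end_s="$(date -u -d "$2" +%s)"
  echo $(( (end_s - now_s) / 86400 ))
}

# cert_not_after <host>: prints the notAfter date of the certificate served on 443.
cert_not_after() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null |
    openssl x509 -noout -enddate | cut -d= -f2
}

# Usage (needs network access):
#   days_between "now" "$(cert_not_after ozgurdamar.dev)"
```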

The compose file

The whole stack lives in /srv/box-a/docker-compose.yml, owned by user deploy. The interesting bits:

# docker-compose.yml — trimmed to the load-bearing services
services:
  postgres:
    image: postgres:17-bookworm
    restart: unless-stopped
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/pg_password
      # explicit collation — Ubuntu default broke after the migration
      LANG: C.UTF-8
      LC_COLLATE: C.UTF-8
    volumes:
      - pgdata:/var/lib/postgresql/data
    secrets: [pg_password]
 
  redis:
    image: redis:7-bookworm
    restart: unless-stopped
    # password loaded from file so it never lands in `docker inspect`
    command: ["redis-server", "/etc/redis/redis.conf"]
    volumes:
      - ./redis.conf:/etc/redis/redis.conf:ro
 
  caddy:
    image: caddy:2
    restart: unless-stopped
    ports: ["80:80", "443:443"]
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
    depends_on: [stork-api, stork-admin, nova-api, portfolio]

restart: unless-stopped plus the systemd unit below means the box can reboot and everything comes back without me looking at it.

The systemd unit that holds it all up

This is the one thing I would tell anyone copying this setup not to skip. The Docker daemon comes up on boot. Your compose project does not, unless you tell it to.

# /etc/systemd/system/box-a.service
[Unit]
Description=Box A docker-compose stack
Requires=docker.service
After=docker.service network-online.target
 
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/srv/box-a
# `up -d` is idempotent and safe to retry; `down` on stop removes containers and networks but keeps named volumes
ExecStart=/usr/bin/docker compose up -d --remove-orphans
ExecStop=/usr/bin/docker compose down
TimeoutStartSec=300
 
[Install]
WantedBy=multi-user.target

systemctl enable box-a once, and forget it. The box has rebooted twice in six months — once for a kernel update, once because I typed shutdown -r when I meant now. Both times, everything was back inside 90 seconds. My phone stayed quiet, no alerts fired, Caddy didn't forget anything. When a service doesn't come back after a reboot, it is almost always a surprise dependency chain — which is why the depends_on list isn't trimmed; it reflects the actual startup order.
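
A post-reboot check along these lines can confirm that nothing was left behind. It is a sketch that assumes Docker Compose v2 and the /srv/box-a project directory; missing_services is a hypothetical helper, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: list compose services that are expected but not running.
set -euo pipefail

# missing_services "<expected, one per line>" "<running, one per line>"
missing_services() {
  local svc
  while IFS= read -r svc; do
    [ -n "$svc" ] || continue
    # print the expected service only if it is absent from the running list
    grep -qxF -- "$svc" <<<"$2" || printf '%s\n' "$svc"
  done <<<"$1"
}

# On the box:
#   cd /srv/box-a
#   missing_services "$(docker compose config --services)" \
#                    "$(docker compose ps --services --status running)"
```

Empty output means every service declared in the compose file is up.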

Backups, which are the actually expensive thing

Backups go encrypted to a Hetzner Storage Box via restic. €3.20/month for 1 TB, of which I use about 47 GB. The script runs at 03:30 every night under cron:

#!/usr/bin/env bash
# /srv/box-a/scripts/nightly-backup.sh — pg_dump + restic, exit non-zero on any failure
set -euo pipefail
export RESTIC_PASSWORD_FILE=/etc/restic.password
export RESTIC_REPOSITORY=sftp:u123456@u123456.your-storagebox.de:/backups
 
STAMP="$(date -u +%Y%m%dT%H%M%SZ)"
DUMP_DIR="/var/backups/pg/${STAMP}"
mkdir -p "$DUMP_DIR"
 
for DB in stork_prod nova_prod portfolio analytics; do
  # custom format: compressed, and allows a parallel pg_restore later
  # -U postgres: docker exec runs as root, and no "root" role exists by default
  docker exec postgres pg_dump -U postgres -Fc -d "$DB" > "${DUMP_DIR}/${DB}.dump"
done
 
restic backup "$DUMP_DIR" --tag nightly --tag pg
restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune
 
rm -rf "$DUMP_DIR"

Every quarter I run a restore drill: pull last night's snapshot to Box B, restore under a different database name, run count(*) against a known-stable table. Three minutes if everything works. I have done it twice. One of those drills caught a missing role grant.
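
The drill can be written down as a script that prints its plan by default. This is a sketch, not the actual procedure: the target directory, the _drill database suffix, and the products table are hypothetical. It also assumes RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE are exported and the Postgres client tools are on the PATH.

```shell
#!/usr/bin/env bash
# Sketch of the quarterly restore drill. Prints the plan by default;
# set DRY_RUN=0 to execute. Target dir, database suffix and table name
# are hypothetical.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

restore_drill() {
  local db="$1" table="$2" target="${3:-/tmp/restore-drill}"
  run restic restore latest --tag nightly --target "$target"
  run createdb -U postgres "${db}_drill"
  # restic recreates the original /var/backups/pg/<stamp>/ path under target
  run sh -c "pg_restore -U postgres --no-owner -d ${db}_drill ${target}/var/backups/pg/*/${db}.dump"
  run psql -U postgres -d "${db}_drill" -c "SELECT count(*) FROM ${table};"
}

restore_drill stork_prod products
```

Defaulting to a dry run keeps an accidental invocation from touching a real database.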

Three things broke during the migration

One. A Postgres collation difference. Ubuntu's default had been en_US.UTF-8; the fresh Debian image came up on C.UTF-8. The Stork warehouse migration ran clean, but a sort query on Turkish product names returned things in the wrong order. The bug was small — Ş was sorting after Z instead of after S — and it took me three hours to find, because the symptom was a UX complaint, not a stack trace. The fix was the explicit LC_COLLATE=C.UTF-8 in the compose file and a one-off ALTER COLLATION on the affected indexes.
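
The symptom is easy to reproduce outside Postgres. The sketch below uses sort under LC_ALL=C as a stand-in for a byte-order collation; the three product names are invented for the example.

```shell
#!/usr/bin/env bash
# Sketch: byte-order (C) collation puts Ş after Z.
set -euo pipefail

sort_c() { LC_ALL=C sort; }

printf '%s\n' Zeytin Şeker Sabun | sort_c
# Prints Sabun, Zeytin, Şeker: in UTF-8, Ş is the byte pair 0xC5 0x9E,
# and 0xC5 compares higher than any ASCII letter.
```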

Two. Next.js standalone build hitting a memory ceiling. I tried to build on CAX11 first, because that was where I had parked the runner. The build was OOM-killed twice. The 4 GB on CAX11 was not enough headroom for next build with the analytics page bundle. I moved the build to Box A — 8 GB is plenty — and rsync the standalone output across the private network into the deploy directory. Build on A, run on B, deploy by symlink. Ten minutes of work, problem gone.
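
"Deploy by symlink" can be sketched as follows. The releases/ and current layout, the release names, and the rsync destination are all hypothetical, and mv -T assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch: switch a "current" symlink to a new release directory.
# Directory layout and names are hypothetical.
set -euo pipefail

# activate_release <deploy_root> <release_name>
activate_release() {
  local root="$1" release="$2"
  [ -d "$root/releases/$release" ] || { echo "no such release: $release" >&2; return 1; }
  # build the new link beside the old one, then rename it into place
  ln -sfn "releases/$release" "$root/current.tmp"
  mv -T "$root/current.tmp" "$root/current"
}

# Example flow (host and paths are placeholders):
#   rsync -a --delete .next/standalone/ deploy@<private-ip>:/srv/app/releases/<stamp>/
#   activate_release /srv/app <stamp>
```

Renaming a finished link over the old one means the app directory never points at a half-copied build.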

Three. The one that almost shipped to prod. The Redis password I generated was 88 characters, base64 from openssl rand -base64 64. I pasted it into .env without quotes. Docker compose silently truncated it at the first + character, apparently because the unquoted + was mangled somewhere between the .env file, YAML, and the shell. The container came up on 6379. It accepted AUTH with the truncated prefix as the password, but also, through a subtle interaction I never fully understood, accepted no password at all. I caught it because I ran redis-cli -h 10.0.0.2 PING from Box B without credentials and got back PONG. Box B should not have been able to do that. I rotated the password (no + this time), and now I generate Redis passwords with openssl rand -hex 32: exactly 64 hex characters, no shell-special bytes.

I was one docker compose up away from running an internet-reachable Redis with no auth. The migration window was the riskiest 48 hours of the year, and this was the bug that nearly cost me.
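
A small guard makes the hex-only rule hard to forget. This is a sketch: the function names are made up, and the smoke test in the comment reuses the 10.0.0.2 address from the story above.

```shell
#!/usr/bin/env bash
# Sketch: generate a Redis password that is safe to paste unquoted, and
# reject anything that is not exactly 64 lowercase hex characters.
set -euo pipefail

gen_redis_password() { openssl rand -hex 32; }

is_hex64() { [[ "$1" =~ ^[0-9a-f]{64}$ ]]; }

# Smoke test, run from a host that should NOT get in:
#   redis-cli -h 10.0.0.2 PING     # expect a NOAUTH error, never PONG
```

Running is_hex64 on the value before it goes into .env or redis.conf catches a stray base64 password early.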

The cost spreadsheet, finally

Thing                      Vercel/managed before   Hetzner now
Compute (3 Next.js sites)  $79                     included
Postgres (managed, 8 GB)   $135                    included
Redis (managed)            $58                     included
Bandwidth                  $32                     included (20 TB/month)
Storage Box (backups)      -                       €3.20
Monthly total              $312                    €19.35 (~$21)

Roughly a 15× cost reduction ($312 against ~$21). Not the 18× I quoted myself in the hook: I forgot the Storage Box on the back of the envelope. The math gets worse if I include the eight hours I spent on the migration weekend, but six months in, those hours have amortised to roughly free.
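
The arithmetic can be re-derived from the figures already in this post. The sketch assumes the exchange rate implied by "€19.35 (~$21)" and nothing else; the helper names are made up.

```shell
#!/usr/bin/env bash
# Sketch: re-derive the Hetzner total and the cost ratios. The EUR to USD
# rate is the one implied by the table (EUR 19.35 is about USD 21).
set -euo pipefail

sum()   { awk 'BEGIN { t = 0; for (i = 1; i < ARGC; i++) t += ARGV[i]; printf "%.2f\n", t }' "$@"; }
usd()   { awk -v e="$1" 'BEGIN { printf "%.2f\n", e * 21 / 19.35 }'; }
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'; }

echo "Hetzner total:        EUR $(sum 11.66 4.49 3.20)"
echo "with Storage Box:     $(ratio 312 21)x"
echo "without Storage Box:  $(ratio 312 "$(usd 16.15)")x"
```

The last line is where the 18× figure comes from: two servers at €16.15, with the Storage Box left out.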

The counter-intuitive bit

Serverless was cheap when I started. It got expensive when I succeeded. The cost curve of managed PaaS isn't linear; it stair-steps right when your app is most fragile. The day my portfolio caught an unexpected spike from a Hacker News link was the day Vercel sent me a bill alert. The day my warehouse-test API crossed 50 connections to Postgres was the day the connection-priced Redis tier became the single most expensive line on my invoice.

The boring ARM box does not know my app is having a good week. It just runs.

Don't do this if

You don't enjoy debugging at the OS layer. The Postgres collation bug ate three hours. The Redis password bug could have ended badly. The Caddy-renewal cron does not exist because Caddy handles it — but you have to trust Caddy, and trusting it means reading the docs once instead of skimming them. If journalctl -u box-a and docker compose logs --tail 200 aren't in your muscle memory, the $290 a month you save is not worth what the first 2 AM page will take from you.

I enjoy this. I find it restful. It is a hobby that happens to cost nothing. When a managed service's abstraction breaks, I am usually digging through a support ticket. When something on my own box breaks, I open journalctl and find the answer myself. Both are work — but one feels like control, and the other feels like waiting.

The easter egg

The line on the bill I value most is still the Storage Box for backups. Backups feel cheap right up until you need one. €3.20 a month is the line I will never optimise.
