In the previous post we discussed how to provision and keep TLS certificates in sync across multiple Caddy instances. In this post we will look at how to dynamically create and distribute Caddy configuration to multiple nodes using HashiCorp’s Nomad.

Similar to other orchestration tools such as Kubernetes, Nomad allows you to schedule and run containers, as well as non-containerized applications, across many servers with a standardized configuration.

Using Nomad, we can manage multiple servers across many datacentres to create the Points of Presence (PoPs) for the CDN we discussed in the previous post. We will create configuration that runs Caddy on each edge node, and dynamically generates the configuration it needs to act as a reverse proxy for the applications that Nomad is also running.

First, we need to create the Nomad job specification that defines the Caddy service we want to run. This will look something like:

job "caddy" {
  # as with a CDN, you'll want to have datacentres close to where your data is being browsed
  # you can define as many PoPs here as you wish
  datacenters = ["dc1", "dc2", "dc3"]
  constraint {
    # this constraint will ensure that the job will only run on nodes that have
    # a certain hostname, in this case only those that are load balancers
    attribute = "${attr.unique.hostname}"
    value     = "edge-lb-.+"
    operator  = "regexp"
  }
  group "loadbalancer" {
    # by default count is 1, meaning only one container will be created
    # however you can use Nomad's autoscaler to dynamically change this
    # otherwise, if you know the number of nodes in advance and they will
    # remain fixed, you can hardcode it as you wish
    # count = 1
    constraint {
      # ensure that Caddy servers in the group are scheduled on separate physical hosts
      operator  = "distinct_hosts"
    }
    task "server" {
      driver = "docker"
      config {
        # we'll use the official caddy image, but if you want to leverage the S3 cert sharing from
        # the previous post, you'll need to build your own image with the caddy plugin included
        image = "caddy:2"
        # to allow Caddy to bind directly to any port on the host, instead of using Docker port forwarding,
        # you can give Caddy access to the host network namespace
        network_mode = "host"
      }
    }
  }
}

This job specification will create a Caddy instance on each node whose hostname matches edge-lb-.+. This allows you to have multiple load balancers in each datacentre, with Caddy running on each of them. As it stands, this example will only serve the default Caddy landing page; it will not forward any requests to your applications or request TLS certificates.

To go beyond serving the default Caddy landing page and start requesting and serving TLS certificates, we’ll need to extend the Nomad job and provide a template for the Caddy configuration file.

Template files are loaded into a Nomad job by mounting them into the container as a file. We can do this with:

...
task "server" {
  config {
    ...
    # mount in the generated Caddy configuration as a read-only volume
    # the path used is the path to the files generated by the template stanza below
    mount {
      type     = "bind"
      source   = "..${NOMAD_ALLOC_DIR}/../server/caddy"
      target   = "/etc/caddy"
      readonly = true
    }
  }

  template {
    data = <<EOH
# Caddy configuration goes here
# this hardcoded configuration responds with the ID of the node handling the request
:80 {
  respond "Hello World from {{ env "node.unique.id" }}!"
}
EOH
    # where to write out the generated configuration
    destination = "caddy/Caddyfile"
  }
}

As you can see, the templates can take variables and generate configuration based on them. We can extend this behaviour to have Nomad generate configuration based on the endpoints of the services it is running:

... Caddyfile template

{{range service "nomad-example-app"}}
{{index .ServiceMeta "domain"}} {
  tls {
    on_demand
  }
  header {
    X-Balance "{{ env "node.unique.id" }}"
  }
  reverse_proxy {{.NodeAddress}}:{{.Port}}
}
{{end}}

This will take the list of services registered with the name “nomad-example-app”, loop over them using the domain meta information, and have Caddy proxy traffic to the address and port of each instance. Right now, the connection will happen over whichever network the Docker bridge is attached to, so be aware that if those IPs are publicly accessible, it may be possible to bypass the Caddy server entirely, depending on your network firewall rules (one way to mitigate this is sketched after the application job below).

Minor updates to your application service will also be needed. In your application’s Nomad specification, you’ll need to add a port for the container to listen on. In our case we are exposing port 80 of the container, and Nomad will publish it on a random high port on the host. In the service meta information, you’ll need to define a domain so that Nomad can create a configuration block for it in the generated Caddy configuration.

group "nomad-example-app" {

    network {
      port "http"  { to = 80 }
      mode = "bridge"
    }
    service {
      name = "nomad-example-app"
      port = "http"
      meta {
        domain = "nomad-app.example.com"
      }
    }
    task "server" {
      driver = "docker"

      config {
        image = "traefik/whoami:latest"
        ports = ["http"]
...
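
To reduce the bypass risk mentioned earlier, one option is to only publish the application’s port on a private interface. The following is a minimal sketch, assuming your Nomad client configuration defines a host_network named “private” covering an internal address range; the name and CIDR here are placeholders for your own environment:

# in the Nomad client configuration
client {
  host_network "private" {
    # assumed internal range; use whatever your private network actually is
    cidr = "10.0.0.0/8"
  }
}

# in the application job, pin the published port to that host network
network {
  mode = "bridge"
  port "http" {
    to           = 80
    host_network = "private"
  }
}

Caddy on the edge nodes can still reach the published port over the private network, while it is no longer reachable from public interfaces.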

Now, as new services come online or old ones go away, Nomad will update the Caddy configuration as needed.

Nomad will restart the Caddy container whenever the configuration changes. As this is likely undesirable for a production load balancer, you may want to look at using Caddy’s graceful reload capability instead. This can be done by setting the change_mode option on the template stanza to script, and providing a script that reloads Caddy with caddy reload --config /etc/caddy/Caddyfile --adapter caddyfile whenever the configuration changes.
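
As a rough sketch, the template stanza from earlier could be extended like this. It assumes Nomad 1.4 or newer, where change_mode = "script" and the change_script block are available, and that the caddy binary lives at /usr/bin/caddy inside the container (as it does in the official image):

template {
  data        = "..." # the Caddyfile template from above
  destination = "caddy/Caddyfile"

  # run a command inside the task instead of restarting it when the template is re-rendered
  change_mode = "script"
  change_script {
    command = "/usr/bin/caddy"
    args    = ["reload", "--config", "/etc/caddy/Caddyfile", "--adapter", "caddyfile"]
    timeout = "10s"
  }
}

With this in place, configuration changes are picked up gracefully, without dropping in-flight connections on the load balancer.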

There are ways to expand on this further: adding health checks, creating services dynamically using GitOps or the Nomad API, and integrating with a service mesh such as Consul Connect for mTLS on the connections between Caddy and the applications. But this covers the basics of using Nomad to dynamically generate and update Caddy configuration based on the services it is running.
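
For example, a health check can be added to the application’s service stanza so that instances which fail their checks drop out of the generated Caddy configuration (the service template function only returns passing instances by default). A minimal sketch, assuming the application answers plain HTTP on its root path:

service {
  name = "nomad-example-app"
  port = "http"
  meta {
    domain = "nomad-app.example.com"
  }
  check {
    type     = "http"
    path     = "/"
    interval = "10s"
    timeout  = "2s"
  }
}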

The way I plan to use the above for Gitea.pages is to have a container for each static site, and to create Nomad services for each site. Each static site will have an instance created in each region, and the Caddy server will direct traffic to the container in its own region. This allows for regional, CDN-like capabilities, while keeping the sites loosely coupled and using Nomad for orchestration. When a new static site is built, we can let Nomad know a new container has been published, and Nomad will handle updating the services globally.

In the next post we will look into how to use Caddy to safely request certificates on demand for custom customer-provided domains, and how to validate those requests.