Debugging 404 errors in Let's Encrypt challenges (with some IPv6 flavor)
I’d set up a public webserver (this one!) to use certbot for HTTPS, but my certificates apparently had never renewed since the first day.
There are a lot of potential reasons why you might see the renewal error message I got. I’ll explain the issue in my case, and how I dug into it!
Failing the renewal challenge
At first I wondered if I had just misunderstood my certbot installation, and if maybe it just wasn’t running renewal automatically like I had expected it to.
To test this, I SSH’d into the machine and tried to renew manually. When I did, I got this message:
$ sudo certbot renew
Saving debug log to /var/log/letsencrypt/letsencrypt.log
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/blog.matchu.dev.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Renewing an existing certificate for blog.matchu.dev
Certbot failed to authenticate some domains (authenticator: nginx). The Certificate Authority reported these problems:
Domain: blog.matchu.dev
Type: unauthorized
Detail: 2600:3c0a::f03c:93ff:fe96:afe: Invalid response from http://blog.matchu.dev/.well-known/acme-challenge/tO6ufDHLDSPCvlC2nvcSL5wf594ZTMD7ApDx7WKC0ag: 404
Hint: The Certificate Authority failed to verify the temporary nginx configuration changes made by Certbot. Ensure the listed domains point to this nginx server and that it is accessible from the internet.
Failed to renew certificate blog.matchu.dev with error: Some challenges have failed.
What surprised me most was that the “Detail” section seemed to say that the server was responding to the challenge, but explicitly returning a 404 status code. If the connection was successful, why was my server responding as if the challenge didn’t exist?
In hindsight, here’s a spoiler: the fact that it lists my site’s IPv6 address (2600:3c0a::f03c:93ff:fe96:afe) in the “Detail” section is a big clue!
Surprising behaviors
Most help I found online suggested that this might be a DNS issue—that maybe my public DNS records pointed to the wrong place, in a way that was different from what my browser was seeing.
And that sure seemed to feel like the vibe, but it didn’t quite make sense with what I was seeing: my DNS records were indeed correct, and every time I tried to manually run what it seemed like the challenge was doing, my requests were successful.
Reproducing the nginx authentication challenge
I ran the challenge with the -vv extra-verbose flag, to learn more about the nginx challenge. Turns out, the nginx challenge will:
- Find the nginx config files that mention the server you’re using.
- Back them up.
- Temporarily replace them with copies that insert extra logic to respond to the challenge.
- Upon success or failure, replace the config files with the originals.
I didn’t find a way to skip the cleanup step, but I did find that, when running with -vv, certbot will print out the content of the config files. So, I copied those config modifications, saved them to the nginx config myself, restarted nginx, and sent the challenge request. And it worked for me!
Reproducing the webroot authentication challenge
Concerned that part of the problem might be something in the nginx configuration override that I wasn’t understanding, I tried certbot’s more lo-fi “webroot” authentication procedure: you tell it the path to your website’s public file root, and it saves the challenge response in there as a file.
This, too, failed the challenge—but I tried writing a file to the same location myself, and loading it myself, and it worked for me!
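If you want to repeat that manual check, here's a rough sketch of it as a shell function. It assumes the webroot is /srv/blog.matchu.dev (the root from the nginx config later in this post); the function name check_webroot and the file name manual-test are made up for illustration.

```shell
# Sketch: write a test file where the webroot authenticator would put its
# challenge response, then fetch it over plain HTTP like the challenge does.
# WEBROOT, DOMAIN, and "manual-test" are assumptions; adjust for your setup.
WEBROOT="${WEBROOT:-/srv/blog.matchu.dev}"
DOMAIN="${DOMAIN:-blog.matchu.dev}"

check_webroot() {
  local dir="$WEBROOT/.well-known/acme-challenge"
  mkdir -p "$dir" || return 1
  echo "hello from webroot" > "$dir/manual-test" || return 1
  # -i prints the status line and headers along with the body.
  curl -i "http://$DOMAIN/.well-known/acme-challenge/manual-test"
}
```

Source the file, then run check_webroot as a user that can write to the webroot. A 200 with the file's contents only tells you nginx can serve that path over whichever IP version your curl happened to pick.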
This suggested the issue was further upstream: that there was likely something in common between the nginx challenge and webroot challenge that was triggering the same issue, before the nginx config was even deciding what to serve.
Cracking the mystery: IPv6 config!
Noticing that the nginx authenticator is complicated and could fail in lots of ways, but the webroot authenticator is simple, here’s the smart idea that got me the answer: I started Googling for the webroot version of the error specifically.
When searching for certbot webroot 404, I immediately found the answer to my problem in this Let’s Encrypt support thread:
- I had registered my site’s DNS records with both an A record (for IPv4) and an AAAA record (for IPv6), for future-proofing.
- 🌟 However, I had incorrectly configured nginx to only listen on IPv4, not IPv6!
- Unlike curl and my browser, which presumably used IPv4 by default (or knew to fall back when IPv6 was failing), Let’s Encrypt defaults to IPv6 when possible.
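And so that’s why it felt like Let’s Encrypt was seeing entirely different behavior: it was! My server apparently was always responding to IPv6 clients with 404 for all requests, and I just never noticed! (Presumably something else on the machine was still answering on IPv6, since a 404 means the connection itself worked.)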
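A quick way to spot this kind of split is to force curl onto one IP version at a time and compare the status codes. Here's a minimal sketch; http_status and compare_ip_versions are hypothetical helper names, and the URL you pass in is up to you.

```shell
# Sketch: compare what a URL returns over IPv4 vs IPv6.
# http_status is a hypothetical helper, not part of curl or certbot.
http_status() {
  # $1 is 4 or 6 (curl's -4 / -6 flags force the IP version); $2 is the URL.
  # -o /dev/null discards the body; -w prints just the HTTP status code.
  curl -sS "-$1" -o /dev/null -w '%{http_code}\n' "$2"
}

compare_ip_versions() {
  local url="$1"
  echo "IPv4: $(http_status 4 "$url")"
  echo "IPv6: $(http_status 6 "$url")"
}
```

For example: compare_ip_versions http://blog.matchu.dev/. If the two lines differ, that's the same mismatch Let's Encrypt was running into. This only works from a machine that has IPv6 connectivity itself; otherwise the -6 request will fail before it ever reaches your server.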
The solution: adding IPv6 listening to my nginx config
So, I updated my nginx config to listen twice for each port: once for IPv4 connections, and once for IPv6 connections.
After replacing this config and restarting nginx, I ran sudo certbot renew again, and the challenges succeeded!
I then went and made this change in all of my nginx configs for all of my projects, oops lol!
server {
    server_name blog.matchu.dev;
    listen 80;
    listen [::]:80; # ✅ Added this line!

    if ($host = blog.matchu.dev) {
        return 301 https://$host$request_uri;
    }
}

server {
    server_name blog.matchu.dev;
    listen 443 ssl;
    listen [::]:443 ssl; # ✅ Added this line!

    ssl_certificate /etc/letsencrypt/live/blog.matchu.dev/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/blog.matchu.dev/privkey.pem;
    include /etc/letsencrypt/options-ssl-nginx.conf;
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;
    ssl_session_cache shared:SSL:10m; # https://superuser.com/q/1484466/14127

    root /srv/blog.matchu.dev;
    error_page 404 /404.html;
}
(By the way, I don’t use certbot’s feature to automatically add SSL config lines to my config files, so my ssl_certificate etc. lines might look a bit different than yours. That doesn’t mean yours is set up wrong!)
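To audit a config for the same mistake, it helps to pull out just the listen directives and check that each port has a matching [::] line. A small sketch, assuming you feed it nginx config text on stdin; listen_lines is a made-up helper name.

```shell
# Sketch: read nginx config text on stdin and print only the listen
# directives, with leading indentation stripped, so a port that is missing
# its [::] twin is easy to spot.
listen_lines() {
  grep -E '^[[:space:]]*listen[[:space:]]' | sed -E 's/^[[:space:]]+//'
}
```

One way to use it is sudo nginx -T 2>/dev/null | listen_lines, since nginx -T dumps the full resolved configuration. On Linux, sudo ss -tlnp | grep nginx should also show both an IPv4 and an IPv6 socket per port once the change is live.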
Aside: How to make requests to a site with expired HTTPS
My browser was correctly refusing to process HTTPS requests to my site, because I had configured it with “HSTS”, which assures the browser that I really mean it about HTTPS and promise to keep my certs up-to-date. (Oops!)
But thankfully, curl has a simple override flag, -k, that skips HTTPS certificate verification. So, to manually test requests to my site, I ran:
$ curl -k https://blog.matchu.dev/
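Relatedly, if you just want to see when the certificate a server is presenting expires, openssl can print its validity dates. This is a sketch rather than something from my debugging session; cert_dates is a hypothetical helper name, and it assumes the site is served on port 443.

```shell
# Sketch: print the notBefore / notAfter dates of the certificate a host
# presents. $1 is the hostname; port 443 is assumed.
cert_dates() {
  # s_client fetches the certificate (even an expired one);
  # x509 -noout -dates prints only its validity window.
  echo | openssl s_client -connect "$1:443" -servername "$1" 2>/dev/null \
    | openssl x509 -noout -dates
}
```

For example: cert_dates blog.matchu.dev. A notAfter date in the past confirms that renewal hasn't been happening.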
Aside: Dealing with rate limiting
At some point, Let’s Encrypt started refusing to run my renewal requests:
Failed to renew certificate blog.matchu.dev with error: urn:ietf:params:acme:error:rateLimited :: There were too many requests of a given type :: Error creating new order :: too many failed authorizations recently: see https://letsencrypt.org/docs/failed-validation-limit/
To address this, I followed the link in the message and learned about Let’s Encrypt’s staging environment, which you can activate using --test-cert.
To avoid blowing up my existing production certificates, I created a new site test.blog.matchu.dev using the same process as I’d used for blog.matchu.dev, installed the certificate, and started running my renewal attempts against that site. (Just like my original issue, the certificate installed correctly the first time, but renewal challenges failed.)
One extra trick: because my initial test.blog.matchu.dev certificate wasn’t expired yet, I had to use --force-renewal to get certbot to try to renew it anyway, and --dry-run to force the challenge to replay.
That said, I think maybe I gave myself a bit of a runaround? It seems like maybe just using --dry-run on the main renewal request would have beaten the rate limits too, because I’ve since learned --dry-run automatically selects the staging server? Well, I can’t know now. All I can tell you is what I know worked for me!
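Put together, the flags from this section look roughly like the sketch below. The function name staging_renew is made up, and combining all three flags in one command is an assumption based on the description above, so treat it as a starting point rather than a recipe.

```shell
# Sketch: retry a renewal against Let's Encrypt's staging environment.
# $1 is the certificate name, e.g. test.blog.matchu.dev.
staging_renew() {
  # --test-cert selects the staging server, --force-renewal retries even if
  # the certificate isn't near expiry, and --dry-run avoids saving anything.
  sudo certbot renew --cert-name "$1" --test-cert --force-renewal --dry-run
}
```

For example: staging_renew test.blog.matchu.dev. If --dry-run alone is enough to reach the staging server, then sudo certbot renew --dry-run against the main site may be all you need.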