
Website down: what a 2016 outage taught me

2026-06-18

Just after midnight on 15 November 2016, I published a post titled "My blog is unreachable". On the very site that had just gone down, I explained why it no longer responded, with the slightly naive enthusiasm of someone discovering that a good security practice can turn against you. Ten years on, I have maintained platforms where downtime is no longer measured in lost visitors but in municipalities cut off from their electoral tool. This post starts from the 2016 incident and draws out the method I apply today when a site goes down.

November 2016: locked out by my own security

Back then, my blog was served over HTTPS with a free Let's Encrypt certificate. I had also enabled HSTS myself, going as far as adding the domain to the browsers' preload list. HSTS essentially tells the browser: this site is served only over a secure connection, refuse everything else, no exception.
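
To make this concrete, here is a minimal sketch of what the browser reads. HSTS is delivered as a Strict-Transport-Security response header. The header value in the usage note is an illustrative example, not my 2016 configuration, and the preload criteria checked here (a max-age of at least one year, plus the includeSubDomains and preload directives) are my understanding of the commonly documented submission requirements, so treat them as an assumption.

```python
ONE_YEAR_SECONDS = 31536000  # assumed minimum max-age for preload submission


def parse_hsts(header_value: str) -> dict:
    """Split a Strict-Transport-Security header value into its directives."""
    directives = {"max_age": None, "include_subdomains": False, "preload": False}
    for part in header_value.split(";"):
        token = part.strip()
        if not token:
            continue
        name, _, value = token.partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        if name == "max-age":
            directives["max_age"] = int(value.strip().strip('"'))
        elif name == "includesubdomains":
            directives["include_subdomains"] = True
        elif name == "preload":
            directives["preload"] = True
    return directives


def looks_preload_ready(header_value: str) -> bool:
    """Check the header against the assumed preload criteria described above."""
    d = parse_hsts(header_value)
    return (
        d["max_age"] is not None
        and d["max_age"] >= ONE_YEAR_SECONDS
        and d["include_subdomains"]
        and d["preload"]
    )
```

For example, looks_preload_ready("max-age=63072000; includeSubDomains; preload") is true, while a cautious "max-age=300" is not. A short max-age is the usual way to try HSTS before committing to it, because the browser remembers the rule for as long as max-age says.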

That day, I migrated to a server with more resources. During the migration, I had to regenerate my SSL certificate, which I did through a "one-click" tool. One field was wrong, so the certificate was invalid too. And since HSTS forbids insecure connections and removes the visitor's option to click through a certificate warning, the browsers did exactly what I had asked of them: they refused to load my site. No hack, no data loss. Just a site locked by the rule I had written myself, until I could regenerate a valid certificate.

Even back then I made clear that I did not blame Let's Encrypt; the "one-click" regeneration was not theirs. The real lesson was not "security is risky". It was: a critical operation run in a rush, without checking, costs you immediately.

What I got right without knowing it

Rereading that post, one thing strikes me. My first reflex, in the middle of the night, was not to pretend nothing had happened. I wrote publicly about what had happened, what I had missed and what I took away from it. That is, without my framing it that way at the time, the foundation of incident handling: say what you know, when you know it.

A client who learns about the outage from your message forgives it. A client who discovers it alone, then notices your silence, remembers the silence above all.

Lesson 1: you don't just notice an outage, you measure it

In 2016, I discovered the problem by visiting my own site. In other words, I was my only monitoring tool. Today, my site checks its own services continuously and publishes the result on a public status page: web server, application, database, email, galleries, with a day-by-day history over 90 days and the list of past incidents.

A status page changes two things. First, detection no longer depends on the chance of a visit. Second, transparency becomes verifiable: when I announce an uptime figure, anyone can look at the green bars.
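
As an illustration only, the arithmetic behind such a page fits in a few lines. This is a sketch, not the code that runs my status page: the function names, the 5-second timeout and the "up / degraded / down" labels are all assumptions chosen for the example.

```python
import http.client
import urllib.request


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Covers DNS failures, timeouts, TLS errors and HTTP 4xx/5xx responses.
        return False


def uptime_percent(checks: list) -> float:
    """Share of successful checks, as a percentage."""
    if not checks:
        raise ValueError("no checks recorded")
    return 100.0 * sum(1 for ok in checks if ok) / len(checks)


def day_status(checks: list) -> str:
    """Colour of one bar in the history (labels are hypothetical)."""
    if not checks:
        return "no data"
    if all(checks):
        return "up"
    if any(checks):
        return "degraded"
    return "down"
```

Run probe on a schedule, store the booleans per day, and the 90-day history is simply day_status applied to each day. The point of the design is that the published figure comes from recorded checks, not from memory.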

Lesson 2: a backup is prepared before, never during

Amusingly, the 2016 post already advised "making a backup before proceeding". I had grasped the principle before I measured its weight. A certificate failure is fixed without loss. A corrupted database, a dead disk or a hack is not: in those cases, only a backup taken before the incident saves you.

That is why my managed hosting plans include, by default, 14-day backups (30 days on the Business plan), managed updates and uptime monitoring. Not as an option: it is all in the base plan. The question to ask your host is not "do you make backups?" but "when did you last test a restore?".
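
What "testing a restore" can mean in practice: extract the backup somewhere harmless and check that the files that matter come back intact. The sketch below assumes a zip archive and a manifest of SHA-256 checksums recorded at backup time. Both are assumptions made for the example, and a real backup tool will have its own format and its own verification command.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large dumps do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(archive: Path, manifest: dict) -> list:
    """Restore into a scratch directory; return the paths that are missing or altered.

    manifest maps a path inside the archive to its expected SHA-256 hex digest.
    """
    failures = []
    with tempfile.TemporaryDirectory() as scratch:
        with zipfile.ZipFile(archive) as backup:
            backup.extractall(scratch)
        for relative_path, expected in manifest.items():
            restored = Path(scratch) / relative_path
            if not restored.is_file() or sha256_of(restored) != expected:
                failures.append(relative_path)
    return failures
```

An empty list means the restore produced exactly what the manifest promised. Anything else is worth finding out on a quiet afternoon, not in the middle of an incident.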

Lesson 3: reliability is quantified, then proven

In 2024, at the Imprimerie Wallonne des Communes, I built and maintained the digital infrastructure that 123 Belgian municipalities relied on for the legislative and municipal elections, including gebel.be, the multilingual platform for managing polling stations. Between 2,000 and 4,500 visitors a day, four languages, and one simple requirement: everything had to run on voting day, weekends and public holidays included. The slightest error could compromise the electoral process of an entire country. The platform never went down.

An election night cannot be replayed. This kind of project forces you to treat reliability as a starting requirement: redundancy, monitoring, written procedures, and no unchecked "one-click" operation on a production system. Exactly the opposite of my 2016 migration.

Your site is down: the method, in order

If your site is unreachable right now, here is what I would do in your place.

  1. Confirm the outage. Test from another device or over 4G. If the site responds elsewhere, the problem is on your side (cache, local DNS, network).
  2. Read the exact error message. "Insecure connection" points to the certificate, as it did for me in 2016. A 500 error points to the application, a timeout to the server, a parking page to an expired domain.
  3. Check the expiry dates. Expired domain names and certificates remain ordinary, avoidable causes of outages.
  4. Change only one thing at a time. Note every action. Three simultaneous changes make it three times harder to tell which one worked.
  5. Communicate early. A short, honest message to your clients beats an hour of silence: "the site is down, I have found the likely cause, expected back around such a time".
  6. Restore if necessary. If data is affected, start again from the last clean backup rather than tinkering with a corrupted system.
  7. Write the post-mortem. Even three lines. My 2016 post was one, and it is thanks to it that I can write this one.
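
Step 3 is the easiest one to automate. Here is a minimal sketch, using only Python's standard library, that reads the expiry date of a site's certificate and counts the days left. The function names, the port and the timeout are assumptions made for the example.

```python
import socket
import ssl
from datetime import datetime, timezone
from typing import Optional


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the certificate's notAfter field, e.g. 'Nov 15 00:00:00 2016 GMT'.

    The default context verifies the certificate, so an expired or mismatched
    one raises ssl.SSLCertVerificationError here, which is already a diagnosis.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until(not_after: str, now: Optional[datetime] = None) -> float:
    """Days between now (UTC by default) and the certificate's expiry date."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (expires - now).total_seconds() / 86400
```

Calling days_until(fetch_not_after("example.org")) once a day and raising an alert below some threshold of your choosing turns a surprise outage into a calendar reminder. The same idea applies to domain renewal dates, although those come from your registrar rather than from the TLS handshake.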

Ten years between two posts

In 2016, an outage of my blog was a setback and an anecdote to tell. In 2026, an outage at a client's is revenue grinding to a halt, and my job is to make it rare, short and announced. If your site has just gone down, or if you want an outage to never again be a surprise, let's talk.