We had another brief outage yesterday; this one was somewhat intended. It affected about 35% of our sites. We are migrating our sites from our old system, which used Docker, Debian, and a bunch of other tools, to a new custom-built system we call “Nixos-Scale”. The specific problem was that the firewall updated its rules (ahead of time!), which broke the ingress/DNS routing to the websites. The websites themselves weren’t broken, but the path TO the websites was. We are now building a durable fix into Nixos-Scale so that this is resolved now and does not recur in the future.
Under the hood, we are currently updating to the most recent stable version of NixOS. This pins recent software versions and then deploys them; it is a full-scale upgrade of the software across the system. During the upgrade I realized there would be too much downtime, so we have to move sites around and deploy the upgrade in phases, which raises other problems of its own. All in all, we should soon be up to date with current software across Nixos-Scale, and we will keep marching forward toward creating the perfect system!
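As a rough illustration of what deploying in phases means, here is a minimal Python sketch. It is not the Nixos-Scale code: the function name `plan_phases`, the `max_fraction` parameter, and the site names are all made up for this example. The idea is simply to cap how many sites are touched by any single upgrade step.

```python
def plan_phases(sites, max_fraction=0.1):
    """Split sites into upgrade batches.

    Hypothetical helper: each batch holds at most max_fraction of all
    sites (and at least one site), so a bad step only hits a small slice.
    """
    if not 0 < max_fraction <= 1:
        raise ValueError("max_fraction must be in (0, 1]")
    if not sites:
        return []
    # Round down so a batch never exceeds the requested share.
    batch_size = max(1, int(len(sites) * max_fraction))
    return [sites[i:i + batch_size] for i in range(0, len(sites), batch_size)]


if __name__ == "__main__":
    # Placeholder site names, purely for demonstration.
    sites = [f"site-{i}" for i in range(20)]
    for number, batch in enumerate(plan_phases(sites, max_fraction=0.25), start=1):
        print(f"phase {number}: {len(batch)} sites")
```

In a sketch like this, each phase would be upgraded and checked for reachability before the next one starts, which keeps any one mistake from affecting a large share of sites at once.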