Since everyone in the world will be off giving Hallmark their money tomorrow anyway, we will be taking a very brief outage to prepare hardware updates on our backend database and cache systems.
We’ll be prepping at 19:00 UTC (2PM EST) and executing the failover shortly after that.
We’re adding memory to our primary redis nodes and SSD storage to our SQL masters. To do this, we’ll take the network offline for approximately 1 minute while we switch everything at once. Rather than put up with scattered connection failures while IPs flip and such, we’re opting to just take a 1 minute maintenance window and do it quick and clean.
For the curious: we’re upgrading the Stack Overflow SQL cluster from a 2TB P3700 PCIe NVMe SSD to a 4TB P3608, doubling that storage tier. The old P3700 will join 2 sister cards in the other Stack Exchange SQL cluster, upping that storage tier from 4TB to 6TB.
The primary redis server will be going from 128GB to 256GB of RAM as well.
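If you want to sanity-check those numbers, here is the arithmetic as a small Python sketch. The capacities come straight from this post. The variable names and the `summarize` helper are made up for illustration, and it assumes the 2 sister cards are 2TB each, which is what the 4TB figure implies.

```python
# Capacities as stated in the post; all names here are illustrative only.
P3700_TB = 2  # old card in the Stack Overflow SQL cluster
P3608_TB = 4  # replacement card

# Stack Overflow cluster: one P3700 swapped for one P3608.
so_before_tb = P3700_TB
so_after_tb = P3608_TB

# Stack Exchange cluster: 2 sister cards (assumed 2TB each) plus the retired P3700.
se_before_tb = 2 * P3700_TB
se_after_tb = se_before_tb + P3700_TB

# Primary redis server memory, in GB.
redis_before_gb = 128
redis_after_gb = 256


def summarize():
    """Return (before, after) pairs for each tier described in the post."""
    return {
        "stack_overflow_sql_tb": (so_before_tb, so_after_tb),
        "stack_exchange_sql_tb": (se_before_tb, se_after_tb),
        "redis_ram_gb": (redis_before_gb, redis_after_gb),
    }


if __name__ == "__main__":
    for tier, (before, after) in summarize().items():
        print(f"{tier}: {before} -> {after}")
```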
This maintenance window won’t actually involve hardware upgrades - it’s just swapping the master role over to the secondary systems, which have already been upgraded. The actual hardware upgrades to the current masters will happen next week during normal hours. This outage is scheduled on a Sunday to minimize user impact.
If you have any questions or want to know when we kick things off, follow along with @StackStatus on Twitter.