Our data center racks in New York unexpectedly lost connectivity. There was a failure in the UPS feeding our equipment that went undetected by our provider. Here is a timeline of major events so far:
- 06:43 UTC - The UPS feeding our racks in the New York data center stopped passing utility power and failed to provide backup power
- 07:04 UTC - We identified the nature of the UPS failure and began failover to our Oregon facility
- 07:07 UTC - Oregon came online in read-only mode
- 07:10-09:00 UTC - We experienced stability issues in Oregon involving caching and database connections, as expected during a failover. We will detail these issues in the postmortem.
- 09:50 UTC - The New York data center regained power
- 09:54 UTC - Load balancers were reactivated and New York became reachable again
- 09:56 UTC - SQL servers came online and began restoring databases
- 10:00 UTC - SQL databases finished restoring with no apparent damage (phew)
We will provide updates as we continue to work through this. We are currently awaiting confirmation from Internap that power is stable before failing back and re-enabling write mode on the sites.
We apologize for the trouble this morning. The unexpected failover exposed several issues in our failover strategy, stemming from infrastructure changes made since the last time a failover was necessary. We will work on those issues in parallel with restoring full functionality to our sites.