The API was down this morning between 3:40AM and 7:49AM ET.
A transient issue with our cloud provider caused our authentication database to enter an unusual failure state, where it was up and responding to requests, but always with an error message. This sort of error was not one we’d ever encountered before, and so was not checked for in our automated monitoring. As a result our staff wasn’t notified of the issue in a timely fashion.
In order to prevent this kind of downtime from occurring in the future, we will be making the following changes:
- We will be switching to a more resilient database layer, which will greatly reduce the chances for this layer to fail in a catastrophic manner.
- We will be updating our monitoring system in order to notify our operations staff in all scenarios, so that such a failure will have a shorter impact upon our systems.
Thanks for your patience, and our apologies for the error.