A percentage of API clients (peaking at 12.5%) have experienced intermittent downtimes during these windows:
- 2016-09-28 09:30 EDT through 2016-09-28 09:45 EDT
- 2016-09-29 16:30 EDT through 2016-09-30 07:45 EDT
The symptom of these downtimes was a connection failure due to a SSL handshake error. This SSL handshake error was caused by one of our cloud provider’s load balancers failing to correctly respond to traffic. This issue was exacerbated because, in all other respects, the load balancer was behaving normally, making our monitoring software return conflicting reports.
As soon as we were aware of the issue in both cases, we contacted our cloud provider and they fixed the issue within minutes. However, they were unable to provide us with a root cause for the issue.
In order to permanently fix it, we will be migrating away from our cloud provider’s load balancing network and running our own in-house, so that we can monitor it at a far finer level of granularity. We expect to have this change ready in the next week or so and will update this status blog when done. Thanks for your patience!