Our monitoring system is seeing intermittent 502 Bad Gateway errors returned from the API. These errors are relatively infrequent, affecting about 0.04% of API traffic. (However, the distribution of these errors depends on DNS: lucky clients experience no problems at all, while unlucky clients experience a much greater share of them.)
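To make the uneven distribution concrete, here is a hypothetical back-of-the-envelope sketch (only the 0.04% overall rate comes from our measurements; the fleet size and per-host failure rate are illustrative assumptions): if DNS pins each client to one backend and a single backend is unhealthy, the fleet-wide error rate stays tiny while clients resolved to the bad host see every one of its errors.

```python
# Hypothetical numbers for illustration only: assume DNS pins each client
# to one of NUM_BACKENDS hosts, and exactly one host is unhealthy.
NUM_BACKENDS = 25               # assumed fleet size (not our real count)
BAD_BACKEND_ERROR_RATE = 0.01   # assumed failure rate of the one bad host

# A client whose DNS lookup lands on a healthy backend sees no errors.
lucky_client_error_rate = 0.0

# A client pinned to the bad backend sees its full 1% error rate.
unlucky_client_error_rate = BAD_BACKEND_ERROR_RATE

# Averaged across all clients, the fleet-wide rate is far lower,
# matching an overall figure like the observed 0.04%.
overall_error_rate = BAD_BACKEND_ERROR_RATE / NUM_BACKENDS

print(f"overall: {overall_error_rate:.2%}, "
      f"unlucky client: {unlucky_client_error_rate:.2%}")
```

Under these assumed numbers, the averages hide the pain: an unlucky client's error rate is 25 times the fleet-wide figure.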
Update 22 Jan 2018: We believe that these issues are related to resource contention on our cloud provider’s hosts in the wake of hotfixes related to the Meltdown CPU vulnerability announced earlier this month. Last Friday (the 19th), we pulled the VMs that we had identified as most affected by this issue out of rotation, and over the weekend (the 20th and 21st) we did the same with additional problem VMs as we identified them.
Update 28 Jan 2018: Our cloud provider continues to investigate their system-stability issues, and we are continuing to respond to individual VM failures as they occur. Notably, we experienced a partial outage this morning, between 3:30 AM and 11:55 AM ET, when one of our load-balancing servers became inaccessible in a way that our monitoring system didn't catch. We have resolved the immediate issue and will be making our monitoring more aggressive about such failures so that they are caught and resolved more quickly.
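As a rough sketch of what "more aggressive" monitoring can mean here, the classifier below treats a missing response or a slow response as unhealthy, not just an explicit server error. This is a hypothetical probe policy, not our actual monitoring configuration; the function name and thresholds are assumptions for illustration.

```python
from typing import Optional

def probe_is_healthy(status_code: Optional[int],
                     latency_seconds: Optional[float],
                     timeout_seconds: float = 2.0) -> bool:
    """Classify one health-check probe of a load-balancing server.

    An aggressive policy: a connection failure (no status code at all),
    any 5xx response, or an over-threshold latency all count as
    unhealthy, so a host that goes silent is flagged rather than missed.
    """
    if status_code is None or latency_seconds is None:
        return False  # no response at all: the failure mode we missed
    if status_code >= 500:
        return False  # gateway/server error
    if latency_seconds > timeout_seconds:
        return False  # too slow: treat as effectively down
    return True
```

Pairing a check like this with a low consecutive-failure threshold would have pulled the inaccessible load balancer out of rotation long before a human noticed.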
Thanks for bearing with us!