A pair of issues caused alerts processing to be largely impaired starting on 11 Aug 2017:
- We had been ingesting Canadian alerts in a manner inconsistent with their specifications, and an infrastructural issue caused Canadian alerts to fail to ingest starting on the 11th.
- We had been tightly coupling all alerts together, regardless of source; this meant that errors in the Canadian alerts cascaded to alerts from other sources, often causing all alerts to be unavailable.
Careful monitoring notified us of these issues immediately, but it nonetheless took us about a week to fully correct them and to test the fixes adequately before moving them into production.
The two issues were remedied separately. We decoupled the alert sources from one another, so a failure in a single source no longer affects the others. Additionally, in consultation with the Canadian government’s data services team, we brought our alerts consumer in line with the specifications, fixing the issue that caused the error in the first place and also improving the accuracy of alerts.
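To illustrate the kind of decoupling described above, here is a minimal sketch in Python. It is not our production code: the function name `collect_alerts`, the source keys, and the shape of an alert are all hypothetical, chosen only to show the idea that each source is fetched inside its own error boundary.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: an alert is a plain dict, and each source is
# represented by a zero-argument function that returns its alerts.
Alert = dict
Fetcher = Callable[[], List[Alert]]


def collect_alerts(
    fetchers: Dict[str, Fetcher],
) -> Tuple[List[Alert], Dict[str, str]]:
    """Fetch alerts from every source, isolating failures per source.

    Returns the alerts gathered from the sources that succeeded, plus a
    mapping of source name to error description for those that failed.
    """
    alerts: List[Alert] = []
    failures: Dict[str, str] = {}
    for source, fetch in fetchers.items():
        try:
            alerts.extend(fetch())
        except Exception as exc:
            # A broken source is recorded and skipped; it does not
            # prevent alerts from the remaining sources being served.
            failures[source] = repr(exc)
    return alerts, failures
```

With this structure, a source that raises an error (for example, on a malformed feed) shows up in `failures` for monitoring, while alerts from the other sources are still returned.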
While working on these fixes, we took the opportunity to make some improvements: we now receive alerts from all sources in a more timely manner, our service is more robust against transient problems with individual sources, and we are a little more precise about the region that alerts cover than we were before.