Rui Carlos
Posted May 23, 2023 at 07:32 PM

A good article about an incident that led to global downtime at Datadog: Inside Datadog’s $5M Outage

Of the points covered in the article, I would highlight two:

- The circular dependency between the software that manages the infrastructure and the infrastructure itself, which is increasingly common.
- The fact that using several different cloud services did not prevent all of them from failing at the same time.

Quote:

> [...] So the control plane going offline was the real problem. Had the control plane been unaffected, the outage would have likely been brief. In that case, Datadog could have just re-added the vanished nodes to the routes using the control plane. But with the control planes also gone, the first order of business had to be to get this control plane back and figure out how it disappeared in the first place.
>
> This circular dependency, where the infrastructure control plane depends on the infrastructure it manages, recalls what happened when the video game Roblox went down for three days straight in October 2022. Then, the dependency was that Nomad (orchestrating containers) and Vault (secrets management service) both used Consul (service discovery.) So when Consul became unhealthy, Vault went offline. But Vault was needed to operate Consul: a circular dependency. [...]
>
> It’s interesting to consider how much Datadog did to avoid a global outage: operating a multi-cloud, multi-region, multi-zone setup with separate infrastructure control planes per region. But despite these efforts, the unforeseen event of a parallel operating system update, and the impact of this update, brought it down. This is a reminder that prevention is just as important as mitigation. [...]

The Datadog postmortem is available here.

Rui Carlos Gonçalves
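To make the circular dependency from the quote concrete, here is a minimal sketch of checking a service dependency map for cycles. The edges are taken only from the quoted description (Nomad and Vault use Consul, and Vault is needed to operate Consul). The function name `find_cycle` and the dictionary `roblox_like` are made up for illustration; this is not how Roblox or Datadog model their infrastructure.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of services, or None if there is none.

    `deps` maps each service to the services it depends on.
    Uses a depth-first search with three visit states.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []  # services on the current DFS path

    def visit(service: str) -> Optional[List[str]]:
        state[service] = IN_PROGRESS
        path.append(service)
        for dep in deps.get(service, []):
            dep_state = state.get(dep, UNVISITED)
            if dep_state == IN_PROGRESS:
                # dep is already on the current path: we closed a loop.
                return path[path.index(dep):] + [dep]
            if dep_state == UNVISITED:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[service] = DONE
        return None

    for service in deps:
        if state.get(service, UNVISITED) == UNVISITED:
            cycle = visit(service)
            if cycle:
                return cycle
    return None


# Hypothetical map built from the quoted description only.
roblox_like = {
    "nomad": ["consul"],
    "vault": ["consul"],
    "consul": ["vault"],  # "Vault was needed to operate Consul"
}

cycle = find_cycle(roblox_like)
print(" -> ".join(cycle) if cycle else "no cycle")
```

A check like this only catches dependencies that someone has written down. In both incidents the difficulty was that the loop was not obvious until the control plane itself was gone.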