DigitalOcean Event Processing Issues


DigitalOcean

Oct 18, 20:16 UTC
Postmortem -
The Incident


At 08:57 UTC on October 13th, 2017, our Support team began seeing errors when starting the Droplet console, attaching or detaching volumes, and completing login device verification. Our teams determined that these errors were the result of a failing auditing subsystem. Our engineers worked to restore normal operation with a patch that ensures internal auditing failures do not cause the entire operation to be rejected. At 11:30 UTC, the fix was deployed to our production systems, and at 11:33 UTC the Support team confirmed that the affected operations were working again.

Timeline of Events

08:57 UTC - Support team detects operation failures

09:40 UTC - We identify that the internal auditing subsystem is failing

10:30 UTC - We implement a patch to resolve the issue

11:30 UTC - Patch deployment is completed

11:33 UTC - Support team confirms affected actions are now operational

Future Measures

We are working on increasing the reliability of the aforementioned auditing subsystem. Meanwhile, we have developed a patch to ensure users’ requests will succeed regardless of the operational status of the auditing subsystem.
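
The idea behind the patch, letting a request succeed even when auditing fails, can be illustrated with a minimal Python sketch. This is not DigitalOcean's actual code: the names `AuditError`, `record_audit_event`, and `run_with_best_effort_audit` are hypothetical, and running the audit after the operation is an assumption made only for illustration.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("audit")


class AuditError(Exception):
    """Hypothetical error raised when an audit event cannot be recorded."""


def record_audit_event(action: str) -> None:
    """Stand-in for the auditing subsystem; here it simulates an outage."""
    raise AuditError("audit backend unavailable")


def run_with_best_effort_audit(
    action: str,
    operation: Callable[[], T],
    audit: Callable[[str], None] = record_audit_event,
) -> T:
    """Run the user's operation, then audit it on a best-effort basis."""
    # Errors from the operation itself still propagate to the caller.
    result = operation()
    try:
        audit(action)
    except Exception:
        # An auditing failure is logged but never rejects the request.
        logger.warning("audit failed for %r; continuing", action, exc_info=True)
    return result
```

In this sketch, a call such as `run_with_best_effort_audit("attach_volume", lambda: "attached")` returns the operation's result even though the simulated audit backend is down. The trade-off is that audit records can be lost during an outage, so a real system would likely also alert on the logged failures.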

In Conclusion

We’re disappointed that we let a non-essential subsystem affect user operations, and we apologize for the inconvenience and frustration it has caused.

Oct 13, 12:26 UTC
Resolved - Our engineering team has resolved the issue impacting event processing. If you continue to experience issues with processing events please open a ticket with our Support team and we would be happy to assist you.

Oct 13, 11:41 UTC
Monitoring - We have isolated the cause of the event processing issue and are currently monitoring. Events should be proceeding as normal; if you're still experiencing issues, please open a ticket with our support team.

Oct 13, 11:22 UTC
Update - Our engineering team is continuing to work through actions we believe will resolve the event processing issue. We appreciate your patience and will provide additional updates soon.

Oct 13, 10:12 UTC
Identified - Our engineering team has isolated the issue impacting event processing and is actively working to resolve it. We apologize for the issue and will provide an update soon.

Oct 13, 09:36 UTC
Update - Our engineering team continues to investigate the event processing issues causing delays during console access, creates, volume and DNS related events, and login issues during the device verification process. We will provide additional updates as more information becomes available.

Oct 13, 08:33 UTC
Update - Our engineering team is still investigating the event processing issues. During this time you may experience delays during console access, creates, volume and DNS related events. You may also experience login issues during the device verification process. We will continue to update you, and apologize for the inconvenience.

Oct 13, 07:35 UTC
Investigating - Our engineering team is actively investigating issues with event processing. During this time you may experience delays during creates, destroys and power events. We will keep you updated as we have more information. We apologize for any inconvenience this causes.
