Linode RESOLVED: Newark Connectivity Issues


Linode

Guest
Tuesday, April 15, 2014

We have received the following initial RFO from our colocation provider:


Net Access sincerely apologizes for the outage on Saturday, April 12, 2014. Since the outage, our engineering and facilities organizations have been working non-stop to ascertain the Root Cause for Failure (RCF) in consultation with our main Automatic Transfer Switch (ATS) manufacturer, Eaton (www.eaton.com). We have escalated this issue to the highest level within their organization. As of this time, we can provide the following timeline of events:

- At about 23:16:50, JCPL (power company) performed some unannounced maintenance on the 34.5 kV and 12.5 kV circuits feeding to and from the substation that MMU gets power from. At this time, we lost utility power;

- Power System B and C transferred load to generator without incident;

- Power System E (previously known as Power Systems A and D) sequenced as follows:

- The transfer controller in the E system switchgear commanded a generator start; Gen 3, 1, 4 and 2 were up and running at 23:16:59, 23:17:01, 23:17:02 and 23:17:03, respectively. They then arbitrated and connected to the generator collector bus at 23:17:11, 23:17:15, 23:17:17 and 23:17:26 (Gen 1, 3, 4 and 2, respectively);

- The next appropriate sequence would be for the transfer controller to command the utility breaker to open, and the generator breaker to close. What actually occurred is that the utility breaker opened, but then the transfer controller locked itself out in a "safety condition" and the generator breaker did not close. Normally, this should occur only if a breaker changes state without being commanded to, or if a breaker is commanded to change state and does not. It is not clear why the transfer controller locked itself out, and we are working with Eaton to get to the root cause of this unexpected action;

- During the Power System E sequencing, the UPS plant successfully took over load and ran on battery until 23:38:11, at which time the battery plant was depleted. It was not until 00:33:02 that our safety investigation was completed and the utility breaker was re-closed, UPS plant re-energized and load switched back to utility.

Our engineering and facilities team is working through the analysis of the ATS log data with Eaton to determine the RCF. Today, we performed a complete inspection of the ATS, UPS and battery systems in Power System E, observed no anomalies, and are confident that the systems will perform as expected. We will continue to closely monitor the system and work with Eaton on a final root cause. We have also escalated a complaint with JCPL to understand the reason for the unplanned maintenance and power interruption that precipitated this event. An updated RFO will be sent out as soon as it is completed.

Once again we apologize for this power interruption and the significant issues it has caused your organization. We sincerely appreciate your business and will work tirelessly to ensure a 100% satisfactory resolution to this outage.
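
For readers who want the intervals implied by the timeline in the RFO above, here is a minimal sketch that works them out. It is our own arithmetic, not part of the Net Access statement. It assumes all RFO timestamps come from a single clock, that the restore time of 00:33:02 falls on April 13, and that the UPS took over load at roughly the moment utility power was lost (the RFO does not state the exact handover time). The variable and function names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the RFO timeline. The RFO gives no time zone, so these
# are treated as naive times on one clock (an assumption).
UTILITY_LOST = datetime(2014, 4, 12, 23, 16, 50)
LAST_GEN_ON_BUS = datetime(2014, 4, 12, 23, 17, 26)   # Gen 2 joins collector bus
BATTERY_DEPLETED = datetime(2014, 4, 12, 23, 38, 11)
UTILITY_RECLOSED = datetime(2014, 4, 13, 0, 33, 2)    # assumed to be April 13


def seconds_between(start, end):
    """Whole seconds elapsed from start to end."""
    return int((end - start).total_seconds())


def fmt(seconds):
    """Render a duration in seconds as 'M min SS s'."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes} min {secs:02d} s"


# Approximate intervals; the battery figure assumes the UPS picked up load
# at the instant utility power was lost.
generators_ready = seconds_between(UTILITY_LOST, LAST_GEN_ON_BUS)
battery_runtime = seconds_between(UTILITY_LOST, BATTERY_DEPLETED)
load_unpowered = seconds_between(BATTERY_DEPLETED, UTILITY_RECLOSED)
total_event = seconds_between(UTILITY_LOST, UTILITY_RECLOSED)

print("All generators on bus after:", fmt(generators_ready))
print("Approx. time on battery:    ", fmt(battery_runtime))
print("Load without power:         ", fmt(load_unpowered))
print("Utility loss to re-closure: ", fmt(total_event))
```

Under those assumptions, the load on Power System E was without power for a little under an hour, even though all four generators were on the collector bus within the first minute.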



Sunday, April 13, 2014

5:59AM EDT (UTC -4): At this time all hosts are up and running. All Linodes within the Newark datacenter should be running or are in the queue to be started shortly. If you have questions or concerns, please feel free to open a ticket via the Linode Manager.

3:05AM EDT (UTC -4): Many Linodes are booted at this time and we are working with the remaining hosts to fully restore services. We apologize for any inconvenience caused by this event.

1:56AM EDT (UTC -4): Some Linodes have been booted at this time; however, we are still working on bringing all Linodes to a running state. We will continue to keep you updated as we have further information.

1:09AM EDT (UTC -4): The Newark datacenter has suffered from a power outage. We are working on bringing Linodes back up as soon as possible.

12:40AM EDT (UTC -4): We are still working with the Newark datacenter to resolve this issue. We apologize for any inconvenience caused.

Saturday, April 12, 2014

11:51PM EDT (UTC -4): We are investigating an issue with connectivity within the Newark datacenter at this time. We will post updates as we have them. We apologize for any inconvenience.
