forum.schmolie.com > Outages and Maintenance > Backbone maintenance


Posted by: andy Jun 25 2018, 02:09 PM
CenturyLink, one of our backbone providers, is doing scheduled maintenance between 3AM and 6AM on Tuesday, June 26, 2018, Pacific Daylight Time (GMT -7:00).

Most people will notice no downtime. A few may notice a brief interruption (approximately 1 - 3 minutes) while the routing protocols route around the backbone link that is down for maintenance.

Some may also notice that their data takes a different path to/from our network during and after the maintenance.

Since we maintain multiple upstream backbone links, this maintenance event will have little to no impact on customers' use of their services.

Posted by: andy Jun 27 2018, 09:38 AM
Unfortunately, it looks like CenturyLink botched this maintenance badly.

From approximately 4:25AM to 7:40AM, June 26, 2018, Pacific Daylight Time (GMT -7:00), they caused a partial outage in our upstream connectivity by not properly taking down the service during the maintenance. They left the routing path active but broke the path used by data flowing through their network, effectively blackholing any traffic that would otherwise have transited that path.

Not only was their maintenance window more than an hour off from their specified time frame and longer than specified, but blackholing your customers' traffic during maintenance is a complete rookie mistake. It implies that they've poorly engineered their network and that the people doing the maintenance are not competent enough to avoid the issue.
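
For anyone who wants to check the timing claim, here is a quick sketch of the arithmetic in Python. It uses only the times quoted in this thread, which are approximate, and it treats the observed outage as the period when the maintenance actually took place. The variable names are just for illustration.

```python
from datetime import datetime

# Times quoted in this thread, Pacific Daylight Time, June 26, 2018.
planned_start = datetime(2018, 6, 26, 3, 0)
planned_end = datetime(2018, 6, 26, 6, 0)
outage_start = datetime(2018, 6, 26, 4, 25)  # approximate
outage_end = datetime(2018, 6, 26, 7, 40)    # approximate


def minutes(delta):
    """Convert a timedelta to whole minutes."""
    return int(delta.total_seconds() // 60)


planned_minutes = minutes(planned_end - planned_start)  # length of the announced window
outage_minutes = minutes(outage_end - outage_start)     # length of the observed outage
start_offset = minutes(outage_start - planned_start)    # how late the outage began
overrun = minutes(outage_end - planned_end)             # how far past the window it ran
```

With these inputs the announced window is 180 minutes and the outage is 195 minutes. It began 85 minutes after the window opened and ended 100 minutes after the window was supposed to close.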

We pay them enough to expect much better.

I'll be filing a formal complaint with their engineering, maintenance, and technical support teams soon. Hopefully it will lead to actions internally within CenturyLink that will prevent this from happening the next time they do network maintenance.

Posted by: andy Jun 27 2018, 04:28 PM
I've submitted a problem summary and request for reason for outage to CenturyLink's maintenance team, which they will forward to their "IP" team for analysis.

The goal is for them to identify and correct the network configuration and/or maintenance procedure that caused the problem.

If I'm not satisfied that they've corrected the problem before their upcoming July 10 maintenance, I'll manually shut down our BGP routing session to CenturyLink prior to that maintenance and bring it up again after the maintenance has concluded. This will force traffic to take the correct path during the maintenance. The rerouting should be automatic, but something about how they have their network configured caused it not to function as it should. Until that is resolved, a manual approach can be a temporary work-around.
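
To illustrate why the manual shutdown helps, here is a toy model in Python. It is only a sketch under simplifying assumptions and not our actual router configuration. The second provider name, the preference numbers, and the idea that all traffic prefers CenturyLink are made up for the example. The point it shows is that route selection only looks at whether a route is still advertised, not at whether traffic sent that way is actually delivered.

```python
from dataclasses import dataclass


@dataclass
class Upstream:
    name: str
    advertised: bool  # session is up and the route is still being offered
    forwarding: bool  # traffic handed to this upstream actually gets delivered
    preference: int   # higher wins (hypothetical values, loosely like local preference)


def pick_route(upstreams):
    """Pick the most preferred upstream that still advertises a route.

    Selection sees only `advertised`; it has no knowledge of `forwarding`.
    """
    candidates = [u for u in upstreams if u.advertised]
    if not candidates:
        return None
    return max(candidates, key=lambda u: u.preference)


def send(upstreams):
    """Describe what happens to traffic given the current routes."""
    chosen = pick_route(upstreams)
    if chosen is None:
        return "no route"
    if not chosen.forwarding:
        return f"blackholed via {chosen.name}"
    return f"delivered via {chosen.name}"


def scenario(cl_advertised, cl_forwarding):
    """Build a two-upstream network; "OtherUpstream" is a hypothetical second provider."""
    return send([
        Upstream("CenturyLink", cl_advertised, cl_forwarding, preference=200),
        Upstream("OtherUpstream", True, True, preference=100),
    ])


normal = scenario(cl_advertised=True, cl_forwarding=True)
botched = scenario(cl_advertised=True, cl_forwarding=False)    # route left active, path broken
shutdown = scenario(cl_advertised=False, cl_forwarding=False)  # session shut down manually
```

In this model the botched case blackholes traffic because the broken path is still the preferred advertised route, while shutting the session down removes that route so traffic goes out through the other upstream. In reality only the portion of traffic that preferred the CenturyLink path was affected, which is why the outage was partial.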
