[identity profile] lordstorm.livejournal.com posting in [community profile] techrecovery
Ack. The internet broke this evening; not very pretty - albeit rather entertaining at times - watching the helpdesk cope. Since I'm not officially part of the helpdesk anymore I was spared most of the anguish of dealing with the customers, but as tier 2 Network Support I was instead the sole contact between our upstream provider and reliant downstreamers, and our supervisors controlling the floor.

It went from routing problems in Western Australia, to mail issues, which were both wrapped up fairly quickly, but were only symptoms of the problems to come.

Our ISP - like many others - runs on a two-tier RADIUS system. One machine looks after the actual auth requests (username/password checks, previous online sessions) while the other looks after accounting information (once a user is actually logged in, accounting packets - "start", "stop", "alive" - are generated to keep track of the customer's account for billing and cross-referencing). If you can't authenticate, you won't get on, and if your accounting packets aren't generated correctly (or at all), your session drops off and is automatically terminated some time later. It's possible to bypass the machines in times of great need: auth-all on our auth RADIUS automatically accepts any user/pwd combo, regardless of whether it's right, and bypassing the accounting RADIUS usually results in free internet, since we can't track when/how customers got online.
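
For anyone who hasn't met this split before, here is a minimal Python sketch of the two roles and their bypass switches. It is a toy model, not our actual setup: the class names, the `reachable` flag and the in-memory user store are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AuthServer:
    """Toy stand-in for the auth RADIUS (hypothetical, for illustration only)."""
    users: dict              # username -> password; a real server uses a proper backend
    auth_all: bool = False   # the "auth-all" bypass: accept any user/pwd combo
    reachable: bool = True   # False models requests that never arrive at all

    def authenticate(self, username, password):
        if not self.reachable:
            # Nothing arrives, so there is nothing to accept, bypass or no bypass.
            return None
        if self.auth_all:
            return True
        return self.users.get(username) == password


@dataclass
class AccountingServer:
    """Toy stand-in for the accounting RADIUS (hypothetical)."""
    bypass: bool = False
    records: list = field(default_factory=list)

    def record(self, username, status):
        if status not in ("start", "alive", "stop"):
            raise ValueError(f"unknown accounting status: {status}")
        if self.bypass:
            return  # nothing logged, so nothing to bill: free internet
        self.records.append((username, status))
```

The one design point worth noticing is that `auth_all` only matters once a request has actually reached the server; an unreachable server returns nothing either way.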

At 1700hrs, our major upstream provider's RADIUS simply disappeared. Like magic. You could ping it, port into it and everything, but interrogating it for RADIUS packets yielded absolutely nothing. Thankfully, being multi-provided, customers using our other minor upstream provider were unaffected, and our DSL customers had no issue either. But a good 60% of all of our business is run through our major provider's dial-up service.....and all of it disappeared when our auth systems did. So anyone using our dial-up services through that provider immediately got auth failures (the infamous Windows 691 error).
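
To give a rough idea of what "interrogating it for RADIUS packets" means, the sketch below builds an Access-Request following the packet layout in RFC 2865 and waits for any reply over UDP. This is not the tool we used on the night: the function names, the default port of 1812 (older installs often listen on 1645) and the timeout are assumptions for illustration.

```python
import hashlib
import os
import socket
import struct


def encode_password(password: bytes, secret: bytes, authenticator: bytes) -> bytes:
    """Hide the User-Password attribute as described in RFC 2865, section 5.2."""
    padded = password + b"\x00" * (-len(password) % 16)
    if not padded:
        padded = b"\x00" * 16  # the attribute value is at least 16 octets
    out, prev = b"", authenticator
    for i in range(0, len(padded), 16):
        digest = hashlib.md5(secret + prev).digest()
        block = bytes(p ^ d for p, d in zip(padded[i:i + 16], digest))
        out += block
        prev = block  # later blocks are keyed on the previous ciphertext block
    return out


def build_access_request(username, password, secret, identifier=0, authenticator=None):
    """Build a minimal Access-Request (code 1) with User-Name and User-Password."""
    authenticator = authenticator or os.urandom(16)
    user = username.encode()
    attrs = struct.pack("!BB", 1, 2 + len(user)) + user   # type 1: User-Name
    hidden = encode_password(password.encode(), secret, authenticator)
    attrs += struct.pack("!BB", 2, 2 + len(hidden)) + hidden  # type 2: User-Password
    header = struct.pack("!BBH", 1, identifier, 20 + len(attrs))
    return header + authenticator + attrs


def probe(host, packet, port=1812, timeout=3.0):
    """Send the packet and return the reply's code byte, or None if nothing came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (host, port))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # covers timeouts and unreachable-host errors
            return None
    return reply[0] if reply else None
```

A reply code of 2 is Access-Accept and 3 is Access-Reject; either one at least proves the server is listening. Silence is more ambiguous, because a RADIUS server also silently discards requests from clients it doesn't share a secret with, so a `None` from an unregistered machine doesn't prove an outage on its own.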

Oh and how the heavens did drown in the fury of our technicians, and verily the fiery chasms of hell did open to vomit up our customers in an evil mood. :)

An issue of this magnitude usually results in us network reps reaching for the big red handle labelled Auth All - but how can you automatically authenticate packets you don't even receive? No go: we didn't even have that to fall back on.

Now, having been on that same helpdesk myself until recently, I could sympathise with them, having to immediately deal with waves of mad customers demanding to be put back online, regardless of the IVR message management had placed on the incoming call queue (do customers even bother listening to those messages???). It eventually got to the point where many of the technicians automatically trotted out lines much like this as soon as they picked up each call:

Welcome to your.isp.net, and thank you for your patience. If you are calling in regards to authentication issues, or 691 errors, please be advised there is a national outage at the moment with no current ETA. If you still wish to speak to a technician, please remain on the line.

Meanwhile, as the only network rep attempting to liaise with our upstreams, development crew, and network engineers (all at physically separate locations some suburbs away), getting information out to the supervisors on the floor as soon as possible became my priority. But you can't really instil confidence in a floor looking up at you for information when the upstreamer's engineers themselves simply don't know what broke and are still searching for the problem.....for the next 6 hours.

*head-desk head-desk head-desk*

Of course, we might have made it through the night.....if the sheer number of incoming customer calls hadn't completely shat our PABX and led to a very messy telephone systems crash. Suddenly over 200 pods on the floor were effectively cut off from the outside world.

*sharpens razor* *poises over wrist*

Ever administered a helpdesk in peak evening period when it's damn near silent? As much as the techs and salespeople liked the break, it was damn near spooky, I'll tell you. Eventually we got our backup phone system up, but that had no intelligent queue-sort system operative, and it was literally like picking up calls from a blind stack, not knowing what department the customer wanted, and copping a lot of customer angst in the meantime. Later attempts to restart, kick-start, curse-start, threaten, cajole, hack, bypass and eventually switch modules couldn't stand up to the traffic and the phone system kept crashing.....another seven times in total, apparently.

I guess it could have been worse: our customer database could have crashed along with everything else, and then we could have gone completely blind! *sarcastic cheer*

And now I'm home: it's still going on but I can't summon the energy to care, to be honest. Need aspirin. And preferably something very alcoholic.

Date: 2005-05-05 04:26 pm (UTC)
From: [identity profile] jahbulon.livejournal.com
I think I might know who you work for. Has your company just taken over another quite large company?

Date: 2005-05-05 04:31 pm (UTC)
From: [identity profile] jahbulon.livejournal.com
Ii think?
or just I think? ;)

Date: 2005-05-06 02:03 am (UTC)
From: [identity profile] reynardo.livejournal.com
[livejournal.com profile] lordstorm, meet [livejournal.com profile] jahbulon. [livejournal.com profile] jahbulon used to share a flat with one of my workmates, who trained [livejournal.com profile] sarin_girl, who is part of the [livejournal.com profile] shinyshinyelves group in Delros, whom [livejournal.com profile] tsung knows, as played by [livejournal.com profile] lordstorm.

There are other links, but that one will do for the time being.

Date: 2005-05-05 04:50 pm (UTC)
From: [identity profile] grayhawkfh.livejournal.com
My first thought: Which gods did y'all fail to appease? Because the gods surely crapped on your collective head that night...

Date: 2005-05-05 10:56 pm (UTC)
From: [identity profile] kallell.livejournal.com
Envy I have not
Pity you I do

Date: 2005-05-05 11:11 pm (UTC)
From: [identity profile] taleya.livejournal.com
sweet holy shitfuckcrapery!

was it so very very wrong of me to laugh my arse off at this? Purely because it happened when I WASN'T there? >:D

break it tonight before my shift and I swear to god I'll stab thee in the face.

Date: 2005-05-05 11:20 pm (UTC)
From: [identity profile] kingogre.livejournal.com
I feel for you, and I've been there myself......back in 95 when our radius servers went boom because the air conditioner in our crowded server room died and all the overclocked processors running on like P 133's crapped out. Then our well shitty drunken admins were at a bachelor party.. That was fun.. I think they broke more stuff than they fixed that night.


You poor poor bastard....
