syberghost.livejournal.com posting in techrecovery
So, we're having a datacenter-wide power outage this weekend, as part of an expansion. We've got to shut all the servers down, so we've been asking the test organizations for our various projects to tell us when they've finished with a test level for the day. That way we can shut down as many servers as possible early, and not have to coordinate it all after hours.

About 11:30 this morning, one of my projects informs me that their Level 4 and Level 2 environments are done for the day, and we can go ahead and shut those down.

Two hours later, they tell me this:

"BTW, the APAC databases for L3 and L4 all reside on the same server, so we can't shut that one down yet."

TWO HOURS LATER. Do ya think perhaps I might have already gotten those bad boys shut down, folks? Just maybe?

Edit: and now, Trane has screwed up the UPS and we may not even have the power outage...

Edit two: yep, sure enough; fire marshals have to certify the repairs before we can connect the new UPS, can't get them today, power outage is pushed off more than a month.

Date: 2007-12-08 09:28 am (UTC)
From: trixtah.livejournal.com
Hah, we've been waiting to install a new UPS switch, since the current one is not adequate for the purpose, and is in fact potentially unsafe. So, you know, health and safety issue. So we have had a major outage planned for 6 months, during our regular scheduled monthly maintenance period, which involves discussions with the application owners on the 300 servers, electricians, building maintenance, SAP guys, Oracle DBAs, network ops and so on.

So we have a co-ordination meeting the week before, with 30 people present, only to have the acting manager (who comes from a different division located in a different city) say "Oh, I think that's the night that $old_division is doing a software upgrade. You'll have to cancel it if they are." It turns out that they have been changing their maintenance windows constantly for several months, but they "forgot" about ours (which has only been the same for the last decade). Acting manager didn't have the balls to negotiate with the other division, since they used the magic word "operational". So their software tweak, which takes 10 minutes, happens in a different city, and is a job they do weekly (on constantly varying days), apparently could not be moved... and 6 months of planning has gone down the gurgler.

Of course, while the other division might be "operational", we look after the "business" systems. Such a shit if everything there goes up in smoke, and we can't bill our customers or pay our staff, eh? Not to mention the bloody Health and Safety issue. I didn't realise software tweaks were so much more risky in comparison... GRAR.
