[identity profile] lions-tambua.livejournal.com posting in [community profile] techrecovery

-------------------
Situation:

W2k3 SAN cluster where the system is 'controlled' and 'delegated' by one "Master Server" (non-redundant! *g*)
The cluster can work alone, but new jobs are only accepted when delegated by the Master Server.
----------
Problem:

Hard drive went OFFLINE. Customer tried to rebuild the hard drive. Rebuild aborted. Customer tested the hard drive with a diagnostic tool -> the diag tool said the hard drive was OK. Customer low-level formatted the hard drive (worked) and tried to rebuild again. Rebuild failed again. Customer took a NEW hard drive; the rebuild failed with this one too. Conclusion -> the RAID-5 is corrupt.
----------
Issue:

After a long talk with the system administrator and lots of troubleshooting, we found out that his RAID-5 is inconsistent. The parity information doesn't match the data. Customer never ran a consistency check on his RAID.
Now the customer needs to back up the system, then either initialise the RAID container or delete it and create it completely new.
Customer tried a backup. Backup failed: CRC error. Customer tried to back up the data to the network. Copy failed: read error. System crashed. System didn't come up any more. Customer said he hasn't got very important DATAS on the system, but the configuration is EXTREMELY complex. But he realizes that he HAS to reinstall the system.
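
For anyone wondering what "parity doesn't match the data" means in practice, here is a minimal sketch in Python. It assumes the textbook RAID-5 layout where the parity block of a stripe is the byte-wise XOR of its data blocks; the function names and the tiny 4-byte blocks are made up for illustration and have nothing to do with the customer's actual controller.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data_blocks, parity_block):
    """A consistency check recomputes parity and compares it to what is stored."""
    return xor_blocks(data_blocks) == parity_block


def rebuild_block(surviving_blocks, parity_block):
    """Reconstruct one missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity_block])


# Hypothetical stripe: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
good_parity = xor_blocks(data)
stale_parity = b"\x00\x00\x00\x00"  # parity that was never kept in sync

consistent = stripe_is_consistent(data, good_parity)
inconsistent = stripe_is_consistent(data, stale_parity)

# Pretend the disk holding data[1] died and gets rebuilt from the rest.
rebuilt_ok = rebuild_block([data[0], data[2]], good_parity)
rebuilt_bad = rebuild_block([data[0], data[2]], stale_parity)
```

With good parity the rebuilt block equals the lost one. With stale parity the rebuild just produces garbage, which is presumably why swapping in a healthy or even a brand-new drive could not help here: the problem was in the array, not in the disk.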

-----
Solution:
Customer deleted the WHOLE RAID array and reinstalled the system. Worked fine.
It took 3 WHOLE WORKING DAYS for the customer to reinstall and reconfigure the system. Customer was at work during the whole weekend. Monday morning, the system was up and running again and production could continue.

----------
?Fun?:

Customer called me again Monday noon. "Either the RAID controller is defective or the hard drives are defective. System won't boot any more."

Me: "What was the last thing you did on the system?"

Customer: "After I reinstalled the whole system, and that took 3 days, I wanted to check that the system boots normally again, to be sure everything is OK. Rebooted, everything was fine. Then I thought I'll just _REINITIALISE THE RAID-5_ just to be sure."

Me: "...... *gulp*..... you did.... what ?"

Customer: "To make sure the RAID container is OK this time, I re-initialised it."

Me: *takes a deep breath* "OK. Initialising a RAID container is like FORMATTING the hard drive. It writes a CLEAN RAID signature over ALL disks and erases all datas. But I told you that when I explained to you that you need to initialise the RAID container or create it completely new."

Customer: "Oh.... yeah. I remember. And I've read the warning 'datas will be lost' but I didn't assume that would involve the datas on the disk. Does that mean the last 3 days have been for NOTHING?"

Me: "Haven't you made a backup after you installed the system?"

Customer: "No, I wanted to make the backup this evening."

Me: *sighs* "Well, then yes. You have to reinstall the system again."

Customer: "No.... NOOOO.... *cries* Why.... I ... *sighs* Well, anyway... thank you for your help. Looks like the system does work normally."
-----------------------

Such stories fill my heart with pure sadness....
Why do such people have the jobs I always want to have?

Oh, and a good hint for everyone out there.
When your system is finally working PERFECTLY after MONTHS of installation, configuration, optimization....
FORMAT YOUR HARD DRIVES! *G* ALL OF THEM!!

Date: 2005-09-26 10:19 am (UTC)
From: [identity profile] canthlian.livejournal.com
And I've read the warning 'datas will be lost' but I didn't assume that would involve the datas on the disk.

I once tried to figure out the thought process that leads to this sort of thing. Then my head imploded.

Date: 2005-09-26 11:38 am (UTC)
From: [identity profile] byh.livejournal.com
Disk caches I guess :)

Myself I often enjoy similar situations with unsaved documents.

Date: 2005-09-26 11:39 am (UTC)
From: [identity profile] trixtah.livejournal.com
yeah, ditto. I mean, which data would be lost? The data floating around in the luminiferous ether? Hello?

And god, I hate SANs when they go buggy. I think one of the SAN containers my backups are going to is going splortch, but it's intermittent and weird and the only symptom is random pipeline errors when trying to copy backups from that disk to tape. *sigh*

Maybe I should get that guy to reinitialise my SAN container for me, now he's such an expert.

Date: 2005-09-26 01:02 pm (UTC)
From: [identity profile] greylady.livejournal.com
I love sarcasm in the morning... it goes so well with my coffee...

Date: 2005-09-26 01:49 pm (UTC)
From: [identity profile] infy.livejournal.com
If he keeps that up, the only containers he'll be allowed to "reinitialize" will be ones marked "TRASH".

Date: 2005-09-26 01:52 pm (UTC)
From: [identity profile] dysan27.livejournal.com
I really can't understand people who do that. Though I think it's a symptom of the "There's a message on the screen, let's click yes to make it go away" mentality.

P.S. 'data' is already plural (the singular is 'datum'). Could you stop typing 'datas'? Unless that's what the customer actually said, in which case I cry some more.

Date: 2005-09-26 03:19 pm (UTC)
From: [personal profile] jecook
*dies laughing*

My former boss did that once. He was working on the server and accidentally re-formatted the old drives while trying to upgrade the amount of storage on the RAID.

His backups were good, at least.

Date: 2005-09-26 03:23 pm (UTC)
From: [identity profile] kalidor.livejournal.com
Heehee .. this sounds so familiar .. sending this to a friend. Exactly what one of the techs working on their server did .. well, sorta. Wanted to increase space, so they installed a new drive and re-initialized the RAID to incorporate the drive ... "wow .. you mean .. all the data is gone?"

Date: 2005-09-26 05:28 pm (UTC)
From: [identity profile] starblazr.livejournal.com
I'm thinking lack of sleep.. it makes competent techs look like complete idiots.