Server Support Hotline Issue 4
Sep. 26th, 2005 11:28 am
-------------------
Situation:
W2k3 SAN cluster where the system is 'controlled' and 'delegated' by one "Master Server" (non-redundant! *g*)
The cluster can work alone, but new jobs are only accepted when delegated by the Master Server.
----------
Problem:
Hard drive went OFFLINE. CS tried to rebuild the hard drive. Rebuild aborted. CS tested the hard drive with a diagnostic tool -> diag tool said the hard drive is OK. CS low-level formatted the hard drive (worked) and tried to rebuild again. Rebuild failed again. CS took a NEW hard drive, rebuild also failed with this one. Conclusion -> RAID-5 is corrupt.
----------
Issue:
After a long talk with the system administrator and lots of troubleshooting, we found out that his RAID-5 is inconsistent. The parity information doesn't match the data. CS never ran a consistency check on his RAID.
Now the customer needs to back up the system, then either initialise the RAID container or delete it and create it completely new.
Customer tried a backup. Backup failed: CRC error. CS tried to back up the data to the network. Copy failed: read error. System crashed. System didn't come up any more. Customer said he hasn't got very important DATAS on the system, but the configuration is EXTREMELY complex. But he realizes that he HAS to reinstall the system.
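For anyone wondering why an inconsistent RAID-5 is such a problem for a rebuild, here is a rough Python sketch of the idea. It is a toy model and not how any particular controller works: the function names are made up for illustration, blocks are tiny byte strings, and the parity is kept in one place, whereas a real RAID-5 rotates it across the disks. The point is only that the parity is the XOR of the data blocks in a stripe, so a rebuild that trusts stale parity reconstructs garbage.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data_blocks, parity):
    """Consistency check: parity must equal the XOR of all data blocks."""
    return xor_blocks(data_blocks) == parity


def rebuild_block(surviving_blocks, parity):
    """Rebuild one missing block from the surviving blocks plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# A healthy stripe on three (hypothetical) data disks.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([d0, d1, d2])
assert stripe_is_consistent([d0, d1, d2], parity)

# Disk 1 dies: with good parity the rebuild gives back the original block.
assert rebuild_block([d0, d2], parity) == d1

# Same stripe, but the parity went stale at some point and nobody noticed
# because no consistency check was ever run.
stale_parity = bytes([parity[0] ^ 0xFF]) + parity[1:]
assert not stripe_is_consistent([d0, d1, d2], stale_parity)

# Now the rebuild "succeeds" mathematically but returns the wrong data.
assert rebuild_block([d0, d2], stale_parity) != d1
```

A regular consistency check is basically `stripe_is_consistent` run over every stripe while all disks are still healthy, which is the only time a mismatch can still be repaired.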
-----
Solution:
Customer deleted the WHOLE RAID array and reinstalled the system. Worked fine.
It took the customer 3 WHOLE WORKING DAYS to reinstall and reconfigure the system. Customer was at work during the whole weekend. Monday morning the system was up and running again and production could continue.
----------
?Fun?:
Customer called me again Monday noon. "Either the RAID controller must be defective or the hard drives are. System won't boot any more."
Me: "What was the last thing you did on the system?"
Customer: "After I reinstalled the whole system, and that took 3 days, I wanted to check if the system boots normally again to make sure everything is OK. Rebooted, everything was fine. Then I thought I'll just _REINITIALISE THE RAID-5_ just to make sure."
Me: "...... *gulp*..... you did.... what?"
Customer: "To make sure the RAID container is OK this time, I re-initialised it."
Me: *takes a deep breath* "OK. Initialising a RAID container is like FORMATTING the hard drive. It writes a CLEAN RAID signature over ALL disks and erases all datas. But I told you that when I explained to you that you need to initialise the RAID container or create it completely new."
Customer: "Oh.... yeah. I remember. And I've read the warning 'datas will be lost', but I didn't assume that would involve the datas on the disk. Does that mean the last 3 days have been for NOTHING?"
Me: "Haven't you made a backup after you installed the system?"
Customer: "No, I wanted to make the backup this evening."
Me: *sighs* "Well, then yes. You have to reinstall the system again."
Customer: "No.... NOOOO.... *cries* why.... I ... *sighs* well, anyway... thank you for your help. Looks like the system itself does work normally."
-----------------------
Such stories fill my heart with pure sadness....
Why do such people have the jobs I always wanted to have?
Oh, and a good hint for everyone out there:
When your system is finally working PERFECTLY after MONTHS of installation, configuration, optimization....
FORMAT YOUR HARD DRIVES! *G* ALL OF THEM!!
Date: 2005-09-26 10:19 am (UTC)
I once tried to figure out the thought process that leads to this sort of thing. Then my head imploded.
Date: 2005-09-26 11:38 am (UTC)
Myself, I often enjoy similar situations with unsaved documents.
Date: 2005-09-26 11:39 am (UTC)
And god, I hate SANs when they go buggy. I think one of the SAN containers my backups are going to is going splortch, but it's intermittent and weird, and the only symptom is random pipeline errors when trying to copy backups from that disk to tape. *sigh*
Maybe I should get that guy to reinitialise my SAN container for me, now he's such an expert.
Date: 2005-09-26 11:46 am (UTC)
Don't let them get even NEAR your SAN.
Date: 2005-09-26 01:52 pm (UTC)
P.S. 'data' is already plural (the singular is 'datum'). Could you stop typing 'datas'? Unless that's what the customer actually said, in which case I cry some more.
Date: 2005-09-26 01:59 pm (UTC)
Well, I am not a native English speaker, hence my bad English :)
I'll try to keep in mind that "data" is the right word for "datas".
In German it's singular "Datei" and plural "Dateien" > "data" and "datas",
and "Datum" is German and translates to "date" (xx.yy.zzzz).
Date: 2005-09-26 03:19 pm (UTC)
My former boss did that once. He was working on the server and accidentally re-formatted the old drives while trying to upgrade the amount of storage on the RAID.
His backups were good, at least.