Server Support Hotline Issue 4
Sep. 26th, 2005 11:28 am
-------------------
Situation:
A W2k3 SAN cluster where the system is 'controlled' and 'delegated' by a single "Master Server" (non-redundant! *g*).
The cluster can run on its own, but new jobs are only accepted when delegated by the Master Server.
----------
Problem:
A hard drive went OFFLINE. Customer tried to rebuild the hard drive. Rebuild aborted. Customer tested the hard drive with a diagnostic tool -> diag tool said the hard drive is OK. Customer low-level formatted the hard drive (worked) and tried to rebuild again. Rebuild failed again. Customer took a NEW hard drive; the rebuild failed with this one too. Conclusion -> the RAID-5 is corrupt.
----------
Issue:
After a long talk with the system administrator and lots of troubleshooting, we found out that his RAID-5 is inconsistent: the parity information doesn't match the data. Customer had never run a consistency check on his RAID.
Now the customer needs to back up the system, then either initialise the RAID container or delete the RAID container and create it completely new.
Customer tried a backup. Backup failed: CRC error. Customer tried to copy the data to the network. Copy failed: read error. System crashed. System didn't come up any more. Customer said he hasn't got any very important data on the system, but the configuration is EXTREMELY complex. Still, he realizes that he HAS to reinstall the system.
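For anyone wondering what "parity doesn't match the data" actually means: here is a tiny Python toy of the RAID-5 idea (parity = XOR of the data blocks in a stripe). This is only a sketch to illustrate the principle. The function names and the 2-byte "blocks" are made up by me, and a real controller does all of this in firmware with rotating parity and its own on-disk format.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data_blocks, parity_block):
    """What a consistency check does per stripe: recompute parity and compare."""
    return xor_blocks(data_blocks) == parity_block


def rebuild_block(surviving_blocks, parity_block):
    """Reconstruct the one missing block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity_block])


# A toy stripe: three data blocks and their parity (hypothetical values).
d0, d1, d2 = b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"
parity = xor_blocks([d0, d1, d2])

# Healthy stripe: the check passes, and a lost d1 can be rebuilt exactly.
healthy = stripe_is_consistent([d0, d1, d2], parity)
rebuilt_ok = rebuild_block([d0, d2], parity)

# Now d2 silently goes bad on disk and nobody ever runs a consistency check.
d2_bad = b"\xab\x01"
inconsistent = not stripe_is_consistent([d0, d1, d2_bad], parity)

# When d1 then drops out, the rebuild is computed from bad input
# and does NOT give back the original d1.
rebuilt_bad = rebuild_block([d0, d2_bad], parity)

print("healthy stripe consistent:", healthy)
print("rebuild from healthy stripe matches d1:", rebuilt_ok == d1)
print("damaged stripe detected as inconsistent:", inconsistent)
print("rebuild from damaged stripe matches d1:", rebuilt_bad == d1)
```

The point of the toy: a regular consistency check would have flagged the damaged stripe while all disks were still online. Once a drive drops out, the array has nothing left to compare against.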
-----
Solution:
Customer deleted the WHOLE RAID array and reinstalled the system. Worked fine.
It took the customer 3 WHOLE WORKING DAYS to reinstall and reconfigure the system. Customer was at work through the whole weekend. Monday morning the system was up and running again, and production could continue.
----------
?Fun?:
Customer called me again Monday noon: "Either the RAID controller is defective or the hard drives are defective. System won't boot any more."
Me: "What was the last thing you did on the system?"
Customer: "After I reinstalled the whole system, and that took 3 days, I wanted to check whether the system boots normally again, to be sure everything is OK. Rebooted, everything was fine. Then I thought I'll just _REINITIALISE THE RAID-5_, just to be sure."
Me: "...... *gulp*..... you did.... what?"
Customer: "To make sure the RAID container is OK this time, I re-initialised it."
Me: *takes a deep breath* "OK. Initialising a RAID container is like FORMATTING the hard drive. It writes a CLEAN RAID signature over ALL disks and erases all data. But I told you that when I explained that you need to initialise the RAID container or create it completely new."
Customer: "Oh.... yeah. I remember. And I read the warning 'data will be lost', but I didn't assume that would involve the data on the disks. Does that mean the last 3 days have been for NOTHING?"
Me: "Haven't you made a backup after you installed the system?"
Customer: "No, I wanted to make the backup this evening."
Me: *sighs* "Well, then yes. You have to reinstall the system again."
Customer: "No.... NOOOO.... *cries* Why.... I ... *sighs* Well, anyway... thank you for your help. Looks like the system does work normally."
-----------------------
Such stories fill my heart with pure sadness....
Why do such people have the jobs I always want to have?
Oh, and a good hint for everyone out there:
When your system is finally working PERFECTLY after MONTHS of installation, configuration, optimization....
FORMAT YOUR HARDDRIVES! *G* ALL OF THEM!!