Hell in a handbasket...
Jan. 23rd, 2003 09:45 pm
I'm so glad today's over...
For the last couple of weeks I've been working on this year's bulk capital purchase of school ICT equipment, around half a million pounds' worth of stuff. Collating all the requests from the schools is a total nightmare. We provided them with a menu of about 11 Apple and Dell configurations plus assorted other hardware such as printers, wireless access points, projectors, interactive whiteboards and the like, but a number of the returns were a mess and a surprising number of schools can't add up figures! To top that off, we've got some extra funding that schools can bid for a share of. Another couple of members of the project team and I have been working 12-hour days this week to get as much of the purchase organised as quickly as possible so we can concentrate on the next stages.
On top of that, the schools' email system which I'm responsible for administering (Symantec MailGear running on Solaris) has been up the creek for the last three and a half days. There are somewhere in the region of 30,000 users on the system and none of them have been able to get at their mail since late Monday afternoon. So my job queue is through the roof just now.
Also, one of our secondary schools has lost a large chunk of its network: the PDC lost two of the three SCSI drives in its RAID something-or-other array, which put it out of commission. And I heard someone say that one of the BDCs at the same site is reporting a processor or mainboard failure. Great.
As if that wasn't bad enough, I got into the office at 8.15 this morning and noticed that my machine was really slow to boot. When I eventually got logged in to our domain and opened up Outlook to access my Exchange mailbox, I found that our Exchange server was also down.
Then one of my colleagues appears at my door and tells me that he hasn't got any web access. I get my browser open and can get as far as our proxy server, but it can't connect to anything external to us. Hmm, that's odd. I change my proxy settings to our other corporate proxy and get the same result. I change again to one of our school proxy servers and get exactly the same result again. Damn. OK, so as well as having 30,000 users without email access, I now have the same number of users without web access too. It turns out that the Solaris box that handles forward proxy requests (if I recall our network topology correctly, this box sits between the central proxy and the firewall) had also fallen over. And it just so happens that this is the same box that is queuing all the school email while our MailGear server is fubar'd.
Then one of our primary internal DNS servers falls over, so people all over the building start to experience really odd behaviour, and I had problems getting reconnected to the helpdesk system. Gah.
All of this before 8.45 this morning. I had a project team meeting scheduled from 1pm - 2pm today and it ran over. It damn well ran until 6.15pm. Only 4 and a bit hours over. I feel an exception report coming on.
Did I mention I hate Symantec MailGear?
Date: 2003-01-24 03:11 am (UTC)
I think you should top it off by telling people you're taking a two-week vacation.
Date: 2003-01-24 01:03 pm (UTC)
The purchasing project is going well. Not a single exception report this week.
DNS and internet access were restored fairly quickly, as was the Exchange server after a swift kicking.
The MailGear server reappeared on the network at about 4pm yesterday. It was screamingly quick, but the problem was that there were no users on it apart from admin and a techie.
MailGear services finally returned properly at 10.43am this morning. The über-tech who looks after the box finally relented and concluded that the only course of action left was to wipe the system and rebuild everything from scratch, applying all available updates in one whack. So: a totally fresh install and configuration of Solaris, plus a new install of the latest build of MailGear that Symantec mailed over to us (2.0.0 build 62, if anyone cares).
Whoop-de-do and hallelujah.
The only remaining issue was that we lost just under two days worth of email.
See, from late Monday afternoon until Wednesday morning the MailGear server would come up for about 30 seconds and then fall off the face of the earth again. Twenty minutes later up it'd come for 30 seconds, then vanish again. Elrond, the box that was queuing the mail for it, sort of went 'cool, you're back, here, have all this mail', which promptly killed the damn MailGear box. Twenty minutes later, when MailGear came back up, Elrond forwarded another stack of mail, at which point MailGear died again.
The problem was that when the MailGear server popped onto the network yesterday afternoon with no users, Elrond forwarded all the mail that was sitting in the queue, and MailGear went 'don't know who the hell that message is for' and promptly sent it all back to the originators. Gah.
Never mind, next week must be better. It just has to be.
Cheers
Steve