I've got a stumper of one here.
Aug. 19th, 2006 04:20 pm
And I'll cut it just for you.
I've been fighting with my company's Ghost Console for the past week, and I need some help, because I've got what I think is a stumper. It's got me stumped, at least.
The issue is that we have an office that has a lab of 13 machines that we re-image remotely on a regular basis (or did until it stopped working earlier this week). The server that runs the show is at a different office some 5 miles distant across a Metro Ethernet / DS3 connection.
The server is at 10.10.128.x, the machines are at 10.20.10.x. Pings and general network traffic go through just peachy, and the remote client install goes fine as well. What happens is that the server will initiate a task (either image creation or clone), and it will sit and time out. The clients appear to just sit there and do absolutely zilch. It's like they don't catch the signal to start the task.
I was able to initially duplicate the problem on a single test machine back at the facility where the server lives, but I thought I had it fixed by completely removing the machine from the console, removing the tasks it was associated with, and then re-adding it from scratch (including reinstalling the client). This appeared to work just peachy, but I can't get the machines I've removed to show back up in the console.
Anyone know of a way to either manually add machines to the console, or to force it to re-discover all the machines on the network?
::puts on flame-resistant suit and digs out the whips::
EDIT: It looks like a multicast-over-WAN problem, combined with the machines not wanting to report back to the server across said WAN link, possibly using the same multicast method. I've asked one of the BNAFH* to look at it, and we'll see what happens.
*- Bastard Network Admins From Hell. Don't piss them off, or they will route all your traffic through SimonNet. :)
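For anyone wanting to check the multicast theory independently of Ghost, here is a minimal sketch of a multicast probe in Python. The group address, port, and function names are all hypothetical test values picked for illustration; they are not the ones Ghost uses. It only shows whether a UDP multicast packet sent from one side of the WAN link arrives on the other.

```python
import socket
import struct

# Hypothetical test group and port (assumptions, NOT what Ghost uses).
GROUP = "239.255.42.42"
PORT = 50000


def membership_request(group, iface="0.0.0.0"):
    """Pack the ip_mreq structure: group address plus local interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def send_probe(group=GROUP, port=PORT, ttl=8, payload=b"mcast-probe"):
    """Send one multicast datagram; returns the number of bytes sent."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # The default multicast TTL is 1, which never leaves the local
        # subnet, so raise it to let the packet cross routed hops.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                        struct.pack("b", ttl))
        return sock.sendto(payload, (group, port))
    finally:
        sock.close()


def wait_for_probe(group=GROUP, port=PORT, timeout=10.0):
    """Join the group and wait for one datagram; None means nothing arrived."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        sock.settimeout(timeout)
        try:
            return sock.recvfrom(1024)  # (data, (sender_ip, sender_port))
        except socket.timeout:
            return None
    finally:
        sock.close()
```

Run `wait_for_probe()` on a lab machine, then `send_probe()` from the server side. If the probe shows up on the same subnet but never across the link, that points at multicast forwarding on the WAN rather than at the Ghost client.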
no subject
Date: 2006-08-19 11:29 pm (UTC)
/me dodges the whip
no subject
Date: 2006-08-20 12:26 am (UTC)
no subject
Date: 2006-08-20 12:50 am (UTC)
no subject
Date: 2006-08-20 01:14 am (UTC)
I'm an enthusiastic follower of the "Plug the whole frigging lot out, go for a pint, come back and hope for the best" school of thought.
no subject
Date: 2006-08-20 01:47 am (UTC)
no subject
Date: 2006-08-20 10:47 pm (UTC)
no subject
Date: 2006-08-20 11:22 pm (UTC)
no subject
Date: 2006-08-21 01:24 pm (UTC)
no subject
Date: 2006-08-21 01:37 pm (UTC)
no subject
Date: 2006-08-21 01:56 pm (UTC)
no subject
Date: 2006-08-21 01:58 pm (UTC)
no subject
Date: 2006-08-20 02:18 am (UTC)
I know I sure don't have the balls to flame a mod.
Now if you were a first-time posting n00b, that'd be a different matter entirely...
no subject
Date: 2006-08-20 03:27 am (UTC)
no subject
Date: 2006-08-20 03:30 am (UTC)
no subject
Date: 2006-08-20 07:11 pm (UTC)
no subject
Date: 2006-08-20 02:32 am (UTC)
no subject
Date: 2006-08-20 02:47 am (UTC)
Do you have a security group that deals with firewalls? Did they do any updates or patching or "general maintenance" recently? You'd be surprised how many times routers go south after extended uptime because someone forgot to write the running code to the boot flash....
no subject
Date: 2006-08-20 03:25 am (UTC)
Frankly, I'm thinking that it's the telco and their metro ethernet thing-y causing the multicast issue. I think I'll toss the thing into forced unicast mode just for kicks to see what it does.
no subject
Date: 2006-08-20 08:39 am (UTC)
Hah! I'll lay money they're lying to you. Give it a couple of weeks and they'll say "oh, we did change this, but that won't affect it!". Never trust a network admin who denies everything :)
no subject
Date: 2006-08-20 10:05 am (UTC)
According to Symantec, if clients don't respond it'll try a second time and then mark the machine record as timed out before moving on - this can take up to 3 min per attempt. What they don't say is how to clear the timed out status, and I haven't used it in a while so I'm buggered if I can remember :)
Just a silly thought (because you have to have at least one per comment ;) ). Did one of the last images to successfully go out have any tweaks over the previous version that may stop the client running? I only ask because of a mare of a time I had with SMS2003 whereby the clients had set their cache size back to the default without asking and a rights cockup had locked off the temporary storage location, so nothing would deliver.
no subject
Date: 2006-08-21 07:23 pm (UTC)
If the telnet works, your problem isn't network-related; talk to the software vendor about why their app is misbehaving.
If the telnet fails, your networking team or the telco is lying to you, and there's been a recent network change; deliver richly-deserved beatings until the problem is solved.
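The telnet check described above boils down to "can I open a TCP connection to that host and port?". A minimal sketch of the same test in Python follows, for machines without a telnet client handy. The function name is made up for illustration, and the host and port are left for you to fill in since the thread doesn't give them. Note that a successful TCP connect says nothing about UDP or multicast delivery, so it can only rule out part of the network.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake;
        # the context manager closes the socket again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example use (host and port are placeholders you must supply):
#   tcp_port_open("client-hostname-or-ip", some_port)
```

A `False` here is the equivalent of "the telnet fails"; a `True` means the TCP path is open and the problem is more likely above the network layer, or in non-TCP traffic.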
no subject
Date: 2006-08-21 08:29 pm (UTC)
Here Here!
Date: 2006-08-23 08:49 am (UTC)
no subject
Date: 2006-08-23 08:50 am (UTC)
no subject
Date: 2006-08-24 04:36 am (UTC)
See edit for what I've at least narrowed it down to.