[identity profile] japester.livejournal.com posting in [community profile] techrecovery
something you never ever want to see on a production server:


# sar -f sa27 -g

SunOS <hostname> 5.9 Generic_117171-08 sun4u 04/27/2005
00:00:01  pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
01:00:00     0.94     1.98     1.98     0.00     0.05
02:00:00     0.76     1.06     1.05     0.00     0.96

<snippety>

16:00:00    11.38    25.32    25.31     0.00     0.00
16:20:00    10.85    26.22    26.21     0.00     0.00
16:40:00    38.04   270.21   273.24    60.48     0.00
17:00:00    57.50   590.54   592.73   151.67     0.00
17:20:00    44.11   569.10   571.56   161.28     0.00
17:40:01    79.33   795.55   812.00   667.62     0.00
18:00:01    59.29   703.71   723.77  1465.26     0.00

Average      9.51    65.60    66.46    51.43     0.71


The explanation for the non-Unix geeks: page in/outs are chunks (pages) of memory being moved to or from disk. Page scans (pgscan/s) are pages the page daemon inspects each second while hunting for memory it can free. 200 a second is the upper limit you'd ever expect to see on a fully loaded server. This poor beast seemed to have some root-owned process doing:

while (true) {
    malloc();   /* grab more memory, never free() it */
}

and we wondered why it became completely unresponsive at about 18:30... We hadn't noticed before because we were too busy playing Counter-Strike. It would have been much worse if we'd done the normal 17:00 thing and gone home then (and not noticed until 08:30 the next morning).
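
For anyone who would rather let a script squint at the numbers, here is a minimal sketch in Python that picks out the intervals where pgscan/s goes over the 200 rule of thumb mentioned above. The function name `flag_scan_spikes` and the constant `PGSCAN_LIMIT` are made up for illustration, and it assumes the five-column `sar -g` layout shown in this post.

```python
import re

# Rule-of-thumb ceiling for pgscan/s taken from the post; an assumption, not a hard limit.
PGSCAN_LIMIT = 200.0

# One data row: timestamp followed by the five numeric sar -g columns.
ROW = re.compile(
    r"^(\d{2}:\d{2}:\d{2})\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)$"
)

def flag_scan_spikes(sar_text, limit=PGSCAN_LIMIT):
    """Return (timestamp, pgscan/s) for every interval above the limit."""
    spikes = []
    for line in sar_text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # skips the banner, column header, blank lines and the Average row
        pgscan = float(m.group(5))  # columns: pgout, ppgout, pgfree, pgscan, %ufs_ipf
        if pgscan > limit:
            spikes.append((m.group(1), pgscan))
    return spikes

# Rows copied from the sar output in this post.
SAMPLE = """\
16:20:00    10.85    26.22    26.21     0.00     0.00
16:40:00    38.04   270.21   273.24    60.48     0.00
17:00:00    57.50   590.54   592.73   151.67     0.00
17:20:00    44.11   569.10   571.56   161.28     0.00
17:40:01    79.33   795.55   812.00   667.62     0.00
18:00:01    59.29   703.71   723.77  1465.26     0.00
"""

if __name__ == "__main__":
    for timestamp, rate in flag_scan_spikes(SAMPLE):
        print(timestamp, rate)
```

On the rows above, only the 17:40 and 18:00 samples cross the line, which matches when the box started falling over.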

Date: 2005-04-28 06:58 am (UTC)
From: [identity profile] sean-langley.livejournal.com
That's...very bad.

Even though this isn't a server and is a workstation, I'm running like 3 Adobe apps, e-mail, irc, and a VNC client, and I get:

23:56 up 22:56, 2 users, load averages: 0.54 0.64 1.01

I'd have better uptime, but had to reboot for a kernel update.

Date: 2005-04-28 12:35 pm (UTC)
From: [identity profile] taleya.livejournal.com
oh christ

Not something you want to see on a production box. Not at all. Your alert system must have been screaming at you!

Date: 2005-04-28 12:53 pm (UTC)
From: [identity profile] compwizrd.livejournal.com
I can remember working on an SGI Challenge back in the mid-90s that REGULARLY hit 900 load average.

Admins claimed nothing was wrong.

It usually took about 5 minutes before something you typed came back to your terminal.

These are the same admins that claimed eggdrop bots crashed their server...

Date: 2005-04-28 06:25 pm (UTC)
From: [identity profile] sean-langley.livejournal.com
Yeah, I realised that after posting.

Heh, just to humor you though:

Processes: 73 total, 3 running, 70 sleeping... 241 threads 11:24:32
Load Avg: 0.52, 0.83, 0.70 CPU usage: 57.4% user, 30.7% sys, 11.9% idle
SharedLibs: num = 121, resident = 26.5M code, 2.37M data, 8.27M LinkEdit
MemRegions: num = 11463, resident = 161M + 11.3M private, 161M shared
PhysMem: 77.4M wired, 285M active, 143M inactive, 506M used, 5.84M free
VM: 7.12G + 83.5M 125951(104) pageins, 88560(27) pageouts

Date: 2005-04-29 01:15 pm (UTC)
From: [identity profile] taleya.livejournal.com
aii..

We use Nagios, so the agent keeps track of mem usage, swap usage, HDD space, ping responses, telnet/ftp etc... I love it ^_^
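
For the curious, a rough sketch of what a Nagios-style check for the page scan rate might look like in Python. Nagios plugins report state through their exit code (0 OK, 1 WARNING, 2 CRITICAL); the function name `check_pgscan` and the 100 warning threshold are assumptions for illustration, and the 200 critical threshold is just the rule of thumb from the original post.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

def check_pgscan(rate, warn=100.0, crit=200.0):
    """Map a pgscan/s reading to a Nagios-style (exit_code, message) pair.

    The thresholds are illustrative assumptions, not tuned values.
    """
    if rate >= crit:
        return CRITICAL, f"PGSCAN CRITICAL - {rate:.2f} pages/s"
    if rate >= warn:
        return WARNING, f"PGSCAN WARNING - {rate:.2f} pages/s"
    return OK, f"PGSCAN OK - {rate:.2f} pages/s"
```

A real plugin would print the message on one line and exit with the code, for example `code, msg = check_pgscan(1465.26)` followed by `print(msg)` and `sys.exit(code)`.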
