Re: [OpenIndiana-discuss] Problem with high cpu load (oi_151a)

Gernot Wolf Thu, 20 Oct 2011 12:48:38 -0700

Results are up, see other post...

Regards,
Gernot Wolf



Am 20.10.11 21:00, schrieb Michael Stapleton:

+1

Mike

On Thu, 2011-10-20 at 11:47 -0700, Rennie Allen wrote:

I'd like to see a run of the script I sent earlier.  I don't trust
intrstat (not for any particular reason, other than that I have never used
it)...


On 10/20/11 11:33 AM, "Michael Stapleton"
<michael.staple...@techsologic.com>  wrote:

Don't know. I don't like to trouble shoot by guess if possible. I rather
follow the evidence to capture the culprit. Use what we know to discover
what we do not know.

We know CS rate in vmstat is high, we know Sys time is high, we know
syscall rate is low, we know it is not a user process therefor it is
kernel. Likely a driver.

So what kernel code is running the most?

What's causing that code to run?

Does that code belong to a driver?


Mike



On Thu, 2011-10-20 at 20:25 +0200, Michael Schuster wrote:

Hi,

just found this:
http://download.oracle.com/docs/cd/E19253-01/820-5245/ghgoc/index.html

does it help?

On Thu, Oct 20, 2011 at 20:23, Michael Stapleton
<michael.staple...@techsologic.com>  wrote:

My understanding is that it is not supposed to be a loaded system. We
want to know what the load is.


gernot@tintenfass:~# intrstat 30

      device |      cpu0 %tim      cpu1 %tim
-------------+------------------------------
    e1000g#0 |         1  0,0         0  0,0
      ehci#0 |         0  0,0         4  0,0
      ehci#1 |         3  0,0         0  0,0
   hci1394#0 |         0  0,0         2  0,0
     i8042#1 |         0  0,0         4  0,0
      i915#1 |         0  0,0         2  0,0
   pci-ide#0 |        15  0,1         0  0,0
      uhci#0 |         0  0,0         2  0,0
      uhci#1 |         0  0,0         0  0,0
      uhci#2 |         3  0,0         0  0,0
      uhci#3 |         0  0,0         2  0,0
      uhci#4 |         0  0,0         4  0,0

      device |      cpu0 %tim      cpu1 %tim
-------------+------------------------------
    e1000g#0 |         1  0,0         0  0,0
      ehci#0 |         0  0,0         3  0,0
      ehci#1 |         3  0,0         0  0,0
   hci1394#0 |         0  0,0         1  0,0
     i8042#1 |         0  0,0         6  0,0
      i915#1 |         0  0,0         1  0,0
   pci-ide#0 |         3  0,0         0  0,0
      uhci#0 |         0  0,0         1  0,0
      uhci#1 |         0  0,0         0  0,0
      uhci#2 |         3  0,0         0  0,0
      uhci#3 |         0  0,0         1  0,0
      uhci#4 |         0  0,0         3  0,0

gernot@tintenfass:~# vmstat 5 10
  kthr      memory            page            disk          faults
cpu
  r b w   swap  free  re  mf pi po fr de sr cd s0 s1 s2   in   sy   cs

us

sy id
  0 0 0 4243840 1145720 1  6  0  0  0  0  2  0  1  1  1 9767  121

37073 0

54 46
  0 0 0 4157824 1059796 4 11  0  0  0  0  0  0  0  0  0 9752  119

37132 0

54 46
  0 0 0 4157736 1059752 0  0  0  0  0  0  0  0  0  0  0 9769  113

37194 0

54 46
  0 0 0 4157744 1059788 0  0  0  0  0  0  0  0  0  0  0 9682  104

36941 0

54 46
  0 0 0 4157744 1059788 0  0  0  0  0  0  0  0  0  0  0 9769  105

37208 0

54 46
  0 0 0 4157728 1059772 0  1  0  0  0  0  0  0  0  0  0 9741  159

37104 0

54 46
  0 0 0 4157728 1059772 0  0  0  0  0  0  0  0  0  0  0 9695  127

36931 0

54 46
  0 0 0 4157744 1059788 0  0  0  0  0  0  0  0  0  0  0 9762  105

37188 0

54 46
  0 0 0 4157744 1059788 0  0  0  0  0  0  0  0  0  0  0 9723  102

37058 0

54 46
  0 0 0 4157744 1059788 0  0  0  0  0  0  0  0  0  0  0 9774  105

37263 0

54 46

Mike


On Thu, 2011-10-20 at 11:02 -0700, Rennie Allen wrote:

Sched is the scheduler itself.  How long did you let this run?  If

only

for a couple of seconds, then that number is high, but not

ridiculous for

a loaded system, so I think that this output rules out a high context
switch rate.

Try this command to see if some process is making an excessive

number of

syscalls:

dtrace -n 'syscall:::entry { @[execname]=count()}'

If not, then I'd try looking at interrupts...


On 10/20/11 10:52 AM, "Gernot Wolf"<gw.i...@chello.at>  wrote:

Yeah, I've been able to run this diagnostics on another OI box (at

my

office, so much for OI not being used in production ;)), and noticed
that there were several values that were quite different. I just

don't

have any idea on the meaning of this figures...

Anyway, here are the results of the dtrace command (I executed the
command twice, hence two result sets):

gernot@tintenfass:~# dtrace -n 'sched:::off-cpu {

@[execname]=count()}'

dtrace: description 'sched:::off-cpu ' matched 3 probes
^C

   ipmgmtd

   gconfd-2

   gnome-settings-d

   idmapd

   inetd

   miniserv.pl

   netcfgd

   nscd

   ospm-applet

   ssh-agent

   sshd

   svc.startd

   intrd

   afpd

   mdnsd

   gnome-power-mana

   clock-applet

   sendmail

   xscreensaver

fmd

   fsflush

   ntpd

   updatemanagernot

   isapython2.6

   devfsadm

   gnome-terminal

   dtrace

   mixer_applet2

   smbd

   nwam-manager

   svc.configd

   Xorg

   sched


gernot@tintenfass:~# dtrace -n 'sched:::off-cpu {

@[execname]=count()}'

dtrace: description 'sched:::off-cpu ' matched 3 probes
^C

   automountd

   ipmgmtd

   idmapd

   in.routed

   init

   miniserv.pl

   netcfgd

   ssh-agent

   sshd

   svc.startd

fmd

   hald

   inetd

   intrd

   hald-addon-acpi

   nscd

   gnome-power-mana

   sendmail

   mdnsd

   devfsadm

   xscreensaver

   fsflush

   ntpd

   updatemanagernot

   mixer_applet2

   isapython2.6

   dtrace

   gnome-terminal

   smbd

   nwam-manager

   zpool-rpool

   svc.configd

   Xorg

   sched


So, quite obviously there is one executable standing out here,

"sched",

now what's the meaning of this figures?

Regards,
Gernot Wolf


Am 20.10.11 19:22, schrieb Michael Stapleton:

Hi Gernot,

You have a high context switch rate.

try
#dtrace -n 'sched:::off-cpu { @[execname]=count()}'

For a few seconds to see if you can get the name of and

executable.


Mike
On Thu, 2011-10-20 at 18:44 +0200, Gernot Wolf wrote:

Hello all,

I have a machine here at my home running OpenIndiana oi_151a,

which

serves as a NAS on my home network. The original install was
OpenSolaris
2009.6 which was later upgraded to snv_134b, and recently to

oi_151a.


So far this OSOL (now OI) box has performed excellently, with

one major

exception: Sometimes, after a reboot, the cpu load was about

50-60%,

although the system was doing nothing. Until recently, another

reboot

solved the issue.

This does not work any longer. The system has always a cpu load

of

50-60% when idle (and higher of course when there is actually

some work

to do).

I've already googled the symptoms. This didn't turn up very much

useful

info, and the few things I found didn't apply to my problem. Most
noticably was this problem which could be solved by disabling

cpupm in

/etc/power.conf, but trying that didn't show any effect on my

system.


So I'm finally out of my depth. I have to admit that my

knowledge of

Unix is superficial at best, so I decided to try looking for

help here.


I've run several diagnostic commands like top, powertop,

lockstat etc.

and attached the results to this email (I've zipped the results

of

kstat
because they were>1MB).

One important thing is that when I boot into the oi_151a live dvd
instead of booting into the installed system, I also get the

high cpu

load. I mention this because I have installed several things on

my OI

box like vsftpd, svn, netstat etc. I first thought that this

problem

might be caused by some of this extra stuff, but getting the same
system
when booting the live dvd ruled that out (I think).

The machine is a custom build medium tower:
S-775 Intel DG965WHMKR ATX mainbord
Intel Core 2 Duo E4300 CPU 1.8GHz
1x IDE DVD recorder
1x IDE HD 200GB (serves as system drive)
6x SATA II 1.5TB HD (configured as zfs raidz2 array)

I have to solve this problem. Although the system runs fine and
absolutely serves it's purpose, having the cpu at 50-60% load
constantly
is a waste of energy and surely a rather unhealthy stress on the
hardware.

Anyone any ideas...?

Regards,
Gernot Wolf
_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss



_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss




_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss



_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss



_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss




_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss



_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss

Re: [OpenIndiana-discuss] Problem with high cpu load (oi_151a)

Reply via email to