Hi,
Any idea how I could boot the system step by step to narrow down where
this bug lives?
I have stripped the boot process as much as possible. This is a very old
issue, but maybe there is a way to find out more about it. Looking at it
more, I think it is possibly in the kernel scheduler. So far I have seen
this problem only on Sun systems, either the X1 or the V100.
After a reboot, the idle system shows a load of either 1.08 to 1.12 or
0.08 to 0.12.
I have now disabled as many daemons as I can at startup to show it
clearly.
# cat /etc/rc.conf.local
sshd_flags=NO
sendmail_flags=NO
syslogd_flags=NO
inetd=NO # almost always needed
As you can see, there isn't anything running on the system to justify
this load.
# ps -auxwk
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 3 99.0 0.0 0 0 ?? DK 6:15PM 7:08.13 (idle0)
root 8 0.0 0.0 0 0 ?? DK 6:15PM 0:00.00 (pagedaemon)
root 9 0.0 0.0 0 0 ?? DK 6:15PM 0:00.26 (reaper)
root 12 0.0 0.0 0 0 ?? DK 6:15PM 0:00.00 (aiodoned)
root 11 0.0 0.0 0 0 ?? DK 6:15PM 0:00.00 (update)
root 10 0.0 0.0 0 0 ?? DK 6:15PM 0:00.00 (cleaner)
root 13 0.0 0.0 0 0 ?? DK 6:15PM 0:00.00 (crypto)
root 0 0.0 0.0 0 0 ?? DKs 6:15PM 0:00.00 (swapper)
root 4 0.0 0.0 0 0 ?? DK 6:15PM 0:00.00 (syswq)
root 2 0.0 0.0 0 0 ?? DK 6:15PM 0:00.00 (kmthread)
root 1 0.0 0.1 616 408 ?? Is 6:15PM 0:00.01 /sbin/init
root 7 0.0 0.0 0 0 ?? DK 6:15PM 0:00.00 (pfpurge)
root 6 0.0 0.0 0 0 ?? DK 6:15PM 0:00.00 (usbtask)
root 5 0.0 0.0 0 0 ?? DK 6:15PM 0:00.01 (usb0)
root 6772 0.0 0.2 664 1040 ?? Ss 6:15PM 0:00.02 cron
root 11233 0.0 0.1 552 528 00 Ss 6:15PM 0:00.08 -ksh (ksh)
root 32400 0.0 0.1 416 352 00 R+ 6:22PM 0:00.00 ps -auxwk
However, you get this:
# uptime
6:22PM up 8 mins, 1 user, load averages: 1.08, 0.89, 0.48
# sysctl vm.loadavg
vm.loadavg=1.08 0.89 0.48
# sysctl kern.nprocs
kern.nprocs=17
# sysctl kern.version
kern.version=OpenBSD 4.4 (GENERIC) #1714: Wed Aug 6 13:31:49 MDT 2008
[EMAIL PROTECTED]:/usr/src/sys/arch/sparc64/compile/GENERIC
# sysctl hw.model
hw.model=SUNW,UltraSPARC-IIe (rev 3.3) @ 548 MHz
I tried a few different things with boot -c, but so far I can't isolate
where this might be.
The only thing I know is that it is ONLY and ALWAYS present from the
start of the system.
So, either the load is off by one from boot, or it is good.
The system may need to be rebooted maybe 5 times to get the real reading
(not off by 1), but then you do get it.
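
As a rough sanity check on the "off by one from boot" reading, here is a
small Python sketch. It assumes the usual BSD-style exponentially damped
load average (time constants of 1, 5 and 15 minutes) and one phantom
runnable entity that is counted from the very first sample. The function
name and the 5 second sampling interval are my assumptions, not taken
from the kernel source.

```python
import math

SAMPLE_INTERVAL = 5.0                   # seconds between samples (assumed)
TIME_CONSTANTS = (60.0, 300.0, 900.0)   # 1, 5 and 15 minute averages


def simulate_loadavg(nrun, uptime_seconds):
    """Return the three damped averages after uptime_seconds,
    assuming a constant nrun runnable entities since boot."""
    decay = [math.exp(-SAMPLE_INTERVAL / tc) for tc in TIME_CONSTANTS]
    avg = [0.0, 0.0, 0.0]
    steps = int(uptime_seconds / SAMPLE_INTERVAL)
    for _ in range(steps):
        for i in range(3):
            # Standard exponential smoothing toward the current count.
            avg[i] = avg[i] * decay[i] + nrun * (1.0 - decay[i])
    return tuple(avg)


if __name__ == "__main__":
    # One phantom runnable entity, 8 minutes after boot.
    print(["%.2f" % a for a in simulate_loadavg(1, 8 * 60)])
```

Under those assumptions the model gives roughly 1.00, 0.80 and 0.41 at
8 minutes of uptime, which is in the same ballpark as the 1.08, 0.89,
0.48 above once the normal idle baseline and boot activity are added.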
Any suggestions on how I could get more details to dig into this?
I was thinking of maybe putting some kind of delay in the scheduler, in
case that makes it possible to isolate it further, but I am not sure
how I could do it.
Or maybe log from the scheduler which processes are added to or removed
from the load average, but again no success doing that yet.
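
Short of instrumenting the kernel, a userland logger might already show
whether anything is ever in a runnable state while the load reads high.
A minimal Python sketch, assuming `sysctl -n vm.loadavg` prints the three
values on one line and that `ps -axk -o pid,stat,command` is accepted on
this release (not verified on 4.4). The helper names are made up for
illustration.

```python
import subprocess
import time


def parse_loadavg(text):
    """Parse '1.08 0.89 0.48' into a tuple of three floats."""
    return tuple(float(field) for field in text.split()[:3])


def runnable_entries(ps_output):
    """Return (pid, command) for every ps line whose state starts with R."""
    found = []
    for line in ps_output.splitlines()[1:]:      # skip the header line
        fields = line.split(None, 2)
        if len(fields) == 3 and fields[1].startswith("R"):
            found.append((int(fields[0]), fields[2]))
    return found


def sample():
    """Take one sample of the load average and the runnable entries."""
    load = subprocess.check_output(["sysctl", "-n", "vm.loadavg"])
    ps = subprocess.check_output(["ps", "-axk", "-o", "pid,stat,command"])
    return parse_loadavg(load.decode()), runnable_entries(ps.decode())


if __name__ == "__main__":
    while True:
        print(time.strftime("%H:%M:%S"), *sample())
        time.sleep(5)
```

The ps process itself always shows up as runnable, so that entry can be
ignored. The kernel may also count states other than R toward the load,
so an empty list here would not rule the scheduler out.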
This is not really broken hardware, as I can reproduce it on way more
than 20 different systems here.
This might also affect something else in the scheduler: looking at the
code, it looks like some processes are scheduled based on the load and
on how long they have run. So, if the data is wrong, it may well lead to
other issues caused by this.
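
To put a number on that worry: assuming the scheduler still uses the
classic 4.4BSD decay filter, where the estimated CPU usage of a process
is multiplied once per second by 2*load / (2*load + 1), the two readings
give very different decay rates. A small Python sketch of that formula
(floating point for illustration, not the kernel's fixed-point code; the
function names are mine):

```python
def decay_factor(load):
    """Classic 4.4BSD per-second decay factor for estimated CPU usage."""
    return (2.0 * load) / (2.0 * load + 1.0)


def seconds_to_decay(load, fraction=0.1):
    """Seconds until estimated CPU usage drops to `fraction` of its
    starting value or below, applying the decay factor once per second."""
    factor = decay_factor(load)
    remaining, seconds = 1.0, 0
    while remaining > fraction:
        remaining *= factor
        seconds += 1
    return seconds


if __name__ == "__main__":
    for load in (0.08, 1.08):
        print(load, round(decay_factor(load), 3), seconds_to_decay(load))
```

With a load of 0.08 the factor is about 0.14, with 1.08 it is about
0.68, so under this assumption past CPU usage would be forgotten several
times more slowly on a system that reads off by one.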
Any suggestions for digging into this further and maybe getting more
useful information?
One thing for sure, it's always either right, or off by one when present.
Thanks
Daniel