Yep, I found that one too when I googled the symptoms of my box. There
may be some relation between the symptoms; on the other hand, there are
also obvious differences: my system didn't start with almost no cpu load
that slowly increased until the system crashed, but showed over 50% cpu
load immediately.
I haven't read all of this thread, but it reminded me of this bug:
https://www.illumos.org/issues/1333
Jeff.
On Sun, Oct 23, 2011 at 06:31:42PM +0200, Gernot Wolf wrote:
> Hello everyone,
>
> sorry, I'm two days late, but here, as I promised, are the results
> of a rerun of the diagnostic commands
Hello everyone,
sorry, I'm two days late, but here, as I promised, are the results of a
rerun of the diagnostic commands I initially ran, as well as the dtrace
commands you guys sent me to troubleshoot my misbehaving system (see
attachments).
These are the results after I did
#eeprom acpi-user-options=0x8
Ok, I could not resist giving it a try, screw my bed ;)
Mike, bingo! That one hit home. With acpi-user-options set to 0x08 and a
subsequent reboot, cpu load is back to normal (that is, load average = 0.05).
I'll run my diagnostics again on my system and post the results in case
anyone is interested
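For anyone skimming the thread later, the workaround described above boils down to the following (a sketch; run as root on the affected box, and note that the exact semantics of the 0x8 bit are documented in eeprom(1M)):

```shell
# Inspect the current ACPI option setting (unset or 0x0 is the default).
eeprom acpi-user-options

# Apply the workaround from this thread, then reboot for it to take effect.
eeprom acpi-user-options=0x8
reboot
```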
Sorry, I was away from my desk for a while. Obviously this isn't an issue;
in fact, if anything, those numbers are surprisingly small.
On Thu, Oct 20, 2011 at 12:18 PM, Gernot Wolf wrote:
> Here are the results (let the script run for a few secs):
>
> CPU     ID                    FUNCTION:NAME
Just checking ;-)
Night!
Mike
On Thu, 2011-10-20 at 22:40 +0200, Gernot Wolf wrote:
> No. Why?
>
> Regards,
> Gernot Wolf
>
>
> On 20.10.11 22:33, Michael Stapleton wrote:
> > Is this running in a VM?
> >
> > Mike
> >
> > On Thu, 2011-10-20 at 22:20 +0200, Gernot Wolf wrote:
> >
> >> Grep output attached. Hopefully this attachment will go through ;)
Ok, I'll try that tomorrow. Too late to try anything that might result
in my box having boot problems ;)
Regards,
Gernot Wolf
On 20.10.11 21:55, Michael Stapleton wrote:
Probably just too big.
Are there any ACPI settings in the BIOS?
or we can try to change ACPI in OI.
#man eeprom
.
.
No. Why?
Regards,
Gernot Wolf
On 20.10.11 22:33, Michael Stapleton wrote:
Is this running in a VM?
Mike
On Thu, 2011-10-20 at 22:20 +0200, Gernot Wolf wrote:
Grep output attached. Hopefully this attachment will go through ;)
Regards,
Gernot Wolf
On 20.10.11 21:25, Michael Stapleton wrote:
Yes, I noticed that too when I compared the lockstat output on my OI box
with that on the OI box at my office. There, no Acpi debug tracing
functions are shown at all...
Mike made further suggestions concerning ACPI, but that will have to
wait for tomorrow. My bed is calling my name ;)
Regards
Is this running in a VM?
Mike
On Thu, 2011-10-20 at 22:20 +0200, Gernot Wolf wrote:
> Grep output attached. Hopefully this attachment will go through ;)
>
> Regards,
> Gernot Wolf
>
>
> > On 20.10.11 21:25, Michael Stapleton wrote:
> > Attachment is missing...
> >
> > I'd like to see the whole thing
I would not worry about it. The messages are being caused by some
problem. Let's focus on getting the messages.
Debug will increase your load, but not like you are seeing.
Mike
On Thu, 2011-10-20 at 22:10 +0200, Gernot Wolf wrote:
> Ok, here we go:
>
> gernot@tintenfass:~# mdb -k
> Loading modules: [ ... ]
Ok, here we go:
gernot@tintenfass:~# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc
pcplusmp scsi_vhci zfs ip hook neti sockfs arp usba uhci s1394 fctl
stmf_sbd stmf idm fcip cpc random sata crypto sd lofs logindmux ptm ufs
sppp smbsrv nfs ipc ]
> AcpiDbgLevel
> Acp
Well, I zipped it; the zipfile is just 211K. Shouldn't be a problem, I
think...
Regards,
Gernot Wolf
On 20.10.11 21:38, James Carlson wrote:
Gernot Wolf wrote:
Ok, for some reason this attachment refuses to go out :( Have to figure
that out...
Probably just because it's huge. Try "tail -100 /var/adm/messages".
On 20.10.11 20:57, Michael Schuster wrote:
On Thu, Oct 20, 2011 at 20:55, Michael Stapleton wrote:
You might be right.
But 45% of what?
Profiling interrupt: 5844 events in 30.123 seconds (194 events/sec)
Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
Probably just too big.
Are there any ACPI settings in the BIOS?
or we can try to change ACPI in OI.
#man eeprom
.
.
.
OPERANDS
x86 Only
acpi-user-options
A configuration variable that controls the use of
Advanced Configuration and Power Interface (ACPI), a
power management specification.
Results are up, see other post...
Regards,
Gernot Wolf
On 20.10.11 21:00, Michael Stapleton wrote:
+1
Mike
On Thu, 2011-10-20 at 11:47 -0700, Rennie Allen wrote:
I'd like to see a run of the script I sent earlier. I don't trust
intrstat (not for any particular reason, other than that I have never
used it)...
Since Gernot is seeing the issue, maybe he wants to pitch in here?
He wants to, he's just having a hard time keeping up with you guys. You're
so fast, I'm hopelessly lagging behind ;)
Thanks a lot for all the help so far to all of you!
Regards,
Gernot
Nope. CPU load remains the same. top shows:
CPU states: 47.5% idle, 0.0% user, 52.5% kernel, 0.0% iowait, 0.0% swap
Regards,
Gernot Wolf
On 20.10.11 20:25, Michael Schuster wrote:
Hi,
just found this:
http://download.oracle.com/docs/cd/E19253-01/820-5245/ghgoc/index.html
does it help?
Gernot Wolf wrote:
> Ok, for some reason this attachment refuses to go out :( Have to figure
> that out...
Probably just because it's huge. Try "tail -100 /var/adm/messages".
It's likely that if there's something going nuts on your system,
there'll be enough log-spam to identify it.
--
James Carlson
Ok, for some reason this attachment refuses to go out :( Have to figure
that out...
Regards,
Gernot Wolf
On 20.10.11 21:20, Gernot Wolf wrote:
Oops, something went wrong with my attachment. I'll try again...
Regards,
Gernot Wolf
On 20.10.11 21:09, Gernot Wolf wrote:
You mean, besides being quite huge?
I let it run (as all the other dtrace commands you guys have given me)
just for a couple of seconds. And no, it's not a loaded system, that's
the problem here. It's just a home NAS...
Here is the dtrace output:
gernot@tintenfass:/root# dtrace -n 'syscall:::entry { @[execname]=count()}'
dtrace:
Attachment is missing...
I'd like to see the whole thing, but in the meanwhile:
#grep -i acpi /var/adm/messages
Anything?
Mike
On Thu, 2011-10-20 at 21:09 +0200, Gernot Wolf wrote:
> You mean, besides being quite huge? I took a quick look at it, but other
> than getting a headache by doing that, my limited unix skills
> unfortunately fail me.
Here is something to check:
Pop into the debugger ( mdb -k) and see what AcpiDbgLevel's current setting is.
E.g:
AcpiDbgLevel/x
The default setting is 3. If it's something higher, that would explain the
high incidence of Acpi trace/debug calls.
To exit the debugger type $q or ::quit
Steve
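Concretely, the debugger session described above would look something like this (a sketch; the module list is abbreviated and the value printed is illustrative, 3 being the stated default):

```
# mdb -k
Loading modules: [ unix genunix specfs dtrace ... ]
> AcpiDbgLevel/x
AcpiDbgLevel:   3
> $q
```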
Oops, something went wrong with my attachment. I'll try again...
Regards,
Gernot Wolf
On 20.10.11 21:09, Gernot Wolf wrote:
You mean, besides being quite huge? I took a quick look at it, but other
than getting a headache by doing that, my limited unix skills
unfortunately fail me.
I've zi
Here are the results (let the script run for a few secs):
CPU     ID                    FUNCTION:NAME
  1      2                             :END
    DEVICE        TIME (ns)
    i915 1            22111
    heci 0            23119
    pci-ide 0         38700
    uhci 1            4
You mean, besides being quite huge? I took a quick look at it, but other
than getting a headache by doing that, my limited unix skills
unfortunately fail me.
I've zipped it and attached it to this mail, maybe someone can get
anything out of it...
Regards,
Gernot
On 20.10.11 20:17, M
i86_mwait is the idle function the cpu is executing when it has
nothing else to do. Basically it sleeps inside of that function.
Lockstat based profiling just samples what is on cpu, so idle time
shows up as some form of mwait, depending on how the BIOS
is configured.
Steve
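For context, the profiling output quoted in this thread (the Count/indv/cuml columns with i86_mwait as the hottest caller) is what an interrupt-based lockstat profiling run produces; a sketch of a plausible invocation (the exact flags used earlier in the thread are an assumption):

```shell
# Sample kernel activity for 30 seconds, showing the top 20 callers;
# on an idle box most samples land in the cpu idle loop (i86_mwait).
lockstat -kIW -D 20 sleep 30
```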
- Original Message -
Profiling is AFAIK statistical, so it might not show the correct number.
Certainly the count of interrupts does not appear high, but if the handler
is spending a long time in the interrupt...
The script I sent measures the time spent in the handler (intrstat might do
this as well, but I just don't trust it).
+1
Mike
On Thu, 2011-10-20 at 11:47 -0700, Rennie Allen wrote:
> I'd like to see a run of the script I sent earlier. I don't trust
> intrstat (not for any particular reason, other than that I have never used
> it)...
>
>
> On 10/20/11 11:33 AM, "Michael Stapleton" wrote:
>
> >Don't know.
On Thu, Oct 20, 2011 at 20:55, Michael Stapleton wrote:
> You might be right.
>
> But 45% of what?
>
> Profiling interrupt: 5844 events in 30.123 seconds (194 events/sec)
>
> Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
> --
You might be right.
But 45% of what?
Profiling interrupt: 5844 events in 30.123 seconds (194 events/sec)
Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
-----------------------------------------------------------
 2649  45%  45% 0.00     1070 cpu[1]                 i86_mwait
I'd like to see a run of the script I sent earlier. I don't trust
intrstat (not for any particular reason, other than that I have never used
it)...
On 10/20/11 11:33 AM, "Michael Stapleton" wrote:
>Don't know. I don't like to troubleshoot by guessing if possible. I'd rather
>follow the evidence to
On Thu, Oct 20, 2011 at 20:33, Michael Stapleton wrote:
> Don't know. I don't like to troubleshoot by guessing if possible. I'd rather
> follow the evidence to capture the culprit. Use what we know to discover
> what we do not know.
if you're answering my question: I'm not guessing that much: I looked
Don't know. I don't like to troubleshoot by guessing if possible. I'd rather
follow the evidence to capture the culprit. Use what we know to discover
what we do not know.
We know the CS rate in vmstat is high, we know sys time is high, we know
the syscall rate is low, we know it is not a user process, therefore
Hi,
just found this:
http://download.oracle.com/docs/cd/E19253-01/820-5245/ghgoc/index.html
does it help?
On Thu, Oct 20, 2011 at 20:23, Michael Stapleton wrote:
> My understanding is that it is not supposed to be a loaded system. We
> want to know what the load is.
>
>
> gernot@tintenfass:~# intrstat 30
My understanding is that it is not supposed to be a loaded system. We
want to know what the load is.
gernot@tintenfass:~# intrstat 30
      device |      cpu0 %tim      cpu1 %tim
-------------+-------------------------------
    e1000g#0 |         1  0,0         0  0,0
      ehci#0 |         0  0
Try the following script, which will identify any drivers with high
interrupt load
-
#!/usr/sbin/dtrace -s
sdt:::interrupt-start { self->ts = vtimestamp; }
sdt:::interrupt-complete
/self->ts && arg0 != 0/
{
        this->devi = (struct dev_info *)arg0;
        self->name = stringof(`devnamesp[this->devi->devi_major].dn_name);
        @num[self->name, this->devi->devi_instance] = sum(vtimestamp - self->ts);
        self->ts = 0;
}
dtrace:::END
{
        printf("%11s %16s\n", "DEVICE", "TIME (ns)");
        printa("%10s%-3d %@16d\n", @num);
}
Gernot,
is there anything suspicious in /var/adm/messages?
Michael
On Thu, Oct 20, 2011 at 20:07, Michael Stapleton wrote:
> That rules out userland.
>
> Sched tells me that it is not a user process. If kernel code is
> executing on a cpu, tools will report the sched process. The count was
> how many times the process was taken off the cpu while dtrace was running.
Wow, that was fast :)
Just caught me with the morning coffee email review.
Well, I just had a nice dinner :)
However, the NIC integrated on the Intel DG965WHMKR mainboard is an Intel
82566DC according to the device driver utility, the reported driver being
e1000g. Isn't the bge driver for Broadcom NICs?
That rules out userland.
Sched tells me that it is not a user process. If kernel code is
executing on a cpu, tools will report the sched process. The count was
how many times the process was taken off the CPU while dtrace was
running.
Let's see what kernel code is running the most:
#dtrace -n '
Sched is the scheduler itself. How long did you let this run? If only
for a couple of seconds, then that number is high, but not ridiculous for
a loaded system, so I think that this output rules out a high context
switch rate.
Try this command to see if some process is making an excessive number of
system calls:
Your lockstat output fingers Acpi debug tracing functions.
I wonder why these are running in the first place.
Steve
- Original Message -
Hello all,
I have a machine here at my home running OpenIndiana oi_151a, which
serves as a NAS on my home network.
Yeah, I've been able to run these diagnostics on another OI box (at my
office, so much for OI not being used in production ;)), and noticed
that there were several values that were quite different. I just don't
have any idea of the meaning of these figures...
Anyway, here are the results of the
Sure do :-)
People tend to think only about ZFS and maybe Zones. They don't
understand dtrace and resource management.
Solaris is much more than ZFS, but you really have to know what you are
doing to appreciate it.
The true strength of Solaris is server side in the hands of
professionals.
Ther
On Thu, 2011-10-20 at 19:34 +0200, Gernot Wolf wrote:
> Wow, that was fast :)
Just caught me with the morning coffee email review.
> However, the NIC integrated on the Intel DG965WHMKR mainboard is an Intel
> 82566DC according to the device driver utility, the reported driver
> e1000g. Isn't the
Wow, that was fast :)
However, the NIC integrated on the Intel DG965WHMKR mainboard is an Intel
82566DC according to the device driver utility, the reported driver
e1000g. Isn't the bge driver for Broadcom NICs?
And what do you mean by "the other nvidia based interface"? This
mainboard has on
Dontchya just love dtrace?
On 10/20/11 10:22 AM, "Michael Stapleton" wrote:
>Hi Gernot,
>
>You have a high context switch rate.
>
>try
>#dtrace -n 'sched:::off-cpu { @[execname]=count()}'
>
>For a few seconds to see if you can get the name of an executable.
>
>Mike
>On Thu, 2011-10-20 at 18:44 +0200, Gernot Wolf wrote:
Hi Gernot,
You have a high context switch rate.
try
#dtrace -n 'sched:::off-cpu { @[execname]=count()}'
For a few seconds to see if you can get the name of an executable.
Mike
On Thu, 2011-10-20 at 18:44 +0200, Gernot Wolf wrote:
> Hello all,
>
> I have a machine here at my home running OpenIndiana oi_151a, which
On Thu, 2011-10-20 at 18:44 +0200, Gernot Wolf wrote:
> Hello all,
>
> I have a machine here at my home running OpenIndiana oi_151a, which
> serves as a NAS on my home network. The original install was OpenSolaris
> 2009.6 which was later upgraded to snv_134b, and recently to oi_151a.
>
> So fa