[This is the continuation of a thread that started on -committers]
On Sun, Sep 16, 2001 at 02:48:48PM +0100, Josef Karthauser wrote:
> On Sun, Sep 16, 2001 at 01:35:20AM +0100, Josef Karthauser wrote:
> > On Sat, Sep 15, 2001 at 03:51:07PM +0200, Dag-Erling Smorgrav wrote:
> > > Josef Karthauser <[EMAIL PROTECTED]> writes:
> > > > Is there a possibility that this commit is causing me to lose key
> > > > presses? I'm finding it hard to imagine that I'm mistyping as
> > > > I've never noticed it before. (Every N, where N is > 30 or 40, a key
> > > > that I press doesn't register and I have to press it again).
> > >
> > > Educated guess: your interrupt latency just went to hell (where mine's
> > > been for three months now, I'm still waiting to hear if Matt could
> > > make any sense out of my crash dump) and you're losing interrupts. If
> > > you have a serial mouse, try moving it around a lot and see if it
> > > seems to hang (you should see mentions of interrupt-level buffer
> > > overflows in your /var/log/messages). Also, just for kicks, check how
> > > much CPU time your syncer process is using, and try running sync(8)
> > > and see if your keyboard wedges for a couple of seconds when you do
> > > that.
> >
> > My mouse is /dev/psm0. From time to time the ata device's
> > interrupts/second goes through the roof for no apparent reason (i.e.
> > several hundred interrupts/sec). Sync never wedges anything.
>
> There's almost definitely an interrupt problem. I regularly have
> the machine wedge almost solid when rsyncing a lot of data to and
> fro. The machine begins to behave erratically, which I now think
> happens mainly because all the timers stop working (maybe the
> interrupts stop working?), 'systat -vmstat' doesn't produce any
> numbers because the initial time delay never passes. :(. Also, I
> don't appear to be able to enter the kernel debugger when this
> happens! :( Can someone in the know give me a hand debugging this?
> It really ought to be fixed, but my knowledge isn't sufficient to
> find this on my own.
>
> Thanks,
> Joe
This also happens from time to time:
6 users Load 1.39 1.23 1.14 Sep 21 13:32
Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 62696 8932 111764 14728 15052 count
All 249864 12164 2806932 25860 pages
Interrupts
Proc:r p d s w Csw Trp Sys Int Sof Flt 1 cow 1743 total
6 32 12398 13 866 1823 26 45516 wire stray irq0
90820 act stray irq6
8.3%Sys 5.1%Intr 0.2%User 0.0%Nice 86.4%Idl 102140 inact stray irq7
| | | | | | | | | | 11388 cache 1 acpi0 irq9
====+++ 3664 free 1505 ata0 irq14
daefr uhci0 irq5
Namei Name-cache Dir-cache 5 prcfr 2 pcm0 irq5
Calls hits % hits % react 7 atkbd0 irq
688 687 100 pdwak psm0 irq12
4 zfod pdpgs 100 clk irq0
Disks ad0 fd0 ofod intrn 128 rtc irq8
KB/t 6.00 0.00 9 %slo-z 35712 buf
tps 1507 0 7 tfree 10 dirtybuf
MB/s 8.83 0.00 17913 desiredvnodes
% busy 98 0 14595 numvnodes
4798 freevnodes
Look at the number of interrupts that the ata device is generating:
ata0 accounts for 1505 of the 1743 total. This is in no way normal! It
happens randomly and causes the machine to basically grind to a halt.
As a comparison on the same machine, here's the output of systat -vmstat
for the machine after I rebooted it and it was running a background
fsck:
4 users Load 1.01 0.42 0.16 Sep 21 13:50
Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 40328 3848 71980 4408 53308 count
All 200248 6884 1085132 10232 pages
Interrupts
Proc:r p d s w Csw Trp Sys Int Sof Flt cow 329 total
2 30 622 11 955 402 2 34 35928 wire stray irq0
35492 act stray irq6
1.4%Sys 1.9%Intr 1.2%User 0.6%Nice 94.9%Idl 128800 inact stray irq7
| | | | | | | | | | 28 cache acpi0 irq9
=+- 53280 free 97 ata0 irq14
daefr uhci0 irq5
Namei Name-cache Dir-cache prcfr 1 pcm0 irq5
Calls hits % hits % react 3 atkbd0 irq
536 534 100 pdwak psm0 irq12
8 zfod pdpgs 100 clk irq0
Disks ad0 fd0 1 ofod intrn 128 rtc irq8
KB/t 7.99 0.00 7 %slo-z 35712 buf
tps 97 0 1 tfree 33 dirtybuf
MB/s 0.76 0.00 17913 desiredvnodes
% busy 98 0 1655 numvnodes
29 freevnodes
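One way to catch these bursts when they happen would be to take two
snapshots of the cumulative interrupt counters (for example from
vmstat -i) a known number of seconds apart and work out the rate per
source. What follows is only a sketch in Python. The line layout it
parses (source name, then total, then rate) is an assumption about the
vmstat -i output, the function names are made up for illustration, and
the sample counters are invented so that ata0 comes out at the 1505
seen in the first display.

```python
import re

# Assumed layout of a data line: "<source name> <total> <rate>".
# The header line does not match because its last two fields are not numbers.
LINE_RE = re.compile(r"^(?P<name>.+?)\s+(?P<total>\d+)\s+(?P<rate>\d+)\s*$")

# Invented sample snapshots, taken 10 seconds apart (not real measurements).
SAMPLE_BEFORE = """\
interrupt                   total       rate
ata0 irq14                 100000         97
psm0 irq12                   5000          3
clk irq0                   200000        100
rtc irq8                   300000        128
Total                      605000        328
"""

SAMPLE_AFTER = """\
interrupt                   total       rate
ata0 irq14                 115050        120
psm0 irq12                   5100          3
clk irq0                   201000        100
rtc irq8                   301280        128
Total                      622430        351
"""


def parse_totals(text):
    """Return {source name: cumulative total}, skipping the summary line."""
    totals = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group("name").strip().lower() != "total":
            totals[match.group("name").strip()] = int(match.group("total"))
    return totals


def rates(before, after, seconds):
    """Interrupts per second for every source present in both snapshots."""
    first, second = parse_totals(before), parse_totals(after)
    return {
        name: (second[name] - first[name]) / seconds
        for name in first
        if name in second
    }


def spikes(before, after, seconds, threshold):
    """Sources whose rate over the interval exceeds the threshold."""
    found = rates(before, after, seconds)
    return {name: rate for name, rate in found.items() if rate > threshold}


if __name__ == "__main__":
    for name, rate in spikes(SAMPLE_BEFORE, SAMPLE_AFTER, 10, 500).items():
        print(f"{name}: {rate:.0f} interrupts/sec")
```

With the invented samples and a threshold of 500, only ata0 irq14 is
flagged. Run from a loop that saves each snapshot with a timestamp, this
would at least show when a burst starts and whether any other source
rises with it.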
Who's responsible for this area? I'm happy to help in getting to the
bottom of it. Is it an interrupt routing problem? Is it an ata device
problem? Is it something else (maybe locking) altogether?
This problem has existed in -current for at least 6 weeks.
Thanks for any suggestions,
Joe