Hello, all!

On 5/27/23 07:49, Mike Larkin wrote:

> I don't know what's wrong with atapi CD emulation on wdc(4), my recommendation
> would be to move the cd to a vioscsi device instead of wdc.

Yes, we know of various workarounds, but a closer look shows that there is kernel memory corruption that is somehow related to ATAPI timeouts, leading to a trap when accessing xfer->chp ...

I built a stable-7.3 kernel with this patch:

--- dev/ic/wdc.c        31 Dec 2019 10:05:32 -0000      1.136
+++ dev/ic/wdc.c        28 May 2023 08:24:04 -0000
@@ -883,8 +883,10 @@ wdcstart(struct channel_softc *chp)
                return;
        }

+       printf("HP: xfer=%p orig chp=%p\n",xfer,chp);
        /* adjust chp, in case we have a shared queue */
        chp = xfer->chp;
+       printf("HP: xfer=%p xfer->chp=%p\n",xfer,chp);

        if ((chp->ch_flags & WDCF_ACTIVE) != 0 ) {
                return; /* channel already active */


And here is what I got:

... lot of messages with occasional timeout message ...
HP: xfer=0xfffffd807e020c38 orig chp=0xffff80000007c168
HP: xfer=0xfffffd807e020c38 xfer->chp=0xffff80000007c168
HP: xfer=0xfffffd807e020c38 orig chp=0xffff80000007c168
HP: xfer=0xfffffd807e020c38 xfer->chp=0x6e1e3d12d428657b
kernel: protection fault trap, code=0
Stopped at      wdcstart+0x49:  movl    0x58(%r15),%eax
ddb> trace
wdcstart(ffff80000007c168,ffff80000007c168,ffff80000007c168,fffffd807e020c38,10,ffff800021707a90)
 at wdcstart+0x49
wdc_atapi_the_machine(ffff80000007c168,fffffd807e020c38,2,ffff80000007c168,ffff80000007c168,fffffd807e020c38)
 at wdc_atapi_the_machine+0x14a
wdc_atapi_intr(ffff80000007c168,fffffd807e020c38,1,ffff80000007c168,fffffd807e020c38,ffff80000007c168)
 at wdc_atapi_intr+0x47
wdcintr(ffff80000007c168,ffff80000007c168,0,0,6,1)
 at wdcintr+0xae
intr_handler(ffff800021707bf0,ffff800000065500,ffff800000065680,0,ffffffff81212216,ffff800021707be0)
 at intr_handler+0x26
Xintr_ioapic_edge14_untramp(0,ffffff9c,ffffffff81483770,0,ffff8000216cd060,75f3d68ef248)
 at Xintr_ioapic_edge14_untramp+0x18f
ndinitat(ffff8000216cd060,ffffffffffffff9c,2e96cea10,75f3d68ef248,0,ffff8000216cd060)
 at ndinitat
syscall(ffff800021707ef0,ffff800021707ef0,0,ffff8000216cd060,0,0)
 at syscall+0x201
Xsyscall(6,26,5,26,2e96cea10,0)
 at Xsyscall+0x128
end of kernel
end trace frame: 0x75f3d68ef2f0, count: -9
ddb>


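So it seems that part of the xfer structure is overwritten under some rare condition: the same xfer (0xfffffd807e020c38) that had xfer->chp=0xffff80000007c168 on the previous call suddenly has xfer->chp=0x6e1e3d12d428657b.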

The question is how to find what is causing that corruption.
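
One idea I could try next is to keep a redundant check word next to the pointer and verify it at more places than just wdcstart(), to narrow the time window in which the overwrite happens. Below is only a minimal userland sketch of that idea; struct chan_sketch, struct xfer_sketch, XFER_MAGIC and the xfer_sketch_* functions are hypothetical names invented for illustration, not the real struct wdc_xfer or channel_softc from wdc.c:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the real channel_softc / wdc_xfer. */
struct chan_sketch {
	int ch_flags;
};

struct xfer_sketch {
	struct chan_sketch *chp;	/* the field that gets clobbered */
	uintptr_t chp_check;		/* redundant copy, XORed with a magic */
};

#define XFER_MAGIC ((uintptr_t)0x5a5a5a5aUL)	/* arbitrary value */

/* Store the pointer together with its check word. */
void
xfer_sketch_set(struct xfer_sketch *x, struct chan_sketch *chp)
{
	x->chp = chp;
	x->chp_check = (uintptr_t)chp ^ XFER_MAGIC;
}

/* Return 1 if chp still matches its check word, 0 otherwise. */
int
xfer_sketch_ok(const struct xfer_sketch *x)
{
	return ((uintptr_t)x->chp ^ XFER_MAGIC) == x->chp_check;
}

/* Simulate a stray write to chp; return 1 if the check notices it. */
int
xfer_sketch_demo(void)
{
	struct chan_sketch ch = { 0 }, other = { 0 };
	struct xfer_sketch x;

	xfer_sketch_set(&x, &ch);
	assert(xfer_sketch_ok(&x));	/* intact right after set */

	x.chp = &other;			/* stand-in for the overwrite */
	return !xfer_sketch_ok(&x);
}
```

This only tells where the damage is first noticed, not who wrote it, and it would miss an overwrite that happens to clobber both words consistently.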

Best regards
  --Henryk Paluch
