Hello, all!
On 5/27/23 07:49, Mike Larkin wrote:
> I don't know what's wrong with atapi CD emulation on wdc(4), my recommendation
> would be to move the cd to a vioscsi device instead of wdc.
Yes, we know of various workarounds, but a closer look shows that there
is kernel memory corruption that is somehow related to ATAPI
timeouts, leading to a trap when accessing xfer->chp ...
I built a stable-7.3 kernel with this patch:
--- dev/ic/wdc.c 31 Dec 2019 10:05:32 -0000 1.136
+++ dev/ic/wdc.c 28 May 2023 08:24:04 -0000
@@ -883,8 +883,10 @@ wdcstart(struct channel_softc *chp)
 		return;
 	}
+	printf("HP: xfer=%p orig chp=%p\n",xfer,chp);
 	/* adjust chp, in case we have a shared queue */
 	chp = xfer->chp;
+	printf("HP: xfer=%p xfer->chp=%p\n",xfer,chp);
 	if ((chp->ch_flags & WDCF_ACTIVE) != 0 ) {
 		return; /* channel already active */
And here is what I got:
... lot of messages with occasional timeout message ...
HP: xfer=0xfffffd807e020c38 orig chp=0xffff80000007c168
HP: xfer=0xfffffd807e020c38 xfer->chp=0xffff80000007c168
HP: xfer=0xfffffd807e020c38 orig chp=0xffff80000007c168
HP: xfer=0xfffffd807e020c38 xfer->chp=0x6e1e3d12d428657b
kernel: protection fault trap, code=0
Stopped at wdcstart+0x49: movl 0x58(%r15),%eax
ddb> trace
wdcstart(ffff80000007c168,ffff80000007c168,ffff80000007c168,fffffd807e020c38,10,ffff800021707a90)
at wdcstart+0x49
wdc_atapi_the_machine(ffff80000007c168,fffffd807e020c38,2,ffff80000007c168,ffff80000007c168,fffffd807e020c38)
at wdc_atapi_the_machine+0x14a
wdc_atapi_intr(ffff80000007c168,fffffd807e020c38,1,ffff80000007c168,fffffd807e020c38,ffff80000007c168)
at wdc_atapi_intr+0x47
wdcintr(ffff80000007c168,ffff80000007c168,0,0,6,1)
at wdcintr+0xae
intr_handler(ffff800021707bf0,ffff800000065500,ffff800000065680,0,ffffffff81212216,ffff800021707be0)
at intr_handler+0x26
Xintr_ioapic_edge14_untramp(0,ffffff9c,ffffffff81483770,0,ffff8000216cd060,75f3d68ef248)
at Xintr_ioapic_edge14_untramp+0x18f
ndinitat(ffff8000216cd060,ffffffffffffff9c,2e96cea10,75f3d68ef248,0,ffff8000216cd060)
at ndinitat
syscall(ffff800021707ef0,ffff800021707ef0,0,ffff8000216cd060,0,0)
at syscall+0x201
Xsyscall(6,26,5,26,2e96cea10,0)
at Xsyscall+0x128
end of kernel
end trace frame: 0x75f3d68ef2f0, count: -9
ddb>
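So it seems that part of the xfer structure is overwritten under some
rare condition: the xfer pointer itself is unchanged, but in the last
call xfer->chp holds 0x6e1e3d12d428657b, which is not a valid kernel
address, and the trap follows on the first access through it.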
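To illustrate why the last xfer->chp value traps right away, here is a
minimal sketch of a canonical-address check. It assumes amd64 with
48-bit virtual addresses (4-level paging), where bits 63..47 must all be
equal. The helper name is_canonical48() is made up for this example and
is not part of the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code.
 * Assumption: 48-bit virtual addresses, so bits 63..47 (17 bits)
 * must be either all zero or all one for the address to be canonical.
 */
static bool
is_canonical48(uint64_t va)
{
	uint64_t top = va >> 47;	/* keep bits 63..47 */

	return top == 0 || top == 0x1ffff;
}
```

With the values from the log above, 0xffff80000007c168 (the good chp)
and 0xfffffd807e020c38 (xfer) pass this check, while 0x6e1e3d12d428657b
does not. A load through a non-canonical address raises a protection
fault rather than a page fault, which would match the trap shown.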
The question is how to find what is causing that corruption.
Best regards
--Henryk Paluch