Michael, I just came back from work and finally could spend some time on this kernel crash.
I tried putting up to 5 sync() and even moving the close_partition() call to the end (making a total of 10 sync()). It still crashes, but less frequently with this mod. I tried running the code on x86, but it looks like I will need to move a Mac drive to an Intel box to do the test. I am still working on that :-P

In the meantime, more info regarding the dump: find below the disasm of the whole function, as you asked. The crash always happens with lr pointing to c0039d14, so apparently the test at c0039cf0 fails and the kernel is trying to tell me something (printk), except that I don't see it (no flush?). Apparently the message is still for x86 (ead is an x86 thing, right?). Here is what r3 and r4 point to before calling printk:

  65 61 64 20 63 61 6c 6c 20 28 72 65 71 20 25
  e  a  d     c  a  l  l     (  r  e  q     %

  61 69 6c 65 64 21 0a 00
  a  i  l  e  d  !  \n \0

I added two blank lines in the listing below around the area where the code shouldn't run (IMHO).

Thanks for looking into this. I wish I could be more helpful. At any rate, let me know if you need anything else.
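A quick way to double-check the ASCII under those bytes is sketched below in Python. The language choice and the helper name decode_dump are assumptions made for illustration; nothing here comes from the kernel or pdisk sources.

```python
def decode_dump(hex_bytes: str) -> str:
    """Decode a space-separated hex dump into ASCII, stopping at the first NUL."""
    raw = bytes.fromhex(hex_bytes)
    # C strings end at the first NUL byte; keep only what precedes it.
    text = raw.split(b"\x00", 1)[0]
    return text.decode("ascii")


# The two byte runs quoted in the message above.
first = decode_dump("65 61 64 20 63 61 6c 6c 20 28 72 65 71 20 25")
second = decode_dump("61 69 6c 65 64 21 0a 00")

print(repr(first))
print(repr(second))
```

Both runs are only fragments of longer strings, so the decoded text starts and stops mid-word.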
LdS

vmlinux-2.4.15-pre2-ben0:     file format elf32-powerpc

Disassembly of section .text:

c0039c38 <invalidate_bdev>:
c0039c38:  94 21 ff c0  stwu    r1,-64(r1)
c0039c3c:  7c 08 02 a6  mflr    r0
c0039c40:  be 81 00 10  stmw    r20,16(r1)
c0039c44:  90 01 00 44  stw     r0,68(r1)
c0039c48:  7c 76 1b 78  mr      r22,r3
c0039c4c:  a3 16 00 12  lhz     r24,18(r22)
c0039c50:  7c 99 23 78  mr      r25,r4
c0039c54:  3e 80 c0 32  lis     r20,-16334
c0039c58:  3d 20 c0 32  lis     r9,-16334
c0039c5c:  3a a9 89 e4  addi    r21,r9,-30236
c0039c60:  3b 40 00 00  li      r26,0
c0039c64:  3a e0 00 00  li      r23,0
c0039c68:  3b 60 00 00  li      r27,0
c0039c6c:  39 34 89 d8  addi    r9,r20,-30248
c0039c70:  7f e9 d8 2e  lwzx    r31,r9,r27
c0039c74:  2c 1f 00 00  cmpwi   r31,0
c0039c78:  41 82 01 0c  beq     c0039d84 <invalidate_bdev+0x14c>
c0039c7c:  7f bb a8 2e  lwzx    r29,r27,r21
c0039c80:  2c 1d 00 00  cmpwi   r29,0
c0039c84:  40 81 01 00  ble     c0039d84 <invalidate_bdev+0x14c>
c0039c88:  a0 1f 00 0c  lhz     r0,12(r31)
c0039c8c:  83 9f 00 20  lwz     r28,32(r31)
c0039c90:  7c 00 c0 00  cmpw    r0,r24
c0039c94:  40 82 00 e4  bne     c0039d78 <invalidate_bdev+0x140>
c0039c98:  80 1f 00 30  lwz     r0,48(r31)
c0039c9c:  2c 00 00 00  cmpwi   r0,0
c0039ca0:  41 82 00 d8  beq     c0039d78 <invalidate_bdev+0x140>
c0039ca4:  81 3f 00 18  lwz     r9,24(r31)
c0039ca8:  71 20 00 04  andi.   r0,r9,4
c0039cac:  41 82 00 3c  beq     c0039ce8 <invalidate_bdev+0xb0>
c0039cb0:  3b df 00 10  addi    r30,r31,16
c0039cb4:  7c 00 f0 28  lwarx   r0,r0,r30
c0039cb8:  30 00 00 01  addic   r0,r0,1
c0039cbc:  7c 00 f1 2d  stwcx.  r0,r0,r30
c0039cc0:  40 a2 ff f4  bne-    c0039cb4 <invalidate_bdev+0x7c>
c0039cc4:  71 20 00 04  andi.   r0,r9,4
c0039cc8:  41 82 00 0c  beq     c0039cd4 <invalidate_bdev+0x9c>
c0039ccc:  7f e3 fb 78  mr      r3,r31
c0039cd0:  4b ff f3 a1  bl      c0039070 <__wait_on_buffer>
c0039cd4:  3b 40 00 01  li      r26,1
c0039cd8:  7c 00 f0 28  lwarx   r0,r0,r30
c0039cdc:  30 00 ff ff  addic   r0,r0,-1
c0039ce0:  7c 00 f1 2d  stwcx.  r0,r0,r30
c0039ce4:  40 a2 ff f4  bne-    c0039cd8 <invalidate_bdev+0xa0>
c0039ce8:  80 1f 00 18  lwz     r0,24(r31)
c0039cec:  70 09 00 10  andi.   r9,r0,16
c0039cf0:  40 82 00 24  bne     c0039d14 <invalidate_bdev+0xdc>

c0039cf4:  3c 60 c0 21  lis     r3,-16351
c0039cf8:  3c 80 c0 21  lis     r4,-16351
c0039cfc:  38 84 cd b0  addi    r4,r4,-12880
c0039d00:  38 63 cc c8  addi    r3,r3,-13112
c0039d04:  38 a0 02 a6  li      r5,678
c0039d08:  4b fd ae ad  bl      c0014bb4 <printk>
c0039d0c:  38 60 00 00  li      r3,0
c0039d10:  48 07 6f 75  bl      c00b0c84 <xmon>

c0039d14:  80 1f 00 18  lwz     r0,24(r31)
c0039d18:  70 09 00 02  andi.   r9,r0,2
c0039d1c:  41 82 00 10  beq     c0039d2c <invalidate_bdev+0xf4>
c0039d20:  3c 60 c0 21  lis     r3,-16351
c0039d24:  38 63 cd bc  addi    r3,r3,-12868
c0039d28:  4b fd ae 8d  bl      c0014bb4 <printk>
c0039d2c:  80 1f 00 10  lwz     r0,16(r31)
c0039d30:  2c 00 00 00  cmpwi   r0,0
c0039d34:  40 82 00 30  bne     c0039d64 <invalidate_bdev+0x12c>
c0039d38:  2c 19 00 00  cmpwi   r25,0
c0039d3c:  40 82 00 10  bne     c0039d4c <invalidate_bdev+0x114>
c0039d40:  80 1f 00 18  lwz     r0,24(r31)
c0039d44:  70 09 00 02  andi.   r9,r0,2
c0039d48:  40 82 00 28  bne     c0039d70 <invalidate_bdev+0x138>
c0039d4c:  80 1f 00 54  lwz     r0,84(r31)
c0039d50:  2c 00 00 00  cmpwi   r0,0
c0039d54:  41 82 00 1c  beq     c0039d70 <invalidate_bdev+0x138>
c0039d58:  7f e3 fb 78  mr      r3,r31
c0039d5c:  4b ff fe 91  bl      c0039bec <__remove_inode_queue>
c0039d60:  48 00 00 10  b       c0039d70 <invalidate_bdev+0x138>
c0039d64:  3c 60 c0 21  lis     r3,-16351
c0039d68:  38 63 cd d8  addi    r3,r3,-12840
c0039d6c:  4b fd ae 49  bl      c0014bb4 <printk>
c0039d70:  2c 1a 00 00  cmpwi   r26,0
c0039d74:  40 82 fe e4  bne     c0039c58 <invalidate_bdev+0x20>
c0039d78:  37 bd ff ff  addic.  r29,r29,-1
c0039d7c:  7f 9f e3 78  mr      r31,r28
c0039d80:  41 81 ff 08  bgt     c0039c88 <invalidate_bdev+0x50>
c0039d84:  3a f7 00 01  addi    r23,r23,1
c0039d88:  2c 17 00 02  cmpwi   r23,2
c0039d8c:  3b 7b 00 04  addi    r27,r27,4
c0039d90:  40 81 fe dc  ble     c0039c6c <invalidate_bdev+0x34>
c0039d94:  2c 1a 00 00  cmpwi   r26,0
c0039d98:  40 82 fe c0  bne     c0039c58 <invalidate_bdev+0x20>
c0039d9c:  80 76 00 0c  lwz     r3,12(r22)
c0039da0:  4b fe dc 69  bl      c0027a08 <invalidate_inode_pages>
c0039da4:  80 01 00 44  lwz     r0,68(r1)
c0039da8:  7c 08 03 a6  mtlr    r0
c0039dac:  ba 81 00 10  lmw     r20,16(r1)
c0039db0:  38 21 00 40  addi    r1,r1,64
c0039db4:  4e 80 00 20  blr

c0039db8 <__invalidate_buffers>:

> From: Michael Schmitz <[EMAIL PROTECTED]>
> Date: Mon, 12 Nov 2001 15:27:29 +0100 (CET)
> To: Laurent de Segur <[EMAIL PROTECTED]>
> Cc: <debian-powerpc@lists.debian.org>
> Subject: Re: Kernel Panic with mac-fdisk: It's reproducible...
>
>>> Now does this happen on other architectures as well (with Mac partitions),
>>> or other partition table formats?
>>>
>> I'll try to check on some x86 early this week. I don't have an ETA for that.
>> I'll do my best. My server farm and *tops are mostly Macs (running Debian.)
>
> Thanks; I'll wait for that.
>
>>> What time to usleep() seems enough to prevent this from happening? I'll
>>> add a short sleep in mac-fdisk as a workaround while this is being fixed
>>> on the kernel side.
>>>
>> The sleeps are already in write_partition_map() (partition.c if memory
>> serves well), one of 2, the other of 4. I doubled these values prior to
>> recompiling pdisk last night and this workaround seemed ok for the faster
>> cpu machines. This is strange as I thought that the sleep time was
>> independent of cpu speed, and the last task should finish earlier on faster
>> machines. Weird.
>
> The sleep should be CPU speed independent indeed. But it shouldn't ever be
> necessary. Neither should the first sync (BLKRRPART invalidates the device
> thereby flushing all data to disk first).
> I really wonder what these sleep()s are meant for.
>
> Please try another thing: move the close_device() after the second
> sync/sleep pair. Or double each sync(). If we go the voodoo path, we may as
> well go all the way ;-)
>
>> Of course, I can not guarantee that it won't crash anymore. It just doesn't
>> easily anymore. BTW, you may want to lower these values to reproduce easily
>> on your system.
>
> I'll take them out for starters.
>
>>>> current cbdd0000, pid=1222, comm = readmap
>>>
>>> That's mostly chinese for me without being fed through some ksymoops like
>>> tool (or your System.map; what's near c0039d14 in System.map?)
>>>
>> Michael, that's mostly Chinese for me even with the tool ;-)
>>
>> Here are the two addresses in my System.map bracketing the lr:
>>
>> c0039c38 T invalidate_bdev
>
> That's a good clue. Now I need you to "objdump -d
> --start-address=0xc0039c38 vmlinux " your kernel image and post the
> section up to and a few lines beyond the PC value.
>
> Michael
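The listing loads the printk arguments with lis/addi pairs just before the call at c0039d08. As a hedged aid for working out which addresses r3 and r4 hold at that point, here is a minimal Python sketch of the arithmetic. The helper name lis_addi is invented for illustration, and the code models only the two instructions, not the rest of the function.

```python
def lis_addi(hi: int, lo: int) -> int:
    """Return the 32-bit value left by `lis rX,hi` followed by `addi rX,rX,lo`."""
    # lis: put the 16-bit immediate into the upper half-word, lower half zeroed.
    upper = (hi & 0xFFFF) << 16
    # addi: add the sign-extended 16-bit immediate, then wrap to 32 bits.
    return (upper + lo) & 0xFFFFFFFF


# Immediates copied from the instructions at c0039cf4..c0039d00.
r3 = lis_addi(-16351, -13112)
r4 = lis_addi(-16351, -12880)

print(hex(r3), hex(r4))
```

These are the addresses whose contents can then be dumped and compared with the strings quoted earlier in the thread.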