On Sat, Jun 22, 2013 at 5:00 PM, Benjamin Herrenschmidt
<b...@kernel.crashing.org> wrote:
> Afaik e300 is slightly out of order, maybe it's missing a memory barrier
> somewhere.... One thing to try is to add some to the dma_map/unmap ops.
I went through the driver and added memory barriers to the
dma_map_page/dma_unmap_page and dma_alloc_coherent/dma_free_coherent
calls (a wmb() call after each, which resolves to a sync instruction).
I still get a kernel panic.

I did turn on DEBUG_PAGEALLOC to try to get more information, but I'm
not finding anything new. However, with SLAB debugging enabled I do find
slab corruption, e.g.:

Slab corruption: fib6_nodes start=e900c7f8, len=32
Redzone: 0x9f911029d74e35b/0x30a706a6050806.
Last user: [<06040001>](0x6040001)
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ff ff ff ff ff ff
Prev obj: start=e900c7c0, len=32
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [< (null)>](0x0)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5
Next obj: start=e900c830, len=32
Redzone: 0x30a706a6050aca/0xc8be11029d74e35b.
Last user: [< (null)>](0x0)
000: 0d aa 00 00 00 00 00 00 0a ca 0d 49 00 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 75 8b

This is clearly corrupted with Ethernet frames. The only interface
connected is the e1000. Eventually this corruption leads to a kernel
panic.

I'm completely confused about how this could happen. Given that the M bit
is set for all pages (see below), and with memory barriers on the DMA
map/unmap and register operations, the only thing I can think of is
something in the IO sequencer (which was suggested in the link I gave
earlier). Yet the patch mentioned is in place.

> Also audit the driver to ensure that it properly uses barriers when
> populating descriptors (and maybe compare to a more recent version of
> the driver upstream).

I've gone through the driver and didn't see anything missing. The
upstream (v3.10-rc5) driver is the same version (7.3.21-k8-NAPI). I've
also tried the latest from the e1000 release (8.0.35-NAPI), and I get
the same problem.
On Sun, Jun 23, 2013 at 6:16 PM, Benjamin Herrenschmidt
<b...@kernel.crashing.org> wrote:
> Also dbl check that the MMU is indeed mapping all these pages with the
> "M" bit.

The DBATs have the M bit set (both have 0x12 in the DBATxL
registers)... sometimes. Usually when I halt the CPU and dump the BATs,
all the IBATs and DBATs are zero. But occasionally I see DBAT2 and
DBAT3 with values and the M bit set. I also dumped all the TLB entries,
and every one of them has the M bit set (see below).

TLB dump:

BDI>dtlb 0 63
IDX  V RC VSID   VPI        RPN      WIMG PP
  0: V 0C 000eee_e9a0000 -> 2e9a0000 --M- 00
  1: V 0C 000eee_f401000 -> 2f401000 --M- 00
  2: V 1C 000ccc_0502000 -> 00502000 --M- 00
  3: V 0C 000eee_f403000 -> 2f403000 --M- 00
  4: V 0C 000eee_c124000 -> 2c124000 --M- 00
  5: V 0C 000eee_f405000 -> 2f405000 --M- 00
  6: V 0C 000eee_e9e6000 -> 2e9e6000 --M- 00
  7: V 0C 33afd1_0427000 -> 005f8000 --M- 10
  8: V 0C 33afd1_0428000 -> 2ff63000 --M- 10
  9: V 0C 000ccc_0349000 -> 00349000 --M- 00
 10: V 1C 000ccc_03ca000 -> 003ca000 --M- 00
 11: V 1C 000ccc_03cb000 -> 003cb000 --M- 00
 12: V 0C 33afd1_040c000 -> 003b4000 --M- 11
 13: V 0C 000eee_f40d000 -> 2f40d000 --M- 00
 14: V 1C 000eee_fa8e000 -> 2fa8e000 --M- 00
 15: V 0- 33afd1_034f000 -> 2e6b1000 --M- 11
 16: V 0C 000eee_f470000 -> 2f470000 --M- 00
 17: V 0C 33afd1_0411000 -> 2fe54000 --M- 10
 18: V 0C 000eee_f4b2000 -> 2f4b2000 --M- 00
 19: V 1C 33eb14_8073000 -> 00462000 --M- 10
 20: V 0C 000ccc_02f4000 -> 002f4000 --M- 00
 21: V 0C 000eee_f415000 -> 2f415000 --M- 00
 22: V 1C 000ccc_03f6000 -> 003f6000 --M- 00
 23: V 0C 000ccc_02f7000 -> 002f7000 --M- 00
 24: V 1C 000ccc_03f8000 -> 003f8000 --M- 00
 25: V 0C 000ccc_03d9000 -> 003d9000 --M- 00
 26: V 1C 33b304_a31a000 -> 007f4000 --M- 10
 27: V 1C 000ccc_03fb000 -> 003fb000 --M- 00
 28: V 1C 000ccc_03fc000 -> 003fc000 --M- 00
 29: V 0C 000eee_f41d000 -> 2f41d000 --M- 00
 30: V 1C 000eee_e87e000 -> 2e87e000 --M- 00
 31: V 1C 33afd1_045f000 -> 2fe52000 --M- 10
 32: V 0C 000ccc_0000000 -> 00000000 --M- 00
 33: V 0C 000eee_e9a1000 -> 2e9a1000 --M- 00
 34: V 1C 33b304_8022000 -> 00f44000 --M- 10
 35: V 0C 000ccc_0503000 -> 00503000 --M- 00
 36: V 0C 33afd1_0744000 -> 2fe17000 --M- 10
 37: V 0C 000eee_c125000 -> 2c125000 --M- 00
 38: V 0C 33e7e1_0406000 -> 0078e000 --M- 11
 39: V 0C 000eee_e987000 -> 2e987000 --M- 00
 40: V 0C 000ccc_0008000 -> 00008000 --M- 00
 41: V 0C 000ccc_03c9000 -> 003c9000 --M- 00
 42: V 1C 33ba7b_f8ea000 -> 005f9000 --M- 10
 43: V 1C 33afd1_040b000 -> 2ffe0000 --M- 11
 44: V 0C 000ccc_03cc000 -> 003cc000 --M- 00
 45: V 0C 000eee_b68d000 -> 2b68d000 --M- 00
 46: V 1C 000eee_f40e000 -> 2f40e000 --M- 00
 47: V 0C 000eee_fa8f000 -> 2fa8f000 --M- 00
 48: V 0C 33afd1_0410000 -> 2fe4a000 --M- 10
 49: V 0C 000eee_f471000 -> 2f471000 --M- 00
 50: V 0C 000ccc_03f2000 -> 003f2000 --M- 00
 51: V 1C 000eee_f473000 -> 2f473000 --M- 00
 52: V 0C 000ccc_03f4000 -> 003f4000 --M- 00
 53: V 0C 000ccc_03f5000 -> 003f5000 --M- 00
 54: V 1C 000eee_f456000 -> 2f456000 --M- 00
 55: V 0C 000eee_d2f7000 -> 2d2f7000 --M- 00
 56: V 1C 000ccc_03d8000 -> 003d8000 --M- 00
 57: V 0C 000eee_e879000 -> 2e879000 --M- 00
 58: V 1C 000eee_f41a000 -> 2f41a000 --M- 00
 59: V 1C 000ccc_03db000 -> 003db000 --M- 00
 60: V 1C 000eee_f43c000 -> 2f43c000 --M- 00
 61: V 0C 000ccc_04fd000 -> 004fd000 --M- 00
 62: V 1C 000eee_f43e000 -> 2f43e000 --M- 00
 63: V 1C 000eee_e93f000 -> 2e93f000 --M- 00
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev