On 04/29/2016 08:18 PM, Robin Murphy wrote:
> This reverts commit 1733a2ad36741b1812cf8b3f3037c28d0af53f50.
>
> There is apparently something amiss with the way the TTM code handles
> DMA buffers, which the above commit was attempting to work around for
> arm64 systems with non-coherent PCI. Unfortunately, this completely
> breaks systems *with* coherent PCI (which appear to be the majority).
>
> Booting a plain arm64 defconfig + CONFIG_DRM + CONFIG_DRM_NOUVEAU on
> a machine with a PCI GPU having coherent dma_map_ops (in this case a
> 7600GT card plugged into an ARM Juno board) results in a fatal crash:
>
> [ 2.803438] nouveau 0000:06:00.0: DRM: allocated 1024x768 fb: 0x9000, bo
> ffffffc976141c00
> [ 2.897662] Unable to handle kernel NULL pointer dereference at virtual
> address 000001ac
> [ 2.897666] pgd = ffffff8008e00000
> [ 2.897675] [000001ac] *pgd=00000009ffffe003, *pud=00000009ffffe003,
> *pmd=0000000000000000
> [ 2.897680] Internal error: Oops: 96000045 [#1] PREEMPT SMP
> [ 2.897685] Modules linked in:
> [ 2.897692] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc5+ #543
> [ 2.897694] Hardware name: ARM Juno development board (r1) (DT)
> [ 2.897699] task: ffffffc9768a0000 ti: ffffffc9768a8000 task.ti:
> ffffffc9768a8000
> [ 2.897711] PC is at __memcpy+0x7c/0x180
> [ 2.897719] LR is at OUT_RINGp+0x34/0x70
> [ 2.897724] pc : [<ffffff80083465fc>] lr : [<ffffff800854248c>] pstate:
> 80000045
> [ 2.897726] sp : ffffffc9768ab360
> [ 2.897732] x29: ffffffc9768ab360 x28: 0000000000000001
> [ 2.897738] x27: ffffffc97624c000 x26: 0000000000000000
> [ 2.897744] x25: 0000000000000080 x24: 0000000000006c00
> [ 2.897749] x23: 0000000000000005 x22: ffffffc97624c010
> [ 2.897755] x21: 0000000000000004 x20: 0000000000000004
> [ 2.897761] x19: ffffffc9763da000 x18: ffffffc976b2491c
> [ 2.897766] x17: 0000000000000007 x16: 0000000000000006
> [ 2.897771] x15: 0000000000000001 x14: 0000000000000001
> [ 2.897777] x13: 0000000000e31b70 x12: ffffffc9768a0080
> [ 2.897783] x11: 0000000000000000 x10: fffffffffffffb00
> [ 2.897788] x9 : 0000000000000000 x8 : 0000000000000000
> [ 2.897793] x7 : 0000000000000000 x6 : 00000000000001ac
> [ 2.897799] x5 : 00000000ffffffff x4 : 0000000000000000
> [ 2.897804] x3 : 0000000000000010 x2 : 0000000000000010
> [ 2.897810] x1 : ffffffc97624c010 x0 : 00000000000001ac
> ...
> [ 2.898494] Call trace:
> [ 2.898499] Exception stack(0xffffffc9768ab1a0 to 0xffffffc9768ab2c0)
> [ 2.898506] b1a0: ffffffc9763da000 0000000000000004 ffffffc9768ab360
> ffffff80083465fc
> [ 2.898513] b1c0: ffffffc976801e00 ffffffc9762b8000 ffffffc9768ab1f0
> ffffff80080ec158
> [ 2.898520] b1e0: ffffffc9768ab230 ffffff8008496d04 ffffffc975ce6d80
> ffffffc9768ab36e
> [ 2.898527] b200: ffffffc9768ab36f ffffffc9768ab29d ffffffc9768ab29e
> ffffffc9768a0000
> [ 2.898533] b220: ffffffc9768ab250 ffffff80080e70c0 ffffffc9768ab270
> ffffff8008496e44
> [ 2.898540] b240: 00000000000001ac ffffffc97624c010 0000000000000010
> 0000000000000010
> [ 2.898546] b260: 0000000000000000 00000000ffffffff 00000000000001ac
> 0000000000000000
> [ 2.898552] b280: 0000000000000000 0000000000000000 fffffffffffffb00
> 0000000000000000
> [ 2.898558] b2a0: ffffffc9768a0080 0000000000e31b70 0000000000000001
> 0000000000000001
> [ 2.898566] [<ffffff80083465fc>] __memcpy+0x7c/0x180
> [ 2.898574] [<ffffff800853e164>] nv04_fbcon_imageblit+0x1d4/0x2e8
> [ 2.898582] [<ffffff800853d6d0>] nouveau_fbcon_imageblit+0xd8/0xe0
> [ 2.898591] [<ffffff80083c4db4>] soft_cursor+0x154/0x1d8
> [ 2.898598] [<ffffff80083c47b4>] bit_cursor+0x4fc/0x538
> [ 2.898605] [<ffffff80083c0cfc>] fbcon_cursor+0x134/0x1a8
> [ 2.898613] [<ffffff800841c280>] hide_cursor+0x38/0xa0
> [ 2.898620] [<ffffff800841d420>] redraw_screen+0x120/0x228
> [ 2.898628] [<ffffff80083bf268>] fbcon_prepare_logo+0x370/0x3f8
> [ 2.898635] [<ffffff80083bf640>] fbcon_init+0x350/0x560
> [ 2.898641] [<ffffff800841c634>] visual_init+0xac/0x108
> [ 2.898648] [<ffffff800841df14>] do_bind_con_driver+0x1c4/0x3a8
> [ 2.898655] [<ffffff800841e4f4>] do_take_over_console+0x174/0x1e8
> [ 2.898662] [<ffffff80083bf8c4>] do_fbcon_takeover+0x74/0x100
> [ 2.898669] [<ffffff80083c3e44>] fbcon_event_notify+0x8cc/0x920
> [ 2.898680] [<ffffff80080d7e38>] notifier_call_chain+0x50/0x90
> [ 2.898685] [<ffffff80080d8214>] __blocking_notifier_call_chain+0x4c/0x90
> [ 2.898691] [<ffffff80080d826c>] blocking_notifier_call_chain+0x14/0x20
> [ 2.898696] [<ffffff80083c5e1c>] fb_notifier_call_chain+0x1c/0x28
> [ 2.898703] [<ffffff80083c81ac>] register_framebuffer+0x1cc/0x2e0
> [ 2.898712] [<ffffff800845da80>] drm_fb_helper_initial_config+0x288/0x3e8
> [ 2.898719] [<ffffff800853da20>] nouveau_fbcon_init+0xe0/0x118
> [ 2.898727] [<ffffff800852d2f8>] nouveau_drm_load+0x268/0x890
> [ 2.898734] [<ffffff8008466e24>] drm_dev_register+0xbc/0xc8
> [ 2.898740] [<ffffff8008468a88>] drm_get_pci_dev+0xa0/0x180
> [ 2.898747] [<ffffff800852cb28>] nouveau_drm_probe+0x1a0/0x1e0
> [ 2.898755] [<ffffff80083a32e0>] pci_device_probe+0x98/0x110
> [ 2.898763] [<ffffff800858e434>] driver_probe_device+0x204/0x2b0
> [ 2.898770] [<ffffff800858e58c>] __driver_attach+0xac/0xb0
> [ 2.898777] [<ffffff800858c3e0>] bus_for_each_dev+0x60/0xa0
> [ 2.898783] [<ffffff800858dbc0>] driver_attach+0x20/0x28
> [ 2.898789] [<ffffff800858d7b0>] bus_add_driver+0x1d0/0x238
> [ 2.898796] [<ffffff800858ed50>] driver_register+0x60/0xf8
> [ 2.898802] [<ffffff80083a20dc>] __pci_register_driver+0x3c/0x48
> [ 2.898809] [<ffffff8008468eb4>] drm_pci_init+0xf4/0x120
> [ 2.898818] [<ffffff8008c56fc0>] nouveau_drm_init+0x21c/0x230
> [ 2.898825] [<ffffff80080829d4>] do_one_initcall+0x8c/0x190
> [ 2.898832] [<ffffff8008c31af4>] kernel_init_freeable+0x14c/0x1f0
> [ 2.898839] [<ffffff80088a0c20>] kernel_init+0x10/0x100
> [ 2.898845] [<ffffff8008085e10>] ret_from_fork+0x10/0x40
> [ 2.898853] Code: a88120c7 a8c12027 a88120c7 a8c12027 (a88120c7)
> [ 2.898871] ---[ end trace d5713dcad023ee04 ]---
> [ 2.898888] Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x0000000b
>
> In a toss-up between the GPU seeing stale data artefacts on some systems
> vs. catastrophic kernel crashes on other systems, the latter would seem
> to take precedence, so revert this change until the real underlying
> problem can be fixed.
>
> Signed-off-by: Robin Murphy <robin.murphy at arm.com>
> ---
>
> Alex, Ben, Dave,
>
> I know Alex was looking into this, but since we're nearly at -rc6 already
> it looks like the only thing to do for 4.6 is pick the lesser of two evils...

Hi Robin,

Sorry for the delayed reply - I was offline last week. You are right, so
let's pick this patch for now.

Reviewed-by: Alexandre Courbot <acourbot at nvidia.com>