On Tue, Jan 29, 2013 at 1:13 PM, Deucher, Alexander <alexander.deuc...@amd.com> wrote:
>> -----Original Message-----
>> From: Shuah Khan [mailto:shuahk...@gmail.com]
>> Sent: Tuesday, January 29, 2013 2:11 PM
>> To: Deucher, Alexander
>> Cc: Linus Torvalds; Linux Kernel Mailing List
>> Subject: Re: Linux 3.8-rc4
>>
>> On Tue, Jan 29, 2013 at 6:05 AM, Deucher, Alexander
>> <alexander.deuc...@amd.com> wrote:
>> >> -----Original Message-----
>> >> I was out sick for a few days and finally picked this bisect back up
>> >> again. I started at 3.7 tag instead of 3.8-rc1 that I did in the past
>> >> and also did bisect at drivers/gpu/drm/radeon instead. Here are the
>> >> results:
>> >>
>> >> 6253e4c75d96006c06b9ac8f417eba873de2497b is the first bad commit
>> >> commit 6253e4c75d96006c06b9ac8f417eba873de2497b
>> >> Author: Alex Deucher <alexander.deuc...@amd.com>
>> >> Date: Wed Dec 12 14:30:32 2012 -0500
>> >>
>> >> drm/radeon: improve mc_stop/mc_resume on r5xx-r7xx
>> >>
>> >> Along the same lines of what was done for evergreen+
>> >> in the last kernel.
>> >>
>> >> Signed-off-by: Alex Deucher <alexander.deuc...@amd.com>
>> >>
>> >> git bisect log attached.
>> >>
>> >
>> > Try the attached patch. I think it should fix the issue. I just applied
>> > a similar patch for newer asics.
>> >
>> > Alex
>> >
>>
>> I reverted 6253e4c75d96006c06b9ac8f417eba873de2497b and DMAR faults
>> went away. Undid the revert and applied your new patch. DMAR faults
>> are back again.
>>
>>
>> [ 25.158653] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
>> [ 25.158715] radeon 0000:01:00.0: WB enabled
>> [ 25.158719] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000008000c00 and cpu addr 0xffff88002f143c00
>> [ 25.158721] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000008000c0c and cpu addr 0xffff88002f143c0c
>>
>> A few observations and questions about r600_startup() code sequence:
>>
>> I notice DMAR faults right after
>>
>> [drm] Loading RV620 Microcode message which is from
>> r600_init_microcode(). This routine does a series of
>> request_firmware() calls. Btw, I don't see release_firmware() calls in
>> the regular code path, only from error legs in r600_init_microcode().
>>
>> However, this routine doesn't do any loading yet. When this routine
>> returns, I am assuming request_firmware() step isn't complete yet
>> based on my reading of the request_firmware() interface. At this point
>> r600_startup() keeps chugging along, and does r600_mc_program() which
>> in turn calls rv515_mc_stop() which was changed with the
>> 6253e4c75d96006c06b9ac8f417eba873de2497b commit.
>>
>> I am thinking the changes somehow eliminated a wait or delay that used
>> to be there for request_firmware() step to complete (?)
>>
>> I can see from dmesg that the faults occur right after:
>>
>> r600_init_microcode(rdev);
>>
>> and stop before r600_pcie_gart_enable()
>
> r600_init_microcode() doesn't actually touch the hardware; it just calls
> request_firmware() to fetch the microcode images from disk. The microcode
> doesn't get loaded onto the hardware until r600_cp_load_microcode() much
> later in the function. I don't think the microcode has anything to do with
> this.
>
> rv515_mc_stop() stops GPU memory clients (e.g., the displays) and blacks out
> the GPU memory controller so that we can change the location of VRAM within
> the GPU's address space.
> If one of the display controllers' memory request stop requests takes
> too long to go through for some reason, it's possible that the display
> hardware may momentarily attempt to read from a GPU memory location no
> longer backed by VRAM (since we changed the location of VRAM in
> r600_mc_program()) until the stop request goes through. Does the
> attached updated version of the patch help? Alternatively, you can try
> adding delays to the end of rv515_mc_stop() and see if that helps.
>
> Alex
>

This v2 patch didn't help. I added mdelay(15); at the end of
rv515_mc_stop() on top of this v2 patch, and that fixed the problem.
I am sure mdelay(15) is more than is needed. Shouldn't
rv515_mc_wait_for_idle() take care of the delay? Doesn't it wait for
idle for up to usec_timeout?

-- Shuah
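
On the rv515_mc_wait_for_idle() question, a minimal userspace sketch of a generic poll-with-timeout loop may help frame it. This is not the radeon code: fake_mc, fake_read_status(), fake_udelay_1us(), fake_wait_for_idle() and FAKE_STATUS_IDLE are all hypothetical names, and the sketch only assumes that the wait helper polls a status bit roughly once per microsecond up to a timeout. In a loop of that shape the timeout is an upper bound, not a guaranteed wait. If the polled bit already reads idle, the loop returns at once and adds no delay, so a request still in flight elsewhere would not be covered by it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical idle bit; this is not a real register layout. */
#define FAKE_STATUS_IDLE (1u << 0)

/* Hypothetical stand-in for a memory controller (assumption, not driver code). */
struct fake_mc {
    unsigned busy_polls_left; /* status reads that will still report busy */
    unsigned elapsed_us;      /* simulated microseconds spent delaying */
};

/* Simulated status read: reports idle once busy_polls_left reaches zero. */
uint32_t fake_read_status(struct fake_mc *mc)
{
    if (mc->busy_polls_left > 0) {
        mc->busy_polls_left--;
        return 0;
    }
    return FAKE_STATUS_IDLE;
}

/* Simulated 1 us delay; a driver would use its own udelay-style helper. */
void fake_udelay_1us(struct fake_mc *mc)
{
    mc->elapsed_us++;
}

/*
 * Poll until the idle bit is set or timeout_us iterations have passed.
 * Returns 0 on idle, -1 on timeout. If the bit already reads idle on
 * the first poll, the function returns without delaying at all.
 */
int fake_wait_for_idle(struct fake_mc *mc, unsigned timeout_us)
{
    unsigned i;

    assert(mc != NULL);
    for (i = 0; i < timeout_us; i++) {
        if (fake_read_status(mc) & FAKE_STATUS_IDLE)
            return 0;
        fake_udelay_1us(mc);
    }
    return -1;
}
```

With a controller that already reports idle, fake_wait_for_idle() returns 0 with elapsed_us still at 0, whatever timeout is passed in. An unconditional delay such as mdelay() always waits its full duration, which is one way the two could behave differently here.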