On Tue, Jan 29, 2013 at 4:45 PM, Shuah Khan <shuahk...@gmail.com> wrote: > On Tue, Jan 29, 2013 at 3:02 PM, Deucher, Alexander > <alexander.deuc...@amd.com> wrote: >>> -----Original Message----- >>> From: Shuah Khan [mailto:shuahk...@gmail.com] >>> Sent: Tuesday, January 29, 2013 4:40 PM >>> To: Deucher, Alexander >>> Cc: Linus Torvalds; Linux Kernel Mailing List >>> Subject: Re: Linux 3.8-rc4 >>> >>> On Tue, Jan 29, 2013 at 1:13 PM, Deucher, Alexander >>> <alexander.deuc...@amd.com> wrote: >>> >> -----Original Message----- >>> >> From: Shuah Khan [mailto:shuahk...@gmail.com] >>> >> Sent: Tuesday, January 29, 2013 2:11 PM >>> >> To: Deucher, Alexander >>> >> Cc: Linus Torvalds; Linux Kernel Mailing List >>> >> Subject: Re: Linux 3.8-rc4 >>> >> >>> >> On Tue, Jan 29, 2013 at 6:05 AM, Deucher, Alexander >>> >> <alexander.deuc...@amd.com> wrote: >>> >> >> -----Original Message----- >>> >> >> I was out sick for a few days and finally picked this bisect backup >>> >> >> again. I started at 3.7 tag instead of 3.8-rc1 that I did in the past >>> >> >> and also did bisect at drivers/gpu/drm/radeon instead. Here are the >>> >> >> results: >>> >> >> >>> >> >> 6253e4c75d96006c06b9ac8f417eba873de2497b is the first bad commit >>> >> >> commit 6253e4c75d96006c06b9ac8f417eba873de2497b >>> >> >> Author: Alex Deucher <alexander.deuc...@amd.com> >>> >> >> Date: Wed Dec 12 14:30:32 2012 -0500 >>> >> >> >>> >> >> drm/radeon: improve mc_stop/mc_resume on r5xx-r7xx >>> >> >> >>> >> >> Along the same lines of what was done for evergreen+ >>> >> >> in the last kernel. >>> >> >> >>> >> >> Signed-off-by: Alex Deucher <alexander.deuc...@amd.com> >>> >> >> >>> >> >> git bisect log attached. >>> >> >> >>> >> > >>> >> > Try the attached patch. I think it should fix the issue. I just >>> >> > applied a >>> similar >>> >> patch for newer asics. >>> >> > >>> >> > Alex >>> >> > >>> >> >>> >> I reverted 6253e4c75d96006c06b9ac8f417eba873de2497b and DMAR >>> faults >>> >> went away. Undid the revert and applied your new patch. DMAR faults >>> >> are back again. >>> >> >>> >> >>> >> [ 25.158653] [drm] PCIE GART of 512M enabled (table at >>> >> 0x0000000000040000). >>> >> [ 25.158715] radeon 0000:01:00.0: WB enabled >>> >> [ 25.158719] radeon 0000:01:00.0: fence driver on ring 0 use gpu >>> >> addr 0x0000000008000c00 and cpu addr 0xffff88002f143c00 >>> >> [ 25.158721] radeon 0000:01:00.0: fence driver on ring 3 use gpu >>> >> addr 0x0000000008000c0c and cpu addr 0xffff88002f143c0c >>> >> >>> >> A few observations and questions about r600_startup() code sequence: >>> >> >>> >> I notice DMAR faults right after >>> >> >>> >> [drm] Loading RV620 Microcode message which is from >>> >> r600_init_microcode(). This routine does a series of >>> >> request_firmware() calls. btw. don't see release_firmware() calls in >>> >> regular code path, only from error legs in r600_init_microcode(). >>> >> >>> >> However, this routine doesn't do any loading yet. When this routine >>> >> returns, I am assuming request_firmware() step isn't complete yet >>> >> based on my reading request_firmware() interface. At this point >>> >> r600_startup() keeps chugging along, and does r600_mc_program() which >>> >> in turn calls rv515_mc_stop() which was changed with the >>> >> 6253e4c75d96006c06b9ac8f417eba873de2497b commit. >>> >> >>> >> I am thinking the changes somehow eliminated a wait or delay that used >>> >> be there for request_firmware() step to complete (?) >>> >> >>> >> I can see from dmesg that the faults occur right after: >>> >> >>> >> r600_init_microcode(rdev); >>> >> >>> >> and stop before r600_pcie_gart_enable() >>> > >>> > r600_init_microcode() doesn't actually touch the hardware it just calls >>> request_firmware() to fetch the microcode images from disk. The microcode >>> doesn't get loaded onto the hardware until r600_cp_load_microcode() much >>> later in the function. I don't think the microcode has anything to do with >>> this. >>> > >>> > rv515_mc_stop() stops GPU memory clients (e.g., the displays) and blacks >>> out the GPU memory controller so that we can change the location of VRAM >>> within the GPU's address space. If one of the display controllers memory >>> request stop requests takes too long to go through for some reason, it's >>> possible that the display hardware may attempt to read from a GPU memory >>> location no-longer backed by vram (since we changed the location of vram in >>> r600_mc_program()) momentarily until the stop request goes through. Does >>> the attached updated version of the patch help? Alternatively, you can try >>> adding delays to the end of rv515_mc_stop() and see if that helps. >>> > >>> > Alex >>> > >>> >>> This v2 patch didn't help. I added mdelay(15); at the end of >>> rv515_mc_stop() on top of this v2 patch and that fixed the problem. >>> mdelay(15) is a bit much I am sure. Shouldn't rv515_mc_wait_for_idle() >>> take care of the delay? It waits for idle usec_timeout? >> >> >> It only waits that long if the MC never goes idle. If the MC happens to be >> idle at the time, it will return immediately. Does the attached patch fix >> the issue? It waits for the update pending bit to clear in addition to >> waiting for the next frame. >> >> Alex >> > > No. This patch didn't fix the problem. > > -- Shuah
ok. I did more debugging in rv515_mc_stop() and here is what's happening. It has two display controllers and one of them is enabled and the other is in disabled state when AVIVO_D1CRTC_CONTROL is checked. The current code doesn't blank the disabled crtc. However, it needs to be blanked to avoid DMAR faults it appears. I think that is what the original code prior to 6253e4c75d96006c06b9ac8f417eba873de2497b commit was doing: - WREG32(R_0060E8_D1CRTC_UPDATE_LOCK, 1); - WREG32(R_0068E8_D2CRTC_UPDATE_LOCK, 1); - WREG32(R_006080_D1CRTC_CONTROL, 0); - WREG32(R_006880_D2CRTC_CONTROL, 0); - WREG32(R_0060E8_D1CRTC_UPDATE_LOCK, 0); - WREG32(R_0068E8_D2CRTC_UPDATE_LOCK, 0); - WREG32(R_000330_D1VGA_CONTROL, 0); - WREG32(R_000338_D2VGA_CONTROL, 0); Anyways, here is the diff for the change (by no means a patch) I made that fixed the problem: diff --git a/drivers/gpu/drm/radeon/rv515.c b/drivers/gpu/drm/radeon/rv515.c index 2bb6d0e..29ac184 100644 --- a/drivers/gpu/drm/radeon/rv515.c +++ b/drivers/gpu/drm/radeon/rv515.c @@ -298,6 +298,10 @@ void rv515_mc_stop(struct radeon_device *rdev, struct rv515 /* blank the display controllers */ for (i = 0; i < rdev->num_crtc; i++) { crtc_enabled = RREG32(AVIVO_D1CRTC_CONTROL + crtc_offsets[i]) & + dev_info(rdev->dev, "num_crtc = %d crtc_enabled %d for %d.\n", + rdev->num_crtc, crtc_enabled, i); + crtc_enabled = 1; + if (crtc_enabled) { save->crtc_enabled[i] = true; tmp = RREG32(AVIVO_D1CRTC_CONTROL + crtc_offsets[i]); -- Shuah -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/