On Tue, Jan 29, 2013 at 4:45 PM, Shuah Khan <shuahk...@gmail.com> wrote:
> On Tue, Jan 29, 2013 at 3:02 PM, Deucher, Alexander
> <alexander.deuc...@amd.com> wrote:
>>> -----Original Message-----
>>> From: Shuah Khan [mailto:shuahk...@gmail.com]
>>> Sent: Tuesday, January 29, 2013 4:40 PM
>>> To: Deucher, Alexander
>>> Cc: Linus Torvalds; Linux Kernel Mailing List
>>> Subject: Re: Linux 3.8-rc4
>>>
>>> On Tue, Jan 29, 2013 at 1:13 PM, Deucher, Alexander
>>> <alexander.deuc...@amd.com> wrote:
>>> >> -----Original Message-----
>>> >> From: Shuah Khan [mailto:shuahk...@gmail.com]
>>> >> Sent: Tuesday, January 29, 2013 2:11 PM
>>> >> To: Deucher, Alexander
>>> >> Cc: Linus Torvalds; Linux Kernel Mailing List
>>> >> Subject: Re: Linux 3.8-rc4
>>> >>
>>> >> On Tue, Jan 29, 2013 at 6:05 AM, Deucher, Alexander
>>> >> <alexander.deuc...@amd.com> wrote:
>>> >> >> -----Original Message-----
>>> >> >> I was out sick for a few days and finally picked this bisect back up
>>> >> >> again. I started at the 3.7 tag instead of 3.8-rc1 as I did in the
>>> >> >> past, and also restricted the bisect to drivers/gpu/drm/radeon. Here
>>> >> >> are the results:
>>> >> >>
>>> >> >> 6253e4c75d96006c06b9ac8f417eba873de2497b is the first bad commit
>>> >> >> commit 6253e4c75d96006c06b9ac8f417eba873de2497b
>>> >> >> Author: Alex Deucher <alexander.deuc...@amd.com>
>>> >> >> Date:   Wed Dec 12 14:30:32 2012 -0500
>>> >> >>
>>> >> >>     drm/radeon: improve mc_stop/mc_resume on r5xx-r7xx
>>> >> >>
>>> >> >>     Along the same lines of what was done for evergreen+
>>> >> >>     in the last kernel.
>>> >> >>
>>> >> >>     Signed-off-by: Alex Deucher <alexander.deuc...@amd.com>
>>> >> >>
>>> >> >> git bisect log attached.
>>> >> >>
>>> >> >
>>> >> > Try the attached patch.  I think it should fix the issue.  I just
>>> >> > applied a similar patch for newer asics.
>>> >> >
>>> >> > Alex
>>> >> >
>>> >>
>>> >> I reverted 6253e4c75d96006c06b9ac8f417eba873de2497b and DMAR faults
>>> >> went away. Undid the revert and applied your new patch. DMAR faults
>>> >> are back again.
>>> >>
>>> >>
>>> >> [   25.158653] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
>>> >> [   25.158715] radeon 0000:01:00.0: WB enabled
>>> >> [   25.158719] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000008000c00 and cpu addr 0xffff88002f143c00
>>> >> [   25.158721] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000008000c0c and cpu addr 0xffff88002f143c0c
>>> >>
>>> >> A few observations and questions about r600_startup() code sequence:
>>> >>
>>> >> I notice DMAR faults right after the
>>> >>
>>> >> [drm] Loading RV620 Microcode
>>> >>
>>> >> message, which comes from r600_init_microcode(). This routine does a
>>> >> series of request_firmware() calls. BTW, I don't see release_firmware()
>>> >> calls in the regular code path, only in the error legs of
>>> >> r600_init_microcode().
>>> >>
>>> >> However, this routine doesn't do any loading yet. When this routine
>>> >> returns, I am assuming the request_firmware() step isn't complete yet,
>>> >> based on my reading of the request_firmware() interface. At this point
>>> >> r600_startup() keeps chugging along, and does r600_mc_program() which
>>> >> in turn calls rv515_mc_stop() which was changed with the
>>> >> 6253e4c75d96006c06b9ac8f417eba873de2497b commit.
>>> >>
>>> >> I am thinking the changes somehow eliminated a wait or delay that used
>>> >> to be there for the request_firmware() step to complete (?)
>>> >>
>>> >> I can see from dmesg that the faults occur right after:
>>> >>
>>> >> r600_init_microcode(rdev);
>>> >>
>>> >> and stop before r600_pcie_gart_enable()
>>> >
>>> > r600_init_microcode() doesn't actually touch the hardware; it just calls
>>> > request_firmware() to fetch the microcode images from disk.  The microcode
>>> > doesn't get loaded onto the hardware until r600_cp_load_microcode() much
>>> > later in the function.  I don't think the microcode has anything to do
>>> > with this.
>>> >
>>> > rv515_mc_stop() stops GPU memory clients (e.g., the displays) and blacks
>>> > out the GPU memory controller so that we can change the location of VRAM
>>> > within the GPU's address space.  If one of the display controllers'
>>> > memory request stop requests takes too long to go through for some
>>> > reason, it's possible that the display hardware may attempt to read from
>>> > a GPU memory location no longer backed by vram (since we changed the
>>> > location of vram in r600_mc_program()) momentarily until the stop request
>>> > goes through.  Does the attached updated version of the patch help?
>>> > Alternatively, you can try adding delays to the end of rv515_mc_stop()
>>> > and see if that helps.
>>> >
>>> > Alex
>>> >
>>>
>>> This v2 patch didn't help. I added mdelay(15); at the end of
>>> rv515_mc_stop() on top of this v2 patch and that fixed the problem.
>>> mdelay(15) is a bit much, I am sure. Shouldn't rv515_mc_wait_for_idle()
>>> take care of the delay? Doesn't it wait for idle for up to usec_timeout?
>>
>>
>> It only waits that long if the MC never goes idle.  If the MC happens to be
>> idle at the time, it will return immediately.  Does the attached patch fix
>> the issue?  It waits for the update pending bit to clear in addition to
>> waiting for the next frame.
>>
>> Alex
>>
>
> No. This patch didn't fix the problem.
>
> -- Shuah
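
For reference, the early-return behaviour Alex describes above for
rv515_mc_wait_for_idle() can be sketched in user-space C. This is only a
rough model under stated assumptions, not the driver code: the status
variable, the MOCK_MC_IDLE bit and the mock_* function names are all made
up for illustration, and the real driver delays between polls.

```c
#include <stdint.h>

/* Hypothetical idle bit for this mock; not the real MC status layout. */
#define MOCK_MC_IDLE (1u << 4)

static uint32_t mock_mc_status; /* stands in for the hardware status register */
static unsigned polls;          /* how many times the status was read */

/* Mock register read that also counts the number of polls. */
static uint32_t mock_read_status(void)
{
	polls++;
	return mock_mc_status;
}

/*
 * Poll until the idle bit is set or the poll budget runs out.
 * Returns 0 as soon as the MC reports idle, -1 on timeout.
 */
static int mock_wait_for_idle(unsigned timeout_polls)
{
	unsigned i;

	for (i = 0; i < timeout_polls; i++) {
		if (mock_read_status() & MOCK_MC_IDLE)
			return 0; /* already idle: no waiting at all */
		/* a real driver would delay briefly here before polling again */
	}
	return -1;
}
```

If the idle bit is already set, the loop exits on the first read, so the
timeout only bounds the worst case and does not guarantee any minimum
delay.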

OK. I did more debugging in rv515_mc_stop() and here is what's
happening. The device has two display controllers; when
AVIVO_D1CRTC_CONTROL is checked, one of them is enabled and the other is
disabled. The current code doesn't blank the disabled crtc. However, it
appears the disabled crtc also needs to be blanked to avoid the DMAR
faults. I think that is what the original code, prior to commit
6253e4c75d96006c06b9ac8f417eba873de2497b, was doing:

-       WREG32(R_0060E8_D1CRTC_UPDATE_LOCK, 1);
-       WREG32(R_0068E8_D2CRTC_UPDATE_LOCK, 1);
-       WREG32(R_006080_D1CRTC_CONTROL, 0);
-       WREG32(R_006880_D2CRTC_CONTROL, 0);
-       WREG32(R_0060E8_D1CRTC_UPDATE_LOCK, 0);
-       WREG32(R_0068E8_D2CRTC_UPDATE_LOCK, 0);
-       WREG32(R_000330_D1VGA_CONTROL, 0);
-       WREG32(R_000338_D2VGA_CONTROL, 0);

Anyway, here is the diff of the debug change I made (by no means a proper
patch) that fixed the problem:

diff --git a/drivers/gpu/drm/radeon/rv515.c b/drivers/gpu/drm/radeon/rv515.c
index 2bb6d0e..29ac184 100644
--- a/drivers/gpu/drm/radeon/rv515.c
+++ b/drivers/gpu/drm/radeon/rv515.c
@@ -298,6 +298,10 @@ void rv515_mc_stop(struct radeon_device *rdev, struct rv515
        /* blank the display controllers */
        for (i = 0; i < rdev->num_crtc; i++) {
                crtc_enabled = RREG32(AVIVO_D1CRTC_CONTROL + crtc_offsets[i]) &
+               dev_info(rdev->dev, "num_crtc = %d crtc_enabled %d for %d.\n",
+                        rdev->num_crtc, crtc_enabled, i);
+               crtc_enabled = 1;
+
                if (crtc_enabled) {
                        save->crtc_enabled[i] = true;
                        tmp = RREG32(AVIVO_D1CRTC_CONTROL + crtc_offsets[i]);
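
To make the difference concrete, here is a small user-space sketch of the
two behaviours. It is a simplified model, not the driver code: the mock
register array, the MOCK_CRTC_EN bit value and the function names are
assumptions made for illustration. CRTC 0 is enabled and CRTC 1 is
disabled, as on the machine described above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MOCK_NUM_CRTC 2
#define MOCK_CRTC_EN  (1u << 0) /* hypothetical enable bit for this mock */

/* Mock CRTC control registers: CRTC 0 enabled, CRTC 1 disabled. */
static uint32_t mock_crtc_control[MOCK_NUM_CRTC] = { MOCK_CRTC_EN, 0 };
static bool mock_blanked[MOCK_NUM_CRTC];

/*
 * Walk the CRTCs and "blank" them. With force_all == false only CRTCs
 * whose enable bit is set are blanked (the current behaviour); with
 * force_all == true every CRTC is blanked (what the debug change forces
 * by setting crtc_enabled = 1). Returns the number of CRTCs blanked.
 */
static int mock_blank_crtcs(bool force_all)
{
	int i, count = 0;

	for (i = 0; i < MOCK_NUM_CRTC; i++) {
		bool enabled = (mock_crtc_control[i] & MOCK_CRTC_EN) != 0;

		mock_blanked[i] = false;
		if (enabled || force_all) {
			mock_blanked[i] = true;
			count++;
		}
	}
	return count;
}
```

Calling mock_blank_crtcs(false) leaves CRTC 1 untouched, while
mock_blank_crtcs(true) blanks both, which matches the case where the
faults go away.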

-- Shuah
