On Tue, Jan 29, 2013 at 1:13 PM, Deucher, Alexander
<alexander.deuc...@amd.com> wrote:
>> -----Original Message-----
>> From: Shuah Khan [mailto:shuahk...@gmail.com]
>> Sent: Tuesday, January 29, 2013 2:11 PM
>> To: Deucher, Alexander
>> Cc: Linus Torvalds; Linux Kernel Mailing List
>> Subject: Re: Linux 3.8-rc4
>>
>> On Tue, Jan 29, 2013 at 6:05 AM, Deucher, Alexander
>> <alexander.deuc...@amd.com> wrote:
>> >> -----Original Message-----
>> >> I was out sick for a few days and finally picked this bisect back up
>> >> again. I started at the 3.7 tag instead of 3.8-rc1 as I did in the past,
>> >> and also restricted the bisect to drivers/gpu/drm/radeon. Here are the
>> >> results:
>> >>
>> >> 6253e4c75d96006c06b9ac8f417eba873de2497b is the first bad commit
>> >> commit 6253e4c75d96006c06b9ac8f417eba873de2497b
>> >> Author: Alex Deucher <alexander.deuc...@amd.com>
>> >> Date:   Wed Dec 12 14:30:32 2012 -0500
>> >>
>> >>     drm/radeon: improve mc_stop/mc_resume on r5xx-r7xx
>> >>
>> >>     Along the same lines of what was done for evergreen+
>> >>     in the last kernel.
>> >>
>> >>     Signed-off-by: Alex Deucher <alexander.deuc...@amd.com>
>> >>
>> >> git bisect log attached.
>> >>
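Bisecting is a binary search for the first commit at which a test starts failing. Each step halves the remaining range, so roughly log2(N) build-and-boot cycles are enough for N candidate commits. This is a minimal C sketch that assumes a linear history and a monotonic good/bad test. git bisect itself also handles merge history and skipped commits. `bisect_first_bad` and `shows_dmar_faults` are hypothetical names. The latter stands in for building, booting, and checking dmesg.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of bisection on a linear history (git bisect itself walks a
 * DAG and supports "skip"; this ignores both).  Commits are numbered;
 * test(i) returns true if commit i shows the bug.  The caller guarantees
 * that `good` is known good, `bad` is known bad, and that every commit
 * after the first bad one is also bad.
 */
static int bisect_first_bad(int good, int bad, bool (*test)(int))
{
    assert(good < bad);
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        if (test(mid))
            bad = mid;   /* first bad commit is at or before mid */
        else
            good = mid;  /* first bad commit is after mid */
    }
    return bad;          /* good and bad are now adjacent */
}

/* Hypothetical stand-in for "build, boot, look for DMAR faults":
 * in this model the bug first appears at commit 5. */
static bool shows_dmar_faults(int commit)
{
    return commit >= 5;
}
```

In this model, `bisect_first_bad(0, 9, shows_dmar_faults)` returns 5. Limiting the bisect to a path, as was done above with drivers/gpu/drm/radeon, makes git consider only commits that touch that path. This shrinks N, provided the culprit is actually in that path.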
>> >
>> > Try the attached patch.  I think it should fix the issue.  I just applied
>> > a similar patch for newer ASICs.
>> >
>> > Alex
>> >
>>
>> I reverted 6253e4c75d96006c06b9ac8f417eba873de2497b and DMAR faults
>> went away. Undid the revert and applied your new patch. DMAR faults
>> are back again.
>>
>>
>> [   25.158653] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
>> [   25.158715] radeon 0000:01:00.0: WB enabled
>> [   25.158719] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000008000c00 and cpu addr 0xffff88002f143c00
>> [   25.158721] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000008000c0c and cpu addr 0xffff88002f143c0c
>>
>> A few observations and questions about the r600_startup() code sequence:
>>
>> I notice DMAR faults right after the "[drm] Loading RV620 Microcode"
>> message, which comes from r600_init_microcode(). This routine makes a
>> series of request_firmware() calls. BTW, I don't see release_firmware()
>> calls in the regular code path, only in the error paths of
>> r600_init_microcode().
>>
>> However, this routine doesn't do any loading yet. When it returns, I am
>> assuming the request_firmware() step isn't complete yet, based on my
>> reading of the request_firmware() interface. At this point
>> r600_startup() keeps chugging along and calls r600_mc_program(), which
>> in turn calls rv515_mc_stop(), which was changed by the
>> 6253e4c75d96006c06b9ac8f417eba873de2497b commit.
>>
>> I am thinking the changes somehow eliminated a wait or delay that used to
>> be there for the request_firmware() step to complete (?)
>>
>> I can see from dmesg that the faults occur right after:
>>
>> r600_init_microcode(rdev);
>>
>> and stop before r600_pcie_gart_enable().
>
> r600_init_microcode() doesn't actually touch the hardware; it just calls
> request_firmware() to fetch the microcode images from disk.  The microcode
> doesn't get loaded onto the hardware until r600_cp_load_microcode(), much
> later in the function.  I don't think the microcode has anything to do with
> this.
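In the kernel, request_firmware() is synchronous. When it returns 0, the image is already in memory. request_firmware_nowait() is the asynchronous variant. Firmware obtained this way is freed with release_firmware() once the driver no longer needs it, which may legitimately be at teardown rather than right after loading. The sketch below is a hypothetical userspace analogue of that fetch/release split, not kernel code. `fw_image`, `fetch_image`, and `release_image` are invented names. Fetching the bytes is kept separate from the later step that would push them to hardware, which is the role of r600_cp_load_microcode() in the driver.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct firmware. */
struct fw_image {
    size_t size;
    unsigned char *data;
};

/*
 * Synchronously read a whole file into memory.  As with request_firmware(),
 * the data is complete when this returns 0; nothing touches hardware here.
 * Returns 0 on success, -1 on failure (and *out is left NULL).
 */
static int fetch_image(struct fw_image **out, const char *path)
{
    FILE *f;
    struct fw_image *fw;
    long len = -1;

    assert(out != NULL);
    *out = NULL;
    f = fopen(path, "rb");
    if (!f)
        return -1;
    if (fseek(f, 0, SEEK_END) != 0 || (len = ftell(f)) < 0 ||
        fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return -1;
    }
    fw = malloc(sizeof(*fw));
    if (!fw) {
        fclose(f);
        return -1;
    }
    fw->size = (size_t)len;
    fw->data = malloc(fw->size ? fw->size : 1);
    if (!fw->data || fread(fw->data, 1, fw->size, f) != fw->size) {
        free(fw->data);
        free(fw);
        fclose(f);
        return -1;
    }
    fclose(f);
    *out = fw;
    return 0;
}

/* Counterpart of release_firmware(): frees the image; NULL is a no-op. */
static void release_image(struct fw_image *fw)
{
    if (!fw)
        return;
    free(fw->data);
    free(fw);
}
```

Loading to the device would be a separate, later function that only reads `fw->data`. By the time it runs, the fetch has either fully succeeded or already failed.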
>
> rv515_mc_stop() stops GPU memory clients (e.g., the displays) and blacks out
> the GPU memory controller so that we can change the location of VRAM within
> the GPU's address space.  If the stop request to one of the display
> controllers takes too long to go through for some reason, the display
> hardware may momentarily attempt to read from a GPU memory location that is
> no longer backed by VRAM (since we changed the location of VRAM in
> r600_mc_program()) until the stop request goes through.  Does the
> attached updated version of the patch help?  Alternatively, you can try
> adding delays to the end of rv515_mc_stop() and see if that helps.
>
> Alex
>

This v2 patch didn't help. I added mdelay(15); at the end of
rv515_mc_stop() on top of the v2 patch, and that fixed the problem.
mdelay(15) is a bit much, I am sure. Shouldn't rv515_mc_wait_for_idle()
take care of the delay? Doesn't it wait up to usec_timeout for idle?

-- Shuah
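On the question above, a poll loop and a fixed delay differ in what they wait for. The sketch below shows bounded polling in the general shape a wait-for-idle helper commonly takes: check a status condition, sleep about 1 µs, and give up after a timeout. It assumes, without quoting the driver source, that rv515_mc_wait_for_idle() works roughly this way. `poll_until`, `fake_status`, and `fake_ready` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Userspace stand-in for udelay(1): sleep roughly one microsecond. */
static void delay_1us(void)
{
    struct timespec ts = { 0, 1000 };
    nanosleep(&ts, NULL);
}

/*
 * Bounded polling: check `ready` up to `usec_timeout` times, ~1 us apart.
 * Returns how many 1 us waits preceded success (>= 0), or -1 on timeout.
 * It returns as soon as the polled condition holds, but it waits only for
 * that condition, not for anything else the hardware may still be doing.
 */
static int poll_until(bool (*ready)(void *), void *ctx, unsigned usec_timeout)
{
    assert(ready != NULL);
    for (unsigned i = 0; i < usec_timeout; i++) {
        if (ready(ctx))
            return (int)i;
        delay_1us();
    }
    return -1;
}

/* Hypothetical status source: reports ready after `reads_needed` reads. */
struct fake_status {
    unsigned reads, reads_needed;
};

static bool fake_ready(void *ctx)
{
    struct fake_status *s = ctx;
    return ++s->reads > s->reads_needed;
}
```

Polling is far cheaper than an unconditional mdelay(15) when the hardware is quick, but it bounds only the wait for the condition it reads. If the condition that matters here (every display controller having stopped fetching) is not what the polled idle status reports, the poll can succeed while fetches are still in flight. That would be one possible reason a fixed delay helped where the idle wait did not.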