Yeah, that will work around it for now. But the general problem is that we
have memory corruption here; we just didn't notice it earlier because
clearing a texture or vectors with zeros only results in random
misrendering.

Only when you hit a shader or, in this case, a page table does it really
manifest as a bad crash.

Going to dig deeper into this,
Christian.

On 29.05.2014 18:51, Marek Olšák wrote:
> Can we disable evictions for page tables, e.g. by removing them from the
> LRU list?
>
> Marek
>
> On Thu, May 29, 2014 at 6:30 PM, Christian König
> <deathsimple at vodafone.de> wrote:
>> Hi Marek & Alex,
>>
>> I've found the reason why forcefully evicting page tables sometimes
>> crashes the box.
>>
>> Well, this is a typical hexdump of a page table before it is moved to GART:
>> 000117f000 02914061 00000000
>> 000117f008 02915061 00000000
>> 000117f010 02916061 00000000
>> 000117f018 02917061 00000000
>> 000117f020 02918061 00000000
>>
>> And it looks like this when it comes back:
>> 0006102000 00000000 00000000
>> *
>>
>> Ideas? I don't really have an explanation for this. Moving buffers around
>> otherwise seems to work perfectly fine.
>>
>> Thanks,
>> Christian.
>>
>> On 28.05.2014 12:38, Christian König wrote:
>>
>>> I already tried a similar patch as well, without any more noticeable
>>> crashes. But going to give this another round with your patch and
>>> openarena.
>>>
>>> Thanks,
>>> Christian.
>>>
>>> On 27.05.2014 23:55, Marek Olšák wrote:
>>>> Hi Christian,
>>>>
>>>> I test on Bonaire (ChipID = 0x665c). Unfortunately, the hangs are not
>>>> fixed yet. They are very rare and very random. Therefore, I have come
>>>> up with a patch which evicts page tables between IBs. See the
>>>> attachment. With that patch applied, the system starts fine, compiz
>>>> and glxgears work, but once I start playing openarena, it locks up
>>>> pretty quickly.
>>>>
>>>> The patch shouldn't do anything in theory, because pages are moved
>>>> back to VRAM immediately after that. However, the VRAM address of page
>>>> tables may end up being different from before, which might be the root
>>>> cause.
>>>>
>>>> Marek
>>>>
>>>> On Wed, May 14, 2014 at 2:11 PM, Christian König
>>>> <deathsimple at vodafone.de> wrote:
>>>>> Crap, any chance you can narrow it down a bit more?
>>>>>
>>>>> I've just tried a piglit quick test on my Bonaire and it seems to work
>>>>> perfectly fine.
>>>>>
>>>>> What hw do you test on?
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> On 13.05.2014 23:21, Marek Olšák wrote:
>>>>>
>>>>>> Hi Christian,
>>>>>>
>>>>>> Even though some regressions are fixed by these patches:
>>>>>>
>>>>>> drm/radeon: fix page directory update size estimation
>>>>>> drm/radeon: fix buffer placement under memory pressure v2
>>>>>>
>>>>>> and indeed, the texelFetch tests no longer hang, there is one more
>>>>>> hang which needs to be fixed. :( All I know is that the exact same
>>>>>> commit causes it and it can only be reproduced by running the whole
>>>>>> piglit suite with concurrency enabled.
>>>>>>
>>>>>> My kernel git log:
>>>>>>
>>>>>> * 2ba22c8 - drm/radeon: fix buffer placement under memory pressure v2 (10 hours ago) <Christian König>
>>>>>> * 3af91e5 - drm/radeon: fix page directory update size estimation (21 hours ago) <Christian König>
>>>>>> * 6d2f294 - drm/radeon: use normal BOs for the page tables v4 (2 months ago) <Christian König>
>>>>>> * fa68834 - drm/radeon: further cleanup vm flushing & fencing (2 months ago) <Christian König>
>>>>>>
>>>>>> fa68834 doesn't hang, but 2ba22c8 hangs, which means that 6d2f294 or
>>>>>> one of the two fixes is the first bad commit.
>>>>>>
>>>>>> Marek
>>>>>>
>>>>>> On Fri, May 9, 2014 at 8:03 PM, Marek Olšák <maraeo at gmail.com> wrote:
>>>>>>> Hi Christian,
>>>>>>>
>>>>>>> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>>>>>>>
>>>>>>> commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592
>>>>>>> Author: Christian König <christian.koenig at amd.com>
>>>>>>> Date:   Thu Feb 20 13:42:17 2014 +0100
>>>>>>>
>>>>>>>     drm/radeon: use normal BOs for the page tables v4
>>>>>>>
>>>>>>>     No need to make it more complicated than necessary,
>>>>>>>     just allocate the page tables as normal BO and
>>>>>>>     flush whenever the address change.
>>>>>>>
>>>>>>>     v2: update comments and function name
>>>>>>>     v3: squash bug fixes, page directory and tables patch
>>>>>>>     v4: rebased on Mareks changes
>>>>>>>
>>>>>>>     Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>>>
>>>>>>>
>>>>>>> Reverting the commit gives me a lot of merge conflicts.
>>>>>>>
>>>>>>> The simplest way to reproduce the hangs is to run piglit with these
>>>>>>> parameters:
>>>>>>> -t texelFetch.fs
>>>>>>>
>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>>>>> run in parallel, which creates a lot of memory pressure and probably
>>>>>>> causes buffer evictions.
>>>>>>>
>>>>>>> Any idea what is wrong with it?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Marek
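For readers following the page-table hexdump in Christian's 29.05.2014 mail, here is a minimal C sketch of how those words can be read. It assumes the R600-style PTE flag layout used by the radeon driver of that era (valid bit 0, system bit 1, snooped bit 2, readable bit 5, writeable bit 6, 4 KiB pages); that layout should be double-checked against radeon.h. It also assumes that hexdump prints each 64-bit entry as two little-endian 32-bit words, low word first. The helper names (pte_from_words, pte_addr, pte_is_valid, count_valid) are hypothetical and not driver functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed R600-style PTE flag bits; verify against radeon.h before relying on them. */
#define PTE_VALID     (1ull << 0)
#define PTE_SYSTEM    (1ull << 1)
#define PTE_SNOOPED   (1ull << 2)
#define PTE_READABLE  (1ull << 5)
#define PTE_WRITEABLE (1ull << 6)
/* With 4 KiB GPU pages, the low 12 bits hold flags and the rest is the page address. */
#define PTE_ADDR_MASK (~0xfffull)

/* Hypothetical helper: rebuild one 64-bit entry from the two 32-bit words
 * that hexdump prints for it (low word first on a little-endian dump). */
static uint64_t pte_from_words(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: page address the entry points at. */
static uint64_t pte_addr(uint64_t pte)
{
    return pte & PTE_ADDR_MASK;
}

/* Hypothetical helper: whether the GPU would treat the entry as mapped. */
static bool pte_is_valid(uint64_t pte)
{
    return (pte & PTE_VALID) != 0;
}

/* Hypothetical helper: number of valid entries in a page table. */
static size_t count_valid(const uint64_t *ptes, size_t n)
{
    assert(ptes != NULL || n == 0);
    size_t valid = 0;
    for (size_t i = 0; i < n; i++)
        if (pte_is_valid(ptes[i]))
            valid++;
    return valid;
}
```

Under that assumed layout, a "before" word such as 02914061 decodes to a valid, readable and writeable entry (flags 0x61, system bit clear) for the 4 KiB page at 0x2914000, and each following entry points at the next page. The "after" dump is all zeros (the `*` means hexdump suppressed identical repeated lines), so that table would contain no valid entries at all, which fits the crashes described in the thread.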