On 13.05.2014 22:27, Marek Olšák wrote:
> I applied these two patches Christian sent to dri-devel:
>
> drm/radeon: fix page directory update size estimation
> drm/radeon: fix buffer placement under memory pressure v2
>
> on top of torvalds's master branch.
>
With the latest kernel master (a991639c) I still see a regression compared to 3.13 or 3.14, which perform similarly to each other. Xonotic is about 7% slower. OpenArena and Unigine Tropics are also noticeably slower, but I didn't record accurate numbers.

Maybe the improved memory management has some overhead, but this is not acceptable IMHO. I'll try to investigate further.

Best regards
Grigori

> Marek
>
> On Tue, May 13, 2014 at 10:19 PM, Grigori Goronzy <greg at chown.ath.cx>
> wrote:
>> On 13.05.2014 21:50, Marek Olšák wrote:
>>>
>>> Hi Christian,
>>>
>>> The performance regression I saw with piglit seems to be fixed with
>>> latest kernel git. It's difficult to bisect the kernel, because there
>>> are only merges between 3.14 and 3.15 and the merged commits are
>>> actually based on 3.14-rc1 and 3.14-rc4.
>>>
>>> All seems to be fine with your fixes.
>>>
>>
>> Which fixes have you applied? There are quite a few pending patches on
>> dri-devel that aren't yet part of drm-fixes-3.15.
>>
>> Grigori
>>
>>
>>> Marek
>>>
>>> On Tue, May 13, 2014 at 5:31 PM, Christian König
>>> <deathsimple at vodafone.de> wrote:
>>>>
>>>> Is the performance regression caused by the page table changes or
>>>> something else?
>>>>
>>>> I did run some tests with xonotic while developing it and they didn't
>>>> show anything obvious, but I didn't test on different systems.
>>>>
>>>> Christian.
>>>>
>>>> On 13.05.2014 17:19, Marek Olšák wrote:
>>>>
>>>>> Your latest patches fix the regression.
>>>>>
>>>>> The performance regression can also be reproduced with piglit "-t
>>>>> texelFetch.fs".
>>>>>
>>>>> Kernel 3.14:
>>>>> real 0m17.724s
>>>>> user 0m41.905s
>>>>> sys 0m11.299s
>>>>>
>>>>> The problematic commit checked out + your fixes (without the PTE patch I
>>>>> think):
>>>>> real 0m23.474s
>>>>> user 1m1.008s
>>>>> sys 0m13.812s
>>>>>
>>>>> Marek
>>>>>
>>>>>
>>>>> On Tue, May 13, 2014 at 3:57 PM, Christian König
>>>>> <deathsimple at vodafone.de> wrote:
>>>>>>
>>>>>>
>>>>>> On 13.05.2014 15:22, Alex Deucher wrote:
>>>>>>
>>>>>>> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy <greg at chown.ath.cx>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> I can confirm this fixes it for me, too.
>>>>>>>>
>>>>>>>> 3.15 with these fixes and the large PTE patches actually ends up
>>>>>>>> being noticeably slower than earlier kernels with Xonotic, though.
>>>>>>>> I wonder what's going on.
>>>>>>>
>>>>>>>
>>>>>>> Allocation overhead?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Unlikely, Xonotic just allocates a single page table at start, which
>>>>>> then gets extended at a certain rate until they no longer need more
>>>>>> address space and are done with it.
>>>>>>
>>>>>> Grigori, can you bisect and/or try to figure out what's wrong here?
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> Grigori
>>>>>>>>
>>>>>>>>
>>>>>>>> On 12.05.2014 14:50, Christian König wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I could reproduce the problem with xonotic and I think I've found
>>>>>>>>> the issue.
>>>>>>>>>
>>>>>>>>> Please test the attached patch.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> On 11.05.2014 11:06, Christian König wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yeah, thought so. Well it was just a guess.
>>>>>>>>>>
>>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I
>>>>>>>>>>> added for userspace buffers.)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Actually it shouldn't affect that.
>>>>>>>>>> The alternative domain always contains GART even when userspace
>>>>>>>>>> only specified VRAM as placement (as long as it is technically
>>>>>>>>>> possible to do so).
>>>>>>>>>>
>>>>>>>>>> So what should happen is that TTM sees the current placement,
>>>>>>>>>> matches that with the desired placement and should find that it
>>>>>>>>>> doesn't need to move the buffer (we should just test if this
>>>>>>>>>> behavior really works as expected).
>>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>> On 10.05.2014 23:38, Marek Olšák wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Christian,
>>>>>>>>>>>
>>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>>>
>>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I
>>>>>>>>>>> added for userspace buffers.)
>>>>>>>>>>>
>>>>>>>>>>> Marek
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sat, May 10, 2014 at 6:34 PM, Christian König
>>>>>>>>>>> <deathsimple at vodafone.de> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Couldn't reproduce the issue so far. So the attached patch is
>>>>>>>>>>>> just a complete shot in the dark found by rereading the code,
>>>>>>>>>>>> but it might actually be the problem.
>>>>>>>>>>>>
>>>>>>>>>>>> Please give it a try.
>>>>>>>>>>>>
>>>>>>>>>>>> Going to keep testing in the meantime,
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>> On 10.05.2014 10:23, Christian König wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g.
>>>>>>>>>>>>>> if I boot with radeon.vramlimit=256 and then run Xonotic
>>>>>>>>>>>>>> timedemo with high settings.
>>>>>>>>>>>>>> I haven't had a chance to bisect it yet, but it might be a
>>>>>>>>>>>>>> similar problem.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sounds like the same issue to me. Thx for the good test case.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any idea what is wrong with it?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Actually I was already wondering why it had gone so smoothly
>>>>>>>>>>>>> without any regression so far; I hadn't noticed the bug in
>>>>>>>>>>>>> bugzilla.kernel.org yet.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests
>>>>>>>>>>>>>> also run in parallel, which creates a lot of memory pressure
>>>>>>>>>>>>>> and probably causes buffer evictions.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sounds like the underlying problem to me. We probably evict some
>>>>>>>>>>>>> part of a page table without updating the page directory. Going
>>>>>>>>>>>>> to dig into it today, it's probably just a one-liner missing
>>>>>>>>>>>>> somewhere in the VM code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 09.05.2014 23:39, Grigori Goronzy wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 09.05.2014 20:03, Marek Olšák wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This commit which first appeared in 3.15-rc1 causes hangs on
>>>>>>>>>>>>>>> Bonaire:
>>>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The simplest way to reproduce the hangs is to run piglit with
>>>>>>>>>>>>>>> these parameters:
>>>>>>>>>>>>>>> -t texelFetch.fs
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the
>>>>>>>>>>>>>>> tests also run in parallel, which creates a lot of memory
>>>>>>>>>>>>>>> pressure and probably causes buffer evictions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g.
>>>>>>>>>>>>>> if I boot with radeon.vramlimit=256 and then run Xonotic
>>>>>>>>>>>>>> timedemo with high settings.
>>>>>>>>>>>>>> I haven't had a chance to bisect it yet, but it might be a
>>>>>>>>>>>>>> similar problem.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Grigori
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> dri-devel mailing list
>>>>>>>> dri-devel at lists.freedesktop.org
>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>
>>>>>>
>>>>>>
>>>>
>>
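
For reference, the relative slowdown implied by the piglit "-t texelFetch.fs" timings quoted in this thread can be worked out with a short script. This is only a sketch: the helper names parse_time and slowdown_percent are made up for illustration, and the only inputs are the two sets of real/user/sys figures Marek posted (kernel 3.14 versus the problematic commit plus fixes).

```python
import re


def parse_time(value: str) -> float:
    """Convert a `time`-style duration such as '1m1.008s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def slowdown_percent(before: str, after: str) -> float:
    """Return how much longer `after` is than `before`, in percent."""
    return (parse_time(after) / parse_time(before) - 1.0) * 100.0


# Timings copied from the thread (piglit "-t texelFetch.fs").
kernel_3_14 = {"real": "0m17.724s", "user": "0m41.905s", "sys": "0m11.299s"}
problematic = {"real": "0m23.474s", "user": "1m1.008s", "sys": "0m13.812s"}

if __name__ == "__main__":
    for key in kernel_3_14:
        pct = slowdown_percent(kernel_3_14[key], problematic[key])
        print(f"{key}: {pct:+.1f}%")
```

For the wall-clock ("real") figures this comes out to roughly 32% longer on the problematic commit. That applies to this one piglit run only and says nothing about the separate 7% Xonotic figure reported at the top.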