On 23.07.2014 05:54, Michel Dänzer wrote:
> On 21.07.2014 17:07, Christian König wrote:
>> On 19.07.2014 03:15, Michel Dänzer wrote:
>>> On 19.07.2014 00:47, Christian König wrote:
>>>> On 18.07.2014 05:07, Michel Dänzer wrote:
>>>>>>> [PATCH 5/5] drm/radeon: Use VRAM for indirect buffers on >= SI
>>>>>> I'm still not very keen with this change since I still don't
>>>>>> understand the reason why it's faster than with GTT. Definitely
>>>>>> needs more testing on a wider range of systems.
>>>>> Sure. If anyone wants to give this patch a spin and see if they can
>>>>> measure any performance difference, good or bad, that would be
>>>>> interesting.
>>>>>
>>>>>> Maybe limit it to APUs for now?
>>>>> But IIRC, CPU writes to VRAM vs. write-combined GTT are actually an
>>>>> even bigger win with dedicated GPUs than with the Kaveri built-in
>>>>> GPU on my system. I suspect it may depend on the bandwidth
>>>>> available for PCIe vs. system memory though.
>>>> I've made a few tests today with the kernel part of the patches running
>>>> Xonotic on Ultra in 1920 x 1080.
>>>>
>>>> Without any patches I get around ~47.0fps on average with my dedicated
>>>> HD7870.
>>>>
>>>> Adding only "drm/radeon: Use write-combined CPU mappings of rings and
>>>> IBs on >= SI" and that goes down to ~45.3fps.
>>>>
>>>> Adding on top of that "drm/radeon: Use VRAM for indirect buffers on >=
>>>> SI" and the frame rate goes down to ~27.74fps.
>>> Hmm, looks like I'll need to do more benchmarking of 3D workloads as
>>> well.
> I haven't been able to consistently[0] measure any significant
> difference between all placements of the rings and IBs with Xonotic or
> Reaction Quake with my Bonaire. I'd expect Xonotic to be shader / GPU
> memory bandwidth bound rather than CS bound anyway, so a ~40% hit from
> that kernel patch alone is very surprising. Are you sure it wasn't just
> the same kind of variation as described below?

Yes, I've measured that multiple times and the results were quite
consistent.

But I didn't measure it on a Bonaire, where the bottleneck probably
isn't the CPU load. I measured it on a fast Pitcairn, and there Xonotic
was clearly affected by the patches.

>
> [0] There were slightly different results sometimes, but next time I
> tried the same setup again, it was back to the same as always. So it
> seemed to depend more on the particular system boot / test run / moon
> phase / ... than the kernel patches themselves.
>
>
>>> Alex, given those numbers, it's probably best if you remove the "Use
>>> write-combined CPU mappings of rings and IBs on >= SI" change from your
>>> tree as well for now.
>> I wouldn't go as far as reverting the patch. It just needs a bit more
>> fine tuning and that can happen in the 3.17rc cycle.
> There's no need to revert it, just drop it from the tree. I'd still
> prefer that for now.
>
>
>> My tests clearly show that we still can use USWC for the ring buffer on
>> SI and probably earlier chips as well.
> Yeah, that might be the safest approach for now.

How about using USWC for the rings on all chips since R600 and for the
IB only on CIK? As far as I can see that should do the trick quite well.

Christian.
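
For illustration only, the placement policy proposed above could be
sketched roughly as follows. This is a minimal, self-contained sketch and
not radeon driver code: the enum names and the two helper functions are
hypothetical, the chip generations are collapsed into four buckets, and
falling back to a cached mapping when USWC is not used is an assumption.

```c
/* Hypothetical, simplified stand-ins; NOT the radeon driver's real types. */
enum chip_gen {
    GEN_PRE_R600, /* anything older than R600 */
    GEN_R600,     /* R600 and later, up to but not including SI */
    GEN_SI,       /* e.g. Pitcairn */
    GEN_CIK,      /* e.g. Bonaire, Kaveri */
};

enum cpu_mapping {
    MAP_CACHED, /* assumed fallback when USWC is not used */
    MAP_USWC,   /* write-combined CPU mapping */
};

/* Rings: USWC on all chips since R600. */
static enum cpu_mapping ring_mapping(enum chip_gen gen)
{
    return gen >= GEN_R600 ? MAP_USWC : MAP_CACHED;
}

/* Indirect buffers: USWC only on CIK. */
static enum cpu_mapping ib_mapping(enum chip_gen gen)
{
    return gen == GEN_CIK ? MAP_USWC : MAP_CACHED;
}
```

With this sketch, an SI part such as the Pitcairn used for the numbers
above would get write-combined rings but keep its IBs out of USWC, while
a CIK part such as Bonaire would get USWC for both.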