On Wed, Jul 23, 2014 at 9:21 AM, Michel Dänzer <michel at daenzer.net> wrote:
> On 23.07.2014 15:42, Christian König wrote:
>> On 23.07.2014 05:54, Michel Dänzer wrote:
>>> On 21.07.2014 17:07, Christian König wrote:
>>>> On 19.07.2014 03:15, Michel Dänzer wrote:
>>>>> On 19.07.2014 00:47, Christian König wrote:
>>>>>> On 18.07.2014 05:07, Michel Dänzer wrote:
>>>>>>>>> [PATCH 5/5] drm/radeon: Use VRAM for indirect buffers on >= SI
>>>>>>>> I'm still not very keen on this change since I still don't
>>>>>>>> understand the reason why it's faster than with GTT. Definitely
>>>>>>>> needs more testing on a wider range of systems.
>>>>>>> Sure. If anyone wants to give this patch a spin and see if they can
>>>>>>> measure any performance difference, good or bad, that would be
>>>>>>> interesting.
>>>>>>>
>>>>>>>> Maybe limit it to APUs for now?
>>>>>>> But IIRC, CPU writes to VRAM vs. write-combined GTT are actually an
>>>>>>> even bigger win with dedicated GPUs than with the Kaveri built-in
>>>>>>> GPU on my system. I suspect it may depend on the bandwidth
>>>>>>> available for PCIe vs. system memory though.
>>>>>> I've made a few tests today with the kernel part of the patches
>>>>>> running Xonotic on Ultra in 1920 x 1080.
>>>>>>
>>>>>> Without any patches I get around ~47.0fps on average with my dedicated
>>>>>> HD7870.
>>>>>>
>>>>>> Adding only "drm/radeon: Use write-combined CPU mappings of rings and
>>>>>> IBs on >= SI" and that goes down to ~45.3fps.
>>>>>>
>>>>>> Adding on top of that "drm/radeon: Use VRAM for indirect buffers on >=
>>>>>> SI" and the frame rate goes down to ~27.74fps.
>>>>> Hmm, looks like I'll need to do more benchmarking of 3D workloads as
>>>>> well.
>>> I haven't been able to consistently[0] measure any significant
>>> difference between all placements of the rings and IBs with Xonotic or
>>> Reaction Quake with my Bonaire. I'd expect Xonotic to be shader / GPU
>>> memory bandwidth bound rather than CS bound anyway, so a ~40% hit from
>>> that kernel patch alone is very surprising. Are you sure it wasn't just
>>> the same kind of variation as described below?
>>
>> Yes, I've measured that multiple times and the results were quite
>> consistent.
>>
>> But I didn't measure it on a Bonaire, where the bottleneck probably
>> isn't the CPU load. I measured it on a fast Pitcairn
>
> Ahem, my Bonaire is cranking out ~90fps of Xonotic Ultra at 1920x1080.
> :) (And AFAIK there are even faster Bonaire variants)
>
>> and there Xonotic was clearly affected by the patches.
>
> Okay, I hadn't realized we're not doing any command stream checking as
> of CIK, that probably explains the difference.

I think CIK is doing CS checking for VCE, but not for graphics. SI is
doing CS checking for everything.

Marek