On Fri, Jul 18, 2014 at 7:47 PM, Marek Olšák <maraeo at gmail.com> wrote:
> On Fri, Jul 18, 2014 at 5:47 PM, Christian König
> <deathsimple at vodafone.de> wrote:
>> On 18.07.2014 05:07, Michel Dänzer wrote:
>>>>>
>>>>> [PATCH 5/5] drm/radeon: Use VRAM for indirect buffers on >= SI
>>>>
>>>> I'm still not very keen with this change since I still don't understand
>>>> the reason why it's faster than with GTT. Definitely needs more testing
>>>> on a wider range of systems.
>>>
>>> Sure. If anyone wants to give this patch a spin and see if they can
>>> measure any performance difference, good or bad, that would be
>>> interesting.
>>>
>>>> Maybe limit it to APUs for now?
>>>
>>> But IIRC, CPU writes to VRAM vs. write-combined GTT are actually an even
>>> bigger win with dedicated GPUs than with the Kaveri built-in GPU on my
>>> system. I suspect it may depend on the bandwidth available for PCIe vs.
>>> system memory though.
>>
>>
>> I've made a few tests today with the kernel part of the patches running
>> Xonotic on Ultra in 1920 x 1080.
>>
>> Without any patches I get around ~47.0fps on average with my dedicated
>> HD7870.
>>
>> Adding only "drm/radeon: Use write-combined CPU mappings of rings and IBs on
>> >= SI" and that goes down to ~45.3fps.
>>
>> Adding on top of that "drm/radeon: Use VRAM for indirect buffers on >= SI"
>> and the frame rate goes down to ~27.74fps.
>>
>> So enabling this unconditionally is definitely not a good idea. What I don't
>> understand yet is why using USWC reduces the fps on SI as well. It looks
>> like the reads from the IB buffer for command stream validation on SI affect
>> that more than thought.
>
> Yes, there is a CS parser with SI, but shouldn't the parser read from
> the CPU copy that came with the ioctl instead? Anyway, I recommend
> only using VRAM for IBs which are not parsed and patched by the CPU
> (which reduces it down to CIK graphics and DMA IBs, right?)

Oh, sorry. There is no CPU copy, just the IB. My recommendation still stands.

Marek