On 18.07.2014 05:07, Michel Dänzer wrote:
> On 17.07.2014 19:09, Christian König wrote:
>> On 17.07.2014 12:01, Michel Dänzer wrote:
>>> In order to try and improve X(Shm)PutImage performance with glamor, I
>>> implemented support for write-combined CPU mappings of BOs in GTT.
>>>
>>> This did provide a nice speedup, but to my surprise, using VRAM instead
>>> of write-combined GTT turned out to be even faster in general on my
>>> Kaveri machine, both for the internal GPU and for discrete GPUs.
>>>
>>> However, I've kept the changes from GTT to VRAM separated, in case this
>>> turns out to be a loss on other setups.
>>>
>>> Kernel patches:
>>>
>>> [PATCH 1/5] drm/radeon: Remove radeon_gart_restore()
>>> [PATCH 2/5] drm/radeon: Pass GART page flags to
>>> [PATCH 3/5] drm/radeon: Allow write-combined CPU mappings of BOs in
>>> [PATCH 4/5] drm/radeon: Use write-combined CPU mappings of rings and
>>
>> Those four are Reviewed-by: Christian König <christian.koenig at amd.com>
>
> Thanks!
>
>
>>> [PATCH 5/5] drm/radeon: Use VRAM for indirect buffers on >= SI
>>
>> I'm still not very keen with this change since I still don't understand
>> the reason why it's faster than with GTT. Definitely needs more testing
>> on a wider range of systems.
>
> Sure. If anyone wants to give this patch a spin and see if they can
> measure any performance difference, good or bad, that would be
> interesting.
>
>> Maybe limit it to APUs for now?
>
> But IIRC, CPU writes to VRAM vs. write-combined GTT are actually an even
> bigger win with dedicated GPUs than with the Kaveri built-in GPU on my
> system. I suspect it may depend on the bandwidth available for PCIe vs.
> system memory though.

Michel, please, please do NOT change anything on this! ;-)

You all know that I can currently only run this on my poor Duron 1800
with an RV730 (AGP), but...

With this, all 'objview' demos (mesa-demos) run at 60 fps (vsync), even
with chipset/CPU power management enabled (athcool on).

If I set vblank_mode=0, the slowest one, GreatLakesBiplaneHP.obj, runs at
~100 fps (~16 fps before) => 6x speedup. (Even 5 planes run at 30 fps.)
- Wow!!!

'buddha' went from ~40 fps up to ~175 fps
'bunny' went from ~60 fps up to ~215 fps
'bobcat' does not show such a big improvement, 'only' 70 fps more

R600_HYPERZ=1 helps somewhat, too, but not in all cases.

Overall, the X/Kwin eXperience is much better.

Let me know which benchmarks you need.

Cheers,
Dieter

BTW
Does anyone know how I can override the BIOS GTT settings?
I can only set 256 MB max. - BIOS patching?
With agpmode=-1 I can run with 1024 MB GTT.
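
PS: For anyone who wants to compare against their own numbers, the speedup factors implied by the objview figures above can be worked out with a few lines of Python. This is only a sketch: the fps values are the approximate ones quoted in this mail, the names `results` and `speedup` are made up for illustration, and 'bobcat' is left out because only its gain, not its baseline, is given.

```python
# Approximate objview frame rates quoted in this mail (vblank_mode=0).
# Each entry is (fps before the patches, fps with the patches).
results = {
    "GreatLakesBiplaneHP.obj": (16, 100),
    "buddha": (40, 175),
    "bunny": (60, 215),
}


def speedup(before, after):
    """Return the speedup factor as the ratio of new to old frame rate."""
    return after / before


for name, (before, after) in results.items():
    factor = speedup(before, after)
    print(f"{name}: {before} -> {after} fps, factor {factor:.2f}")
```

Since the inputs are rough readings, the factors should be treated as ballpark values rather than precise measurements.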