This series improves performance for the cases when there is not enough VRAM 
for all buffers.

First of all, note that setting both the VRAM and GTT domains for a buffer
pretty much says you don't care where the buffer ends up. In practice that
usually makes performance even worse.
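
To illustrate why allowing both domains amounts to "don't care", here is a toy
model of placement under memory pressure. It is only a sketch: the flag values
and the place() helper are illustrative assumptions, and the real memory
manager's policy is more involved than this.

```python
# Toy model only: flag values and policy are illustrative assumptions,
# not the kernel's actual placement logic.
DOMAIN_GTT = 0x2
DOMAIN_VRAM = 0x4


def place(domains, size_mb, vram_free_mb):
    """Return where a buffer lands in this simplified model."""
    if domains & DOMAIN_VRAM and size_mb <= vram_free_mb:
        return "VRAM"
    if domains & DOMAIN_GTT:
        # GTT is allowed, so nothing forces the buffer into VRAM.
        return "GTT"
    # VRAM only: something else would have to be evicted to make room.
    return "VRAM after eviction"


# 64 MB buffer, only 16 MB of VRAM free.
print(place(DOMAIN_VRAM | DOMAIN_GTT, 64, 16))
print(place(DOMAIN_VRAM, 64, 16))
```

In this model a buffer that allows both domains simply lands in GTT once VRAM
is full, even if it is one that would benefit most from being in VRAM.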

This work was largely benchmark-driven and I tried a lot of ideas before I 
found out which ones work. The patches describe what they do and they're quite 
simple, so I'll just share the results here.


Card: Evergreen Redwood (HD 5670), 512 MB of VRAM
Test: Unigine Heaven 4.0, High settings

1) 1280x720, 4x MSAA, need 525 MB of VRAM

Without patches: 16.6 FPS
With patches: 16.6 FPS
Improvement: 0 %

2) 1600x900, 4x MSAA, need 642 MB of VRAM

Without patches: 7.1 FPS
With patches: 9.7 FPS
Improvement: 36 %

3) 1920x1080, 4x MSAA, need 743 MB of VRAM

Without patches: 3.7 FPS
With patches: 5.6 FPS
Improvement: 51 %

4) 1600x900, 8x MSAA, need 838 MB of VRAM

Without patches: 2.9 FPS
With patches: 4.6 FPS
Improvement: 58 %

These results don't change if you run the benchmark several times, which shows
the improvement is stable.
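
For reference, the numbers above can be related to how far each test
oversubscribes the 512 MB of VRAM. The short script below is only a sketch; the
helper names are made up for illustration, and because it recomputes the
speedup from the rounded FPS values, its percentages can differ by about a
point from the ones listed above.

```python
VRAM_MB = 512  # HD 5670 from the test setup

# (label, VRAM needed in MB, FPS without patches, FPS with patches)
RESULTS = [
    ("1280x720, 4x MSAA", 525, 16.6, 16.6),
    ("1600x900, 4x MSAA", 642, 7.1, 9.7),
    ("1920x1080, 4x MSAA", 743, 3.7, 5.6),
    ("1600x900, 8x MSAA", 838, 2.9, 4.6),
]


def oversubscription(needed_mb, vram_mb=VRAM_MB):
    """Fraction by which the working set exceeds VRAM (0.0 if it fits)."""
    return max(0.0, needed_mb / vram_mb - 1.0)


def speedup_percent(before_fps, after_fps):
    """Relative FPS gain in percent."""
    return (after_fps / before_fps - 1.0) * 100.0


for label, needed, before, after in RESULTS:
    print(f"{label}: {oversubscription(needed):.0%} over VRAM, "
          f"{speedup_percent(before, after):.1f} % faster")
```

The gain grows with the amount of oversubscription, and the first case, which
barely exceeds VRAM, shows no change.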


To conclude this, here are ideas for future work:

1) Add virtual memory support for VRAM. Our GPUs support virtual memory, which 
not only solves fragmentation issues, but it also allows each buffer to be 
partially in VRAM and partially in GTT, which becomes more important with large
buffers, e.g. 100 MB. Moving whole buffers back and forth between VRAM and GTT
is inefficient when the migration could be done at page granularity. Also, due
to fragmentation, we can never really use all of VRAM, but only about 90-95%.

2) Add support for uncached GTT. I think it should improve performance for 
dGPUs under memory pressure, but some testing needs to be done to confirm that. 
Uncached GTT doesn't seem to work for me on Evergreen, but it's said to be 
working on some later chips.
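
To make idea 1) more concrete, here is a small sketch comparing whole-buffer
placement with page-granular placement. It is a toy calculation under assumed
numbers (4 KiB pages, a 100 MB buffer, 60 MB of free VRAM); the function names
are hypothetical and nothing here reflects an existing driver interface.

```python
PAGE_SIZE = 4096  # assumed page size in bytes
MB = 1024 * 1024


def whole_buffer_split(buffer_bytes, free_vram_bytes):
    """Whole-buffer placement: all in VRAM if it fits, otherwise all in GTT."""
    if buffer_bytes <= free_vram_bytes:
        return buffer_bytes, 0
    return 0, buffer_bytes


def paged_split(buffer_bytes, free_vram_bytes, page_size=PAGE_SIZE):
    """Page-granular placement: fill free VRAM by pages, spill the rest to GTT."""
    total_pages = -(-buffer_bytes // page_size)  # ceiling division
    vram_pages = min(total_pages, free_vram_bytes // page_size)
    return vram_pages * page_size, (total_pages - vram_pages) * page_size


# Returns (bytes in VRAM, bytes in GTT) for each strategy.
print(whole_buffer_split(100 * MB, 60 * MB))
print(paged_split(100 * MB, 60 * MB))
```

With whole-buffer placement the 60 MB of free VRAM goes unused for this
buffer, while the page-granular split keeps 60 MB of it in VRAM and only
spills the remaining 40 MB to GTT.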


The patches for Mesa will follow later today. Please review.

Marek
