On 15.04.2016 12:50, Grigori Goronzy wrote:
> Apps that cause a lot of synchronization benefit from small IB
> sizes. The current IB size is a bit on the large side for this class
> of apps. On the other hand, if there isn't much synchronization going
> on, increasing the IB size can slightly improve performance, too.
>
> Here's a quick hack that tunes the IB size based on feedback from
> buffer_wait_time. What do you think? I see good results with Unigine
> Heaven (no synchronization, benefits from larger IB size), Metro Last
> Light (lots of synchronization, benefits from small IBs) as well as
> OpenArena and Xonotic (same).

Interesting, and thanks for poking at this issue. I've been thinking
about tuning IB sizes as well. I'd like for us to get this right, so I
wonder: What's your theory for _why_ your change helps?
I'll be honest with you: Right now, I think your approach contains too
much unexplained "magic". What's the theory that explains using buffer
wait averages in this way?
My theory for why your change helps is about CPU/GPU parallelism. When
we wait for buffer idle, this most likely means the GPU becomes idle.[1]
If you use a large IB to start the GPU up again, you'll wait a longer
time before the GPU starts doing work again. Basically, in ASCII art:

                 GPU idle
GPU =========+..............+=====
      |      |              |
CPU ==+......+==============+=====
       buffer
        wait

By reducing the size of the IB, the picture changes like this:

             GPU idle
GPU =========+......+=====
      |      |      |
CPU ==+......+======+=====
       buffer
        wait

It takes less CPU time before the GPU gets new work, so the GPU is
utilized more fully and the program runs faster.
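
To put rough numbers on the two diagrams, here is a toy model in Python. It is only a sketch: the function name, the per-dword costs and the fixed per-IB submit cost are all made up for illustration and are not measured from any driver. The GPU starts out idle (as after a buffer wait), the CPU builds IBs one after another, and the GPU runs each IB as soon as it has been submitted and the previous one has finished.

```python
def simulate(total_dwords, ib_dwords, cpu_ns_per_dword, gpu_ns_per_dword, submit_ns):
    """Toy CPU/GPU pipeline; returns (finish_ns, gpu_idle_ns).

    All parameters are hypothetical. The GPU is idle at time 0.
    """
    cpu_time = 0   # time at which the CPU submits the current IB
    gpu_free = 0   # time at which the GPU finishes its current IB
    gpu_idle = 0   # accumulated time the GPU spent waiting for work
    remaining = total_dwords
    while remaining > 0:
        size = min(ib_dwords, remaining)
        remaining -= size
        # The CPU builds the IB, then pays a fixed cost to submit it.
        cpu_time += size * cpu_ns_per_dword + submit_ns
        # The GPU can only start once the IB exists and the GPU is free.
        start = max(cpu_time, gpu_free)
        gpu_idle += start - gpu_free
        gpu_free = start + size * gpu_ns_per_dword
    return gpu_free, gpu_idle


one_large = simulate(16_000, 16_000, 1, 2, 100)
four_small = simulate(16_000, 4_000, 1, 2, 100)
print(one_large, four_small)
```

With these made-up numbers the GPU idles for 16100 ns when everything goes into one large IB and for 4100 ns when the same work is split into four small IBs, and the whole batch finishes at 36100 ns instead of 48100 ns.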
If this explanation is correct and all there is to it, then it suggests
the logic for when IBs should be shorter. Basically, we should use short
IBs when the GPU is idle.[2]
There are a bunch of different options. A simple one that comes closest
to what your patch does - without actually querying for GPU idle - is to
just make the first IB after each buffer wait a small one. The length of
the buffer wait doesn't seem important because what we need to address
is the fact that the GPU is idle. That's a boolean matter.
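
The simple option could look roughly like the following Python sketch. The class name, the method names and the two sizes are hypothetical and only illustrate the idea; this is not code from the patch or from the driver. The point is that the state is a single flag: it does not matter how long the wait was, only that one happened.

```python
SMALL_IB_DWORDS = 4 * 1024    # hypothetical size, not a tuned value
LARGE_IB_DWORDS = 16 * 1024   # hypothetical size, not a tuned value


class IBSizer:
    """Pick a small IB right after a buffer wait, a large one otherwise."""

    def __init__(self):
        self.waited_since_last_ib = False

    def note_buffer_wait(self):
        # Called whenever the CPU had to block on a busy buffer.
        # The duration of the wait is deliberately ignored.
        self.waited_since_last_ib = True

    def next_ib_size(self):
        size = SMALL_IB_DWORDS if self.waited_since_last_ib else LARGE_IB_DWORDS
        # Only the first IB after a wait is small.
        self.waited_since_last_ib = False
        return size
```

Several waits in a row still result in just one small IB, after which the sizer goes back to large IBs until the next wait.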
Because of [1] it would probably be a better approach to use fences to
determine whether (or how many) previous IBs are still in flight.
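
A fence-based variant might be sketched like this in Python. Everything here is a stand-in: the Fence class, the sizes and the min_in_flight threshold are hypothetical, and a real driver would poll its own fence objects without blocking instead of reading a flag. The sketch also assumes that IBs on one ring complete in submission order.

```python
from collections import deque


class Fence:
    """Stand-in for a GPU fence; signaled becomes True when the IB is done."""

    def __init__(self):
        self.signaled = False


class FenceIBSizer:
    def __init__(self, small=4096, large=16384, min_in_flight=1):
        self.small = small
        self.large = large
        # Fewer IBs in flight than this counts as "idle or about to be idle".
        self.min_in_flight = min_in_flight
        self.pending = deque()

    def submit(self):
        # Remember one fence per submitted IB.
        fence = Fence()
        self.pending.append(fence)
        return fence

    def ibs_in_flight(self):
        # Assumes in-order completion, so finished IBs are at the front.
        while self.pending and self.pending[0].signaled:
            self.pending.popleft()
        return len(self.pending)

    def next_ib_size(self):
        if self.ibs_in_flight() < self.min_in_flight:
            return self.small
        return self.large
```

Raising min_in_flight to 2 would also pick a small IB when only one IB is left in flight, which is one way to cover the "about to become idle" case from [2].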
Cheers,
Nicolai
[1] Although not necessarily. We may be trying to map a buffer that is
still in flight, but only referenced by an older IB.
[2] Or about to become idle, since we want to keep the pipeline full.
Although neither applies in the rare case where the CPU driver
overhead for constructing the IBs is consistently higher than the GPU
work that needs to be done for those IBs. In that case, we should still
use large IBs to reduce the driver overhead.
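
A back-of-the-envelope Python sketch of the CPU-bound case in [2]. The numbers and the fixed per-IB submit cost are assumptions for illustration only. When the GPU always finishes an IB before the CPU has built the next one, the total time is the CPU time plus the GPU time of the last IB, so every additional IB just adds overhead.

```python
def cpu_bound_finish_ns(total_dwords, ib_dwords, cpu_ns_per_dword,
                        gpu_ns_per_dword, submit_ns):
    """Finish time when the GPU always outpaces the CPU.

    Hypothetical model; assumes ib_dwords divides total_dwords evenly.
    """
    n_ibs = total_dwords // ib_dwords
    # CPU cost: building all commands plus a fixed cost per submitted IB.
    cpu_total = total_dwords * cpu_ns_per_dword + n_ibs * submit_ns
    # After the last submit the GPU still has to execute that last IB.
    return cpu_total + ib_dwords * gpu_ns_per_dword


large = cpu_bound_finish_ns(1_600_000, 16_000, 2, 1, 1_000)
small = cpu_bound_finish_ns(1_600_000, 4_000, 2, 1, 1_000)
print(large, small)
```

With these made-up costs the large IBs finish at 3316000 ns and the small ones at 3604000 ns, so here the larger size wins.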
> Note: this patch applies on top of Bas' constant engine patchset.
>
> Grigori
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev