Interesting, and thanks for poking at this issue. I've been thinking
about tuning IB sizes as well. I'd like for us to get this right, so I
wonder: What's your theory for _why_ your change helps?


See below. I think you discovered it yourself.

I'll be honest with you: Right now, I think your approach contains too
much unexplained "magic". What's the theory that explains using buffer
wait averages in this way?


I agree that there is too much magic; e.g. the cutoff buffer-wait time for small IBs is quite magical and can't be explained well.

My theory for why your change helps is about CPU/GPU parallelism. When
we wait for buffer idle, this most likely means the GPU becomes
idle.[1] If you use a large IB to start the GPU up again, you'll wait
a longer time before the GPU starts doing work again. Basically, in
ASCII art:

                GPU idle
GPU =========+..............+=====
      |      |              |
CPU ==+......+==============+=====
       buffer
        wait

By reducing the size of the IB, the picture changes like this:

             GPU idle
GPU =========+......+=====
      |      |      |
CPU ==+......+======+=====
       buffer
       wait

It takes a shorter amount of CPU time before the GPU gets new work,
the GPU is utilized more fully and the program runs faster.


Yes, that is the basic idea. :)
When it is likely that we need to synchronize work with the GPU later on, it pays off to queue work sooner to keep the GPU busy most of the time, and smaller IBs enforce exactly that.
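
To put rough numbers on the two diagrams, here is a toy model in C. It is only a sketch and not Mesa code: struct ib_model, gpu_idle_after_wait() and the time units are made up for illustration. It assumes the GPU runs dry at the moment the buffer wait ends and only restarts once the next IB is flushed.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not Mesa code. All times are in arbitrary CPU-time units. */
struct ib_model {
    unsigned time_per_draw; /* CPU time needed to record one draw into the IB */
};

/*
 * GPU idle time following a buffer wait, under the assumption that the
 * GPU has no queued work when the wait returns and gets new work only
 * when the IB currently being recorded is flushed. In that case the
 * idle time is simply the CPU time needed to fill the IB.
 */
static unsigned gpu_idle_after_wait(const struct ib_model *m,
                                    unsigned draws_in_ib)
{
    assert(m != NULL);
    return draws_in_ib * m->time_per_draw;
}
```

With time_per_draw = 2, an IB holding 1000 draws leaves the GPU idle for 2000 units in this model, while one holding 100 draws leaves it idle for only 200. The model ignores submission overhead, which is what limits how small an IB can usefully get.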

If this explanation is correct and all there is to it, then it suggests
the logic for when IBs should be shorter. Basically, we should use
short IBs when the GPU is idle.[2]


Right, but the problem is to cheaply and reliably determine idleness.

There are a bunch of different options. A simple one that comes
closest to what your patch does - without actually querying for GPU
idle - is to just make the first IB after each buffer wait a small
one. The length of the buffer wait doesn't seem important because what
we need to address is the fact that the GPU is idle. That's a boolean
matter.


Let me give that a try; it sounds like a good idea. In particular, we could use *really* small IBs in this case without affecting general performance, at least in theory.
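
A minimal sketch of the "small first IB after each buffer wait" idea, in C. Everything here is hypothetical: struct cs_state, the cs_* helpers and the two size limits are illustrative names and numbers, not the actual winsys interface or values from any patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical IB size limits in dwords; the real values would need tuning. */
#define IB_MAX_DW_NORMAL (64u * 1024u)
#define IB_MAX_DW_SMALL  (4u * 1024u)

/* Hypothetical per-command-stream state. */
struct cs_state {
    bool waited_since_flush; /* a buffer wait happened since the last flush */
};

/* Call this wherever the driver blocks waiting for a buffer to go idle. */
static void cs_note_buffer_wait(struct cs_state *cs)
{
    assert(cs != NULL);
    cs->waited_since_flush = true;
}

/*
 * Size limit for the IB currently being recorded. Only the boolean
 * "did we wait?" matters here, not how long the wait took.
 */
static unsigned cs_ib_size_limit(const struct cs_state *cs)
{
    assert(cs != NULL);
    return cs->waited_since_flush ? IB_MAX_DW_SMALL : IB_MAX_DW_NORMAL;
}

/* Call this after the IB is submitted; the next IB may be large again. */
static void cs_note_flush(struct cs_state *cs)
{
    assert(cs != NULL);
    cs->waited_since_flush = false;
}
```

The flag is cleared on flush, so only the first IB after a wait is shortened; steady-state rendering without waits keeps using the normal limit.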

For the moment, the slightly "magic" way with buffer-wait time still leads to consistent improvements (I did not see any regressions, either). So I'll try to describe the magic somewhat in an upcoming patch and hope that's alright for inclusion.

Grigori

PS: "About to become idle" is probably hard to measure, so the small IB approach may have some merit even if we can easily check idleness.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
