On 09.09.2016 at 03:38, Michel Dänzer wrote:
On 08/09/16 05:59 PM, Christian König wrote:
On 08.09.2016 at 10:42, Michel Dänzer wrote:
On 08/09/16 05:05 PM, Christian König wrote:
On 08.09.2016 at 08:23, Michel Dänzer wrote:
On 08/09/16 01:13 PM, Nayan Deshmukh wrote:
On Thu, Sep 8, 2016 at 9:03 AM, Michel Dänzer <mic...@daenzer.net> wrote:
On 08/09/16 02:48 AM, Nayan Deshmukh wrote:
use a linear buffer in case of back buffer
Signed-off-by: Nayan Deshmukh <nayan26deshm...@gmail.com>
However, as we discussed before, for various reasons it would
probably be better to create separate linear buffers instead of making
all buffers linear.
So should I maintain a single linear buffer and copy the back buffer to
it before sending it via the present extension?
It's better to create one linear buffer corresponding to each non-linear
buffer with contents to be presented. Otherwise the rendering GPU may
overwrite the linear buffer contents while the presentation GPU is still
reading from it, resulting in tearing-like artifacts.
That approach isn't necessary. VDPAU has functions to query whether an
output surface is still being displayed or not.
If the application starts to render into a buffer while it is still
being displayed, tearing-like artifacts are the expected result.
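For reference, the query being referred to here is presumably
VdpPresentationQueueQuerySurfaceStatus. A minimal application-side sketch,
assuming the function pointer was already obtained via VdpGetProcAddress with
VDP_FUNC_ID_PRESENTATION_QUEUE_QUERY_SURFACE_STATUS, with error handling
reduced to "don't reuse on failure":

#include <vdpau/vdpau.h>

/* Sketch only: query_status is the VdpPresentationQueueQuerySurfaceStatus
 * function pointer obtained through VdpGetProcAddress(). */
static VdpBool
output_surface_idle(VdpPresentationQueueQuerySurfaceStatus *query_status,
                    VdpPresentationQueue queue,
                    VdpOutputSurface surface)
{
    VdpPresentationQueueStatus status;
    VdpTime first_presentation_time;

    if (query_status(queue, surface, &status,
                     &first_presentation_time) != VDP_STATUS_OK)
        return VDP_FALSE;

    /* Only reuse the surface once it is neither queued nor visible. */
    return status == VDP_PRESENTATION_QUEUE_STATUS_IDLE;
}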
You're talking about the buffers exposed to applications via VDPAU. I
was talking about using a single separate linear buffer which would be
used for presentation of all VDPAU buffers. There's no way for the
application to know when that's idle.
Ok, yes, that makes more sense.
In addition to that, I made the VDPAU output surfaces linear a while
ago anyway, because it turned out that tiling actually wasn't beneficial in
this use case (a single quad rendered over the whole texture).
That's fine as long as the buffers are in VRAM, but when they're pinned
to GTT for sharing between GPUs, rendering to them with the 3D engine
results in bad PCIe bandwidth utilization, as Marek explained recently.
So even if the original buffers are already linear, it's better to keep
those in VRAM and use separate buffers for sharing between GPUs.
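To illustrate what that could look like in Gallium terms (a rough sketch only,
not the actual DRI3 code; the bind flags are assumptions about what a
PRIME-shared linear buffer would need):

#include <string.h>
#include "pipe/p_context.h"
#include "pipe/p_screen.h"
#include "pipe/p_state.h"
#include "util/u_box.h"

/* Rendering keeps targeting a (possibly tiled) VRAM resource; only the
 * finished frame is copied into a separate linear buffer the presentation
 * GPU can access. */
static struct pipe_resource *
create_shared_linear_copy(struct pipe_screen *screen,
                          const struct pipe_resource *src)
{
    struct pipe_resource templ;

    memset(&templ, 0, sizeof(templ));
    templ.target = PIPE_TEXTURE_2D;
    templ.format = src->format;
    templ.width0 = src->width0;
    templ.height0 = src->height0;
    templ.depth0 = 1;
    templ.array_size = 1;
    templ.usage = PIPE_USAGE_DEFAULT;
    /* Linear and shareable so the second GPU can import and read it. */
    templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
                 PIPE_BIND_SHARED | PIPE_BIND_LINEAR;

    return screen->resource_create(screen, &templ);
}

/* Copy the finished frame from the VRAM surface into the shared linear copy. */
static void
blit_to_shared_copy(struct pipe_context *pipe,
                    struct pipe_resource *src, struct pipe_resource *dst)
{
    struct pipe_blit_info blit;

    memset(&blit, 0, sizeof(blit));
    blit.src.resource = src;
    blit.src.format = src->format;
    u_box_2d(0, 0, src->width0, src->height0, &blit.src.box);
    blit.dst.resource = dst;
    blit.dst.format = dst->format;
    u_box_2d(0, 0, dst->width0, dst->height0, &blit.dst.box);
    blit.mask = PIPE_MASK_RGBA;
    blit.filter = PIPE_TEX_FILTER_NEAREST;

    pipe->blit(pipe, &blit);
}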
Mhm, at least for VDPAU most composition should happen in temporary
buffers anyway when any filters are enabled.
In that case, do the contents get into the final buffer via a blit or
some kind of triangle / quad draw operation?
It's a quad draw operation.
And yeah, thinking about it, using a blit (e.g. via the DMA engine) probably
gives a memory access pattern which is much more friendly to bus transactions.
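A straight copy along those lines could look like the sketch below; whether it
actually runs on the DMA engine is the driver's decision, and it may still use
the 3D engine if the formats or layouts don't suit it:

#include "pipe/p_context.h"
#include "pipe/p_state.h"
#include "util/u_box.h"

/* Sketch: a 1:1 copy through resource_copy_region() instead of drawing a
 * textured quad over the destination. */
static void
copy_frame(struct pipe_context *pipe,
           struct pipe_resource *dst, struct pipe_resource *src)
{
    struct pipe_box box;

    u_box_2d(0, 0, src->width0, src->height0, &box);
    pipe->resource_copy_region(pipe, dst, 0, 0, 0, 0, src, 0, &box);
}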
Anyway, I would clearly suggest handling that in the VDPAU state tracker
and not in the DRI3 code, because the handling needed seems to be
different for VA-API, and I would really like to avoid any additional
copy for 4K playback.
The thing is, with a discrete GPU, having separate buffers for sharing
between GPUs and transferring the final contents to be presented to
those buffers using a blit might be faster than having any of the
previous steps render to the shared buffer in GTT directly. Only the
DRI3 specific code knows about this.
Indeed, but I was wondering if we couldn't export that information to
the state tracker somehow.
4K playback nearly maxes out our memory bandwidth limits, so every copy
avoided is very helpful there.
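As a purely hypothetical illustration of what exporting that hint to the state
tracker could look like (neither the helper nor any such flag exists in Mesa):

#include <stdbool.h>

/* Hypothetical sketch: the DRI3 winsys would tell the state tracker whether
 * the presenting GPU differs from the rendering GPU, so the extra linear copy
 * is only made when it is actually needed. */
struct vl_screen;   /* Mesa's VDPAU/VA-API winsys screen */

/* Hypothetical query, standing in for whatever mechanism would export this. */
bool vl_screen_is_cross_gpu_present(struct vl_screen *vscreen);

static bool
need_separate_linear_copy(struct vl_screen *vscreen)
{
    /* Same GPU: present the existing surface directly, no extra copy.
     * Different GPU: pay for one blit into a shared linear buffer. */
    return vl_screen_is_cross_gpu_present(vscreen);
}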
Regards,
Christian.