On 2020-08-07 23:59, Soft Works wrote:
-----Original Message-----
From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of
Steve Lhomme
Sent: Friday, August 7, 2020 3:05 PM
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH v3 1/2] dxva: wait until D3D11 buffer
copies are done before submitting them

I experimented a bit more with this. Here are the 3 scenarios, in order from fewest late frames:

- GetData waiting for 1/2s and releasing the lock
- No use of GetData (current code)
- GetData waiting for 1/2s and keeping the lock

The last option has horrible performance issues and should not be used.

The first option gives about 50% fewer late frames compared to the current
code. *But* it requires unlocking the Video Context. There are 2 problems
with this:

- the same ID3D11Asynchronous is used to wait on multiple concurrent
threads. This can confuse D3D11, which emits a warning in the logs.
- another thread might Get/Release some buffers and submit them before
this thread has finished processing. That can result in distortions, for
example if the second thread/frame depends on the first thread/frame,
which has not been submitted yet.

The former issue can be solved by using one ID3D11Asynchronous per thread.
That requires some TLS storage, which FFmpeg doesn't seem to support yet.
With this I get virtually no late frames.

The latter issue only occurs if the wait is too long. For example, waiting
in increments of 10 ms is too long in my tests. Using increments of 1 ms or
2 ms works fine with the most stressful sample I have (Sony Camping HDR HEVC,
high bitrate). But this seems hackish. There's still potentially a quick frame
(an alt frame in VPx/AV1, for example) that might get through to the decoder
too early. (I suppose that's the source of the distortions I see.)
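The wait-and-release pattern described above could look roughly like this with the C COM bindings. This is a sketch, not the actual patch: the `wait_copies_done` name is illustrative, `async` is assumed to be a per-thread D3D11_QUERY_EVENT, and `mutex` the shared device mutex:

```c
#include <windows.h>
#include <d3d11.h>

/* Sketch only: wait for the decoder buffer copies to reach the GPU,
 * releasing the device lock between short polls so the other decoding
 * threads can make progress. */
static void wait_copies_done(ID3D11DeviceContext *ctx,
                             ID3D11Asynchronous *async,
                             HANDLE mutex)
{
    HRESULT hr;

    WaitForSingleObject(mutex, INFINITE);
    ID3D11DeviceContext_End(ctx, async);   /* mark the point to wait on */
    for (;;) {
        hr = ID3D11DeviceContext_GetData(ctx, async, NULL, 0, 0);
        if (hr != S_FALSE)                 /* S_OK: done, or an error */
            break;
        /* Not finished: drop the lock and wait a short increment.
         * 1-2 ms worked in testing; 10 ms was too long. */
        ReleaseMutex(mutex);
        Sleep(1);
        WaitForSingleObject(mutex, INFINITE);
    }
    ReleaseMutex(mutex);
}
```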

It's also possible to change the order in which the buffers are sent, by
starting with the biggest one (D3D11_VIDEO_DECODER_BUFFER_BITSTREAM). But it
seems to have little influence, regardless of whether we wait for buffer
submission or not.

The results are consistent between integrated GPU and dedicated GPU.

Hi Steven,

Hi,

A while ago I had extended the D3D11VA implementation to support single
(non-array) textures for interoperability with Intel QSV + DX11.

Looking at your code, it seems you are copying from an array texture to a single-slice texture to achieve this, doubling the amount of RAM. It may be a design issue with the new D3D11 API that forces you to do that, but I'm not using that API; I'm using the old one.

In my case I directly render the texture slices coming out of the decoder, with no copying (and no extra memory allocation). It happens in a different thread than the decoder thread(s).

In VLC we also support direct D3D11-to-QSV encoding. It does require a copy to "shadow" textures to feed QSV; I never managed to make it work without a copy.

I noticed a few bottlenecks making D3D11VA significantly slower than DXVA2.

The solution was to use ID3D10Multithread_SetMultithreadProtected and
remove all the locks which are currently applied.

I am also using that.
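For reference, enabling that protection is a couple of calls. A sketch with the C COM bindings follows; the `enable_mt_protection` name is illustrative, and the exact header providing ID3D10Multithread may vary with the SDK:

```c
#include <windows.h>
#include <d3d11.h>
#include <d3d10.h>   /* ID3D10Multithread */

/* Sketch: enable multithread protection on the device context.  This
 * makes resource access safe across threads, but it does NOT make
 * concurrent ID3D11DeviceContext calls legal (see below). */
static void enable_mt_protection(ID3D11DeviceContext *ctx)
{
    ID3D10Multithread *mt;
    if (SUCCEEDED(ID3D11DeviceContext_QueryInterface(ctx,
                      &IID_ID3D10Multithread, (void **)&mt))) {
        ID3D10Multithread_SetMultithreadProtected(mt, TRUE);
        ID3D10Multithread_Release(mt);
    }
}
```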

Hence, I don't think that your patch is the best possible way.

Removing locks and saying "it works for me" is not a correct solution either. At the very least, the locks are needed inside libavcodec to avoid setting DXVA buffers concurrently from different threads; doing so would most likely result in very bad distortions, if not crashes. Maybe you're only using one decoding thread with DXVA (which a lot of people do), so you don't see this issue, but that is not my case.

Also, ID3D10Multithread::SetMultithreadProtected means that the resources can be accessed from multiple threads. It doesn't mean that calls to ID3D11DeviceContext are thread-safe, and my experience shows that they are not. In fact, if you have the Windows SDK installed and you make concurrent accesses, you'll get a big warning in your debug logs that you are doing something fishy. On Windows Phone it would even crash. This is how I ended up adding the mutex to the old API (e3d4784eb31b3ea4a97f2d4c698a75fab9bf3d86).

The documentation for ID3D11DeviceContext is very clear about that [1]:
"Because each ID3D11DeviceContext is single threaded, only one thread can call a ID3D11DeviceContext at a time. If multiple threads must access a single ID3D11DeviceContext, they must use some synchronization mechanism, such as critical sections, to synchronize access to that ID3D11DeviceContext."
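In practice, the quoted requirement amounts to wrapping every device-context call in one lock. A sketch with a Win32 critical section; the `submit_locked` helper and the global lock are illustrative, not real FFmpeg/VLC symbols:

```c
#include <windows.h>
#include <d3d11.h>

static CRITICAL_SECTION ctx_lock;   /* one lock per device context */

/* Sketch: serialize the ID3D11VideoContext call behind the lock, so
 * that only one thread touches the (single-threaded) context at a time. */
static HRESULT submit_locked(ID3D11VideoContext *vctx,
                             ID3D11VideoDecoder *decoder,
                             UINT nb_buffers,
                             const D3D11_VIDEO_DECODER_BUFFER_DESC *desc)
{
    HRESULT hr;
    EnterCriticalSection(&ctx_lock);
    hr = ID3D11VideoContext_SubmitDecoderBuffers(vctx, decoder,
                                                 nb_buffers, desc);
    LeaveCriticalSection(&ctx_lock);
    return hr;
}
```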

The DXVA documentation is much less clear on the subject. But given that ID3D11VideoContext is bound to the ID3D11DeviceContext (even though it is not an ID3D11DeviceContext itself), it seems reasonable to assume it has the same restrictions.

[1] https://docs.microsoft.com/en-us/windows/win32/direct3d11/overviews-direct3d-11-render-multi-thread-intro
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
