On 2020-08-10 12:04, Soft Works wrote:
At the very least the locks are needed inside libavcodec to avoid
setting DXVA buffers concurrently from different threads. It will
most likely result in very bad distortions if not crashes. Maybe
you're only using 1 decoding thread with DXVA (which a lot of people
do) so you don't have this issue, but this is not my case.

I see no point in employing multiple threads for hw accelerated decoding.
To be honest I never looked into or tried whether ffmpeg even supports
multiple threads with dxva2 or d3d11va hw acceleration.

Maybe you're in an ideal situation where all the files you play through
libavcodec are hardware accelerated (so also with matching hardware). In
this case you don't need to care about the case where it will fallback to
software decoding. Using a single thread in that case would have terrible
performance.

I think we need to clarify the use cases we're talking about.

There is no "my case". All I'm talking about is using D3D11VA hardware
acceleration using the ffmpeg.exe CLI.

You seem to have a rather special case where you are using parts of
the ffmpeg DXVA/D3D11VA code from another application (VLC?)

I don't think there is anything special about using libavcodec in another application. That's where the code we're discussing is, not the ffmpeg CLI or ffplay. The API has to be designed to work in all host apps, not just these simpler use cases.

Did I understand that correctly?

The documentation for ID3D11DeviceContext is very clear about that [2]:
"Because each ID3D11DeviceContext is single threaded, only one thread
can call a ID3D11DeviceContext at a time. If multiple threads must
access a single ID3D11DeviceContext, they must use some
synchronization mechanism, such as critical sections, to synchronize
access to that ID3D11DeviceContext."

Yes, but this doesn't apply to accessing staging textures IIRC.

It does. To copy to a staging texture you need to use
ID3D11DeviceContext::CopySubresourceRegion().

Correct. And with DX11 and using SetMultithreadProtected it is legal
to call this from multiple threads without synchronization.

No. I already explained it and pointed to the Microsoft documentation [1]. SetMultithreadProtected relates to ID3D11Device. ID3D11DeviceContext needs to be managed as a non-thread-safe resource. If you want, you can even create one ID3D11DeviceContext per thread [2]. I'd be curious to see the effect on multithreaded decoding.

Also it seems SetMultithreadProtected() is not even needed by default. It enables the "thread-safe layer" [3]. But in d3d11 that's the default behavior. See [4] "Use this flag if your application will only call methods of Direct3D 11 interfaces from a single thread. By default, the ID3D11Device object is thread-safe."
SetMultithreadProtected() only made sense for D3D10:
"Direct3D 11 has been designed from the ground up to support multithreading. Direct3D 10 implements limited support for multithreading using the thread-safe layer."

You probably don't have any synchronization issues in your pipeline because
it seems you copy from GPU to CPU. In that case it forces the
ID3D11DeviceContext::GetData() internally to make sure all the commands
to produce your source texture on that video context are finished
processing. You may not see it, but there's a wait happening there.

I've looked back into my work history and, fortunately, most of my
memory came back.

Yes, it's correct, there's a "wait happening". From your wording I
would assume that you've already realized that I was right in stating
that there's no need for an external locking:

- Not for uploading
- Not for downloading (at least not for the regular ffmpeg use case)

There is still some locking applied: Internally inside the DX11 runtime
(because we are using SetMultithreadProtected). And there's also
the "wait happening".

As the doc says, you have to use some synchronization. It may work in your case (FFmpeg CLI I suppose). As you mentioned, you only use one thread, so there's less chance that it can fail. But copying memory to/from CPU/GPU is probably the slowest part of the whole decoding (hence we don't do any in VLC in normal playback). So if you have one decoding thread doing that copy and another thread reading on the same ID3D11DeviceContext, you're likely going to hit race-condition issues. I don't know what FFmpeg CLI does, so I can't tell.
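
As a minimal sketch of the kind of synchronization the Microsoft text describes, the following portable C example guards one shared context with a mutex. SharedContext and its functions are hypothetical stand-ins, not D3D11 types, and a pthread mutex plays the role of the critical section so that the example builds outside Windows.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define CALLS_PER_THREAD 100000

/* Hypothetical stand-in for a shared ID3D11DeviceContext (not a D3D11 type). */
typedef struct {
    pthread_mutex_t lock;   /* plays the role of the critical section */
    long commands_issued;   /* state that would be corrupted by interleaved calls */
} SharedContext;

void shared_context_init(SharedContext *ctx)
{
    assert(ctx != NULL);
    pthread_mutex_init(&ctx->lock, NULL);
    ctx->commands_issued = 0;
}

/* Every access to the context happens between lock and unlock. */
void shared_context_issue(SharedContext *ctx)
{
    pthread_mutex_lock(&ctx->lock);
    ctx->commands_issued++;
    pthread_mutex_unlock(&ctx->lock);
}

void *context_worker(void *arg)
{
    SharedContext *ctx = arg;
    for (int i = 0; i < CALLS_PER_THREAD; i++)
        shared_context_issue(ctx);
    return NULL;
}

/* Runs two threads against one context and returns the final counter. */
long run_two_threads(void)
{
    SharedContext ctx;
    pthread_t a, b;

    shared_context_init(&ctx);
    pthread_create(&a, NULL, context_worker, &ctx);
    pthread_create(&b, NULL, context_worker, &ctx);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_mutex_destroy(&ctx.lock);
    return ctx.commands_issued;
}
```

With the mutex in place, no increment is lost even though both threads use the same context. Removing the lock/unlock pair makes the counter update a data race.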

Let's go through an example: Downloading of a texture

1. Context_CopySubresourceRegion: Copy GPU texture to staging texture

CopySubresourceRegion is asynchronous anyway. It just puts the copy
request into the DX11 processing queue. Using SetMultithreadProtected
avoids any race conditions, but this call always returns immediately.

2. Context_Map: Make the staging texture accessible for the CPU

When called without MapFlags, this call blocks until the texture is
mapped (and we can be sure that CopySubresourceRegion is executed
by then).

=> This is the 'wait' you've been talking about

Yes.

3. av_image_copy : Copy the image from the staging texture

Takes its time for copying obviously.

4. Context_Unmap: Release the texture mapping

Returns immediately

-----------------

We've seen that there is no locking required with regards to DX11,
but there's still one thing left: The staging texture. To resolve this
I'm using multiple staging textures (it's system memory, not GPU
memory).

When we look at the sequence 1 - 2 - 3 - 4, it's obvious that
it can run much faster when just the individual steps are synchronized
(by DX11) than when we put one big lock around 1-2-3-4 from
our side.

I've been struggling a long time with this, because DXVA2 was
often much faster than D3D11VA and this kind of parallelism
was finally the way to get it working equally fast.
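
The lock-granularity argument above can be sketched in portable C. This is only a simulation under stated assumptions: SimDevice, SimStaging and the sim_* functions are hypothetical stand-ins, and runtime_lock merely models a lock taken inside each individual call. It says nothing about what the Direct3D 11 runtime actually guarantees for ID3D11DeviceContext.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define FRAME_BYTES 64

/* Hypothetical stand-ins, not D3D11 types. */
typedef struct {
    pthread_mutex_t runtime_lock;          /* models a lock held inside each call */
    unsigned char gpu_frame[FRAME_BYTES];  /* models the decoded GPU texture */
} SimDevice;

typedef struct {
    unsigned char pixels[FRAME_BYTES];     /* one staging buffer per thread */
} SimStaging;

/* Step 1: copy the "GPU" frame into the caller's own staging buffer. */
void sim_copy_to_staging(SimDevice *dev, SimStaging *st)
{
    pthread_mutex_lock(&dev->runtime_lock);
    memcpy(st->pixels, dev->gpu_frame, FRAME_BYTES);
    pthread_mutex_unlock(&dev->runtime_lock);
}

/* Step 2: "map" the staging buffer and hand back a CPU pointer. */
const unsigned char *sim_map(SimDevice *dev, SimStaging *st)
{
    pthread_mutex_lock(&dev->runtime_lock);
    const unsigned char *p = st->pixels;
    pthread_mutex_unlock(&dev->runtime_lock);
    return p;
}

/* Step 4: "unmap"; only the short critical section is modelled. */
void sim_unmap(SimDevice *dev, SimStaging *st)
{
    (void)st;
    pthread_mutex_lock(&dev->runtime_lock);
    pthread_mutex_unlock(&dev->runtime_lock);
}

typedef struct {
    SimDevice *dev;
    SimStaging staging;
    unsigned char out[FRAME_BYTES];
} DownloadJob;

void *download_worker(void *arg)
{
    DownloadJob *job = arg;
    assert(job != NULL && job->dev != NULL);
    sim_copy_to_staging(job->dev, &job->staging);                 /* step 1 */
    const unsigned char *src = sim_map(job->dev, &job->staging);  /* step 2 */
    memcpy(job->out, src, FRAME_BYTES);       /* step 3: slow copy, no lock held */
    sim_unmap(job->dev, &job->staging);                           /* step 4 */
    return NULL;
}

/* Two concurrent downloads; returns 1 if both outputs match the frame. */
int run_parallel_downloads(void)
{
    SimDevice dev;
    DownloadJob jobs[2];
    pthread_t threads[2];
    int ok = 1;

    pthread_mutex_init(&dev.runtime_lock, NULL);
    memset(dev.gpu_frame, 0xAB, FRAME_BYTES);
    for (int i = 0; i < 2; i++) {
        jobs[i].dev = &dev;
        pthread_create(&threads[i], NULL, download_worker, &jobs[i]);
    }
    for (int i = 0; i < 2; i++) {
        pthread_join(threads[i], NULL);
        ok = ok && memcmp(jobs[i].out, dev.gpu_frame, FRAME_BYTES) == 0;
    }
    pthread_mutex_destroy(&dev.runtime_lock);
    return ok;
}
```

Because each thread owns its staging buffer, the long copy in step 3 runs without any lock held. One big lock around steps 1 to 4 would serialize that copy as well.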

That's one very particular case where you do a copy to CPU. There is some synchronization happening in the memory mapping. But that only covers a small part of the possibilities of D3D11VA in libavcodec. And that's certainly not what I use. You can't deduce from that usage that synchronization (access to ID3D11DeviceContext) is not needed. In fact the Microsoft documentation and my experience show the exact opposite.


-----------------

It is not really obvious from the documentation that it is legal
to use CopySubresourceRegion, Map and Unmap in (pseudo-)
parallel even on multiple indexes of the same ArrayTexture.
IIRC I got one hint at this from an internal (yet public) Nvidia
presentation about DX11 and another one from the source code
of a game engine, but I haven't saved those links.

-----------------

@Steven

My name is Steve.

I don't know anything about your specific way of
using the ffmpeg code.  Perhaps, the above information is
useful for you in some way, but maybe those locks are unavoidable
in your case.

My only concern is that your changes do not slow down normal
ffmpeg operation - like the locks you had added earlier.
Maybe those could be put into some condition?

My change in e3d4784eb31b3ea4a97f2d4c698a75fab9bf3d86 is optional. So much so that it even requires using the creator helper function to make sure the mutex is properly initialized to an empty value and to retain backward compatibility. If you don't want to use many threads you can safely ignore this field.

That being said, that's for the old API. I suppose the one you're talking about is the new API for which I have done nothing. If the mutex is always set, that's not my fault.

If you want the lock to have no effect, you can set the lock/unlock callbacks of AVD3D11VADeviceContext to functions that do nothing. If you don't set them, the documentation says: "If unset on init, the hwcontext implementation will set them to use an internal mutex."
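
A minimal sketch of such do-nothing callbacks follows. AVD3D11VADeviceContext exposes the fields lock, unlock and lock_ctx; to keep the example buildable without the Windows and D3D11 headers, it uses DemoD3D11VALocking, a hypothetical stand-in that mirrors only those three fields. The names noop_lock, noop_unlock and install_noop_locking are also invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in mirroring only the locking fields of
 * AVD3D11VADeviceContext (lock, unlock, lock_ctx). With FFmpeg available,
 * the real struct from libavutil/hwcontext_d3d11va.h would be used instead.
 */
typedef struct {
    void (*lock)(void *lock_ctx);
    void (*unlock)(void *lock_ctx);
    void *lock_ctx;
} DemoD3D11VALocking;

/* Callbacks that deliberately do nothing. */
void noop_lock(void *lock_ctx)   { (void)lock_ctx; }
void noop_unlock(void *lock_ctx) { (void)lock_ctx; }

/*
 * Install the no-op callbacks. Per the quoted documentation, unset callbacks
 * are replaced with an internal mutex on init, so this has to happen before
 * the device context is initialized.
 */
void install_noop_locking(DemoD3D11VALocking *hwctx)
{
    assert(hwctx != NULL);
    hwctx->lock     = noop_lock;
    hwctx->unlock   = noop_unlock;
    hwctx->lock_ctx = NULL;
}
```

With the real API, the same assignments would presumably be made on the hwctx of a device context obtained from av_hwdevice_ctx_alloc() before calling av_hwdevice_ctx_init(). This is only safe when a single thread ever touches the device context.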

It's certainly better than commenting out a whole bunch of code [5].

Kind regards,
softworkz

[1] https://docs.microsoft.com/en-us/windows/win32/direct3d11/overviews-direct3d-11-render-multi-thread-intro
[2] https://docs.microsoft.com/en-us/windows/win32/direct3d11/overviews-direct3d-11-render-multi-thread-render
[3] https://docs.microsoft.com/en-us/windows/win32/api/d3d10/nn-d3d10-id3d10multithread
[4] https://docs.microsoft.com/en-us/windows/win32/api/d3d11/ne-d3d11-d3d11_create_device_flag
[5] https://github.com/softworkz/ffmpeg_dx11/commit/c09cc37ce7f513717493e060df740aa0e7374257