On 24/11/2024 at 04:41, Lynne via ffmpeg-devel wrote:
On 11/23/24 23:10, Jerome Martinez wrote:

On 23/11/2024 at 20:58, Lynne via ffmpeg-devel wrote:
This allows the encoder to fully saturate all queues the GPU
has, giving a good 10% in certain cases and resolutions.


Using an RTX 4070:
+50% (!!!) with 2K 10-bit content.
+17% with 4K 16-bit content.
Also, 2K content now encodes at 4x the speed of 4K content, which matches the SW encoder (with a similar slice count) and is the expected result; it seems a bottleneck affecting smaller resolutions has been removed.


Unfortunately, it has a drawback: 6K5K content, which was handled well without this patch, now fails with an immediate error:
[vost#0:0/ffv1_vulkan @ 0x10467840] [enc:ffv1_vulkan @ 0x12c011c0] Error submitting video frame to the encoder
[vost#0:0/ffv1_vulkan @ 0x10467840] [enc:ffv1_vulkan @ 0x12c011c0] Error encoding a frame: Cannot allocate memory
[vost#0:0/ffv1_vulkan @ 0x10467840] Task finished with error code: -12 (Cannot allocate memory)
[vost#0:0/ffv1_vulkan @ 0x10467840] Terminating thread with return code -12 (Cannot allocate memory)

This is a problem, as 6K5K handling was good on the RTX 4070 (3x faster than a CPU at the same price) before this patch. Is it possible to keep support for bigger resolutions on such a card while keeping the performance boost of this patch?


To an extent. At high resolutions, -async_depth 0 (maximum) harms performance. I get the best results with it set to 2 or 3 for 6K content, on my odd setup. Increasing async_depth increases the amount of VRAM used, so that's the tradeoff. Automatically detecting it is difficult, as Vulkan doesn't give you metrics on how much free VRAM there is, so there's nothing we can do


I am torn between a default with as much performance as possible and a default that is guaranteed to work (a default value of 1 is OK for the 6K5K content on the RTX 4070, not 2). Surprisingly, the default async_depth works with 4K (51 MiB) but async_depth 2 does not work with 6K5K (183 MiB); I don't know what the value of nb_queues is. Maybe the realistic use case is a user handling 6K5K with the biggest GPU available, so a default that crashes on such big content may not hurt much.

The encoder catches the allocation error and prints a nice message; wouldn't it be possible to automatically reduce async_depth and retry instead of returning the error immediately, in the case where async_depth is not provided, and error out only if -async_depth 1 does not work either?
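To make the idea concrete, here is a rough standalone sketch of the fallback logic I have in mind (not actual FFmpeg code; try_setup_with_depth() is only a stand-in for whatever per-depth allocation the encoder really does):

#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in: pretend any depth above 1 runs out of VRAM. */
static int try_setup_with_depth(int depth)
{
    return depth > 1 ? -ENOMEM : 0;
}

static int setup_with_fallback(int depth, int user_set_depth)
{
    int ret = try_setup_with_depth(depth);

    /* Only auto-retry when the user left async_depth at its default. */
    while (ret == -ENOMEM && !user_set_depth && depth > 1) {
        depth--;
        fprintf(stderr, "Allocation failed, retrying with async_depth=%d\n",
                depth);
        ret = try_setup_with_depth(depth);
    }
    return ret;
}

int main(void)
{
    return setup_with_fallback(4, 0) < 0;
}

The important part is that the auto-retry only happens when the user did not set async_depth explicitly, so an explicit value still fails loudly.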

than to document it and hope users follow the instructions in case they run out of memory.


If automatically trying smaller values is not possible, could the error message at least say "use -async_depth with a value smaller than <the current value>"?
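Something along these lines would already help (a wording illustration only, not a full file; the log context pointer and how the current option value is obtained are placeholders):

#include <libavutil/error.h>
#include <libavutil/log.h>

/* Illustration only: in the encoder this would log on the codec context
 * and use the real option value instead of a parameter. */
static int report_async_depth_oom(void *log_ctx, int current_async_depth)
{
    av_log(log_ctx, AV_LOG_ERROR,
           "Out of memory; try -async_depth with a value smaller than %d.\n",
           current_async_depth);
    return AVERROR(ENOMEM);
}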



The good news is that -async_depth 1 uses less VRAM than before this patch. Most of the VRAM used comes from somewhere within Nvidia's black-box driver, as RADV uses 1/3rd of the VRAM with the same content and async_depth settings. Nothing we can do about this either.


This also improves error resilience if an allocation fails,
and properly cleans up after itself if it does.

It looks like this part does not work; there is still a freeze if an allocation fails.


This is due to Nvidia's drivers. If you switch to using their GSP firmware, recovery is pretty much instant.

This is beyond my knowledge, and it does not make things worse, so it is not blocking.
