Re: [FFmpeg-devel] [PATCH] avfilter: add CUDA stack filters (hstack_cuda, vstack_cuda, xstack_cuda)
Thanks for pointing that out. Submitted v2:
https://patchwork.ffmpeg.org/project/ffmpeg/patch/20250527092731.51819-1-f1k2fa...@gmail.com/

Changes in v2:
- Fixed documentation copy-paste error in hstack_cuda section
- Corrected filter reference from @ref{vstack} to @ref{hstack}
- Fixed typo "orignal" -> "original"
- Improved description accuracy for horizontal stacking

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH v2] avfilter: add CUDA stack filters (hstack_cuda, vstack_cuda, xstack_cuda)
Add hardware-accelerated stack filters for CUDA that provide equivalent
functionality to the software stack filters but with GPU acceleration.

Features:
- Support for hstack, vstack, and xstack operations
- Compatible pixel formats such as:
  yuv420p, nv12, yuv444p, p010le, p016le, yuv444p16le, rgb0, bgr0, rgba, bgra
- Fill color support with automatic RGB to YUV conversion for YUV formats
- Proper chroma subsampling handling for all supported formats
- Integration with existing stack filter infrastructure via stack_internal.h

The implementation follows the established CUDA filter pattern from
vf_scale_cuda.c, using PTX modules for kernel execution and proper
CUDA context management. Copy operations handle frame placement while
color operations fill background areas when using fill colors.

This enables efficient video composition workflows entirely on GPU
without CPU-GPU memory transfers, significantly improving performance
for multi-input video processing pipelines.

Examples:
$ ffmpeg -hwaccel cuda -i input.h265 -filter_complex "[0:v][0:v]hstack_cuda" -c:v hevc_nvenc out.h265

$ ffmpeg \
    -hwaccel cuda -i input1.mp4 \
    -hwaccel cuda -i input2.mp4 \
    -hwaccel cuda -i input3.mp4 \
    -hwaccel cuda -i input4.mp4 \
    -filter_complex "[0:v]hwupload_cuda[0v];[1:v]hwupload_cuda[1v];[2:v]hwupload_cuda[2v];[3:v]hwupload_cuda[3v];[0v][1v][2v][3v]xstack_cuda=inputs=4:fill=black:layout=0_0|w0_0|0_h0|w0_h0" \
    -c:v hevc_nvenc out.mp4

Signed-off-by: Faeez Kadiri
---
 Changelog                    |   1 +
 configure                    |   6 +
 doc/filters.texi             |  78 +
 libavfilter/Makefile         |   3 +
 libavfilter/allfilters.c     |   3 +
 libavfilter/vf_stack_cuda.c  | 589 +++
 libavfilter/vf_stack_cuda.cu | 389 +++
 7 files changed, 1069 insertions(+)
 create mode 100644 libavfilter/vf_stack_cuda.c
 create mode 100644 libavfilter/vf_stack_cuda.cu

diff --git a/Changelog b/Changelog
index 4217449438..0dec3443d4 100644
--- a/Changelog
+++ b/Changelog
@@ -18,6 +18,7 @@ version :
 - APV encoding support through a libopenapv wrapper
 - VVC decoder supports all content of SCC (Screen Content Coding):
   IBC (Inter Block Copy), Palette Mode and ACT (Adaptive Color Transform
+- hstack_cuda, vstack_cuda and xstack_cuda filters
 
 
 version 7.1:
diff --git a/configure b/configure
index 3730b0524c..5c2d6e132d 100755
--- a/configure
+++ b/configure
@@ -4033,6 +4033,12 @@ xfade_vulkan_filter_deps="vulkan spirv_compiler"
 yadif_cuda_filter_deps="ffnvcodec"
 yadif_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
 yadif_videotoolbox_filter_deps="metal corevideo videotoolbox"
+hstack_cuda_filter_deps="ffnvcodec"
+hstack_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
+vstack_cuda_filter_deps="ffnvcodec"
+vstack_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
+xstack_cuda_filter_deps="ffnvcodec"
+xstack_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
 hstack_vaapi_filter_deps="vaapi_1"
 vstack_vaapi_filter_deps="vaapi_1"
 xstack_vaapi_filter_deps="vaapi_1"
diff --git a/doc/filters.texi b/doc/filters.texi
index 6d2df07508..f616843880 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -26850,6 +26850,84 @@ Only deinterlace frames marked as interlaced.
 The default value is @code{all}.
 @end table
 
+@section hstack_cuda
+Stack input videos horizontally.
+
+This is the CUDA variant of the @ref{hstack} filter, each input stream may
+have different height, this filter will scale down/up each input stream while
+keeping the original aspect ratio.
+
+It accepts the following options:
+
+@table @option
+@item inputs
+See @ref{hstack}.
+
+@item shortest
+See @ref{hstack}.
+
+@item height
+Set height of output. If set to 0, this filter will set height of output to
+height of the first input stream. Default value is 0.
+@end table
+
+@section vstack_cuda
+Stack input videos vertically.
+
+This is the CUDA variant of the @ref{vstack} filter, each input stream may
+have different width, this filter will scale down/up each input stream while
+keeping the original aspect ratio.
+
+It accepts the following options:
+
+@table @option
+@item inputs
+See @ref{vstack}.
+
+@item shortest
+See @ref{vstack}.
+
+@item width
+Set width of output. If set to 0, this filter will set width of output to
+width of the first input stream. Default value is 0.
+@end table
+
+@section xstack_cuda
+Stack video inputs into custom layout.
+
+This is the CUDA variant of the @ref{xstack} filter, each input stream may
+have different size, this filter will scale down/up each input stream to the
+given output size, or the size of the first input stream.
+
+It accepts the following options:
+
+@table @option
+@item inputs
+See @ref{xstack}.
+
+@item shortest
+See @ref{xstack}.
+
+@item layout
+See @ref{xstack}.
+Moreover, this permits the user to supply output
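The hstack_cuda and vstack_cuda sections describe two rules. Each input is scaled to a common height (or width) while keeping its aspect ratio, and a value of 0 means "use the first input's size". The C sketch below shows that size arithmetic on the CPU. `stack_layout` is a hypothetical helper, not code from the patch. Rounding to the nearest value and then down to an even size is an assumption, chosen so 4:2:0 chroma planes stay aligned.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout helper for hstack/vstack-style stacking.
 *   along[i]  : input size along the stacking axis
 *               (width for hstack, height for vstack)
 *   across[i] : input size across the axis
 *               (height for hstack, width for vstack)
 *   target    : common "across" size; 0 means "use the first input's"
 * Each input is scaled so across[i] becomes target, keeping its aspect
 * ratio. scaled[i] receives the new "along" size and offset[i] its
 * position on the axis. Returns the total "along" size of the output.
 */
static int stack_layout(const int *along, const int *across, int n,
                        int target, int *scaled, int *offset)
{
    assert(n > 0);
    if (target == 0)
        target = across[0];

    int pos = 0;
    for (int i = 0; i < n; i++) {
        assert(along[i] > 0 && across[i] > 0);
        /* along * target / across, rounded to the nearest integer */
        int64_t s = ((int64_t)along[i] * target + across[i] / 2) / across[i];
        /* Assumption: force an even size so subsampled chroma stays aligned */
        s &= ~(int64_t)1;
        if (s < 2)
            s = 2;
        scaled[i] = (int)s;
        offset[i] = pos;
        pos += (int)s;
    }
    return pos;
}
```

Passing (widths, heights) gives the hstack case, and passing (heights, widths) gives the vstack case. That is why one function can serve both filters.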
Re: [FFmpeg-devel] [PATCH] avfilter: add CUDA stack filters (hstack_cuda, vstack_cuda, xstack_cuda)
Hi all,

Friendly ping on the patch below (sent 23 May; Patchwork link:
https://patchwork.ffmpeg.org/project/ffmpeg/patch/20250523215814.365246-1-f1k2fa...@gmail.com/ ).

Patch summary:
* Adds a CUDA implementation of the existing stack_* filter family
  (parallels stack_qsv / stack_vaapi).
* Supports up to 16 inputs and both horizontal/vertical layouts.

If anything needs adjustment (coding style, FATE naming, etc.), please let me
know and I'll resend an updated v2.

Many thanks for your time!

Best regards,
Faeez Kadiri

On Sat, May 24, 2025 at 3:28 AM Faeez Kadiri wrote:
> Add hardware-accelerated stack filters for CUDA that provide equivalent
> functionality to the software stack filters but with GPU acceleration.
>
> Features:
> - Support for hstack, vstack, and xstack operations
> - Compatible pixel formats such as:
>   yuv420p, nv12, yuv444p, p010le, p016le, yuv444p16le, rgb0, bgr0, rgba,
>   bgra
> - Fill color support with automatic RGB to YUV conversion for YUV formats
> - Proper chroma subsampling handling for all supported formats
> - Integration with existing stack filter infrastructure via
>   stack_internal.h
>
> The implementation follows the established CUDA filter pattern from
> vf_scale_cuda.c, using PTX modules for kernel execution and proper
> CUDA context management. Copy operations handle frame placement while
> color operations fill background areas when using fill colors.
>
> This enables efficient video composition workflows entirely on GPU
> without CPU-GPU memory transfers, significantly improving performance
> for multi-input video processing pipelines.
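The patch description lists fill color support "with automatic RGB to YUV conversion for YUV formats". To show what that conversion involves, here is a small C sketch. It maps an 8-bit RGB color to limited-range Y'CbCr code values at a given bit depth. Several details are assumptions rather than taken from the patch: the BT.709 coefficients, the limited-range choice, and the names `YuvColor` and `rgb_to_yuv_limited`. The patch may pick its matrix and range differently, for example from the output colorspace.

```c
#include <assert.h>

/* Hypothetical helper type: one fill color as Y'CbCr code values. */
typedef struct YuvColor {
    int y, u, v;
} YuvColor;

/*
 * Convert an 8-bit R'G'B' fill color to limited-range Y'CbCr at the given
 * bit depth. Assumption: BT.709 coefficients (Kr = 0.2126, Kb = 0.0722).
 */
static YuvColor rgb_to_yuv_limited(int r, int g, int b, int bit_depth)
{
    assert(bit_depth >= 8 && bit_depth <= 16);
    const double kr = 0.2126, kb = 0.0722, kg = 1.0 - kr - kb;
    double rn = r / 255.0, gn = g / 255.0, bn = b / 255.0;

    double ey  = kr * rn + kg * gn + kb * bn;    /* luma, in [0, 1]        */
    double epb = (bn - ey) / (2.0 * (1.0 - kb)); /* blue diff, [-0.5, 0.5] */
    double epr = (rn - ey) / (2.0 * (1.0 - kr)); /* red diff,  [-0.5, 0.5] */

    /* Limited range: Y in [16, 235] and Cb/Cr in [16, 240] at 8 bits,
     * scaled by 2^(bit_depth - 8) for deeper formats. */
    double scale = (double)(1 << (bit_depth - 8));
    YuvColor c;
    /* All values are positive here, so adding 0.5 and truncating rounds. */
    c.y = (int)((16.0 + 219.0 * ey) * scale + 0.5);
    c.u = (int)((128.0 + 224.0 * epb) * scale + 0.5);
    c.v = (int)((128.0 + 224.0 * epr) * scale + 0.5);
    return c;
}
```

Some formats store samples MSB-aligned, such as p010le. For those, the 10-bit code value would also need a left shift by 6 before being written to a 16-bit sample. That packing step is not shown here.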
Re: [FFmpeg-devel] [PATCH v2] avfilter: add CUDA stack filters (hstack_cuda, vstack_cuda, xstack_cuda)
Hi all,

Just a gentle reminder regarding my patch submission:

*[PATCH v2] avfilter: add CUDA stack filters (hstack_cuda, vstack_cuda, xstack_cuda)*
https://patchwork.ffmpeg.org/project/ffmpeg/patch/20250527092731.51819-1-f1k2fa...@gmail.com/

This version addresses the feedback from the initial submission (thanks
Marvin for pointing out the typo, now corrected). I'd appreciate it if
someone could take a look and share any further thoughts.

Once this patch is accepted, I plan to begin work on a *pad_cuda* filter,
reusing the existing CUDA kernels from *stack_cuda*.

CCing Marvin for visibility, since he provided the earlier feedback.

Thanks in advance!
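Building pad_cuda on the stack_cuda kernels is plausible because both filters reduce to the same two per-plane operations. First, a color fill paints the output canvas. Second, a copy places each input at an offset. The C sketch below is a hypothetical CPU reference for 8-bit yuv420p. The `Plane` struct and all function names are illustrative, not from the patch. The sketch also shows the chroma-subsampling detail: for 4:2:0, luma offsets are halved on the chroma planes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU stand-in for one 8-bit plane (not FFmpeg's AVFrame). */
typedef struct Plane {
    uint8_t *data;
    int linesize;      /* bytes per row; may be larger than width */
    int width, height;
} Plane;

/* Chroma size for a luma size and a log2 subsampling factor, rounded up
 * so odd luma sizes still get a final chroma sample. */
static int chroma_dim(int luma, int shift)
{
    return (luma + (1 << shift) - 1) >> shift;
}

/* "Color" step: paint the whole plane with a single code value. */
static void fill_plane(Plane *p, uint8_t value)
{
    for (int y = 0; y < p->height; y++)
        memset(p->data + (size_t)y * p->linesize, value, (size_t)p->width);
}

/* "Copy" step: place src at (x, y) inside dst, clipping at the right and
 * bottom edges. */
static void copy_plane_at(Plane *dst, const Plane *src, int x, int y)
{
    assert(x >= 0 && y >= 0);
    int n = src->width;
    if (x + n > dst->width)
        n = dst->width - x;
    if (n <= 0)
        return;
    for (int row = 0; row < src->height && y + row < dst->height; row++)
        memcpy(dst->data + (size_t)(y + row) * dst->linesize + x,
               src->data + (size_t)row * src->linesize, (size_t)n);
}

/* Place a yuv420p input (Y, U, V planes) at luma offset (x, y). For 4:2:0
 * both chroma shifts are 1, so the chroma offsets are x >> 1 and y >> 1;
 * even offsets keep luma and chroma aligned. */
static void place_yuv420p(Plane dst[3], const Plane src[3], int x, int y)
{
    assert(x % 2 == 0 && y % 2 == 0);
    copy_plane_at(&dst[0], &src[0], x, y);
    copy_plane_at(&dst[1], &src[1], x >> 1, y >> 1);
    copy_plane_at(&dst[2], &src[2], x >> 1, y >> 1);
}
```

Under this model, a pad is a single `place_yuv420p` call after filling the canvas. A stack is one call per input, with offsets taken from the layout.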
[FFmpeg-devel] [PATCH] avfilter: add CUDA stack filters (hstack_cuda, vstack_cuda, xstack_cuda)
Add hardware-accelerated stack filters for CUDA that provide equivalent
functionality to the software stack filters but with GPU acceleration.

Features:
- Support for hstack, vstack, and xstack operations
- Compatible pixel formats such as:
  yuv420p, nv12, yuv444p, p010le, p016le, yuv444p16le, rgb0, bgr0, rgba, bgra
- Fill color support with automatic RGB to YUV conversion for YUV formats
- Proper chroma subsampling handling for all supported formats
- Integration with existing stack filter infrastructure via stack_internal.h

The implementation follows the established CUDA filter pattern from
vf_scale_cuda.c, using PTX modules for kernel execution and proper
CUDA context management. Copy operations handle frame placement while
color operations fill background areas when using fill colors.

This enables efficient video composition workflows entirely on GPU
without CPU-GPU memory transfers, significantly improving performance
for multi-input video processing pipelines.

Examples:
$ ffmpeg -hwaccel cuda -i input.h265 -filter_complex "[0:v][0:v]hstack_cuda" -c:v hevc_nvenc out.h265

$ ffmpeg \
    -hwaccel cuda -i input1.mp4 \
    -hwaccel cuda -i input2.mp4 \
    -hwaccel cuda -i input3.mp4 \
    -hwaccel cuda -i input4.mp4 \
    -filter_complex "[0:v]hwupload_cuda[0v];[1:v]hwupload_cuda[1v];[2:v]hwupload_cuda[2v];[3:v]hwupload_cuda[3v];[0v][1v][2v][3v]xstack_cuda=inputs=4:fill=black:layout=0_0|w0_0|0_h0|w0_h0" \
    -c:v hevc_nvenc out.mp4

Signed-off-by: Faeez Kadiri
---
 Changelog                    |   1 +
 configure                    |   6 +
 doc/filters.texi             |  78 +
 libavfilter/Makefile         |   3 +
 libavfilter/allfilters.c     |   3 +
 libavfilter/vf_stack_cuda.c  | 589 +++
 libavfilter/vf_stack_cuda.cu | 389 +++
 7 files changed, 1069 insertions(+)
 create mode 100644 libavfilter/vf_stack_cuda.c
 create mode 100644 libavfilter/vf_stack_cuda.cu

diff --git a/Changelog b/Changelog
index 4217449438..0dec3443d4 100644
--- a/Changelog
+++ b/Changelog
@@ -18,6 +18,7 @@ version :
 - APV encoding support through a libopenapv wrapper
 - VVC decoder supports all content of SCC (Screen Content Coding):
   IBC (Inter Block Copy), Palette Mode and ACT (Adaptive Color Transform
+- hstack_cuda, vstack_cuda and xstack_cuda filters
 
 
 version 7.1:
diff --git a/configure b/configure
index 3730b0524c..5c2d6e132d 100755
--- a/configure
+++ b/configure
@@ -4033,6 +4033,12 @@ xfade_vulkan_filter_deps="vulkan spirv_compiler"
 yadif_cuda_filter_deps="ffnvcodec"
 yadif_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
 yadif_videotoolbox_filter_deps="metal corevideo videotoolbox"
+hstack_cuda_filter_deps="ffnvcodec"
+hstack_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
+vstack_cuda_filter_deps="ffnvcodec"
+vstack_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
+xstack_cuda_filter_deps="ffnvcodec"
+xstack_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
 hstack_vaapi_filter_deps="vaapi_1"
 vstack_vaapi_filter_deps="vaapi_1"
 xstack_vaapi_filter_deps="vaapi_1"
diff --git a/doc/filters.texi b/doc/filters.texi
index 6d2df07508..1c9afac9eb 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -26850,6 +26850,84 @@ Only deinterlace frames marked as interlaced.
 The default value is @code{all}.
 @end table
 
+@section hstack_cuda
+Stack input videos horizontally.
+
+This is the CUDA variant of the @ref{vstack} filter, each input stream may
+have different width, this filter will scale down/up each input stream while
+keeping the orignal aspect.
+
+It accepts the following options:
+
+@table @option
+@item inputs
+See @ref{hstack}.
+
+@item shortest
+See @ref{hstack}.
+
+@item height
+Set height of output. If set to 0, this filter will set height of output to
+height of the first input stream. Default value is 0.
+@end table
+
+@section vstack_cuda
+Stack input videos vertically.
+
+This is the CUDA variant of the @ref{vstack} filter, each input stream may
+have different width, this filter will scale down/up each input stream while
+keeping the orignal aspect.
+
+It accepts the following options:
+
+@table @option
+@item inputs
+See @ref{vstack}.
+
+@item shortest
+See @ref{vstack}.
+
+@item width
+Set width of output. If set to 0, this filter will set width of output to
+width of the first input stream. Default value is 0.
+@end table
+
+@section xstack_cuda
+Stack video inputs into custom layout.
+
+This is the CUDA variant of the @ref{xstack} filter, each input stream may
+have different size, this filter will scale down/up each input stream to the
+given output size, or the size of the first input stream.
+
+It accepts the following options:
+
+@table @option
+@item inputs
+See @ref{xstack}.
+
+@item shortest
+See @ref{xstack}.
+
+@item layout
+See @ref{xstack}.
+Moreover, this permits the user to supply output size for each
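The examples use the xstack layout syntax `0_0|w0_0|0_h0|w0_h0`. As described for the software xstack filter, each input's position is written `X_Y`. Each coordinate is a number or `wN`/`hN` (the width or height of input N), and several terms can be summed with `+`. The C sketch below is a hypothetical parser for that subset, plus the resulting canvas size. The function names are illustrative. It works on the raw input sizes and ignores any per-input output size the CUDA variant may accept.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Evaluate one coordinate such as "0", "w0" or "w0+w1+16", advancing *s
 * past it. wN / hN are the width / height of input N. Returns -1 on error. */
static int eval_coord(const char **s, const int *w, const int *h, int n)
{
    const char *p = *s;
    int sum = 0;
    for (;;) {
        char *end;
        int term;
        if (*p == 'w' || *p == 'h') {
            if (!isdigit((unsigned char)p[1]))
                return -1;
            long idx = strtol(p + 1, &end, 10);
            if (idx >= n)
                return -1;            /* reference to a missing input */
            term = (*p == 'w') ? w[idx] : h[idx];
        } else {
            if (!isdigit((unsigned char)*p))
                return -1;
            term = (int)strtol(p, &end, 10);
        }
        sum += term;
        p = end;
        if (*p != '+')
            break;
        p++;                          /* skip '+' and read the next term */
    }
    *s = p;
    return sum;
}

/* Parse "X0_Y0|X1_Y1|..." for n inputs into x[] and y[]. Returns 0 or -1. */
static int parse_layout(const char *layout, const int *w, const int *h,
                        int n, int *x, int *y)
{
    assert(n > 0);
    const char *s = layout;
    for (int i = 0; i < n; i++) {
        if ((x[i] = eval_coord(&s, w, h, n)) < 0 || *s++ != '_')
            return -1;
        if ((y[i] = eval_coord(&s, w, h, n)) < 0)
            return -1;
        if (i < n - 1 && *s++ != '|')
            return -1;
    }
    return *s == '\0' ? 0 : -1;
}

/* Output canvas = bounding box of all placed inputs. */
static void canvas_size(const int *w, const int *h, const int *x,
                        const int *y, int n, int *out_w, int *out_h)
{
    *out_w = *out_h = 0;
    for (int i = 0; i < n; i++) {
        if (x[i] + w[i] > *out_w)
            *out_w = x[i] + w[i];
        if (y[i] + h[i] > *out_h)
            *out_h = y[i] + h[i];
    }
}
```

Take the 2x2 grid from the example with four 640x360 inputs. The layout resolves to offsets (0,0), (640,0), (0,360) and (640,360), which gives a 1280x720 canvas.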