Re: [FFmpeg-devel] [PATCH] avfilter: add CUDA stack filters (hstack_cuda, vstack_cuda, xstack_cuda)

2025-05-27 Thread faeez kadiri
Thanks for pointing that out.

Submitted v2
https://patchwork.ffmpeg.org/project/ffmpeg/patch/20250527092731.51819-1-f1k2fa...@gmail.com/

Changes in v2:
- Fixed documentation copy-paste error in hstack_cuda section
- Corrected filter reference from @ref{vstack} to @ref{hstack}
- Fixed typo "orignal" -> "original"
- Improved description accuracy for horizontal stacking
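
The corrected hstack_cuda description says that inputs of different height are
scaled to a common output height while keeping their aspect ratio. A minimal C
sketch of that arithmetic is below. `scaled_width` and `hstack_output_width`
are hypothetical helpers for illustration, not functions from the patch, and
the round-to-nearest rule is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the patch: width an input gets when it
 * is scaled to the common output height with its aspect ratio preserved.
 * Rounds to the nearest integer (assumed rounding rule). */
static int scaled_width(int in_w, int in_h, int out_h)
{
    assert(in_w > 0 && in_h > 0 && out_h > 0);
    return (int)(((int64_t)in_w * out_h + in_h / 2) / in_h);
}

/* Total width of a horizontal stack: every input is scaled to out_h
 * (the height of the first input when out_h == 0, as the "height" option
 * is documented) and the scaled widths are summed. */
static int hstack_output_width(const int *w, const int *h, int n, int out_h)
{
    int total = 0;

    assert(w && h && n > 0);
    if (out_h == 0)
        out_h = h[0];
    for (int i = 0; i < n; i++)
        total += scaled_width(w[i], h[i], out_h);
    return total;
}
```

With a 1920x1080 and a 1280x720 input and height left at 0, both inputs end up
1920 wide at height 1080, so the stacked output would be 3840x1080.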
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


[FFmpeg-devel] [PATCH v2] avfilter: add CUDA stack filters (hstack_cuda, vstack_cuda, xstack_cuda)

2025-05-27 Thread Faeez Kadiri
Add hardware-accelerated stack filters for CUDA that provide equivalent
functionality to the software stack filters but with GPU acceleration.

Features:
- Support for hstack, vstack, and xstack operations
- Compatible pixel formats such as:
  yuv420p, nv12, yuv444p, p010le, p016le, yuv444p16le, rgb0, bgr0, rgba, bgra
- Fill color support with automatic RGB to YUV conversion for YUV formats
- Proper chroma subsampling handling for all supported formats
- Integration with existing stack filter infrastructure via stack_internal.h
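
To illustrate the fill color item above: a fill color given as RGB has to be
converted before it can be written into YUV planes. The sketch below uses the
common BT.601 limited-range integer approximation. Which matrix and range the
patch actually uses is not stated in this thread, so that choice is an
assumption, and `rgb_to_yuv_bt601` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert an 8-bit RGB fill color to 8-bit limited-range
 * YUV with the common BT.601 integer approximation. The matrix and range used
 * by the actual patch are not stated in the thread; this is an assumption. */
static void rgb_to_yuv_bt601(uint8_t r, uint8_t g, uint8_t b,
                             uint8_t *y, uint8_t *u, uint8_t *v)
{
    assert(y && u && v);
    /* 32768 (= 128 << 8) folds the +128 chroma offset into the sum, so the
     * value being shifted is never negative. */
    *y = (uint8_t)(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
    *u = (uint8_t)((-38 * r - 74 * g + 112 * b + 128 + 32768) >> 8);
    *v = (uint8_t)((112 * r - 94 * g - 18 * b + 128 + 32768) >> 8);
}
```

Under these assumptions fill=black maps to Y=16, U=128, V=128 and white maps
to Y=235, U=128, V=128.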

The implementation follows the established CUDA filter pattern from
vf_scale_cuda.c, using PTX modules for kernel execution and proper
CUDA context management. Copy operations handle frame placement while
color operations fill background areas when using fill colors.
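
As a rough CPU-side picture of what a copy operation has to do, the C sketch
below places one 8-bit plane of an input into the output plane at a given
offset, shifting the offset and size for subsampled chroma planes. It is not
the CUDA kernel from the patch. `plane_dim` and `copy_plane` are hypothetical
names, and the caller is assumed to guarantee that the destination is large
enough.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Plane dimension for a plane subsampled by 2^shift, rounded up
 * (shift is 0 for luma and 4:4:4 chroma, 1 for 4:2:0 chroma). */
static int plane_dim(int luma_dim, int shift)
{
    return (luma_dim + (1 << shift) - 1) >> shift;
}

/* Copy one 8-bit source plane into a destination plane.
 * x, y, src_w and src_h are given in luma samples; shift_w and shift_h are
 * the subsampling shifts of the plane being copied. The caller must make
 * sure the destination plane can hold the copied rectangle. */
static void copy_plane(uint8_t *dst, int dst_linesize,
                       const uint8_t *src, int src_linesize,
                       int src_w, int src_h, int x, int y,
                       int shift_w, int shift_h)
{
    int w  = plane_dim(src_w, shift_w);
    int h  = plane_dim(src_h, shift_h);
    int px = x >> shift_w;
    int py = y >> shift_h;

    assert(dst && src && px >= 0 && py >= 0);
    for (int row = 0; row < h; row++)
        memcpy(dst + (size_t)(py + row) * dst_linesize + px,
               src + (size_t)row * src_linesize, (size_t)w);
}
```

For yuv420p the luma plane would be copied with both shifts at 0 and each
chroma plane with both shifts at 1. A GPU kernel would typically assign one
thread per output sample instead of looping over rows.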

This enables efficient video composition workflows entirely on GPU
without CPU-GPU memory transfers, significantly improving performance
for multi-input video processing pipelines.

Examples:
$ ffmpeg -hwaccel cuda -i input.h265 -filter_complex "[0:v][0:v]hstack_cuda" \
  -c:v hevc_nvenc out.h265

$ ffmpeg \
  -hwaccel cuda -i input1.mp4 \
  -hwaccel cuda -i input2.mp4 \
  -hwaccel cuda -i input3.mp4 \
  -hwaccel cuda -i input4.mp4 \
  -filter_complex "[0:v]hwupload_cuda[0v];[1:v]hwupload_cuda[1v];[2:v]hwupload_cuda[2v];[3:v]hwupload_cuda[3v];[0v][1v][2v][3v]xstack_cuda=inputs=4:fill=black:layout=0_0|w0_0|0_h0|w0_h0" \
  -c:v hevc_nvenc out.mp4
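
The layout string in the second example follows the xstack syntax: positions
are separated by '|', each position is X_Y, and a coordinate is a sum of
literal numbers and wN/hN terms (width/height of input N) joined by '+'. The C
sketch below resolves such a string into pixel offsets. `eval_coord` and
`resolve_layout` are hypothetical helpers written for illustration, not code
from the patch, and they do not handle the optional per-input output size that
the xstack_cuda documentation mentions.

```c
#include <assert.h>
#include <string.h>

/* Evaluate one coordinate expression such as "0", "w0" or "w0+w1".
 * w[] and h[] hold the n input sizes. Returns -1 on a malformed term. */
static int eval_coord(const char *s, size_t len,
                      const int *w, const int *h, int n)
{
    int sum = 0;
    size_t i = 0;

    while (i < len) {
        const int *dim = NULL;
        int val = 0, digits = 0;

        if (s[i] == 'w' || s[i] == 'h')
            dim = (s[i++] == 'w') ? w : h;
        while (i < len && s[i] >= '0' && s[i] <= '9') {
            val = val * 10 + (s[i++] - '0');
            digits++;
        }
        if (!digits || (dim && val >= n))
            return -1;
        sum += dim ? dim[val] : val;
        if (i < len) {
            /* Only '+' may follow a term, and it must be followed by one. */
            if (s[i] != '+' || i + 1 == len)
                return -1;
            i++;
        }
    }
    return len ? sum : -1;
}

/* Resolve a layout string into per-input x/y offsets.
 * Returns the number of positions parsed (at most n), or -1 on error. */
static int resolve_layout(const char *layout, const int *w, const int *h,
                          int n, int *x, int *y)
{
    const char *p = layout;
    int count = 0;

    assert(layout && w && h && x && y);
    while (count < n) {
        size_t item = strcspn(p, "|");
        const char *sep = memchr(p, '_', item);

        if (!sep)
            return -1;
        x[count] = eval_coord(p, (size_t)(sep - p), w, h, n);
        y[count] = eval_coord(sep + 1, item - (size_t)(sep - p) - 1, w, h, n);
        if (x[count] < 0 || y[count] < 0)
            return -1;
        count++;
        if (p[item] == '\0')
            break;
        p += item + 1;
    }
    return count;
}
```

For four 1280x720 inputs, "0_0|w0_0|0_h0|w0_h0" resolves to the offsets
(0,0), (1280,0), (0,720) and (1280,720), i.e. a 2x2 grid.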

Signed-off-by: Faeez Kadiri 
---
 Changelog                    |   1 +
 configure                    |   6 +
 doc/filters.texi             |  78 +
 libavfilter/Makefile         |   3 +
 libavfilter/allfilters.c     |   3 +
 libavfilter/vf_stack_cuda.c  | 589 +++
 libavfilter/vf_stack_cuda.cu | 389 +++
 7 files changed, 1069 insertions(+)
 create mode 100644 libavfilter/vf_stack_cuda.c
 create mode 100644 libavfilter/vf_stack_cuda.cu

diff --git a/Changelog b/Changelog
index 4217449438..0dec3443d4 100644
--- a/Changelog
+++ b/Changelog
@@ -18,6 +18,7 @@ version :
 - APV encoding support through a libopenapv wrapper
 - VVC decoder supports all content of SCC (Screen Content Coding):
   IBC (Inter Block Copy), Palette Mode and ACT (Adaptive Color Transform
+- hstack_cuda, vstack_cuda and xstack_cuda filters
 
 
 version 7.1:
diff --git a/configure b/configure
index 3730b0524c..5c2d6e132d 100755
--- a/configure
+++ b/configure
@@ -4033,6 +4033,12 @@ xfade_vulkan_filter_deps="vulkan spirv_compiler"
 yadif_cuda_filter_deps="ffnvcodec"
 yadif_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
 yadif_videotoolbox_filter_deps="metal corevideo videotoolbox"
+hstack_cuda_filter_deps="ffnvcodec"
+hstack_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
+vstack_cuda_filter_deps="ffnvcodec"
+vstack_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
+xstack_cuda_filter_deps="ffnvcodec"
+xstack_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
 hstack_vaapi_filter_deps="vaapi_1"
 vstack_vaapi_filter_deps="vaapi_1"
 xstack_vaapi_filter_deps="vaapi_1"
diff --git a/doc/filters.texi b/doc/filters.texi
index 6d2df07508..f616843880 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -26850,6 +26850,84 @@ Only deinterlace frames marked as interlaced.
 The default value is @code{all}.
 @end table
 
+@section hstack_cuda
+Stack input videos horizontally.
+
+This is the CUDA variant of the @ref{hstack} filter, each input stream may
+have different height, this filter will scale down/up each input stream while
+keeping the original aspect ratio.
+
+It accepts the following options:
+
+@table @option
+@item inputs
+See @ref{hstack}.
+
+@item shortest
+See @ref{hstack}.
+
+@item height
+Set height of output. If set to 0, this filter will set height of output to
+height of the first input stream. Default value is 0.
+@end table
+
+@section vstack_cuda
+Stack input videos vertically.
+
+This is the CUDA variant of the @ref{vstack} filter, each input stream may
+have different width, this filter will scale down/up each input stream while
+keeping the original aspect ratio.
+
+It accepts the following options:
+
+@table @option
+@item inputs
+See @ref{vstack}.
+
+@item shortest
+See @ref{vstack}.
+
+@item width
+Set width of output. If set to 0, this filter will set width of output to
+width of the first input stream. Default value is 0.
+@end table
+
+@section xstack_cuda
+Stack video inputs into custom layout.
+
+This is the CUDA variant of the @ref{xstack} filter,  each input stream may
+have different size, this filter will scale down/up each input stream to the
+given output size, or the size of the first input stream.
+
+It accepts the following options:
+
+@table @option
+@item inputs
+See @ref{xstack}.
+
+@item shortest
+See @ref{xstack}.
+
+@item layout
+See @ref{xstack}.
+Moreover, this permits the user to supply output

Re: [FFmpeg-devel] [PATCH] avfilter: add CUDA stack filters (hstack_cuda, vstack_cuda, xstack_cuda)

2025-05-26 Thread faeez kadiri
Hi all,

Friendly ping on the patch below (sent 23 May, link in Patchwork:
https://patchwork.ffmpeg.org/project/ffmpeg/patch/20250523215814.365246-1-f1k2fa...@gmail.com/
).

Patch summary
-------------
* Adds a CUDA implementation of the existing stack_* filter family
(parallels stack_qsv / stack_vaapi).
* Supports up to 16 inputs and both horizontal/vertical layouts.

If anything needs adjustment (coding style, FATE naming, etc.), please let
me know and I’ll send an updated v2.

Many thanks for your time!

Best regards,
Faeez Kadiri

On Sat, May 24, 2025 at 3:28 AM Faeez Kadiri wrote:

> Add hardware-accelerated stack filters for CUDA that provide equivalent
> functionality to the software stack filters but with GPU acceleration.
>
> Features:
> - Support for hstack, vstack, and xstack operations
> - Compatible pixel formats such as:
>   yuv420p, nv12, yuv444p, p010le, p016le, yuv444p16le, rgb0, bgr0, rgba, bgra
> - Fill color support with automatic RGB to YUV conversion for YUV formats
> - Proper chroma subsampling handling for all supported formats
> - Integration with existing stack filter infrastructure via stack_internal.h
>
> The implementation follows the established CUDA filter pattern from
> vf_scale_cuda.c, using PTX modules for kernel execution and proper
> CUDA context management. Copy operations handle frame placement while
> color operations fill background areas when using fill colors.
>
> This enables efficient video composition workflows entirely on GPU
> without CPU-GPU memory transfers, significantly improving performance
> for multi-input video processing pipelines.
>
> Examples:
> $ ffmpeg -hwaccel cuda -i input.h265 -filter_complex "[0:v][0:v]hstack_cuda" \
>   -c:v hevc_nvenc out.h265
>
> $ ffmpeg \
>   -hwaccel cuda -i input1.mp4 \
>   -hwaccel cuda -i input2.mp4 \
>   -hwaccel cuda -i input3.mp4 \
>   -hwaccel cuda -i input4.mp4 \
>   -filter_complex "[0:v]hwupload_cuda[0v];[1:v]hwupload_cuda[1v];[2:v]hwupload_cuda[2v];[3:v]hwupload_cuda[3v];[0v][1v][2v][3v]xstack_cuda=inputs=4:fill=black:layout=0_0|w0_0|0_h0|w0_h0" \
>   -c:v hevc_nvenc out.mp4
>
> Signed-off-by: Faeez Kadiri 
> ---
>  Changelog                    |   1 +
>  configure                    |   6 +
>  doc/filters.texi             |  78 +
>  libavfilter/Makefile         |   3 +
>  libavfilter/allfilters.c     |   3 +
>  libavfilter/vf_stack_cuda.c  | 589 +++
>  libavfilter/vf_stack_cuda.cu | 389 +++
>  7 files changed, 1069 insertions(+)
>  create mode 100644 libavfilter/vf_stack_cuda.c
>  create mode 100644 libavfilter/vf_stack_cuda.cu
>
> diff --git a/Changelog b/Changelog
> index 4217449438..0dec3443d4 100644
> --- a/Changelog
> +++ b/Changelog
> @@ -18,6 +18,7 @@ version :
>  - APV encoding support through a libopenapv wrapper
>  - VVC decoder supports all content of SCC (Screen Content Coding):
>    IBC (Inter Block Copy), Palette Mode and ACT (Adaptive Color Transform
> +- hstack_cuda, vstack_cuda and xstack_cuda filters
>
>
>  version 7.1:
> diff --git a/configure b/configure
> index 3730b0524c..5c2d6e132d 100755
> --- a/configure
> +++ b/configure
> @@ -4033,6 +4033,12 @@ xfade_vulkan_filter_deps="vulkan spirv_compiler"
>  yadif_cuda_filter_deps="ffnvcodec"
>  yadif_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
>  yadif_videotoolbox_filter_deps="metal corevideo videotoolbox"
> +hstack_cuda_filter_deps="ffnvcodec"
> +hstack_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
> +vstack_cuda_filter_deps="ffnvcodec"
> +vstack_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
> +xstack_cuda_filter_deps="ffnvcodec"
> +xstack_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
>  hstack_vaapi_filter_deps="vaapi_1"
>  vstack_vaapi_filter_deps="vaapi_1"
>  xstack_vaapi_filter_deps="vaapi_1"
> diff --git a/doc/filters.texi b/doc/filters.texi
> index 6d2df07508..1c9afac9eb 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -26850,6 +26850,84 @@ Only deinterlace frames marked as interlaced.
>  The default value is @code{all}.
>  @end table
>
> +@section hstack_cuda
> +Stack input videos horizontally.
> +
> +This is the CUDA variant of the @ref{vstack} filter, each input stream may
> +have different width, this filter will scale down/up each input stream while
> +keeping the orignal aspect.
> +
> +It accepts the following options:
> +
> +@table @option
> +@item inputs
> +See @ref{hstack}.
> +
> +@item shortest
> +See @ref{hstack}.
> +
> +@item height
> +Set height of output. If set to 0, this filter will set height

Re: [FFmpeg-devel] [PATCH v2] avfilter: add CUDA stack filters (hstack_cuda, vstack_cuda, xstack_cuda)

2025-05-28 Thread faeez kadiri
Hi all,

Just a gentle reminder regarding my patch submission:

[PATCH v2] avfilter: add CUDA stack filters (hstack_cuda, vstack_cuda,
xstack_cuda)
https://patchwork.ffmpeg.org/project/ffmpeg/patch/20250527092731.51819-1-f1k2fa...@gmail.com/

This version addresses the feedback from the initial submission (thanks
Marvin for pointing out the typo, now corrected). I'd appreciate it if
someone could take a look and share any further thoughts.

Once this patch is accepted, I plan to begin work on a *pad_cuda* filter,
reusing the existing CUDA kernels from *stack_cuda*.

CCing Marvin for visibility, since he provided the earlier feedback.

Thanks in advance!


[FFmpeg-devel] [PATCH] avfilter: add CUDA stack filters (hstack_cuda, vstack_cuda, xstack_cuda)

2025-05-23 Thread Faeez Kadiri
Add hardware-accelerated stack filters for CUDA that provide equivalent
functionality to the software stack filters but with GPU acceleration.

Features:
- Support for hstack, vstack, and xstack operations
- Compatible pixel formats such as:
  yuv420p, nv12, yuv444p, p010le, p016le, yuv444p16le, rgb0, bgr0, rgba, bgra
- Fill color support with automatic RGB to YUV conversion for YUV formats
- Proper chroma subsampling handling for all supported formats
- Integration with existing stack filter infrastructure via stack_internal.h

The implementation follows the established CUDA filter pattern from
vf_scale_cuda.c, using PTX modules for kernel execution and proper
CUDA context management. Copy operations handle frame placement while
color operations fill background areas when using fill colors.

This enables efficient video composition workflows entirely on GPU
without CPU-GPU memory transfers, significantly improving performance
for multi-input video processing pipelines.

Examples:
$ ffmpeg -hwaccel cuda -i input.h265 -filter_complex "[0:v][0:v]hstack_cuda" \
  -c:v hevc_nvenc out.h265

$ ffmpeg \
  -hwaccel cuda -i input1.mp4 \
  -hwaccel cuda -i input2.mp4 \
  -hwaccel cuda -i input3.mp4 \
  -hwaccel cuda -i input4.mp4 \
  -filter_complex "[0:v]hwupload_cuda[0v];[1:v]hwupload_cuda[1v];[2:v]hwupload_cuda[2v];[3:v]hwupload_cuda[3v];[0v][1v][2v][3v]xstack_cuda=inputs=4:fill=black:layout=0_0|w0_0|0_h0|w0_h0" \
  -c:v hevc_nvenc out.mp4

Signed-off-by: Faeez Kadiri 
---
 Changelog                    |   1 +
 configure                    |   6 +
 doc/filters.texi             |  78 +
 libavfilter/Makefile         |   3 +
 libavfilter/allfilters.c     |   3 +
 libavfilter/vf_stack_cuda.c  | 589 +++
 libavfilter/vf_stack_cuda.cu | 389 +++
 7 files changed, 1069 insertions(+)
 create mode 100644 libavfilter/vf_stack_cuda.c
 create mode 100644 libavfilter/vf_stack_cuda.cu

diff --git a/Changelog b/Changelog
index 4217449438..0dec3443d4 100644
--- a/Changelog
+++ b/Changelog
@@ -18,6 +18,7 @@ version :
 - APV encoding support through a libopenapv wrapper
 - VVC decoder supports all content of SCC (Screen Content Coding):
   IBC (Inter Block Copy), Palette Mode and ACT (Adaptive Color Transform
+- hstack_cuda, vstack_cuda and xstack_cuda filters
 
 
 version 7.1:
diff --git a/configure b/configure
index 3730b0524c..5c2d6e132d 100755
--- a/configure
+++ b/configure
@@ -4033,6 +4033,12 @@ xfade_vulkan_filter_deps="vulkan spirv_compiler"
 yadif_cuda_filter_deps="ffnvcodec"
 yadif_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
 yadif_videotoolbox_filter_deps="metal corevideo videotoolbox"
+hstack_cuda_filter_deps="ffnvcodec"
+hstack_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
+vstack_cuda_filter_deps="ffnvcodec"
+vstack_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
+xstack_cuda_filter_deps="ffnvcodec"
+xstack_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
 hstack_vaapi_filter_deps="vaapi_1"
 vstack_vaapi_filter_deps="vaapi_1"
 xstack_vaapi_filter_deps="vaapi_1"
diff --git a/doc/filters.texi b/doc/filters.texi
index 6d2df07508..1c9afac9eb 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -26850,6 +26850,84 @@ Only deinterlace frames marked as interlaced.
 The default value is @code{all}.
 @end table
 
+@section hstack_cuda
+Stack input videos horizontally.
+
+This is the CUDA variant of the @ref{vstack} filter, each input stream may
+have different width, this filter will scale down/up each input stream while
+keeping the orignal aspect.
+
+It accepts the following options:
+
+@table @option
+@item inputs
+See @ref{hstack}.
+
+@item shortest
+See @ref{hstack}.
+
+@item height
+Set height of output. If set to 0, this filter will set height of output to
+height of the first input stream. Default value is 0.
+@end table
+
+@section vstack_cuda
+Stack input videos vertically.
+
+This is the CUDA variant of the @ref{vstack} filter, each input stream may
+have different width, this filter will scale down/up each input stream while
+keeping the orignal aspect.
+
+It accepts the following options:
+
+@table @option
+@item inputs
+See @ref{vstack}.
+
+@item shortest
+See @ref{vstack}.
+
+@item width
+Set width of output. If set to 0, this filter will set width of output to
+width of the first input stream. Default value is 0.
+@end table
+
+@section xstack_cuda
+Stack video inputs into custom layout.
+
+This is the CUDA variant of the @ref{xstack} filter,  each input stream may
+have different size, this filter will scale down/up each input stream to the
+given output size, or the size of the first input stream.
+
+It accepts the following options:
+
+@table @option
+@item inputs
+See @ref{xstack}.
+
+@item shortest
+See @ref{xstack}.
+
+@item layout
+See @ref{xstack}.
+Moreover, this permits the user to supply output size for each