[FFmpeg-devel] [PATCH] avfilter/dnn: add zero-shot image classification using CLIP models

2025-01-29 Thread m.kaindl0208
Add a new filter 'dnn_clip' that performs zero-shot image classification using CLIP (Contrastive Language-Image Pre-Training) models. The filter supports:
- Loading and running CLIP models through the LibTorch backend
- Outputting classification confidence scores as frame side data
Requires token

Re: [FFmpeg-devel] [PATCH] avfilter/dnn: add zero-shot image classification using CLIP models

2025-02-18 Thread m.kaindl0208
The new backend is an extension of the existing Torch backend rather than a separate implementation. Inference in CLIP differs from other models as it encodes (embeds) both images and tokenized text labels, then calculates the similarity between the encoded vectors. As a result, its forward pas
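
The inference step described in this message (embed the image and the text labels, then compare the vectors) can be illustrated with a minimal Python sketch. It assumes cosine similarity, which is what CLIP models conventionally use; the snippet does not show which measure the patch computes. The embeddings are made-up three-dimensional vectors and `rank_labels` is a hypothetical helper, not a function from the patch.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_labels(image_embedding, label_embeddings):
    """Return (label, similarity) pairs, most similar label first.

    Hypothetical helper: in a real pipeline the embeddings would come
    from the model's image and text encoders.
    """
    scores = {
        label: cosine_similarity(image_embedding, embedding)
        for label, embedding in label_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up embeddings, for illustration only.
image_embedding = [0.9, 0.1, 0.0]
label_embeddings = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
}
ranking = rank_labels(image_embedding, label_embeddings)
```

Because the labels are ordinary text prompts, new classes can be added without retraining, which is what makes the classification "zero-shot".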

[FFmpeg-devel] [PATCH FFmpeg 1/15] libavutil: add detectionbbox util functions

2025-03-08 Thread m.kaindl0208
Those functions will be used by classify in the upcoming patches. Try the new filters using my Github Repo https://github.com/MaximilianKaindl/DeepFFMPEGVideoClassification. Any Feedback is appreciated! Signed-off-by: MaximilianKaindl --- libavutil/detection_bbox.c | 54 ++

[FFmpeg-devel] [PATCH FFmpeg 3/15] libavfilter: tokenizer implementation for batch tokenization using tokenizers-cpp library

2025-03-08 Thread m.kaindl0208
Implements batch tokenization support using the tokenizers-cpp library, providing functions to load tokenizers and encode text batches. This is crucial for CLIP/CLAP models that need to process text prompts. https://github.com/mlc-ai/tokenizers-cpp Try the new filters using my Github Repo htt

[FFmpeg-devel] [PATCH FFmpeg 4/15] libavfilter: dnn interface definitions for CLIP/CLAP Inference

2025-03-08 Thread m.kaindl0208
Defines new DNNFunctionType enums for CLIP and CLAP inference and adds new data structures like DNNExecZeroShotClassificationParams to support zero-shot classification models. Try the new filters using my Github Repo https://github.com/MaximilianKaindl/DeepFFMPEGVideoClassification. Any Feedb

[FFmpeg-devel] [PATCH FFmpeg 2/15] libavfilter/dnn: move existing contain_valid_detection_bbox from openvino backend to dnn_backend_common

2025-03-08 Thread m.kaindl0208
Moves the contain_valid_detection_bbox function from the OpenVINO backend to the common backend code, making it available for all DNN backends to use when checking bounding box validity. Will be used by the Torch backend in an upcoming patch in this series. Try the new filters using my Github R
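
As a rough illustration of what a bounding-box validity check of this kind involves, here is a Python sketch. The field names (`x`, `y`, `w`, `h`) and the exact conditions are assumptions made for the example; the snippet above does not show the body of `contain_valid_detection_bbox`, so this should not be read as its implementation.

```python
from dataclasses import dataclass

@dataclass
class BBox:
    # Hypothetical stand-in for a detection bounding box, in pixels.
    x: int
    y: int
    w: int
    h: int

def all_bboxes_valid(bboxes, frame_width, frame_height):
    """Return True only if there is at least one box and every box
    lies inside the frame (assumed validity criteria)."""
    if not bboxes:
        return False
    for box in bboxes:
        if box.x < 0 or box.w < 0 or box.x + box.w > frame_width:
            return False
        if box.y < 0 or box.h < 0 or box.y + box.h > frame_height:
            return False
    return True

inside = all_bboxes_valid([BBox(10, 20, 100, 50)], 640, 480)
outside = all_bboxes_valid([BBox(600, 20, 100, 50)], 640, 480)
```

Keeping such a check in backend-independent code lets every backend reject malformed boxes before cropping regions for classification.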

[FFmpeg-devel] [PATCH FFmpeg 5/15] libavfilter: filter common introduce interfaces for CLIP/CLAP Classification and model loading with tokenizer

2025-03-08 Thread m.kaindl0208
Extends the DNN filter common code to support CLIP/CLAP classification and model loading with tokenizers. Adds new execution functions for both image and audio classification. Try the new filters using my Github Repo https://github.com/MaximilianKaindl/DeepFFMPEGVideoClassification. Any Feedba

[FFmpeg-devel] [PATCH FFmpeg 7/15] libavfilter: classify filter CLIP/CLAP implementation

2025-03-08 Thread m.kaindl0208
This major patch completely rewrites the dnn_classify filter, transforming it from a video-only filter into a versatile media classification system that supports:
- Standard image classification (via OpenVINO)
- CLIP-based image classification (via LibTorch)
- CLAP-based audio classification (via LibT

[FFmpeg-devel] [PATCH FFmpeg 10/15] libavfilter: add avgclass filter for average classification across multiple frames for both audio and video streams

2025-03-08 Thread m.kaindl0208
This patch introduces a new avgclass filter that aggregates classification results across multiple frames or audio segments. Key features:
- Works with both video and audio streams
- Collects and averages classification probabilities over time
- Exports results both to logs and optional CSV files
- Suppo

[FFmpeg-devel] [PATCH FFmpeg 12/15] doc: move classify Filter doc to Multimedia Filters chapter

2025-03-08 Thread m.kaindl0208
Try the new filters using my Github Repo https://github.com/MaximilianKaindl/DeepFFMPEGVideoClassification. Any Feedback is appreciated! Signed-off-by: MaximilianKaindl --- doc/filters.texi | 170 +++ 1 file changed, 85 insertions(+), 85 deletions(-

[FFmpeg-devel] [PATCH FFmpeg 11/15] doc: avgclass Filter Documentation

2025-03-08 Thread m.kaindl0208
Try the new filters using my Github Repo https://github.com/MaximilianKaindl/DeepFFMPEGVideoClassification. Any Feedback is appreciated! Signed-off-by: MaximilianKaindl --- doc/filters.texi | 64 1 file changed, 64 insertions(+) diff --git a/d

[FFmpeg-devel] [PATCH FFmpeg 13/15] libavfilter/dnn: more common clip input dimensions to test

2025-03-08 Thread m.kaindl0208
Try the new filters using my Github Repo https://github.com/MaximilianKaindl/DeepFFMPEGVideoClassification. Any Feedback is appreciated! Signed-off-by: MaximilianKaindl --- libavfilter/dnn/dnn_backend_torch.cpp | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/libavfilte

[FFmpeg-devel] [PATCH FFmpeg 15/15] configure: add tokenizers-cpp support

2025-03-08 Thread m.kaindl0208
Try the new filters using my Github Repo https://github.com/MaximilianKaindl/DeepFFMPEGVideoClassification. Any Feedback is appreciated! Signed-off-by: MaximilianKaindl --- configure | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/configure b/configure index b2a19488

[FFmpeg-devel] [PATCH FFmpeg 14/15] configure: libtorch cuda check with new HAVE variable HAVE_LIBTORCH_CUDA

2025-03-08 Thread m.kaindl0208
Try the new filters using my Github Repo https://github.com/MaximilianKaindl/DeepFFMPEGVideoClassification. Any Feedback is appreciated! Signed-off-by: MaximilianKaindl --- configure | 2 ++ 1 file changed, 2 insertions(+) diff --git a/configure b/configure index 2c08901adc..b2a1948855 10075

[FFmpeg-devel] [PATCH FFmpeg 6/15] libavfilter/dnn: torch backend CLIP/CLAP implementation

2025-03-08 Thread m.kaindl0208
This substantial patch implements the LibTorch backend support for CLIP (Contrastive Language-Image Pre-training) and CLAP (Contrastive Language-Audio Pre-training) models. Key features include:
- Text tokenization and processing for language-based classification
- Support for both image and aud

[FFmpeg-devel] [PATCH FFmpeg 8/15] libavfilter: add missing temperature application in apply_softmax function and set default temperature to 1. apply_softmax refactoring and improved error handling

2025-03-08 Thread m.kaindl0208
Try the new filters using my Github Repo https://github.com/MaximilianKaindl/DeepFFMPEGVideoClassification. Any Feedback is appreciated! Signed-off-by: MaximilianKaindl --- libavfilter/avf_dnn_classify.c| 2 +- libavfilter/dnn/dnn_backend_torch.cpp | 66 --- 2
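
The subject of this patch refers to applying a temperature inside `apply_softmax`, with a default of 1. A minimal Python sketch of temperature-scaled softmax follows. It assumes the common convention of dividing the logits by the temperature; the visible diff stat does not show which convention the patch uses, and `softmax_with_temperature` is a hypothetical name.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by a temperature.

    temperature == 1.0 gives the plain softmax; values below 1 sharpen
    the distribution, values above 1 flatten it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

plain = softmax_with_temperature([1.0, 2.0])
sharp = softmax_with_temperature([1.0, 2.0], temperature=0.5)
```

With a default of 1 the temperature is a no-op unless the user sets it, so existing behaviour is unchanged for callers that do not pass the option.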

[FFmpeg-devel] [PATCH FFmpeg 9/15] doc: Filters.texi updated classify

2025-03-08 Thread m.kaindl0208
Try the new filters using my Github Repo https://github.com/MaximilianKaindl/DeepFFMPEGVideoClassification. Any Feedback is appreciated! Signed-off-by: MaximilianKaindl --- doc/filters.texi | 106 +-- 1 file changed, 76 insertions(+), 30 deletions(-

[FFmpeg-devel] [PATCH v2 FFmpeg 5/20] libavfilter/dnn: libtorch add CUDA support

2025-03-10 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- libavfilter/dnn/dnn_backend_torch.cpp | 37 +++ 1 file changed, 37 insertions(+) diff --git a/libavfilter/dnn/dnn_backend_torch.cpp b/libavfilter/dnn/dnn_backend_torch.cpp index 2e4326d9d4..062821949d 100644 --- a/libavfilter/dnn/dnn_b

[FFmpeg-devel] [PATCH v2 FFmpeg 1/20] configure: add tokenizers-cpp support

2025-03-10 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- configure | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/configure b/configure index 04b83a8868..7219faeaf2 100755 --- a/configure +++ b/configure @@ -285,6 +285,7 @@ External library support: --enable-libtls enable LibreS

Re: [FFmpeg-devel] [PATCH FFmpeg 11/15] doc: avgclass Filter Documentation

2025-03-09 Thread m.kaindl0208
Hi Michael, You are right. The workflow is that any classification above the confidence parameter (default 0.5) is written to the frame's side data, then read by the avgclass filter and averaged. Given the parameter was set to 0.01 or lower, if one frame detects a cat with 0.99 con
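
The workflow described here (keep only classifications above the confidence parameter, then average them across frames) can be sketched in Python as follows. The function names and the per-frame dictionaries are hypothetical, and the sketch assumes each label is averaged over the frames in which it was actually reported; the snippet does not say which denominator avgclass uses.

```python
from collections import defaultdict

def export_side_data(frame_scores, confidence=0.5):
    """Keep only labels whose probability exceeds the threshold,
    mimicking what would be attached to the frame."""
    return {label: p for label, p in frame_scores.items() if p > confidence}

def average_classifications(frames, confidence=0.5):
    """Average the exported probabilities per label.

    Returns {label: (mean_probability, frames_reported)}. Assumption:
    the mean is taken over frames where the label passed the threshold.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for frame_scores in frames:
        for label, p in export_side_data(frame_scores, confidence).items():
            sums[label] += p
            counts[label] += 1
    return {label: (sums[label] / counts[label], counts[label]) for label in sums}

# Made-up per-frame probabilities, for illustration only.
frames = [
    {"cat": 0.99, "dog": 0.01},
    {"cat": 0.40, "dog": 0.60},
    {"cat": 0.80, "dog": 0.20},
]
averages = average_classifications(frames)
```

Returning the count next to the mean makes it visible when an average rests on only a few frames, which matters because the threshold decides which values ever reach the averaging step.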

[FFmpeg-devel] [PATCH v2 FFmpeg 7/20] libavfilter/dnn_interface.h: define new DNNExecParams DNNExecZeroShotClassificationParams

2025-03-10 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- libavfilter/dnn_interface.h | 10 ++ 1 file changed, 10 insertions(+) diff --git a/libavfilter/dnn_interface.h b/libavfilter/dnn_interface.h index f4552d4287..4252cd2231 100644 --- a/libavfilter/dnn_interface.h +++ b/libavfilter/dnn_interface.h @@ -92,

[FFmpeg-devel] [PATCH v2 FFmpeg 17/20] libavfilter: turn dnn_classify to multimedia filter. Classify CLIP/CLAP implementation.

2025-03-10 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- libavfilter/Makefile |2 +- libavfilter/allfilters.c |2 +- libavfilter/avf_dnn_classify.c | 1283 libavfilter/vf_dnn_classify.c | 308 4 files changed, 1285 insertions(+), 310 deletions(-) cr

[FFmpeg-devel] [PATCH v2 FFmpeg 13/20] libavfilter/dnn/dnn_backend_torch: Clxp model loading implementation

2025-03-10 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- libavfilter/dnn/dnn_backend_torch.cpp | 297 +- 1 file changed, 288 insertions(+), 9 deletions(-) diff --git a/libavfilter/dnn/dnn_backend_torch.cpp b/libavfilter/dnn/dnn_backend_torch.cpp index ea09845e05..3a0ef931f9 100644 --- a/liba

[FFmpeg-devel] [PATCH v2 FFmpeg 4/20] configure: libtorch CUDA support

2025-03-10 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- configure | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/configure b/configure index 7219faeaf2..ac074279cb 100755 --- a/configure +++ b/configure @@ -285,6 +285,7 @@ External library support: --enable-libtls enable LibreSS

[FFmpeg-devel] [PATCH v2 FFmpeg 12/20] libavfilter/dnn/dnn_backend_torch: Add ClxpContext to THModel

2025-03-10 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- libavfilter/dnn/dnn_backend_torch.cpp | 26 ++ 1 file changed, 26 insertions(+) diff --git a/libavfilter/dnn/dnn_backend_torch.cpp b/libavfilter/dnn/dnn_backend_torch.cpp index 062821949d..ea09845e05 100644 --- a/libavfilter/dnn/dnn_ba

[FFmpeg-devel] [PATCH v2 FFmpeg 16/20] libavfilter/dnn/dnn_backend_torch: CLIP/CLAP Inference handling and support for detection bboxes from dnn_detect filter

2025-03-10 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- libavfilter/dnn/dnn_backend_torch.cpp | 411 +++--- 1 file changed, 311 insertions(+), 100 deletions(-) diff --git a/libavfilter/dnn/dnn_backend_torch.cpp b/libavfilter/dnn/dnn_backend_torch.cpp index 1d2bfb191a..26b57f08f3 100644 --- a/li

[FFmpeg-devel] [PATCH v2 FFmpeg 0/20] Zero-Shot Classification Support for FFMPEG (CLIP and CLAP)

2025-03-11 Thread m.kaindl0208
Hi, I'm excited to propose a series of patches adding support for modern zero-shot classification models to FFmpeg. These patches enable FFmpeg to leverage CLIP (Contrastive Language-Image Pre-training) and CLAP (Contrastive Language-Audio Pre-training) models for media classification. Key Fea

[FFmpeg-devel] [PATCH v2 FFmpeg 18/20] doc/filters.texi: add classify documentation

2025-03-11 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- doc/filters.texi | 124 --- 1 file changed, 85 insertions(+), 39 deletions(-) diff --git a/doc/filters.texi b/doc/filters.texi index 0ba7d3035f..a7046e0f4e 100644 --- a/doc/filters.texi +++ b/doc/filters.texi @@ -119

[FFmpeg-devel] [PATCH v2 FFmpeg 14/20] libavfilter/dnn/dnn_backend_torch: Similarity and Softmax calculation functions for CLIP/CLAP

2025-03-11 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- libavfilter/dnn/dnn_backend_torch.cpp | 76 +++ 1 file changed, 76 insertions(+) diff --git a/libavfilter/dnn/dnn_backend_torch.cpp b/libavfilter/dnn/dnn_backend_torch.cpp index 3a0ef931f9..12ba2674b3 100644 --- a/libavfilter/dnn/dnn_b

[FFmpeg-devel] [PATCH FFmpeg 1/20] configure: add tokenizers-cpp support

2025-03-11 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- configure | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/configure b/configure index 04b83a8868..7219faeaf2 100755 --- a/configure +++ b/configure @@ -285,6 +285,7 @@ External library support: --enable-libtls enable LibreS

[FFmpeg-devel] [PATCH v2 FFmpeg 10/20] libavfilter/dnn_filter_common: add support for new loading function with ff_dnn_init_with_tokenizer

2025-03-11 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- libavfilter/dnn_filter_common.c | 33 - libavfilter/dnn_filter_common.h | 1 + 2 files changed, 33 insertions(+), 1 deletion(-) diff --git a/libavfilter/dnn_filter_common.c b/libavfilter/dnn_filter_common.c index c4ad000409..89

[FFmpeg-devel] [PATCH v2 FFmpeg 6/20] libavfilter/dnn_interface.h: define new Function Types CLIP and CLAP

2025-03-11 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- libavfilter/dnn_interface.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/libavfilter/dnn_interface.h b/libavfilter/dnn_interface.h index 66086409be..f4552d4287 100644 --- a/libavfilter/dnn_interface.h +++ b/libavfilter/dnn_interface.h @@ -58,6 +58,8 @@

[FFmpeg-devel] [PATCH v2 FFmpeg 2/20] libavfilter/dnn_filter_common: batch tokenizer implementation

2025-03-11 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- libavfilter/dnn_filter_common.c | 154 libavfilter/dnn_filter_common.h | 19 2 files changed, 173 insertions(+) diff --git a/libavfilter/dnn_filter_common.c b/libavfilter/dnn_filter_common.c index 6b9c6f8d7f..c4ad000409 1

[FFmpeg-devel] [PATCH v2 FFmpeg 9/20] libavfilter/dnn_interface.h: new model loading function load_model_with_tokenizer

2025-03-11 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- libavfilter/dnn_interface.h | 40 - 1 file changed, 22 insertions(+), 18 deletions(-) diff --git a/libavfilter/dnn_interface.h b/libavfilter/dnn_interface.h index f284665768..914bd76240 100644 --- a/libavfilter/dnn_interface

[FFmpeg-devel] [PATCH v2 FFmpeg 3/20] libavfilter/dnn: move contain_valid_detection_bbox to dnn_backend_common

2025-03-11 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- libavfilter/dnn/dnn_backend_common.c | 38 + libavfilter/dnn/dnn_backend_common.h | 8 ++ libavfilter/dnn/dnn_backend_openvino.c | 39 +- 3 files changed, 47 insertions(+), 38 deletions(-) diff --git a/l

[FFmpeg-devel] [PATCH v2 FFmpeg 11/20] libavfilter/dnn_filter_common: dnn execute functions for CLIP and CLAP

2025-03-11 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- libavfilter/dnn_filter_common.c | 81 +++-- libavfilter/dnn_filter_common.h | 34 +++--- 2 files changed, 76 insertions(+), 39 deletions(-) diff --git a/libavfilter/dnn_filter_common.c b/libavfilter/dnn_filter_common.c index

[FFmpeg-devel] [PATCH v2 FFmpeg 8/20] libavfilter/dnn_interface.h: define new fields in THOptions struct

2025-03-11 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- libavfilter/dnn_interface.h | 11 +++ 1 file changed, 11 insertions(+) diff --git a/libavfilter/dnn_interface.h b/libavfilter/dnn_interface.h index 4252cd2231..f284665768 100644 --- a/libavfilter/dnn_interface.h +++ b/libavfilter/dnn_interface.h @@ -14

[FFmpeg-devel] [PATCH v2 FFmpeg 19/20] libavfilter: New filter avgclass. Average Detection BBox Classifications over all incoming Frames for Audio and Video

2025-03-11 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- libavfilter/Makefile | 1 + libavfilter/allfilters.c | 1 + libavfilter/avf_avgclass.c | 505 + 3 files changed, 507 insertions(+) create mode 100644 libavfilter/avf_avgclass.c diff --git a/libavfilter/Makefile

[FFmpeg-devel] [PATCH v2 FFmpeg 20/20] doc/filters.texi: avgclass documentation

2025-03-12 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- doc/filters.texi | 64 1 file changed, 64 insertions(+) diff --git a/doc/filters.texi b/doc/filters.texi index a7046e0f4e..340ce39e2a 100644 --- a/doc/filters.texi +++ b/doc/filters.texi @@ -30776,6 +30776,70 @@

[FFmpeg-devel] [PATCH v2 FFmpeg 21/20] libavfilter: classify fix category post_processing with very low temperature

2025-03-12 Thread m.kaindl0208
Patch attached. I hope this correctly links to my series. Signed-off-by: MaximilianKaindl 0021-libavfilter-classify-fix-category-post_processing-wi.patch Description: Binary data

[FFmpeg-devel] [PATCH v2 FFmpeg 15/20] libavfilter/dnn/dnn_backend_torch: Audio and Video preprocessing for CLIP/CLAP models

2025-03-11 Thread m.kaindl0208
Signed-off-by: MaximilianKaindl --- libavfilter/dnn/dnn_backend_torch.cpp | 128 ++ 1 file changed, 128 insertions(+) diff --git a/libavfilter/dnn/dnn_backend_torch.cpp b/libavfilter/dnn/dnn_backend_torch.cpp index 12ba2674b3..1d2bfb191a 100644 --- a/libavfilter/dnn/dnn_

[FFmpeg-devel] Integration of CLIP/CLAP Functionality

2025-03-25 Thread m.kaindl0208
Hi, I have been working on this feature and completed my project a while ago. I believe this could be a valuable addition to the FFmpeg project and would like to ask whether the community is interested in it.