Signed-off-by: MaximilianKaindl <m.kaindl0...@gmail.com>
---
 doc/filters.texi | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/doc/filters.texi b/doc/filters.texi
index a7046e0f4e..340ce39e2a 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -30776,6 +30776,70 @@ bench=start,selectivecolor=reds=-.2 .12 -.49,bench=stop
 @end example
 @end itemize
 
+@section avgclass
+
+Average classification probabilities across multiple frames for both audio and video streams.
+
+This filter analyzes classification data from frame side data (bounding boxes) and calculates average confidence scores for each label. The filter processes classification metadata from the @code{dnn_classify} filter or other sources that generate AVDetectionBBox side data, computing averages over the entire stream.
+
+At the end of the stream (or when manually triggered), the filter outputs the average probability for each detected class, both to console logs and optionally to a CSV file.
+
+@table @option
+@item output_file
+Path to a CSV output file where average classification results will be written. If not specified, results are only printed to log output.
+
+@item v
+Specify the number of video streams (default: 1).
+
+@item a
+Specify the number of audio streams (default: 0).
+@end table
+
+This filter supports the following commands:
+
+@table @option
+@item writeinfo
+Immediately write current average classification results to the log and output file (if specified) without waiting for the stream to end.
+
+@item flush
+Force the filter to write results and flush all its internal state.
+@end table
+
+@subsection Examples
+
+Process a video with object detection and classification, then calculate average classification probabilities:
+@example
+ffmpeg -i input.mp4 -vf "dnn_detect=model=detection.xml:input=data:output=detection_out:confidence=0.5,dnn_classify=model=classification.pt:dnn_backend=torch:tokenizer=tokenizer.json:labels=labels.txt,avgclass=output_file=results.csv" -f null -
+@end example
+
+Process both audio and video classification:
+@example
+ffmpeg -i input.mkv -filter_complex "[0:v]dnn_classify[v0]; [0:a]aformat=sample_fmts=fltp,dnn_classify=dnn_backend=torch:model=clap_model.pt:is_audio=1:tokenizer=tokenizer.json:labels=audio_labels.txt[a0]; [v0][a0]avgclass=v=1:a=1:output_file=av_results.csv" -f null -
+@end example
+
+@subsection Output Format
+
+When the filter completes processing (or when the @code{writeinfo} command is sent), it outputs classification results in this format:
+
+@example
+Classification averages:
+Stream #0:
+  Label: cat: Average probability 0.8765, Appeared 120 times
+  Label: dog: Average probability 0.3421, Appeared 42 times
+Stream #1:
+  Label: music: Average probability 0.9823, Appeared 315 times
+  Label: speech: Average probability 0.1245, Appeared 15 times
+@end example
+
+If an output file is specified, the same data is written in CSV format:
+@example
+stream_id,label,avg_probability,count
+0,cat,0.8765,120
+0,dog,0.3421,42
+1,music,0.9823,315
+1,speech,0.1245,15
+@end example
+
 @section concat
 
 Concatenate audio and video streams, joining them together one after the
-- 
2.34.1
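
For anyone who wants to sanity-check the numbers, the per-label averaging described in the avgclass documentation in this patch can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the filter's C implementation: the class name ClassAverager and its methods are hypothetical, rows are sorted by stream and label purely to make the output deterministic, and the four-decimal formatting is inferred from the sample output in the patch.

```python
import csv
import io
from collections import defaultdict


class ClassAverager:
    """Hypothetical helper: running per-stream, per-label probability averages."""

    def __init__(self):
        # (stream_id, label) -> [sum of probabilities, number of appearances]
        self._stats = defaultdict(lambda: [0.0, 0])

    def add(self, stream_id, label, probability):
        """Record one classification result for a frame."""
        entry = self._stats[(stream_id, label)]
        entry[0] += probability
        entry[1] += 1

    def averages(self):
        """Return (stream_id, label, average, count) tuples, sorted (assumption)."""
        return [
            (stream_id, label, total / count, count)
            for (stream_id, label), (total, count) in sorted(self._stats.items())
        ]

    def to_csv(self):
        """Render the averages using the CSV header shown in the documentation."""
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["stream_id", "label", "avg_probability", "count"])
        for stream_id, label, avg, count in self.averages():
            # Four decimals is an assumption based on the documented sample.
            writer.writerow([stream_id, label, f"{avg:.4f}", count])
        return buf.getvalue()
```

Feeding it one (stream, label, probability) triple per classified frame and calling to_csv() at the end corresponds roughly to what the documentation describes for end of stream or the writeinfo command.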