[FFmpeg-devel] [PATCH] avfilter/af_whisper: fix srt index (PR #20567)

2025-09-21 Thread Vittorio Palmisano via ffmpeg-devel
PR #20567 opened by Vittorio Palmisano (vpalmisano) URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20567 Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20567.patch avfilter/af_whisper: fix srt index The srt index should be incremented for each segment.
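
The snippet above does not show the patch itself, so the following is only a minimal sketch of the stated rule (one SRT index per segment), not the code from PR #20567. The `Segment` type and `write_srt_cues` function are hypothetical names introduced for illustration.

```c
#include <stdio.h>

/* Hypothetical segment type for illustration; not an FFmpeg or whisper.cpp type. */
typedef struct Segment {
    const char *start; /* already formatted as HH:MM:SS,mmm */
    const char *end;
    const char *text;
} Segment;

/* Writes one SRT cue per segment into buf and increments the cue index
 * after every segment. Returns the next free index so a caller can keep
 * numbering across calls, or -1 if buf is too small. */
static int write_srt_cues(char *buf, size_t size, const Segment *segs,
                          int nb_segs, int first_index)
{
    size_t used = 0;
    int index = first_index;

    for (int i = 0; i < nb_segs; i++) {
        int n = snprintf(buf + used, size - used, "%d\n%s --> %s\n%s\n\n",
                         index, segs[i].start, segs[i].end, segs[i].text);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
        index++; /* one index per segment, not one per batch of segments */
    }
    return index;
}
```

Returning the next index is one possible design: it lets the caller carry the counter from one transcription batch to the next without a global variable.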

[FFmpeg-devel] [PATCH] avfilter/af_whisper: fix srt index and int64 printf format (PR #20566)

2025-09-21 Thread Vittorio Palmisano via ffmpeg-devel
PR #20566 opened by Vittorio Palmisano (vpalmisano) URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20566 Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20566.patch avfilter/af_whisper: fix srt index and int64 printf format - Use PRId64 for printing int64_t values in the SRT output
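
As a brief illustration of the `PRId64` point, here is a minimal sketch; `format_srt_index` is a hypothetical helper, not a function from the patch. `PRId64` from `<inttypes.h>` expands to the conversion specifier that matches `int64_t` on the target platform, whereas a hard-coded `%ld` or `%lld` is only correct on some platforms.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes an SRT cue index line into buf.
 * The format string is built by concatenating "%" with the PRId64
 * macro, so it always matches the width of int64_t. */
static int format_srt_index(char *buf, size_t size, int64_t index)
{
    return snprintf(buf, size, "%" PRId64 "\n", index);
}
```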

[FFmpeg-devel] [PATCH] avfilter/af_whisper: fix srt file format (PR #20368)

2025-08-29 Thread Vittorio Palmisano via ffmpeg-devel
PR #20368 opened by Vittorio Palmisano (vpalmisano) URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20368 Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20368.patch SRT file format requires a comma in the time string. From 103c02ef96f360b862a0477a794d54af22baa41f Mon Sep 17 00:00
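
The comma requirement can be sketched as follows. This is an assumption-laden illustration, not the code from PR #20368: `format_srt_time` is a hypothetical helper, and it assumes the input is a non-negative offset in milliseconds.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats a non-negative millisecond offset as an
 * SRT timestamp, HH:MM:SS,mmm. Note the comma (not a dot) before the
 * milliseconds field. */
static int format_srt_time(char *buf, size_t size, int64_t ms)
{
    int64_t hours   = ms / 3600000;
    int64_t minutes = (ms / 60000) % 60;
    int64_t seconds = (ms / 1000) % 60;
    int64_t millis  = ms % 1000;

    return snprintf(buf, size,
                    "%02" PRId64 ":%02" PRId64 ":%02" PRId64 ",%03" PRId64,
                    hours, minutes, seconds, millis);
}
```

With this sketch, an offset of 2440 ms is rendered as `00:00:02,440`, which matches the form quoted in the review thread.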

[FFmpeg-devel] Re: [PATCH] libavfilter: Whisper audio filter

2025-08-29 Thread Vittorio Palmisano via ffmpeg-devel
Hi Wang, thank you for your comments! > 1) Instead of 00:00:00.000 --> 00:00:02.440 >srt files usually use comma: >00:00:00,000 --> 00:00:02,440 Ok, I will post a fix for that. > 2) There usually is a leading empty line at the beginning, i.e., > >$ cat output.srt #should be > >1

[FFmpeg-devel] [PATCH] libavfilter: add af_whisper codeowner (PR #20189)

2025-08-08 Thread Vittorio Palmisano
PR #20189 opened by Vittorio Palmisano (vpalmisano) URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20189 Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20189.patch From cc3235c4a04bc119a429e993aff31e8aaab3cfb6 Mon Sep 17 00:00:00 2001 From: Vittorio Palmisano Date: Fri, 8 Aug 2025

Re: [FFmpeg-devel] [PATCH] libavfilter: Whisper audio filter

2025-07-23 Thread Vittorio Palmisano
Update: the correct time base is stored inside inlink->time_base, not in frame->time_base On Wed, Jul 23, 2025 at 12:19 PM Vittorio Palmisano wrote: > > > > To understand why this is a problem, consider some audio input device > > > which samples at 16khz. This har
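
To make the time base issue concrete, here is a minimal, self-contained sketch. It deliberately avoids FFmpeg headers: `Rational` is a hypothetical stand-in for `AVRational`, `pts_to_ms` is a hypothetical helper, and the assumption that the link time base is 1/16000 in this scenario is mine, inferred from the reported sample rate rather than stated in the thread.

```c
#include <stdint.h>

/* Hypothetical stand-in for FFmpeg's AVRational (num/den). */
typedef struct Rational {
    int num;
    int den;
} Rational;

/* Converts a presentation timestamp to milliseconds using the time base
 * the timestamp is actually expressed in. Plain integer arithmetic:
 * the result is truncated and pts * 1000 can overflow for very large pts. */
static int64_t pts_to_ms(int64_t pts, Rational time_base)
{
    return pts * 1000 * time_base.num / time_base.den;
}
```

With `pts = 16000` (one second of audio at 16 kHz), a time base of 1/16000 yields 1000 ms, while the mismatched 1/48000 yields 333 ms. Inside FFmpeg itself, `av_rescale_q()` from libavutil is the usual tool for this kind of conversion and also guards against overflow.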

Re: [FFmpeg-devel] [PATCH] libavfilter: Whisper audio filter

2025-07-23 Thread Vittorio Palmisano
I've found that: frame->time_base=1/48000 frame->sample_rate=16000 Using `1000 * frame->pts * frame->time_base` returns wrong results. The only way to get the correct value seems to be `1000 * frame->pts / frame->sample_rate` -- /Vittorio Palmisano/ ___

Re: [FFmpeg-devel] [PATCH] libavfilter: Whisper audio filter

2025-07-23 Thread Vittorio Palmisano
Hi, I've applied some changes and created a pull request: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20022 > > > +frames = FFMAX(0, FFMIN(frames, wctx->audio_buffer_fill_size)); > > I would call it samples, sample_count or nb_samples > > why are you clipping the number of samples ? > > I assum

Re: [FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-19 Thread Vittorio Palmisano
> Hi Vittorio > > On Thu, Jul 17, 2025 at 10:51:57AM +0200, Vittorio Palmisano wrote: > > It adds a new audio filter for running audio transcriptions with the > > whisper model. > > Documentation and examples are included into the patch. > > > > Signe

[FFmpeg-devel] [PATCH] libavfilter: Whisper audio filter

2025-07-19 Thread Vittorio Palmisano
It adds a new audio filter for running audio transcriptions with the whisper model. Documentation and examples are included into the patch. Signed-off-by: Vittorio Palmisano --- configure| 5 + doc/filters.texi | 107 + libavfilter/Makefile | 2

[FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-17 Thread Vittorio Palmisano
It adds a new audio filter for running audio transcriptions with the whisper model. Documentation and examples are included into the patch. Signed-off-by: Vittorio Palmisano --- configure| 5 + doc/filters.texi | 107 + libavfilter/Makefile | 2

Re: [FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-15 Thread Vittorio Palmisano
> > +@item gpu_device > > +The GPU device to use. > > +Default value: @code{"0"} > > is this always a number ? > if so the documentation could say that Yes, it is the device index. > > +@item destination > > +If set, the transcription output will be sent to the specified file or URL > > +(use one

Re: [FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-14 Thread Vittorio Palmisano
Hi, I've added some changes to improve the VAD mechanism. You can find the changes here too: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/17/files Signed-off-by: Vittorio Palmisano --- configure | 5 + doc/filters.texi | 106 + libavfilter/Makefile | 2 + libavfilter/af_whisper.c

Re: [FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-13 Thread Vittorio Palmisano
Thanks, I've applied your suggestions. Signed-off-by: Vittorio Palmisano --- configure | 5 + doc/filters.texi | 106 + libavfilter/Makefile | 2 + libavfilter/af_whisper.c | 454 +++ libavfilter/allfilters.c | 2 + 5 files changed, 569 inser

[FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-11 Thread Vittorio Palmisano
It adds a new audio filter for running audio transcriptions with the whisper model. Documentation and examples are included into the patch. Signed-off-by: Vittorio Palmisano --- configure| 5 + doc/filters.texi | 105 + libavfilter/Makefile | 2

Re: [FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-11 Thread Vittorio Palmisano
> > + > > +memcpy(wctx->audio_buffer, wctx->audio_buffer + end_pos, > > + end_pos * sizeof(float)); > > sizeof(*wctx->audio_buffer) is more robust than float But end_pos is not necessarily equal to the audio_buffer size, it could be lower. > > not sure how others think of this, but
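
The `sizeof(*ptr)` suggestion from the review can be illustrated with a small sketch. This is not the patch's buffer logic: `discard_consumed` is a hypothetical helper that drops already-processed samples from the front of a buffer, and it uses `memmove` rather than `memcpy` because the source and destination ranges may overlap here.

```c
#include <string.h>

/* Hypothetical helper: removes the first `consumed` samples from a buffer
 * holding `fill` samples and moves the rest to the front. Returns the new
 * fill level. sizeof(*buf) follows the element type automatically, so the
 * byte count stays correct if the buffer type ever changes from float. */
static int discard_consumed(float *buf, int fill, int consumed)
{
    int remaining = fill - consumed;

    if (consumed <= 0)
        return fill;   /* nothing to discard */
    if (remaining <= 0)
        return 0;      /* everything was consumed */

    /* memmove is safe for overlapping regions; memcpy is not. */
    memmove(buf, buf + consumed, (size_t)remaining * sizeof(*buf));
    return remaining;
}
```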

[FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-10 Thread Vittorio Palmisano
It adds a new audio filter for running audio transcriptions with the whisper model. Documentation and examples are included into the patch. Signed-off-by: Vittorio Palmisano --- configure| 5 + doc/filters.texi | 101 libavfilter/Makefile | 2

Re: [FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-10 Thread Vittorio Palmisano
> Leaving out parameter names is a C++ thing, its not allowed in C. > Ok, I've added some modifications and fixed the empty transcription output. -- /Vittorio Palmisano/ ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https:/

Re: [FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-10 Thread Vittorio Palmisano
> While the filter provides great value, the accelerating pace of AI innovation > raises concerns > about its longevity. Given how rapidly newer models emerge, is there a risk > of this filter > becoming deprecated in the near term? I think that the design of the whisper.cpp library allows us to

Re: [FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-10 Thread Vittorio Palmisano
>next_pts = AV_NOPTS_VALUE; > > + > > +wctx->avio_context = NULL; > > aren't things already initialized to 0 ? Yes, maybe we can keep the AV_NOPTS_VALUE assignment (it is not zero).

[FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-09 Thread Vittorio Palmisano
It adds a new audio filter for running audio transcriptions with the whisper model. Documentation and examples are included into the patch. Signed-off-by: Vittorio Palmisano --- configure| 5 + doc/filters.texi | 101 libavfilter/Makefile | 2