[FFmpeg-devel] [PATCH] libavfilter: Whisper audio filter

2025-07-19 Thread Vittorio Palmisano
It adds a new audio filter for running audio transcriptions with the whisper model. Documentation and examples are included in the patch. Signed-off-by: Vittorio Palmisano --- configure | 5 + doc/filters.texi | 107 + libavfilter/Makefile | 2

Re: [FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-19 Thread Vittorio Palmisano
> Hi Vittorio > > On Thu, Jul 17, 2025 at 10:51:57AM +0200, Vittorio Palmisano wrote: > > It adds a new audio filter for running audio transcriptions with the > > whisper model. > > Documentation and examples are included into the patch. > > > > Signe

[FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-09 Thread Vittorio Palmisano
It adds a new audio filter for running audio transcriptions with the whisper model. Documentation and examples are included in the patch. Signed-off-by: Vittorio Palmisano --- configure | 5 + doc/filters.texi | 101 libavfilter/Makefile | 2

[FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-10 Thread Vittorio Palmisano
It adds a new audio filter for running audio transcriptions with the whisper model. Documentation and examples are included in the patch. Signed-off-by: Vittorio Palmisano --- configure | 5 + doc/filters.texi | 101 libavfilter/Makefile | 2

Re: [FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-11 Thread Vittorio Palmisano
> > + > > +memcpy(wctx->audio_buffer, wctx->audio_buffer + end_pos, > > + end_pos * sizeof(float)); > > sizeof(*wctx->audio_buffer) is more robust than float But end_pos is not necessarily equal to the audio_buffer size, it could be lower. > > not sure how others think of this, but
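The exchange above is about how many bytes to move when consumed samples are dropped from the front of the audio buffer. The following is a minimal sketch of that operation, not the code from the patch: `drop_consumed`, `buf`, `fill` and `end_pos` are hypothetical names chosen for illustration. It uses `sizeof(*buf)` as the reviewer suggests, and it uses `memmove` instead of `memcpy` because the source and destination regions can overlap in this standalone version; that substitution is an assumption of the sketch, not something stated in the thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the code from the patch: discard the first
 * end_pos samples of buf and shift the remaining ones to the front.
 * Returns the new number of valid samples. */
size_t drop_consumed(float *buf, size_t fill, size_t end_pos)
{
    if (end_pos > fill)
        end_pos = fill;              /* never drop more than is buffered */

    size_t remaining = fill - end_pos;

    /* sizeof(*buf) follows the element type if it ever changes;
     * memmove is used because the two regions may overlap. */
    memmove(buf, buf + end_pos, remaining * sizeof(*buf));
    return remaining;
}
```

Note that the byte count here is derived from the number of samples that remain (`fill - end_pos`), which matches the author's point that `end_pos` need not equal the buffer size.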

Re: [FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-13 Thread Vittorio Palmisano
Thanks, I've applied your suggestions. Signed-off-by: Vittorio Palmisano --- configure | 5 + doc/filters.texi | 106 + libavfilter/Makefile | 2 + libavfilter/af_whisper.c | 454 +++ libavfilter/allfilters.c | 2 + 5 files changed, 569 inser

Re: [FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-10 Thread Vittorio Palmisano
> Leaving out parameter names is a C++ thing, it's not allowed in C. Ok, I've added some modifications and fixed the empty transcription output.

Re: [FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-10 Thread Vittorio Palmisano
> While the filter provides great value, the accelerating pace of AI innovation raises concerns about its longevity. Given how rapidly newer models emerge, is there a risk of this filter becoming deprecated in the near term? I think that the design of the whisper.cpp library allows us to

Re: [FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-10 Thread Vittorio Palmisano
->next_pts = AV_NOPTS_VALUE; > > + > > +wctx->avio_context = NULL; > > aren't things already initialized to 0? Yes, maybe we can keep the AV_NOPTS_VALUE assignment (it is not zero).
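The point made in this reply is that a zero-initialized context already has NULL pointers, but a "no timestamp" marker is not zero and still needs an explicit assignment. A minimal standalone sketch of that idea follows. `SketchContext`, `sketch_alloc` and `SKETCH_NOPTS_VALUE` are hypothetical names; the constant stands in for FFmpeg's `AV_NOPTS_VALUE`, which is likewise the most negative `int64_t`, so that the example compiles without FFmpeg headers.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for FFmpeg's AV_NOPTS_VALUE so the sketch compiles on its own;
 * like the real macro, it is the most negative int64_t, not zero. */
#define SKETCH_NOPTS_VALUE INT64_MIN

/* Hypothetical context with two of the fields mentioned in the review. */
typedef struct SketchContext {
    int64_t next_pts;
    void   *avio_context;
} SketchContext;

/* Allocate a zero-filled context, then set only the non-zero default. */
SketchContext *sketch_alloc(void)
{
    SketchContext *ctx = calloc(1, sizeof(*ctx));
    if (!ctx)
        return NULL;
    /* avio_context is already NULL after calloc on common platforms,
     * so only next_pts needs an explicit assignment. */
    ctx->next_pts = SKETCH_NOPTS_VALUE;
    return ctx;
}
```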

[FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-11 Thread Vittorio Palmisano
It adds a new audio filter for running audio transcriptions with the whisper model. Documentation and examples are included in the patch. Signed-off-by: Vittorio Palmisano --- configure | 5 + doc/filters.texi | 105 + libavfilter/Makefile | 2

Re: [FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-14 Thread Vittorio Palmisano
Hi, I've added some changes to improve the VAD mechanism. You can find the changes here too: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/17/files Signed-off-by: Vittorio Palmisano --- configure | 5 + doc/filters.texi | 106 + libavfilter/Makefile | 2 + libavfilter/af_whisper.c

Re: [FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-15 Thread Vittorio Palmisano
> > +@item gpu_device > > +The GPU device to use. > > +Default value: @code{"0"} > > is this always a number? > if so the documentation could say that Yes, it is the device index. > > +@item destination > > +If set, the transcription output will be sent to the specified file or URL > > +(use one

[FFmpeg-devel] [PATCH] Whisper audio filter

2025-07-17 Thread Vittorio Palmisano
It adds a new audio filter for running audio transcriptions with the whisper model. Documentation and examples are included in the patch. Signed-off-by: Vittorio Palmisano --- configure | 5 + doc/filters.texi | 107 + libavfilter/Makefile | 2

Re: [FFmpeg-devel] [PATCH] libavfilter: Whisper audio filter

2025-07-23 Thread Vittorio Palmisano
Hi, I've applied some changes and created a pull request: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20022 > > > +frames = FFMAX(0, FFMIN(frames, wctx->audio_buffer_fill_size)); > > I would call it samples, sample_count or nb_samples > > why are you clipping the number of samples? > > I assum

Re: [FFmpeg-devel] [PATCH] libavfilter: Whisper audio filter

2025-07-23 Thread Vittorio Palmisano
Update: the correct time base is stored inside inlink->time_base, not in frame->time_base. On Wed, Jul 23, 2025 at 12:19 PM Vittorio Palmisano wrote: > > > > To understand why this is a problem, consider some audio input device > > > which samples at 16 kHz. This har

Re: [FFmpeg-devel] [PATCH] libavfilter: Whisper audio filter

2025-07-23 Thread Vittorio Palmisano
I've found that: frame->time_base=1/48000 frame->sample_rate=16000 Using `1000 * frame->pts * frame->time_base` returns wrong results. The only way to get the correct value seems to be `1000 * frame->pts / frame->sample_rate`
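The mismatch reported in this message can be reproduced with plain integer arithmetic. The sketch below is a standalone illustration, not code from the patch: `SketchRational`, `pts_to_ms` and `samples_to_ms` are hypothetical names, and the rational struct only mimics the shape of a time base (seconds per tick as `num/den`). It assumes the timestamp counts samples at 16000 Hz, as the message implies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a rational time base (num/den seconds per tick). */
typedef struct SketchRational {
    int num;
    int den;
} SketchRational;

/* Convert a timestamp in time-base ticks to milliseconds. */
int64_t pts_to_ms(int64_t pts, SketchRational tb)
{
    assert(tb.den > 0);
    return 1000 * pts * tb.num / tb.den;
}

/* Convert a timestamp counted in samples to milliseconds. */
int64_t samples_to_ms(int64_t samples, int sample_rate)
{
    assert(sample_rate > 0);
    return 1000 * samples / sample_rate;
}
```

For a timestamp of 16000 (one second of audio at 16000 Hz), `pts_to_ms` with a time base of 1/48000 yields 333 ms, while a time base of 1/16000 and `samples_to_ms` both yield 1000 ms. The conversion is only correct when the time base matches the unit the timestamps are actually counted in. Inside FFmpeg itself, `av_rescale_q` is the usual helper for this kind of rescaling.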