On 2025-08-14 22:23, Rob Hallam wrote:

On Thu, 14 Aug 2025 at 22:15, Bernhard Döbler <program...@bardware.de> wrote:

Yesterday the news made the rounds that ffmpeg 8 is going to be released
soon and that it will include Whisper, AI software that can understand
speech and create subtitles.

The whisper.cpp GitHub page https://github.com/ggml-org/whisper.cpp says
they offer a handful of models:

Model   Disk     Mem
tiny    75 MiB   ~273 MB
base    142 MiB  ~388 MB
small   466 MiB  ~852 MB
medium  1.5 GiB  ~2.1 GB
large   2.9 GiB  ~3.9 GB

There is a commit [1] adding Whisper support [2]. As the docs note, you
will need to provide a model.
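
As a rough illustration of what "provide a model" means in practice, here
is a minimal Python sketch that only assembles an ffmpeg command line
using the whisper audio filter. The option names (model, language, queue,
destination, format) come from my reading of the filter documentation [2],
and the file names are placeholders, so treat it as an assumption to check
against your own build rather than a definitive invocation.

```python
import shlex


def build_whisper_command(input_path, model_path, srt_path,
                          language="en", queue=3):
    """Return an ffmpeg argument list that runs the whisper audio filter.

    Assumption: option names follow the whisper filter docs; paths that
    contain ':' or other filtergraph special characters are NOT escaped.
    """
    # Filter options are written as key=value pairs separated by ':'.
    filter_spec = (
        f"whisper=model={model_path}"
        f":language={language}"
        f":queue={queue}"
        f":destination={srt_path}"
        ":format=srt"
    )
    # -vn drops video; "-f null -" discards the filtered audio, because the
    # subtitles are written by the filter itself to `destination`.
    return ["ffmpeg", "-i", input_path, "-vn",
            "-af", filter_spec, "-f", "null", "-"]


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    cmd = build_whisper_command("input.mp4", "ggml-base.en.bin", "output.srt")
    print(shlex.join(cmd))
```

The model file is passed as an ordinary path at run time, which is why it
does not need to be part of the ffmpeg binary.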

How does this work? Will all of this be compiled into the ffmpeg binary?

An --enable-whisper configure option has been added (default: no) [3], so
whether Whisper is compiled in is up to whoever builds your binary, and
you provide the model yourself.

[1]: https://github.com/FFmpeg/FFmpeg/commit/13ce36fef98a3f4e6d8360c24d6b8434cbb8869b
[2]: https://ffmpeg.org/ffmpeg-filters.html#whisper-1
[3]: https://github.com/FFmpeg/FFmpeg/blob/47c6af7d299c96b2e65f5f10526e0f34e00b23c8/configure#L339
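
To see whether a given binary was built with --enable-whisper, one option
is to look for the filter in the output of "ffmpeg -filters". A minimal
Python sketch, assuming the usual row layout of that listing (flags, name,
input->output, description); the helper names are made up for
illustration:

```python
import shutil
import subprocess


def has_filter(filters_output, name="whisper"):
    """Return True if `name` is listed as a filter in `ffmpeg -filters` output."""
    for line in filters_output.splitlines():
        fields = line.split()
        # Assumed row layout: "<flags> <name> <in->out> <description...>"
        if len(fields) >= 3 and fields[1] == name:
            return True
    return False


def local_ffmpeg_has_whisper():
    """Run the ffmpeg found on PATH and check its filter list."""
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        return False  # no ffmpeg on PATH at all
    result = subprocess.run(
        [ffmpeg, "-hide_banner", "-filters"],
        capture_output=True, text=True, check=False,
    )
    return has_filter(result.stdout)


if __name__ == "__main__":
    print(local_ffmpeg_has_whisper())
```

If this reports False, the binary was most likely configured without
Whisper support, and supplying a model file alone will not help.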

To broaden the question somewhat: is there existing AI software that could process recordings containing both speech and music and mark or extract the sections that contain music, say by creating cut points?

Does anyone here know if this is possible?

_______________________________________________
ffmpeg-user mailing list
ffmpeg-user@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-user

To unsubscribe, visit link above, or email
ffmpeg-user-requ...@ffmpeg.org with subject "unsubscribe".
