[FFmpeg-devel] I've written a filter in Rust

Leandro Santiago Thu, 20 Feb 2025 05:07:17 -0800

[insert meme here]

(this will be a long e-mail)

Dear FFmpeg devs,

In the past few days I've been experimenting with hacking on FFmpeg using Rust.

As I am becoming more familiar with libavfilter, and it is not a dependency of
any of the other libav* libs, I decided it was a good candidate.

It's also convenient as I use the FFmpeg libs heavily in a commercial product,
and one of the features I've been working on involves basic multi-object
tracking.

In my case, it does not need to be a "perfect" tracking algorithm, as I need to
trade some quality of the result for performance when running on the CPU only,
so most of the algorithms out there that need a GPU are out of my reach.

I then decided that my first experiment would be a filter called `track_sort`
that implements the 2016 paper SIMPLE ONLINE AND REALTIME TRACKING, also known
as SORT [1].

The filter already works well based on the `master` branch, but the code itself
is in very early stages and far from being "production ready", so please do not
read the code assuming it's in its final form. It's ugly and needs lots of
refactoring.

I've created a PR on forgejo [4] to make it easier for others to track
progress, although I use gitlab.com as my main forge.

Here is a description of the filter:

- It performs only object tracking, so the object detection must be performed
elsewhere. It feeds from the detection boxes generated by `dnn_detect`. That
means that the quality of the tracking is closely related to the quality of
the detection.

- SORT is a simple algorithm that uses spatial data only, and it is not able to
handle cases such as object occlusion. It's good enough for my use case, as I
mentioned earlier.

- The filter works with the default options, so you can pass it without any
arguments. In this mode, it will try to track any objects from the boxes
available. You can change this behaviour by specifying the list of labels to
track, for example: `track_sort=labels=person|dog|cat`. Such labels come from
the ML model you used in the detection filter. It also has the options
`threshold`, `min_hits` and `max_age`, which control how the tracking algorithm
works, but the default values should work well in most cases.

- The filter will add the tracking information as labels in a new frame side
data entry of type `AV_FRAME_DATA_DETECTION_BBOXES`. It **WILL NOT** override
the side data from `dnn_detect`, meaning that the frame will have two side data
entries of this type. I've created a PR that makes it possible to fetch such
an entry [2].

- The labels in the detection boxes have the format
"track:<track_num>:<track_age>", and this is not the final format. I did it
this way as a quick hack to have some visual information when drawing the boxes
and labels with the `drawtext` and `drawbox` filters. I believe this can be
improved by putting the tracking information as metadata of the
`AVDetectionBBox`es, but this would break the API and ABI, so this is still
an open question.
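
To make the interim label format concrete, here is a minimal sketch of how a
consumer could build and parse such labels. `TrackLabel` and its methods are
hypothetical names for illustration, not code from the PR, and the sketch
assumes both fields are non-negative integers.

```rust
/// Hypothetical holder for the two fields carried by a tracking label.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct TrackLabel {
    pub track_num: u32,
    pub track_age: u32,
}

impl TrackLabel {
    /// Builds the interim "track:<track_num>:<track_age>" string.
    pub fn to_label(&self) -> String {
        format!("track:{}:{}", self.track_num, self.track_age)
    }

    /// Parses a label; returns `None` for anything not in the expected
    /// format (for example a plain class label such as "person").
    pub fn parse(label: &str) -> Option<Self> {
        let mut parts = label.split(':');
        if parts.next()? != "track" {
            return None;
        }
        let track_num = parts.next()?.parse().ok()?;
        let track_age = parts.next()?.parse().ok()?;
        // Reject trailing fields so malformed labels are not silently accepted.
        if parts.next().is_some() {
            return None;
        }
        Some(Self { track_num, track_age })
    }
}
```

String parsing like this is exactly what moving the tracking information into
proper metadata would avoid.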

What has not been done so far:

I had quite a few goals in this task:

- 1: get a working and efficient implementation of the SORT algorithm.
- 2: start learning Rust again (it's been ~5 years since I used it)
- 3: learn more about the libavfilter codebase
- 4: evaluate whether Rust could work as a second language for hacking FFmpeg.

Results:

- 1: I managed to reuse lots of high quality code available on crates.io (the
repository of Rust packages), saving me from writing hairy, math-heavy code. I
personally suck at maths, especially linear algebra. Using the paper and the
reference implementation [3] was enough, although I do not understand all the
math magic. For instance, I reused an existing crate for Kalman filters that I
would otherwise probably need to implement by hand, as the alternative in C
would probably be using the implementation that OpenCV offers. And I am aware
that it's not practical to make OpenCV a dependency of FFmpeg.

- 2: yay! Back to Rust!

- 3: I've learned more not only about avfilter, but a bit about other
components as well.

- 4: I have more notes on that later, but it feels to me that Rust is a natural
candidate for new code in large C codebases, as it integrates quite well, with
some warts. I also have no idea whether the FFmpeg community has discussed
Rust in the codebase in the past and, if not, why not now?

Some notes on using Rust:

In general I enjoyed using Rust in the project, and if you have a look at the
code, you'll notice that I am not reusing any of the nice C macros that make a
lot of stuff easier when writing new filters. That means that the Rust code
looks like the expanded macro versions from C. And that's a lot of boilerplate
and ugly code.

There were some reasons for that. One is that I am still learning Rust macros,
and wanted to focus on getting stuff done for now. The second is that Rust has
a much more powerful macro system than C does, and avoiding macros for now
allows me to feel all the pain of writing the code by hand. Such pain, I
believe, can help a set of Rust macros to "emerge" from the codebase, rather
than designing up front a set of macros that will probably look like the C
ones, which might not be "rusty" enough. And I don't find it good practice to
design APIs before having some implementation (looking at you, C++ committee).

I've been developing on Manjaro Linux and for now building FFmpeg statically
with `--disable-stripping --enable-debug=3 --disable-optimizations` and the
Rust code in `Debug` mode. That means slow code and static builds, which are
easy to debug and profile.

Debugging is easy, as I can simply use GDB and it simply works with the Rust
and C code mixed. I still don't have pretty-printers for the Rust part, but
this is probably an issue with my setup.

Profiling also works well. Even though the Rust code is in Debug mode,
profiling with Hotspot/Perf shows that the tracking code is very efficient (you
almost cannot see it in the flamegraph!).

Memory management is a breeze, as the standard library has generic versions of
many useful containers, such as Vectors and BTrees. The algorithms there also
make transforming and filtering very convenient and type safe.

You get support for unit tests for free. No hassle, no complex setup. Simply
write unit tests anywhere and run them with `cargo test`.
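
As an illustration, a unit test can sit right next to the code it checks. The
sketch below uses intersection over union (IoU), the overlap measure SORT
relies on to match detections to tracks. `BBox` and `iou` are hypothetical
names, not taken from the PR.

```rust
/// Axis-aligned box in pixels; a hypothetical stand-in for a detection box.
#[derive(Debug, Clone, Copy)]
pub struct BBox {
    pub x: f32,
    pub y: f32,
    pub w: f32,
    pub h: f32,
}

/// Intersection over union of two boxes, in the range [0, 1].
pub fn iou(a: &BBox, b: &BBox) -> f32 {
    let left = a.x.max(b.x);
    let top = a.y.max(b.y);
    let right = (a.x + a.w).min(b.x + b.w);
    let bottom = (a.y + a.h).min(b.y + b.h);
    // Negative extents mean the boxes do not overlap on that axis.
    let inter = (right - left).max(0.0) * (bottom - top).max(0.0);
    let union = a.w * a.h + b.w * b.h - inter;
    if union > 0.0 { inter / union } else { 0.0 }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn identical_boxes_overlap_fully() {
        let b = BBox { x: 0.0, y: 0.0, w: 10.0, h: 10.0 };
        assert_eq!(iou(&b, &b), 1.0);
    }

    #[test]
    fn disjoint_boxes_do_not_overlap() {
        let a = BBox { x: 0.0, y: 0.0, w: 10.0, h: 10.0 };
        let b = BBox { x: 20.0, y: 20.0, w: 5.0, h: 5.0 };
        assert_eq!(iou(&a, &b), 0.0);
    }
}
```

The `tests` module is only compiled when running `cargo test`, so it adds
nothing to the regular build.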

It feels very good to get the code to work and not being afraid of things going
badly (in the code which is not unsafe, of course!).

WARTS

I did not implement any wrapper on top of the avfilter private API (yay
`bindgen`!), so it's used directly in the Rust code. It forces you to write the
code as `unsafe` on any interaction with the libav* API. Nevertheless, even in
unsafe code, working with non-owned data is very convenient, as you can turn
almost anything into slices, which provide you with lots of convenient
algorithms (map, filter, zip, etc.).
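
For example, a pointer-plus-length pair coming from C can be viewed as a slice
once, inside a small `unsafe` block, and then processed with safe iterator
code. The sketch below is illustrative only: `RawBox` is a made-up `#[repr(C)]`
struct, not the real `AVDetectionBBox` layout, and `count_confident` is a
hypothetical helper.

```rust
use std::slice;

/// Made-up C-compatible struct standing in for data owned by the C side.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct RawBox {
    pub x: i32,
    pub y: i32,
    pub w: i32,
    pub h: i32,
    pub confidence: f32,
}

/// Counts boxes at or above `min_conf` in a C array we do not own.
///
/// # Safety
/// `ptr` must be null or point to `len` valid, initialized `RawBox` values
/// that stay alive and unmodified for the duration of the call.
pub unsafe fn count_confident(ptr: *const RawBox, len: usize, min_conf: f32) -> usize {
    if ptr.is_null() || len == 0 {
        return 0;
    }
    // The only unsafe step: borrow the C array as a slice.
    let boxes: &[RawBox] = unsafe { slice::from_raw_parts(ptr, len) };
    // From here on it is ordinary, bounds-checked iterator code.
    boxes.iter().filter(|b| b.confidence >= min_conf).count()
}
```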

Working with C pointers is very painful and ugly, especially `**` and `***`.
Rust is very verbose when using them on the Rust side (they become things like
`&*mut *mut *mut`, not really easy to reason about). Rust also does not have
the `->` operator, forcing you to do stuff like `(*foo).bar`, which is simply
ugly.
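
A small sketch of what this looks like in practice, using made-up types rather
than real libav* structs: `Node` and `sum_values` are hypothetical, and the
function walks a C-style array of pointers (`Node **` in C).

```rust
/// Made-up struct; in C this would be accessed as `nodes[i]->value`.
#[repr(C)]
pub struct Node {
    pub value: i32,
}

/// Sums `value` over a C-style `Node **` array of `len` entries.
///
/// # Safety
/// `nodes` must point to `len` readable pointers, each either null or
/// pointing to a valid `Node`.
pub unsafe fn sum_values(nodes: *const *const Node, len: usize) -> i32 {
    let mut total = 0;
    for i in 0..len {
        unsafe {
            // C: Node *n = nodes[i];
            let n: *const Node = *nodes.add(i);
            if !n.is_null() {
                // C: total += n->value;
                total += (*n).value;
            }
        }
    }
    total
}
```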

Interacting with the C API is also not trivial, as in Rust one must be explicit
about ownership and lifetimes, something which is done implicitly (and often
wrongly) in C.

Struct members in Rust must always be explicitly initialized, even for global
static variables, which C implicitly initializes with zero.
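
A sketch of the difference, with a made-up options struct (`TrackOpts` is
hypothetical, and the zero values are placeholders, not the filter's real
defaults). A `const fn` constructor is one way to keep the explicit
initialization in a single place.

```rust
/// Made-up options struct for illustration.
#[derive(Debug, PartialEq)]
pub struct TrackOpts {
    pub threshold: f32,
    pub min_hits: u32,
    pub max_age: u32,
}

impl TrackOpts {
    /// Spells out the all-zero value that C would give implicitly.
    pub const fn zeroed() -> Self {
        Self { threshold: 0.0, min_hits: 0, max_age: 0 }
    }
}

// C equivalent: `static TrackOpts global_opts;` (implicitly zeroed).
// In Rust, omitting the initializer or any field is a compile error.
pub static GLOBAL_OPTS: TrackOpts = TrackOpts::zeroed();
```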

C unions. Luckily Rust supports them, but they are always unsafe.
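
A minimal sketch with a made-up union (not a type from libav*): writing a field
is safe, but every read needs `unsafe`, because the compiler cannot know which
field is currently valid.

```rust
/// Made-up C-style union: the same 4 bytes viewed as an integer or a float.
#[repr(C)]
pub union IntOrFloat {
    pub i: u32,
    pub f: f32,
}

/// Returns the raw bit pattern of a float by writing one field and
/// reading the other.
pub fn float_bits(value: f32) -> u32 {
    // Creating the union and writing a field is safe.
    let u = IntOrFloat { f: value };
    // Reading is unsafe: nothing tracks which field was written last.
    unsafe { u.i }
}
```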

`bindgen` does not generate wrappers for `static av_always_inline blah()`
functions, as those are... inlined, so when I needed any of those, I simply had
to reimplement them in Rust.
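
As an illustration of such a reimplementation, `av_q2d()` from
`libavutil/rational.h` is a one-line `static inline` function that converts an
`AVRational` to a `double`. A Rust version might look like the sketch below.
The `AVRational` struct here is a local stand-in for the type `bindgen` would
generate, and this is not necessarily one of the functions ported in the PR.

```rust
use std::os::raw::c_int;

/// Local stand-in for the bindgen-generated `AVRational`.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct AVRational {
    pub num: c_int,
    pub den: c_int,
}

/// Rust port of the C inline `av_q2d()`, which is `a.num / (double) a.den`.
#[inline]
pub fn av_q2d(a: AVRational) -> f64 {
    f64::from(a.num) / f64::from(a.den)
}
```

Like the C original, this does no checking for a zero denominator.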

In general my impression is that Rust code is more verbose than C in
"dangerous" code, but way less verbose in safe code, due to the compiler checks.

WHY? WHY? WHY?????

Ok, why do I, who never really took part in the FFmpeg community, now
apparently come throwing Rust in your faces? Am I saying you folks should
rewrite ffmpeg in rust? I know that the Rust community in particular has been
involved recently in a lot of conflicts involving large C codebases, and it's
not my intention to tell you what to do or not to do. I recognize that I have
no authority in this group for that and I am essentially just an FFmpeg user.

My intention, first of all, was to get some stuff I needed done. I'm working on
a commercial product, and developing in Rust was the quickest way I could get
it done (considering my requirements). I've enjoyed working on this project a
lot, and I believe what I've learned can be useful for the FFmpeg community as
a whole.

Demo time

Requirements: Cargo/Rust installed. I am using `1.84.0`, the latest stable, via
`rustup`.

You'll need openvino, harfbuzz and freetype installed.

First of all, check out the code from the PR at [4] and compile FFmpeg with:

```sh
./configure --disable-stripping --enable-debug=3 \
    --disable-optimizations --enable-libopenvino --enable-libharfbuzz \
    --enable-libfreetype --enable-openssl
cargo build && make
```

I added a `--enable-rust` flag to the PR, but at the moment it does nothing :-)

Next you should download a pre-trained YOLO4 model and associated files, to
perform the object detection:

```sh
pip install openvino-dev tensorflow
omz_downloader --name yolo-v4-tiny-tf
omz_converter --name yolo-v4-tiny-tf
wget https://raw.githubusercontent.com/openvinotoolkit/open_model_zoo/refs/heads/master/data/dataset_classes/coco_80cl.txt
```

Here we'll use a video from the MOT Challenge 2016 [5], which is the one shown
in the original SORT paper. You can use it with the command:

```sh
./ffplay https://motchallenge.net/sequenceVideos/MOT16-06-raw.webm -vf \
'dnn_detect=dnn_backend=openvino:model=public/yolo-v4-tiny-tf/FP16/yolo-v4-tiny-tf.xml:input=image_input:confidence=0.1:model_type=yolov4:anchors=81&82&135&169&344&319:labels=coco_80cl.txt:async=0:nb_classes=80,track_sort=labels=person,drawbox=box_source=side_data_detection_bboxes:color=red:skip=1,drawtext=text_source=side_data_detection_bboxes:fontcolor=yellow:bordercolor=yellow:fontsize=20:fontfile=DroidSans-Bold.ttf:skip=1'
```

The `dnn_detect` options were obtained from the YOLO4 model at [6].

Please also note that I passed the extra option `skip=1` to both the `drawtext`
and the `drawbox` filters. This is to make them render the box information
from `track_sort`, instead of the one from `dnn_detect`. More at [2].

I also recorded a video showing the filter in action [7].

Cheers,

Leandro

[1] https://arxiv.org/pdf/1602.00763
[2] https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/10
[3] https://github.com/abewley/sort
[4] https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/11
[5] https://motchallenge.net/vis/MOT16-06
[6] https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/public/yolo-v4-tiny-tf/README.md
[7] https://youtu.be/U_y4-NnaINg
