Hello all!

While investigating FLINK-32008[1], I discovered that we can write to the
filesystem connector using the protobuf format, but today the resulting
file is unlikely to be useful, or even readable again.

There's no real standard for storing many protobuf messages in a single
file container, although the documentation does mention writing
size-delimited messages sequentially[2]. In practice, I've never
encountered protobuf binaries stored on filesystems without some other
kind of "framing" around them (for example, Parquet files can be accessed
through either an Avro-oriented or a protobuf-oriented API).
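For illustration only, here is a minimal sketch of the size-delimited
framing that [2] describes: each message is preceded by its length,
encoded as a base-128 varint (this is what writeDelimitedTo /
parseDelimitedFrom do in the Java protobuf API). The sketch assumes the
messages are already serialized to bytes, so it doesn't depend on the
protobuf library, and the helper names are hypothetical, not part of
Flink or protobuf.

```python
import io
from typing import BinaryIO, Iterable, Iterator, Optional


def write_varint(out: BinaryIO, value: int) -> None:
    """Write a non-negative int as a base-128 varint (protobuf wire encoding)."""
    if value < 0:
        raise ValueError("length must be non-negative")
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.write(bytes([low | 0x80]))  # continuation bit set: more bytes follow
        else:
            out.write(bytes([low]))
            return


def read_varint(inp: BinaryIO) -> Optional[int]:
    """Read a varint; return None on a clean end of file before its first byte."""
    result, shift = 0, 0
    while True:
        b = inp.read(1)
        if not b:
            if shift == 0:
                return None
            raise EOFError("truncated varint")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7


def write_delimited(out: BinaryIO, payloads: Iterable[bytes]) -> None:
    """Hypothetical helper: append each serialized message as <varint length><bytes>."""
    for payload in payloads:
        write_varint(out, len(payload))
        out.write(payload)


def read_delimited(inp: BinaryIO) -> Iterator[bytes]:
    """Hypothetical helper: yield each serialized message back, in order."""
    while True:
        size = read_varint(inp)
        if size is None:
            return
        payload = inp.read(size)
        if len(payload) != size:
            raise EOFError("truncated message")
        yield payload


if __name__ == "__main__":
    buf = io.BytesIO()
    write_delimited(buf, [b"abc", b"", b"x" * 300])
    buf.seek(0)
    print([len(m) for m in read_delimited(buf)])
```

Nothing in this framing lets a reader jump to an arbitrary byte offset and
find the start of the next message, which is one reason such files aren't
naturally splittable.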

Does anyone have use cases for bulk storage of protobuf messages on a
filesystem? Should these files just be considered temporary storage for
Flink jobs, or do they need to be compatible with other systems? Is there
a splittable or compressible file format for them?

The alternative might be to just forbid file storage for protobuf
messages!  Any opinions?

All my best, Ryan Skraba

[1]: https://issues.apache.org/jira/browse/FLINK-32008
[2]: https://protobuf.dev/programming-guides/techniques/#streaming
