Hello all!

While investigating FLINK-32008[1], I discovered that we can write to the filesystem connector with the protobuf format, but today the resulting file is unlikely to be useful or even re-readable.
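To make the problem concrete: serialized protobuf messages are not self-delimiting, so a reader needs some kind of framing to find where one message ends and the next begins. Below is a minimal sketch of one such framing, a base-128 varint length prefix in front of each message. The function names `write_delimited` and `read_delimited` are hypothetical, the payloads are opaque bytes standing in for serialized messages, and this is not a description of what Flink currently writes.

```python
import io


def write_delimited(stream, payload: bytes) -> None:
    """Write one payload prefixed by its length as a base-128 varint."""
    size = len(payload)
    while True:
        low_bits = size & 0x7F
        size >>= 7
        if size:
            # More length bytes follow: set the continuation bit.
            stream.write(bytes([low_bits | 0x80]))
        else:
            stream.write(bytes([low_bits]))
            break
    stream.write(payload)


def read_delimited(stream):
    """Yield payloads one by one until the stream is exhausted."""
    while True:
        size, shift, first_byte = 0, 0, True
        while True:
            b = stream.read(1)
            if not b:
                if first_byte:
                    return  # clean end of stream between messages
                raise EOFError("truncated length prefix")
            first_byte = False
            size |= (b[0] & 0x7F) << shift
            if not b[0] & 0x80:
                break
            shift += 7
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("truncated message")
        yield payload


# Round trip through an in-memory buffer (assumed stand-in for a file).
buffer = io.BytesIO()
for message in (b"first", b"", b"x" * 300):
    write_delimited(buffer, message)
buffer.seek(0)
recovered = list(read_delimited(buffer))
```

Note that a plain length-prefixed stream like this has no sync markers, so a reader cannot start at an arbitrary offset; on its own it is neither splittable nor compressed.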
There's no real standard for storing many protobuf messages in a single file container, although the documentation mentions writing size-delimited messages sequentially[2]. In practice, I've never encountered protobuf binaries stored on filesystems without some other sort of "framing" (like how parquet can be accessed with either an Avro or a protobuf oriented API).

Does anyone have any use cases for bulk storage of protobuf messages on a filesystem? Should these files just be considered temporary storage for Flink jobs, or do they need to be compatible with other systems? Is there a splittable / compressible file format?

The alternative might be to just forbid file storage for protobuf messages! Any opinions?

All my best, Ryan Skraba

[1]: https://issues.apache.org/jira/browse/FLINK-32008
[2]: https://protobuf.dev/programming-guides/techniques/#streaming