Hi,

In Hadoop MapReduce there is the notion of a "splittable" FileInputFormat. This has the effect that a single input file can be fed into multiple separate instances of the mapper, each reading a part of the data. A lot has been documented (e.g. plain text is splittable per line, gzipped text is not splittable) and designed into the various file formats (like Avro and Parquet) to enable splittability.
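To make the idea concrete, here is a minimal sketch (hypothetical code, not Hadoop's or Flink's actual API) of what "splitting" means mechanically: a splittable format carves one file into independent byte ranges so that several parallel readers can each process their own range. The class and method names here are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: divide a single file into (start, length)
// byte ranges of at most splitSize bytes each, so that independent
// readers can consume the file in parallel. Real frameworks additionally
// align record boundaries (e.g. newlines) across split edges.
public class SplitSketch {

    static List<long[]> computeSplits(long fileSize, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        for (long start = 0; start < fileSize; start += splitSize) {
            long length = Math.min(splitSize, fileSize - start);
            splits.add(new long[] {start, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 250-byte file with 100-byte splits yields three ranges.
        for (long[] s : computeSplits(250, 100)) {
            System.out.println(s[0] + ":" + s[1]);
        }
    }
}
```

With a non-splittable format such as ordinary gzip, this division is impossible because decompression must start at byte 0, which is exactly the limitation the project linked below works around.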
The goal is that reading and parsing files can be done by multiple CPUs/systems in parallel.

How is this handled in Flink? Can Flink read a single file in parallel? How does Flink handle the splittability (or lack thereof) of the various file formats?

The reason I ask is that I want to see whether I can port this (currently Hadoop-specific) hobby project of mine to work with Flink: https://github.com/nielsbasjes/splittablegzip

Thanks.

--
Best regards / Met vriendelijke groeten,

Niels Basjes