AFAIK Flink has a similar notion of splittability as Hadoop. Furthermore, for custom FileInputFormats you can set the attribute unsplittable = true if your file format cannot be split.
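A minimal sketch of what that could look like, assuming Flink's org.apache.flink.api.common.io.FileInputFormat (where unsplittable is a protected field); the class name and the "whole file as one record" logic are purely illustrative:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.flink.api.common.io.FileInputFormat;
import org.apache.flink.core.fs.FileInputSplit;

// Hypothetical input format that reads each file as a single String record.
public class WholeFileTextInputFormat extends FileInputFormat<String> {

    private boolean end;

    public WholeFileTextInputFormat() {
        // Tell Flink this format must not be split into parallel byte ranges;
        // each file is then handled by a single parallel instance.
        this.unsplittable = true;
    }

    @Override
    public void open(FileInputSplit split) throws IOException {
        super.open(split);
        this.end = false;
    }

    @Override
    public boolean reachedEnd() {
        return end;
    }

    @Override
    public String nextRecord(String reuse) throws IOException {
        // Illustrative only: read the entire file content as one record.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = stream.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        end = true;
        return out.toString("UTF-8");
    }
}
```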
> On 18. Feb 2018, at 13:28, Niels Basjes <ni...@basjes.nl> wrote:
>
> Hi,
>
> In Hadoop MapReduce there is the notion of "splittable" in the
> FileInputFormat. This has the effect that a single input file can be fed
> into multiple separate instances of the mapper that read the data.
> A lot has been documented (i.e. text is splittable per line, gzipped text
> is not splittable) and designed into the various file formats (like Avro
> and Parquet) to allow splittability.
>
> The goal is that reading and parsing files can be done by multiple
> cpus/systems in parallel.
>
> How is this handled in Flink?
> Can Flink read a single file in parallel?
> How does Flink administrate/handle the possibilities regarding the various
> file formats?
>
> The reason I ask is because I want to see if I can port this (now Hadoop
> specific) hobby project of mine to work with Flink:
> https://github.com/nielsbasjes/splittablegzip
>
> Thanks.
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes