The split functionality is in the FileInputFormat, and the logic that takes care of lines crossing split boundaries is in the DelimitedInputFormat.
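To make the boundary handling concrete, here is a small self-contained sketch (plain Java, not Flink's actual code; the class and method names are made up for illustration) of the rule a delimited-input reader applies: every reader except the one owning the first split skips forward to the first delimiter, and every reader is allowed to finish the line it started even if that line runs past its split's end. Together these two rules mean no line is lost or read twice.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitLineReader {

    /** Returns the lines "owned" by the byte-range split [start, start+length). */
    static List<String> readLines(byte[] file, long start, long length) {
        List<String> lines = new ArrayList<>();
        int pos = (int) start;
        // Rule 1: any split other than the first skips the partial line at its
        // start; the previous split's reader finishes that line instead.
        if (start > 0) {
            while (pos < file.length && file[pos] != '\n') pos++;
            pos++; // step over the delimiter itself
        }
        int end = (int) (start + length);
        // A line whose first byte falls at or before `end` belongs to this split.
        while (pos < file.length && pos <= end) {
            int lineStart = pos;
            // Rule 2: keep reading past the split boundary to complete the line.
            while (pos < file.length && file[pos] != '\n') pos++;
            lines.add(new String(file, lineStart, pos - lineStart, StandardCharsets.UTF_8));
            pos++;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "aaa\nbbbb\ncc\ndddd\n".getBytes(StandardCharsets.UTF_8);
        // Two splits covering the file: every line shows up exactly once.
        System.out.println(readLines(data, 0, 8));  // [aaa, bbbb]
        System.out.println(readLines(data, 8, 9));  // [cc, dddd]
    }
}
```

The same idea (skip to first delimiter unless at offset 0, overread to finish the last record) is what lets byte-oriented splits work for line-oriented data in Flink and Hadoop alike.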
On Wed, Oct 7, 2015 at 3:24 PM, Fabian Hueske <fhue...@gmail.com> wrote:
> I'm sorry, there is no such documentation.
> You need to look at the code :-(
>
> 2015-10-07 15:19 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>
>> And what is the split policy for the FileInputFormat? Does it depend on
>> the fs block size?
>> Is there a pointer to the several Flink input formats and a description
>> of their internals?
>>
>> On Wed, Oct 7, 2015 at 3:09 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>
>>> Hi Flavio,
>>>
>>> it is not possible to split by line count because that would mean
>>> reading and parsing the file just for splitting.
>>>
>>> Parallel processing of data sources depends on the input splits created
>>> by the InputFormat. Local files can be split just like files in HDFS.
>>> Usually, each file corresponds to at least one split, but multiple files
>>> could also be put into a single split if necessary. The logic for that
>>> would go into the InputFormat.createInputSplits() method.
>>>
>>> Cheers, Fabian
>>>
>>> 2015-10-07 14:47 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>
>>>> Hi to all,
>>>>
>>>> is there a way to split a single local file by line count (e.g. a split
>>>> every 100 lines) in a LocalEnvironment to speed up a simple map
>>>> function? For me it is not very clear how local files (the files in a
>>>> directory if recursive=true) are managed by Flink. Is there any ref to
>>>> these internals?
>>>>
>>>> Best,
>>>> Flavio
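As a rough illustration of the split policy discussed above, here is a hypothetical, self-contained sketch (plain Java; the names `SplitPlanner`, `FileSplit`, and the numbers are illustrative, not Flink's actual API) of what a createInputSplits()-style method does by default: each file is cut into byte ranges whose size follows the file system's block size, which is why a local file is split the same way as an HDFS file, just with a different block size.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPlanner {

    /** A byte range of one file; assigned to one parallel reader. */
    record FileSplit(String path, long start, long length) {}

    /** Cut a file of fileLen bytes into block-size-aligned splits. */
    static List<FileSplit> createInputSplits(String path, long fileLen, long blockSize) {
        List<FileSplit> splits = new ArrayList<>();
        if (fileLen == 0) {
            // An empty file still produces one (empty) split.
            splits.add(new FileSplit(path, 0, 0));
            return splits;
        }
        for (long start = 0; start < fileLen; start += blockSize) {
            splits.add(new FileSplit(path, start, Math.min(blockSize, fileLen - start)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 300-byte "file" with a 128-byte block size yields three splits:
        // [0,128), [128,256), [256,300).
        for (FileSplit s : createInputSplits("/tmp/data.csv", 300, 128)) {
            System.out.println(s);
        }
    }
}
```

Note that splits are byte ranges, never line counts: computing line-count splits would require reading the whole file up front, which is exactly what Fabian rules out above. The real Flink method also takes a minimum number of splits and considers block locations, which this sketch omits.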