And what is the split policy for the FileInputFormat? Does it depend on the file system block size? Is there a pointer to the various Flink input formats and a description of their internals?
On Wed, Oct 7, 2015 at 3:09 PM, Fabian Hueske <fhue...@gmail.com> wrote:
> Hi Flavio,
>
> it is not possible to split by line count because that would mean reading
> and parsing the file just for splitting.
>
> Parallel processing of data sources depends on the input splits created by
> the InputFormat. Local files can be split just like files in HDFS. Usually,
> each file corresponds to at least one split, but multiple files could also
> be put into a single split if necessary. The logic for that would go into
> the InputFormat.createInputSplits() method.
>
> Cheers, Fabian
>
> 2015-10-07 14:47 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>
>> Hi to all,
>>
>> is there a way to split a single local file by line count (e.g. a split
>> every 100 lines) in a LocalEnvironment to speed up a simple map function?
>> It is not very clear to me how local files (files in a directory if
>> recursive=true) are managed by Flink. Is there any reference to these
>> internals?
>>
>> Best,
>> Flavio
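
To make Fabian's point concrete, here is a minimal sketch in plain Java of how a file can be cut into splits by byte offset without reading its contents. This is not Flink's actual code: the class `ByteRangeSplitter`, the `Split` type, and the `computeSplits` method are hypothetical names, and the fixed target split size (for example, the file system block size) is an assumption made for illustration. Only the file length is needed, which is why splitting by byte range is cheap while splitting by line count would require parsing the whole file first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Flink.
public class ByteRangeSplitter {

    /** A split is the byte range [start, start + length) of a single file. */
    public static final class Split {
        public final long start;
        public final long length;

        public Split(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    /**
     * Cuts a file of fileLength bytes into consecutive ranges of at most
     * splitSize bytes. Only the length is used; the file is never opened.
     */
    public static List<Split> computeSplits(long fileLength, long splitSize) {
        if (fileLength < 0 || splitSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and splitSize > 0");
        }
        List<Split> splits = new ArrayList<>();
        long offset = 0;
        while (offset < fileLength) {
            // The last split may be shorter than splitSize.
            long length = Math.min(splitSize, fileLength - offset);
            splits.add(new Split(offset, length));
            offset += length;
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed example values: a 250-byte file and a 100-byte target split size.
        for (Split s : computeSplits(250, 100)) {
            System.out.println("start=" + s.start + " length=" + s.length);
        }
    }
}
```

Because the ranges are chosen without looking at the data, a split boundary can fall in the middle of a line. A line-oriented reader working on such a range would typically have to skip forward to the first line delimiter after its start offset and read past its end offset to finish its last line; how a particular input format handles this is not described in this thread.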