Hi Márton,

Thanks for the reply! I suppose I'll have to implement `createInputSplits` too, then. I looked at the documentation for the InputFormat interface, but I can't see how I could force it to load separate files on separate task managers instead of one file on the job manager. Where is this behavior decided? Or am I misunderstanding something about how this all works?
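[For anyone reading this in the archives: as far as I can tell, the splits are still *created* on the JobManager by `createInputSplits`, but they are then handed out to TaskManagers by an input split assigner, and if the splits carry host information the assigner can prefer local ones. The plain-Java sketch below is a toy model of that division of labor — it does not use the real Flink API, and all class and host names are made up for illustration.]

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of split creation + locality-aware assignment.
// Not the real Flink API: FileSplit stands in for a locatable input split,
// createInputSplits for InputFormat.createInputSplits, and nextSplit for a
// locality-preferring split assigner.
public class SplitLocalityDemo {

    // A split is just a file path tagged with the host that stores it.
    record FileSplit(String path, String host) {}

    // The master builds one split per file; nothing is read here.
    static List<FileSplit> createInputSplits(Map<String, String> fileToHost) {
        List<FileSplit> splits = new ArrayList<>();
        fileToHost.forEach((path, host) -> splits.add(new FileSplit(path, host)));
        return splits;
    }

    // A requesting worker gets a split stored on its own host if one
    // remains; otherwise it falls back to any remaining split.
    static FileSplit nextSplit(List<FileSplit> remaining, String workerHost) {
        for (FileSplit s : remaining) {
            if (s.host().equals(workerHost)) {
                remaining.remove(s);
                return s;
            }
        }
        return remaining.isEmpty() ? null : remaining.remove(0);
    }

    public static void main(String[] args) {
        // Hypothetical cluster: two files, each local to one task manager.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("/data/part-0", "tm-1");
        files.put("/data/part-1", "tm-2");

        List<FileSplit> splits = createInputSplits(files);

        // tm-2 asks first and receives its local file, not the first split.
        System.out.println(nextSplit(splits, "tm-2").path() + " -> tm-2");
        // tm-2 asks again; only a remote split is left, so it gets that one.
        System.out.println(nextSplit(splits, "tm-2").path() + " -> tm-2 (remote fallback)");
    }
}
```

So "force it to load separate files on separate task managers" would come down to returning one split per file with the right host attached, and letting a locality-aware assigner do the rest.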
Cheers,
Daniel

On Sun, Jun 14, 2015 at 7:02 PM, Márton Balassi <balassi.mar...@gmail.com> wrote:

> Hi Dani,
>
> The batch API does not expose an addSource-like method, but you can always
> write your own InputFormat and pass it directly to the constructor of the
> DataSource. DataSource extends DataSet, so you will get all the usual
> methods in the end. For an example, you can have a look e.g. here. [1]
>
> [1] https://github.com/dataArtisans/flink-dataflow/blob/master/src/main/java/com/dataartisans/flink/dataflow/translation/FlinkTransformTranslators.java#L133
>
> Best,
>
> Marton
>
> On Sun, Jun 14, 2015 at 4:34 PM, Dániel Bali <balijanosdan...@gmail.com> wrote:
>
>> Hello!
>>
>> We are running an experiment on a cluster, and we have a large input split
>> across multiple files. We'd like to run a Flink job that reads the local
>> file on each instance and processes it. Is there a way to do this in the
>> batch environment? `readTextFile` wants to read the file on the JobManager
>> and split it right there, which is not what we want.
>>
>> We solved it in the streaming environment by using `addSource`, but there
>> is no similar function in the batch version. Does anybody know how this
>> could be done?
>>
>> Thanks!
>> Daniel