Hi Daniel,

Are the files in HDFS?
What exactly do you mean by "`readTextFile` wants to read the file on the
JobManager"?
The JobManager does not read input files. It only computes the input
splits and assigns them to the TaskManagers, which do the actual reading.
Also, Flink assigns input splits locally (when reading from
distributed file systems). In the JobManager log you can see how many
splits were assigned locally and how many required remote reads. Usually
the number of remote reads is very low.
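
As a rough illustration of what "assigning input splits locally" means, here
is a minimal plain-Java sketch. It is not Flink's actual code: the names
`LocalitySplitAssigner`, `Split` and `nextSplit`, as well as the host and file
names, are made up for this example. It only assumes that each split knows
which hosts store its data and that workers ask for splits one at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch (not a Flink class): hands out splits one at a time,
// preferring splits whose data is stored on the requesting host.
class LocalitySplitAssigner {

    // A split plus the hosts that store its data, as a distributed
    // file system would report them.
    static final class Split {
        final String path;
        final List<String> hosts;

        Split(String path, String... hosts) {
            this.path = path;
            this.hosts = Arrays.asList(hosts);
        }
    }

    private final List<Split> unassigned;
    int localAssignments = 0;
    int remoteAssignments = 0;

    LocalitySplitAssigner(List<Split> splits) {
        this.unassigned = new ArrayList<>(splits);
    }

    // Returns the next split for the given host, or null when none are left.
    Split nextSplit(String requestingHost) {
        // First pass: look for a split stored on the requesting host.
        for (Iterator<Split> it = unassigned.iterator(); it.hasNext();) {
            Split s = it.next();
            if (s.hosts.contains(requestingHost)) {
                it.remove();
                localAssignments++;
                return s;
            }
        }
        // Nothing left at all.
        if (unassigned.isEmpty()) {
            return null;
        }
        // No local split left for this host: fall back to a remote read.
        remoteAssignments++;
        return unassigned.remove(0);
    }

    public static void main(String[] args) {
        LocalitySplitAssigner assigner = new LocalitySplitAssigner(Arrays.asList(
                new Split("part-0", "worker1"),
                new Split("part-1", "worker2"),
                new Split("part-2", "worker1")));

        // worker2 asks twice; its second request can only be served remotely.
        for (String host : new String[] {"worker1", "worker2", "worker2"}) {
            Split s = assigner.nextSplit(host);
            System.out.println(host + " reads " + s.path);
        }
        System.out.println("local=" + assigner.localAssignments
                + " remote=" + assigner.remoteAssignments);
    }
}
```

The two counters correspond to the kind of local/remote numbers mentioned
above. A remote read only happens once a worker has no local split left.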



On Sun, Jun 14, 2015 at 11:18 AM, Dániel Bali <balijanosdan...@gmail.com>
wrote:

> Hi Márton,
>
> Thanks for the reply! I suppose I have to implement `createInputSplits`
> too then. I tried looking at the documentation for the InputFormat
> interface, but I can't see how I could force it to load separate files on
> separate task managers, instead of one file on the job manager. Where is
> this behavior decided? Or am I misunderstanding something about how this
> all works?
>
> Cheers,
> Daniel
>
> On Sun, Jun 14, 2015 at 7:02 PM, Márton Balassi <balassi.mar...@gmail.com>
> wrote:
>
>> Hi Dani,
>>
>> The batch API does not expose an addSource-like method, but you can
>> always write your own InputFormat and pass it directly to the constructor
>> of the DataSource. DataSource extends DataSet, so you will get all the
>> usual methods in the end. For an example, have a look here [1].
>>
>> [1]
>> https://github.com/dataArtisans/flink-dataflow/blob/master/src/main/java/com/dataartisans/flink/dataflow/translation/FlinkTransformTranslators.java#L133
>>
>> Best,
>>
>> Marton
>>
>> On Sun, Jun 14, 2015 at 4:34 PM, Dániel Bali <balijanosdan...@gmail.com>
>> wrote:
>>
>>> Hello!
>>>
>>> We are running an experiment on a cluster and we have a large input
>>> split into multiple files. We'd like to run a Flink job that reads the
>>> local file on each instance and processes that. Is there a way to do this
>>> in the batch environment? `readTextFile` wants to read the file on the
>>> JobManager and split that right there, which is not what we want.
>>>
>>> We solved it in the streaming environment by using `addSource`, but
>>> there is no similar function in the batch version. Does anybody know how
>>> this could be done?
>>>
>>> Thanks!
>>> Daniel
>>>
>>
>>
>