It sounds like what you want to do could also be done in the same way as
"addSource()" with a GenericInputFormat.
It would not look like a FileInputFormat to Flink, and the JobManager would
assign one generic meaningless split to each parallel instance.
Inside the input format, you could still in
Hi,
reading local files in a distributed setting is a tricky thing because
Flink assumes that all InputSplits can be read from all TaskManagers. This
is obviously not possible if files are located on the local file systems of
different physical machines. Hence, you cannot use one of the provided
File
Hi Robert,
We are not using HDFS. We have a large file that's already split into 8
parts, each of them on a node that runs a separate task manager, at the
same place, with the same name. The job manager is in another node. If I
start a job that uses readTextFile, I get an exception, saying that th
Hi Daniel,
Are the files in HDFS?
What exactly do you mean by "`readTextFile` wants to read the file on the
JobManager"?
The JobManager is not reading input files.
Also, Flink is assigning input splits locally (when reading from
distributed file systems). In the JobManager log you can see how man
Hi Márton,
Thanks for the reply! I suppose I have to implement `createInputSplits` too
then. I tried looking at the documentation for the InputFormat interface,
but I can't see how I could force it to load separate files on separate
task managers, instead of one file on the job manager. Where is t
Hi Dani,
The batch API does not expose an addSource-like method, but you can always
write your own input format and pass it directly to the constructor of the
DataSource. DataSource extends DataSet, so you will get all the usual
methods in the end. For an example you can have a look e.g. here. [1]
[
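A minimal sketch of what such a custom input format has to do on each node, written in plain Java rather than against the real Flink classes. `LocalPartReader` and its path argument are hypothetical; the method names only mirror the open / reachedEnd / nextRecord life cycle of an input format, and the assumption is that the file sits at the same local path on every machine, as described in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Plain-Java stand-in for a custom input format. This is NOT a Flink class;
 * it only models what one parallel instance would do with its generic split.
 */
class LocalPartReader implements AutoCloseable {
    private final String localPath; // hypothetical: same path on every node
    private BufferedReader reader;
    private String next;            // look-ahead line, null once the file is exhausted

    LocalPartReader(String localPath) {
        this.localPath = localPath;
    }

    /** Opens the node-local file; the split itself carries no file information. */
    void open() {
        try {
            reader = Files.newBufferedReader(Paths.get(localPath));
            next = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True once every line of the local file has been handed out. */
    boolean reachedEnd() {
        return next == null;
    }

    /** Returns the current line and advances the look-ahead by one line. */
    String nextRecord() {
        String current = next;
        try {
            next = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return current;
    }

    @Override
    public void close() {
        try {
            if (reader != null) {
                reader.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real job the same logic would live inside your own input format, and each parallel instance would end up reading only the part stored on its own machine.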
Hello!
We are running an experiment on a cluster and we have a large input split
into multiple files. We'd like to run a Flink job that reads the local file
on each instance and processes that. Is there a way to do this in the batch
environment? `readTextFile` wants to read the file on the JobMana