Hi Lu,

Implementing your own *InputFormat*, together with the *InputSplitAssigner*
it creates (whose interface is "InputSplit getNextInputSplit(String host,
int taskId)"), should work if you want to assign *InputSplit*s to tasks
according to the task index and file name patterns.
To assign 2 *InputSplit*s in one request, you can implement a new
*InputSplit* that wraps multiple *FileInputSplit*s. You may also need to
define in your *InputFormat* how to process the new *InputSplit*.
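
A rough sketch of that idea follows. It is only an illustration under a few
assumptions: the two small interfaces are local stand-ins that mirror the
shape of Flink's InputSplit and InputSplitAssigner so the snippet compiles
without Flink on the classpath (the real interfaces may have additional
methods depending on the Flink version), plain file-name strings stand in
for the wrapped *FileInputSplit*s, and the names PairedSplit and
PairedSplitAssigner are hypothetical. It also assumes the number of file
groups equals the parallelism.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Local stand-ins (assumption): same shape as Flink's interfaces of the same name.
interface InputSplit {
    int getSplitNumber();
}

interface InputSplitAssigner {
    InputSplit getNextInputSplit(String host, int taskId);
}

/** Hypothetical split that wraps several files which must be read together. */
final class PairedSplit implements InputSplit {
    private final int splitNumber;
    private final List<String> paths;

    PairedSplit(int splitNumber, List<String> paths) {
        this.splitNumber = splitNumber;
        this.paths = Collections.unmodifiableList(new ArrayList<>(paths));
    }

    @Override
    public int getSplitNumber() {
        return splitNumber;
    }

    List<String> getPaths() {
        return paths;
    }
}

/** Hypothetical assigner: task i gets the i-th group of name-sorted files, once. */
final class PairedSplitAssigner implements InputSplitAssigner {
    private final List<PairedSplit> splits = new ArrayList<>();
    private final boolean[] handedOut;

    PairedSplitAssigner(List<String> fileNames, int filesPerSplit) {
        List<String> sorted = new ArrayList<>(fileNames);
        Collections.sort(sorted); // file01, file02, ... end up adjacent
        for (int i = 0; i < sorted.size(); i += filesPerSplit) {
            int end = Math.min(i + filesPerSplit, sorted.size());
            splits.add(new PairedSplit(splits.size(), sorted.subList(i, end)));
        }
        handedOut = new boolean[splits.size()];
    }

    @Override
    public synchronized InputSplit getNextInputSplit(String host, int taskId) {
        // Ignore the host; pick the group purely by task index.
        if (taskId < 0 || taskId >= splits.size() || handedOut[taskId]) {
            return null; // null tells the task there is nothing more to read
        }
        handedOut[taskId] = true;
        return splits.get(taskId);
    }
}
```

In a real job, the custom *InputFormat* would return such an assigner from
getInputSplitAssigner(...) and, in open(...), open every file wrapped by the
split it receives.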

Thanks,
Zhu Zhu

Lu Niu <qqib...@gmail.com> wrote on Thu, Aug 15, 2019 at 12:26 AM:

> Hi,
>
> I have a data set backed by a directory of files in which file names are
> meaningful.
>
> folder1
>    +-----file01
>    +-----file02
>    +-----file03
>    +-----file04
>
> I want to control the file assignment in my Flink application. For
> example, when parallelism is 2, worker 1 gets file01 and file02 to read
> and worker 2 gets file03 and file04. Also, each worker gets its 2 files
> all at once because reading requires jumping back and forth between those
> two files.
>
> What's the best way to do this? It seems like FileInputFormat is not
> extensible in this case.
>
> Best
> Lu
>
>
>
