Hi, Zhu

Thanks for the reply! I found that using SplittableIterator is also doable
to some extent. How should I choose between the two?

Best
Lu

On Wed, Aug 14, 2019 at 8:02 PM Zhu Zhu <reed...@gmail.com> wrote:

> Hi Lu,
>
> Implementing your own *InputFormat*, together with the *InputSplitAssigner*
> it creates (whose interface is "InputSplit getNextInputSplit(String host,
> int taskId)"), should work if you want to assign InputSplits to tasks
> according to the task index and file name patterns.
> To assign 2 *InputSplit*s in one request, you can implement a new
> *InputSplit* that wraps multiple *FileInputSplit*s. You may also need to
> define in your *InputFormat* how to process the new *InputSplit*.
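A minimal sketch of the idea described above, in plain Java with no Flink
dependency. The classes MultiFileSplit and IndexBasedAssigner are
hypothetical stand-ins for a custom InputSplit and InputSplitAssigner; only
the method shape getNextInputSplit(String host, int taskId) is taken from
the message. It assumes the file names are already sorted and that task
indices start at 0, so "worker 1" in the original question corresponds to
taskId 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GroupedSplitSketch {

    /** Hypothetical stand-in for a custom InputSplit that wraps several files. */
    static final class MultiFileSplit {
        final int splitNumber;
        final List<String> paths;

        MultiFileSplit(int splitNumber, List<String> paths) {
            this.splitNumber = splitNumber;
            // Copy so the split does not depend on the caller's list.
            this.paths = Collections.unmodifiableList(new ArrayList<>(paths));
        }
    }

    /** Hypothetical stand-in for a custom InputSplitAssigner. */
    static final class IndexBasedAssigner {
        private final List<MultiFileSplit> splits = new ArrayList<>();
        private final Set<Integer> handedOut = new HashSet<>();

        /** Groups consecutive file names into splits of filesPerSplit files each. */
        IndexBasedAssigner(List<String> sortedPaths, int filesPerSplit) {
            for (int i = 0; i < sortedPaths.size(); i += filesPerSplit) {
                int end = Math.min(i + filesPerSplit, sortedPaths.size());
                splits.add(new MultiFileSplit(splits.size(), sortedPaths.subList(i, end)));
            }
        }

        /**
         * Same shape as getNextInputSplit(String host, int taskId).
         * Split k goes to task k exactly once; null means "no more splits".
         */
        synchronized MultiFileSplit getNextInputSplit(String host, int taskId) {
            if (taskId < 0 || taskId >= splits.size() || !handedOut.add(taskId)) {
                return null;
            }
            return splits.get(taskId);
        }
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("file01", "file02", "file03", "file04");
        IndexBasedAssigner assigner = new IndexBasedAssigner(files, 2);
        for (int taskId = 0; taskId < 2; taskId++) {
            MultiFileSplit split = assigner.getNextInputSplit("localhost", taskId);
            System.out.println("task " + taskId + " reads " + split.paths);
        }
    }
}
```

Because each task receives both of its files in a single split, the reading
code can open them together and jump between them. In a real job the
InputFormat would still have to define how such a wrapped split is opened
and read.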
>
> Thanks,
> Zhu Zhu
>
> On Thu, Aug 15, 2019 at 12:26 AM Lu Niu <qqib...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a data set backed by a directory of files in which file names are
>> meaningful.
>>
>> folder1
>>    +-----file01
>>    +-----file02
>>    +-----file03
>>    +-----file04
>>
>> I want to control the file assignments in my Flink application. For
>> example, when parallelism is 2, worker 1 gets file01 and file02 to read
>> and worker 2 gets file03 and file04. Also, each worker gets its 2 files
>> all at once because reading requires jumping back and forth between
>> those two files.
>>
>> What's the best way to do this? It seems like FileInputFormat is not
>> extensible in this case.
>>
>> Best
>> Lu
>>
>>
>>
