Hi Zhu,

Thanks for the reply! I found that using SplittableIterator is also workable to some extent. How should I choose between the two approaches?
Best,
Lu

On Wed, Aug 14, 2019 at 8:02 PM Zhu Zhu <reed...@gmail.com> wrote:

> Hi Lu,
>
> Implementing your own *InputFormat* and the *InputSplitAssigner* it
> creates (which has the interface "InputSplit getNextInputSplit(String host,
> int taskId)") should work if you want to assign InputSplits to tasks
> according to the task index and file name patterns.
> To assign 2 *InputSplit*s in one request, you can implement a new
> *InputSplit* which wraps multiple *FileInputSplit*s. You may also need to
> define in your *InputFormat* how to process the new *InputSplit*.
>
> Thanks,
> Zhu Zhu
>
> Lu Niu <qqib...@gmail.com> wrote on Thu, Aug 15, 2019 at 12:26 AM:
>
>> Hi,
>>
>> I have a data set backed by a directory of files in which the file names
>> are meaningful.
>>
>> folder1
>> +-----file01
>> +-----file02
>> +-----file03
>> +-----file04
>>
>> I want to control the file assignments in my Flink application. For
>> example, when parallelism is 2, worker 1 gets file01 and file02 to read
>> and worker 2 gets file03 and file04. Also, each worker gets its 2 files
>> all at once, because reading requires jumping back and forth between
>> those two files.
>>
>> What's the best way to do this? It seems like FileInputFormat is not
>> extensible in this case.
>>
>> Best
>> Lu
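
For reference, here is a rough sketch of the assignment logic discussed in the thread, written as plain Java with no Flink dependency so it can be run on its own. FileGroupSplit and FileGroupAssigner are hypothetical names, not Flink classes; in a real job they would presumably implement Flink's InputSplit and InputSplitAssigner interfaces, and the custom InputFormat would still need to know how to read a split that holds several files. The sketch assumes task indices start at 0 and that sorting the file names gives the intended grouping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a custom InputSplit that wraps several files. */
final class FileGroupSplit {
    final int splitNumber;
    final List<String> files;

    FileGroupSplit(int splitNumber, List<String> files) {
        this.splitNumber = splitNumber;
        // Copy so the split does not depend on the caller's list.
        this.files = Collections.unmodifiableList(new ArrayList<>(files));
    }
}

/** Hypothetical assigner: task i receives the i-th group of files, exactly once. */
final class FileGroupAssigner {
    private final List<FileGroupSplit> groups = new ArrayList<>();
    private final boolean[] handedOut;

    FileGroupAssigner(List<String> fileNames, int filesPerTask) {
        // Assumption: lexicographic order of names defines the grouping.
        List<String> sorted = new ArrayList<>(fileNames);
        Collections.sort(sorted);
        for (int start = 0; start < sorted.size(); start += filesPerTask) {
            int end = Math.min(start + filesPerTask, sorted.size());
            groups.add(new FileGroupSplit(groups.size(), sorted.subList(start, end)));
        }
        handedOut = new boolean[groups.size()];
    }

    // Same shape as the method quoted in the thread; the host is ignored here.
    synchronized FileGroupSplit getNextInputSplit(String host, int taskId) {
        if (taskId < 0 || taskId >= groups.size() || handedOut[taskId]) {
            return null; // no (more) work for this task
        }
        handedOut[taskId] = true;
        return groups.get(taskId);
    }

    public static void main(String[] args) {
        FileGroupAssigner assigner = new FileGroupAssigner(
                Arrays.asList("file01", "file02", "file03", "file04"), 2);
        for (int task = 0; task < 2; task++) {
            FileGroupSplit split = assigner.getNextInputSplit("localhost", task);
            System.out.println("task " + task + " -> " + split.files);
        }
        // prints:
        // task 0 -> [file01, file02]
        // task 1 -> [file03, file04]
    }
}
```

Each task gets both of its files in a single request, which matches the requirement that a worker can jump back and forth between them. A second request from the same task returns null, signalling that there is nothing left to read.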