Thank you for the advice! Now I have a new question. From reading the source [1], I see that the streaming environment uses FileSourceFunction, which extends RichParallelSourceFunction, to read the input as splits [2]. I know I can set the parallelism in the streaming environment, but is there any way I can verify at runtime that the split files (or the single file) are actually read in parallel?
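One way to check this is to tag every record with the index of the reader that produced it and then count records per reader. The sketch below is plain Java with no Flink dependency and is only an analogy. It cuts a single text file into byte-range splits, reads each split on its own thread, and tags each line with its worker index and thread name. The class and method names (ParallelSplitRead, Tagged, readSplit, readInParallel) are hypothetical. The boundary rule, in which a split owns every line whose first byte falls inside it, is an assumption loosely modelled on how file input splits are commonly handled, not a copy of Flink's code. Inside an actual Flink job, the equivalent check would be a RichMapFunction that attaches getRuntimeContext().getIndexOfThisSubtask() to each record. The per-subtask record counts in the web dashboard should also show this.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

// Hypothetical, self-contained demo of parallel split reading; this is not Flink code.
public class ParallelSplitRead {

    /** A line tagged with the index of the worker (the "subtask") that read it. */
    record Tagged(int worker, String threadName, String line) {}

    /**
     * Returns the lines whose first byte lies in [start, end). A split that does not
     * begin at offset 0 skips the partial line owned by the previous split, and its
     * last line is read to completion even if it runs past 'end'.
     * RandomAccessFile.readLine is unbuffered and maps bytes to chars one-to-one,
     * so this sketch assumes ASCII text.
     */
    static List<String> readSplit(Path file, long start, long end) throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            if (start > 0) {
                raf.seek(start - 1);
                raf.readLine(); // consume up to and including the next line terminator
            }
            while (raf.getFilePointer() < end) {
                String line = raf.readLine();
                if (line == null) {
                    break; // end of file
                }
                lines.add(line);
            }
        }
        return lines;
    }

    /** Cuts the file into 'parallelism' byte ranges and reads each one on its own thread. */
    static List<Tagged> readInParallel(Path file, int parallelism) throws Exception {
        long size = Files.size(file);
        long splitSize = (size + parallelism - 1) / parallelism; // ceiling division
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<List<Tagged>>> futures = new ArrayList<>();
            for (int i = 0; i < parallelism; i++) {
                final int worker = i;
                final long start = Math.min(size, i * splitSize);
                final long end = Math.min(size, start + splitSize);
                futures.add(pool.submit(() -> {
                    String thread = Thread.currentThread().getName();
                    List<Tagged> out = new ArrayList<>();
                    for (String line : readSplit(file, start, end)) {
                        out.add(new Tagged(worker, thread, line));
                    }
                    return out;
                }));
            }
            List<Tagged> all = new ArrayList<>();
            for (Future<List<Tagged>> f : futures) {
                all.addAll(f.get()); // futures are in split order, so file order is kept
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("split-demo", ".txt");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("line-").append(i).append('\n');
        }
        Files.writeString(file, sb.toString(), StandardCharsets.US_ASCII);

        List<Tagged> records = readInParallel(file, 4);
        // Count lines per worker and remember which thread each worker ran on.
        Map<Integer, Long> perWorker = records.stream()
                .collect(Collectors.groupingBy(Tagged::worker, TreeMap::new, Collectors.counting()));
        Map<Integer, String> threadOf = new TreeMap<>();
        for (Tagged t : records) {
            threadOf.putIfAbsent(t.worker(), t.threadName());
        }
        System.out.println("total lines read: " + records.size());
        perWorker.forEach((w, n) ->
                System.out.println("worker " + w + " on " + threadOf.get(w) + " read " + n + " lines"));
        Files.delete(file);
    }
}
```

If every worker reports a non-zero count on a distinct thread and the total matches the file's line count, the file was consumed by parallel readers with no lines lost or duplicated. A per-subtask count like this is the same signal one would look for in a Flink job.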
Thank you again for your help.

[1]. https://raw.githubusercontent.com/eBay/Flink/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java
[2]. https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/FileSourceFunction.java

On 28 May 2016 at 17:52, Chesnay Schepler <ches...@apache.org> wrote:
> ExecutionEnvironment.readTextFile will read the file in parallel.
>
> On 28.05.2016 09:59, David Olsen wrote:
>
> After searching on the internet (with keywords like 'apache flink parallel
> read text') I still could not find the answer I am looking for, so I am
> asking here before jumping into writing code.
>
> My problem is that I want to read a text file or split text files (from the
> local file system). I want to read those files in parallel and process them
> accordingly.
>
> From what I have discovered so far:
> - Use ExecutionEnvironment.readTextFile, but this seems to use only 1
>   thread(?) (meaning it reads the file(s) from the beginning to the end).
> - Use the streaming env's addSource [1], but it seems to me that I would need
>   to implement my own source with RichParallelSourceFunction.
>
> Are there any classes or implementations that can already read text in
> parallel?
>
> Thanks
>
> [1]. http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Reading-separate-files-in-parallel-tasks-as-input-td1623.html