Thank you for the advice!

Now I have a new question. From reading the source [1], I see that the
streaming env uses FileSourceFunction, which extends
RichParallelSourceFunction, to create input splits [2]. I know I can set the
parallelism in the streaming env, but is there any way I can verify at
runtime that the split files, or the single file, are read in parallel?
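
To make the kind of check I have in mind concrete, here is a minimal sketch
outside Flink, using only java.util.concurrent. Each split file is read by a
pool worker, and the worker's thread name is recorded per split, so the
mapping shows which worker handled which split. The class name
ParallelReadCheck, the method readInParallel, and the part-N.txt file names
are made up for illustration; none of this is Flink API. Inside a Flink job I
assume the analogous step would be to tag each record with
getRuntimeContext().getIndexOfThisSubtask() in a rich function, but I have
not tried that.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Flink: records which worker thread read each split.
class ParallelReadCheck {

    /** Reads every split on a fixed-size pool; returns split file name -> reader thread name. */
    static Map<String, String> readInParallel(List<Path> splits, int parallelism) throws Exception {
        Map<String, String> readerOfSplit = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Path split : splits) {
                pending.add(pool.submit(() -> {
                    // The actual read; a real job would process the lines here.
                    Files.readAllLines(split, StandardCharsets.UTF_8);
                    // Remember which worker thread handled this split.
                    readerOfSplit.put(split.getFileName().toString(),
                            Thread.currentThread().getName());
                    return null;
                }));
            }
            // Wait for all reads to finish and surface any read failure.
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return readerOfSplit;
    }

    public static void main(String[] args) throws Exception {
        // Create four small split files in a temporary directory.
        Path dir = Files.createTempDirectory("splits");
        List<Path> splits = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            Path split = dir.resolve("part-" + i + ".txt");
            Files.write(split, Arrays.asList("line 1 of split " + i, "line 2 of split " + i),
                    StandardCharsets.UTF_8);
            splits.add(split);
        }

        Map<String, String> readers = readInParallel(splits, 4);
        readers.forEach((file, thread) -> System.out.println(file + " was read by " + thread));
        System.out.println("distinct reader threads: " + new HashSet<>(readers.values()).size());
    }
}
```

With as many workers as splits, each split should be reported under a
different thread name. With fewer workers than splits, some thread names
repeat, which is the non-Flink equivalent of one subtask reading several
splits.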

Thank you again for your help.

[1].
https://raw.githubusercontent.com/eBay/Flink/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java

[2].
https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/FileSourceFunction.java



On 28 May 2016 at 17:52, Chesnay Schepler <ches...@apache.org> wrote:

> ExecutionEnvironment.readTextFile will read the file in parallel.
>
>
> On 28.05.2016 09:59, David Olsen wrote:
>
> After searching the internet (with keywords like 'apache flink parallel
> read text') I still have not found the answer I am looking for, so I am
> asking here before jumping into writing code ...
>
> My problem is that I want to read a text file, or a set of split text
> files, from the local file system. I want to read those files in parallel
> and process them accordingly.
>
> From what I have discovered so far:
> - Use ExecutionEnvironment.readTextFile, but this seems to use only 1
> thread(?) (meaning it reads the file(s) from beginning to end)
> - Use the streaming env with addSource [1], but it seems I would need to
> implement my own source with RichParallelSourceFunction.
> Are there any classes or implementations that can already read text in
> parallel?
> Thanks
>
> [1].
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Reading-separate-files-in-parallel-tasks-as-input-td1623.html
>
>
>
