Re: Read a given list of HDFS folder

Ufuk Celebi Mon, 21 Mar 2016 05:41:40 -0700

Hey Gwenhaël,

see here for recursive traversal of input paths:
https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/index.html#recursive-traversal-of-the-input-path-directory
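
For the "last N hours" case, one option (a sketch, not the only way) is to
compute the folder paths in the driver program and read each one separately.
The helper below is plain Java with no Flink dependency. The class name
HourlyFolders, the base directory, and the yyyy/MM/dd/HH layout are
assumptions for illustration, so adjust them to whatever layout Flume
actually writes.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the paths of the last N hourly folders.
final class HourlyFolders {

    // Assumed folder layout: <baseDir>/yyyy/MM/dd/HH (not taken from the thread).
    private static final DateTimeFormatter LAYOUT =
            DateTimeFormatter.ofPattern("yyyy/MM/dd/HH");

    // Returns the N most recent hourly folders, oldest first, ending at the hour of 'now'.
    static List<String> lastHours(String baseDir, LocalDateTime now, int n) {
        LocalDateTime currentHour = now.truncatedTo(ChronoUnit.HOURS);
        List<String> paths = new ArrayList<>();
        for (int i = n - 1; i >= 0; i--) {
            paths.add(baseDir + "/" + currentHour.minusHours(i).format(LAYOUT));
        }
        return paths;
    }
}
```

Each returned path could then be passed to env.readTextFile(path), and the
resulting DataSets combined with union(...) into a single input.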


Regarding the phases: the best way to exchange data between batch jobs
is via files. You can then execute two programs one after the other:
the first one produces the files, which the second job uses as input.
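
The hand-off pattern can be sketched in plain Java as a stand-in for two
Flink programs. Everything here is hypothetical (the class TwoPhaseHandoff,
the sum as the "final result", the temp file). In Flink, phase one would
typically end with writeAsText(...) followed by env.execute(), and phase two
would start with readTextFile(...) on the same path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for two batch programs that exchange data via a file.
final class TwoPhaseHandoff {

    // Phase one: compute a final result (here, a sum) and persist it to a file.
    static void phaseOne(List<Integer> values, Path resultFile) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        try {
            Files.write(resultFile,
                    Collections.singletonList(Long.toString(sum)),
                    StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Phase two: read phase one's result back and use it (each value's share of the sum).
    static List<Double> phaseTwo(List<Integer> values, Path resultFile) {
        long sum;
        try {
            String firstLine = Files.readAllLines(resultFile, StandardCharsets.UTF_8).get(0);
            sum = Long.parseLong(firstLine.trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<Double> shares = new ArrayList<>();
        for (int v : values) {
            shares.add((double) v / sum);
        }
        return shares;
    }

    // Runs the two phases one after the other, using a temp file as the exchange point.
    static List<Double> runBoth(List<Integer> values) {
        try {
            Path resultFile = Files.createTempFile("phase-one-", ".txt");
            try {
                phaseOne(values, resultFile);
                return phaseTwo(values, resultFile);
            } finally {
                Files.deleteIfExists(resultFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The second phase only depends on the file, so the two phases can live in
separate programs and be scheduled one after the other.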

– Ufuk



On Mon, Mar 21, 2016 at 12:14 PM, Gwenhael Pasquiers
<gwenhael.pasqui...@ericsson.com> wrote:
> Hello,
>
> Sorry if this has already been asked or is already in the docs; I did not
> find the answer:
>
> Is there a way to read a given set of folders in Flink batch? Let's say we
> have one folder per hour of data, written by Flume, and we'd like to read
> only the last N hours (or any other pattern or arbitrary list of folders).
>
> And while I'm at it, I have another question:
>
> Let's say that in my batch task I need to sequence two "phases" and that the
> second phase needs the final result from the first one.
>  - Do I have to create, in the TaskManager, one execution environment per
> task and execute them one after the other?
>  - Can my TaskManagers send back some data (other than counters) to the
> JobManager, or do I have to use a file to store the result from phase one and
> use it in phase two?
>
> Thanks in advance for your answers,
>
> Gwenhaël
