Hi,

If the question is how to read all files from an HDFS directory,
then in general each file is potentially a different DataSet (not DataStream),
and you need to decide how to combine/join them in the Flink pipeline.

If the files are small enough, you could list their paths as strings and use
env.fromCollection to start the pipeline.
Then, in a map operation, load the file for each path into memory and
transform its contents into records for the next stage.
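
A rough sketch of that approach (assuming the files fit in memory; the paths, job name, and returning one string per file are just placeholders for illustration):

	import org.apache.flink.api.common.functions.MapFunction;
	import org.apache.flink.core.fs.FSDataInputStream;
	import org.apache.flink.core.fs.FileSystem;
	import org.apache.flink.core.fs.Path;
	import org.apache.flink.streaming.api.datastream.DataStream;
	import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

	import java.io.ByteArrayOutputStream;
	import java.util.Arrays;
	import java.util.List;

	public class ReadFilesFromPaths {

		public static void main(String[] args) throws Exception {
			StreamExecutionEnvironment env =
					StreamExecutionEnvironment.getExecutionEnvironment();

			// List the HDFS files to read (placeholder paths)
			List<String> paths = Arrays.asList(
					"hdfs://localhost:8020/data/ingestion/ingestion.raw.product/2018/12/05/23/part-0.json",
					"hdfs://localhost:8020/data/ingestion/ingestion.raw.product/2018/12/05/23/part-1.json");

			DataStream<String> fileContents = env
					.fromCollection(paths)
					// Load each file into memory inside the map operation
					.map(new MapFunction<String, String>() {
						@Override
						public String map(String pathString) throws Exception {
							Path path = new Path(pathString);
							FileSystem fs = path.getFileSystem();
							ByteArrayOutputStream out = new ByteArrayOutputStream();
							try (FSDataInputStream in = fs.open(path)) {
								byte[] buffer = new byte[4096];
								int read;
								while ((read = in.read(buffer)) != -1) {
									out.write(buffer, 0, read);
								}
							}
							// One string per file; parse the JSON into records in the next stage
							return out.toString("UTF-8");
						}
					});

			fileContents.print();
			env.execute("read-files-from-paths");
		}
	}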

Best,
Andrey

> On 12 Dec 2018, at 15:02, Rakesh Kumar <rakkukumar2...@gmail.com> wrote:
> 
> Hi,
> 
> I want to read all JSON files from HDFS with partition folders.
> 
> 	public static void main(String[] args) {
> 
> 		StreamExecutionEnvironment env =
> 				StreamExecutionEnvironment.getExecutionEnvironment();
> 
> 		// path:
> 		// hdfs://localhost:8020/data/ingestion/ingestion.raw.product/2018/12/05/23
> 
> 		DataStream<String> df =
> 				env.readTextFile("hdfs://localhost:8020/data/ingestion/ingestion.raw.product");
> 		try {
> 			df.print();
> 			env.execute("dfg");
> 		} catch (Exception e) {
> 			e.printStackTrace();
> 		}
> 	}
