Re: Read all json files from a hdfs partition folder

2018-12-13 Thread Andrey Zagrebin
Hi Rakesh, So the problem is that you want your Flink job to monitor the '/data/ingestion/ingestion-raw-product' path for new files inside it and process them as they appear, right? Can you try env.readFile, but with watchType = FileProcessingMode.PROCESS_CONTINUOUSLY? You can see an example in how …
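[The preview is cut off before the example. A minimal sketch of what the suggestion amounts to could look like the following; the path, the 10-second monitoring interval, and the use of TextInputFormat are assumptions for illustration, only env.readFile with FileProcessingMode.PROCESS_CONTINUOUSLY comes from the message itself.]

    import org.apache.flink.api.java.io.TextInputFormat;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

    public class MonitorPartitionFolder {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Path and input format are illustrative assumptions; any FileInputFormat works here.
            String path = "hdfs:///data/ingestion/ingestion-raw-product";
            TextInputFormat format = new TextInputFormat(new Path(path));

            // PROCESS_CONTINUOUSLY re-scans the directory (every 10 seconds here) and
            // emits the contents of files that appear after the job has started.
            DataStream<String> lines = env.readFile(
                    format,
                    path,
                    FileProcessingMode.PROCESS_CONTINUOUSLY,
                    10_000L);

            lines.print();
            env.execute("monitor-hdfs-partition-folder");
        }
    }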

Re: Read all json files from a hdfs partition folder

2018-12-12 Thread Andrey Zagrebin
Actually, does it not work if you just provide the directory in env.readTextFile as in your code example, or what is the problem? > On 12 Dec 2018, at 17:24, Andrey Zagrebin wrote: …
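[For completeness, a minimal sketch of the variant Andrey is asking about: passing the partition directory directly to env.readTextFile, which reads every file under it once and then finishes. The path and job name are assumptions for illustration.]

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ReadPartitionFolderOnce {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Given a directory, readTextFile reads all files inside it once
            // (PROCESS_ONCE semantics); it does not pick up files added later.
            DataStream<String> lines =
                    env.readTextFile("hdfs:///data/ingestion/ingestion-raw-product");

            lines.print();
            env.execute("read-partition-folder-once");
        }
    }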

Re: Read all json files from a hdfs partition folder

2018-12-12 Thread Andrey Zagrebin
Hi, If the question is how to read all files from an hdfs directory: in general, each file is potentially a different DataSet (not DataStream). It needs to be decided how to combine/join them in the Flink pipeline. If the files are small enough, you could list them as string paths and use env.fromColl…
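[The preview is cut off at env.fromColl, presumably env.fromCollection. A sketch of that approach, with the directory listing done through the Hadoop FileSystem API, might look as follows; the listing code, the path, and the print sink are assumptions, the message only suggests collecting the file paths as strings and feeding them to env.fromCollection.]

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import java.util.ArrayList;
    import java.util.List;

    public class ListPartitionFilesAsPaths {

        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // List the files of the partition folder via the Hadoop FileSystem API
            // (an assumption; any way of producing the list of paths would do).
            FileSystem fs = FileSystem.get(new Configuration());
            List<String> paths = new ArrayList<>();
            for (FileStatus status : fs.listStatus(new Path("/data/ingestion/ingestion-raw-product"))) {
                if (status.isFile()) {
                    paths.add(status.getPath().toString());
                }
            }

            // Turn the list of paths into a DataSet; downstream operators can then
            // decide per path how to read and combine/join the individual files.
            DataSet<String> filePaths = env.fromCollection(paths);
            filePaths.print();
        }
    }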