Re: HDFS streaming source concerns

2022-04-19 Thread Adrian Bednarz
Hello, we are actually working on a similar problem against S3. The checkpointing question got me wondering whether a checkpoint would indeed succeed with a large backlog of files. I always imagined that the SplitEnumerator lists all available files and the SourceReader is responsible for reading those files aft

Re: HDFS streaming source concerns

2022-04-08 Thread Roman Khachatryan
Hi Carlos, AFAIK, the Flink FileSource is capable of checkpointing while reading the files (at least in streaming mode). As for the watermarks, I think FLIP-182 [1] could solve the problem; however, it is currently under development. I'm also pulling in Arvid and Fabian, who are more familiar with the
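To make the backlog question concrete, here is a toy Python model of how a file source can checkpoint while a large backlog is still unread. It is not Flink's API: `ToyEnumerator`, `ToyReader`, `FileSplit` and the `s3://bucket/...` paths are all hypothetical. It assumes the enumerator snapshots its unassigned splits plus the set of paths it has already discovered, and that each reader snapshots only its in-progress file and offset. Under those assumptions a checkpoint never has to wait for the backlog to be read; its size grows with the number of files listed, not with the bytes still left to read.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional, Set, Tuple


@dataclass
class FileSplit:
    path: str
    length: int  # toy size in "records"; a real split would carry byte offsets


@dataclass
class ToyEnumerator:
    """Hypothetical enumerator: lists files and hands them out as splits."""
    pending: Deque[FileSplit] = field(default_factory=deque)
    seen_paths: Set[str] = field(default_factory=set)

    def discover(self, listing: Dict[str, int]) -> None:
        # Each periodic listing only enqueues paths that were not seen before.
        for path, length in sorted(listing.items()):
            if path not in self.seen_paths:
                self.seen_paths.add(path)
                self.pending.append(FileSplit(path, length))

    def next_split(self) -> Optional[FileSplit]:
        return self.pending.popleft() if self.pending else None

    def snapshot(self) -> Tuple[List[str], List[str]]:
        # Checkpoint state: unassigned paths plus every path already discovered.
        return [s.path for s in self.pending], sorted(self.seen_paths)


@dataclass
class ToyReader:
    """Hypothetical reader: consumes one split at a time and tracks its offset."""
    current: Optional[FileSplit] = None
    offset: int = 0

    def read(self, n: int, enumerator: ToyEnumerator) -> int:
        # Read up to n records, pulling a new split whenever the current one ends.
        done = 0
        while done < n:
            if self.current is None:
                self.current = enumerator.next_split()
                self.offset = 0
                if self.current is None:
                    break  # backlog drained
            step = min(n - done, self.current.length - self.offset)
            self.offset += step
            done += step
            if self.offset == self.current.length:
                self.current = None  # file finished; fetch the next one
        return done

    def snapshot(self) -> Optional[Tuple[str, int]]:
        # Checkpoint state: the in-progress file and position, if any.
        return None if self.current is None else (self.current.path, self.offset)


if __name__ == "__main__":
    enum = ToyEnumerator()
    enum.discover({f"s3://bucket/part-{i:04d}": 100 for i in range(1000)})
    reader = ToyReader()
    reader.read(250, enum)
    pending, seen = enum.snapshot()
    print(len(pending), len(seen), reader.snapshot())
    # → 997 1000 ('s3://bucket/part-0002', 50)
```

In this sketch the part that scales with a large backlog (or with long-running continuous monitoring) is the set of discovered paths kept in enumerator state, not the unread data itself.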