Hello,
We are working on a similar problem against S3. The checkpointing
question got me wondering whether a checkpoint would actually succeed with
a large backlog of files. I always imagined that the SplitEnumerator lists
all available files and the SourceReader is responsible for reading those files
aft
Hi Carlos,
AFAIK, Flink's FileSource is capable of checkpointing while it is reading
the files (at least in streaming mode).
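To picture why a checkpoint need not wait for the whole backlog, here is a toy model in plain Java. It is only a sketch under assumptions: none of these classes are Flink APIs, the names (Split, Snapshot, discover, readOneRecord) are made up for illustration, and every file is assumed to hold a fixed number of records. The idea it shows is that a snapshot only needs the list of unread splits plus the position inside the file currently being read.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Toy model only: none of these classes are Flink APIs. */
public class SplitCheckpointSketch {

    /** Assumption for the sketch: every file holds exactly this many records. */
    static final int RECORDS_PER_FILE = 3;

    /** Hypothetical split: a file path plus the number of records already read. */
    static final class Split {
        final String path;
        final int offset;

        Split(String path, int offset) {
            this.path = path;
            this.offset = offset;
        }
    }

    /** Hypothetical checkpoint: the unread splits plus the split being read. */
    static final class Snapshot {
        final List<Split> pending;
        final Split inProgress; // null when no file is open

        Snapshot(List<Split> pending, Split inProgress) {
            this.pending = pending;
            this.inProgress = inProgress;
        }
    }

    private final Deque<Split> pending = new ArrayDeque<>();
    private Split current; // split the reader is working on, or null

    /** Enumerator side: register newly listed files as unread splits. */
    void discover(List<String> paths) {
        for (String p : paths) {
            pending.add(new Split(p, 0));
        }
    }

    /** Reader side: consume one record; returns false once the backlog is empty. */
    boolean readOneRecord() {
        if (current == null) {
            current = pending.poll();
            if (current == null) {
                return false;
            }
        }
        int next = current.offset + 1;
        // A fully read file is dropped; otherwise remember the new position.
        current = (next == RECORDS_PER_FILE) ? null : new Split(current.path, next);
        return true;
    }

    /** Copy the state at any moment, even in the middle of a file. */
    Snapshot snapshot() {
        return new Snapshot(new ArrayList<>(pending), current);
    }

    /** Rebuild the state: the half-read split goes first, then the untouched ones. */
    static SplitCheckpointSketch restore(Snapshot s) {
        SplitCheckpointSketch restored = new SplitCheckpointSketch();
        if (s.inProgress != null) {
            restored.pending.add(s.inProgress);
        }
        restored.pending.addAll(s.pending);
        return restored;
    }

    public static void main(String[] args) {
        SplitCheckpointSketch source = new SplitCheckpointSketch();
        source.discover(Arrays.asList("f1", "f2", "f3"));
        for (int i = 0; i < 4; i++) {
            source.readOneRecord();
        }
        SplitCheckpointSketch restored = restore(source.snapshot());
        int remaining = 0;
        while (restored.readOneRecord()) {
            remaining++;
        }
        System.out.println("records read after restore: " + remaining);
    }
}
```

In this model the snapshot size grows with the number of unread files, not with the amount of data in them, which is the part that matters for a large backlog.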
As for the watermarks, I think FLIP-182 [1] could solve the problem;
however, it's currently under development.
I'm also pulling in Arvid and Fabian, who are more familiar with the
Hi,
We have an in-house platform that we want to integrate with external
clients via HDFS. They have lots of existing files, and they continuously
add more data to HDFS. Ideally, we would like to have a Flink job that
takes care of ingesting data, as one of the requirements is to execute SQL
on top