Hi,

I am using the DataStream file source connector in one of my use cases. While going through the documentation, I found the limitations below:
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/filesystem/#current-limitations

1. Watermarking does not work very well for large backlogs of files. This is because watermarks eagerly advance within a file, and the next file might contain data later than the watermark.

Queries: Is there any FLIP or design document that would help me better understand the impact of these limitations? Also, is there any ongoing work to address them in future Flink releases? If yes, please point me to any related document.

2. For Unbounded File Sources, the enumerator currently remembers paths of all already processed files, which is a state that can, in some cases, grow rather large.

Query: What data does the file source store per file as part of its checkpointed state?

Appreciate any help!

Regards,
Kirti Dhar