Hi Kirti

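For the watermark problem, I think the limitation described in the document
mainly refers to out-of-order data across multiple files. Because the
watermark advances eagerly within a file, records in the next file can have
timestamps behind the current watermark, which results in a large number of
late events [1]. These in turn generate a large number of retract events,
and late events that arrive too late (beyond the allowed lateness) are
discarded.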
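
To make the effect concrete, here is a rough plain-Java sketch (not Flink
code). It assumes a simplified model where the watermark is the largest
timestamp seen so far minus a fixed out-of-orderness bound, and an event
counts as late when its timestamp is behind the watermark. The class name
FileBacklogLateness, the method countLateEvents and the sample timestamps
are made up for illustration.

```java
import java.util.List;

public class FileBacklogLateness {

    /**
     * Reads the files one after another and counts events that arrive
     * with a timestamp behind the current watermark.
     * Simplified model (an assumption, not Flink's implementation):
     * watermark = max timestamp seen so far - maxOutOfOrderness.
     */
    static int countLateEvents(List<long[]> files, long maxOutOfOrderness) {
        long watermark = Long.MIN_VALUE;
        int late = 0;
        for (long[] file : files) {
            for (long timestamp : file) {
                if (timestamp < watermark) {
                    late++; // event is behind the watermark
                }
                // The watermark advances eagerly and never moves backwards.
                watermark = Math.max(watermark, timestamp - maxOutOfOrderness);
            }
        }
        return late;
    }

    public static void main(String[] args) {
        // Two files whose event-time ranges overlap (hypothetical data).
        long[] fileA = {100, 110, 120, 130};
        long[] fileB = {90, 105, 125, 140};

        // Backlog case: file A is read completely before file B.
        int backlog = countLateEvents(List.of(fileA, fileB), 10);

        // Same events, but arriving roughly in event-time order.
        long[] merged = {90, 100, 105, 110, 120, 125, 130, 140};
        int ordered = countLateEvents(List.of(merged), 10);

        System.out.println("late events, file by file: " + backlog);
        System.out.println("late events, time ordered: " + ordered);
    }
}
```

With this sample data, reading file by file leaves two events from the
second file behind the watermark, while the same events in time order
produce no late events.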


Shammon FY

On Thu, Apr 13, 2023 at 8:27 PM Kirti Dhar Upadhyay K via user <
user@flink.apache.org> wrote:

> Hi,
> I am using the DataStream file source connector in one of my use cases.
> I was going through the documentation, where I found the limitations below:
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/filesystem/#current-limitations
>    1. Watermarking does not work very well for large backlogs of files.
>    This is because watermarks eagerly advance within a file, and the next file
>    might contain data later than the watermark.
> *Queries:*
> Is there any FLIP/design document to better understand the impact of these
> limitations?
> Also, is there any ongoing work on these limitations for future Flink
> releases? If yes, could you please point me to any related documents?
>    1. For Unbounded File Sources, the enumerator currently remembers
>    paths of all already processed files, which is a state that can, in some
>    cases, grow rather large.
> *Query:*
>        What data per file does the file source keep in its checkpointed
> state?
> Appreciate any help!
> Regards,
> Kirti Dhar
