Hi Kirti

For the watermark problem, I think the description in the document mainly
refers to out-of-order data across multiple files. Because the watermark
advances while one file is being read, records in later files can arrive
behind it as late events [1]. These late events can generate a large number
of retract events, and late events that arrive beyond the allowed lateness
will be discarded.

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/time/#lateness
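
To make the effect concrete, here is a minimal sketch in plain Java (no Flink
dependency) that simulates reading a backlog of files one after another. It
is only an illustration, not Flink's implementation: the class and method
names are hypothetical, and the watermark rule (max timestamp seen, minus the
out-of-orderness bound, minus 1) is assumed to mirror a bounded
out-of-orderness strategy.

```java
import java.util.List;

public class WatermarkBacklogSketch {

    /**
     * Counts events whose timestamp is at or behind the watermark at the
     * moment they are read. Files are read sequentially, and the watermark
     * advances eagerly after every record.
     */
    public static int countLateEvents(List<long[]> files, long maxOutOfOrderness) {
        long maxTimestampSeen = Long.MIN_VALUE;
        long watermark = Long.MIN_VALUE;
        int late = 0;
        for (long[] file : files) {
            for (long timestamp : file) {
                // An event is late if the watermark has already passed it.
                if (timestamp <= watermark) {
                    late++;
                }
                maxTimestampSeen = Math.max(maxTimestampSeen, timestamp);
                // Assumed rule: watermark trails the max timestamp by the bound.
                watermark = Math.max(watermark, maxTimestampSeen - maxOutOfOrderness - 1);
            }
        }
        return late;
    }

    public static void main(String[] args) {
        // Two files whose event-time ranges overlap (timestamps in ms).
        List<long[]> backlog = List.of(
                new long[] {1000, 2000, 3000},
                new long[] {500, 1500, 3500});

        // With no tolerance, the first file pushes the watermark to 2999,
        // so 500 and 1500 from the second file are late.
        System.out.println(countLateEvents(backlog, 0));    // prints 2

        // A bound larger than the overlap keeps every event on time.
        System.out.println(countLateEvents(backlog, 5000)); // prints 0
    }
}
```

In this toy model a larger out-of-orderness bound removes the late events,
but at the cost of holding results back longer, and the bound needed grows
with how far apart the files are in event time.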

Best,
Shammon FY


On Thu, Apr 13, 2023 at 8:27 PM Kirti Dhar Upadhyay K via user <
user@flink.apache.org> wrote:

> Hi,
>
>
>
> I am using the DataStream file source connector in one of my use cases.
>
> While going through the documentation, I found the limitations below:
>
>
>
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/filesystem/#current-limitations
>
>    1. Watermarking does not work very well for large backlogs of files.
>    This is because watermarks eagerly advance within a file, and the next file
>    might contain data later than the watermark.
>
> *Queries:*
>
> Is there any FLIP/design document to better understand the impact of these
> limitations?
>
> Also, is there any ongoing work on these limitations for future Flink
> releases? If so, could you please point me to any related documents?
>
>
>
>
>
>    1. For Unbounded File Sources, the enumerator currently remembers
>    paths of all already processed files, which is a state that can, in some
>    cases, grow rather large.
>
> *Query:*
>
>        What data per file does the file source keep as part of its
> checkpointed state?
>
>
>
> Appreciate any help!
>
>
>
> Regards,
>
> Kirti Dhar
>
