Hi,

Which file system are you reading from? If you are reading from S3, this
might be caused by S3's eventual consistency property: a newly written file
may not show up in a directory listing right away, so the monitoring source
can miss it.
Have a look at FLINK-9940 [1] for a more detailed discussion.
There is also an open PR [2] that you could try to patch the source
operator with.
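
To illustrate: the monitoring function only forwards files whose modification
timestamp is newer than the largest timestamp it has forwarded so far. A rough
paraphrase of that logic (simplified, with illustrative names, not the actual
Flink code):

    // Largest modification timestamp among the files forwarded so far.
    long globalModificationTime = Long.MIN_VALUE;

    void monitorOnce(FileStatus[] listing) {
        for (FileStatus file : listing) {
            // Anything at or below the watermark is assumed to have been
            // processed already and is skipped.
            if (file.getModificationTime() <= globalModificationTime) {
                continue;
            }
            forwardSplitsFor(file); // illustrative: emit input splits downstream
            globalModificationTime =
                Math.max(globalModificationTime, file.getModificationTime());
        }
    }

If S3 returns an incomplete listing and a file only becomes visible in a later
listing, its modification time can already be at or below the watermark by
then, so it is silently dropped.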

Best, Fabian

[1] https://issues.apache.org/jira/browse/FLINK-9940
[2] https://github.com/apache/flink/pull/6613

On Fri, 12 Oct 2018 at 20:41, Juan Miguel Cejuela <jua...@tagtog.net> wrote:

> Dear flinksters,
>
>
> I'm using the class `ContinuousFileMonitoringFunction` as a source to
> monitor a folder for new incoming files. *The problem is that not all the
> files sent to the folder get processed / triggered by the function.*
> Specific details of my workflow are that I send up to 1k files per minute,
> and that I consume the stream as an `AsyncDataStream`.
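> 
> Roughly, my setup looks like the following (a simplified sketch; the path,
> the input format, and the async function are placeholders, not my actual
> code):
> 
>     import java.util.Collections;
>     import java.util.concurrent.TimeUnit;
>     import org.apache.flink.api.java.io.TextInputFormat;
>     import org.apache.flink.core.fs.Path;
>     import org.apache.flink.streaming.api.datastream.AsyncDataStream;
>     import org.apache.flink.streaming.api.datastream.DataStream;
>     import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
>     import org.apache.flink.streaming.api.functions.async.ResultFuture;
>     import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
>     import org.apache.flink.streaming.api.functions.source.FileProcessingMode;
> 
>     public class FolderPipeline {
> 
>         public static void main(String[] args) throws Exception {
>             StreamExecutionEnvironment env =
>                 StreamExecutionEnvironment.getExecutionEnvironment();
> 
>             String folder = "/data/inbox"; // placeholder path
>             TextInputFormat format = new TextInputFormat(new Path(folder));
> 
>             // Keep monitoring the folder; re-list it every second.
>             DataStream<String> lines = env.readFile(
>                 format, folder,
>                 FileProcessingMode.PROCESS_CONTINUOUSLY, 1000L);
> 
>             // Consume the stream asynchronously via AsyncDataStream.
>             DataStream<String> enriched = AsyncDataStream.unorderedWait(
>                 lines, new NoOpLookup(), 30, TimeUnit.SECONDS, 100);
> 
>             enriched.print();
>             env.execute("folder pipeline");
>         }
> 
>         // Placeholder for my real async enrichment function.
>         private static class NoOpLookup extends RichAsyncFunction<String, String> {
>             @Override
>             public void asyncInvoke(String input, ResultFuture<String> out) {
>                 out.complete(Collections.singleton(input));
>             }
>         }
>     }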
>
> I myself raised an unrelated issue with the
> `ContinuousFileMonitoringFunction` class some time ago (
> https://issues.apache.org/jira/browse/FLINK-8046): if two or more files
> shared the very same timestamp, only the first one (non-deterministically
> chosen) would be processed. I patched the class myself to fix that problem
> by using a LinkedHashMap to remember which files had already been
> processed, and the patch is working fine as far as I can tell.
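> 
> The idea of my patch, roughly (a simplified sketch with illustrative names,
> not the actual diff; it uses java.util.LinkedHashMap):
> 
>     // Extra state inside the monitoring function: files that have
>     // already been forwarded, keyed by their path.
>     private final LinkedHashMap<String, Long> processedFiles =
>         new LinkedHashMap<>();
> 
>     private boolean shouldProcess(String path, long modificationTime) {
>         // Decide by path identity instead of by modification timestamp
>         // alone, so files sharing a timestamp are no longer dropped.
>         if (processedFiles.containsKey(path)) {
>             return false;
>         }
>         processedFiles.put(path, modificationTime);
>         return true;
>     }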
>
> The problem seems rather to be that some files (when many are sent to the
> same folder at once) do not even get picked up / registered by the class.
>
>
> Am I properly explaining my problem?
>
>
> Any hints to solve this challenge would be greatly appreciated! ❤ THANK
> YOU
>
> --
> Juanmi, CEO and co-founder @ 🍃tagtog.net
>
>     Follow tagtog updates on 🐦 Twitter: @tagtog_net
> <https://twitter.com/tagtog_net>
>
>
