[ https://issues.apache.org/jira/browse/FLINK-8046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16247846#comment-16247846 ]
Juan Miguel Cejuela commented on FLINK-8046:
--------------------------------------------

While we are at it, in my humble opinion it is also strange that, when computing the file splits as in `format.createInputSplits(readerParallelism)`, the given `readerParallelism` is used, but not the format's `unstoppable` field or `.getNumSplits()` method. I don't know whether this should be a separate issue.

> ContinuousFileMonitoringFunction wrongly ignores files with exact same
> timestamp
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-8046
>                 URL: https://issues.apache.org/jira/browse/FLINK-8046
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.3.2
>            Reporter: Juan Miguel Cejuela
>              Labels: stream
>             Fix For: 1.5.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The current monitoring of files sets the internal variable
> `globalModificationTime` to filter out files that are "older". However, the
> current test (to check "older") does
> `boolean shouldIgnore = modificationTime <= globalModificationTime;` (from
> `shouldIgnore`)
> The comparison should be strictly SMALLER (NOT smaller or equal). The method
> documentation also states "This happens if the modification time of the file
> is _smaller_ than...".
> Treating equality as "older" causes some files with the exact same
> timestamp to be ignored. The behavior is also non-deterministic, as the first
> file to be accepted ("first" being pretty much random) causes the remaining
> files with the exact same timestamp to be ignored.
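The effect of `<=` versus `<` can be reproduced with a small, self-contained model. This is a hypothetical sketch, not Flink's code: `FileEntry`, `MonitorState`, `scan` and the `seenAtMax` set are illustrative names, and the sketch assumes the watermark advances once after each directory scan. In this model, replacing `<=` with `<` alone would make the next scan forward files already read at the watermark time again, so the strict variant also keeps the set of paths already seen at that timestamp. That is one possible approach, not necessarily how the issue was resolved.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the modification-time filter (not Flink's code).
public class ModTimeFilterDemo {

    /** A listed file: its path and modification time in milliseconds. */
    record FileEntry(String path, long modTime) {}

    /** Filter state carried from one directory scan to the next. */
    static final class MonitorState {
        long globalModificationTime = Long.MIN_VALUE;
        // Paths already forwarded whose modTime equals globalModificationTime.
        final Set<String> seenAtMax = new HashSet<>();
    }

    /** Check as described in the issue: equal timestamps count as "older". */
    static boolean shouldIgnoreInclusive(FileEntry f, MonitorState s) {
        return f.modTime() <= s.globalModificationTime;
    }

    /** Strictly-older check, plus a path set so equal-timestamp files are not re-read. */
    static boolean shouldIgnoreStrict(FileEntry f, MonitorState s) {
        if (f.modTime() < s.globalModificationTime) {
            return true;
        }
        return f.modTime() == s.globalModificationTime && s.seenAtMax.contains(f.path());
    }

    /** Runs one scan over a directory listing and returns the paths that would be forwarded. */
    static List<String> scan(List<FileEntry> listing, MonitorState s, boolean strict) {
        List<FileEntry> accepted = new ArrayList<>();
        for (FileEntry f : listing) {
            boolean ignore = strict ? shouldIgnoreStrict(f, s) : shouldIgnoreInclusive(f, s);
            if (!ignore) {
                accepted.add(f);
            }
        }
        // Assumption of this sketch: the watermark advances once, after the whole scan.
        List<String> paths = new ArrayList<>();
        for (FileEntry f : accepted) {
            if (f.modTime() > s.globalModificationTime) {
                s.globalModificationTime = f.modTime();
                s.seenAtMax.clear();
            }
            if (f.modTime() == s.globalModificationTime) {
                s.seenAtMax.add(f.path());
            }
            paths.add(f.path());
        }
        return paths;
    }

    public static void main(String[] args) {
        FileEntry a = new FileEntry("/in/a.txt", 1000L);
        FileEntry b = new FileEntry("/in/b.txt", 1000L); // same timestamp, first listed one scan later

        for (boolean strict : new boolean[] {false, true}) {
            MonitorState s = new MonitorState();
            List<String> first = scan(List.of(a), s, strict);
            List<String> second = scan(List.of(a, b), s, strict);
            System.out.println((strict ? "strict: " : "inclusive: ")
                    + "scan1=" + first + " scan2=" + second);
        }
        // Prints:
        // inclusive: scan1=[/in/a.txt] scan2=[]
        // strict: scan1=[/in/a.txt] scan2=[/in/b.txt]
    }
}
```

With `<=`, `/in/b.txt` is never forwarded because `/in/a.txt` happened to be listed first; with the strict check plus the seen set, `/in/b.txt` is picked up on the second scan and `/in/a.txt` is not read twice.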