[ https://issues.apache.org/jira/browse/FLINK-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flink Jira Bot updated FLINK-10518:
-----------------------------------
    Labels: Source:FileSystem auto-deprioritized-major auto-deprioritized-minor auto-unassigned  (was: Source:FileSystem auto-deprioritized-major auto-unassigned stale-minor)
  Priority: Not a Priority  (was: Minor)

This issue was labeled "stale-minor" 7 days ago and has not received any updates, so it is being deprioritized. If this ticket is actually Minor, please raise the priority and ask a committer to assign you the issue or revive the public discussion.

> Inefficient design in ContinuousFileMonitoringFunction
> ------------------------------------------------------
>
>                 Key: FLINK-10518
>                 URL: https://issues.apache.org/jira/browse/FLINK-10518
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>    Affects Versions: 1.5.2
>            Reporter: Huyen Levan
>            Priority: Not a Priority
>              Labels: Source:FileSystem, auto-deprioritized-major, auto-deprioritized-minor, auto-unassigned
>
> The ContinuousFileMonitoringFunction class keeps track of the latest file modification time to rule out all files it has processed in previous cycles. For a long-running job, the list of eligible files will be much smaller than the list of all files in the monitored folder.
> In the current implementation of the getInputSplitsSortedByModTime method, a (potentially large) list of all available splits is created first, and then every single split is checked against the list of eligible files.
> {quote}for (FileInputSplit split: format.createInputSplits(readerParallelism)) {
>     FileStatus fileStatus = eligibleFiles.get(split.getPath());
>     if (fileStatus != null) {
> {quote}
> The improvement can be made as follows:
> * Listing all files should be done only once, in _ContinuousFileMonitoringFunction.listEligibleFiles()_ (currently it is done a second time in _FileInputFormat.createInputSplits()_).
> * The list of file splits should then be created from the list of paths in eligibleFiles.
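The difference between the two approaches can be illustrated with a small, self-contained Java model. This is a sketch only: `FileInfo`, `Split`, `listEligible`, `splitAllThenFilter` and `splitEligibleOnly` are hypothetical stand-ins, not Flink's `FileStatus`, `FileInputSplit` or `FileInputFormat` APIs, and the fixed-size splitting below ignores block locations and the reader parallelism that `createInputSplits(readerParallelism)` takes into account. It compares the current pattern (split every file, then look each split up in the eligible-file map) with the proposed one (split only the eligible files) and counts how many split objects each one builds.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the two strategies discussed in FLINK-10518.
// FileInfo, Split and every method here are illustrative stand-ins, not Flink classes.
final class SplitSketch {

    // Stand-in for a listed file: path, size in bytes, modification time.
    record FileInfo(String path, long length, long modTime) {}

    // Stand-in for an input split: one byte range of one file.
    record Split(String path, long start, long length, long modTime) {
        @Override
        public String toString() {
            return path + "@" + start + "+" + length;
        }
    }

    // Oldest modification time first; path and offset break ties deterministically.
    static final Comparator<Split> BY_MOD_TIME =
            Comparator.comparingLong(Split::modTime)
                    .thenComparing(Split::path)
                    .thenComparingLong(Split::start);

    // Counts how many Split objects were built, to make the wasted work visible.
    static int splitsCreated = 0;

    // Cuts one file into consecutive splits of at most splitSize bytes.
    static List<Split> splitFile(FileInfo f, long splitSize) {
        List<Split> out = new ArrayList<>();
        for (long start = 0; start < f.length(); start += splitSize) {
            out.add(new Split(f.path(), start, Math.min(splitSize, f.length() - start), f.modTime()));
            splitsCreated++;
        }
        return out;
    }

    // Keeps only files modified after the last processed modification time.
    static Map<String, FileInfo> listEligible(List<FileInfo> allFiles, long lastModTime) {
        Map<String, FileInfo> eligible = new HashMap<>();
        for (FileInfo f : allFiles) {
            if (f.modTime() > lastModTime) {
                eligible.put(f.path(), f);
            }
        }
        return eligible;
    }

    // Current pattern: split every file in the folder, then look each split up in eligible.
    static List<Split> splitAllThenFilter(List<FileInfo> allFiles, Map<String, FileInfo> eligible, long splitSize) {
        List<Split> result = new ArrayList<>();
        for (FileInfo f : allFiles) {
            for (Split s : splitFile(f, splitSize)) {
                if (eligible.get(s.path()) != null) {
                    result.add(s);
                }
            }
        }
        result.sort(BY_MOD_TIME);
        return result;
    }

    // Proposed pattern: build splits only for the files already known to be eligible.
    static List<Split> splitEligibleOnly(Map<String, FileInfo> eligible, long splitSize) {
        List<Split> result = new ArrayList<>();
        for (FileInfo f : eligible.values()) {
            result.addAll(splitFile(f, splitSize));
        }
        result.sort(BY_MOD_TIME);
        return result;
    }

    public static void main(String[] args) {
        List<FileInfo> all = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            all.add(new FileInfo("/data/old-" + i, 300, i)); // processed in earlier cycles
        }
        all.add(new FileInfo("/data/new-a", 250, 2000));
        all.add(new FileInfo("/data/new-b", 100, 1500));
        Map<String, FileInfo> eligible = listEligible(all, 999);

        splitsCreated = 0;
        List<Split> before = splitAllThenFilter(all, eligible, 128);
        System.out.println(before + " built " + splitsCreated + " splits");

        splitsCreated = 0;
        List<Split> after = splitEligibleOnly(eligible, 128);
        System.out.println(after + " built " + splitsCreated + " splits");
    }
}
```

With 1,000 already-processed files and two new ones, both variants return the same three splits, but the current pattern builds 3,003 split objects to get there while the proposed one builds 3. In this model the saving grows with the number of old files left in the monitored folder, which is the long-running-job case the issue describes.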
-- This message was sent by Atlassian Jira (v8.20.1#820001)