[ https://issues.apache.org/jira/browse/BEAM-14267?focusedWorklogId=763761&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-763761 ]
ASF GitHub Bot logged work on BEAM-14267:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Apr/22 17:50
            Start Date: 28/Apr/22 17:50
    Worklog Time Spent: 10m
      Work Description: aaltay commented on PR #17305:
URL: https://github.com/apache/beam/pull/17305#issuecomment-1112497195

   Run Java PreCommit

Issue Time Tracking
-------------------

    Worklog Id:     (was: 763761)
    Time Spent: 2h 20m  (was: 2h 10m)

> Update watchForNewFiles to allow reading already read files with a new timestamp
> --------------------------------------------------------------------------------
>
>                 Key: BEAM-14267
>                 URL: https://issues.apache.org/jira/browse/BEAM-14267
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-files
>            Reporter: Yi Hu
>            Assignee: Yi Hu
>            Priority: P2
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> In TextIO and AvroIO, we have a configuration option called watchForNewFiles,
> and in FileIO.MatchConfiguration, we have an option called watchInterval.
> Right now, these match any files according to the filtering criteria, and
> then periodically check for new files. A file is determined to be new if it
> has a different filename than a file that has already been read.
> We want to add an option to choose to consider a file new if it has a
> different timestamp from an existing file, even if the file itself has the
> same name.
> See the following design doc for more detail:
> [https://docs.google.com/document/d/1xnacyLGNh6rbPGgTAh5D1gZVR8rHUBsMMRV3YkvlL08/edit?usp=sharing&resourcekey=0-be0uF-DdmwAz6Vg4Li9FNw]
>

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
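
The difference between the two notions of a "new" file described in the issue can be sketched in plain Java. This is only an illustration of the idea, not Beam code: the class NewFileTracker, its matchUpdatedFiles flag, and the isNew method are hypothetical names chosen for this sketch, and the timestamp is assumed to be a last-modified time in milliseconds.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of Beam): decides whether a polled file counts as new. */
class NewFileTracker {
    // Assumed flag: when true, a known filename with a changed timestamp is treated as new.
    private final boolean matchUpdatedFiles;
    // Last timestamp observed for each filename seen so far.
    private final Map<String, Long> lastSeen = new HashMap<>();

    NewFileTracker(boolean matchUpdatedFiles) {
        this.matchUpdatedFiles = matchUpdatedFiles;
    }

    /** Records the observation and returns true if the file should be read (again). */
    boolean isNew(String filename, long lastModifiedMillis) {
        Long previous = lastSeen.put(filename, lastModifiedMillis);
        if (previous == null) {
            // Filename never seen before: new under both policies.
            return true;
        }
        // Same filename: new only if the option is on and the timestamp differs.
        return matchUpdatedFiles && previous.longValue() != lastModifiedMillis;
    }
}
```

With the flag off, a second sighting of a.txt is ignored even if its timestamp changed, which mirrors the filename-only behavior the issue describes. With the flag on, the same filename with a different timestamp is reported as new.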