Tamas Palfy created NIFI-8081:
---------------------------------
Summary: List[S]FTP can miss files when multiple subdirectories
are written while listing
Key: NIFI-8081
URL: https://issues.apache.org/jira/browse/NIFI-8081
Project: Apache NiFi
Issue Type: Improvement
Reporter: Tamas Palfy
ListFTP and ListSFTP scan subdirectories one after the other. Because of
this, they can run into the following issue when 'Listing Strategy' is set to
'Tracking Timestamps':
# Processor starts and finishes listing directory1
# Processor starts listing directory2
# file1 arrives in directory1 with ts(timestamp)=1
# file2 arrives in directory2 (or any other, not yet listed directory) with ts=2
# Processor finishes listing directory2
# Processor returns result which will contain file2(ts=2) but not file1(ts=1)
# Processor stores ts=2 as the latest seen timestamp
# file1 will be filtered out in the next listing (and every subsequent one) because
its timestamp is less than the latest seen timestamp
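The sequence above can be reproduced with a small simulation. This is only a simplified sketch of the behaviour described in this issue, not NiFi's actual implementation: `list_once`, the `on_directory_done` hook, and the dictionary-based directory model are all hypothetical names introduced for illustration.

```python
def list_once(directories, latest_seen, on_directory_done=None):
    """Hypothetical model of one 'Tracking Timestamps' listing cycle.

    directories: dict mapping directory name -> {file name: timestamp}
    latest_seen: newest timestamp returned by earlier cycles
    on_directory_done: optional hook called after each directory is scanned,
        used here to simulate files arriving while the listing is running
    Returns (listed file names, updated latest_seen).
    """
    listed = []
    for name in list(directories):
        # Directories are scanned one after the other.
        for file_name, ts in directories[name].items():
            if ts > latest_seen:
                listed.append((file_name, ts))
        if on_directory_done:
            on_directory_done(name)
    new_latest = max([ts for _, ts in listed], default=latest_seen)
    return [file_name for file_name, _ in listed], new_latest


directories = {"directory1": {}, "directory2": {}}


def files_arrive(finished_directory):
    # Steps 3 and 4: both files arrive after directory1 has been listed
    # but before directory2 has been listed.
    if finished_directory == "directory1":
        directories["directory1"]["file1"] = 1
        directories["directory2"]["file2"] = 2


first_listing, latest_seen = list_once(directories, 0, files_arrive)
second_listing, latest_seen = list_once(directories, latest_seen)

print(first_listing, second_listing, latest_seen)
```

In this model the first cycle returns only file2 and stores 2 as the latest seen timestamp, so file1 (ts=1) never passes the filter in later cycles even though it is still present in directory1.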
Fix: Leave the 'Tracking Timestamps' behaviour as it is (just update the
documentation) and create a new strategy. This strategy checks the current time
in each cycle and lists all files that arrived before the current time (but
after the previous cycle). Because it compares file timestamps to the current
time, it must be adjusted for the timezone difference between NiFi and the file
hosting system.
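A minimal sketch of how such a time-window strategy might work is shown here. It rests on several assumptions that this issue does not specify: the function name `list_time_window`, the half-open window (previous cycle time inclusive, current time exclusive), and a single numeric `remote_offset` standing in for the timezone difference are all hypothetical.

```python
def list_time_window(directories, window_start, now, remote_offset=0):
    """Hypothetical model of one cycle of the proposed strategy.

    directories: dict mapping directory name -> {file name: timestamp}
    window_start: the 'current time' captured by the previous cycle
    now: the current time, captured once at the start of this cycle
    remote_offset: assumed clock/timezone difference (remote minus NiFi),
        in the same units as the timestamps
    Returns (sorted listed file names, window start for the next cycle).
    """
    listed = []
    for files in directories.values():
        for file_name, ts in files.items():
            adjusted = ts - remote_offset  # express remote time on NiFi's clock
            if window_start <= adjusted < now:
                listed.append(file_name)
    return sorted(listed), now


# Same scenario as in the description: the cycle captured now=1 before
# scanning, and file1 (ts=1) and file2 (ts=2) arrived while it was running.
directories = {"directory1": {"file1": 1}, "directory2": {"file2": 2}}

cycle1, window_start = list_time_window(directories, window_start=0, now=1)
cycle2, window_start = list_time_window(directories, window_start, now=3)
cycle3, window_start = list_time_window(directories, window_start, now=5)

# A remote host whose clock reads 3600 units ahead of NiFi's.
remote = {"dir": {"late_file": 3605}}
with_offset, _ = list_time_window(remote, 0, 10, remote_offset=3600)
without_offset, _ = list_time_window(remote, 0, 10)
```

Under these assumptions, files that arrive during a cycle are deferred to the next one instead of being lost, and no file is listed twice because the windows do not overlap. The last two calls illustrate why the offset matters: without it, a file stamped by the remote clock falls outside the window.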
--
This message was sent by Atlassian Jira
(v8.3.4#803005)