There is currently an issue with the REMOTE_POLL_BATCH_SIZE property in
both ListSFTP and ListFTP. The property has been disabled in ListSFTP since
NIFI-14326, but the underlying problem was already known (see the
discussion at
https://github.com/apache/nifi/pull/8390#issuecomment-1937805866), and I
believe the defect affects both processors even though the property has
only been disabled in one.

The gist of the problem is that the TIME_TRACKING strategy, where the
processor keeps track of the most recent timestamp processed, cannot work
reliably when paired with a batch size. A batch smaller than the total
number of files is not guaranteed to contain the oldest files, so the
'latest' timestamp can be advanced past files that have not yet been
listed, and those remaining files can then be missed.
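
To make the failure concrete, here is a minimal, self-contained Java sketch
of the interaction. It is not NiFi code: the class, the RemoteFile type and
the method names are all hypothetical, and it assumes a simplified model in
which the server returns entries in an arbitrary but stable order, the batch
size truncates that listing, and only entries newer than the tracked
timestamp are emitted.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchTimeTrackingDemo {

    /** Hypothetical stand-in for a remote file entry (not a NiFi class). */
    public static final class RemoteFile {
        final String name;
        final long lastModified;

        RemoteFile(String name, long lastModified) {
            this.name = name;
            this.lastModified = lastModified;
        }
    }

    /** Four files in the (assumed) order the server happens to return them. */
    public static List<RemoteFile> sampleFiles() {
        return List.of(
                new RemoteFile("c.txt", 300L),
                new RemoteFile("a.txt", 100L),
                new RemoteFile("b.txt", 200L),
                new RemoteFile("d.txt", 400L));
    }

    /** One listing cycle: truncate to the batch size, then keep entries newer than lastSeen. */
    static List<RemoteFile> listOnce(List<RemoteFile> serverOrder, int batchSize, long lastSeen) {
        List<RemoteFile> fresh = new ArrayList<>();
        int limit = Math.min(batchSize, serverOrder.size());
        for (RemoteFile file : serverOrder.subList(0, limit)) {
            if (file.lastModified > lastSeen) {
                fresh.add(file);
            }
        }
        return fresh;
    }

    /** Repeats listing cycles until one yields nothing new; returns every name listed. */
    public static List<String> runUntilIdle(List<RemoteFile> serverOrder, int batchSize) {
        long lastSeen = Long.MIN_VALUE; // the tracked 'latest' timestamp
        List<String> listed = new ArrayList<>();
        while (true) {
            List<RemoteFile> fresh = listOnce(serverOrder, batchSize, lastSeen);
            if (fresh.isEmpty()) {
                return listed;
            }
            for (RemoteFile file : fresh) {
                listed.add(file.name);
                lastSeen = Math.max(lastSeen, file.lastModified);
            }
        }
    }

    public static void main(String[] args) {
        // Batch of 2: lastSeen jumps to 300 after the first cycle, so b.txt and d.txt never appear.
        System.out.println(runUntilIdle(sampleFiles(), 2)); // [c.txt, a.txt]
        // Batch covering every file: nothing is skipped.
        System.out.println(runUntilIdle(sampleFiles(), 4)); // [c.txt, a.txt, b.txt, d.txt]
    }
}
```

In this model b.txt is lost however the truncation is ordered relative to
the timestamp filter, because the tracked timestamp has already moved past
it by the time it could be returned.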

There is, however, still a valid use case for supporting a batch size in
conjunction with the other tracking strategies.

These are the options I can see for resolving this:

Disable in both:
Currently the property is disabled in ListSFTP only, but as both processors
suffer from the same issue we should disable it in ListFTP as well to
protect people from making this mistake.

Dependent property:
We make REMOTE_POLL_BATCH_SIZE a dependent property on selecting the
tracking strategies that it works with. This is potentially a breaking
change, but protects people from selecting an invalid configuration and
highlights where existing configurations are not valid.

Warning message:
We add a warning message to the property descriptor which informs people
not to combine a batch size with either of the time-based tracking
strategies.

My suggestion would be the dependent property option, but I can understand
if there are concerns about introducing a breaking change to
configurations.

Cheers,

Tom
