[
https://issues.apache.org/jira/browse/NIFI-13896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17893128#comment-17893128
]
ASF subversion and git services commented on NIFI-13896:
--------------------------------------------------------
Commit 63fe620d93ca856bd92e2ab74e50d3500c30813c in nifi's branch
refs/heads/support/nifi-1.x from Lehel Boér
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=63fe620d93 ]
NIFI-13896 Improved TailFile performance (#9424)
Signed-off-by: David Handermann <[email protected]>
> Improving TailFile performance
> ------------------------------
>
> Key: NIFI-13896
> URL: https://issues.apache.org/jira/browse/NIFI-13896
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Lehel Boér
> Assignee: Lehel Boér
> Priority: Major
> Fix For: 2.0.0
>
> Attachments: tailfile_test.png
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> When many files are tailed, the processor is slow because it repeatedly
> loops over all tailed files and performs several expensive operations on
> each pass.
> * In the {{onTrigger}} method, a loop (loop 1) iterates over all tailed
> files in the state object.
> * Inside this loop, the {{recoverRolledFiles}} method is called for each
> tailed file (loop 2), which leads to {{consumeFilesFully}} and finally
> triggers {{cleanup}}.
> * In the {{cleanup}} method, another loop (loop 3) iterates over all tailed
> files in the state again.
> * During {{cleanup}}, {{persistState}} is invoked, which removes any
> legacy state keys from the NiFi state. These legacy keys originate from
> NiFi 1.0, before support for "Multiple Tailed Files" was added, so they
> lack the "file.x." prefix. Because {{cleanup}} iterates over and persists
> each tailed file's state, the overall state grows by six entries per
> tailed file. The legacy cleanup loop scans every state entry, so it
> becomes progressively slower with each iteration as the number of
> entries grows.
> With many tailed files, this can add up to hours of execution time.
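The nesting described above can be illustrated with a simplified cost model. This is a hypothetical sketch, not NiFi code: the class and method names are invented, loop 2 over rolled files is ignored, and it assumes that cleanup runs once per tailed file, that each persisted file adds six state entries, and that every persistState call scans all current entries for legacy keys. Under these assumptions the keys scanned per onTrigger total 3n^2(n + 1) for n tailed files, compared with 6n for a single scan at startup.

```java
// Simplified, hypothetical cost model of the nesting described in NIFI-13896 (not NiFi code).
final class TailFileCostModel {
    // Assumption taken from the issue text: each tailed file contributes six state entries.
    static final long ENTRIES_PER_FILE = 6;

    private TailFileCostModel() { }

    // Legacy-key scan inside cleanup: loop 1 (onTrigger) wraps loop 3 (cleanup), and each
    // persist scans the whole state, which has grown by six entries per file persisted so far.
    static long keysScannedInCleanup(int files) {
        long scanned = 0;
        for (int triggered = 0; triggered < files; triggered++) {     // loop 1
            long stateSize = 0;
            for (int persisted = 0; persisted < files; persisted++) { // loop 3
                stateSize += ENTRIES_PER_FILE;                         // state grows
                scanned += stateSize;                                  // scan every current entry
            }
        }
        return scanned;
    }

    // Legacy-key scan moved to startup: a single pass over the full state.
    static long keysScannedAtStartup(int files) {
        return ENTRIES_PER_FILE * files;
    }

    public static void main(String[] args) {
        for (int files : new int[] {10, 100, 1000}) {
            // First line printed: files=10 cleanup=3300 startup=60
            System.out.printf("files=%d cleanup=%d startup=%d%n",
                    files, keysScannedInCleanup(files), keysScannedAtStartup(files));
        }
    }
}
```

For 1,000 files the model gives about 3 billion key visits per trigger when the scan lives in cleanup versus 6,000 when it runs once at startup; real figures depend on the actual state layout, but the growth pattern matches the slowdown reported above.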
>
> Suggestion for improvement:
>
> * Move the loop that removes old state entries out of {{cleanup}} and run
> it once at startup instead.
> {code:java}
> for (String key : oldState.toMap().keySet()) {
>     // These states are stored by older versions of NiFi and won't be used anymore.
>     // New states have the 'file.<index>.' prefix.
>     if (TailFileState.StateKeys.CHECKSUM.equals(key)
>             || TailFileState.StateKeys.FILENAME.equals(key)
>             || TailFileState.StateKeys.POSITION.equals(key)
>             || TailFileState.StateKeys.TIMESTAMP.equals(key)) {
>         getLogger().info("Removed state {}={} stored by older version of NiFi.",
>                 new Object[]{key, oldState.get(key)});
>         continue;
>     }
>     // Keep every non-legacy entry in the state that will be persisted.
>     updatedState.put(key, oldState.get(key));
> }
> {code}
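The suggestion can be sketched as a standalone helper that filters legacy keys from a copy of the state once, for example when the processor is started, instead of on every cleanup. This is a hypothetical sketch, not the committed change: it works on a plain Map rather than NiFi's StateMap, the class and method names are invented, and the legacy key strings ("filename", "position", "timestamp", "checksum") are assumptions about the values behind TailFileState.StateKeys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper: removes legacy single-file TailFile keys once, e.g. at startup.
final class LegacyStateCleanup {
    // Assumed values of the legacy keys written before the "file.<index>." prefix existed.
    static final Set<String> LEGACY_KEYS = Set.of("filename", "position", "timestamp", "checksum");

    private LegacyStateCleanup() { }

    // True if the state still contains any legacy key, so a startup hook can skip the rewrite otherwise.
    static boolean hasLegacyKeys(Map<String, String> state) {
        for (String key : state.keySet()) {
            if (LEGACY_KEYS.contains(key)) {
                return true;
            }
        }
        return false;
    }

    // Returns a copy of the state with the legacy keys dropped; the input map is not modified.
    static Map<String, String> withoutLegacyKeys(Map<String, String> oldState) {
        Map<String, String> updated = new HashMap<>();
        for (Map.Entry<String, String> entry : oldState.entrySet()) {
            if (LEGACY_KEYS.contains(entry.getKey())) {
                System.out.println("Removed legacy state " + entry.getKey() + "=" + entry.getValue());
                continue;
            }
            updated.put(entry.getKey(), entry.getValue());
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, String> state = new HashMap<>();
        state.put("filename", "/var/log/app.log");          // legacy key
        state.put("position", "42");                         // legacy key
        state.put("file.0.filename", "/var/log/app.log");   // current key format
        state.put("file.0.position", "42");
        if (hasLegacyKeys(state)) {
            state = withoutLegacyKeys(state);
        }
        // TreeMap sorts the keys so the printed order is deterministic.
        System.out.println(new TreeMap<>(state).keySet());  // prints [file.0.filename, file.0.position]
    }
}
```

Because the helper runs once, its cost is a single pass over the state no matter how many times onTrigger later reaches cleanup; hasLegacyKeys lets a startup hook avoid rewriting state when nothing needs to be removed.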
--
This message was sent by Atlassian Jira
(v8.20.10#820010)