[
https://issues.apache.org/jira/browse/NIFI-13896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Handermann resolved NIFI-13896.
-------------------------------------
Fix Version/s: 2.0.0
Resolution: Fixed
> Improving TailFile performance
> ------------------------------
>
> Key: NIFI-13896
> URL: https://issues.apache.org/jira/browse/NIFI-13896
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Lehel Boér
> Assignee: Lehel Boér
> Priority: Major
> Fix For: 2.0.0
>
> Attachments: tailfile_test.png
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> When tailing a large number of files, the processor is slow because it
> repeatedly loops over all tailed files and performs several expensive
> operations.
> * In the {{onTrigger}} method, a loop (loop 1) iterates over all tailed
> files in the state object.
> * Inside this loop, for each tailed file, the {{recoverRolledFiles}} method
> is called (loop 2), which then leads to {{consumeFilesFully}} and finally
> triggers {{cleanup}}.
> * In the {{cleanup}} method, another loop (loop 3) iterates over all tailed
> files in the state again.
> * During {{cleanup}}, {{persistState}} is invoked, which removes any
> legacy state variables from the NiFi state. These legacy state variables
> originate from NiFi 1.0, when support for "Multiple Tailed Files" was not
> available, so state keys didn’t have the "file.x." prefix. As {{cleanup}}
> iterates over and persists each tailed file's state, the overall state size
> grows by six entries per tailed file. Because the legacy cleanup loop scans
> every state entry, it becomes progressively slower with each iteration.
> This can lead to hours of execution time.
>
> Suggestion for improvement:
>
> * Move the loop that removes old state entries out of {{cleanup}}. The
> cleanup of old entries should run at startup instead.
> {code:java}
> for (String key : oldState.toMap().keySet()) {
>     // These states are stored by older version of NiFi, and won't be used anymore.
>     // New states have 'file.<index>.' prefix.
>     if (TailFileState.StateKeys.CHECKSUM.equals(key)
>             || TailFileState.StateKeys.FILENAME.equals(key)
>             || TailFileState.StateKeys.POSITION.equals(key)
>             || TailFileState.StateKeys.TIMESTAMP.equals(key)) {
>         getLogger().info("Removed state {}={} stored by older version of NiFi.",
>                 new Object[]{key, oldState.get(key)});
>         continue;
>     }
>     updatedState.put(key, oldState.get(key));
> }
> {code}
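
The suggestion above could be sketched roughly as follows. This is only an illustration, not the committed fix: plain Java maps stand in for NiFi's state API, the class and method names (LegacyStateCleanup, withoutLegacyKeys) are hypothetical, and the literal key strings are assumed values for the TailFileState.StateKeys constants.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of NiFi: strips un-prefixed legacy keys in one pass.
final class LegacyStateCleanup {

    // Assumed literal values of TailFileState.StateKeys.CHECKSUM, FILENAME,
    // POSITION and TIMESTAMP; verify against the actual constants before use.
    static final Set<String> LEGACY_KEYS =
            Set.of("checksum", "filename", "position", "timestamp");

    // Returns a copy of the given state without the legacy keys.
    // Entries with the 'file.<index>.' prefix are kept; the input map is not modified.
    static Map<String, String> withoutLegacyKeys(Map<String, String> oldState) {
        Map<String, String> updatedState = new HashMap<>(oldState);
        // Removing from the key set view also removes the matching entries.
        updatedState.keySet().removeAll(LEGACY_KEYS);
        return updatedState;
    }
}
```

Calling such a helper a single time when the processor starts, rather than inside every per-file persist, would mean the legacy scan no longer grows with each tailed file added to the state.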
--
This message was sent by Atlassian Jira
(v8.20.10#820010)