[ 
https://issues.apache.org/jira/browse/NIFI-13896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17893128#comment-17893128
 ] 

ASF subversion and git services commented on NIFI-13896:
--------------------------------------------------------

Commit 63fe620d93ca856bd92e2ab74e50d3500c30813c in nifi's branch 
refs/heads/support/nifi-1.x from Lehel Boér
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=63fe620d93 ]

NIFI-13896 Improved TailFile performance (#9424)

Signed-off-by: David Handermann <[email protected]>

> Improving TailFile performance
> ------------------------------
>
>                 Key: NIFI-13896
>                 URL: https://issues.apache.org/jira/browse/NIFI-13896
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Lehel Boér
>            Assignee: Lehel Boér
>            Priority: Major
>             Fix For: 2.0.0
>
>         Attachments: tailfile_test.png
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When tailing many files, the processor is slow because it repeatedly loops 
> over all tailed files and performs several expensive operations on each 
> pass.
>  * In the {{onTrigger}} method, a loop (loop 1) iterates over all tailed 
> files in the state object.
>  * Inside this loop, for each tailed file, the {{recoverRolledFiles}} method 
> is called (loop 2), which then leads to {{consumeFilesFully}} and finally 
> triggers {{cleanup}}.
>  * In the {{cleanup}} method, another loop (loop 3) iterates over all tailed 
> files in the state again.
>  * During {{cleanup}}, {{persistState}} is invoked, which removes any 
> legacy state variables from the NiFi state. These legacy state variables 
> originate from NiFi 1.0, when support for "Multiple Tailed Files" was not 
> available, so state keys didn’t have the "file.x." prefix. As {{cleanup}} 
> iterates over and persists each tailed file's state, the overall state 
> grows by six entries per tailed file, so the legacy cleanup loop scans more 
> entries on every iteration and becomes progressively slower.
> With many tailed files, this can lead to hours of execution time.
>  
> Suggestion for improvement:
>  
>  * Move the loop that removes old state entries out of {{cleanup}}. The 
> cleanup of old entries should run at startup instead.
> {code:java}
> for (String key : oldState.toMap().keySet()) {
>     // These keys were stored by an older version of NiFi and are no longer used.
>     // New state keys have the 'file.<index>.' prefix.
>     if (TailFileState.StateKeys.CHECKSUM.equals(key)
>             || TailFileState.StateKeys.FILENAME.equals(key)
>             || TailFileState.StateKeys.POSITION.equals(key)
>             || TailFileState.StateKeys.TIMESTAMP.equals(key)) {
>         // Skip the legacy key so it is dropped from the updated state.
>         getLogger().info("Removed state {}={} stored by older version of NiFi.",
>                 new Object[]{key, oldState.get(key)});
>         continue;
>     }
>     updatedState.put(key, oldState.get(key));
> }
> {code}
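
The suggestion above could be sketched roughly as follows. This is a minimal, standalone illustration and not the code from the commit: the class name LegacyStateCleanup, the method removeLegacyKeys, and the literal key strings are assumptions made for the example (the real constants are in TailFileState.StateKeys), and a plain Map stands in for the NiFi state map. The idea is that the legacy filter runs as a single pass at startup, so cleanup no longer rescans a growing state for every tailed file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class LegacyStateCleanup {

    // Assumed legacy key names without the "file.<index>." prefix.
    // The actual values are defined in TailFileState.StateKeys.
    private static final Set<String> LEGACY_KEYS =
            Set.of("checksum", "filename", "position", "timestamp");

    /** Returns a copy of oldState without the legacy, un-prefixed keys. */
    public static Map<String, String> removeLegacyKeys(Map<String, String> oldState) {
        Map<String, String> updatedState = new HashMap<>();
        for (Map.Entry<String, String> entry : oldState.entrySet()) {
            if (LEGACY_KEYS.contains(entry.getKey())) {
                continue; // drop entries written by the older state format
            }
            updatedState.put(entry.getKey(), entry.getValue());
        }
        return updatedState;
    }

    public static void main(String[] args) {
        Map<String, String> state = new HashMap<>();
        state.put("filename", "/var/log/app.log");        // legacy entry
        state.put("file.0.filename", "/var/log/app.log"); // current format
        state.put("file.0.position", "128");

        // Hypothetical startup hook: filter once, before any file is tailed,
        // instead of on every persistState call made from cleanup.
        Map<String, String> cleaned = removeLegacyKeys(state);
        System.out.println(cleaned.keySet());
    }
}
```

Because the filter is a single pass over the stored entries, its cost depends only on the state size at startup and not on how many times cleanup is triggered afterwards.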



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
