[ 
https://issues.apache.org/jira/browse/NIFI-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266092#comment-17266092
 ] 

ASF subversion and git services commented on NIFI-8136:
-------------------------------------------------------

Commit 097edf4f7c5f9135f0ab9ab1c229bbbb35696b3f in nifi's branch 
refs/heads/main from Mark Payne
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=097edf4 ]

NIFI-8136: Added getState/setState/replaceState/clearState methods to 
ProcessSession, updated processors to use these methods instead of StateManager 
version where appropriate

Signed-off-by: Matthew Burgess <[email protected]>

This closes #4757


> Allow State Management to be tied to Process Session
> ----------------------------------------------------
>
>                 Key: NIFI-8136
>                 URL: https://issues.apache.org/jira/browse/NIFI-8136
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We have many processors that currently store state using NiFi's built-in 
> state management capabilities. To do this, processors need to do something 
> like:
> {code:java}
> Map<String, String> state = new HashMap<>();
> state.put("key", "value1");
> state.put("key2", "value2");
> if (flowFile != null) {
>     ...
>     state.put("key2", updatedValue2);
> }
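> // The state map has to be held until the session commit succeeds...
> session.commit();
> // ...and only then is it pushed to the state provider, with no way to roll it back.
> context.getStateManager().setState(state, Scope.LOCAL);{code}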
> This is not a terrible API, but it comes with a few drawbacks.
> If using a processor that has the @SupportsBatching annotation, calls to 
> ProcessContext.getStateManager().getState(Scope) can be costly to invoke for 
> each FlowFile. To avoid this, processors typically end up having to cache the 
> values themselves.
> Depending on the code, managing the state map can be difficult, and 
> there's no way to roll back state changes once applied, because the 
> call to setState() immediately updates the remote state.
> If executing within a different context, in which we want to store state 
> atomically with the FlowFiles that resulted in the state change, there's no 
> way to do that currently.
> To overcome these problems, we should allow for setting, getting, clearing, 
> and replacing state to be done via the ProcessSession, in addition to the 
> State Manager. I.e., a Processor developer may do either of:
> {code:java}
> context.getStateManager().setState(...);{code}
> Or
> {code:java}
> session.setState(...); {code}
> The former would behave as it does now, immediately updating state on the 
> remote system (ZooKeeper, for example). The latter would simply update an 
> in-memory copy of the state in the Process Session. When 
> ProcessSession.commit() is called, it would push the new state to the remote 
> system. If the session is rolled back, it would simply not update the state. 
> This allows the state to be set in the middle of the processor's algorithm, 
> rather than requiring that it be held onto until after session commit is 
> successful. If the session is then checkpointed (via session.commit while 
> running with a Run Duration greater than 0 ms), then the Session Checkpoint 
> will keep the state. Rolling back the session but not the checkpoint would 
> then result in the checkpointed state still being pushed out.
>  
>  
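
The commit/rollback behavior described in the issue can be illustrated with a small toy model. This is only a sketch under stated assumptions: RemoteStore and SketchSession are hypothetical stand-ins written for this example, not NiFi classes, and the model ignores Scope, state versions, and checkpointing. It shows only that a session-level setState touches an in-memory copy, that commit pushes it to the remote store, and that rollback discards it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Toy model only: none of these types are NiFi classes. */
public class SessionStateSketch {

    /** Hypothetical stand-in for the remote state provider (ZooKeeper, for example). */
    static class RemoteStore {
        private Map<String, String> state = new HashMap<>();

        Map<String, String> get() {
            return Collections.unmodifiableMap(state);
        }

        void set(Map<String, String> newState) {
            state = new HashMap<>(newState);
        }
    }

    /** Hypothetical session that buffers state changes until commit. */
    static class SketchSession {
        private final RemoteStore remote;
        private Map<String, String> pending; // null means "no uncommitted change"

        SketchSession(RemoteStore remote) {
            this.remote = remote;
        }

        /** Updates only the in-memory copy; the remote store is untouched. */
        void setState(Map<String, String> newState) {
            pending = new HashMap<>(newState);
        }

        /** Returns the pending state if there is one, otherwise the remote state. */
        Map<String, String> getState() {
            return pending != null ? Collections.unmodifiableMap(pending) : remote.get();
        }

        /** Pushes the buffered state, if any, to the remote store. */
        void commit() {
            if (pending != null) {
                remote.set(pending);
                pending = null;
            }
        }

        /** Discards the buffered state; the remote store keeps its old value. */
        void rollback() {
            pending = null;
        }
    }

    public static void main(String[] args) {
        RemoteStore remote = new RemoteStore();
        SketchSession session = new SketchSession(remote);

        Map<String, String> first = new HashMap<>();
        first.put("key", "value1");
        session.setState(first);
        System.out.println(remote.get().isEmpty());   // true: nothing pushed before commit
        session.commit();
        System.out.println(remote.get().get("key"));  // value1

        Map<String, String> second = new HashMap<>();
        second.put("key", "discarded");
        session.setState(second);
        session.rollback();
        System.out.println(remote.get().get("key"));  // value1: rollback left remote state alone
    }
}
```

In this model the processor can set state in the middle of its work, and whether it reaches the remote store depends only on whether the session is later committed or rolled back.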



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
