[ 
https://issues.apache.org/jira/browse/FLINK-30217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645920#comment-17645920
 ] 

Hangxiang Yu commented on FLINK-30217:
--------------------------------------

Hi [~xljtswf]. Thanks a lot for reporting this.

I think it makes sense to replace clear + add with update for ListState in 
some cases.

It would reduce the number of calls to the StateBackend, and also the number 
of times the key is serialized.
It also touches implementation details of some operators, so [~kevin.cyj] 
[~wanglijie] WDYT?

> Use ListState#update() to replace clear + add mode.
> ---------------------------------------------------
>
>                 Key: FLINK-30217
>                 URL: https://issues.apache.org/jira/browse/FLINK-30217
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>            Reporter: xljtswf
>            Priority: Major
>
> When using ListState, I found that we often need to clear the current state 
> and then add new values. This pattern is especially common in 
> CheckpointedFunction#snapshotState, and it is slower than just using 
> ListState#update().
> Suppose we want to update the ListState to contain value1, value2, value3.
> With the current implementation, we first call ListState#clear(), which 
> updates the state 1 time.
> Then we add value1, value2, value3 to the state.
> With the heap state backend, we need to search the stateTable 3 times and 
> add 3 values to the list.
> This happens in memory and is not too bad.
> With RocksDB, we call backend.db.merge() 3 times.
> In total, we update the state 4 times.
> The more values we add, the more times we update the state.
> If we use ListState#update() instead, we only need to update the state 1 
> time. I think this can save a lot of time.
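
The write counts described in the issue can be illustrated with a small 
sketch. CountingListState below is a hypothetical in-memory stand-in, not 
Flink's ListState or any real state backend. It only mirrors the clear(), 
add() and update() method names and assumes, as the description does, that 
each mutating call costs one state write. The class and helper names are made 
up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT Flink code: it counts mutating calls so the
// two update patterns from the issue description can be compared.
final class ListStateWriteCount {

    /** Minimal in-memory list state that counts one write per mutating call. */
    static final class CountingListState<T> {
        private final List<T> values = new ArrayList<>();
        private int writes = 0;

        void clear() {
            values.clear();
            writes++;
        }

        void add(T value) {
            values.add(value);
            writes++;
        }

        void update(List<T> newValues) {
            // Replaces the whole list in a single write.
            values.clear();
            values.addAll(newValues);
            writes++;
        }

        List<T> get() {
            return new ArrayList<>(values);
        }

        int writes() {
            return writes;
        }
    }

    /** The "clear + add" pattern: one write for clear, one per added value. */
    static <T> void clearThenAdd(CountingListState<T> state, List<T> newValues) {
        state.clear();
        for (T value : newValues) {
            state.add(value);
        }
    }

    /** The proposed pattern: a single update call. */
    static <T> void updateOnce(CountingListState<T> state, List<T> newValues) {
        state.update(newValues);
    }

    public static void main(String[] args) {
        List<String> values = List.of("value1", "value2", "value3");

        CountingListState<String> clearAdd = new CountingListState<>();
        clearThenAdd(clearAdd, values);

        CountingListState<String> update = new CountingListState<>();
        updateOnce(update, values);

        System.out.println("clear + add writes: " + clearAdd.writes()); // 4
        System.out.println("update writes: " + update.writes());        // 1
    }
}
```

Both patterns leave the same three values in the state. Only the number of 
writes differs, and with clear + add it grows with the number of values. The 
actual cost per call in a real backend depends on its implementation.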



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
