[ https://issues.apache.org/jira/browse/FLINK-27934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552830#comment-17552830 ]
Frans King commented on FLINK-27934:
------------------------------------

Pull request with my changes: https://github.com/apache/flink-statefun/pull/315

> Python API- Inefficient deserialization/serialization of state variables within a batch
> ---------------------------------------------------------------------------------------
>
>                 Key: FLINK-27934
>                 URL: https://issues.apache.org/jira/browse/FLINK-27934
>             Project: Flink
>          Issue Type: Improvement
>          Components: Stateful Functions
>    Affects Versions: statefun-3.2.0
>            Reporter: Frans King
>            Priority: Minor
>              Labels: pull-request-available
>
> In the Python API, state variables can be accessed via the UserFacingContext:
>     variable = context.storage.variable
> This calls into the Cell instance for that state variable, which has get() and
> set() methods. The get() method always deserializes from the typed_value, and
> set() always re-serializes and marks the cell dirty.
>
> This has two side effects:
> 1:
>     var1 = context.storage.variable
>     var2 = context.storage.variable
>     id(var2) != id(var1) - they are different instances
>
> 2:
> In a large batch (say 1000 calls to the same function type and id), this can
> result in deserializing and re-serializing the same state variable 1000
> times, when really it only needs to be deserialized in the first invocation in
> the batch, held in memory until the last invocation, and then re-serialized
> prior to collecting the mutations.
>
> I think this can be improved by having a lazily initialized backing field in
> the Cell class, but I don't know whether the behavior described in 1 was a
> conscious design decision.
>
> Any feedback would be welcome.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
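
For illustration, the lazily initialized backing field proposed in the issue could look roughly like the sketch below. This is not the code from the pull request and not the SDK's actual Cell class: LazyCell, flush(), run_batch() and the two call counters are hypothetical names, JSON stands in for whatever type serializer the state variable uses, and the sketch assumes the cell always starts with a serialized value.

```python
import json

_UNSET = object()  # sentinel: nothing has been deserialized yet


class LazyCell:
    """Hypothetical stand-in for the SDK's Cell class (assumption, not the real API)."""

    def __init__(self, typed_value: bytes):
        self.typed_value = typed_value  # serialized form of the state variable
        self.dirty = False
        self._cached = _UNSET           # lazily initialized backing field
        self.deserialize_calls = 0      # counters exist only to show the effect
        self.serialize_calls = 0

    def get(self):
        # Deserialize on first access only; later calls return the same object.
        if self._cached is _UNSET:
            self.deserialize_calls += 1
            self._cached = json.loads(self.typed_value)
        return self._cached

    def set(self, value):
        # Keep the live object and mark the cell dirty; serialization is deferred.
        self._cached = value
        self.dirty = True

    def flush(self) -> bytes:
        # Re-serialize once, just before the mutations would be collected.
        if self.dirty:
            self.serialize_calls += 1
            self.typed_value = json.dumps(self._cached).encode("utf-8")
            self.dirty = False
        return self.typed_value


def run_batch(cell: LazyCell, invocations: int) -> bytes:
    """Simulate a batch of invocations that all touch the same state variable."""
    for _ in range(invocations):
        state = cell.get()
        state["count"] += 1
        cell.set(state)
    return cell.flush()
```

With this shape, a batch of 1000 invocations deserializes once and serializes once. The trade-off is the one raised in side effect 1: repeated reads now return the same instance, so an in-place mutation that is never followed by set() would be visible to later invocations in the batch without the cell being marked dirty.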