稍等,我看一下您的反馈
发自我的iPhone ------------------ Original ------------------ From: Zhongpu Chen <chenlov...@gmail.com> Date: Fri,Feb 24,2023 7:47 PM To: user <user@flink.apache.org> Subject: Re: RE: Re: Re: Should we always mark ValueState as "transient" forRichFunctions Hi Shammon, Sorry for the inaccurate description of my last reply. Let me restate my question again: Fact 1: we know that ValueState here should not serialized/de-serialized, so it is a good practice to mark it with "transient". Fact 2: on the other hand, if we don't mark it with "transient", it will be initialized to null, and this null value will be serialized/de-serialized. I think it will occur some overhead if the number of partitions is very large. Given the two facts above, the program works well in both cases in terms of accuracy. And my question is: is there any performance benchmark in real (large) applications to compare two cases? Feel free to point out if I've misunderstood. On 2023/02/24 11:01:51 Shammon FY wrote: > Hi > > Sorry that I don't quite understand your question. I think the above > functions will only be deserialized when the job is submitted, do you want > to test the impact of this on submission throughput? > > Best, > Shammon > > > On Fri, Feb 24, 2023 at 3:04 PM Zhongpu Chen wrote: > > > Hi Gen, > > > > Thanks for your explanation. > > > > Back to this code snippet, since they are not marked with "transient" > > now, I suppose Flink will use avro to serialize them (null values). Is > > there any benchmark to show the performance test between null values > > serialization and "transient"? I mean, it is indeed not good to write > > them with "transient", but it works. So is there any performance lose here? > > > > > > On 2023/02/24 06:47:21 Gen Luo wrote: > > > Hi, > > > > > > ValueState is a handle rather than an actual value. So it should never > > be > > > serialized. In fact, ValueState itself is not a Serializable. It > > should be > > > ok to always mark it as transient. > > > > > > In this case, I suppose it works because the ValueState is not set > > (which > > > happens during the runtime) when the function is serialized (while > > > deploying). But it's not good. > > > > > > On Fri, Feb 24, 2023 at 10:29 AM Zhongpu Chen wrote: > > > > > > > Hi, > > > > > > > > When I am reading the code from flink-training-repo [1], I noticed the > > > > following code: > > > > > > > > ```java > > > > > > > > public static class EnrichmentFunction > > > > extends RichCoFlatMapFunction { > > > > > > > > private ValueState rideState; private > > ValueState fareState; > > > > ... > > > > } > > > > > > > > ``` > > > > > > > > From my understanding, since ValueState variables here are scoped > > to each > > > > instance, they should not be serialized for the performance sake. > > Thus, we > > > > should always mark them with "transient". Similar discussion can be > > found > > > > here [2]. > > > > > > > > Should we always mark ValueState as "transient", and why? Please > > help me > > > > to figure it out. > > > > > > > > [1] > > > > > > > > https://github.com/apache/flink-training/blob/master/rides-and-fares/src/solution/java/org/apache/flink/training/solutions/ridesandfares/RidesAndFaresSolution.java > > > > > > > > [2] > > > > > > > > https://stackoverflow.com/questions/72556202/flink-managed-state-as-transient > > > > > > > > > >