Hi Gen,
Thanks for your explanation.
Back to this code snippet: since the fields are not marked "transient",
I suppose Flink will use Avro to serialize them (as null values). Is
there any benchmark comparing the cost of serializing those null values
with marking the fields "transient"? I agree it is not good to write
them without "transient", but it works. So is there any performance loss here?
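For a rough sense of the size of that cost: as far as I know, the function object itself is written with plain Java serialization (rather than Avro) when the job is submitted and deployed, so whatever it costs is paid when the function is shipped, not per record. The minimal sketch below uses only `java.io` to compare two otherwise identical classes. `Handle`, `NullFn` and `TranFn` are hypothetical names for this illustration, with `Handle` standing in for `ValueState`. A null non-transient field adds a field entry to the class descriptor plus a one-byte null marker, while a transient field is skipped entirely.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class NullVsTransientSize {

    // Hypothetical stand-in for a non-serializable state handle such as ValueState.
    static final class Handle {}

    // Handle field is NOT transient; it is still null when the object is serialized.
    static final class NullFn implements Serializable {
        private static final long serialVersionUID = 1L;
        private Handle rideState;
    }

    // Same field, marked transient. The class name has the same length as NullFn
    // so that only the field handling differs in the serialized bytes.
    static final class TranFn implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient Handle rideState;
    }

    // Returns the number of bytes Java serialization produces for obj.
    static int serializedSize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int withNull = serializedSize(new NullFn());
        int withTransient = serializedSize(new TranFn());
        System.out.println("non-transient null field: " + withNull + " bytes");
        System.out.println("transient field:          " + withTransient + " bytes");
        System.out.println("extra bytes per serialized function: " + (withNull - withTransient));
    }
}
```

The difference should come out to a few dozen bytes of class metadata per shipped function, so I would not expect a measurable throughput impact; the stronger argument for "transient" seems to be correctness rather than speed.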
On 2023/02/24 06:47:21 Gen Luo wrote:
> Hi,
>
> ValueState is a handle rather than an actual value, so it should never be
> serialized. In fact, ValueState itself is not Serializable. It should be
> OK to always mark it as transient.
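To illustrate the usual pattern, here is a minimal sketch in plain Java with no Flink dependency: the handle field is `transient`, so serialization skips it, and it is (re)created in an `open()` method after the function has been deserialized, loosely mirroring how a rich function obtains its state from the runtime context. `Handle`, `EnrichmentFn`, `open()`, `hasHandle()` and `roundTrip()` are hypothetical stand-ins written for this sketch, not Flink APIs, and the round trip only approximates what happens at deployment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientHandleLifecycle {

    // Hypothetical stand-in for a non-serializable state handle such as ValueState.
    static final class Handle {}

    // Hypothetical user function: the handle is transient and created in open().
    static final class EnrichmentFn implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient Handle rideState;

        void open() {
            rideState = new Handle();
        }

        boolean hasHandle() {
            return rideState != null;
        }
    }

    // Serializes and deserializes obj, roughly what happens when a function is shipped.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        EnrichmentFn fn = new EnrichmentFn();
        fn.open(); // even with the handle already set, transient keeps serialization safe
        EnrichmentFn shipped = roundTrip(fn);
        System.out.println("handle after deserialization: " + shipped.hasHandle()); // false
        shipped.open(); // the handle must be recreated on the receiving side
        System.out.println("handle after open(): " + shipped.hasHandle()); // true
    }
}
```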
>
> In this case, I suppose it works because the ValueState is not yet set (that
> happens at runtime) when the function is serialized (during deployment).
> But it's not good practice.
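A similar plain-Java sketch shows why the non-transient version happens to work: Java serialization writes a null field as a null marker, but fails with `NotSerializableException` once the field holds a non-serializable object. As before, `Handle` and `EnrichmentFn` are hypothetical stand-ins for `ValueState` and the user function, and `open()` here only simulates the runtime setting the handle.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class NonTransientHandle {

    // Hypothetical stand-in for a non-serializable state handle such as ValueState.
    static final class Handle {}

    // Hypothetical user function whose handle field is NOT transient.
    static final class EnrichmentFn implements Serializable {
        private static final long serialVersionUID = 1L;
        private Handle rideState;

        // Stand-in for the runtime setting the handle after deployment.
        void open() {
            rideState = new Handle();
        }
    }

    // Returns true if Java serialization of obj succeeds.
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false; // a non-transient field referenced a non-Serializable object
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        EnrichmentFn fn = new EnrichmentFn();
        System.out.println(canSerialize(fn)); // true: the null field is written as a null marker
        fn.open();
        System.out.println(canSerialize(fn)); // false: Handle is not Serializable
    }
}
```

So the snippet only works because of when serialization happens, which is why marking the field transient is the more robust choice.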
>
> On Fri, Feb 24, 2023 at 10:29 AM Zhongpu Chen <ch...@gmail.com> wrote:
>
> > Hi,
> >
> > While reading the code in the flink-training repo [1], I noticed the
> > following code:
> >
> > ```java
> > public static class EnrichmentFunction
> >         extends RichCoFlatMapFunction<TaxiRide, TaxiFare, RideAndFare> {
> >
> >     private ValueState<TaxiRide> rideState;
> >     private ValueState<TaxiFare> fareState;
> >     ...
> > }
> > ```
> >
> > From my understanding, since the ValueState variables here are scoped to
> > each instance, they should not be serialized, for performance's sake.
> > Thus, we should always mark them as "transient". A similar discussion can
> > be found here [2].
> >
> > Should we always mark ValueState as "transient", and why? Please help me
> > figure it out.
> >
> > [1] https://github.com/apache/flink-training/blob/master/rides-and-fares/src/solution/java/org/apache/flink/training/solutions/ridesandfares/RidesAndFaresSolution.java
> >
> > [2] https://stackoverflow.com/questions/72556202/flink-managed-state-as-transient
> >
>