Hi Madan,
serialization happens at different positions with different mechanisms.
For records that are travelling in the stream the serializer that is
defined by the type information is used (print
env.addSource().getType()). However, whether records are serialized or
not depends on the so-called object reuse mode [0] and if repartitioning
is involved. By default, records are serialized between all operators
because Flink Functions could mutually modify objects if a user does not
pay special attention to it. If you know what you are doing, you can
enable object reuse. In this case, serialization happens only for
repartitioning, e.g. always after a keyBy().
For keeping state (like in sum()), a record must be serialized during a
checkpoint or for writing it to the RocksDB state backend.
Instances of Functions are serialized using Java serialization (to ship
member variables etc.).
I hope we can improve this part in the documentation in the near future.
Regards,
Timo
[0]
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/execution_configuration.html
Am 12/18/17 um 1:15 PM schrieb madan:
Hi,
I am trying to understand serialization part when a environment is
executed. Taking a simple environment for ex.,
env.addSource(...).keyBy(...).sum(...).addSink(...).execute. I would
like to understand when and where serialization happens and what are
all serialized, operators,functions etc.,
Can anyone please give some information or point me at proper
documentation.
--
Thank you,
Madan.