Anything that is reachable from a DoFn/CombineFn/*Fn needs to be
serializable. What [1] is saying is that a DoFn is commonly written as an
anonymous inner class, and its serialization capture then includes the
enclosing class, which is typically a PTransform. If you are careful about
reachability, you can avoid marking lots of things as serializable, which
is good because it also decreases the size of the serialized *Fn blob.
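
To make the capture concrete, here is a minimal plain-Java sketch. It does
not use the Beam API: Fn, Transform, DriverOnlyConfig and PrefixFn are
hypothetical stand-ins for a DoFn, a composite PTransform, a driver-side
configuration member and a named DoFn class, and the check only uses
standard Java serialization.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class CaptureDemo {
    // Hypothetical stand-in for a DoFn: a serializable function object.
    interface Fn extends Serializable {
        String process(String input);
    }

    // Stand-in for a configuration member that is deliberately NOT serializable.
    static class DriverOnlyConfig {
        final String endpoint = "example-endpoint";
    }

    // Stand-in for a composite PTransform, marked Serializable like PTransform is.
    static class Transform implements Serializable {
        final String prefix;
        final DriverOnlyConfig config = new DriverOnlyConfig();

        Transform(String prefix) {
            this.prefix = prefix;
        }

        // Anonymous inner class: reading `prefix` goes through the enclosing
        // Transform, so the whole Transform (including `config`) is reachable.
        Fn anonymousFn() {
            return new Fn() {
                @Override
                public String process(String input) {
                    return prefix + input;
                }
            };
        }

        // Static nested class: only the copied `prefix` value is reachable.
        Fn staticFn() {
            return new PrefixFn(prefix);
        }
    }

    static class PrefixFn implements Fn {
        private final String prefix;

        PrefixFn(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public String process(String input) {
            return prefix + input;
        }
    }

    // Returns true if Java serialization can write everything reachable from `o`.
    static boolean canSerialize(Object o) {
        try {
            new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Transform t = new Transform("out:");
        System.out.println("anonymous Fn: " + canSerialize(t.anonymousFn()));
        System.out.println("static nested Fn: " + canSerialize(t.staticFn()));
    }
}
```

In this sketch the anonymous Fn fails to serialize because it drags in the
enclosing Transform and its non-serializable config, while the static
nested Fn serializes fine. Declaring config as transient would be the other
way out.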

The javadoc in [2] could be clarified to say that the PTransform class
supports serialization, but is only serialized when it is part of the
serialization capture of a DoFn/CombineFn/*Fn, and otherwise is never
serialized.

On Mon, Jan 6, 2020 at 10:19 AM Alexey Romanenko <[email protected]>
wrote:

> Hello all,
>
> I find that I’m a bit confused about the serialization requirements for
> Beam transforms, and I want to clarify something.
>
> Here [1] it's clearly mentioned that “*DoFn, PTransform, CombineFn and
> other instances will be serialized*”. Since most of the Beam IO
> Read/Write transforms are based on PTransform, it means that all of
> their internal members should be serializable too, or declared as
> transient/static.
>
> At the same time, the Javadoc of PTransform says [2] that “*PTransform
> doesn't actually support serialization, despite implementing
> Serializable. PTransform is marked Serializable solely because it is common
> for an anonymous DoFn, instance to be created within an apply() method
> of a composite PTransform*”. And, on the other hand, “*DoFn passed to a
> ParDo transform must be Serializable*” [3]. So a DoFn really must be
> serializable, while a PTransform need not be.
>
> So, does it mean that the members (mostly AutoValue-generated) of
> Read/Write PTransforms are free to be serializable or not, as long as the
> transforms don’t use anonymous DoFns? For example, when they are needed
> only for configuration on the driver. However, if these members are later
> used in a DoFn or in other user-defined objects that run on workers, then
> they must be serializable in any case. Is this a correct assumption?
>

Yes


>
> [1]
> https://beam.apache.org/contribute/ptransform-style-guide/#serialization
> [2]
> https://github.com/apache/beam/blob/42dbb5d9c9fbf45676088a32f862101f03fa76fb/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L116
> [3]
> https://github.com/apache/beam/blob/e2bb239f0418f1c4949227ba3f51a5f4eb7235df/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ParDo.java#L282
>
>
