Thank you for the clarification, Luke.

> On 6 Jan 2020, at 20:03, Luke Cwik <[email protected]> wrote:
>
> Anything that is reachable by the DoFn/CombineFn/*Fn needs to be
> serializable. [1] is saying that it is common to have an anonymous inner
> class for a DoFn which, because of its serialization capture, will pull in
> the enclosing class, which is typically a PTransform. If you are careful
> about reachability, you can decide not to mark lots of things as
> serializable, and this is good because it decreases the size of the
> serialized *Fn blob as well.
>
> The [2] javadoc could be clarified to say that the PTransform class supports
> serialization but is only serialized when it is part of the serialization
> capture of a DoFn/CombineFn/*Fn, and otherwise will never be serialized.

Yes, I think it would be clearer that way.

>
> On Mon, Jan 6, 2020 at 10:19 AM Alexey Romanenko <[email protected]> wrote:
> Hello all,
>
> I find myself a bit confused about the serialization requirements for
> Beam transforms, and I would like to clarify something.
>
> Here [1] it's clearly mentioned that “DoFn, PTransform, CombineFn and other
> instances will be serialized”. Since most Beam IO Read/Write transforms are
> based on PTransform, this means that all their internal members should be
> serializable too, or declared as transient/static.
>
> At the same time, the Javadoc of PTransform says [2] that “PTransform doesn't
> actually support serialization, despite implementing Serializable. PTransform
> is marked Serializable solely because it is common for an anonymous DoFn,
> instance to be created within an apply() method of a composite PTransform”.
> And, on the other hand, “DoFn passed to a ParDo transform must be
> Serializable” [3]. So a DoFn really must be serializable, while a PTransform
> need not be.
>
> So, does this mean that the members (mostly AutoValue-generated) of
> Read/Write PTransforms are free to be serializable or not, as long as the
> transforms don’t use anonymous DoFns? For example, when they are needed only
> for configuration on the driver. However, if these members are later used in
> a DoFn or in other user-defined objects that run on workers, then they must
> be serializable in any case. Is this a correct assumption?
>
> Yes
>
>
> [1] https://beam.apache.org/contribute/ptransform-style-guide/#serialization
> [2] https://github.com/apache/beam/blob/42dbb5d9c9fbf45676088a32f862101f03fa76fb/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L116
> [3] https://github.com/apache/beam/blob/e2bb239f0418f1c4949227ba3f51a5f4eb7235df/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ParDo.java#L282
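
For anyone reading this thread later, the "serialization capture" discussed above can be sketched in plain Java, without any Beam dependency. This is only an illustration under stated assumptions: `Fn`, `MyTransform`, `DriverOnlyClient`, `PrefixFn` and `serializes` are hypothetical names standing in for a DoFn, a composite PTransform, a driver-only configuration member, a static nested DoFn and a helper check. None of them are Beam APIs, and the check uses plain Java serialization rather than whatever a runner actually does.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CaptureSketch {

  /** Hypothetical stand-in for a DoFn: it is shipped to workers, so it must serialize. */
  interface Fn extends Serializable {
    String process(String input);
  }

  /** Hypothetical driver-only member that is deliberately NOT Serializable. */
  static class DriverOnlyClient {}

  /** Hypothetical stand-in for a composite PTransform. */
  static class MyTransform implements Serializable {
    private final DriverOnlyClient client = new DriverOnlyClient();
    private final String prefix;

    MyTransform(String prefix) {
      this.prefix = prefix;
    }

    /** Anonymous inner class: it holds an implicit reference to MyTransform.this. */
    Fn anonymousFn() {
      return new Fn() {
        @Override
        public String process(String input) {
          return prefix + input; // reads a field of the enclosing instance
        }
      };
    }

    /** Static nested class: only the copied prefix is reachable from the Fn. */
    Fn staticFn() {
      return new PrefixFn(prefix);
    }
  }

  /** Static nested Fn with no reference to any enclosing transform. */
  static class PrefixFn implements Fn {
    private final String prefix;

    PrefixFn(String prefix) {
      this.prefix = prefix;
    }

    @Override
    public String process(String input) {
      return prefix + input;
    }
  }

  /** Returns true if plain Java serialization of the object succeeds. */
  static boolean serializes(Object o) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(o);
      return true;
    } catch (NotSerializableException e) {
      return false; // something reachable from o is not Serializable
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    MyTransform t = new MyTransform("x-");
    System.out.println("anonymous Fn serializes: " + serializes(t.anonymousFn()));
    System.out.println("static Fn serializes: " + serializes(t.staticFn()));
  }
}
```

In this sketch the anonymous `Fn` drags the whole `MyTransform` into its serialized form, so the non-serializable `client` becomes reachable and the first check is expected to fail. The static nested `PrefixFn` only reaches a `String`, so the transform's other members never need to be serializable. Declaring `client` as transient, as mentioned in the original question, would be the other way to keep it out of the capture.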
