Re: [BEAM-9322] Python SDK discussion on correct output tag names

Luke Cwik Tue, 31 Mar 2020 16:13:39 -0700

It is important that composites know how things are named so that any
embedded payloads in the composite PTransform can reference the outputs
appropriately.


On Tue, Mar 31, 2020 at 2:51 PM Robert Bradshaw <[email protected]> wrote:

> On Tue, Mar 31, 2020 at 1:13 PM Sam Rohde <[email protected]> wrote:
> >>>
> >>> * Don't allow arbitrary nestings returned during expansion, force
> composite transforms to always provide an unambiguous name (either a tuple
> with PCollections with unique tags or a dictionary with untagged
> PCollections or a singular PCollection (Java and Go SDKs do this)).
> >>
> >> I believe that aligning with Java and Go would be the right way to go
> here. I don't know if this would limit expressiveness.
> >
> > Yeah this sounds like a much more elegant way of handling this
> situation. I would lean towards this limiting expressiveness because there
> would be a limit to nesting, but I think that the trade-off with reducing
> complexity is worth it.
> >
> > So in summary it could be:
> > PTransform.expand: (...) -> Union[PValue, NamedTuple[str, PCollection],
> Tuple[str, PCollection], Dict[str, PCollection], DoOutputsTuple]
> >
> > With the expectation that (pseudo-code):
> > a_transform = ATransform()
> > ATransform.from_runner_api(a_transform.to_runner_api()).outputs.keys()
> == a_transform.outputs.keys()
> >
> > Since this changes the Python SDK composite transform API, what would be
> the next steps for the community to come to a consensus on this?
>
> It seems here we're conflating the nesting of PValue results with the
> nesting of composite operations.
>
> Both examples in the original post have PTransform nesting (a
> composite) returning a flat tuple. This is completely orthogonal to
> the idea of a PTransform returning a nested result (such as (pc1,
> (pc2, pc3))) and forbidding the latter won't solve the former.
>
> Currently, with the exception of explicit names given for multi-output
> ParDos, we simply label the outputs sequentially with 0, 1, 2, 3, ...
> (Actually, for historical reasons, it's None, 1, 2, 3, ...), no matter
> the nesting. We could do better, e.g. for the example above, label
> them "0", "1.0", "1.1", or use the keys in the returned dict, but this
> is separate from the idea of trying to relate the output tags of
> composites to the output tags of their inner transforms.
>
> - Robert
>

Re: [BEAM-9322] Python SDK discussion on correct output tag names

Reply via email to