It is important that composites know how things are named so that any
embedded payloads in the composite PTransform can reference the outputs
appropriately.

On Tue, Mar 31, 2020 at 2:51 PM Robert Bradshaw <[email protected]> wrote:

> On Tue, Mar 31, 2020 at 1:13 PM Sam Rohde <[email protected]> wrote:
> >>>
> >>> * Don't allow arbitrary nestings returned during expansion, force
> composite transforms to always provide an unambiguous name (either a tuple
> with PCollections with unique tags or a dictionary with untagged
> PCollections or a singular PCollection (Java and Go SDKs do this)).
> >>
> >> I believe that aligning with Java and Go would be the right way to go
> here. I don't know if this would limit expressiveness.
> >
> > Yeah this sounds like a much more elegant way of handling this
> situation. I would lean towards this limiting expressiveness because there
> would be a limit to nesting, but I think that the trade-off with reducing
> complexity is worth it.
> >
> > So in summary it could be:
> > PTransform.expand: (...) -> Union[PValue, NamedTuple[str, PCollection],
> Tuple[str, PCollection], Dict[str, PCollection], DoOutputsTuple]
> >
> > With the expectation that (pseudo-code):
> > a_transform = ATransform()
> > ATransform.from_runner_api(a_transform.to_runner_api()).outputs.keys()
> == a_transform.outputs.keys()
> >
> > Since this changes the Python SDK composite transform API, what would be
> the next steps for the community to come to a consensus on this?
>
> It seems here we're conflating the nesting of PValue results with the
> nesting of composite operations.
>
> Both examples in the original post have PTransform nesting (a
> composite) returning a flat tuple. This is completely orthogonal to
> the idea of a PTransform returning a nested result (such as (pc1,
> (pc2, pc3))) and forbidding the latter won't solve the former.
>
> Currently, with the exception of explicit names given for multi-output
> ParDos, we simply label the outputs sequentially with 0, 1, 2, 3, ...
> (Actually, for historical reasons, it's None, 1, 2, 3, ...), no matter
> the nesting. We could do better, e.g. for the example above, label
> them "0", "1.0", "1.1", or use the keys in the returned dict, but this
> is separate from the idea of trying to relate the output tags of
> composites to the output tags of their inner transforms.
>
> - Robert
>

Reply via email to