It is important that composites know how things are named so that any embedded payloads in the composite PTransform can reference the outputs appropriately.

On Tue, Mar 31, 2020 at 2:51 PM Robert Bradshaw <[email protected]> wrote:

> On Tue, Mar 31, 2020 at 1:13 PM Sam Rohde <[email protected]> wrote:
> >>>
> >>> * Don't allow arbitrary nestings returned during expansion, force
> >>> composite transforms to always provide an unambiguous name (either
> >>> a tuple with PCollections with unique tags or a dictionary with
> >>> untagged PCollections or a singular PCollection (Java and Go SDKs
> >>> do this)).
> >>
> >> I believe that aligning with Java and Go would be the right way to
> >> go here. I don't know if this would limit expressiveness.
> >
> > Yeah this sounds like a much more elegant way of handling this
> > situation. I would lean towards this limiting expressiveness because
> > there would be a limit to nesting, but I think that the trade-off
> > with reducing complexity is worth it.
> >
> > So in summary it could be:
> > PTransform.expand: (...) -> Union[PValue, NamedTuple[str, PCollection],
> >     Tuple[str, PCollection], Dict[str, PCollection], DoOutputsTuple]
> >
> > With the expectation that (pseudo-code):
> > a_transform = ATransform()
> > ATransform.from_runner_api(a_transform.to_runner_api()).outputs.keys()
> >     == a_transform.outputs.keys()
> >
> > Since this changes the Python SDK composite transform API, what would
> > be the next steps for the community to come to a consensus on this?
>
> It seems here we're conflating the nesting of PValue results with the
> nesting of composite operations.
>
> Both examples in the original post have PTransform nesting (a
> composite) returning a flat tuple. This is completely orthogonal to
> the idea of a PTransform returning a nested result (such as (pc1,
> (pc2, pc3))) and forbidding the latter won't solve the former.
>
> Currently, with the exception of explicit names given for multi-output
> ParDos, we simply label the outputs sequentially with 0, 1, 2, 3, ...
> (Actually, for historical reasons, it's None, 1, 2, 3, ...), no matter
> the nesting. We could do better, e.g. for the example above, label
> them "0", "1.0", "1.1", or use the keys in the returned dict, but this
> is separate from the idea of trying to relate the output tags of
> composites to the output tags of their inner transforms.
>
> - Robert
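
To make the two labeling schemes in the quoted message concrete, here is a rough sketch in plain Python. It does not use Beam at all: strings stand in for PCollections, and `sequential_tags` and `path_tags` are hypothetical helper names, not SDK functions. The first mimics the sequential None, 1, 2, ... labeling as described above; the second is one possible reading of the suggested "0", "1.0", "1.1" scheme (or dict keys where a dict is returned).

```python
def sequential_tags(result):
    """Label leaves None, 1, 2, ... in traversal order, ignoring nesting."""
    leaves = []

    def walk(node):
        if isinstance(node, dict):
            for child in node.values():
                walk(child)
        elif isinstance(node, (tuple, list)):
            for child in node:
                walk(child)
        else:
            leaves.append(node)

    walk(result)
    # The first output is tagged None (historical quirk noted in the thread).
    return {(None if i == 0 else i): leaf for i, leaf in enumerate(leaves)}


def path_tags(result, prefix=None):
    """Label leaves by their position in the nesting, e.g. "0", "1.0", "1.1"."""
    if isinstance(result, dict):
        items = result.items()
    elif isinstance(result, (tuple, list)):
        items = enumerate(result)
    else:
        # A bare, un-nested result gets "0" (an assumption for this sketch).
        return {"0" if prefix is None else prefix: result}

    tags = {}
    for key, child in items:
        child_prefix = str(key) if prefix is None else f"{prefix}.{key}"
        tags.update(path_tags(child, child_prefix))
    return tags


nested = ("pc1", ("pc2", "pc3"))
print(sequential_tags(nested))  # {None: 'pc1', 1: 'pc2', 2: 'pc3'}
print(path_tags(nested))        # {'0': 'pc1', '1.0': 'pc2', '1.1': 'pc3'}
```

Either scheme only names the outputs of a single expansion; as the message points out, relating a composite's output tags to those of its inner transforms is a separate question.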
