Re: [BEAM-9322] Python SDK discussion on correct output tag names

Robert Bradshaw Tue, 31 Mar 2020 17:15:39 -0700

On Tue, Mar 31, 2020 at 4:13 PM Luke Cwik <[email protected]> wrote:
>
> It is important that composites know how things are named so that any 
> embedded payloads in the composite PTransform can reference the outputs 
> appropriately.


Very good point. This is part of the cleanup to treat inputs and
outputs of PCollections as maps rather than lists generally across the
Python representations (which also relates to some of the ugliness
that Cham has been running into with cross-language).

> On Tue, Mar 31, 2020 at 2:51 PM Robert Bradshaw <[email protected]> wrote:
>>
>> On Tue, Mar 31, 2020 at 1:13 PM Sam Rohde <[email protected]> wrote:
>> >>>
>> >>> * Don't allow arbitrary nestings returned during expansion, force 
>> >>> composite transforms to always provide an unambiguous name (either a 
>> >>> tuple with PCollections with unique tags or a dictionary with untagged 
>> >>> PCollections or a singular PCollection (Java and Go SDKs do this)).
>> >>
>> >> I believe that aligning with Java and Go would be the right way to go 
>> >> here. I don't know if this would limit expressiveness.
>> >
>> > Yeah this sounds like a much more elegant way of handling this situation. 
>> > I would lean towards this limiting expressiveness because there would be a 
>> > limit to nesting, but I think that the trade-off with reducing complexity 
>> > is worth it.
>> >
>> > So in summary it could be:
>> > PTransform.expand: (...) -> Union[PValue, NamedTuple[str, PCollection], 
>> > Tuple[str, PCollection], Dict[str, PCollection], DoOutputsTuple]
>> >
>> > With the expectation that (pseudo-code):
>> > a_transform = ATransform()
>> > ATransform.from_runner_api(a_transform.to_runner_api()).outputs.keys() == 
>> > a_transform.outputs.keys()
>> >
>> > Since this changes the Python SDK composite transform API, what would be 
>> > the next steps for the community to come to a consensus on this?
>>
>> It seems here we're conflating the nesting of PValue results with the
>> nesting of composite operations.
>>
>> Both examples in the original post have PTransform nesting (a
>> composite) returning a flat tuple. This is completely orthogonal to
>> the idea of a PTransform returning a nested result (such as (pc1,
>> (pc2, pc3))) and forbidding the latter won't solve the former.
>>
>> Currently, with the exception of explicit names given for multi-output
>> ParDos, we simply label the outputs sequentially with 0, 1, 2, 3, ...
>> (Actually, for historical reasons, it's None, 1, 2, 3, ...), no matter
>> the nesting. We could do better, e.g. for the example above, label
>> them "0", "1.0", "1.1", or use the keys in the returned dict, but this
>> is separate from the idea of trying to relate the output tags of
>> composites to the output tags of their inner transforms.
>>
>> - Robert

Re: [BEAM-9322] Python SDK discussion on correct output tag names

Reply via email to