On Fri, Feb 6, 2026 at 5:43 PM Robert Bradshaw <[email protected]> wrote:
> On Fri, Feb 6, 2026 at 2:36 PM Joey Tran <[email protected]> wrote:
> >
> > On Fri, Feb 6, 2026 at 4:43 PM Danny McCormick <[email protected]> wrote:
> >>
> >> On Fri, Feb 6, 2026 at 4:22 PM Joey Tran <[email protected]> wrote:
> >>>
> >>> FWIW, much of the value of this proposal to me is the better
> >>> readability from not having to consider multiple versions of
> >>> transforms and not having to break up chains to extract main
> >>> outputs. I appreciate though that we'd be making a trade-off of
> >>> readability of the "sad path" for readability of the "happy path"
> >>
> >>
> >> Yeah, that makes sense; what do you think of the other alternative
> >> mentioned as an option for optimizing for both kinds of readability?
> >> Specifically, allowing:
> >>
> >> pcoll | Partition(...)['main'] | ChainedParDo()
> >>
> >> I guess the downside there is education (all pipeline authors need to
> >> know this is an option as opposed to only one expert transform
> >> author), but I'm curious if it is sufficient for your context.
> >
> > Is the suggestion here to implement `__getitem__` on PTransform/ParDo so
> > a particular pcollection can be specified? This would definitely be an
> > improvement from the current state. I think one further improvement would
> > be if we could specify the pcollection by attribute rather than by
> > key/string, so `Partition(...).main` instead, but that risks pcollection
> > name and ptransform method collisions.
> >
> > I'm still partial toward the other suggestions, particularly towards
> > implementing `PTransform.with_outputs`, but this is probably sufficient for
> > my context.
>
> I'll admit that I'm actually not a fan of with_outputs(...). It's not
> very dry--I'd rather the consumer decide what it wants to consume by
> consuming it than have to also (redundantly) specify it on the
> producer. I think it dates back to trying to copy java where the
> return type needs to be a typed PValue. Were I to do it again, I would
> have such transforms return a dict or named tuple (if all outputs are
> meaningful) or an "augmented" PCollection (as has been proposed here)
> when they are auxiliary (and preferably leave the decision up to the
> DoFn implementor, not the caller).
>
> - Robert
>

Ha, yeah, I also don't find it the most intuitively named / parametrized. I
usually need to look at its documentation each time I need to use it.
Standardization is nice though.
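
To make the `Partition(...)['main']` idea concrete, here is a minimal sketch of how `__getitem__` on a transform could select one named output so the chain does not have to be broken up. It is only an illustration under loud assumptions: every class below (`ToyPCollection`, `ToyTransform`, `SelectOutput`, and this dict-returning `Partition` and `Map`) is a hypothetical stand-in written in plain Python, not Beam's actual PTransform/ParDo implementation or API.

```python
class ToyPCollection:
    """Hypothetical stand-in for a PCollection: just a list of elements."""

    def __init__(self, elements):
        self.elements = list(elements)

    def __or__(self, transform):
        # `pcoll | transform` applies the transform to this collection.
        return transform.expand(self)


class ToyTransform:
    """Hypothetical stand-in for a PTransform."""

    def expand(self, pcoll):
        raise NotImplementedError

    def __getitem__(self, tag):
        # `transform['main']` wraps the transform so that applying it
        # yields only the output registered under `tag`.
        return SelectOutput(self, tag)


class SelectOutput(ToyTransform):
    """Applies an inner multi-output transform and keeps one named output."""

    def __init__(self, inner, tag):
        self.inner = inner
        self.tag = tag

    def expand(self, pcoll):
        outputs = self.inner.expand(pcoll)  # dict: tag -> ToyPCollection
        return outputs[self.tag]


class Partition(ToyTransform):
    """Toy two-way split (assumed shape; returns a dict of named outputs)."""

    def __init__(self, predicate):
        self.predicate = predicate

    def expand(self, pcoll):
        main, rest = [], []
        for element in pcoll.elements:
            (main if self.predicate(element) else rest).append(element)
        return {"main": ToyPCollection(main), "rest": ToyPCollection(rest)}


class Map(ToyTransform):
    """Toy element-wise transform."""

    def __init__(self, fn):
        self.fn = fn

    def expand(self, pcoll):
        return ToyPCollection(self.fn(x) for x in pcoll.elements)


def is_even(x):
    return x % 2 == 0


pcoll = ToyPCollection([1, 2, 3, 4, 5, 6])

# Happy path: pick the main output inline and keep chaining.
chained = pcoll | Partition(is_even)["main"] | Map(lambda x: x * 2)

# Sad path is still reachable: apply without a tag and get every output.
all_outputs = pcoll | Partition(is_even)
```

Because subscription binds tighter than `|`, the tag is attached to the transform before it is applied, so the consumer picks the output it wants at the point of use and nothing has to be declared redundantly on the producer. Applying the transform without a tag still returns the full dict, which is roughly the "return a dict or named tuple" shape Robert describes.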
