Just want to bump this. In what direction should we go here?

On Fri, Feb 6, 2026 at 5:49 PM Joey Tran <[email protected]> wrote:
>
> On Fri, Feb 6, 2026 at 5:43 PM Robert Bradshaw <[email protected]> wrote:
>
>> On Fri, Feb 6, 2026 at 2:36 PM Joey Tran <[email protected]> wrote:
>> >
>> > On Fri, Feb 6, 2026 at 4:43 PM Danny McCormick <[email protected]> wrote:
>> >>
>> >> On Fri, Feb 6, 2026 at 4:22 PM Joey Tran <[email protected]> wrote:
>> >>>
>> >>> FWIW, much of the value of this proposal to me is the better
>> >>> readability from not having to consider multiple versions of
>> >>> transforms and not having to break up chains to extract main
>> >>> outputs. I appreciate, though, that we'd be making a trade-off of
>> >>> readability of the "sad path" for readability of the "happy path".
>> >>
>> >> Yeah, that makes sense; what do you think of the other alternative
>> >> mentioned as an option for optimizing for both kinds of readability?
>> >> Specifically, allowing:
>> >>
>> >> pcoll | Partition(...)['main'] | ChainedParDo()
>> >>
>> >> I guess the downside there is education (all pipeline authors need to
>> >> know this is an option as opposed to only one expert transform
>> >> author), but I'm curious if it is sufficient for your context.
>> >
>> > Is the suggestion here to implement `__getitem__` on PTransform/ParDo
>> > so a particular pcollection can be specified? This would definitely be
>> > an improvement from the current state. I think one further improvement
>> > would be if we could specify the pcollection by attribute rather than
>> > by key/string, so `Partition(...).main` instead, but that risks
>> > pcollection name and ptransform method collisions.
>> >
>> > I'm still partial toward the other suggestions, particularly towards
>> > implementing `PTransform.with_outputs`, but this is probably
>> > sufficient for my context.
>>
>> I'll admit that I'm actually not a fan of with_outputs(...). It's not
>> very DRY--I'd rather the consumer decide what it wants to consume by
>> consuming it than have to also (redundantly) specify it on the
>> producer. I think it dates back to trying to copy Java, where the
>> return type needs to be a typed PValue. Were I to do it again, I would
>> have such transforms return a dict or named tuple (if all outputs are
>> meaningful) or an "augmented" PCollection (as has been proposed here)
>> when they are auxiliary (and preferably leave the decision up to the
>> DoFn implementor, not the caller).
>>
>> - Robert
>
> Ha, yeah I also don't find it the most intuitively named / parametrized. I
> usually need to look at its documentation each time I need to use it.
> Standardization is nice though.
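
For anyone skimming the thread, here is a rough toy sketch of how the quoted `pcoll | Partition(...)['main'] | ChainedParDo()` option could behave. It deliberately does not import Beam and is not how PTransform/ParDo are implemented; `ToyTransform`, `ToyMap`, `_SelectedOutput`, and `partition_by_parity` are made-up stand-ins that use plain lists in place of PCollections. The only point is to show `__getitem__` on the transform returning a wrapped transform that yields a single named output, so the chain does not have to be broken up.

```python
# Toy model only: these classes are hypothetical stand-ins, NOT Apache Beam
# classes. Plain lists play the role of PCollections.


class ToyTransform:
    """Hypothetical multi-output transform: list -> dict of named lists."""

    def __init__(self, fn):
        self.fn = fn  # fn(list) -> dict[str, list]

    def __ror__(self, pcoll):
        # `pcoll | transform` applies the transform and returns all outputs.
        return self.fn(pcoll)

    def __getitem__(self, tag):
        # `transform[tag]` builds a new transform that applies this one and
        # keeps only the output named `tag`.
        return _SelectedOutput(self, tag)


class _SelectedOutput:
    """Wraps a ToyTransform so that applying it yields one named output."""

    def __init__(self, inner, tag):
        self.inner = inner
        self.tag = tag

    def __ror__(self, pcoll):
        return (pcoll | self.inner)[self.tag]


class ToyMap:
    """Hypothetical single-output transform, standing in for a chained ParDo."""

    def __init__(self, fn):
        self.fn = fn

    def __ror__(self, pcoll):
        return [self.fn(v) for v in pcoll]


def partition_by_parity(values):
    # Even values go to 'main', odd values to an auxiliary 'odd' output.
    return {
        "main": [v for v in values if v % 2 == 0],
        "odd": [v for v in values if v % 2 != 0],
    }


Partition = ToyTransform(partition_by_parity)

# The chain stays unbroken: the consumer picks the output it wants inline.
result = [1, 2, 3, 4] | Partition["main"] | ToyMap(lambda v: v * 10)
```

In this sketch the consumer names the output it wants at the point of use and the producer declares nothing up front, which is the property Robert's reply argues for. Attribute access (`Partition.main`) could be layered on with `__getattr__`, but in a real class that is where the name collisions with transform methods mentioned above would show up.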
