It's fair. if we change the default value, we can perhaps add an error handling logic so that (pcoll) | beam.Flatten() fails with an error that recommends (pcoll) | beam.FlatMap(), instead of saying that input is not an iterable.
On Thu, Mar 21, 2024 at 3:41 PM Joey Tran <joey.t...@schrodinger.com> wrote: > +1 > > On Thu, Mar 21, 2024 at 6:30 PM Robert Bradshaw via dev < > dev@beam.apache.org> wrote: > >> I would be more comfortable with a default for FlatMap than overloading >> Flatten in this way. Distinguishing between >> >> (pcoll,) | beam.Flatten() >> >> and >> >> (pcoll) | beam.Flatten() >> >> seems a bit error prone. >> >> >> On Thu, Mar 21, 2024 at 2:23 PM Joey Tran <joey.t...@schrodinger.com> >> wrote: >> >>> Ah, I misunderstood your original suggestion then. That makes sense >>> then. I have already seen someone get a little confused about the names and >>> surprised that Flatten doesn't do what FlatMap does. >>> >>> On Thu, Mar 21, 2024 at 5:20 PM Valentyn Tymofieiev <valen...@google.com> >>> wrote: >>> >>>> Beam throws an error at submission time in Python if you pass a single >>>> PCollection to Flatten. The scenario you describe concerns a one-element >>>> list. >>>> >>>> On Thu, Mar 21, 2024, 13:43 Joey Tran <joey.t...@schrodinger.com> >>>> wrote: >>>> >>>>> I think it'd be quite surprising if beam.Flatten would become >>>>> equivalent to FlatMap if passed only a single pcollection. One use case >>>>> that would be broken from that is cases where someone might be flattening >>>>> a >>>>> variable number of pcollections, including possibly only one pcollection. >>>>> In that case, that single pcollection suddenly get FlatMapped. >>>>> >>>>> >>>>> >>>>> On Thu, Mar 21, 2024 at 4:36 PM Valentyn Tymofieiev via dev < >>>>> dev@beam.apache.org> wrote: >>>>> >>>>>> One possible alternative is to define beam.Flatten for a single >>>>>> collection to be functionally equivalent to beam.FlatMap(lambda x: x), >>>>>> but >>>>>> that would be a larger change and such behavior might need to be >>>>>> consistent across SDKs and documented. Adding a default value is a >>>>>> simpler >>>>>> change. >>>>>> >>>>>> I can also confirm that the usage >>>>>> >>>>>> | 'Flatten' >> beam.FlatMap(lambda x: x) >>>>>> >>>>>> is fairly common by inspecting uses of Beam internally. >>>>>> On Thu, Mar 21, 2024 at 1:30 PM Robert Bradshaw via dev < >>>>>> dev@beam.apache.org> wrote: >>>>>> >>>>>>> IIRC, Java has Flatten.iterables() and Flatten.collections(), the >>>>>>> first of which does what you want. >>>>>>> >>>>>>> Giving FlatMap a default arg of lambda x: x is an interesting idea. >>>>>>> The only downside I see is a less clear error if one forgets to provide >>>>>>> this (now mandatory) parameter, but maybe that's low enough to be worth >>>>>>> the >>>>>>> convenience? >>>>>>> >>>>>>> On Thu, Mar 21, 2024 at 12:02 PM Joey Tran < >>>>>>> joey.t...@schrodinger.com> wrote: >>>>>>> >>>>>>>> That's not really the same thing, is it? `beam.Flatten` combines >>>>>>>> two or more pcollections into a single pcollection while beam.FlatMap >>>>>>>> unpacks iterables of elements (i.e. PCollection<Iterable<T>> -> >>>>>>>> PCollection<T>) >>>>>>>> >>>>>>>> On Thu, Mar 21, 2024 at 2:57 PM Valentyn Tymofieiev via dev < >>>>>>>> dev@beam.apache.org> wrote: >>>>>>>> >>>>>>>>> Hi, you can use beam.Flatten() instead. >>>>>>>>> >>>>>>>>> On Thu, Mar 21, 2024 at 10:55 AM Joey Tran < >>>>>>>>> joey.t...@schrodinger.com> wrote: >>>>>>>>> >>>>>>>>>> Hey all, >>>>>>>>>> >>>>>>>>>> Using an identity function for FlatMap comes up more often than >>>>>>>>>> using FlatMap without an identity function. Would it make sense to >>>>>>>>>> use the >>>>>>>>>> identity function as a default? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>