I think it'd be quite surprising if beam.Flatten became equivalent to FlatMap when passed only a single pcollection. One use case this would break is flattening a variable number of pcollections, which may include only one. In that case, the single pcollection would suddenly get FlatMapped.
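
To make the concern concrete, here is a minimal pure-Python sketch. It uses plain lists as stand-ins for pcollections and is not actual Beam code; the helper names and the `single_means_flatmap` switch are hypothetical and exist only to model the alternative being discussed.

```python
from itertools import chain


def flatten(pcolls):
    """Model of beam.Flatten: merge several collections into one."""
    return list(chain.from_iterable(pcolls))


def flat_map_identity(pcoll):
    """Model of beam.FlatMap(lambda x: x): unpack each element of one collection."""
    return [item for element in pcoll for item in element]


def merge_all(pcolls, single_means_flatmap=False):
    # Hypothetical switch modelling the alternative: with exactly one input,
    # Flatten would behave like FlatMap(lambda x: x).
    if single_means_flatmap and len(pcolls) == 1:
        return flat_map_identity(pcolls[0])
    return flatten(pcolls)


a = ["ab", "cd"]  # elements happen to be iterable (strings)
b = ["ef"]

two_inputs = merge_all([a, b], single_means_flatmap=True)
one_input_today = merge_all([a])
one_input_proposed = merge_all([a], single_means_flatmap=True)

print(two_inputs)          # elements are left intact
print(one_input_today)     # elements are left intact
print(one_input_proposed)  # elements are unpacked into characters
```

With two inputs the elements pass through untouched, but under the modelled alternative the same call with one input silently unpacks each element, so the output depends on how many pcollections happened to be passed.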
On Thu, Mar 21, 2024 at 4:36 PM Valentyn Tymofieiev via dev <dev@beam.apache.org> wrote:

> One possible alternative is to define beam.Flatten for a single collection
> to be functionally equivalent to beam.FlatMap(lambda x: x), but that would
> be a larger change and such behavior might need to be consistent across
> SDKs and documented. Adding a default value is a simpler change.
>
> I can also confirm that the usage
>
> | 'Flatten' >> beam.FlatMap(lambda x: x)
>
> is fairly common by inspecting uses of Beam internally.
>
> On Thu, Mar 21, 2024 at 1:30 PM Robert Bradshaw via dev <
> dev@beam.apache.org> wrote:
>
>> IIRC, Java has Flatten.iterables() and Flatten.collections(), the first
>> of which does what you want.
>>
>> Giving FlatMap a default arg of lambda x: x is an interesting idea. The
>> only downside I see is a less clear error if one forgets to provide this
>> (now mandatory) parameter, but maybe that's low enough to be worth the
>> convenience?
>>
>> On Thu, Mar 21, 2024 at 12:02 PM Joey Tran <joey.t...@schrodinger.com>
>> wrote:
>>
>>> That's not really the same thing, is it? `beam.Flatten` combines two or
>>> more pcollections into a single pcollection while beam.FlatMap unpacks
>>> iterables of elements (i.e. PCollection<Iterable<T>> -> PCollection<T>)
>>>
>>> On Thu, Mar 21, 2024 at 2:57 PM Valentyn Tymofieiev via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> Hi, you can use beam.Flatten() instead.
>>>>
>>>> On Thu, Mar 21, 2024 at 10:55 AM Joey Tran <joey.t...@schrodinger.com>
>>>> wrote:
>>>>
>>>>> Hey all,
>>>>>
>>>>> Using an identity function for FlatMap comes up more often than using
>>>>> FlatMap without an identity function. Would it make sense to use the
>>>>> identity function as a default?