It's fair. if we change the default value, we can perhaps add an error
handling logic so that (pcoll) | beam.Flatten() fails with an error that
recommends (pcoll) | beam.FlatMap(), instead of saying that input is not
an iterable.
On Thu, Mar 21, 2024 at 3:41 PM Joey Tran <joey.t...@schrodinger.com> wrote:

> +1
>
> On Thu, Mar 21, 2024 at 6:30 PM Robert Bradshaw via dev <
> dev@beam.apache.org> wrote:
>
>> I would be more comfortable with a default for FlatMap than overloading
>> Flatten in this way. Distinguishing between
>>
>>     (pcoll,) | beam.Flatten()
>>
>> and
>>
>>     (pcoll) | beam.Flatten()
>>
>> seems a bit error prone.
>>
>>
>> On Thu, Mar 21, 2024 at 2:23 PM Joey Tran <joey.t...@schrodinger.com>
>> wrote:
>>
>>> Ah, I misunderstood your original suggestion then. That makes sense
>>> then. I have already seen someone get a little confused about the names and
>>> surprised that Flatten doesn't do what FlatMap does.
>>>
>>> On Thu, Mar 21, 2024 at 5:20 PM Valentyn Tymofieiev <valen...@google.com>
>>> wrote:
>>>
>>>> Beam throws an error at submission time in Python if you pass a single
>>>> PCollection  to Flatten. The scenario you describe concerns a one-element
>>>> list.
>>>>
>>>> On Thu, Mar 21, 2024, 13:43 Joey Tran <joey.t...@schrodinger.com>
>>>> wrote:
>>>>
>>>>> I think it'd be quite surprising if beam.Flatten would become
>>>>> equivalent to FlatMap if passed only a single pcollection. One use case
>>>>> that would be broken from that is cases where someone might be flattening 
>>>>> a
>>>>> variable number of pcollections, including possibly only one pcollection.
>>>>> In that case, that single pcollection suddenly get FlatMapped.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Mar 21, 2024 at 4:36 PM Valentyn Tymofieiev via dev <
>>>>> dev@beam.apache.org> wrote:
>>>>>
>>>>>> One possible alternative is to define beam.Flatten for a single
>>>>>> collection to be functionally equivalent to beam.FlatMap(lambda x: x), 
>>>>>> but
>>>>>> that would be a larger change and such behavior might need to be
>>>>>> consistent across SDKs and documented. Adding a default value is a 
>>>>>> simpler
>>>>>> change.
>>>>>>
>>>>>> I can also confirm that the usage
>>>>>>
>>>>>>     |  'Flatten' >> beam.FlatMap(lambda x: x)
>>>>>>
>>>>>> is fairly common by inspecting uses of Beam internally.
>>>>>> On Thu, Mar 21, 2024 at 1:30 PM Robert Bradshaw via dev <
>>>>>> dev@beam.apache.org> wrote:
>>>>>>
>>>>>>> IIRC, Java has Flatten.iterables() and Flatten.collections(), the
>>>>>>> first of which does what you want.
>>>>>>>
>>>>>>> Giving FlatMap a default arg of lambda x: x is an interesting idea.
>>>>>>> The only downside I see is a less clear error if one forgets to provide
>>>>>>> this (now mandatory) parameter, but maybe that's low enough to be worth 
>>>>>>> the
>>>>>>> convenience?
>>>>>>>
>>>>>>> On Thu, Mar 21, 2024 at 12:02 PM Joey Tran <
>>>>>>> joey.t...@schrodinger.com> wrote:
>>>>>>>
>>>>>>>> That's not really the same thing, is it? `beam.Flatten` combines
>>>>>>>> two or more pcollections into a single pcollection while beam.FlatMap
>>>>>>>> unpacks iterables of elements (i.e. PCollection<Iterable<T>> ->
>>>>>>>> PCollection<T>)
>>>>>>>>
>>>>>>>> On Thu, Mar 21, 2024 at 2:57 PM Valentyn Tymofieiev via dev <
>>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi, you can use beam.Flatten() instead.
>>>>>>>>>
>>>>>>>>> On Thu, Mar 21, 2024 at 10:55 AM Joey Tran <
>>>>>>>>> joey.t...@schrodinger.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hey all,
>>>>>>>>>>
>>>>>>>>>> Using an identity function for FlatMap comes up more often than
>>>>>>>>>> using FlatMap without an identity function. Would it make sense to 
>>>>>>>>>> use the
>>>>>>>>>> identity function as a default?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>

Reply via email to