I think it'd be quite surprising if beam.Flatten became equivalent to
FlatMap when passed only a single pcollection. That would break the case
where someone flattens a variable number of pcollections, which may
include just one: that single pcollection would suddenly get FlatMapped.
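
To make the concern concrete, here is a rough plain-Python model of the two
behaviours. It is only a sketch: lists stand in for pcollections, and
`flatten` and `flat_map_identity` are made-up helpers for illustration, not
Beam APIs.

```python
from itertools import chain


def flatten(pcolls):
    """Model of beam.Flatten: merge several collections into one."""
    return list(chain.from_iterable(pcolls))


def flat_map_identity(pcoll):
    """Model of beam.FlatMap(lambda x: x): unpack each iterable element."""
    return [item for element in pcoll for item in element]


# A single collection whose elements happen to be iterables (tuples here).
only = [(1, 2), (3, 4)]

# Flattening a list that contains just one collection leaves elements intact.
merged = flatten([only])

# If Flatten special-cased a single input as FlatMap, the tuples would be
# unpacked instead, changing the element type of the result.
unpacked = flat_map_identity(only)

print(merged)    # [(1, 2), (3, 4)]
print(unpacked)  # [1, 2, 3, 4]
```

With two or more inputs the elements would still come through as tuples, so
the output type would depend on how many pcollections happened to be passed.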


On Thu, Mar 21, 2024 at 4:36 PM Valentyn Tymofieiev via dev <
dev@beam.apache.org> wrote:

> One possible alternative is to define beam.Flatten for a single collection
> to be functionally equivalent to beam.FlatMap(lambda x: x), but that would
> be a larger change and such behavior might need to be consistent across
> SDKs and documented. Adding a default value is a simpler change.
>
> I can also confirm, from inspecting uses of Beam internally, that the
> usage
>
>     |  'Flatten' >> beam.FlatMap(lambda x: x)
>
> is fairly common.
>
> On Thu, Mar 21, 2024 at 1:30 PM Robert Bradshaw via dev <
> dev@beam.apache.org> wrote:
>
>> IIRC, Java has Flatten.iterables() and Flatten.collections(), the first
>> of which does what you want.
>>
>> Giving FlatMap a default arg of lambda x: x is an interesting idea. The
>> only downside I see is a less clear error if one forgets to provide this
>> (currently mandatory) parameter, but maybe that cost is low enough to be
>> worth the convenience?
>>
>> On Thu, Mar 21, 2024 at 12:02 PM Joey Tran <joey.t...@schrodinger.com>
>> wrote:
>>
>>> That's not really the same thing, is it? `beam.Flatten` combines two or
>>> more pcollections into a single pcollection while beam.FlatMap unpacks
>>> iterables of elements (i.e. PCollection<Iterable<T>> -> PCollection<T>)
>>>
>>> On Thu, Mar 21, 2024 at 2:57 PM Valentyn Tymofieiev via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> Hi, you can use beam.Flatten() instead.
>>>>
>>>> On Thu, Mar 21, 2024 at 10:55 AM Joey Tran <joey.t...@schrodinger.com>
>>>> wrote:
>>>>
>>>>> Hey all,
>>>>>
>>>>> Using an identity function for FlatMap comes up more often than using
>>>>> FlatMap without an identity function. Would it make sense to use the
>>>>> identity function as a default?
>>>>>
