On Mon, Mar 13, 2023 at 11:33 AM Godefroy Clair <godefroy.cl...@gmail.com> wrote:
> Hi,
> I am wondering about the way `Flatten()` and `FlatMap()` are implemented
> in Apache Beam Python.
> In most functional languages, FlatMap() is the same as composing
> `Flatten()` and `Map()` as indicated by the name, so Flatten() and
> Flatmap() have the same input.
> But in Apache Beam, Flatten() is using _iterable of PCollections_ while
> FlatMap() is working with _PCollection of Iterables_.
>
> If I am not wrong, the signature of Flatten, Map and FlatMap are :
> ```
> Flatten:: Iterable[PCollections[A]] -> PCollection[A]
> Map:: (PCollection[A], (A-> B)) -> PCollection[B]
> FlatMap:: (PCollection[Iterable[A]], (A->B)) -> [A]

FlatMap is actually (PCollection[A], (A->Iterable[B])) -> PCollection[B].

> ```
>
> So my question is is there another "Flatten-like" function with this
> signature :
> ```
> anotherFlatten:: PCollection[Iterable[A]] -> PCollection[A]
> ```
>
> One of the reason this would be useful, is that when you just want to
> "flatten" a `PCollection` of `iterable` you have to use `FlatMap()` with an
> identity function.
>
> So instead of writing:
> `FlatMap(lambda e: e)`
> I would like to use a function
> `anotherFlatten()`
>

As Reuven mentions, Beam's Flatten could have been called Union, in which
case we'd free up the name Flatten for the
PCollection[Iterable[A]] -> PCollection[A] operation. It is called Flatten
for historical reasons, and the name would be difficult to change now.

FlumeJava uses static constructors to provide

    Flatten.Iterables:    PCollection[Iterable[A]] -> PCollection[A]
    Flatten.PCollections: Iterable[PCollection[A]] -> PCollection[A]

(see [1]).

If you want a FlattenIterables in Python, you could easily implement it as a
composite transform [2] whose implementation passes the identity function to
FlatMap.

[1] https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/Flatten.html
[2] https://beam.apache.org/documentation/programming-guide/#composite-transforms