In Apache Beam, Flatten is a union operation: it takes multiple PCollections (with the same element type) and merges them into a single PCollection.
On Mon, Mar 13, 2023 at 11:32 AM Godefroy Clair <godefroy.cl...@gmail.com> wrote:

> Hi,
> I am wondering about the way `Flatten()` and `FlatMap()` are implemented
> in Apache Beam Python.
> In most functional languages, FlatMap() is the same as composing
> `Flatten()` and `Map()` as indicated by the name, so Flatten() and
> Flatmap() have the same input.
> But in Apache Beam, Flatten() is using _iterable of PCollections_ while
> FlatMap() is working with _PCollection of Iterables_.
>
> If I am not wrong, the signature of Flatten, Map and FlatMap are :
> ```
> Flatten:: Iterable[PCollections[A]] -> PCollection[A]
> Map:: (PCollection[A], (A-> B)) -> PCollection[B]
> FlatMap:: (PCollection[Iterable[A]], (A->B)) -> [A]
> ```
>
> So my question is is there another "Flatten-like" function with this
> signature :
> ```
> anotherFlatten:: PCollection[Iterable[A]] -> PCollection[A]
> ```
>
> One of the reason this would be useful, is that when you just want to
> "flatten" a `PCollection` of `iterable` you have to use `FlatMap()` with an
> identity function.
>
> So instead of writing:
> `FlatMap(lambda e: e)`
> I would like to use a function
> `anotherFlatten()`
>
> Thanks,
> Godefroy