You don't have to specify the names if the callable you pass in is /different/ for two `beam.Map`s, but if the callable is the same you must specify a label. For example, the below will raise an exception:
```
| beam.Filter(identity_filter)
| beam.Filter(identity_filter)
```

Here's an example on playground that shows the error message you get [1]. I marked every line I added with a "# ++".

It's a contrived example, but using a map or filter at the same pipeline level probably comes up often, at least in my inexperience. For example, you might have a pipeline that partitions a pcoll into three different pcolls, runs some transforms on them, and then runs the same type of filter on each of them.

The case that happens most often for me is using the `assert_that` [2] testing transform. In this case, I think users often have no real need for a disambiguating label, as they're usually just writing unit tests that check a few different properties of their workflow.

[1] https://play.beam.apache.org/?sdk=python&shared=hIrm7jvCamW
[2] https://beam.apache.org/releases/pydoc/2.29.0/apache_beam.testing.util.html#apache_beam.testing.util.assert_that

On Mon, Oct 2, 2023 at 9:08 AM Bruno Volpato via user <user@beam.apache.org> wrote:

> If I understand the question correctly, you don't have to specify those
> names.
>
> As Reuven pointed out, it is probably a good idea so you have a stable /
> deterministic graph.
> But in the Python SDK, you can simply use pcollection | map_fn, instead
> of pcollection | 'Map' >> map_fn.
>
> See an example here
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/cookbook/group_with_coder.py#L100-L116
>
>
> On Sun, Oct 1, 2023 at 9:08 PM Joey Tran <joey.t...@schrodinger.com>
> wrote:
>
>> Hmm, I'm not sure what you mean by "updating pipelines in place". Can you
>> elaborate?
>>
>> I forgot to mention my question is posed from the context of a python SDK
>> user, and afaict, there doesn't seem to be an obvious way to autogenerate
>> names/labels. Hearing that the java SDK supports it makes me wonder if the
>> python SDK could support it as well though... (If so, I'd be happy to
>> implement it).
>> Currently, it's fairly tedious to have to name every
>> instance of a transform that you might reuse in a pipeline, e.g. when
>> reapplying the same Map on different pcollections.
>>
>> On Sun, Oct 1, 2023 at 8:12 PM Reuven Lax via user <user@beam.apache.org>
>> wrote:
>>
>>> Are you talking about transform names? The main reason is that for
>>> runners that support updating pipelines in place, the only way to do so
>>> safely is if the runner can perfectly identify which transforms in the new
>>> graph match the ones in the old graph. There's no good way to auto generate
>>> names that will stay stable across updates - even small changes to the
>>> pipeline might change the order of nodes in the graph, which could result
>>> in a corrupted update.
>>>
>>> However, if you don't care about update, Beam can auto generate these
>>> names for you! When you call PCollection.apply (if using BeamJava), simply
>>> omit the name parameter and Beam will auto generate a unique name for you.
>>>
>>> Reuven
>>>
>>> On Sat, Sep 30, 2023 at 11:54 AM Joey Tran <joey.t...@schrodinger.com>
>>> wrote:
>>>
>>>> After writing a few pipelines now, I keep getting tripped up by
>>>> accidentally having duplicate labels from using multiple of the same
>>>> transforms without labeling them. I figure this must be a common complaint,
>>>> so I was just curious, what was the rationale behind this design? My naive
>>>> thought off the top of my head is that it'd be more user friendly to just
>>>> auto increment duplicate transforms, but I figure I must be missing
>>>> something.
>>>>
>>>> Cheers,
>>>> Joey
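
To make the trade-off in this thread concrete, here is a minimal sketch in plain Python (no Beam involved) of the "auto increment duplicate transforms" idea, and of the stability concern Reuven raises. Everything in it is hypothetical: `auto_label` and `label_steps` are illustrative helpers, not Beam APIs, and the step names are made up. It does not claim to show how any SDK or runner actually generates or matches names.

```python
from collections import Counter


def auto_label(transform_names):
    """Number repeated names so every label is unique.

    Hypothetical helper for illustration only; not part of the Beam SDK.
    """
    seen = Counter()
    labels = []
    for name in transform_names:
        seen[name] += 1
        # The first occurrence keeps the bare name; later ones get a suffix.
        labels.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return labels


def label_steps(steps):
    """Map each auto-generated label to a description of what the step does."""
    names = [name for name, _ in steps]
    descriptions = [what for _, what in steps]
    return dict(zip(auto_label(names), descriptions))


# Version 1 of an imaginary pipeline: two unlabeled filters.
v1 = [("Filter", "drop_nulls"), ("Filter", "drop_old")]
# Version 2 adds one more filter in front of the existing two.
v2 = [("Filter", "drop_empty"), ("Filter", "drop_nulls"), ("Filter", "drop_old")]

old = label_steps(v1)
new = label_steps(v2)

# Labels that exist in both versions but now point at a different step.
shifted = {
    label: (old[label], new[label])
    for label in old
    if label in new and old[label] != new[label]
}
print(shifted)
# {'Filter': ('drop_nulls', 'drop_empty'), 'Filter_2': ('drop_old', 'drop_nulls')}
```

In this toy model, adding a single step causes both existing labels to refer to different steps than before, which is the kind of mismatch that would make a name-based in-place update unsafe. When update is not a concern (for example in unit tests built around `assert_that`), that drawback does not apply.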