If I understand the question correctly, you don't have to specify those
names.

As Reuven pointed out, specifying them is probably a good idea, so that you
have a stable, deterministic graph.
But in the Python SDK you can simply write pcollection | map_fn
instead of pcollection | 'Map' >> map_fn.
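
To make the stability concern concrete, here is a rough sketch in plain
Python. It is not Beam's actual naming logic: auto_label is a made-up helper
that assumes duplicate labels get a numeric suffix in the order the transforms
are applied, which is roughly the auto-increment idea from the original
question.

```python
from collections import Counter


def auto_label(transform_names):
    """Hypothetical scheme: suffix repeated names with a counter, in application order."""
    seen = Counter()
    labels = []
    for name in transform_names:
        seen[name] += 1
        labels.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return labels


# Version 1 of a pipeline: two unlabeled Map steps ("parse", then "enrich").
v1_steps = ["parse", "enrich"]
v1 = dict(zip(auto_label(["Map", "Map"]), v1_steps))

# Version 2 adds a new "validate" Map step in front of the existing ones.
v2_steps = ["validate", "parse", "enrich"]
v2 = dict(zip(auto_label(["Map", "Map", "Map"]), v2_steps))

print(v1)  # {'Map': 'parse', 'Map_2': 'enrich'}
print(v2)  # {'Map': 'validate', 'Map_2': 'parse', 'Map_3': 'enrich'}
```

The label Map refers to the parse step in the first version and to the
validate step in the second, so a runner that matches transforms by name
during an update would pair up the wrong steps.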

See an example here:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/cookbook/group_with_coder.py#L100-L116


On Sun, Oct 1, 2023 at 9:08 PM Joey Tran <joey.t...@schrodinger.com> wrote:

> Hmm, I'm not sure what you mean by "updating pipelines in place". Can you
> elaborate?
>
> I forgot to mention my question is posed from the context of a python SDK
> user, and afaict, there doesn't seem to be an obvious way to autogenerate
> names/labels. Hearing that the java SDK supports it makes me wonder if the
> python SDK could support it as well though... (If so, I'd be happy to
> implement it). Currently, it's fairly tedious to have to name every
> instance of a transform that you might reuse in a pipeline, e.g. when
> reapplying the same Map on different pcollections.
>
> On Sun, Oct 1, 2023 at 8:12 PM Reuven Lax via user <user@beam.apache.org>
> wrote:
>
>> Are you talking about transform names? The main reason is that for
>> runners that support updating pipelines in place, the only way to do so
>> safely is if the runner can perfectly identify which transforms in the new
>> graph match the ones in the old graph. There's no good way to auto generate
>> names that will stay stable across updates - even small changes to the
>> pipeline might change the order of nodes in the graph, which could result
>> in a corrupted update.
>>
>> However, if you don't care about update, Beam can auto generate these
>> names for you! When you call PCollection.apply (if using BeamJava), simply
>> omit the name parameter and Beam will auto generate a unique name for you.
>>
>> Reuven
>>
>> On Sat, Sep 30, 2023 at 11:54 AM Joey Tran <joey.t...@schrodinger.com>
>> wrote:
>>
>>> After writing a few pipelines now, I keep getting tripped up by
>>> accidentally having duplicate labels from using the same transform
>>> multiple times without labeling them. I figure this must be a common
>>> complaint, so I was just curious what the rationale behind this design
>>> was. My naive thought off the top of my head is that it'd be more user
>>> friendly to just auto increment duplicate transforms, but I figure I must
>>> be missing something.
>>>
>>> Cheers,
>>> Joey
>>>
>>
