Re: Adding a StepMetadataRegistry for Python SDK

Robert Bradshaw Thu, 29 Mar 2018 17:51:33 -0700

If I understand correctly, this is something runner-specific that would
live solely on the runner side (i.e. over the Fn API we'd still have a
single name for operations, rather than pushing this complexity into that
protocol as well, which I'd really like to avoid), right? If that's the
case, then it's a bit unclear what we'd be doing on the Python side, as all
the non-SDK worker code is going to be thrown away in the new world and I'd
like to avoid investing too much more there.


On Wed, Mar 28, 2018 at 5:13 PM Pablo Estrada <pabl...@google.com> wrote:

> Hello all,
> I've filed https://issues.apache.org/jira/browse/BEAM-3955, to consider
> the possibility of adding some sort of facility to translate different
> names for the runners.
> This is currently a problem in Dataflow, where steps can have different
> names in the backend and in the SDK.
> This is observable in Beam code, where different parts of the
> SDK/worker/runners use different names in their metrics:
>
> - Logging uses Beam transform names (e.g. Foo/Bar)
> - Metrics use operation_name (e.g. s2)
> - Statesampler uses operation_name.
> - The Dataflow worker sets step_name to operation_name after creating the
> operation.
>
> I'd like to propose the following design outline:
>
>    - Create an *execution context* that will allow runners to provide
>    their specific functionality.
>    - The execution context will be able to provide multiple pieces of
>    runner-specific functionality (e.g. side input fetchers).
>    - In this case, the execution contexts can have a StepNameRegistry, or
>    StepRegistry, or StepMetadataRegistry of some kind, where step names and
>    other metadata can be enrolled.
>    - Runners can pass their execution contexts to operations, logging,
>    and other modules.
>    - Beam core can then switch to use Beam step names, and each runner's
>    specific monitoring / metrics / etc classes can have their own logic for
>    accessing these.
>    - This would also allow us to remove the LoggingContext tracking, and
>    rely only on statesampler for context tracking.
>
> Eventually, all of this should be fully contained in the portability API
> and runners won't have to deal with these issues, but for now it seems like
> a good compromise.
>
> If this sounds good, I'll start working to implement that.
> Note that this is only a rough description, and I'm open to reconsidering
> any and all aspects.
>
> Best
> -P.
> --
> Got feedback? go/pabloem-feedback
> <https://goto.google.com/pabloem-feedback>
>
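
For readers following the thread, the design outline quoted above can be
sketched roughly as follows. This is only an illustration under stated
assumptions, not code from Beam or from either author: the class names
StepMetadata, StepMetadataRegistry and ExecutionContext, their fields, and
the metric_key helper are all hypothetical. It shows a registry, carried by
an execution context, that lets runner-side code translate a Beam transform
name (e.g. Foo/Bar) into the runner's operation name (e.g. s2).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class StepMetadata:
    """Names one step is known by. All field names are hypothetical."""
    user_name: str       # Beam transform name, e.g. "Foo/Bar"
    operation_name: str  # runner-internal name, e.g. "s2"


class StepMetadataRegistry:
    """Hypothetical registry keyed by Beam transform name."""

    def __init__(self) -> None:
        self._by_user_name: Dict[str, StepMetadata] = {}

    def register(self, metadata: StepMetadata) -> None:
        # Enroll a step so later lookups can translate its name.
        self._by_user_name[metadata.user_name] = metadata

    def lookup(self, user_name: str) -> Optional[StepMetadata]:
        # Returns None when the step was never registered.
        return self._by_user_name.get(user_name)


@dataclass
class ExecutionContext:
    """Hypothetical runner-provided context passed to operations and logging."""
    step_registry: StepMetadataRegistry = field(
        default_factory=StepMetadataRegistry)


def metric_key(context: ExecutionContext, user_name: str, metric: str) -> str:
    """Runner-side logic: core code passes the Beam name, and the runner
    maps it to its own operation name when one is registered."""
    metadata = context.step_registry.lookup(user_name)
    step = metadata.operation_name if metadata else user_name
    return f"{step}/{metric}"


context = ExecutionContext()
context.step_registry.register(StepMetadata("Foo/Bar", "s2"))
print(metric_key(context, "Foo/Bar", "elements"))
```

In this sketch, core code only ever handles Beam step names, and the
translation to runner-specific names is confined to runner-side helpers such
as metric_key, which falls back to the Beam name for unregistered steps.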
