If I understand correctly, this is something runner-specific that would live solely on the runner side. That is, over the Fn API we'd still have a single name for operations, rather than pushing this complexity into that protocol as well, which I'd really like to avoid. Is that right? If that's the case, then it's a bit unclear what we'd be doing on the Python side, as all the non-SDK worker code is going to be thrown away in the new world and I'd like to avoid investing too much more there.
On Wed, Mar 28, 2018 at 5:13 PM Pablo Estrada <pabl...@google.com> wrote:

> Hello all,
> I've filed https://issues.apache.org/jira/browse/BEAM-3955, to consider
> the possibility of adding some sort of facility to translate different
> names for the runners.
> This is currently a problem in Dataflow, where steps can have different
> names in the backend and in the SDK.
> This is observable in Beam code, where different parts of the
> SDK/worker/runners use different names in their metrics:
>
> - Logging uses Beam transform names (e.g. Foo/Bar)
> - Metrics uses operation_name (e.g. s2)
> - Statesampler uses operation_name.
> - The Dataflow worker sets step_name to operation_name after creating the
>   operation.
>
> I'd like to propose the following design outline:
>
> - Create an *execution context* that will allow runners to provide
>   their specific functionality.
> - Execution context will be able to provide multiple runner-specific
>   functionality (e.g. side input fetchers).
> - In this case, the execution contexts can have a StepNameRegistry, or
>   StepRegistry, or StepMetadataRegistry of some kind, where step names and
>   other metadata can be enrolled.
> - Runners can pass their execution contexts to operations, logging,
>   and other modules.
> - Beam core can then switch to use Beam step names, and each runner's
>   specific monitoring / metrics / etc classes can have their own logic for
>   accessing these.
> - This would also allow us to remove the LoggingContext tracking, and
>   rely only on statesampler for context tracking.
>
> Eventually, all of this should be fully contained in the portability API
> and runners won't have to deal with these issues, but for now it seems like
> a good compromise.
>
> If this sounds good, I'll start working to implement that.
> Note that this is only a rough description, and I'm open to reconsider any
> and all aspects.
>
> Best
> -P.
> --
> Got feedback? go/pabloem-feedback
> <https://goto.google.com/pabloem-feedback>
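
To make the quoted outline a bit more concrete, here is a rough sketch of how the registry idea might look on the runner side. Every name in it (StepNameRegistry, ExecutionContext, metric_key, and their methods) is hypothetical and chosen only for illustration; none of this is existing Beam API. It assumes a simple one-to-one mapping between Beam step names (e.g. Foo/Bar) and runner names (e.g. s2).

```python
class StepNameRegistry(object):
    """Hypothetical two-way map between Beam step names and runner names."""

    def __init__(self):
        self._to_runner = {}
        self._to_beam = {}

    def register(self, beam_step_name, runner_step_name):
        # Enroll one step under both of its names.
        self._to_runner[beam_step_name] = runner_step_name
        self._to_beam[runner_step_name] = beam_step_name

    def runner_name(self, beam_step_name):
        # Fall back to the Beam name when the runner has no special name.
        return self._to_runner.get(beam_step_name, beam_step_name)

    def beam_name(self, runner_step_name):
        return self._to_beam.get(runner_step_name, runner_step_name)


class ExecutionContext(object):
    """Hypothetical holder for runner-specific facilities."""

    def __init__(self, step_registry=None):
        if step_registry is None:
            step_registry = StepNameRegistry()
        self.step_registry = step_registry


def metric_key(context, beam_step_name, metric_name):
    # Core code passes Beam names; the runner's registry translates them
    # only at the point where a runner-specific name is required.
    return (context.step_registry.runner_name(beam_step_name), metric_name)


registry = StepNameRegistry()
registry.register('Foo/Bar', 's2')
context = ExecutionContext(registry)
print(metric_key(context, 'Foo/Bar', 'element_count'))
```

With something like this, operations and logging would only ever see Foo/Bar, and the translation to s2 would happen inside runner-owned code, which is the part that would not need to cross the Fn API.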