Hi All,

Thanks for the comments so far. Seems like we generally agree on this
proposal.

Please see https://github.com/apache/beam/pull/22802 for a prototype
implementation that adds the following.

* Support for dynamically discovering and registering SchemaTransforms in
the Java expansion service.
* Support for dynamically discovering registered SchemaTransforms from the
Python side.
* Support for using SchemaTransforms in Python pipelines.
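
To make the three pieces above a bit more concrete, here is a rough,
self-contained sketch of the register/discover/use flow in plain Python.
It is only an illustration and not the Beam API or the code in the PR:
ToyExpansionService, SchemaTransformInfo, make_stub and the
"example:schematransform:filter_min" identifier are all hypothetical names,
and a "transform" is simplified to a function over a list of dict rows.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Rows = List[dict]


@dataclass(frozen=True)
class SchemaTransformInfo:
    """Hypothetical description of a registered transform."""
    identifier: str
    config_schema: Dict[str, type]  # config field name -> expected type
    build: Callable[..., Callable[[Rows], Rows]]


class ToyExpansionService:
    """In-memory stand-in for an expansion service (assumption, not Beam)."""

    def __init__(self) -> None:
        self._registry: Dict[str, SchemaTransformInfo] = {}

    def register(self, info: SchemaTransformInfo) -> None:
        self._registry[info.identifier] = info

    def discover(self) -> Dict[str, Dict[str, type]]:
        """Toy analogue of a discovery call: identifier -> config schema."""
        return {i: dict(t.config_schema) for i, t in self._registry.items()}

    def expand(self, identifier: str, **config) -> Callable[[Rows], Rows]:
        """Validate config against the schema, then build the transform."""
        info = self._registry[identifier]
        if set(config) != set(info.config_schema):
            raise ValueError(f"expected fields {sorted(info.config_schema)}")
        for name, expected in info.config_schema.items():
            if not isinstance(config[name], expected):
                raise TypeError(f"{name} must be {expected.__name__}")
        return info.build(**config)


def make_stub(service: ToyExpansionService, identifier: str):
    """Generate a keyword-argument stub for a discovered transform."""
    def stub(**config):
        return service.expand(identifier, **config)
    stub.__name__ = identifier.rsplit(":", 1)[-1]
    return stub


def _filter_builder(field: str, minimum: int) -> Callable[[Rows], Rows]:
    # Keep only rows whose `field` value is at least `minimum`.
    return lambda rows: [r for r in rows if r[field] >= minimum]


service = ToyExpansionService()
service.register(SchemaTransformInfo(
    identifier="example:schematransform:filter_min",
    config_schema={"field": str, "minimum": int},
    build=_filter_builder,
))

# "Other SDK" side: discover what is registered, then build and use a stub.
available = service.discover()
filter_min = make_stub(service, "example:schematransform:filter_min")
result = filter_min(field="n", minimum=3)([{"n": 1}, {"n": 5}])
```

The point of the sketch is that the caller only needs the identifier and the
config schema returned by discovery; no transform-specific wrapper code is
written on the calling side.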

Feel free to add more comments to the doc and/or the PR.

Thanks,
Cham

On Mon, Aug 8, 2022 at 9:34 PM Chamikara Jayalath <chamik...@google.com>
wrote:

> I think the *DiscoverSchemaTransform()* RPC introduced in this proposal
> and the ability to easily deploy/use available *SchemaTransforms* using
> an expansion service essentially provide the tooling necessary for
> implementing such a service. Such a service could even start up expansion
> services to discover/list transforms available in given artifacts (for
> example, jar files).
>
> Thanks,
> Cham
>
> On Mon, Aug 8, 2022 at 3:48 PM Byron Ellis <byronel...@google.com> wrote:
>
>> I like that idea, sort of like Kafka’s Schema Service but for transforms?
>>
>> On Mon, Aug 8, 2022 at 2:45 PM Robert Bradshaw via dev <
>> dev@beam.apache.org> wrote:
>>
>>> This is a great idea. I would like to approach this from the
>>> perspective of making it easy to provide a catalog of well-defined
>>> transforms for use in expansion services from typical SDKs and also
>>> elsewhere (e.g. for documentation purposes, GUIs, etc.). Ideally
>>> everything about what a transform is (its config, documentation,
>>> expectations on inputs, etc.) can be specified programmatically in a
>>> way that's much easier to both author and consume than it is now.
>>>
>>> On Thu, Aug 4, 2022 at 6:51 PM Chamikara Jayalath via dev
>>> <dev@beam.apache.org> wrote:
>>> >
>>> > Hi All,
>>> >
>>> > I believe we can make the multi-language pipelines offering [1] much
>>> > easier to use by updating the expansion service to be fully aware of
>>> > SchemaTransforms. Additionally, this will make it easy to
>>> > register/discover/use transforms defined in one SDK from all other SDKs.
>>> > Specifically, we could add the following features.
>>> >
>>> > * Expansion service can be used to easily initialize and expand
>>> >   transforms without need for additional code.
>>> > * Expansion service can be used to easily discover already registered
>>> >   transforms.
>>> > * Pipeline SDKs can generate user-friendly stub-APIs based on transforms
>>> >   registered with an expansion service, eliminating the need to develop
>>> >   language-specific wrappers.
>>> >
>>> > Please see here for my proposal:
>>> > https://s.apache.org/easy-multi-language
>>> >
>>> > Lemme know if you have any comments/questions/suggestions :)
>>> >
>>> > Thanks,
>>> > Cham
>>> >
>>> > [1]
>>> > https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines
>>> >
>>>
>>
