Hi,
Is it possible to create
class ErrorSieve(PTransform):
def __init__ (dofn):
def expand():
return ParDo(modifiedDoFn)
that way your pipeline just looks like p | ErrorSieve(DoFn()) and you don't
expose the ParDo to the user.
Will this work for your usecase?
-Sourabh
On Fri, Jul 21, 2017 at 2:06 PM Dmitry Demeshchuk <[email protected]>
wrote:
> Hi list,
>
> I'm trying to make a transformation function (let's call it ErrorSieve)
> that would take a ParDo object as input and modify its underlying DoFn
> object, basically adding extra logic on top of an underlying process()
> method.
>
> Ideally for me, the example usage would be:
>
> ```python
> p | ErrorSieve(beam.ParDo(MyDoFn())
>
> or
>
> p | ErrorSieve(beam.FlatMap(lambda x: x + 1))
> ```
>
> However, this would require me to butcher the internals of ParDo
> mechanisms, especially since ParDo's make_fn() method gets called during
> its transformation. My other thinking was to make it a fair and square DoFn:
>
> ```python
> p | beam.ParDo(ErrorSieve(MyDoFn())
> ```
>
> The only problem with this is that I can't use it with transforms like
> FlatMap, which is a bit unfortunate.
>
> Do you think it's worth investigating how to implement the first approach,
> or should I just instead settle with the second approach, using only custom
> DoFns?
>
> Thank you.
>
>
> --
> Best regards,
> Dmitry Demeshchuk.
>