Re: [DISCUSS] Adding a dispose() method in the RichFunction.

Fabian Hueske Thu, 10 Nov 2016 02:48:04 -0800

RichFunctions are used in the DataStream and DataSet APIs.
How would that change affect the DataSet API?


Best, Fabian


2016-11-10 11:37 GMT+01:00 Kostas Kloudas <[email protected]>:

> Hello,
>
> I would like to propose the addition of a dispose() method, in addition to
> the
> already existing close(), in the RichFunction interface. This will align
> the lifecycle
> of a RichFunction, with that of an Operator. After this, the code paths
> followed
> when finishing successfully and when cancelling, will be totally distinct.
>
> Semantically, close() will be responsible for guaranteeing semantic
> correctness
> of the job when the job terminates successfully, while dispose() will be
> responsible
> for taking care of system clean up both when terminating gracefully and
> when
> cancelling, e.g. freeing resources like db connections.
>
> Currently, most functions use close() with the semantics of dispose() as
> the only
> thing they do is freeing up resources. A nice example where this leads to
> confusion
> is the case of the BucketingSink/RollingSink where at close(), data that
> is not
> committed is marked as "pending" (semantic correctness of the job). In
> this case,
> and given that there is no distinction between close() and dispose(), this
> method is
> called by the AbstractUdfStreamOperator both when successfully finishing a
> job and
> when something went wrong during execution. This is essentially a
> compromise,
> as the close() should mark this data as "committed" when successfully
> terminating,
> but it cannot as it can also be called when an exception was thrown.
>
> Let me know what you think,
> Kostas

Re: [DISCUSS] Adding a dispose() method in the RichFunction.

Reply via email to