RichFunctions are used in the DataStream and DataSet APIs. How would that change affect the DataSet API?
Best, Fabian 2016-11-10 11:37 GMT+01:00 Kostas Kloudas <[email protected]>: > Hello, > > I would like to propose the addition of a dispose() method, in addition to > the > already existing close(), in the RichFunction interface. This will align > the lifecycle > of a RichFunction, with that of an Operator. After this, the code paths > followed > when finishing successfully and when cancelling, will be totally distinct. > > Semantically, close() will be responsible for guaranteeing semantic > correctness > of the job when the job terminates successfully, while dispose() will be > responsible > for taking care of system clean up both when terminating gracefully and > when > cancelling, e.g. freeing resources like db connections. > > Currently, most functions use close() with the semantics of dispose() as > the only > thing they do is freeing up resources. A nice example where this leads to > confusion > is the case of the BucketingSink/RollingSink where at close(), data that > is not > committed is marked as "pending" (semantic correctness of the job). In > this case, > and given that there is no distinction between close() and dispose(), this > method is > called by the AbstractUdfStreamOperator both when successfully finishing a > job and > when something went wrong during execution. This is essentially a > compromise, > as the close() should mark this data as "committed" when successfully > terminating, > but it cannot as it can also be called when an exception was thrown. > > Let me know what you think, > Kostas
