UDFs intentionally persist across iterations. This is a feature: it allows you to keep state. To figure out when an iteration starts and ends, you can use a RichFunction, which gets calls to open() and close() for each iteration.
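To make the idea concrete, here is a minimal plain-Java sketch of the pattern. It does not use the Flink API: `LifecycleFunction`, `CandidateCounter`, and `runIterations` are hypothetical stand-ins that only mimic a function object receiving open() and close() once per iteration while its HashMap survives between iterations. In a real job, the hooks would be the open() and close() methods of a RichFunction, and the runtime, not a hand-written loop, would drive them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IterationStateSketch {

    /** Hypothetical stand-in for a rich function: per-iteration hooks plus a map method. */
    interface LifecycleFunction<I, O> {
        void open(int superstep); // called once at the start of every iteration
        O map(I value);           // called once per element
        void close();             // called once at the end of every iteration
    }

    /** Counts how often each candidate was seen; the map is NOT reset between iterations. */
    static class CandidateCounter implements LifecycleFunction<String, Integer> {
        private final Map<String, Integer> seen = new HashMap<>(); // state kept across iterations
        private int currentSuperstep;
        final List<String> log = new ArrayList<>(); // records the iteration boundaries

        @Override
        public void open(int superstep) {
            currentSuperstep = superstep;
            log.add("open " + superstep);
        }

        @Override
        public Integer map(String candidate) {
            // Increment the counter for this candidate and return the new count.
            return seen.merge(candidate, 1, Integer::sum);
        }

        @Override
        public void close() {
            log.add("close " + currentSuperstep + " size=" + seen.size());
        }
    }

    /** Assumed driver loop: feeds the SAME function instance one input list per iteration. */
    static <I, O> List<List<O>> runIterations(LifecycleFunction<I, O> fn,
                                              List<List<I>> inputsPerIteration) {
        List<List<O>> results = new ArrayList<>();
        int superstep = 1;
        for (List<I> inputs : inputsPerIteration) {
            fn.open(superstep);
            List<O> out = new ArrayList<>();
            for (I value : inputs) {
                out.add(fn.map(value));
            }
            fn.close();
            results.add(out);
            superstep++;
        }
        return results;
    }

    public static void main(String[] args) {
        CandidateCounter counter = new CandidateCounter();
        List<List<Integer>> results = runIterations(counter,
                Arrays.asList(Arrays.asList("A", "B"), Arrays.asList("A", "C")));
        System.out.println(results);     // [[1, 1], [2, 1]]
        System.out.println(counter.log); // [open 1, close 1 size=2, open 2, close 2 size=3]
    }
}
```

Candidate "A" is counted as 2 in the second iteration because the function instance, and with it the map, is reused. Note that such state is local to each parallel instance of the function, so elements that belong together need to reach the same instance.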
On Wed, Feb 11, 2015 at 10:40 AM, Kruse, Sebastian <sebastian.kr...@hpi.de> wrote:

> Thanks for your answers.
>
> I am trying to build an apriori-like algorithm to find key candidates in a relational dataset. I was considering delta iterations, because the algorithm should maintain two datasets: a set of column combinations to be checked (as delta set) and a set of tuples which are still relevant to the next iteration (as work set). So, the general procedure is adapted from the popular frequent item set algorithm.
>
> I am now also thinking that delta iterations are not the right thing for me, also because of other problems (only join and coGroup can be used on the solution set, and “Error: Iterative task without a single iterative input.”, whose cause is not obvious to me).
>
> @Alexander: Using an output within a bulk iteration leaves me with the following exception:
>
> Exception in thread "main" org.apache.flink.api.common.InvalidProgramException: A data set that is part of an iteration was used as a sink or action. Did you forget to close the iteration?
>
> Do you have any experience/proposals how to incorporate your idea nevertheless?
>
> @Stefan: Are operators intentionally reused across iterations, i.e., is it an explicit feature or is it likely to change in the future?
>
> Cheers,
> Sebastian
>
> *From:* ewenstep...@gmail.com [mailto:ewenstep...@gmail.com] *On Behalf Of* Stephan Ewen
> *Sent:* Wednesday, 11 February 2015 10:02
> *To:* user@flink.apache.org
> *Subject:* Re: DeltaIterations: shrink solution set
>
> You can also use a bulk iteration and just keep the state yourself. Since the functions live across iterations, it is easily doable to just gather the state in a HashMap yourself. Use map(), or mapPartition(), a manual partition() call - that should do the trick...
>
> On 10.02.2015 at 21:44, "Alexander Alexandrov" <alexander.s.alexand...@gmail.com> wrote:
>
> True.
>
> 2015-02-10 19:14 GMT+01:00 Vasiliki Kalavri <vasilikikala...@gmail.com>:
>
> Hi,
>
> It's hard to tell without details about your algorithm, but what you're describing sounds to me like something you can use the workset for.
>
> -V.
>
> On Feb 10, 2015 6:54 PM, "Alexander Alexandrov" <alexander.s.alexand...@gmail.com> wrote:
>
> I am not sure whether this is supported at the moment. The only workaround I could think of is indeed to use a boolean flag that indicates whether the element has been deleted or not.
>
> An alternative approach is to ditch Flink's native iteration construct and write your intermediate results to Tachyon or HDFS after each iteration using the TypeInfoInput/OutputFormats. You then have full control over how the old and the new solution sets should be merged.
>
> BTW, can you share some details about that particular algorithm? I was thinking about examples of iterative algorithms with this property...
>
> Regards,
> A.
>
> 2015-02-10 14:18 GMT+01:00 Kruse, Sebastian <sebastian.kr...@hpi.de>:
>
> Hi everyone,
>
> From playing around a bit with delta iterations, I saw that you can update elements from the solution set and add new elements. My question is: is it possible to remove elements from the solution set (apart from marking them as “deleted” somehow)?
>
> My use case at hand for this is the following: In each iteration, I generate candidate solutions that I want to verify within the next iteration. If verification fails, I would like to remove them from the solution set, otherwise retain them.
>
> Thanks,
> Sebastian