Re: DeltaIterations: shrink solution set

Stephan Ewen Wed, 11 Feb 2015 01:47:03 -0800

UDFs exist intentionally across iterations, it is a feature, to allow you
to keep state. To Figure out when an iteration starts and ends, you can use
a RichFunctions, which get calls to open() and close() for each iteration.


On Wed, Feb 11, 2015 at 10:40 AM, Kruse, Sebastian <sebastian.kr...@hpi.de>
wrote:

>  Thanks for your answers.
>
>
>
> I am trying to build an apriori-like algorithm to find key candidates in a
> relational dataset. I was considering delta iterations, because the
> algorithm should maintain two datasets: a set of column combinations to be
> checked (as delta set) and a set of tuples which are still relevant to the
> next iteration (as work set). So, the general proceeding is adapted from
> the popular frequent item set algorithm.
>
>
>
> I am now also thinking that delta iterations are not the right thing for
> me, also because of other problems (only join and coGroup to be used on the
> solution set and “Error: Iterative task without a single iterative input.”
> whose cause is not obvious to me).
>
>
>
> @Alexander: Using an output within a bulk iteration leaves me with the
> following exception:
>
> Exception in thread "main"
> org.apache.flink.api.common.InvalidProgramException: A data set that is
> part of an iteration was used as a sink or action. Did you forget to close
> the iteration?
>
> Do you have any experience/proposals how to incorporate your idea
> nevertheless?
>
>
>
> @Stefan: Are operators intentionally reused across iterations, i.e., is it
> an explicit feature or is it likely to change in the future?
>
>
>
> Cheers,
>
> Sebastian
>
>
>
> *From:* ewenstep...@gmail.com [mailto:ewenstep...@gmail.com] *On Behalf
> Of *Stephan Ewen
> *Sent:* Mittwoch, 11. Februar 2015 10:02
> *To:* user@flink.apache.org
> *Subject:* Re: DeltaIterations: shrink solution set
>
>
>
> You can also use a bulk iteration and just keep the state yourself. Since
> the functions love across iterations, it is easily doable to just gather
> the state in a HashMap yourself. Use map(), or mapPartition(), a manual
> partition() call - that should do the trick...
>
> Am 10.02.2015 21:44 schrieb "Alexander Alexandrov" <
> alexander.s.alexand...@gmail.com>:
>
> True.
>
>
>
> 2015-02-10 19:14 GMT+01:00 Vasiliki Kalavri <vasilikikala...@gmail.com>:
>
> Hi,
>
> It's hard to tell without details about your algorithm, but what you're
> describing sounds to me like something you can use the workset for.
>
> -V.
>
> On Feb 10, 2015 6:54 PM, "Alexander Alexandrov" <
> alexander.s.alexand...@gmail.com> wrote:
>
> I am not sure whether this is supported at the moment. The only workaround
> I could think of is indeed to use a boolean flag that indicates whether the
> element has been deleted or not.
>
> An alternative approach is to ditch Flink's native iteration construct and
> write your intermediate results to Tachyon or HDFS after each iteration
> using the TypeInfoInput/OutputFormats. You then have full control how the
> old and the new solutions sets should be merged.
>
>
> BTW can you share some details about that particular algorithm? I was
> thinking about examples iterative algorithms with this property...
>
> Regards,
> A.
>
>
>
> 2015-02-10 14:18 GMT+01:00 Kruse, Sebastian <sebastian.kr...@hpi.de>:
>
> Hi everyone,
>
>
>
> From playing around a bit around with delta iterations, I saw that you can
> update elements from the solution set and add new elements. My question is:
> is it possible to remove elements from the solution set (apart from marking
> them as “deleted” somehow)?
>
>
>
> My use case at hand for this is the following: In each iteration, I
> generate candidate solutions that I want to verify within the next
> iteration. If verification fails, I would like to remove them from the
> solution set, otherwise retain them.
>
>
>
> Thanks,
>
> Sebastian
>
>
>
>
>

Re: DeltaIterations: shrink solution set

Reply via email to