Hi Andre, I'm happy you were able to solve your problem :)
Improvements to the documentation are always welcome! To me ST != WT is straightforward from the javadocs, but I guess it wouldn't hurt to stress it in the docs. Do you think you could simplify your implementation a bit to make for a nice example? It might be a bit too complicated to follow as it is right now. In any case, if you would like to improve the delta iteration docs, please go ahead and open a JIRA. We can discuss the details of what improvements to make over there.
Thanks!
-Vasia.

On 5 November 2015 at 11:41, André Petermann <peterm...@informatik.uni-leipzig.de> wrote:

> Hi Vasia!
>
> Many thanks for your reply! Your hints finally enabled me to implement the problem's first-choice solution using delta iteration:
> https://gist.github.com/p3et/12deb7d6321b48e9efab
>
> Do you think this would be worth contributing as an example within the Flink documentation? The examples I found so far did not show me how to use delta iteration for this kind of loop (ST != WT, start from an empty solution set, ...).
>
> Cheers,
> Andre
>
>
> On 04.11.2015 16:32, Vasiliki Kalavri wrote:
>
>> Hi Andre,
>>
>> On 4 November 2015 at 16:04, André Petermann <peterm...@informatik.uni-leipzig.de> wrote:
>>
>>> Hi Fabian,
>>>
>>> thanks for your fast reply!
>>>
>>> I created a gist to explain the while-not-empty loop in more detail:
>>> https://gist.github.com/p3et/9f6e56cf0b68213e3e2b
>>>
>>> It is an approach to create a minimal example of the kind of algorithm corresponding to page 1 of the PDF, in particular frequent substrings. Please don't mind the algorithm itself; the actual algorithm is much more complex. However, even this simple case runs only for 7 iterations on my machine before "too few memory segments" is raised.
>>>
>>> If I understood delta iteration right, it's necessary to provide a 1:1 mapping between working and solution set items.
>>> The empty-case you referred to is the no-more-updates case, but not the no-more-items case, right?
>>
>> I think it's possible to do what you want with a delta iteration.
>>
>> The solution set and workset *don't* need to be of the same type. When you define the delta iteration, e.g. DeltaIteration<ST, WT> iteration = ..., ST is the type of the solution set and WT is the type of the workset.
>>
>>> In my example, the working set is completely replaced in each iteration (line 52), with only a parent-child mapping. The solution set is initially empty (line 33) and stores the results of all iterations (line 48). I hope this shows the difference to the delta iteration and while-not-empty. Further on, you see the different data types of the working and solution sets.
>>>
>>> I will provide a further gist for the "second choice" solution using delta iteration. However, what we would actually prefer is to replace the for-loop by something like while(embeddings.isNotEmpty()), including a truly iterative execution.
>>
>> The default convergence criterion for a delta iteration is an empty workset. Thus, if you set embeddings as the workset, you have your "while(embeddings.isNotEmpty())" logic. Also, as far as I know, there is no problem with appending new elements to the solution set. So, using allFrequentPatterns as the solution set should be fine.
>>
>>> But please let me know if I missed some Flink features already enabling such loops.
>>>
>>> Thanks,
>>> Andre
>>
>> Does this clear things up a bit? Let me know if I misunderstood what you want to do.
>>
>> Cheers,
>> -Vasia.
>>
>>> On 03.11.2015 15:47, Fabian Hueske wrote:
>>>
>>>> Hi Andre,
>>>>
>>>> Thanks for reaching out to the Flink community!
>>>>
>>>> I am not sure your analysis is based on correct assumptions about Flink's delta iterations.
>>>> Flink's delta iterations do support
>>>> - working and solution sets of different types
>>>> - worksets that grow and shrink or are completely replaced in each iteration
>>>> - solution sets that grow (up to the point of the "too few memory segments" error)
>>>> - termination via a custom aggregator, so termination can happen on a max number of iterations, an empty workset, or a custom criterion.
>>>>
>>>> The "too few memory segments" problem occurs because the solution set is held in an in-memory hash table. Spilling that table would result in a significant iteration slowdown. One solution is to throw more resources at the problem (more nodes, more RAM). Another workaround is to reduce the amount of managed memory and switch the solution set hash table to "unmanaged". This will keep the result set in a regular Java HashMap, but this might cause an OOMError and kill the JVM if it grows too large.
>>>> AFAIK, there is no way to shrink the solution set at the moment. It might be worth investigating in that direction.
>>>>
>>>> Regarding your questions about the while-not-empty loop: what exactly is the difference to the default delta iteration? It does stop when the workset is empty (in addition to the custom termination criterion).
>>>> I am not sure if I got all your points. Please let me know if I got something wrong.
>>>>
>>>> Thanks,
>>>> Fabian
>>>>
>>>> 2015-11-03 13:50 GMT+01:00 André Petermann <peterm...@informatik.uni-leipzig.de>:
>>>>
>>>>> Sorry for the redundant post ...
>>>>>
>>>>> Hi all,
>>>>>
>>>>> We are working on an implementation of Frequent Subgraph Mining using Flink. There, the "too few memory segments" error prevents the most promising solution.
>>>>> The problem is not specific to graphs, but applies to all iterative problems where
>>>>>
>>>>> - the working and solution sets contain data of different types
>>>>> - the working set may grow, shrink, or be replaced in each iteration
>>>>> - the solution set grows in each iteration
>>>>> - the termination criterion is based on data set metrics, e.g. while the working set is not empty
>>>>>
>>>>> An illustration of our problem workflow, generalized to graph-unspecific frequent pattern mining, can be found here:
>>>>>
>>>>> https://github.com/dbs-leipzig/gradoop/blob/master/dev-support/loopWithIntermediateResultMemory.pdf
>>>>>
>>>>> Page 1 shows the most promising solution. We started implementing it using a for loop. However, the "too few memory segments" error makes it untestable. As the iteration body itself is a complex workflow and the number of iterations is arbitrary, unrolling it while reserving operator memory will be a permanent limitation. Increasing limits or physical memory would only delay the problem. The resulting question is:
>>>>>
>>>>> Would it be possible to implement a while-not-empty or at least a for loop that isn't unrolled and can be executed more memory-efficiently?
>>>>>
>>>>> Page 2 shows an alternative solution to our problem using the concept of delta iteration. However, Flink's delta iteration supports neither broadcasting nor working-set-independent intermediate results. Page 3 shows our working solution using two workarounds for those restrictions. However, these workarounds lead to unnecessary memory consumption and redundant expensive computations. So, in case the answer to the first question is no, a second question:
>>>>>
>>>>> Would it be possible to extend the delta iteration with support for rich map functions with broadcast sets and the memory of intermediate results?
>>>>>
>>>>> We think that a while-not-empty loop might be useful for other algorithms too, e.g. variable-length path search in graphs. Did we miss Flink features meeting our requirements? Do you think it's worth creating an improvement issue? We'd of course be willing to contribute in the form of development.
>>>>>
>>>>> Best Regards
>>>>> Andre
>>>>>
>>>>> --
>>>>> -------------------------------------------
>>>>> PhD Student
>>>>> University of Leipzig
>>>>> Department of Computer Science
>>>>> Database Research Group
>>>>>
>>>>> email: peterm...@informatik.uni-leipzig.de
>>>>> web: dbs.uni-leipzig.de
>>>>> -------------------------------------------
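The loop semantics discussed in this thread can be sketched in plain Java without any Flink dependencies. The sketch below models the three points the thread settles on: the solution set and workset have different types (ST != WT), the solution set starts empty and only grows, and the loop terminates when the workset runs empty, which is the default convergence criterion of Flink's delta iteration. The length-3 "frequency" test and the string-growing step are made-up stand-ins for the real mining logic; in the actual DataSet API this corresponds to a `DeltaIteration<ST, WT>` closed via `closeWith(solutionSetDelta, nextWorkset)`, and Fabian's "unmanaged" workaround to `setSolutionSetUnManaged(true)`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeltaIterationSketch {

    // Solution set type (pattern -> support) deliberately differs from the
    // workset element type (candidate strings): ST != WT.
    static Map<String, Integer> mine(List<String> initialWorkset) {
        Map<String, Integer> solutionSet = new HashMap<>(); // starts empty
        List<String> workset = new ArrayList<>(initialWorkset);

        // Default convergence criterion of a delta iteration:
        // terminate when the workset is empty ("while not empty").
        while (!workset.isEmpty()) {
            List<String> nextWorkset = new ArrayList<>(); // workset fully replaced
            for (String candidate : workset) {
                // Hypothetical "frequent" test: keep candidates up to length 3.
                if (candidate.length() <= 3) {
                    // Delta: append the surviving candidate to the solution set.
                    solutionSet.merge(candidate, 1, Integer::sum);
                    // Grow the candidate by one character for the next round.
                    nextWorkset.add(candidate + "a");
                }
            }
            workset = nextWorkset; // may grow, shrink, or run empty
        }
        return solutionSet;
    }

    public static void main(String[] args) {
        System.out.println(mine(List.of("a", "b")));
    }
}
```

Starting from `["a", "b"]`, the workset empties after four rounds once every candidate exceeds length 3, leaving six patterns in the solution set; the for-loop unrolling that causes the "too few memory segments" error never happens here because there is only one loop body.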