Thanks Greg for opening this discussion!

I really don't want to derail the discussion here; just a quick clarification regarding Suneel's last email: folks who work at data Artisans participate in this community as individuals, not as a corporation, and the dev list is not a support forum for "requesting" features from some company, but an open forum for the Flink community. I would hope that we keep the discussion technical (I know that I broke this with this email, but I really felt I had to clarify this).
I think all of us agree that this is a very useful feature, and I'm very happy to see more work on this!

Kostas

On Mon, May 30, 2016 at 2:49 PM, Suneel Marthi <smar...@apache.org> wrote:

> This is a feature that was requested by the Mahout project few months
> before for the very same reasons as mentioned in previous emails on this
> thread, but we were snubbed by the flink folks as this being '*WAY too
> specific*' request for flink to deal with and 'its got to be done the way
> Flink has it', etc...
>
> While delta iterations r real cool, its not real trivial to have them as
> part of language specific DSLs handling more general iterations. Its good
> to see that this limitation has started to bite others and hopefully Data
> Artisans now sees this as a much needed feature.
>
> On Mon, May 30, 2016 at 8:31 AM, Gábor Gévay <gga...@gmail.com> wrote:
>
> > Hello,
> >
> > > Would the best way be to extend the iteration operators to support
> > > intermediate outputs or revisit the idea of caching intermediate results
> > > and thus allow efficient for-loop iterations?
> >
> > Caching intermediate results would also help a lot to projects that
> > are targeting Flink as a backend, like Emma [1] and SystemML [2]. The
> > issue here is that these languages allow writing more general
> > iterations (general control flow (nested loops, ifs in the loop body),
> > multiple "solution sets", doing something else with the intermediate
> > results, etc.), that can't be translated to Flink's iteration
> > constructs. So these systems currently don't have much better options
> > than just writing intermediate results to files, which is not so nice.
> >
> > Best,
> > Gabor
> >
> > [1] http://www.user.tu-berlin.de/asteriosk/assets/publications/emma-sigmod2015.pdf
> > [2] https://systemml.apache.org/
> >
> > 2016-05-28 13:48 GMT+02:00 Vasiliki Kalavri <vasilikikala...@gmail.com>:
> > > Hey,
> > >
> > > it would be great to add this feature indeed! Thanks for bringing it up
> > > Greg :)
> > > Would the best way be to extend the iteration operators to support
> > > intermediate outputs or revisit the idea of caching intermediate results
> > > and thus allow efficient for-loop iterations?
> > >
> > > -Vasia.
> > >
> > > On 26 May 2016 at 22:41, Greg Hogan <c...@greghogan.com> wrote:
> > >
> > >> Hi y'all,
> > >>
> > >> I think this is an oft-requested feature [0] and there are many graph
> > >> algorithms for which intermediate output is the desired result. I'd like to
> > >> take Stephan up on his offer [1] for pointers.
> > >>
> > >> I have yet to get in deep, but I see that iteration tasks are treated
> > >> specially as IterationIntermediateTask for synchronization between
> > >> supersteps. Also, when OperatorTranslation and GraphCreatingVisitor are
> > >> walking the program DAG an iteration must be first reached through the
> > >> tail.
> > >>
> > >> Greg
> > >>
> > >> [0] http://stackoverflow.com/questions/37224140/possibility-of-saving-partial-outputs-from-bulk-iteration-in-flink-dataset
> > >> [1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Intermediate-output-during-delta-iterations-td436.html