Do you think it is possible to push this forward? I need to implement this interactive feature for Datasets. Would it be feasible to add a persist() method to Flink (similar to Spark's)? If you want, I can work on it with some guidance.
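To make the idea concrete, here is a minimal sketch in plain Java. It does not use any Flink or Tachyon API: the class PersistSketch and the helpers persist() and reread() are hypothetical names chosen for illustration, and a local temporary directory stands in for the configured storage directory. It only shows the write-under-a-generated-name, then re-read pattern, not how it would be wired into the Dataset API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

public class PersistSketch {

    // Hypothetical helper: materialize the rows under a generated name
    // inside cacheDir and return the path as a handle to the stored data.
    static Path persist(Path cacheDir, List<String> rows) throws IOException {
        Files.createDirectories(cacheDir);
        Path target = cacheDir.resolve("dataset-" + UUID.randomUUID() + ".csv");
        Files.write(target, rows, StandardCharsets.UTF_8);
        return target;
    }

    // Hypothetical helper: re-read a previously persisted dataset by its handle.
    static List<String> reread(Path persisted) throws IOException {
        return Files.readAllLines(persisted, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temp directory plays the role of the storage directory.
        Path cacheDir = Files.createTempDirectory("persist-sketch");
        List<String> rows = List.of("1,apple", "2,banana", "3,avocado");

        // First interactive step: filter, then persist the intermediate result.
        List<String> firstFilter = rows.stream()
                .filter(r -> r.contains(",a"))
                .collect(Collectors.toList());
        Path handle = persist(cacheDir, firstFilter);

        // Second interactive step: start from the persisted result
        // instead of re-reading and re-filtering the original input.
        List<String> secondFilter = reread(handle).stream()
                .filter(r -> r.endsWith("o"))
                .collect(Collectors.toList());
        System.out.println(secondFilter);
    }
}
```

The point of the handle is that the second step never touches the original input; whether the handle should be a path, a generated name, or something managed by the runtime is an open design question.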

On Wed, Dec 2, 2015 at 3:05 PM, Maximilian Michels <m...@apache.org> wrote:
> Hi Flavio,
>
> I was working on this some time ago but it didn't make it in yet and
> priorities shifted a bit. The pull request is here:
> https://github.com/apache/flink/pull/640
>
> The basic idea is to remove Flink's ResultPartition buffers in memory
> lazily, i.e. keep them as long as enough memory is available. When a
> new job is resumed, it picks up the old results again. The pull
> request needs some overhaul now and the API integration is not there
> yet.
>
> Cheers,
> Max
>
> On Mon, Nov 30, 2015 at 5:35 PM, Flavio Pompermaier
> <pomperma...@okkam.it> wrote:
> > I think that with some support I could try to implement it...actually I just
> > need to add a persist(StorageLevel.OFF_HEAP) method to the Dataset APIs
> > (similar to what Spark does..) and output it to a tachyon directory
> > configured in the flink-conf.yml and then re-read that dataset using its
> > generated name on tachyon. Do you have other suggestions?
> >
> > On Mon, Nov 30, 2015 at 4:58 PM, Fabian Hueske <fhue...@gmail.com> wrote:
> >>
> >> The basic building blocks are there but I am not aware of any efforts to
> >> implement caching and add it to the API.
> >>
> >> 2015-11-30 16:55 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:
> >>>
> >>> Is there any effort in this direction? maybe I could achieve something
> >>> like that using Tachyon in some way...?
> >>>
> >>> On Mon, Nov 30, 2015 at 4:52 PM, Fabian Hueske <fhue...@gmail.com> wrote:
> >>>>
> >>>> Hi Flavio,
> >>>>
> >>>> Flink does not support caching of data sets in memory yet.
> >>>>
> >>>> Best, Fabian
> >>>>
> >>>> 2015-11-30 16:45 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:
> >>>>>
> >>>>> Hi to all,
> >>>>> I was wondering if Flink could fit a use case where a user load a
> >>>>> dataset in memory and then he/she wants to explore it interactively. Let's
> >>>>> say I want to load a csv, then filter out the rows where the column value
> >>>>> match some criteria, then apply another criteria after seeing the results of
> >>>>> the first filter.
> >>>>> Is there a way to keep the dataset in memory and modify it
> >>>>> interactively without re-reading all the dataset every time I want to chain
> >>>>> another operation to my dataset?
> >>>>>
> >>>>> Best,
> >>>>> Flavio