Any progress in this direction?how mich effort do you think it's required in order to implement this feature?
On 2 Dec 2015 16:29, "Flavio Pompermaier" <pomperma...@okkam.it> wrote: > Do you think it is possible to push ahead this thing? I need to implement > this interactive feature of Datasets. Do you think it is possible to > implement the persist() method in Flink (similar to Spark)? If you want I > can work on it with some instructions.. > > On Wed, Dec 2, 2015 at 3:05 PM, Maximilian Michels <m...@apache.org> wrote: > >> Hi Flavio, >> >> I was working on this some time ago but it didn't make it in yet and >> priorities shifted a bit. The pull request is here: >> https://github.com/apache/flink/pull/640 >> >> The basic idea is to remove Flink's ResultPartition buffers in memory >> lazily, i.e. keep them as long as enough memory is available. When a >> new job is resumed, it picks up the old results again. The pull >> request needs some overhaul now and the API integration is not there >> yet. >> >> Cheers, >> Max >> >> On Mon, Nov 30, 2015 at 5:35 PM, Flavio Pompermaier >> <pomperma...@okkam.it> wrote: >> > I think that with some support I could try to implement it...actually I >> just >> > need to add a persist(StorageLevel.OFF_HEAP) method to the Dataset APIs >> > (similar to what Spark does..) and output it to a tachyon directory >> > configured in the flink-conf.yml and then re-read that dataset using its >> > generated name on tachyon. Do you have other suggestions? >> > >> > >> > On Mon, Nov 30, 2015 at 4:58 PM, Fabian Hueske <fhue...@gmail.com> >> wrote: >> >> >> >> The basic building blocks are there but I am not aware of any efforts >> to >> >> implement caching and add it to the API. >> >> >> >> 2015-11-30 16:55 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>: >> >>> >> >>> Is there any effort in this direction? maybe I could achieve something >> >>> like that using Tachyon in some way...? >> >>> >> >>> On Mon, Nov 30, 2015 at 4:52 PM, Fabian Hueske <fhue...@gmail.com> >> wrote: >> >>>> >> >>>> Hi Flavio, >> >>>> >> >>>> Flink does not support caching of data sets in memory yet. >> >>>> >> >>>> Best, Fabian >> >>>> >> >>>> 2015-11-30 16:45 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it >> >: >> >>>>> >> >>>>> Hi to all, >> >>>>> I was wondering if Flink could fit a use case where a user load a >> >>>>> dataset in memory and then he/she wants to explore it >> interactively. Let's >> >>>>> say I want to load a csv, then filter out the rows where the column >> value >> >>>>> match some criteria, then apply another criteria after seeing the >> results of >> >>>>> the first filter. >> >>>>> Is there a way to keep the dataset in memory and modify it >> >>>>> interactively without re-reading all the dataset every time I want >> to chain >> >>>>> another operation to my dataset? >> >>>>> >> >>>>> Best, >> >>>>> Flavio >> >>>> >> >>>> >> >>> >> >>> >> >> >> > >> > >> > >