There is still quite a bit needed to do this properly:

  (1) incremental recovery
  (2) network stack caching

(1) will probably happen quite soon; I am not aware of any committer having
concrete plans for (2).

Best,
Stephan

On Sat, Oct 8, 2016 at 4:41 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

> Any progress in this direction? How much effort do you think is required
> in order to implement this feature?
>
> On 2 Dec 2015 16:29, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:
>
>> Do you think it is possible to push this ahead? I need to implement
>> this interactive feature of Datasets. Do you think it is possible to
>> implement the persist() method in Flink (similar to Spark)? If you want, I
>> can work on it with some instructions.
>>
>> On Wed, Dec 2, 2015 at 3:05 PM, Maximilian Michels <m...@apache.org> wrote:
>>
>>> Hi Flavio,
>>>
>>> I was working on this some time ago but it didn't make it in yet and
>>> priorities shifted a bit. The pull request is here:
>>> https://github.com/apache/flink/pull/640
>>>
>>> The basic idea is to remove Flink's ResultPartition buffers in memory
>>> lazily, i.e. keep them as long as enough memory is available. When a
>>> new job is resumed, it picks up the old results again. The pull
>>> request needs some overhaul now and the API integration is not there
>>> yet.
>>>
>>> Cheers,
>>> Max
>>>
>>> On Mon, Nov 30, 2015 at 5:35 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>> > I think that with some support I could try to implement it... actually I just
>>> > need to add a persist(StorageLevel.OFF_HEAP) method to the Dataset APIs
>>> > (similar to what Spark does) and output it to a Tachyon directory
>>> > configured in the flink-conf.yml and then re-read that dataset using its
>>> > generated name on Tachyon. Do you have other suggestions?
>>> >
>>> > On Mon, Nov 30, 2015 at 4:58 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>> >>
>>> >> The basic building blocks are there but I am not aware of any efforts to
>>> >> implement caching and add it to the API.
>>> >>
>>> >> 2015-11-30 16:55 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>> >>>
>>> >>> Is there any effort in this direction? Maybe I could achieve something
>>> >>> like that using Tachyon in some way...?
>>> >>>
>>> >>> On Mon, Nov 30, 2015 at 4:52 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>> >>>>
>>> >>>> Hi Flavio,
>>> >>>>
>>> >>>> Flink does not support caching of data sets in memory yet.
>>> >>>>
>>> >>>> Best, Fabian
>>> >>>>
>>> >>>> 2015-11-30 16:45 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>> >>>>>
>>> >>>>> Hi to all,
>>> >>>>> I was wondering if Flink could fit a use case where a user loads a
>>> >>>>> dataset in memory and then wants to explore it interactively. Let's
>>> >>>>> say I want to load a CSV, then filter out the rows where the column value
>>> >>>>> matches some criteria, then apply another criterion after seeing the results of
>>> >>>>> the first filter.
>>> >>>>> Is there a way to keep the dataset in memory and modify it
>>> >>>>> interactively without re-reading the whole dataset every time I want to chain
>>> >>>>> another operation to my dataset?
>>> >>>>>
>>> >>>>> Best,
>>> >>>>> Flavio