I think that, with some support, I could try to implement it. Actually, I would just need to add a persist(StorageLevel.OFF_HEAP) method to the DataSet API (similar to what Spark does), write the dataset to a Tachyon directory configured in flink-conf.yaml, and then re-read the dataset from Tachyon using its generated name. Do you have other suggestions?
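
To make the idea a bit more concrete, here is a rough sketch of the persist-and-re-read pattern I have in mind. It uses plain Java file I/O on a local directory as a stand-in for Flink and Tachyon, so nothing here is existing Flink API: the class PersistSketch, the methods persist and reread, the "dataset-" naming scheme, and the cache directory are all hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

// Hypothetical sketch: a local directory stands in for the Tachyon directory,
// and a List<String> of CSV rows stands in for a Flink DataSet.
class PersistSketch {

    // Write the rows under a generated name inside cacheDir and return that
    // name, so later steps can re-read the data instead of recomputing it.
    public static String persist(List<String> rows, Path cacheDir) {
        String name = "dataset-" + UUID.randomUUID();
        try {
            Files.createDirectories(cacheDir);
            Files.write(cacheDir.resolve(name), rows, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return name;
    }

    // Re-read a previously persisted dataset by its generated name.
    public static List<String> reread(Path cacheDir, String name) {
        try {
            return Files.readAllLines(cacheDir.resolve(name), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed cache location; in the real proposal this would come from
        // the Flink configuration and point at Tachyon.
        Path cacheDir = Paths.get(System.getProperty("java.io.tmpdir"), "persist-sketch");

        // Made-up sample rows in "name,age" form.
        List<String> rows = Arrays.asList("alice,30", "bob,17", "carol,45");
        String name = persist(rows, cacheDir);

        // A follow-up filter starts from the persisted copy, not the original source.
        List<String> adults = reread(cacheDir, name).stream()
                .filter(r -> Integer.parseInt(r.split(",")[1]) >= 18)
                .collect(Collectors.toList());
        System.out.println(adults);
    }
}
```

Each interactive step would then start from reread(...) with the generated name rather than from the original CSV, which is the part I would expect a real persist() on the DataSet API to hide from the user.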

On Mon, Nov 30, 2015 at 4:58 PM, Fabian Hueske <fhue...@gmail.com> wrote:

> The basic building blocks are there but I am not aware of any efforts to
> implement caching and add it to the API.
>
> 2015-11-30 16:55 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:
>
>> Is there any effort in this direction? maybe I could achieve something
>> like that using Tachyon in some way...?
>>
>> On Mon, Nov 30, 2015 at 4:52 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>
>>> Hi Flavio,
>>>
>>> Flink does not support caching of data sets in memory yet.
>>>
>>> Best, Fabian
>>>
>>> 2015-11-30 16:45 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>
>>>> Hi to all,
>>>> I was wondering if Flink could fit a use case where a user load a
>>>> dataset in memory and then he/she wants to explore it interactively. Let's
>>>> say I want to load a csv, then filter out the rows where the column value
>>>> match some criteria, then apply another criteria after seeing the results
>>>> of the first filter.
>>>> Is there a way to keep the dataset in memory and modify it
>>>> interactively without re-reading all the dataset every time I want to chain
>>>> another operation to my dataset?
>>>>
>>>> Best,
>>>> Flavio