Hi Frank,

This feature is currently under discussion. You can follow it in this issue:
https://issues.apache.org/jira/browse/FLINK-11199
Best,
Andrey

On Thu, Feb 21, 2019 at 7:41 PM Frank Grimes <frankgrime...@yahoo.com> wrote:

> Hi,
>
> I'm trying to port an existing Spark job to Flink and have gotten stuck on
> the same issue brought up here:
>
> https://stackoverflow.com/questions/46243181/cache-and-persist-datasets
>
> Is there some way to accomplish the same thing in Flink, i.e. avoid
> re-computing a particular DataSet when multiple different subsequent
> transformations are required on it?
>
> I've even tried explicitly writing out the DataSet to avoid the
> re-computation, but I'm still taking an I/O hit for the initial write to
> HDFS and the subsequent re-reading of it in the following stages. While
> this does yield a performance improvement over no caching at all, it
> doesn't match the performance I get with RDD.persist in Spark.
>
> Thanks,
>
> Frank Grimes
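For readers following the thread: the workaround Frank describes (materialize the expensive intermediate result once, then have each downstream transformation re-read it instead of recomputing) can be sketched generically. This is plain Python for illustration only, not Flink DataSet API code; the file-based cache stands in for the explicit write to HDFS he mentions:

```python
import os
import pickle
import tempfile

def expensive_transform(data):
    # Stand-in for the costly upstream computation that would
    # otherwise be re-executed once per downstream consumer.
    return [x * 2 for x in data]

data = [1, 2, 3]

# Materialize the intermediate result to disk once...
cache_path = os.path.join(tempfile.mkdtemp(), "cache.pkl")
with open(cache_path, "wb") as f:
    pickle.dump(expensive_transform(data), f)

# ...then each downstream branch re-reads the cached result
# instead of triggering the expensive transform again.
with open(cache_path, "rb") as f:
    cached = pickle.load(f)

sum_result = sum(cached)   # first downstream transformation
max_result = max(cached)   # second downstream transformation
```

As Frank notes, this pattern still pays the serialization and I/O cost on both sides, which is why it falls short of Spark's in-memory RDD.persist and why FLINK-11199 (interactive programming / cached intermediate results) is being discussed.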