Hi,
I'm trying to port an existing Spark job to Flink and have gotten stuck on the
same issue brought up here:
https://stackoverflow.com/questions/46243181/cache-and-persist-datasets
Is there some way to accomplish the same thing in Flink? That is, to avoid
re-computing a particular DataSet when multiple different downstream
transformations need it.
I've even tried explicitly writing out the DataSet to avoid the re-computation,
but I'm still taking an I/O hit for the initial write to HDFS and the
subsequent re-reads in the following stages. While this does yield a
performance improvement over no caching at all, it doesn't match the
performance I get with RDD.persist in Spark.
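
Roughly, my current workaround looks like the sketch below (the paths and the
transformations are simplified placeholders, not my actual job): I materialize
the intermediate DataSet to HDFS, force an execute(), and then each downstream
branch re-reads the materialized copy instead of re-running the expensive
pipeline.

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.core.fs.FileSystem;

public class MaterializeWorkaround {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for the expensive pipeline I don't want to recompute.
        DataSet<String> expensive = env.readTextFile("hdfs:///input")
                .filter(line -> !line.isEmpty());

        // Materialize the intermediate result to HDFS and force execution.
        expensive.writeAsText("hdfs:///tmp/intermediate", FileSystem.WriteMode.OVERWRITE);
        env.execute("materialize intermediate result");

        // Each downstream branch re-reads the materialized copy instead of
        // re-running the expensive pipeline, but pays the HDFS write/read cost.
        DataSet<String> cached = env.readTextFile("hdfs:///tmp/intermediate");
        long countA = cached.filter(line -> line.startsWith("A")).count();
        long countB = cached.filter(line -> line.startsWith("B")).count();
        System.out.println(countA + " / " + countB);
    }
}
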
Thanks,
Frank Grimes