Hi Frank,

This feature is currently under discussion. You can follow it in this issue:
https://issues.apache.org/jira/browse/FLINK-11199
Best,
Andrey

On Thu, Feb 21, 2019 at 7:41 PM Frank Grimes <frankgrime...@yahoo.com> wrote:

> Hi,
>
> I'm trying to port an existing Spark job to Flink and have gotten stuck on
> the same issue brought up here:
>
> https://stackoverflow.com/questions/46243181/cache-and-persist-datasets
>
> Is there some way to accomplish the same thing in Flink, i.e. avoid
> re-computing a particular DataSet when multiple different subsequent
> transformations are required on it?
>
> I've even tried explicitly writing out the DataSet to avoid the
> re-computation, but I'm still taking an I/O hit for the initial write to
> HDFS and the subsequent re-reading of it in the following stages. While
> this does yield a performance improvement over no caching at all, it
> doesn't match the performance I get with RDD.persist in Spark.
>
> Thanks,
>
> Frank Grimes
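For readers following the thread: the workaround Frank describes (materialize the expensive intermediate result once, then have each downstream transformation re-read it instead of recomputing) can be sketched generically. This is plain Python for illustration only, not Flink DataSet API code; the file-based cache stands in for the explicit write to HDFS he mentions:

```python
import os
import pickle
import tempfile

def expensive_transform(data):
    # Stand-in for the costly upstream computation that would
    # otherwise be re-executed once per downstream consumer.
    return [x * 2 for x in data]

data = [1, 2, 3]

# Materialize the intermediate result to disk once...
cache_path = os.path.join(tempfile.mkdtemp(), "cache.pkl")
with open(cache_path, "wb") as f:
    pickle.dump(expensive_transform(data), f)

# ...then each downstream branch re-reads the cached result
# instead of triggering the expensive transform again.
with open(cache_path, "rb") as f:
    cached = pickle.load(f)

sum_result = sum(cached)   # first downstream transformation
max_result = max(cached)   # second downstream transformation
```

As Frank notes, this pattern still pays the serialization and I/O cost on both sides, which is why it falls short of Spark's in-memory RDD.persist and why FLINK-11199 (interactive programming / cached intermediate results) is being discussed.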