Hi,

There is a good reason why the decision about caching is left to the user: Spark does not know how a DataFrame or RDD will be used in the future.
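For example (just a minimal sketch, assuming a local SparkSession and a made-up input path "input.txt"): only you, the author of the program, know that the same DataFrame feeds two separate actions, so only you can decide that caching it is worthwhile.

    import org.apache.spark.sql.SparkSession

    object CacheExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("cache-example")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Build a DataFrame of words from a hypothetical input file.
        val df = spark.read.textFile("input.txt")
          .flatMap(_.split("\\s+"))
          .toDF("word")

        // The user knows df is reused by the two actions below, so caching pays off.
        // Spark, when it runs the first action, cannot know a second one follows.
        df.cache()

        println(df.count())                  // first action: materializes and caches df
        df.groupBy("word").count().show()    // second action: reuses the cached data

        spark.stop()
      }
    }

Without the cache() the second action would recompute df from the input, because at the time of the first action Spark had no way of knowing another job was coming.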
Think about how your program runs: the driver program is still executing, so there is an exact point where execution currently stands. When Spark reaches an action it evaluates that Spark job, but it knows nothing about the jobs that come later. Cached data is only useful for a future job that reuses it. The user, on the other hand, has this information, since he writes all the jobs.

Attila