I happened to see this question on Stack Overflow:
http://stackoverflow.com/questions/36195105/what-happens-if-i-cache-the-same-rdd-twice-in-spark/36195812#36195812


I think it's very interesting, and the answer posted by Aaron sounds
promising, but I'm not sure, and I couldn't find details of the caching
mechanism in the Spark documentation, so I'm posting here to ask everyone
about the internal principle behind how cache is implemented.
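
Concretely, the scenario is just this (a minimal spark-shell sketch in
Scala; 'nums' is only an illustrative name):

    // assumes the spark-shell's built-in SparkContext `sc`
    val nums = sc.parallelize(1 to 1000)
    nums.cache()   // first call: marks the RDD for MEMORY_ONLY storage
    nums.cache()   // second call on the same RDD: what does this do?
    nums.count()   // an action, which is what actually materializes the cache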

Many thanks.


----- Aaron's answer to that question [is it right?] -----

Nothing happens; it will just cache the RDD once. The reason, I think,
is that every RDD has an id internally, and Spark uses that id to mark
whether an RDD has been cached or not, so caching one RDD multiple times
will do nothing.
-----------
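
To make Aaron's point concrete, here is a small spark-shell sketch of how
I understand the behavior; the comments are my own reading, not confirmed
internals:

    import org.apache.spark.storage.StorageLevel

    val rdd = sc.parallelize(1 to 100)
    println(rdd.id)               // every RDD is assigned a unique id at creation
    rdd.cache()                   // shorthand for persist(StorageLevel.MEMORY_ONLY)
    rdd.cache()                   // same level requested again: accepted, nothing changes
    println(rdd.getStorageLevel)  // should still report MEMORY_ONLY
    println(sc.getPersistentRDDs.keySet)  // the id should appear once in the registry

    // Requesting a *different* level on an already-persisted RDD is rejected:
    // rdd.persist(StorageLevel.DISK_ONLY)
    // => UnsupportedOperationException: cannot change the storage level of an
    //    RDD after it has already been assigned one

If that reading is right, calling cache() repeatedly is simply idempotent,
which would match Aaron's id-based explanation.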



-- 
*--------------------------------------*
a spark lover, a quant, a developer and a good man.

http://github.com/litaotao
