I happened to see this question on Stack Overflow: http://stackoverflow.com/questions/36195105/what-happens-if-i-cache-the-same-rdd-twice-in-spark/36195812#36195812
I think it's very interesting, and the answer posted by Aaron sounds promising, but I'm not sure it's right, and I couldn't find any details on how caching is implemented internally in Spark. So I'm posting here to ask everyone about the internal mechanism behind cache (a small experiment sketch follows at the end of this mail). Great thanks.

----- Aaron's answer to that question [is it right?] -----
Nothing happens; it will just cache the RDD once. The reason, I think, is that every RDD has an id internally, and Spark uses that id to mark whether an RDD has been cached or not, so caching one RDD multiple times does nothing.
-----------

--
a Spark lover, a quant, a developer, and a good man.
http://github.com/litaotao
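P.S. Here is a minimal sketch of the experiment I have in mind, for anyone who wants to reproduce it. The app name and local[*] master are just my test setup, and the behavior comments reflect my own reading of RDD.persist and SparkContext.getPersistentRDDs in the Spark source, so please correct me if I got it wrong:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheTwiceExperiment {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("cache-twice").setMaster("local[*]"))

    val rdd = sc.parallelize(1 to 100)

    // cache() is just persist(StorageLevel.MEMORY_ONLY)
    rdd.cache()
    // second call: same level as before, so persist() appears to be a no-op
    rdd.cache()
    println(rdd.getStorageLevel) // shows the level assigned by cache() (MEMORY_ONLY)

    // SparkContext seems to track persisted RDDs in a map keyed by rdd.id,
    // so registering the same RDD twice should be idempotent:
    println(sc.getPersistentRDDs.keys) // expect a single entry, rdd.id

    // asking for a *different* level on an already-persisted RDD is rejected:
    // rdd.persist(StorageLevel.DISK_ONLY)
    //   => UnsupportedOperationException: Cannot change storage level of
    //      an RDD after it was already assigned a level

    sc.stop()
  }
}

If that reading is right, the second cache() neither re-caches the data nor raises an error; only an attempt to change the storage level of an already-persisted RDD fails.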