Rerun person.count and you will see the effect of the cache. person.cache does not cache the RDD immediately; it only marks it to be cached, and the RDD is actually materialized in memory after the first action (person.count here). Once that first action completes, the cached partitions should also show up under the Storage tab in the web UI, and subsequent actions should run noticeably faster.
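For illustration, here is a minimal spark-shell sketch of the same snippet showing that cache() is lazy. The HDFS path is the one from your mail, and the timed helper is only an assumption added for demonstration, not part of the original code:

case class Person(id: Int, col1: String)

// build the RDD lazily; nothing is read from HDFS yet
val person = sc.textFile("hdfs://namenode_host:8020/user/person.txt")
  .map(_.split(","))
  .map(p => Person(p(0).trim.toInt, p(1)))

// cache() only marks the RDD for caching; no partitions are stored yet,
// so the Storage tab in the web UI stays empty at this point
person.cache()

// tiny timing helper (an assumption for demonstration only)
def timed[T](body: => T): T = {
  val start = System.nanoTime
  val result = body
  println(s"took ${(System.nanoTime - start) / 1e6} ms")
  result
}

timed { person.count() }  // first action: reads from HDFS and fills the cache
timed { person.count() }  // second action: reads the cached partitions and should be faster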
----- Original Message -----
From: fightf...@163.com
To: "user" <user@spark.apache.org>
Sent: Wednesday, April 1, 2015, 1:21:25 PM
Subject: rdd.cache() not working ?

Hi, all

Running the following code snippet through spark-shell, I cannot see any cached storage partitions in the web UI. Does this mean that the cache is not working? Because if we issue person.count again, we do not see any improvement in the time it takes. Hope anyone can explain this a little.

Best,
Sun.

case class Person(id: Int, col1: String)
val person = sc.textFile("hdfs://namenode_host:8020/user/person.txt").map(_.split(",")).map(p => Person(p(0).trim.toInt, p(1)))
person.cache
person.count

fightf...@163.com

--
---------------------------------------------------------------------------
Thanks & Best regards

李涛涛 Taotao · Li | Fixed Income@Datayes | Software Engineer

Address: Wanxiang Tower 8F, Lujiazui West Rd. No. 99, Pudong New District, Shanghai, 200120
Phone: 021-60216502
Mobile: +86-18202171279