I configured HDFS to cache the input files in HDFS's centralized cache, as follows:

hdfs cacheadmin -addPool hibench
hdfs cacheadmin -addDirective -path /HiBench/Kmeans/Input -pool hibench

But I didn't see much performance impact, no matter how I configured dfs.datanode.max.locked.memory.

Is it possible that Spark doesn't know the data is in the HDFS cache, and still reads it from disk instead of from the cache?

Thanks!
Jia
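P.S. For reference, here is a minimal sketch (using the same pool and path as above) of how one might check whether the blocks were actually pinned in the cache before blaming Spark; the -stats output should show bytes cached versus bytes needed:

# Show the directive for the input path and how much of it is actually cached
hdfs cacheadmin -listDirectives -stats -path /HiBench/Kmeans/Input -pool hibench

# Show pool-level limits and usage
hdfs cacheadmin -listPools -stats hibench

If the cached byte count stays at 0, the DataNode's locked-memory limit is a likely culprit: dfs.datanode.max.locked.memory in hdfs-site.xml must be large enough for the data, and it also cannot exceed the DataNode user's memlock ulimit (ulimit -l), so the setting may silently have no effect.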