Re: Configuring distributed caching with Spark and YARN

2014-04-01 Thread santhoma
I think that with addJar() there is no 'caching', in the sense that the files are copied again for every job. With the Hadoop distributed cache, by contrast, files are copied only once, and a symlink to the cached file is created for subsequent runs: https://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/fi
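
To make the distinction above concrete, here is a toy model in plain Python of the two behaviours being contrasted: copying a file into every job's directory versus copying it once into a node-local cache and symlinking to it afterwards. This is only a sketch and uses neither Spark nor Hadoop. The function names, the directory layout and the choice to key the cache by a content hash are all assumptions made for illustration, not how either system is implemented.

```python
import hashlib
import os
import shutil
import tempfile


def localize_copy_each_time(src, job_dir):
    """Toy model of per-job shipping: always copy the file into the job directory."""
    os.makedirs(job_dir, exist_ok=True)
    dest = os.path.join(job_dir, os.path.basename(src))
    shutil.copy2(src, dest)
    return dest, True  # a copy happens on every call


def localize_with_cache(src, cache_dir, job_dir):
    """Toy model of a node-local cache: copy once, then symlink from each job directory."""
    os.makedirs(cache_dir, exist_ok=True)
    os.makedirs(job_dir, exist_ok=True)
    # Assumption for this sketch: cache entries are keyed by a hash of the file content.
    with open(src, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    cached = os.path.join(cache_dir, digest)
    copied = not os.path.exists(cached)
    if copied:
        shutil.copy2(src, cached)
    link = os.path.join(job_dir, os.path.basename(src))
    if os.path.lexists(link):
        os.remove(link)
    os.symlink(cached, link)  # the job sees the file under its usual name
    return link, copied


def demo():
    """Run three 'jobs' under each strategy and report which runs copied the file."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "lookup.txt")
        with open(src, "w") as f:
            f.write("key,value\n")
        cache_dir = os.path.join(root, "cache")
        cached_runs, plain_runs = [], []
        for run in (1, 2, 3):
            _, copied = localize_with_cache(
                src, cache_dir, os.path.join(root, "cached_job%d" % run))
            cached_runs.append(copied)
            _, copied = localize_copy_each_time(
                src, os.path.join(root, "plain_job%d" % run))
            plain_runs.append(copied)
        return cached_runs, plain_runs
```

Calling demo() returns one list per strategy, recording for each run whether a real copy took place. In the cached variant only the first run copies, and later runs just create a symlink. The sketch assumes a platform where os.symlink is permitted.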

Re: Configuring distributed caching with Spark and YARN

2014-03-27 Thread Mayur Rustagi
Is this equivalent to addJar()?

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi

On Thu, Mar 27, 2014 at 3:58 AM, santhoma wrote:
> Curious to know, were you able to do distributed caching for spark?
>
> I have done that for

Re: Configuring distributed caching with Spark and YARN

2014-03-27 Thread santhoma
Curious to know: were you able to set up distributed caching for Spark? I have done that for Hadoop and Pig, but could not find a way to do it in Spark.