On Mon, Sep 8, 2014 at 11:52 AM, Dimension Data, LLC. < subscripti...@didata.us> wrote:
> So just to clarify for me: When specifying 'spark.yarn.jar' as I did
> above, even if I don't use HDFS to create an RDD (e.g. do something
> simple like: 'sc.parallelize(range(100))'), is it still necessary to
> configure the HDFS location in each NM's '/etc/hadoop/conf/*', just so
> that they can access the Spark jar in the YARN case?

That's correct. In fact, I'm not aware of YARN working at all without the HDFS configuration being in place (even if the default fs is not HDFS), but then I'm not a YARN deployment expert.

-- 
Marcelo
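
[Editor's note: a minimal sketch of the setup under discussion. The file paths and the jar location below are illustrative assumptions, not values taken from this thread.]

```shell
# Hedged sketch: typical Spark-on-YARN client settings implied above.
# All paths are assumptions for illustration only.

# conf/spark-defaults.conf -- point spark.yarn.jar at a jar already on
# HDFS so NodeManagers fetch it from there instead of having the client
# upload it on every submit:
#
#   spark.yarn.jar  hdfs:///user/spark/share/lib/spark-assembly.jar

# Every node (and the submitting client) still needs the Hadoop/HDFS
# client configuration so YARN containers can resolve the hdfs:// URI:
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Even a job that never reads HDFS itself, e.g. in PySpark:
#
#   sc.parallelize(range(100)).count()
#
# still goes through YARN container localization, which needs the HDFS
# configuration to download the Spark jar referenced by spark.yarn.jar.
```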