There is a kind of check in yarn-site.xml:
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/var/yarn/logs</value>
</property>

Using hdfs://xxxx:9000 as fs.defaultFS in core-site.xml, you have to run
hdfs dfs -mkdir /var/yarn/logs. Using S3:// as fs.defaultFS...

Take care of the .dir properties in hdfs-site.xml; they must point to a local
or S3 value. Curious to see YARN working without DFS. (A consolidated config
sketch follows the quoted thread below.)

@JBΔ <http://jbigdata.fr/jbigdata/hadoop.html>

On Mon, May 20, 2019 at 09:54, Hariharan <hariharan...@gmail.com> wrote:

> Hi Huizhe,
>
> You can set the "fs.defaultFS" field in core-site.xml to some path on S3.
> That way your Spark job will use S3 for all operations that need HDFS.
> Intermediate data will still be stored on local disk, though.
>
> Thanks,
> Hari
>
> On Mon, May 20, 2019 at 10:14 AM Abdeali Kothari <abdealikoth...@gmail.com>
> wrote:
>
>> While Spark can read from S3 directly in EMR, I believe it still needs
>> HDFS to perform shuffles and to write intermediate data to disk when
>> running jobs (i.e., when the in-memory data needs to spill over to disk).
>>
>> For these operations, Spark does need a distributed file system - you
>> could use something like EMRFS (which is like an HDFS backed by S3) on
>> Amazon.
>>
>> The issue could be something else too - so a stack trace or error message
>> could help in understanding the problem.
>>
>> On Mon, May 20, 2019, 07:20 Huizhe Wang <wang.h...@husky.neu.edu> wrote:
>>
>>> Hi,
>>>
>>> I want to use Spark on YARN without HDFS. I store my resources in AWS
>>> and use s3a to fetch them. However, when I use stop-dfs.sh it stops the
>>> NameNode and DataNode, and I get an error when using yarn cluster mode.
>>> Can I use YARN without starting DFS, and how would I use this mode?
>>>
>>> Yours,
>>> Jane
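
Putting the advice in this thread together, below is a minimal, untested sketch
of the relevant properties when S3A replaces HDFS as the default filesystem.
The bucket name my-bucket and the local path /data/yarn/local are placeholders,
and it assumes the S3A connector (hadoop-aws plus matching AWS SDK jars) is on
the classpath of YARN and Spark.

core-site.xml:

  <!-- Point the default filesystem at S3A instead of HDFS (bucket is a placeholder) -->
  <property>
    <name>fs.defaultFS</name>
    <value>s3a://my-bucket</value>
  </property>

yarn-site.xml:

  <!-- Aggregated container logs must go somewhere that exists without HDFS -->
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>s3a://my-bucket/var/yarn/logs</value>
  </property>

  <!-- Shuffle and spill data still goes to local disk, so keep a local path here -->
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/yarn/local</value>
  </property>

As noted above, intermediate and shuffle data is still written to the
NodeManagers' local disks, so the local directories must stay on a real local
filesystem; only the default filesystem and log aggregation move to S3.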