Just wondering, what is the advantage of doing this?

Regards,
Gourav Sengupta
On Wed, May 22, 2019 at 3:01 AM Huizhe Wang <wang.h...@husky.neu.edu> wrote:

> Hi Hari,
>
> Thanks :) I tried to do it as you said. It works ;)
>
> Hariharan <hariharan...@gmail.com> wrote on Mon, May 20, 2019 at 3:54 PM:
>
>> Hi Huizhe,
>>
>> You can set the "fs.defaultFS" property in core-site.xml to a path on
>> S3. That way your Spark job will use S3 for all operations that need
>> HDFS (see the sketch appended below this thread). Intermediate data
>> will still be stored on local disk, though.
>>
>> Thanks,
>> Hari
>>
>> On Mon, May 20, 2019 at 10:14 AM Abdeali Kothari <abdealikoth...@gmail.com> wrote:
>>
>>> While Spark can read from S3 directly in EMR, I believe it still needs
>>> HDFS to perform shuffles and to write intermediate data to disk when
>>> running jobs (i.e. when in-memory data needs to spill over to disk).
>>>
>>> For these operations, Spark does need a distributed file system - you
>>> could use something like EMRFS (which is like an HDFS backed by S3) on
>>> Amazon.
>>>
>>> The issue could be something else too - so a stack trace or error
>>> message would help in understanding the problem.
>>>
>>> On Mon, May 20, 2019, 07:20 Huizhe Wang <wang.h...@husky.neu.edu> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to use Spark on YARN without HDFS. I store my resources in AWS
>>>> and use s3a to fetch them. However, after I ran stop-dfs.sh to stop the
>>>> NameNode and DataNode, I got an error when using yarn cluster mode.
>>>> Can I use YARN without starting DFS, and if so, how do I use this mode?
>>>>
>>>> Yours,
>>>> Jane
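
For reference, below is a minimal core-site.xml sketch of the setup Hariharan describes. The bucket name "my-bucket" and the inline credentials are placeholders for illustration; s3a also supports other authentication mechanisms (instance profiles, environment variables, etc.), so treat this as a sketch rather than a definitive configuration.

  <configuration>
    <!-- Point the default filesystem at an s3a path instead of HDFS.
         "my-bucket" is a placeholder; substitute your own bucket. -->
    <property>
      <name>fs.defaultFS</name>
      <value>s3a://my-bucket</value>
    </property>

    <!-- Static s3a credentials (placeholders); alternatively rely on
         instance profiles or environment variables. -->
    <property>
      <name>fs.s3a.access.key</name>
      <value>YOUR_ACCESS_KEY</value>
    </property>
    <property>
      <name>fs.s3a.secret.key</name>
      <value>YOUR_SECRET_KEY</value>
    </property>
  </configuration>

With this in place, unqualified paths resolve against s3a://my-bucket and the HDFS daemons can stay down, while shuffle and spill data still go to the local disks of the YARN NodeManagers, as noted above.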