Just wondering, what is the advantage of doing this?

Regards,
Gourav Sengupta
On Wed, May 22, 2019 at 3:01 AM Huizhe Wang <wang.h...@husky.neu.edu> wrote:

> Hi Hari,
>
> Thanks :) I tried to do it as you said. It works ;)
>
> Hariharan <hariharan...@gmail.com> wrote on Mon, May 20, 2019 at 3:54 PM:
>
>> Hi Huizhe,
>>
>> You can set the "fs.defaultFS" property in core-site.xml to a path on
>> S3. That way your Spark job will use S3 for all operations that need
>> HDFS (see the sketch appended below this thread). Intermediate data
>> will still be stored on local disk, though.
>>
>> Thanks,
>> Hari
>>
>> On Mon, May 20, 2019 at 10:14 AM Abdeali Kothari <abdealikoth...@gmail.com> wrote:
>>
>>> While Spark can read from S3 directly in EMR, I believe it still needs
>>> HDFS to perform shuffles and to write intermediate data to disk when
>>> running jobs (i.e. when in-memory data needs to spill over to disk).
>>>
>>> For these operations, Spark does need a distributed file system - you
>>> could use something like EMRFS (which is like an HDFS backed by S3) on
>>> Amazon.
>>>
>>> The issue could be something else too - so a stack trace or error
>>> message would help in understanding the problem.
>>>
>>> On Mon, May 20, 2019, 07:20 Huizhe Wang <wang.h...@husky.neu.edu> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to use Spark on YARN without HDFS. I store my resources in AWS
>>>> and use s3a to fetch them. However, after I ran stop-dfs.sh to stop the
>>>> NameNode and DataNode, I got an error when using yarn cluster mode.
>>>> Can I use YARN without starting DFS, and if so, how do I use this mode?
>>>>
>>>> Yours,
>>>> Jane
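
For reference, below is a minimal core-site.xml sketch of the setup Hariharan describes. The bucket name "my-bucket" and the inline credentials are placeholders for illustration; s3a also supports other authentication mechanisms (instance profiles, environment variables, etc.), so treat this as a sketch rather than a definitive configuration.

  <configuration>
    <!-- Point the default filesystem at an s3a path instead of HDFS.
         "my-bucket" is a placeholder; substitute your own bucket. -->
    <property>
      <name>fs.defaultFS</name>
      <value>s3a://my-bucket</value>
    </property>

    <!-- Static s3a credentials (placeholders); alternatively rely on
         instance profiles or environment variables. -->
    <property>
      <name>fs.s3a.access.key</name>
      <value>YOUR_ACCESS_KEY</value>
    </property>
    <property>
      <name>fs.s3a.secret.key</name>
      <value>YOUR_SECRET_KEY</value>
    </property>
  </configuration>

With this in place, unqualified paths resolve against s3a://my-bucket and the HDFS daemons can stay down, while shuffle and spill data still go to the local disks of the YARN NodeManagers, as noted above.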