For HDFS, you could try mounting HDFS as an NFS volume, but I'm not sure about
the stability of that setup, and it also adds the overhead of network I/O and
HDFS replication.
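
If you do try that, a rough sketch of pointing the shuffle directory at such a
mount could look like the following, assuming HDFS has been exposed through the
NFS gateway and mounted at /mnt/hdfs-nfs on every worker (the mount point and
path here are assumptions, not tested values):

    import org.apache.spark.{SparkConf, SparkContext}

    // Assumes HDFS is already exported via the NFS gateway and mounted at
    // /mnt/hdfs-nfs on every worker node (mount point is hypothetical).
    val conf = new SparkConf()
      .setAppName("shuffle-on-nfs-sketch")
      // spark.local.dir controls where shuffle and spill files land. On YARN
      // the NodeManager local dirs take precedence, so this may need to be
      // set in the cluster config rather than in the application.
      .set("spark.local.dir", "/mnt/hdfs-nfs/tmp/spark-shuffle")
    val sc = new SparkContext(conf)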
> On Aug 24, 2016, at 21:02, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>
> Spark shuffle uses the Java File API to create local directories and to read
> and write data, so it only works with filesystems supported by the OS. It
> doesn't go through the Hadoop FileSystem API, so writing shuffle data to a
> Hadoop-compatible FS does not work.
>
> Also, writing temporary shuffle data to a distributed FS is not a good fit;
> it adds unnecessary overhead. In your case, if you have a lot of memory on
> each node, you could use ramfs instead to store the shuffle data.
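
For the ramfs suggestion above, a minimal sketch, assuming a RAM-backed
filesystem is already mounted at /mnt/spark-ram on each worker (the mount
point and size below are assumptions):

    import org.apache.spark.{SparkConf, SparkContext}

    // Assumes something like `mount -t tmpfs -o size=64g tmpfs /mnt/spark-ram`
    // was already run on every worker; the path and size are placeholders.
    val conf = new SparkConf()
      .setAppName("shuffle-on-ramdisk-sketch")
      // Multiple directories can be listed comma-separated to spread I/O;
      // here only the RAM-backed mount is used for shuffle and spill files.
      .set("spark.local.dir", "/mnt/spark-ram")
    val sc = new SparkContext(conf)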
>
> Thanks
> Saisai
>
> On Wed, Aug 24, 2016 at 8:11 PM, tony....@tendcloud.com
> <tony....@tendcloud.com> wrote:
> Hi all,
> When we run Spark on very large data, Spark shuffles and the shuffle data is
> written to local disk. Because we have limited local disk capacity, the
> shuffle data fills up the disks and the job fails. Is there a way to write
> the shuffle spill data to HDFS? Or, if we introduce Alluxio into our system,
> can the shuffle data be written to Alluxio?
>
> Thanks and Regards,
>
> 阎志涛 (Tony)
>
> Beijing TendCloud Technology Co., Ltd.
> --------------------------------------------------------------------------------------------------------
> Email: tony....@tendcloud.com
> Phone: 13911815695
> WeChat: zhitao_yan
> QQ: 4707059
> Address: Room 602, Aviation Service Building, Building 2, Yard 39, Dongzhimenwai Street, Dongcheng District, Beijing
> Postal code: 100027
> --------------------------------------------------------------------------------------------------------
> TalkingData.com <http://talkingdata.com/> - Let the data speak
>