For HDFS, you could try mounting HDFS as an NFS volume, but I'm not sure about
the stability of that setup, and it also adds the overhead of network I/O and
HDFS replication.
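
If you do try that, a rough sketch of pointing the shuffle directory at such a
mount could look like the following, assuming HDFS has been exposed through the
NFS gateway and mounted at /mnt/hdfs-nfs on every worker (the mount point and
path here are assumptions, not tested values):

    import org.apache.spark.{SparkConf, SparkContext}

    // Assumes HDFS is already exported via the NFS gateway and mounted at
    // /mnt/hdfs-nfs on every worker node (mount point is hypothetical).
    val conf = new SparkConf()
      .setAppName("shuffle-on-nfs-sketch")
      // spark.local.dir controls where shuffle and spill files land. On YARN
      // the NodeManager local dirs take precedence, so this may need to be
      // set in the cluster config rather than in the application.
      .set("spark.local.dir", "/mnt/hdfs-nfs/tmp/spark-shuffle")
    val sc = new SparkContext(conf)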
> On Aug 24, 2016, at 21:02, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>
> Spark shuffle uses the Java File API to create local directories and to read
> and write data, so it only works with filesystems supported by the OS. It
> doesn't go through the Hadoop FileSystem API, so writing shuffle data to a
> Hadoop-compatible FS does not work.
>
> Also, writing temporary shuffle data to a distributed FS is not a good fit;
> it adds unnecessary overhead. In your case, if you have a lot of memory on
> each node, you could use ramfs instead to store the shuffle data.
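
For the ramfs suggestion above, a minimal sketch, assuming a RAM-backed
filesystem is already mounted at /mnt/spark-ram on each worker (the mount
point and size below are assumptions):

    import org.apache.spark.{SparkConf, SparkContext}

    // Assumes something like `mount -t tmpfs -o size=64g tmpfs /mnt/spark-ram`
    // was already run on every worker; the path and size are placeholders.
    val conf = new SparkConf()
      .setAppName("shuffle-on-ramdisk-sketch")
      // Multiple directories can be listed comma-separated to spread I/O;
      // here only the RAM-backed mount is used for shuffle and spill files.
      .set("spark.local.dir", "/mnt/spark-ram")
    val sc = new SparkContext(conf)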
>
> Thanks
> Saisai
>
> On Wed, Aug 24, 2016 at 8:11 PM, tony....@tendcloud.com
> <tony....@tendcloud.com> wrote:
> Hi all,
> When we run Spark on very large data, Spark shuffles and the shuffle data is
> written to local disk. Because we have limited local disk capacity, the
> shuffle data fills up the disks and the job fails. Is there a way to write
> the shuffle spill data to HDFS? Or, if we introduce Alluxio into our system,
> can the shuffle data be written to Alluxio?
>
> Thanks and Regards,
>
> 阎志涛 (Tony)
>
> Beijing TendCloud Technology Co., Ltd.
> --------------------------------------------------------------------------------------------------------
> Email: tony....@tendcloud.com
> Phone: 13911815695
> WeChat: zhitao_yan
> QQ: 4707059
> Address: Room 602, Aviation Service Building, Building 2, Yard 39, Dongzhimenwai Street, Dongcheng District, Beijing
> Postal code: 100027
> --------------------------------------------------------------------------------------------------------
> TalkingData.com <http://talkingdata.com/> - Let the data speak
>