Re: where storagelevel DISK_ONLY persists RDD to

2015-01-25 Thread Larry Liu
… like HDFS or NFS if you are attempting to interoperate with another system, such as Hadoop. `.persist` is for keeping the contents of an RDD around so future uses of that particular RDD don't need to recalculate its composite parts. On Sun Jan 25 2015 at 3:36 …

Re: Shuffle to HDFS

2015-01-25 Thread Larry Liu
I don't think the current Spark shuffle can support HDFS as a shuffle output. Anyway, is there any specific reason to spill shuffle data to HDFS or NFS? This will severely increase the shuffle time. Thanks, Jerry. *From:* Larry Liu [mailto …
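As Jerry notes above, Spark has no supported way to send shuffle output directly to HDFS. Shuffle and spill files land under the directories listed in `spark.local.dir`, so the closest workaround is pointing that setting at a different mount. A minimal sketch, assuming a Spark 1.x standalone deployment; the mount path is a placeholder:

```properties
# conf/spark-defaults.conf (sketch; /mnt/nfs/spark-tmp is a hypothetical path)
# Shuffle and spill files are written under spark.local.dir on each worker.
# Pointing this at an NFS mount works mechanically but, as noted above,
# will severely increase shuffle time.
spark.local.dir    /mnt/nfs/spark-tmp
```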

Shuffle to HDFS

2015-01-25 Thread Larry Liu
How do I change the shuffle output location to HDFS or NFS?

where storagelevel DISK_ONLY persists RDD to

2015-01-25 Thread Larry Liu
I would like to persist an RDD to HDFS or an NFS mount. How do I change the location?
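The distinction drawn in the reply thread can be sketched as follows, assuming the Spark 1.x Scala API (the HDFS URIs are placeholders): `persist(DISK_ONLY)` always writes blocks under each executor's `spark.local.dir`, so the only way to relocate those files is to change that setting; to land data on HDFS or an NFS path explicitly, write it out with `saveAsTextFile`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc  = new SparkContext(new SparkConf().setAppName("persist-location-demo"))
val rdd = sc.textFile("hdfs://namenode:8020/input/data.txt")

// DISK_ONLY caches blocks under spark.local.dir on each executor;
// the storage level does not take a target path or filesystem.
rdd.persist(StorageLevel.DISK_ONLY)

// To place the data on HDFS (or an NFS mount path) explicitly,
// write it out instead of persisting:
rdd.saveAsTextFile("hdfs://namenode:8020/output/data")
```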

Re: How to use more executors

2015-01-21 Thread Larry Liu
Will SPARK-1706 be included in the next release? On Wed, Jan 21, 2015 at 2:50 PM, Ted Yu wrote: Please see SPARK-1706. On Wed, Jan 21, 2015 at 2:43 PM, Larry Liu wrote: I tried to submit a job with --conf "spark.cores.max=6" or --total-executor-co …

How to use more executors

2015-01-21 Thread Larry Liu
I tried to submit a job with --conf "spark.cores.max=6" or --total-executor-cores 6 on a standalone cluster, but I don't see more than one executor on each worker. I am wondering how to use multiple executors when submitting jobs. Thanks, Larry
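For context, SPARK-1706 (allowing multiple executors per worker) is what makes more than one executor per worker possible in standalone mode; before it, the flags below only cap the total cores an application may claim, not the number of executors. A sketch of the submission being discussed (the master URL and jar name are placeholders):

```
# Standalone mode, Spark 1.x: one executor per worker per application.
# --total-executor-cores caps cores across the whole cluster; it does
# not create additional executors on a single worker.
spark-submit \
  --master spark://master:7077 \
  --total-executor-cores 6 \
  my-app.jar
```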

Re: wordcount job slow while input from NFS mount

2014-12-17 Thread Larry Liu
Run jstack to see where the process is spending time. Also make sure Spark's local work directories (spark.local.dir) are not on NFS. They shouldn't be, though; that should be /tmp. Matei. On Dec 17, 2014, at 11:56 AM, Larry Liu wrote: Hi, Matei …
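The two checks suggested above can be sketched as shell steps (the PID and paths are illustrative):

```
# 1. Confirm Spark's scratch space is on local disk, not the NFS mount:
grep -r "spark.local.dir" conf/     # should resolve to /tmp or a local disk

# 2. Sample stack traces of a busy executor JVM to see where time goes:
jps                                 # find the executor process ID
jstack <pid> > executor-stack.txt   # <pid> taken from the jps output
```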

Re: wordcount job slow while input from NFS mount

2014-12-17 Thread Larry Liu
Hi, Matei, Thanks for your response. I tried to copy the file (1 GB) from NFS and it took 10 seconds. The NFS mount is in a LAN environment and the NFS server is running on the same server that Spark is running on, so basically I mount the NFS on the same bare-metal machine. Larry. On Wed, Dec 17, 2014 at …

wordcount job slow while input from NFS mount

2014-12-17 Thread Larry Liu
Hi, A word-count job on an approximately 1 GB text file takes 1 hour when the input comes from an NFS mount. The same job took 30 seconds with input from the local file system. Is there any tuning required for NFS-mounted input? Thanks, Larry

Re: input split size

2014-10-17 Thread Larry Liu
Thanks, Andrew. What about reading from the local file system? On Fri, Oct 17, 2014 at 5:38 PM, Andrew Ash wrote: When reading out of HDFS it's the HDFS block size. On Fri, Oct 17, 2014 at 5:27 PM, Larry Liu wrote: What is the default input split size? How to change it?

How to disable input split

2014-10-17 Thread Larry Liu
Is it possible to disable input splitting if the input is already small?
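There is no switch that disables splitting outright, but for a small input two things help: a file below the HDFS block size normally produces a single split anyway, and partitions can be merged after reading. A sketch assuming an existing SparkContext `sc`; the path is a placeholder:

```scala
// A file smaller than one HDFS block typically yields a single split.
// If reading still produces several partitions, merge them afterwards:
val rdd = sc.textFile("hdfs://namenode:8020/small/input.txt").coalesce(1)
```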

input split size

2014-10-17 Thread Larry Liu
What is the default input split size, and how can I change it?
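As Andrew's reply in the follow-up thread says, for HDFS input the default split size is the HDFS block size. The second argument to `textFile` requests a minimum number of partitions (i.e. smaller splits); making splits larger instead goes through the Hadoop input-format settings. A sketch assuming an existing SparkContext `sc`; the URI and sizes are illustrative:

```scala
// Ask for at least 16 partitions, i.e. splits smaller than one block:
val rdd = sc.textFile("hdfs://namenode:8020/input/big.txt", 16)

// To make splits larger than one block, raise the Hadoop minimum split
// size (in bytes) before reading:
sc.hadoopConfiguration.setLong("mapred.min.split.size", 128L * 1024 * 1024)
```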