Re: Number of partitions and Number of concurrent tasks

2014-08-01 Thread Daniel Siegmann
> -t m3.2xlarge -w 3600 --spot-price=.08 -z us-east-1e --worker-instances=2 my-cluster ...

Re: Number of partitions and Number of concurrent tasks

2014-08-01 Thread Nicholas Chammas
> ... --worker-instances=2 my-cluster ...

Re: Number of partitions and Number of concurrent tasks

2014-08-01 Thread Daniel Siegmann
ann ; "user@spark.apache.org" > > *Sent:* Thursday, July 31, 2014 10:04 AM > > *Subject:* Re: Number of partitions and Number of concurrent tasks > > I haven't configured this myself. I'd start with setting > SPARK_WORKER_CORES to a higher value, since that

Re: Number of partitions and Number of concurrent tasks

2014-07-31 Thread Darin McBeath
... -w 3600 --spot-price=.08 -z us-east-1e --worker-instances=2 my-cluster

Re: Number of partitions and Number of concurrent tasks

2014-07-31 Thread Daniel Siegmann
> ... what the documentation states). What would I want that value to be based on my configuration below? Or, would I leave that alone?

Re: Number of partitions and Number of concurrent tasks

2014-07-30 Thread Darin McBeath
> This is correct behavior. Each "core" can execute exactly one task at a time, with each task corresponding to a partition. If your cluster only has 24 cores, you can only run at most 24 tasks at once. You could run multiple workers per node ...

Re: Number of partitions and Number of concurrent tasks

2014-07-30 Thread Daniel Siegmann
This is correct behavior. Each "core" can execute exactly one task at a time, with each task corresponding to a partition. If your cluster only has 24 cores, you can only run at most 24 tasks at once. You could run multiple workers per node to get more executors. That would give you more cores in ...
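
A minimal local-mode illustration of this point, assuming the Spark 1.x Scala API (local[4] stands in for a cluster with 4 total cores; the numbers are only illustrative): however many partitions the RDD has, only one task per core runs at a time, so 100 partitions on 24 cores execute in roughly ceil(100/24) = 5 waves.

    import org.apache.spark.{SparkConf, SparkContext}

    // "local[4]" plays the role of a cluster with 4 total cores.
    val sc = new SparkContext(new SparkConf().setMaster("local[4]").setAppName("waves"))

    val rdd = sc.parallelize(1 to 1000).repartition(100)

    val totalCores = sc.defaultParallelism       // 4 here; 24 on three 8-core nodes
    val partitions = rdd.partitions.size         // 100
    val waves      = math.ceil(partitions.toDouble / totalCores).toInt

    println(s"$partitions partitions / $totalCores cores => " +
            s"at most $totalCores concurrent tasks, about $waves waves")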

Number of partitions and Number of concurrent tasks

2014-07-30 Thread Darin McBeath
I have a cluster with 3 nodes (each with 8 cores) using Spark 1.0.1. I have an RDD which I've repartitioned so it has 100 partitions (hoping to increase the parallelism). When I do a transformation (such as filter) on this RDD, I can't seem to get more than 24 tasks (my total number of cores ...
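
A minimal sketch of the setup being described, assuming the Spark 1.x Scala API (the input path and the filter predicate are only placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("repartition-example"))

    // Hypothetical input; repartition to 100 to raise the potential parallelism.
    val data = sc.textFile("hdfs:///path/to/input")
    val wide = data.repartition(100)

    // A transformation such as filter keeps the 100 partitions...
    val kept = wide.filter(_.nonEmpty)
    println(kept.partitions.size)   // 100

    // ...but only as many tasks run at once as there are cores available to the
    // application (24 on three 8-core workers); the remaining tasks queue up.
    kept.count()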