You need to set spark.cores.max to a number, say 16, so that the tasks get distributed evenly across all 4 machines. Another thing to try, if you haven't already, is setting spark.default.parallelism.
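For reference, here is a minimal sketch of how those two settings could be passed in; the values (16 and 64) are placeholders you would tune to your cluster, not recommendations from the thread:

    import org.apache.spark.{SparkConf, SparkContext}

    // Cap the total cores the app may use across the cluster, and raise
    // the default number of partitions used by shuffles and parallelize().
    // Both values below are illustrative; tune them to your 4 machines.
    val conf = new SparkConf()
      .setAppName("fpgrowth-example")
      .set("spark.cores.max", "16")            // total cores across all workers
      .set("spark.default.parallelism", "64")  // default partition count

    val sc = new SparkContext(conf)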
Thanks
Best Regards

On Wed, Mar 11, 2015 at 12:27 PM, Sean Barzilay <sesnbarzi...@gmail.com> wrote:

> I am running on a 4-worker cluster, each machine having between 16 and 30
> cores and 50 GB of RAM.
>
> On Wed, 11 Mar 2015 8:55 am Akhil Das <ak...@sigmoidanalytics.com> wrote:
>
>> Depending on your cluster setup (cores, memory), you need to specify the
>> parallelism / repartition the data.
>>
>> Thanks
>> Best Regards
>>
>> On Wed, Mar 11, 2015 at 12:18 PM, Sean Barzilay <sesnbarzi...@gmail.com>
>> wrote:
>>
>>> Hi, I am currently using Spark 1.3.0-SNAPSHOT to run the FPGrowth
>>> algorithm from the MLlib library. When I try to run the algorithm over
>>> a large basket (over 1000 items), the program seems to never finish.
>>> Did anyone find a workaround for this problem?
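For completeness, a minimal sketch of the repartitioning advice applied to MLlib's FPGrowth. The input path, minSupport, and partition counts below are illustrative placeholders, not values from the thread:

    import org.apache.spark.mllib.fpm.FPGrowth

    // Hypothetical input: one whitespace-separated basket per line.
    val transactions = sc.textFile("hdfs:///path/to/baskets")
      .map(_.trim.split(' '))
      .repartition(64)  // spread the baskets evenly across the workers
      .cache()

    val model = new FPGrowth()
      .setMinSupport(0.3)    // higher support prunes the search space
      .setNumPartitions(64)  // parallelism of the FP-Growth step itself
      .run(transactions)

One caveat: with 1000-item baskets the number of frequent itemsets can grow combinatorially, so raising minSupport is often a more effective lever than parallelism alone.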