Re: Spark fpg large basket

2015-03-11 Thread Sean Barzilay
My min support is low, and after filling up all my space I am applying a filter on the results to only get the item sets that interest me.

On Wed, 11 Mar 2015 1:58 pm Sean Owen wrote:
> Have you looked at how big your output is? For example, if your min
> support is very low, you will output a massive volume of frequent item sets.
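(For reference, a sketch of pushing that filter onto the result RDD before anything is written, so only the interesting item sets ever reach disk. The model variable, the placeholder predicate, and the output path are assumptions, not the poster's actual code; it uses the Spark 1.3-era API, where freqItemsets is an RDD of (items, count) pairs.)

    // `model` is assumed to be the FPGrowthModel returned by FPGrowth.run(),
    // as in the end-to-end sketch under the original question below.
    // In Spark 1.3, freqItemsets is an RDD[(Array[String], Long)].
    // The predicate here (at least 3 items per set) is only a placeholder.
    model.freqItemsets
      .filter { case (items, _) => items.length >= 3 }
      .map { case (items, freq) => items.mkString(",") + " -> " + freq }
      .saveAsTextFile("hdfs:///tmp/interesting-itemsets")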

Re: Spark fpg large basket

2015-03-11 Thread Sean Owen
Have you looked at how big your output is? For example, if your min support is very low, you will output a massive volume of frequent item sets. If that's the case, then it may be expected that it's taking ages to write terabytes of data.

On Wed, Mar 11, 2015 at 8:34 AM, Sean Barzilay wrote:
> The program spends its time writing the output to a text file, and I am
> using 70 partitions.
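(A cheap way to check that before paying for the write is to count the frequent item sets the model produced. A sketch; `model` again stands for the FPGrowthModel from the run, and the tuple shape is the 1.3-era API.)

    // count() still computes the full result but skips the text write,
    // so it shows how many item sets a low min support really produces
    // before committing to a potentially terabyte-scale output.
    val numItemsets = model.freqItemsets.count()
    println(s"frequent item sets found: $numItemsets")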

Re: Spark fpg large basket

2015-03-11 Thread Sean Barzilay
The program spends its time writing the output to a text file, and I am using 70 partitions.

On Wed, 11 Mar 2015 9:55 am Sean Owen wrote:
> I don't think there is enough information here. Where is the program
> spending its time? Where does it "stop"? How many partitions are there?
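(For reference, a sketch of what that write step might look like; the paths and partition count are placeholders. With 70 partitions the job writes 70 part-files, so if the result set is huge each task still has a lot of data to format and write.)

    // One output part-file per partition; more partitions means more
    // parallel writers (and more, smaller files). The map to a string is
    // needed because the 1.3 result is an RDD of (Array[String], Long).
    model.freqItemsets
      .map { case (items, freq) => items.mkString(",") + " -> " + freq }
      .repartition(200)   // placeholder: spread the write over more tasks
      .saveAsTextFile("hdfs:///tmp/fpg-output")

(Note that repartition adds a shuffle, so it only helps if the write itself is the bottleneck.)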

Re: Spark fpg large basket

2015-03-11 Thread Sean Owen
I don't think there is enough information here. Where is the program spending its time? Where does it "stop"? How many partitions are there?

On Wed, Mar 11, 2015 at 7:10 AM, Akhil Das wrote:
> You need to set spark.cores.max to a number, say 16, so that on all 4
> machines the tasks will get distributed evenly.

Re: Spark fpg large basket

2015-03-11 Thread Akhil Das
You need to set spark.cores.max to a number, say 16, so that on all 4 machines the tasks will get distributed evenly. Another thing would be to set spark.default.parallelism, if you haven't tried that already.

Thanks
Best Regards

On Wed, Mar 11, 2015 at 12:27 PM, Sean Barzilay wrote:
> I am running on
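(A sketch of setting those two properties when the context is created; the values 16 and 70 are only examples and would normally be sized to the cluster.)

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("fpg-large-basket")
      // Cap the total cores requested across the cluster so tasks
      // spread over all 4 machines instead of piling onto one.
      .set("spark.cores.max", "16")
      // Default number of partitions used by shuffle operations
      // when no explicit partition count is given.
      .set("spark.default.parallelism", "70")
    val sc = new SparkContext(conf)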

Re: Spark fpg large basket

2015-03-10 Thread Akhil Das
Depending on your cluster setup (cores, memory), you need to specify the parallelism / repartition the data.

Thanks
Best Regards

On Wed, Mar 11, 2015 at 12:18 PM, Sean Barzilay wrote:
> Hi, I am currently using Spark 1.3.0-snapshot to run the FP-growth
> algorithm from the MLlib library. When I am trying to run the algorithm
> over a large basket (over 1000 items), the program seems to never finish.
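(One way to do that repartitioning is on the input RDD itself, before FP-growth runs. A sketch; the path, input format, and partition count are placeholders and would depend on total cores and memory.)

    // Spread the baskets over more partitions so every executor gets work.
    // A small multiple of the total core count is a common starting point.
    val transactions = sc.textFile("hdfs:///path/to/baskets.txt")
      .map(_.trim.split(' '))
      .repartition(70)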

Spark fpg large basket

2015-03-10 Thread Sean Barzilay
Hi, I am currently using Spark 1.3.0-snapshot to run the FP-growth algorithm from the MLlib library. When I am trying to run the algorithm over a large basket (over 1000 items), the program seems to never finish. Did anyone find a workaround for this problem?
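(For context, a minimal end-to-end sketch of this kind of job under the Spark 1.3-era MLlib API. The input format, paths, and parameter values are assumptions; later Spark releases changed freqItemsets to return FreqItemset objects with .items and .freq instead of tuples.)

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.fpm.FPGrowth

    val sc = new SparkContext(new SparkConf().setAppName("fpg-large-basket"))

    // One basket per line, items separated by spaces (placeholder format).
    val transactions = sc.textFile("hdfs:///path/to/baskets.txt")
      .map(_.trim.split(' '))

    val model = new FPGrowth()
      .setMinSupport(0.3)    // fraction of baskets an item set must appear in
      .setNumPartitions(70)  // parallelism of the FP-growth computation itself
      .run(transactions)

    // In Spark 1.3, freqItemsets is an RDD[(Array[String], Long)].
    model.freqItemsets
      .map { case (items, freq) => items.mkString(",") + " -> " + freq }
      .saveAsTextFile("hdfs:///path/to/fpg-output")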