My min support is low and, after filling up all my space, I am applying a
filter on the results to get only the item sets that interest me.
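Concretely, the filtering step is roughly this (a sketch only; the length
predicate is made up for illustration, and model is assumed to be the
FPGrowthModel returned by MLlib):

    // Keep only the item sets of interest; the predicate is just an example.
    val interesting = model.freqItemsets
      .filter(itemset => itemset.items.length >= 3)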
On Wed, 11 Mar 2015 1:58 pm Sean Owen wrote:
Have you looked at how big your output is? For example, if your min
support is very low, you will output a massive volume of frequent item
sets. If that's the case, then it may be expected that it's taking
ages to write terabytes of data.
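As a quick check of the output volume, something like this (a sketch,
assuming model is the FPGrowthModel) counts the frequent item sets before
anything is written:

    // Count the frequent item sets first, to see whether sheer
    // output volume is what makes the write take so long.
    val numItemsets = model.freqItemsets.count()
    println("Number of frequent item sets: " + numItemsets)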
On Wed, Mar 11, 2015 at 8:34 AM, Sean Barzilay wrote:
The program spends its time writing the output to a text file, and I am
using 70 partitions.
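For reference, the write is roughly this (the output path is a
placeholder); saveAsTextFile produces one part file per partition, so 70
partitions means 70 output files:

    // One part-xxxxx file is written per partition (70 here).
    model.freqItemsets
      .map(is => is.items.mkString(" ") + " : " + is.freq)
      .saveAsTextFile("hdfs:///output/freq-itemsets")  // placeholder path
    // .coalesce(n) before the save would produce fewer, larger files.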
On Wed, 11 Mar 2015 9:55 am Sean Owen wrote:
I don't think there is enough information here. Where is the program
spending its time? Where does it "stop"? How many partitions are
there?
On Wed, Mar 11, 2015 at 7:10 AM, Akhil Das wrote:
You need to set spark.cores.max to a number, say 16, so that the tasks
will get distributed evenly across all 4 machines. Another thing would be
to set spark.default.parallelism, if you haven't tried that already.
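For example, something along these lines in the driver (the app name is a
placeholder, and 16/70 are only illustrative values):

    import org.apache.spark.{SparkConf, SparkContext}

    // Cap the total cores the job may use and set a default parallelism
    // so tasks are spread across all 4 machines.
    val conf = new SparkConf()
      .setAppName("FPGrowthJob")  // placeholder name
      .set("spark.cores.max", "16")
      .set("spark.default.parallelism", "70")
    val sc = new SparkContext(conf)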
Thanks
Best Regards
On Wed, Mar 11, 2015 at 12:27 PM, Sean Barzilay
wrote:
> I am running on
Depending on your cluster setup (cores, memory), you need to specify the
parallelism or repartition the data.
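For instance (the RDD name and partition count are only examples):

    // Repartition the transaction data so work spreads across the cluster.
    val repartitioned = transactions.repartition(70)  // example count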
Thanks
Best Regards
On Wed, Mar 11, 2015 at 12:18 PM, Sean Barzilay
wrote:
Hi, I am currently using Spark 1.3.0-SNAPSHOT to run the FPGrowth algorithm
from the MLlib library. When I try to run the algorithm over a large
basket (over 1000 items), the program seems to never finish. Did anyone find
a workaround for this problem?
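For reference, a minimal sketch of the kind of job I am running (the input
path, separator, and minSupport value are placeholders):

    import org.apache.spark.mllib.fpm.FPGrowth

    // Each input line is one basket of space-separated items.
    val transactions = sc.textFile("hdfs:///data/baskets.txt")  // placeholder path
      .map(line => line.split(" "))

    val fpg = new FPGrowth()
      .setMinSupport(0.01)    // placeholder; very low values explode the output
      .setNumPartitions(70)
    val model = fpg.run(transactions)

    // Print a few of the frequent item sets that were found.
    model.freqItemsets.take(5).foreach { is =>
      println(is.items.mkString("[", ",", "]") + ", " + is.freq)
    }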