Did you cache the data and check the load balancing? How many
features? Which API are you using: Scala, Java, or Python? -Xiangrui
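The caching question matters because MLlib's SGD-based SVM trainer makes repeated passes over its input. If the training RDD is not cached, each pass may recompute it from the source. The toy below is a pure-Python sketch, not Spark code. `ToyDataset`, `make_loader` and `train` are hypothetical names that mimic a lazily recomputed dataset and count how often the source is read with and without caching. In Spark itself, the corresponding step would be calling `cache()` on the training RDD before training.

```python
# Toy, pure-Python illustration (no Spark) of why caching matters for an
# iterative trainer that makes one full pass over its input per iteration.
# All names here are hypothetical.


class ToyDataset:
    """Mimics a lazily computed dataset: every pass re-runs load_fn
    unless cache() has been called."""

    def __init__(self, load_fn):
        self._load_fn = load_fn
        self._cached = None
        self._use_cache = False

    def cache(self):
        self._use_cache = True
        return self

    def records(self):
        if self._use_cache:
            # Materialize once, then reuse the stored list on later passes.
            if self._cached is None:
                self._cached = list(self._load_fn())
            return self._cached
        # Not cached: recompute from the "source" on every pass.
        return list(self._load_fn())


def make_loader(counter):
    """Return a loader that counts how often the 'source' is read."""
    def load():
        counter["reads"] += 1
        return [(1.0, [1.0, 0.0]), (0.0, [0.0, 1.0])] * 3
    return load


def train(dataset, num_iterations):
    """Stand-in for an iterative trainer: one full pass per iteration."""
    seen = 0
    for _ in range(num_iterations):
        seen += len(dataset.records())
    return seen


uncached_reads = {"reads": 0}
train(ToyDataset(make_loader(uncached_reads)), num_iterations=100)

cached_reads = {"reads": 0}
train(ToyDataset(make_loader(cached_reads)).cache(), num_iterations=100)

print(uncached_reads["reads"], cached_reads["reads"])  # 100 1
```

With 100 iterations, the uncached version reads the source 100 times and the cached version reads it once. On a real cluster, each of those reads would typically be a full scan and parse of the input.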

On Thu, Oct 30, 2014 at 9:13 AM, Jimmy <[email protected]> wrote:
> Watch the app manager; it should tell you what's running and what's taking a while.
> My guess is that it's a "distinct" function on the data.
> J
>
> On Oct 30, 2014, at 8:22 AM, peng xia <[email protected]> wrote:
>
> Hi,
>
> Previously we applied the SVM algorithm in MLlib to 5 million records (600
> MB), and it took more than 25 minutes to finish.
> The Spark version we are using is 1.0, and we ran this program on a
> 4-node cluster. Each node has 4 CPU cores and 11 GB of RAM.
>
> The 5 million records contain only two distinct records (one positive and one
> negative); all the others are duplicates.
>
> Does anyone have any idea why it takes so long on such a small dataset?
>
> Thanks,
> Best,
>
> Peng
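Peng notes that the 5 million rows contain only two distinct records. A minimal sketch of what that means for the effective size of the data, assuming each row can be represented as a hashable `(label, features)` tuple: counting duplicates with `collections.Counter` collapses the input to two unique rows, each with a multiplicity. The feature values and the 5,000-row scale are made up for illustration. In Spark, calling `distinct()` on an RDD of such rows would similarly return just the two unique rows, though without the counts.

```python
from collections import Counter

# Hypothetical stand-in for the 5 million rows described above: every row
# is one of only two distinct (label, features) records. Scaled down to
# 5,000 rows so it runs instantly; the shape of the result is the same.
positive = (1.0, (0.5, 1.2, 3.4))
negative = (0.0, (2.1, 0.3, 0.0))
rows = [positive] * 2500 + [negative] * 2500

# Collapse duplicates into {record: multiplicity}.
weighted = Counter(rows)

print(len(rows), len(weighted))  # 5000 2
for (label, features), count in sorted(weighted.items()):
    print(label, features, count)
```

`Counter` is used rather than a plain `set` so that the number of copies of each unique row is kept alongside the row itself.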
