Watch the app manager; it should tell you what's running and taking a while. My 
guess is that it's a "distinct" function on the data.
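
For illustration only, here is a plain-Python sketch (not Spark or MLlib code; the record values and variable names are made up) of what a distinct-style pass does on data shaped like this. It still has to touch every record once, even though only two distinct ones come out.

```python
from collections import Counter

# Hypothetical stand-in for the dataset: (label, features) pairs with
# only two distinct records, each repeated several times.
positive = (1, (1.0, 2.0))
negative = (0, (-1.0, -2.0))
records = [positive] * 6 + [negative] * 4

# A "distinct"-style pass visits every record once, even though the
# result holds only two entries.
counts = Counter(records)
distinct_records = list(counts)

print(len(records), "records,", len(distinct_records), "distinct")
for record, n in counts.items():
    print(record, "x", n)
```

The same idea applies at 5 million records: the work scales with the total number of rows, not with the number of distinct ones.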
J

> On Oct 30, 2014, at 8:22 AM, peng xia <[email protected]> wrote:
> 
> Hi,
> 
> Previously we applied the SVM algorithm in MLlib to 5 million records (600 
> MB), and it took more than 25 minutes to finish.
> We are using Spark 1.0 and ran this program on a 4-node cluster. Each node 
> has 4 CPU cores and 11 GB of RAM.
> 
> The 5 million records contain only two distinct records (one positive and one 
> negative); all the others are duplicates.
> 
> Does anyone have any idea why it takes so long on such a small dataset?
> 
> Thanks,
> Best,
> 
> Peng
