Watch the app manager; it should tell you what's running and what is taking a while. My guess is that it's a "distinct" operation on the data. J
Sent from my iPhone

> On Oct 30, 2014, at 8:22 AM, peng xia <[email protected]> wrote:
>
> Hi,
>
> Previously we applied the SVM algorithm in MLlib to 5 million records (600 MB); it took more than 25 minutes to finish.
> The Spark version we are using is 1.0, and we were running this program on a 4-node cluster. Each node has 4 CPU cores and 11 GB RAM.
>
> The 5 million records contain only two distinct records (one positive and one negative); all the others are duplicates.
>
> Does anyone have any idea why it takes so long on such a small data set?
>
> Thanks,
> Best,
>
> Peng
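To make the "two distinct records" point concrete, here is a rough sketch in plain Python (not Spark, and not the code from the original job) of what a distinct-style pass computes: each unique record with the number of times it occurs. The function name `collapse_duplicates` and the sample records are hypothetical, and the data is scaled down from 5 million rows to five.

```python
from collections import Counter


def collapse_duplicates(records):
    """Return a sorted list of (record, count) pairs, one per distinct record.

    Records must be hashable, e.g. (label, (feature, ...)) tuples.
    """
    return sorted(Counter(records).items())


# Hypothetical stand-in for the data set described in the quoted message:
# one positive and one negative example, each repeated several times.
positive = (1, (1.0, 2.0))
negative = (0, (-1.0, -2.0))
records = [positive] * 3 + [negative] * 2

# Print each distinct record alongside how often it appears.
for record, count in collapse_duplicates(records):
    print(record, count)
```

However many rows go in, the result here has only two entries, so the cost of such a step is in scanning (and, on a cluster, shuffling) the full input rather than in the size of the output.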
