Did you cache the data and check the load balancing? How many features does the data have? Which API are you using: Scala, Java, or Python?

-Xiangrui
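For reference, here is a minimal sketch of what caching and checking the load balancing could look like, assuming the Python API and assuming the records are stored in LIBSVM format (the thread does not say either). The helper names `target_partitions` and `train_cached_svm` and the path in the usage note are hypothetical; the partition count is only a rule of thumb.

```python
def target_partitions(nodes, cores_per_node, tasks_per_core=2):
    """Rule-of-thumb partition count: a few tasks per available core."""
    return nodes * cores_per_node * tasks_per_core


def train_cached_svm(sc, path, num_partitions, iterations=100):
    """Load, repartition, cache, report partition sizes, then train an SVM."""
    # pyspark is imported here so the helper above works without Spark installed.
    from pyspark.mllib.classification import SVMWithSGD
    from pyspark.mllib.util import MLUtils

    # Assumption: the records are stored in LIBSVM format at `path`.
    points = MLUtils.loadLibSVMFile(sc, path)

    # Spread the records evenly over the cluster and keep them in memory, so
    # each SGD iteration does not re-read and re-parse the input.
    points = points.repartition(num_partitions).cache()

    # Load-balance check: count the records held by each partition.
    sizes = points.mapPartitions(lambda it: [sum(1 for _ in it)]).collect()
    print("records per partition: min=%d max=%d" % (min(sizes), max(sizes)))

    return SVMWithSGD.train(points, iterations=iterations)
```

With the cluster described below (4 nodes, 4 cores each), a call might look like `train_cached_svm(sc, "hdfs:///path/to/records", target_partitions(4, 4))`. If the minimum and maximum partition sizes differ a lot, a few tasks are doing most of the work.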
On Thu, Oct 30, 2014 at 9:13 AM, Jimmy <[email protected]> wrote:
> Watch the app manager; it should tell you what's running and taking a while...
> My guess is it's a "distinct" function on the data.
> J
>
> On Oct 30, 2014, at 8:22 AM, peng xia <[email protected]> wrote:
>
> Hi,
>
> Previously we applied the SVM algorithm in MLlib to 5 million records (600 MB),
> and it took more than 25 minutes to finish.
> The Spark version we are using is 1.0, and we were running this program on a
> 4-node cluster. Each node has 4 CPU cores and 11 GB RAM.
>
> The 5 million records contain only two distinct records (one positive and one
> negative); all the others are duplicates.
>
> Does anyone have any idea why it takes so long on this small data set?
>
> Thanks,
> Best,
>
> Peng
