Hi Aakash, I think what it generally means is that you have to use Spark's general DataFrame APIs to bring in the data and crunch the numbers, but you cannot use the KMeans clustering algorithm that already exists in Spark's MLlib library.
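Since the point is to write k-means yourself rather than call MLlib's version, here is a minimal sketch of Lloyd's algorithm (the usual k-means iteration) in plain Python, with no Spark or MLlib. The names `kmeans` and `closest_center`, the random initialization from the data points, and the tolerance are my own assumptions, not part of the assignment. The loop is deliberately split into an assignment ("map") step and a per-cluster sum/count ("reduce") step. In a Spark version, the assignment could be a `map` over an RDD of feature vectors and the sums/counts a `reduceByKey`, or, with DataFrames, a `groupBy` on the cluster ID followed by an `agg` of averages.

```python
import math
import random


def closest_center(point, centers):
    """Return the index of the center nearest to `point` (squared Euclidean distance)."""
    best_idx, best_dist = 0, math.inf
    for idx, center in enumerate(centers):
        dist = sum((p - c) ** 2 for p, c in zip(point, center))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    Returns (centers, labels). Initial centers are k points sampled
    without replacement from the data (an assumption; other schemes exist).
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # "map": assign each point to its nearest center;
        # "reduce": accumulate a vector sum and a count per cluster.
        sums, counts = {}, {}
        for point in points:
            idx = closest_center(point, centers)
            if idx in sums:
                sums[idx] = [s + x for s, x in zip(sums[idx], point)]
                counts[idx] += 1
            else:
                sums[idx] = list(point)
                counts[idx] = 1
        # Update: each center moves to the mean of its points;
        # an empty cluster keeps its previous center.
        new_centers = [
            [s / counts[i] for s in sums[i]] if i in sums else centers[i]
            for i in range(k)
        ]
        # Stop once no center moves by more than tol (squared distance).
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new))
            for old, new in zip(centers, new_centers)
        )
        centers = new_centers
        if shift <= tol:
            break
    labels = [closest_center(p, centers) for p in points]
    return centers, labels
```

Usage: `centers, labels = kmeans(points, k=3)`, where `points` is a list of feature tuples. The result depends on the random starting centers, so it is common to try a few seeds and keep the run with the lowest within-cluster sum of squares.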
I think a good place to start would be understanding what the k-means clustering algorithm does, and then looking into how you can use the DataFrame API to implement it yourself.

Thanks,
Shashank

On Thu, Sep 1, 2016 at 1:05 PM, Aakash Basu <aakash.spark....@gmail.com> wrote:
> Hey Siva,
>
> It needs to be done with Spark, without the use of any Spark libraries.
> Need some help in this.
>
> Thanks,
> Aakash.
>
> On Fri, Sep 2, 2016 at 1:25 AM, Sivakumaran S <siva.kuma...@icloud.com>
> wrote:
>
>> If you are to do it without Spark, you are asking at the wrong place. Try
>> Python + scikit-learn. Or R. If you want to do it with a UI based software,
>> try Weka or Orange.
>>
>> Regards,
>>
>> Sivakumaran S
>>
>> On 1 Sep 2016 8:42 p.m., Aakash Basu <aakash.spark....@gmail.com> wrote:
>>
>> ---------- Forwarded message ----------
>> From: *Aakash Basu* <aakash.spark....@gmail.com>
>> Date: Thu, Aug 25, 2016 at 10:06 PM
>> Subject: Need some help
>> To: user@spark.apache.org
>>
>> Hi all,
>>
>> Aakash here, need a little help in KMeans clustering.
>>
>> This is needed to be done:
>>
>> "Implement Kmeans Clustering Algorithm without using the libraries of
>> Spark. You're given a txt file with object ids and features from which you
>> have to use the features as your data points. This will be a part of the
>> code itself"
>>
>> PFA the file with ObjectIDs and features. Now how to go ahead and work on
>> it?
>>
>> Thanks,
>> Aakash.
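The assignment says the txt file holds object IDs plus features, and only the features are the data points. The attachment's layout isn't visible here, so this is a minimal sketch assuming each line is an object ID followed by numeric features, separated by whitespace or commas. `parse_line` and `parse_lines` are hypothetical helper names. In Spark, the same function could be applied with something like `sc.textFile(path).filter(lambda l: l.strip()).map(parse_line)`, and the feature tuples are then what go into the clustering step.

```python
import re


def parse_line(line):
    """Split 'objectId f1 f2 ...' into (objectId, tuple_of_floats).

    Assumes the ID comes first and fields are separated by whitespace
    and/or commas; adjust to the real file format.
    """
    fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
    object_id = fields[0]
    features = tuple(float(f) for f in fields[1:])
    return object_id, features


def parse_lines(lines):
    """Parse every non-blank line; the IDs are kept only to label results."""
    return [parse_line(line) for line in lines if line.strip()]
```

Keeping the ID next to each vector makes it easy to report which object ended up in which cluster once the centers are found.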