Re: MLLib sample data format

2014-06-22 Thread Justin Yip
I see. That's good. Thanks. Justin On Sun, Jun 22, 2014 at 4:59 PM, Evan Sparks wrote: > Oh, and the movie lens one is userid::movieid::rating > > - Evan > > On Jun 22, 2014, at 3:35 PM, Justin Yip wrote: > > Hello, > > I am looking into a couple of MLLib data files in > https://github.com/ap

Re: MLLib sample data format

2014-06-22 Thread Evan Sparks
Oh, and the movie lens one is userid::movieid::rating - Evan > On Jun 22, 2014, at 3:35 PM, Justin Yip wrote: > > Hello, > > I am looking into a couple of MLLib data files in > https://github.com/apache/spark/tree/master/data/mllib. But I cannot find any > explanation for these files? Does a
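
For reference, a line in the userid::movieid::rating layout described above could be split along these lines. This is only a minimal sketch in plain Python: the function name parse_rating is hypothetical, and it assumes every line has exactly three "::"-separated fields with integer ids and a numeric rating.

```python
def parse_rating(line):
    """Split one 'userid::movieid::rating' record into typed fields.

    Assumes exactly three '::'-separated fields; a line with more or fewer
    fields raises ValueError on unpacking.
    """
    user_id, movie_id, rating = line.strip().split("::")
    return int(user_id), int(movie_id), float(rating)


if __name__ == "__main__":
    # Illustrative record, not taken from the actual data file.
    print(parse_rating("0::2::3"))
```

In Spark the same function could presumably be mapped over the lines returned by sc.textFile, though that is not shown in the thread.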

Re: MLLib sample data format

2014-06-22 Thread Evan Sparks
These files follow the libsvm format where each line is a record, the first column is a label, and then after that the fields are offset:value where offset is the offset into the feature vector, and value is the value of the input feature. This is a fairly efficient representation for sparse b
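
To make the "label, then offset:value pairs" layout concrete, here is a rough sketch of a parser in plain Python. The function name parse_libsvm_line is hypothetical, and the sketch assumes the offsets in the file are 1-based (the usual LIBSVM convention), so it subtracts one to get zero-based positions; drop that adjustment if your file is already zero-based.

```python
def parse_libsvm_line(line):
    """Parse one 'label offset:value offset:value ...' record.

    Returns (label, features), where features maps a zero-based offset
    into the feature vector to its value. Only the non-zero entries listed
    on the line are stored, which is what makes the format sparse.
    """
    parts = line.strip().split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        offset, value = item.split(":")
        # Assumption: offsets in the file are 1-based.
        features[int(offset) - 1] = float(value)
    return label, features


if __name__ == "__main__":
    # Illustrative record, not taken from the actual data files.
    print(parse_libsvm_line("1 3:0.5 10:2.0"))
```

A dictionary is used here only to keep the sketch dependency-free; a sparse vector type would be the more natural target inside MLlib.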

Re: MLLib sample data format

2014-06-22 Thread Justin Yip
Hi Shuo, Yes. I was reading the guide as well as the sample code. For example, the guide at http://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-support-vector-machine-svm uses sc.textFile("mllib/data/ridge-data/lpsa.data"), but nowhere in the GitHub repository can I find that file. Thanks. Justi

Re: MLLib sample data format

2014-06-22 Thread Shuo Xiang
Hi, you might find http://spark.apache.org/docs/latest/mllib-guide.html helpful. On Sun, Jun 22, 2014 at 2:35 PM, Justin Yip wrote: > Hello, > > I am looking into a couple of MLLib data files in > https://github.com/apache/spark/tree/master/data/mllib. But I cannot find > any explanation for th