Re: How to binarize data in spark

2015-08-07 Thread Adamantios Corais
I have ended up with the following piece of code, but it turns out to be really slow. Any other ideas, given that I can only use MLlib 1.2? val data = test11.map(x=> ((x(0) , x(1)) , x(2))).groupByKey().map(x=> (x._1 , x._2.toArray)).map{x=> var lt : Array[Double] = new Array[Double](test12.s
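
One likely cost in the code above is allocating a dense Array[Double] with one slot per product for every user. A minimal plain-Scala sketch of the sparse alternative, on local collections with no Spark dependency: build the product-to-column map once, then keep only the indices of the products each (user, class) pair actually has. The sample rows and the names `rows`, `productIndex` and `toSparse` are hypothetical. In Spark 1.2 the resulting index arrays could presumably be passed to `Vectors.sparse` from `org.apache.spark.mllib.linalg` with a value of 1.0 per index, and a product map of about 1M entries would probably need to be shipped to the executors with `sc.broadcast`.

```scala
// Hypothetical sample rows in the (user, class, product) layout from the question.
val rows: Seq[(String, String, String)] = Seq(
  ("user1", "class1", "product1"),
  ("user1", "class1", "product2"),
  ("user1", "class1", "product5"),
  ("user2", "class1", "product2"),
  ("user2", "class1", "product5"),
  ("user3", "class2", "product1")
)

// Build the product -> column index map once (sorted so the result is deterministic).
val productIndex: Map[String, Int] =
  rows.map(_._3).distinct.sorted.zipWithIndex.toMap

// For each (user, class), keep only the sorted, de-duplicated column indices of
// the products that appear; every other column is implicitly 0.0.
def toSparse(rs: Seq[(String, String, String)],
             index: Map[String, Int]): Map[(String, String), Array[Int]] =
  rs.groupBy(r => (r._1, r._2)).map { case (key, group) =>
    key -> group.map(r => index(r._3)).distinct.sorted.toArray
  }

val sparse = toSparse(rows, productIndex)
```

The indices are kept sorted because sparse vector formats generally expect increasing indices; memory per user then grows with the number of products bought rather than with the 1M-product catalogue.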

Re: How to binarize data in spark

2015-08-06 Thread Yanbo Liang
I think you want to flatten the 1M products into a vector of 1M elements, most of which are, of course, zero. It looks like HashingTF can help you.
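
The idea behind HashingTF (the "hashing trick") can be sketched in plain Scala: each product string is mapped to a column by taking its hash modulo a fixed number of features, so no product dictionary has to be built or distributed. This is an illustration of the technique, not MLlib's source; `hashIndex`, `binaryHashed` and the choice `numFeatures = 1 << 20` are assumptions. Two different products can collide on the same column. Note also that MLlib's HashingTF produces term counts, so a product listed twice for a user would give 2.0 there, whereas this sketch records presence only.

```scala
// Map an arbitrary term to a non-negative column index in [0, numFeatures).
def hashIndex(term: Any, numFeatures: Int): Int = {
  val raw = term.## % numFeatures
  if (raw < 0) raw + numFeatures else raw
}

// Binary (presence/absence) features: each product a user bought sets its
// hashed column to 1.0; repeated purchases do not increase the value.
def binaryHashed(products: Iterable[String], numFeatures: Int): Map[Int, Double] =
  products.map(p => hashIndex(p, numFeatures) -> 1.0).toMap

val numFeatures = 1 << 20 // about 1M columns, in line with the product count
val user1 = binaryHashed(Seq("product1", "product2", "product5", "product2"), numFeatures)
```

The map returned by `binaryHashed` holds only the non-zero entries, which is the same information a sparse vector of size `numFeatures` would carry.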

Re: How to binarize data in spark

2015-08-06 Thread praveen S
Use StringIndexer in MLlib 1.4: https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/ml/feature/StringIndexer.html
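
StringIndexer maps each distinct string to a numeric index ordered by label frequency, so the most frequent label gets 0.0. Since StringIndexer is not available in MLlib 1.2, a plain-Scala sketch of a similar mapping may help; the helper name `frequencyIndex` and the alphabetical tie-breaking are assumptions, not StringIndexer's exact behaviour.

```scala
// Assign each distinct label an index ordered by descending frequency,
// breaking ties alphabetically so the result is deterministic.
def frequencyIndex(labels: Seq[String]): Map[String, Double] =
  labels
    .groupBy(identity)
    .toSeq
    .sortBy { case (label, occurrences) => (-occurrences.size, label) }
    .zipWithIndex
    .map { case ((label, _), i) => label -> i.toDouble }
    .toMap

// Class column of the sample rows from the original question.
val classes = Seq("class1", "class1", "class1", "class1", "class1", "class2")
val classIndex = frequencyIndex(classes)
```

The indices are Double because MLlib labels (for example in LabeledPoint) are Double; the same function could index the product column, giving each product a column position.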

How to binarize data in spark

2015-08-06 Thread Adamantios Corais
I have a set of data based on which I want to create a classification model. Each row has the following form:

user1,class1,product1
user1,class1,product2
user1,class1,product5
user2,class1,product2
user2,class1,product5
user3,class2,product1
etc.

There are about 1M users, 2 classes, a
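
A plain-Scala sketch of reading rows in this format and collecting the set of products for each (user, class) pair, which is the shape a per-user binary feature vector is built from. It assumes comma-separated lines with no header and that each user has a single class; `parseRows` is a hypothetical helper, and malformed lines are simply skipped.

```scala
// Parse "user,class,product" lines and group the products per (user, class).
// Lines that do not split into exactly three fields are skipped.
def parseRows(lines: Seq[String]): Map[(String, String), Set[String]] =
  lines
    .map(_.trim.split(","))
    .collect { case Array(user, cls, product) => (user.trim, cls.trim, product.trim) }
    .groupBy(r => (r._1, r._2))
    .map { case (key, group) => key -> group.map(_._3).toSet }

val sample = Seq(
  "user1,class1,product1",
  "user1,class1,product2",
  "user1,class1,product5",
  "user2,class1,product2",
  "user2,class1,product5",
  "user3,class2,product1"
)
val perUser = parseRows(sample)
```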