Re: Mahout rowSimilarity

2016-05-04 Thread Pat Ferrel
Here is an example that takes a PairRDD, which is an RDD of pairs of strings. Each pair is expected to hold a row-id and a column-id, so this method takes each element of the sparse matrix as a separate input. So if the row-id is a user-id and the column-id is an item-id, it will turn them into an IndexedDatas
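The sketch below makes the pair-to-matrix idea concrete in plain Scala, without Spark or Mahout. Each distinct string ID gets an integer index, and each (row-id, column-id) pair becomes one nonzero cell. The names PairIndexing, Indexed and toIndexed are hypothetical and used only for illustration. This is not Mahout's IndexedDatasetSpark code, which, as I understand it, does the equivalent on RDDs and keeps the ID dictionaries next to a distributed row matrix.

```scala
// Illustration only (hypothetical names): turns (rowId, columnId) string
// pairs into integer-indexed sparse cells. Not Mahout's implementation.
object PairIndexing {
  // Hypothetical result: string -> int dictionaries plus the nonzero cells.
  final case class Indexed(
      rowIds: Map[String, Int],
      colIds: Map[String, Int],
      cells: Set[(Int, Int)])

  def toIndexed(pairs: Seq[(String, String)]): Indexed = {
    // Assign indices in first-seen order (distinct keeps the first occurrence).
    val rowIds = pairs.map(_._1).distinct.zipWithIndex.toMap
    val colIds = pairs.map(_._2).distinct.zipWithIndex.toMap
    // Each pair marks one cell; duplicate pairs collapse because this is a Set.
    val cells = pairs.map { case (r, c) => (rowIds(r), colIds(c)) }.toSet
    Indexed(rowIds, colIds, cells)
  }
}
```

For example, the pairs (u1, iA), (u1, iB), (u2, iA) give row indices u1 -> 0 and u2 -> 1, column indices iA -> 0 and iB -> 1, and the cells (0,0), (0,1) and (1,0).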

Re: Mahout rowSimilarity

2016-05-04 Thread Rohit Jain
I am still searching for an answer. It would be great if somebody could help me with this :) On Wed, May 4, 2016 at 11:25 AM, Rohit Jain wrote: > And if yes, can you please help me understand what exactly you mean by "You > can then just write some simple pre processing code that converts your

Re: Mahout rowSimilarity

2016-05-03 Thread Rohit Jain
And if yes, can you please help me understand what exactly you mean by "You can then just write some simple pre processing code that converts your database files to the appropriate format for Mahout and read it in as an indexed dataset." On Wed, May 4, 2016 at 11:21 AM, Rohit Jain wrote: > Hello Ni

Re: Mahout rowSimilarity

2016-05-03 Thread Rohit Jain
Hello Nikaash, So you mean I need to first read data from my MongoDB using Scala's Mongo driver, then convert it into indexed datasets, and then process it using row similarity? On Wed, May 4, 2016 at 7:56 AM, Nikaash Puri wrote: > Hi Rohit, > > This would be a good place to start. > https://g
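Rohit describes a flattening step: going from one database record per product to the (row-id, column-id) string pairs that a PairRDD-based loader takes. In plain Scala that step might look like the sketch below. It assumes each product has already been read from MongoDB into a hypothetical ProductDoc(id, tags) value. The Mongo driver call is deliberately left out, and Flatten.toPairs is an illustrative name, not a Mahout API.

```scala
object Flatten {
  // Hypothetical shape of a product record after it has been loaded from
  // MongoDB. The driver call that produces it is not shown.
  final case class ProductDoc(id: String, tags: Seq[String])

  // One (productId, tag) pair per distinct tag: the row-id/column-id
  // layout expected by a loader that takes string pairs.
  def toPairs(docs: Seq[ProductDoc]): Seq[(String, String)] =
    for {
      doc <- docs
      tag <- doc.tags.distinct
    } yield (doc.id, tag)
}
```

In Spark the same logic would sit inside an RDD.flatMap over the loaded records, for example docs.flatMap(d => d.tags.distinct.map(t => (d.id, t))). That yields the RDD of string pairs Pat describes.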

Re: Mahout rowSimilarity

2016-05-03 Thread Nikaash Puri
Hi Rohit, This would be a good place to start. https://github.com/apache/mahout/blob/master/spark/src/main/scala/org/apache/mahout/drivers/RowSimilarityDriver.scala This bit

Re: Mahout rowSimilarity

2016-05-03 Thread Rohit Jain
Hello Pat, Can you please explain it in a little more detail? I didn't understand how to go about it. On Tue, May 3, 2016 at 9:08 PM, Pat Ferrel wrote: > Sure, but at least some would be Scala. There are examples in Mahout that > take PairRDDs as input but anything that constructs an IndexedDataset wou

Re: Mahout rowSimilarity

2016-05-03 Thread Pat Ferrel
Sure, but at least some would be Scala. There are examples in Mahout that take PairRDDs as input but anything that constructs an IndexedDataset would be fine. I use this code in a system that creates an RDD from HBase. Think of the task as one of how to create a Spark RDD from your DB content.

Mahout rowSimilarity

2016-05-03 Thread Rohit Jain
Hello Everyone, I have products, and each product has certain associated tags. To find similar products I am using Mahout's spark-rowsimilarity algorithm in the following manner. $MAHOUT_HOME/mahout spark-rowsimilarity -i hdfs://0.0.0.0:9000/wtrousers -o hdfs://0.0.0.0:9000/s_trousers_out1/ -
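For a text-input run like the one above, the "simple pre-processing code" suggested in this thread could be as small as the sketch below. It turns a product-to-tags map into one line per product. The layout (product ID, a tab, then space-separated tag IDs) is my assumption. Check the delimiter options that your Mahout version's spark-rowsimilarity driver reports before relying on it. RowSimInput and toLines are hypothetical names.

```scala
object RowSimInput {
  // ASSUMED layout: rowId, rowDelim (tab), then column IDs separated by
  // elementDelim (space). The delimiters are parameters so they can be
  // changed to whatever your Mahout version's driver actually expects.
  def toLines(
      productTags: Map[String, Seq[String]],
      rowDelim: String = "\t",
      elementDelim: String = " "): Seq[String] =
    productTags.toSeq
      .sortBy(_._1) // sort by product ID only to make output deterministic
      .map { case (id, tags) =>
        id + rowDelim + tags.distinct.mkString(elementDelim)
      }
}
```

Repeated tags for a product are dropped. Tags that themselves contain a tab or a space would need escaping or re-encoding, which this sketch does not do.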

Mahout RowSimilarity

2016-02-12 Thread Remy
Hello, I am trying to run the RowSimilarity algorithm on Mahout 0.11.0 (on a single-node cluster), but I cannot find a way to transform my data into the appropriate SequenceFile format. My data is in a CSV file, with each row being user_id, tag_id, rating:

4, 1233, 0.3
4, 98, 0.7
12,
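A first step toward Remy's goal is to parse the user_id, tag_id, rating rows and group them into one sparse row per user. The sketch below does only that, in plain Scala. Writing the result out as a SequenceFile would also need Hadoop's and Mahout's writer classes and is not shown. CsvRows.parse is a hypothetical helper.

```scala
object CsvRows {
  // Parses "user_id, tag_id, rating" lines (spaces around commas allowed)
  // and groups them into sparse rows: userId -> (tagId -> rating).
  def parse(lines: Seq[String]): Map[Int, Map[Int, Double]] =
    lines
      .map(_.trim)
      .filter(_.nonEmpty) // skip blank lines
      .map { line =>
        val fields = line.split(",").map(_.trim)
        require(fields.length == 3, s"expected 3 fields in: $line")
        (fields(0).toInt, fields(1).toInt, fields(2).toDouble)
      }
      .groupBy(_._1) // one group per user_id
      .map { case (user, triples) =>
        // For a repeated (user, tag) pair, the later rating wins in toMap.
        user -> triples.map { case (_, tag, rating) => tag -> rating }.toMap
      }
}
```

With the two complete rows from the post, CsvRows.parse(Seq("4, 1233, 0.3", "4, 98, 0.7")) returns a map equal to Map(4 -> Map(1233 -> 0.3, 98 -> 0.7)).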