Re: Error using collectAsMap() in scala

2016-03-21 Thread Akhil Das
Adding the user list back for the second question.

> Thanks a lot. That worked.
>
> Now I just have one more question. In the ALS algorithm, I want to store the generated model in a CSV file. *model.save(path)* saves the model in Parquet format.
>
> And when I try to save *MatrixFactorizationModel* as a
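Assuming "the model in a CSV file" means the learned factor vectors, here is a minimal sketch. `MatrixFactorizationModel` exposes `userFeatures` and `productFeatures` as `RDD[(Int, Array[Double])]`, so each vector can be formatted as one CSV line and written out with `saveAsTextFile`. The names `AlsCsvSketch`, `toCsvLine` and `fromCsvLine` are hypothetical. The formatting logic is shown on plain Scala values so it can be tried without a cluster.

```scala
// Local sketch: format ALS factor vectors as CSV lines.
// With Spark MLlib, the same function would be applied as
//   model.userFeatures.map { case (id, f) => AlsCsvSketch.toCsvLine(id, f) }.saveAsTextFile(path)
// and likewise for model.productFeatures.
object AlsCsvSketch {
  // One line per factor vector: the user/product id followed by its latent features.
  def toCsvLine(id: Int, features: Array[Double]): String =
    (id.toString +: features.map(_.toString)).mkString(",")

  // Inverse of toCsvLine, e.g. for reading the exported vectors back in.
  def fromCsvLine(line: String): (Int, Array[Double]) = {
    val fields = line.split(",")
    (fields.head.toInt, fields.tail.map(_.toDouble))
  }
}
```

Note that `saveAsTextFile` writes a directory of part files rather than a single `.csv` file.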

Re: Error using collectAsMap() in scala

2016-03-20 Thread Akhil Das
What you should be doing is a join, something like this:

    // Create a key-value pair, the key being column1
    val rdd1 = sc.textFile(file1).map(x => (x.split(",")(0), x.split(",")))
    // Create a key-value pair, the key being column2
    val rdd2 = sc.textFile(file2).map(x => (x.split(",")(1), x.split(",")))
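The keyed join suggested above can be checked on local data first. This is a minimal sketch that assumes simple comma-separated lines with no quoted fields. `keyByColumn` keys each line by a chosen column and splits each line only once. `innerJoin` reproduces the semantics of `PairRDDFunctions.join` on plain `Seq`s, emitting one output pair for every combination of matching keys. `JoinSketch` and both helper names are hypothetical, and the corresponding Spark calls appear only in the comments.

```scala
// Local sketch of the keyed-join approach. With Spark the equivalent would be:
//   val rdd1 = sc.textFile(file1).map { line => val f = line.split(","); (f(0), f) }
//   val rdd2 = sc.textFile(file2).map { line => val f = line.split(","); (f(1), f) }
//   val joined = rdd1.join(rdd2) // RDD[(String, (Array[String], Array[String]))]
object JoinSketch {
  // Key each CSV line by the field at index `col` (0-based), keeping all fields as the value.
  def keyByColumn(lines: Seq[String], col: Int): Seq[(String, Array[String])] =
    lines.map { line =>
      val fields = line.split(",")
      (fields(col), fields)
    }

  // Inner join: keys present on only one side are dropped,
  // and every matching (left, right) combination is emitted.
  def innerJoin[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Seq[(K, (A, B))] = {
    val rightByKey: Map[K, Seq[B]] =
      right.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    for {
      (k, a) <- left
      b <- rightByKey.getOrElse(k, Seq.empty)
    } yield (k, (a, b))
  }
}
```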

Re: Error using collectAsMap() in scala

2016-03-20 Thread Shishir Anshuman
I have stored the contents of two CSV files in separate RDDs.
file1.csv format: (column1, column2, column3)
file2.csv format: (column1, column2)
column1 of file1 and column2 of file2 contain similar data. I want to compare the two columns and, if a match is found:
- Replace the data at *c

Re: Error using collectAsMap() in scala

2016-03-20 Thread Prem Sure
Any specific reason you would like to use collectAsMap only? You could probably work with a normal RDD instead of a pair RDD.

On Monday, March 21, 2016, Mark Hamstra wrote:
> You're not getting what Ted is telling you. Your `dict` is an RDD[String]
> -- i.e. it is a collection of a single value type, Strin

Re: Error using collectAsMap() in scala

2016-03-20 Thread Mark Hamstra
You're not getting what Ted is telling you. Your `dict` is an RDD[String] -- i.e. it is a collection of a single value type, String. But `collectAsMap` is only defined for PairRDDs that have key-value pairs for their data elements. Both a key and a value are needed to collect into a Map[K, V].
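To make `collectAsMap` available, `dict` first has to become an RDD of pairs. Here is a minimal sketch that assumes each CSV line holds the key in its first column and everything after the first comma is the value. The thread does not show the file's layout, so that part is an assumption. `PairSketch`, `parsePair` and `toPairs` are hypothetical names, and the logic runs on a local `Seq` so it can be tried without Spark.

```scala
// Local sketch of turning an RDD[String] into key-value pairs before collectAsMap.
// With Spark the same parser would be used as:
//   dict.flatMap(line => PairSketch.parsePair(line).toList).collectAsMap()
object PairSketch {
  // Assumed layout: key in the first column, the rest of the line as the value.
  // split(",", 2) splits on the first comma only; lines without a comma yield None.
  def parsePair(line: String): Option[(String, String)] =
    line.split(",", 2) match {
      case Array(key, value) => Some((key, value))
      case _                 => None
    }

  // Local equivalent of dict.flatMap(...): keeps only the parseable lines.
  def toPairs(lines: Seq[String]): Seq[(String, String)] =
    lines.flatMap(line => parsePair(line).toList)
}
```

If a key appears on several lines, `collectAsMap` keeps only one value for it, since it does not build a multimap.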

Re: Error using collectAsMap() in scala

2016-03-19 Thread Ted Yu
It is defined in: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala

On Thu, Mar 17, 2016 at 8:55 PM, Shishir Anshuman wrote:
> I am using the following code snippet in Scala:
>
> val dict: RDD[String] = sc.textFile("path/to/csv/file")
> val dict_broadcast=sc.broadcast(dict.collect
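Ted's pointer explains the compile error. `collectAsMap` lives in `PairRDDFunctions`, which Spark attaches to an `RDD[(K, V)]` through an implicit conversion, so an `RDD[String]` never gets the method. The same mechanism can be sketched in plain Scala with a hypothetical implicit class that adds a method only to sequences of tuples. `PairOpsSketch`, `PairOps` and `collectAsMapLocal` are illustrative names, not Spark API.

```scala
object PairOpsSketch {
  // Hypothetical analogue of PairRDDFunctions: the conversion only applies
  // when the element type is a (K, V) tuple.
  implicit class PairOps[K, V](self: Seq[(K, V)]) {
    // Mirrors the shape of collectAsMap for a local Seq.
    def collectAsMapLocal: Map[K, V] = self.toMap
  }
}

// Usage:
//   import PairOpsSketch._
//   Seq("a" -> 1, "b" -> 2).collectAsMapLocal   // Map(a -> 1, b -> 2)
//   Seq("a", "b").collectAsMapLocal             // does not compile: no PairOps for Seq[String]
```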