Adding the user list back for the second question.
Thanks a lot. That worked.
>
> Now I just have one more doubt. In ALS algorithm, I want to store the
> generated model in a csv file.
> *model.save(path)* saves the model in Parquet format.
>
> And when I try to save *MatrixFactorizationModel *as a
What you should be doing is a join, something like this:
// Create a (key, value) pair, the key being column1
val rdd1 = sc.textFile(file1).map(x => (x.split(",")(0), x.split(",")))
// Create a (key, value) pair, the key being column2
val rdd2 = sc.textFile(file2).map(x => (x.split(",")(1), x.split(",")))
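To complete the idea, once both RDDs are keyed on the shared column you can join them and combine the matching rows. A minimal sketch, assuming `sc` is an active SparkContext and `rdd1`/`rdd2` are the keyed RDDs from above (exactly which fields get replaced on a match is up to the original question, so the combining step here is only illustrative):

```scala
// Join on the shared key: column1 of file1 against column2 of file2.
// The result type is RDD[(String, (Array[String], Array[String]))],
// with one entry per key that appears in both files.
val joined = rdd1.join(rdd2)

// For each match, build whatever output row you need from the two
// sides -- here we simply keep the key plus both original rows.
val combined = joined.map { case (key, (row1, row2)) =>
  (key, row1, row2)
}
```

Keys present in only one file are dropped by `join`; use `leftOuterJoin` or `fullOuterJoin` if unmatched rows must be kept.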
I have stored the contents of two csv files in separate RDDs.
file1.csv format*: (column1,column2,column3)*
file2.csv format*: (column1, column2)*
*column1 of file1* and *column2 of file2* contain similar data. I want to
compare the two columns and, if a match is found:
- Replace the data at *c
Any specific reason you would like to use collectAsMap only? You could
probably use a normal RDD instead of a Pair.
On Monday, March 21, 2016, Mark Hamstra wrote:
You're not getting what Ted is telling you. Your `dict` is an RDD[String]
-- i.e. it is a collection of a single value type, String. But
`collectAsMap` is only defined for PairRDDs that have key-value pairs for
their data elements. Both a key and a value are needed to collect into a
Map[K, V].
It is defined in:
core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
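To illustrate the distinction, here is a minimal sketch (assuming `sc` is an active SparkContext and the first CSV field can serve as a key): `collect` works on any RDD, while `collectAsMap` becomes available only after mapping each line into a (key, value) tuple, i.e. turning the RDD[String] into a pair RDD.

```scala
import org.apache.spark.rdd.RDD

// collect() works on any RDD and returns an Array:
val lines: RDD[String] = sc.textFile("path/to/csv/file")
val arr: Array[String] = lines.collect()

// collectAsMap() needs key-value pairs, so first map each line
// into a (key, value) tuple -- here, first CSV field -> whole line:
val pairs: RDD[(String, String)] = lines.map { line =>
  (line.split(",")(0), line)
}
val dictMap: scala.collection.Map[String, String] = pairs.collectAsMap()
```

Note that both `collect` and `collectAsMap` pull the entire dataset to the driver, so they are only appropriate when the data fits in driver memory (which is also the precondition for broadcasting it).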
On Thu, Mar 17, 2016 at 8:55 PM, Shishir Anshuman wrote:
> I am using the following code snippet in Scala:
>
>
> *val dict: RDD[String] = sc.textFile("path/to/csv/file")*
> *val dict_broadcast=sc.broadcast(dict.collect