I'm working on implementing LSH on Spark. I started from the implementation
provided by SoundCloud:
https://github.com/soundcloud/cosine-lsh-join-spark/blob/master/src/main/scala/com/soundcloud/lsh/Lsh.scala
When I check the Spark Web UI, I see that after calling sortBy, the number of
partitions of the RDD decreases.
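
For reference, here is a minimal standalone sketch (not the SoundCloud code; the sample data is made up) showing how the partition count can be pinned by passing numPartitions to sortBy explicitly:

import org.apache.spark.{SparkConf, SparkContext}

object SortByPartitionsCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sortBy-partitions").setMaster("local[4]"))

    // Made-up data: 8 initial partitions of (key, value) pairs
    val rdd = sc.parallelize(1 to 1000, numSlices = 8).map(i => (i % 7, i))
    println(s"partitions before sortBy: ${rdd.partitions.length}")

    // sortBy takes an optional numPartitions argument; passing it explicitly pins the count
    val sorted = rdd.sortBy(_._1, ascending = true, numPartitions = rdd.partitions.length)
    println(s"partitions after sortBy:  ${sorted.partitions.length}")

    sc.stop()
  }
}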
I'm training a model using MLlib. When I try to split the data into training
and test sets, I run into a weird problem, and I can't figure out what is
happening here.
Here is the code from my experiment:
import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.rdd.RDD

val logData = rdd.map(x => (x._1, x._2)).distinct()
// Completing the truncated line: assuming Int ids and an implicit rating of 1.0 per pair
val ratings: RDD[Rating] = logData.map(x => Rating(x._1, x._2, 1.0))
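
The split itself uses RDD.randomSplit; a minimal continuation of the code above (the 0.8/0.2 weights and the seed are placeholders, not my exact values):

// Continuing from the ratings RDD above; weights and seed are placeholder values
val Array(training, test) = ratings.randomSplit(Array(0.8, 0.2), seed = 42L)
training.cache()
test.cache()
println(s"training: ${training.count()}, test: ${test.count()}")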