Hi, I'm new to Spark. I'm trying to compute similarity between users/products. I have a huge table, and I can't do a self-join on it with the cluster I have.
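For reference, the exact computation I'd like (but can't run at full scale) would look roughly like the self-join below. This is just a toy sketch in the DataFrame API; the column names user/item and the sample rows are placeholders for my real schema:

    import org.apache.spark.sql.SparkSession

    object SelfJoinBaseline {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("self-join-baseline").getOrCreate()
        import spark.implicits._

        // Placeholder data with the same two-column shape as my real table: (user, item).
        val df = Seq(("u1", "p1"), ("u2", "p1"), ("u2", "p2"), ("u3", "p2")).toDF("user", "item")

        // Exact co-occurrence counts: self-join on item, then count shared items per user pair.
        val cooccurrence = df.as("a")
          .join(df.as("b"), $"a.item" === $"b.item" && $"a.user" =!= $"b.user")
          .groupBy($"a.user", $"b.user")
          .count()

        cooccurrence.show()
        spark.stop()
      }
    }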
So I'm trying to implement the self-join approximately using a random walk approach. The table is a bipartite graph with two columns.

The idea:
- Pick a random element (t1) from the first column, then randomly pick an element (t2) connected to t1 in the graph.
- Look up the elements connected to t2 in the graph and pick one of them at random, say t3.
- Create an edge between t1 and t3.
- Iterate on the order of at least n*n times so that the results approximate the full self-join.

Questions:
1. Is Spark a suitable environment for this?
2. I've coded the logic for picking elements at random, but I'm running into issues when building the graph (a rough sketch of what I'm attempting is below).
3. Should I consider GraphX?

Any help is highly appreciated.

Regards,
Naveen
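P.S. Here is a rough, self-contained sketch of the sampling pass I have in mind, using plain RDDs and toy data (user/item names are placeholders). Instead of drawing one random edge at a time, it draws one random neighbor per existing edge per pass, which I believe gives the same kind of approximation:

    import org.apache.spark.sql.SparkSession
    import scala.util.Random

    object RandomWalkSimilaritySketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("random-walk-similarity-sketch").getOrCreate()
        val sc = spark.sparkContext

        // Toy bipartite edges (user, item); in the real job this is the huge two-column table.
        val edges = sc.parallelize(Seq(
          ("u1", "p1"), ("u2", "p1"), ("u2", "p2"), ("u3", "p2"), ("u3", "p1")
        ))

        // For every item t2, collect the users connected to it (the candidates for t3).
        val usersByItem = edges.map { case (user, item) => (item, user) }
          .groupByKey()
          .mapValues(_.toIndexedSeq)

        // One sampling pass: for each edge (t1, t2), pick a random user t3 among t2's
        // users and count the sampled pair (t1, t3).
        val sampledPairs = edges.map { case (t1, t2) => (t2, t1) }
          .join(usersByItem)                             // (t2, (t1, users of t2))
          .map { case (_, (t1, users)) =>
            val t3 = users(Random.nextInt(users.size))
            ((t1, t3), 1L)
          }
          .filter { case ((t1, t3), _) => t1 != t3 }     // skip self-pairs
          .reduceByKey(_ + _)                            // approximate co-occurrence counts

        sampledPairs.collect().foreach(println)
        spark.stop()
      }
    }

My plan was to repeat the pass (or sample only a fraction of the edges each pass) and keep summing the counts until the approximation is good enough.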