Hi,

I'm new to Spark and I'm trying to compute similarity between users/products.
I have a huge table, and the cluster I have can't handle a self join on it.

I'm trying to approximate the self join using a random walk methodology,
which should give approximately the same results. The table is a bipartite
graph stored as 2 columns.

Idea:
1. Take any element (t1) in the first column at random.
2. Pick the corresponding element (t2) for t1 in the graph.
3. Look up the possible elements for t2 in the graph and pick one at random, say t3.
4. Create an edge between t1 and t3.
5. Iterate on the order of at least n*n times so that the results approximate the self join.
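
The steps above can be sketched on a single machine as follows. This is plain Python rather than Spark, intended only to pin down the sampling logic. The names (`sample_similarity_edges`, `edges`, `num_walks`) are illustrative assumptions, and picking t2 uniformly at random and skipping pairs where t1 equals t3 are also assumptions, not something stated in the idea.

```python
import random
from collections import Counter, defaultdict


def sample_similarity_edges(edges, num_walks, seed=None):
    """Approximate a self join of a bipartite edge list with 2-step random walks.

    edges: iterable of (left, right) pairs, e.g. (user, product).
    Returns a Counter mapping (t1, t3) to the number of walks that linked them.
    """
    rng = random.Random(seed)

    # Adjacency lists for both directions of the bipartite graph.
    left_to_right = defaultdict(list)
    right_to_left = defaultdict(list)
    for left, right in edges:
        left_to_right[left].append(right)
        right_to_left[right].append(left)

    left_nodes = list(left_to_right)
    counts = Counter()
    for _ in range(num_walks):
        t1 = rng.choice(left_nodes)         # step 1: random element of column 1
        t2 = rng.choice(left_to_right[t1])  # step 2: a neighbour of t1 (assumed uniform)
        t3 = rng.choice(right_to_left[t2])  # step 3: random neighbour of t2
        if t1 != t3:                        # assumption: ignore self-pairs
            counts[(t1, t3)] += 1           # step 4: record the edge t1-t3
    return counts


# Hypothetical toy data: u1 and u2 share p1, u2 and u3 share p2.
edges = [("u1", "p1"), ("u2", "p1"), ("u2", "p2"), ("u3", "p2")]
pairs = sample_similarity_edges(edges, num_walks=1000, seed=42)
```

Every pair that appears in `pairs` is a pair the exact self join would also produce, and the counts could be normalised into a rough similarity score. The sketch keeps both adjacency maps in memory, so it does not by itself address how to distribute the lookups on a cluster.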

Questions:

1. Is Spark a suitable environment for this?
2. I've coded the logic for picking elements at random, but I'm facing
   issues when building the graph.
3. Should I consider GraphX?

Any help is highly appreciated.

Regards,
Naveen


