Re: Document Similarity -Spark Mllib

2016-12-15 Thread Liang-Chi Hsieh
like it might not work much better than brute force even if you set a higher threshold. - Liang-Chi Hsieh | @viirya Spark Technology Center http://www.spark.tc/ -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Document-Similarity-Spark-Mllib
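The brute-force baseline mentioned in this reply amounts to scoring every pair of documents. Below is a minimal pure-Python sketch (not Spark code). The whitespace tokenizer, the sample addresses and the helpers `to_vector`, `cosine` and `all_pairs_above` are hypothetical and only illustrate the idea. Note that in brute force a threshold only filters the output: n(n-1)/2 pairs are scored however high it is set.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def to_vector(text):
    """Hypothetical featurizer: bag of lowercase whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def all_pairs_above(docs, threshold):
    """Score every pair (i, j) with i < j and keep those >= threshold.

    Returns (matches, pairs_scored); pairs_scored is always
    n * (n - 1) / 2, independent of the threshold.
    """
    vectors = [to_vector(d) for d in docs]
    matches = []
    pairs_scored = 0
    for i, j in combinations(range(len(vectors)), 2):
        pairs_scored += 1
        sim = cosine(vectors[i], vectors[j])
        if sim >= threshold:
            matches.append((i, j, sim))
    return matches, pairs_scored


if __name__ == "__main__":
    addresses = [
        "12 Main St Springfield",
        "12 Main Street Springfield",
        "99 Elm Ave Shelbyville",
    ]
    matches, scored = all_pairs_above(addresses, threshold=0.5)
    print("pairs scored:", scored)
    for i, j, sim in matches:
        print(i, j, round(sim, 3))
```

Raising `threshold` shrinks `matches` but leaves `pairs_scored` unchanged, which is why a threshold alone does not make the exhaustive approach cheaper.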

Re: Document Similarity -Spark Mllib

2016-12-13 Thread Liang-Chi Hsieh
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: Document Similarity -Spark Mllib

2016-12-13 Thread satyajit vegesna

Re: Document Similarity -Spark Mllib

2016-12-10 Thread Liang-Chi Hsieh

Document Similarity -Spark Mllib

2016-12-09 Thread satyajit vegesna
Hi all, I am trying to implement an MLlib Spark job to find the similarity between documents (in my case, basically home addresses). I believe I cannot use DIMSUM for my use case, as DIMSUM works well only with tall, thin matrices (many rows, few columns). matrix example format, for my us
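For context, MLlib's `RowMatrix.columnSimilarities()` computes cosine similarities between the *columns* of a row-distributed matrix. With no argument it computes all of them exactly. With a `threshold` argument it uses DIMSUM sampling instead, and a higher threshold means more aggressive sampling. So for document similarity each document would be a column and each term a row. The sketch below is a simplified, single-machine reimplementation of the DIMSUM sampling idea written for illustration. It is not Spark's source. The name `dimsum_column_similarities`, the dense list-of-rows input and exposing `gamma` directly are assumptions. Each row emits a scaled product for each pair of its nonzero entries, kept with probability min(1, √γ/‖c_i‖)·min(1, √γ/‖c_j‖) and rescaled so its expectation equals the exact cosine. With γ = ∞ every product is kept and the result is exact.

```python
import math
import random
from collections import defaultdict


def column_norms(rows, num_cols):
    """L2 norm of each column of a dense row-major matrix."""
    sq = [0.0] * num_cols
    for row in rows:
        for j, v in enumerate(row):
            sq[j] += v * v
    return [math.sqrt(s) for s in sq]


def dimsum_column_similarities(rows, gamma, seed=0):
    """Simplified DIMSUM-style estimate of cosine similarity between columns.

    rows  : list of equal-length lists (rows = terms, columns = documents)
    gamma : oversampling parameter (hypothetical direct knob);
            math.inf keeps every product, giving the exact result
    Returns {(i, j): estimate} for i < j with a nonzero estimate.
    """
    if not rows:
        return {}
    num_cols = len(rows[0])
    norms = column_norms(rows, num_cols)
    sg = math.sqrt(gamma)
    rng = random.Random(seed)
    # Per-column sampling probability and rescaling factor.
    prob = [min(1.0, sg / n) if n > 0 else 0.0 for n in norms]
    scale = [1.0 / min(sg, n) if n > 0 else 0.0 for n in norms]
    sims = defaultdict(float)
    for row in rows:
        nonzero = [(j, v) for j, v in enumerate(row) if v != 0]
        # In this sketch, work per row is quadratic in its nonzero count.
        for a in range(len(nonzero)):
            i, vi = nonzero[a]
            for b in range(a + 1, len(nonzero)):
                j, vj = nonzero[b]
                if rng.random() < prob[i] * prob[j]:
                    sims[(i, j)] += vi * scale[i] * vj * scale[j]
    return dict(sims)


if __name__ == "__main__":
    # Hypothetical term-by-document matrix for three addresses:
    # rows are terms, columns are documents 0, 1, 2.
    term_doc = [
        [1, 1, 0],  # "12"
        [1, 1, 0],  # "main"
        [1, 1, 0],  # "springfield"
        [1, 0, 0],  # "st"
        [0, 1, 0],  # "street"
        [0, 0, 1],  # "99"
        [0, 0, 1],  # "elm"
    ]
    print(dimsum_column_similarities(term_doc, gamma=math.inf))
    print(dimsum_column_similarities(term_doc, gamma=1.0, seed=42))
```

With a finite γ a single run is noisy but unbiased on average. A smaller γ keeps fewer products per row and trades accuracy for less work.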