Re: Duplicate entries in output of mllib column similarities

2015-05-12 Thread Richard Bolkey
> s in your vectors aren't sorted by index. Is that the case? Sparse Vectors
> <https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala>
> need sorted indices.
> Reza
>
> On Sat, May 9, 2015 at 8:51 AM, Richard Bolkey wro
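
For anyone following along, the sorted-indices point in the quoted reply can be checked before the vectors are built. Below is a minimal sketch in plain Python, not Spark code: `sorted_sparse_pairs` is a hypothetical helper name, and it assumes your sparse data starts out as two parallel lists of indices and values.

```python
def sorted_sparse_pairs(indices, values):
    """Return (indices, values) reordered so that indices are strictly increasing.

    Hypothetical helper for illustration; it does not call any Spark API.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")

    # Sort the (index, value) pairs together so each value stays with its index.
    pairs = sorted(zip(indices, values))

    # Reject repeated indices rather than silently merging or dropping them.
    for (a, _), (b, _) in zip(pairs, pairs[1:]):
        if a == b:
            raise ValueError(f"duplicate index {a}")

    return [i for i, _ in pairs], [v for _, v in pairs]


idx, vals = sorted_sparse_pairs([3, 0, 2], [1.0, 2.0, 3.0])
print(idx, vals)
```

The reordered lists could then be passed to whatever sparse-vector constructor you already use.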

Re: Duplicate entries in output of mllib column similarities

2015-05-09 Thread Richard Bolkey
Hi Reza,

After a bit of digging, it turns out I had my previous issue slightly wrong. We're not getting duplicate (i,j) entries; we're getting transposed entries (i,j) and (j,i), with potentially different scores. We assumed the output would be a triangular matrix. Still, let me know if that's expected
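
To make the symptom concrete, here is a minimal sketch in plain Python (not Spark code) that groups similarity entries by their unordered index pair, so any (i,j)/(j,i) pairs with differing scores show up side by side. The name `group_transposed_entries` and the `(i, j, score)` tuple layout are assumptions for illustration only.

```python
def group_transposed_entries(entries):
    """Group (i, j, score) entries under the upper-triangular key (min, max).

    Hypothetical helper for illustration; every score is kept so that
    conflicting values for (i, j) and (j, i) remain visible.
    """
    grouped = {}
    for i, j, score in entries:
        key = (i, j) if i <= j else (j, i)
        grouped.setdefault(key, []).append(score)
    return grouped


entries = [(0, 1, 0.5), (1, 0, 0.7), (2, 3, 0.1)]
for key, scores in group_transposed_entries(entries).items():
    # More than one score under a key means a transposed entry was present.
    print(key, scores)
```

The sketch deliberately does not pick a winner between conflicting scores, since which one is correct depends on the cause of the transposition.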