Re: Computing cosine similarity using PySpark

2014-05-27 Thread Jeremy Freeman
Hi Jamal,

One nice feature of PySpark is that you can easily use existing functions from NumPy and SciPy inside your Spark code. For a simple example, the following uses Spark's cartesian operation (which pairs every vector with every vector, producing tuples), followed by NumPy's corrcoef to compute the Pearson correlation…
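The archived snippet cuts off before the code, so here is a minimal sketch of the approach described (not Jeremy's original code). It assumes the vectors are held in an RDD of `(id, vector)` tuples; that layout and the helper names `pearson` and `pairwise_pearson` are hypothetical. `RDD.cartesian` and `RDD.map` are standard PySpark RDD methods, and `np.corrcoef` returns a 2x2 correlation matrix whose off-diagonal entry is the coefficient between the two inputs.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation of two equal-length 1-D vectors via np.corrcoef."""
    # corrcoef returns the 2x2 correlation matrix of x and y;
    # entry [0, 1] is the coefficient between them.
    return float(np.corrcoef(x, y)[0, 1])


def pairwise_pearson(rdd):
    """Correlate every ordered pair of vectors in an RDD.

    Hypothetical helper: `rdd` is assumed to be a PySpark RDD whose
    elements are (id, vector) tuples.
    """
    # cartesian() yields n * n pairs (including each vector with itself),
    # so this is only practical for a modest number of vectors.
    return rdd.cartesian(rdd).map(
        lambda pair: ((pair[0][0], pair[1][0]), pearson(pair[0][1], pair[1][1]))
    )
```

With a SparkContext `sc`, something like `pairwise_pearson(sc.parallelize(list(enumerate(vectors)))).collect()` would return `((i, j), r)` entries. Because the pairs are ordered, each unordered pair appears twice; adding a filter on `i < j` keeps one copy.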

Re: Computing cosine similarity using PySpark

2014-05-23 Thread Andrei
Do you need cosine distance and correlation between vectors, or between variables (the elements of each vector)? It would help if you could tell us more about your task.

On Thu, May 22, 2014 at 5:49 PM, jamal sasha wrote:
> Hi,
> I have bunch of vectors like
> [0.1234,-0.231,0.23131]
> and s…
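To make the distinction concrete, here is a small NumPy sketch. Stacking the vectors as rows of a matrix, `np.corrcoef(..., rowvar=True)` (the default) correlates the vectors with each other, while `rowvar=False` treats each column (vector element) as a variable observed across the vectors. The matrix values other than the first row are made up for illustration.

```python
import numpy as np

# Made-up data: 3 vectors (rows), each with 4 elements (columns).
data = np.array([
    [0.1234, -0.231, 0.23131, 0.5],
    [0.2, -0.1, 0.3, 0.4],
    [-0.3, 0.2, 0.1, 0.0],
])

# Between vectors: one coefficient per pair of rows -> 3 x 3 matrix.
between_vectors = np.corrcoef(data, rowvar=True)

# Between variables: one coefficient per pair of columns -> 4 x 4 matrix.
between_variables = np.corrcoef(data, rowvar=False)

# Cosine similarity between vectors: scale each row to unit length,
# then take all pairwise dot products.
unit_rows = data / np.linalg.norm(data, axis=1, keepdims=True)
cosine_between_vectors = unit_rows @ unit_rows.T

print(between_vectors.shape)         # (3, 3)
print(between_variables.shape)       # (4, 4)
print(cosine_between_vectors.shape)  # (3, 3)
```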

Re: Computing cosine similarity using PySpark

2014-05-23 Thread Andrew Ash
Hi Jamal,

I don't believe there are pre-written algorithms for cosine similarity or Pearson correlation in PySpark that you can reuse. If you end up writing your own implementation of the algorithm, though, the project would definitely appreciate it if you shared that code back with the project for f…
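A hand-written implementation can be quite small. The sketch below computes cosine similarity, dot(x, y) / (|x| |y|), with NumPy and applies it to every unordered pair of vectors using `RDD.cartesian`, `RDD.filter` and `RDD.map`. It assumes an RDD of `(id, vector)` tuples with comparable ids; the helper names are hypothetical, and returning 0.0 for a zero vector is an arbitrary convention, since the similarity is undefined there.

```python
import numpy as np


def cosine_similarity(x, y):
    """Cosine similarity of two 1-D vectors: dot(x, y) / (|x| * |y|)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    # Undefined for a zero vector; return 0.0 as an arbitrary convention.
    if denom == 0.0:
        return 0.0
    return float(np.dot(x, y) / denom)


def pairwise_cosine(rdd):
    """Hypothetical helper: `rdd` holds (id, vector) tuples with comparable ids.

    Returns an RDD of ((id_a, id_b), similarity), one entry per unordered pair.
    """
    return (
        rdd.cartesian(rdd)
        # Keep each unordered pair once and drop self-pairs.
        .filter(lambda pair: pair[0][0] < pair[1][0])
        .map(lambda pair: ((pair[0][0], pair[1][0]),
                           cosine_similarity(pair[0][1], pair[1][1])))
    )
```

The `i < j` filter halves the work of the similarity step compared with keeping all ordered pairs, although `cartesian` still materializes all n * n pairs before filtering.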

Computing cosine similarity using PySpark

2014-05-22 Thread jamal sasha
Hi,

I have a bunch of vectors like [0.1234,-0.231,0.23131] and so on, and I want to compute cosine similarity and Pearson correlation using PySpark. How do I do this? Any ideas?

Thanks
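Both quantities can come from one routine, using the standard identity that the Pearson correlation of two vectors equals the cosine similarity of the same vectors after subtracting each vector's mean. A NumPy-only sketch follows; `cosine` and `pearson_via_cosine` are illustrative names, and the second vector is made up for the example. Inside Spark, such a per-pair function would be applied with `map` after `cartesian`, as the replies in this thread suggest.

```python
import numpy as np


def cosine(x, y):
    """Cosine similarity: dot(x, y) / (|x| * |y|)."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


def pearson_via_cosine(x, y):
    """Pearson correlation as the cosine similarity of mean-centered vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return cosine(x - x.mean(), y - y.mean())


a = np.array([0.1234, -0.231, 0.23131])  # vector from the question
b = np.array([0.2, -0.1, 0.3])           # made-up second vector

print(cosine(a, b))
print(pearson_via_cosine(a, b))
# Cross-check against NumPy's own Pearson coefficient.
print(np.isclose(pearson_via_cosine(a, b), np.corrcoef(a, b)[0, 1]))  # True
```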