Hi Jamal,
One nice feature of PySpark is that you can easily use existing functions
from NumPy and SciPy inside your Spark code. For a simple example, the
following uses Spark's cartesian operation (which combines pairs of vectors
into tuples), followed by NumPy's corrcoef to compute the Pearson
correlation.
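Something along these lines should work; this is only a minimal sketch of
the idea, and the sample vectors, app name, and helper function names are
placeholders, not code from a real workload:

import numpy as np
from pyspark import SparkContext

sc = SparkContext(appName="pairwise-similarity")

# Placeholder data: an RDD of dense vectors stored as plain Python lists.
vectors = sc.parallelize([
    [0.1234, -0.231, 0.23131],
    [0.05, 0.12, -0.44],
    [0.9, -0.1, 0.3],
])

def pearson(pair):
    a, b = pair
    # np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
    # entry is the Pearson correlation between the two vectors.
    return float(np.corrcoef(a, b)[0, 1])

def cosine(pair):
    a, b = np.array(pair[0]), np.array(pair[1])
    # Cosine similarity: dot product divided by the product of the norms.
    return float(a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b)))

pairs = vectors.cartesian(vectors)      # all (vector, vector) pairs
correlations = pairs.map(pearson).collect()
similarities = pairs.map(cosine).collect()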
Do you need cosine distance and correlation between vectors, or between
variables (the elements of the vectors)? It would help if you could tell us
more about your task.
On Thu, May 22, 2014 at 5:49 PM, jamal sasha wrote:
> Hi,
> I have a bunch of vectors like
> [0.1234,-0.231,0.23131]
> and so on.
Hi Jamal,
I don't believe there are pre-written algorithms for cosine similarity or
Pearson correlation in PySpark that you can re-use. If you end up writing
your own implementation of the algorithm, though, the project would
definitely appreciate it if you shared that code back with the project for
future use.
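As a starting point, a hand-rolled version might look roughly like the
following one-pass aggregation for the Pearson correlation between two
components (columns) of the vectors. The sample data, app name, and
function names below are just placeholders for illustration:

import math
from pyspark import SparkContext

sc = SparkContext(appName="column-correlation")

# Placeholder data: an RDD of equal-length vectors.
vectors = sc.parallelize([
    [0.1234, -0.231, 0.23131],
    [0.05, 0.12, -0.44],
    [0.9, -0.1, 0.3],
])

def column_correlation(rdd, i, j):
    # Accumulate n, sum_x, sum_y, sum_xx, sum_yy, sum_xy in one pass.
    def seq(acc, v):
        x, y = v[i], v[j]
        n, sx, sy, sxx, syy, sxy = acc
        return (n + 1, sx + x, sy + y, sxx + x * x, syy + y * y, sxy + x * y)

    def comb(a, b):
        return tuple(p + q for p, q in zip(a, b))

    n, sx, sy, sxx, syy, sxy = rdd.aggregate(
        (0, 0.0, 0.0, 0.0, 0.0, 0.0), seq, comb)
    cov = sxy / n - (sx / n) * (sy / n)
    std_x = math.sqrt(sxx / n - (sx / n) ** 2)
    std_y = math.sqrt(syy / n - (sy / n) ** 2)
    return cov / (std_x * std_y)

print(column_correlation(vectors, 0, 1))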
Hi,
I have a bunch of vectors like
[0.1234,-0.231,0.23131]
and so on.
and I want to compute cosine similarity and Pearson correlation using
PySpark.
How do I do this?
Any ideas?
Thanks