Dear Sparkers,

I am working on an algorithm which requires the pairwise distances between all points (e.g. DBSCAN, LOF, etc.). Computing this for *n* points produces an n x n matrix. If the distance measure is symmetric, this can be reduced to roughly (n^2)/2 entries, since only the pairs with i < j are needed. What would be the most efficient way of computing this?
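To make the symmetry point concrete, this is roughly what I mean on a single machine (plain Scala for now, with a Euclidean distance that is only an example):

// Plain Scala, not Spark yet; Euclidean distance just as an example.
def euclidean(a: Array[Double], b: Array[Double]): Double =
  math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

// Only the pairs with i < j are computed, giving n*(n-1)/2 distances
// instead of the full n^2 matrix.
def upperTriangle(points: Array[Array[Double]]): Seq[((Int, Int), Double)] =
  for {
    i <- points.indices
    j <- (i + 1) until points.length
  } yield ((i, j), euclidean(points(i), points(j)))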
The paper *Pairwise Element Computation with MapReduce* <https://www.cs.ucsb.edu/~ravenben/classes/290F/papers/kvl10.pdf> describes several approaches to optimizing this within a MapReduce model, although I am not sure how well they carry over to Spark. How would you approach this?

My first thought was to broadcast the original points to all the workers and then compute the distances across the different workers. However, that requires the full set of points to be replicated on every machine, which feels rather brute-force; a rough sketch of what I mean is included below my signature. What do you think?

I don't expect full solutions, but some pointers would be great. I think a good solution could be reused for many algorithms.

Kind regards,
Fokko Driesprong
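Here is the minimal sketch of the brute-force broadcast idea I had in mind, assuming the RDD API, a Euclidean distance, and a point set small enough to broadcast to every executor; the hard-coded points and the output path are only placeholders:

import org.apache.spark.{SparkConf, SparkContext}

object PairwiseDistances {

  // Euclidean distance, again only as an example; any symmetric measure works.
  def euclidean(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pairwise-distances"))

    // Small hard-coded point set just to keep the sketch self-contained;
    // the assumption is that the full set fits in memory on every executor.
    val points: Array[Array[Double]] = Array(
      Array(0.0, 0.0), Array(1.0, 0.0), Array(0.0, 1.0), Array(3.0, 4.0))
    val broadcastPoints = sc.broadcast(points)
    val n = points.length

    // Distribute only the row indices; each task looks the vectors up in the
    // broadcast copy. Keeping j > i exploits the symmetry, so only about
    // n^2/2 distances are materialised instead of the full n^2 matrix.
    val distances = sc
      .parallelize(0 until n)
      .flatMap { i =>
        val pts = broadcastPoints.value
        ((i + 1) until n).map(j => ((i, j), euclidean(pts(i), pts(j))))
      }

    distances.saveAsTextFile("pairwise-distances") // or feed into DBSCAN / LOF
    sc.stop()
  }
}

Only the row indices get shuffled around; whether this still works once the point set is too large to broadcast is exactly the part I am unsure about.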