Approximate Nearest Neighbors (ann) for Scala Spark

Kim, Min-Seok Fri, 09 Sep 2016 12:22:20 -0700

Hi,

I wrote a Scala implementation of Annoy (https://github.com/spotify/annoy),
an approximate nearest neighbors (ANN) library:


https://github.com/mskimm/annoy4s

Because Annoy builds its trees on a single machine, I came up with the
following approach:
 - build the trees (the index file) on the driver, reading the RDD with
`toLocalIterator`;
 - then query nearest neighbors on the executors, using the index file, which
is distributed to them with `sc.addFile`.
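A minimal sketch of this driver-builds, executors-query workflow, assuming only Spark 1.6-era core APIs (`RDD.toLocalIterator`, `SparkContext.addFile`, `SparkFiles.get`). It does not use annoy4s: `BruteForceIndex`, its tab-separated file format and the `AnnOnSparkSketch` object are hypothetical stand-ins, and a real Annoy index would be a forest of random-projection trees rather than a brute-force scan.

```scala
import java.io.{File, PrintWriter}

import scala.io.Source

import org.apache.spark.{SparkConf, SparkContext, SparkFiles}
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for an Annoy index: it stores every vector and answers
// queries by brute force. A real Annoy index is a forest of random-projection trees.
final class BruteForceIndex(items: Map[Int, Array[Double]]) {
  private def dist(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Ids of the k items closest to `query` (Euclidean distance).
  def nearest(query: Array[Double], k: Int): List[Int] =
    items.toList.sortBy { case (_, v) => dist(query, v) }.take(k).map(_._1)
}

object BruteForceIndex {
  // Hypothetical file format, one item per line: "id<TAB>x1,x2,...".
  def write(items: Iterator[(Int, Array[Double])], file: File): Unit = {
    val out = new PrintWriter(file)
    try items.foreach { case (id, v) => out.println(s"$id\t${v.mkString(",")}") }
    finally out.close()
  }

  def load(path: String): BruteForceIndex = {
    val src = Source.fromFile(path)
    try {
      val items = src.getLines().map { line =>
        val Array(id, vec) = line.split("\t")
        id.toInt -> vec.split(",").map(_.toDouble)
      }.toMap // forces the lazy iterator before the file is closed
      new BruteForceIndex(items)
    } finally src.close()
  }
}

object AnnOnSparkSketch {
  def findNeighbours(sc: SparkContext, vectors: RDD[(Int, Array[Double])], k: Int): Map[Int, List[Int]] = {
    // Driver: toLocalIterator fetches one partition at a time, so the driver
    // never holds the whole RDD in memory while the index file is written.
    val indexFile = File.createTempFile("ann-index", ".tsv")
    BruteForceIndex.write(vectors.toLocalIterator, indexFile)

    // Ship the index file to every executor.
    sc.addFile(indexFile.getAbsolutePath)
    val indexName = indexFile.getName

    // Executors: load the index once per partition, then query it per record.
    vectors.mapPartitions { iter =>
      val index = BruteForceIndex.load(SparkFiles.get(indexName))
      iter.map { case (id, v) => id -> index.nearest(v, k) }
    }.collect().toMap
  }

  def demo(): Map[Int, List[Int]] = {
    val sc = new SparkContext(new SparkConf().setAppName("ann-sketch").setMaster("local[2]"))
    try {
      val vectors = sc.parallelize(Seq(
        1 -> Array(0.0, 0.0),
        2 -> Array(1.0, 0.0),
        3 -> Array(0.0, 1.0),
        4 -> Array(5.0, 5.0)), numSlices = 2)
      findNeighbours(sc, vectors, k = 2)
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit =
    demo().toSeq.sortBy(_._1).foreach { case (id, nns) => println(s"$id -> ${nns.mkString(", ")}") }
}
```

Loading the index inside `mapPartitions` means each partition reads the file once rather than once per record. Because every item is also in the index, its own id comes back as its first neighbour.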

Could anybody review the code and the idea?

I tested this implementation on Spark 1.6.2, and it seems to work.

The code I tested is essentially the example here:

https://github.com/mskimm/annoy4s#item-similarity-computation

Minseok
