Fuzzy GroupBy

Mihran Shahinian Thu, 26 Mar 2015 13:48:48 -0700

I would like to group records, but instead of grouping on an exact key match
I want to compute the similarity between keys myself. Is there a
recommended way of doing this?


Here is my starting point:

final JavaRDD<pojo> records = spark.parallelize(getListofPojos()).cache();
class pojo {
  String prop1;
  String prop2;
}

During the groupBy I would like to compute the similarity between the prop1
values of the pojos, so that records with similar prop1 end up in one group.
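
To make the intended behavior concrete, here is a minimal sketch in plain Java
(no Spark) of one possible fuzzy grouping. Everything in it is an assumption
made for illustration: the class FuzzyGroupBy, the Pojo constructor, the use of
Levenshtein edit distance as the similarity measure, the threshold parameter,
and the greedy "compare against the first member of each group" strategy. The
result depends on input order, and how best to run this over a JavaRDD is
exactly the open question.

```java
import java.util.ArrayList;
import java.util.List;

class FuzzyGroupBy {
    // Hypothetical record type mirroring the pojo in the question.
    static class Pojo {
        final String prop1;
        final String prop2;
        Pojo(String prop1, String prop2) {
            this.prop1 = prop1;
            this.prop2 = prop2;
        }
    }

    // Levenshtein edit distance using two rolling rows of the DP table.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed similarity measure in [0, 1]; 1.0 means identical strings.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) editDistance(a, b) / maxLen;
    }

    // Greedy grouping: a record joins the first group whose first member
    // has a similar enough prop1, otherwise it starts a new group.
    static List<List<Pojo>> fuzzyGroup(List<Pojo> records, double threshold) {
        List<List<Pojo>> groups = new ArrayList<>();
        for (Pojo p : records) {
            List<Pojo> target = null;
            for (List<Pojo> g : groups) {
                if (similarity(g.get(0).prop1, p.prop1) >= threshold) {
                    target = g;
                    break;
                }
            }
            if (target == null) {
                target = new ArrayList<>();
                groups.add(target);
            }
            target.add(p);
        }
        return groups;
    }
}
```

With a threshold of 0.7, for example, records whose prop1 values are "spark"
and "sparc" would land in the same group, while "hadoop" would start a new one.
The pairwise comparison is quadratic in the worst case, so this is only meant
to pin down the semantics, not to suggest how it should scale.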

Much appreciated,
Mihran
