I would like to group records, but instead of grouping on an exact key match I want to compute the similarity of keys myself. Is there a recommended way of doing this?
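To make the goal concrete, the desired behavior can be sketched in plain Java, without Spark. This is only an illustration under stated assumptions: `similar` is a hypothetical stand-in for whatever similarity test is actually wanted, and the greedy rule (a key joins the first group whose first element it is similar to) is just one possible way to define groups when similarity is not transitive. None of these names come from the Spark API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

class SimilarityGrouping {
    // Hypothetical similarity test (an assumption for illustration):
    // two keys are similar if they match ignoring case and surrounding whitespace.
    static boolean similar(String a, String b) {
        return a.trim().equalsIgnoreCase(b.trim());
    }

    // Greedy grouping: each key joins the first existing group whose
    // representative (its first element) is similar; otherwise it starts a new group.
    static List<List<String>> groupBySimilarity(List<String> keys,
                                                BiPredicate<String, String> sim) {
        List<List<String>> groups = new ArrayList<>();
        for (String key : keys) {
            List<String> target = null;
            for (List<String> group : groups) {
                if (sim.test(group.get(0), key)) {
                    target = group;
                    break;
                }
            }
            if (target == null) {
                target = new ArrayList<>();
                groups.add(target);
            }
            target.add(key);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("Apple", "apple ", "Banana", "BANANA", "cherry");
        System.out.println(groupBySimilarity(keys, SimilarityGrouping::similar));
    }
}
```

The sketch compares every key against every group representative, so it is quadratic in the worst case and runs on a single machine. The open question is how to get the same kind of grouping on an RDD.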
Here is my starting point:

    final JavaRDD<pojo> records = spark.parallelize(getListofPojos()).cache();

    class pojo {
        String prop1;
        String prop2;
    }

During the groupBy I would like to compute the similarity between the prop1 values of the pojos.

Much appreciated,
Mihran