Hi,

I have an RDD that stores the titles of some articles:

1. "About Spark Streaming"
2. "About Spark MLlib"
3. "About Spark SQL"
4. "About Spark Installation"
5. "Kafka Streaming"
6. "Kafka Setup"
7. ....

I need to build a model that finds titles by similarity. For example, given "About Spark", I hope to get:

"About Spark Installation", 0.98622
"About Spark MLlib", 0.95394
"About Spark Streaming", 0.94332
"About Spark SQL", 0.9111

(where the number is the similarity score, in the range 0 to 1)

Any ideas or references on how to do this?

Thanks,
Ascot
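
To make the desired behaviour concrete, here is a minimal sketch in plain Python (not Spark) that ranks titles against a query by cosine similarity over word counts. The helper names `tokenize`, `cosine` and `rank_titles` are hypothetical, and the choice of bag-of-words cosine similarity is only an assumption about what "similarity" could mean here. Its scores fall in the range 0 to 1 but will not match the example numbers above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and split it into alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two word-count vectors (Counters).
    # Counts are non-negative, so the result lies between 0 and 1.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_titles(query, titles, top_n=4):
    # Score every title against the query and keep the best top_n.
    q = Counter(tokenize(query))
    scored = [(t, cosine(q, Counter(tokenize(t)))) for t in titles]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


titles = [
    "About Spark Streaming",
    "About Spark MLlib",
    "About Spark SQL",
    "About Spark Installation",
    "Kafka Streaming",
    "Kafka Setup",
]

results = rank_titles("About Spark", titles)
for title, score in results:
    print(f'"{title}", {score:.5f}')
```

With plain word counts, every "About Spark ..." title gets the same score, because each shares the same two words with the query. Telling them apart would need a weighting scheme such as TF-IDF or a different notion of similarity. On an RDD, the per-title scoring step could in principle be applied with `rdd.map`, though that is not shown here.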