Hi,
I have some questions about how to use Lucene for the specific purpose of finding document similarities. Lucene seems to have classes that were made for this, including ClassicSimilarity and BM25Similarity. However, I'm fumbling a bit when it comes to implementing them.

From what I understand, to use these classes you simply set the similarity on your IndexWriter and IndexSearcher, then submit a query. The documents returned from your query should be ordered from highest to lowest similarity.

My initial thought was to use a phrase query to hold the "document" I want to find similarities to, but phrase queries are limited in that they only return results that fall within a certain slop value. Term/Boolean queries seem similarly limited: as far as I can tell, documents are only scored and sorted if they contain all the terms in the query.

Ideally, I'd like to submit a query that is essentially a document itself. Each of my queries contains 10 or so phrases, each of which contains 5-10 terms. I would like to compare this query with all the documents in my index to see which is the most similar and which is the least similar.

I feel as if there is an easy way to do this that I'm missing. After all, I essentially just want to remove a step from the process. Any help would be much appreciated.

Thank you,
-John B
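
To make the goal concrete, here is a minimal plain-Java sketch of the behaviour described above. It is not Lucene code and does not reproduce Lucene's exact scoring: the class QueryAsDocument and its methods are hypothetical names made up for illustration, and the scoring is a simplified BM25-style sum. The point it shows is that the whole query text is treated as a bag of terms, and every document in the collection gets a score, including documents that contain only some (or none) of the query terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: plain Java, not Lucene code.
final class QueryAsDocument {

    // Lowercase the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Score every document against the query text with a simplified
    // BM25-style sum. Documents sharing no terms with the query score 0
    // but still appear in the result, so a "least similar" end exists.
    static double[] score(List<String> docs, String queryText) {
        final double k1 = 1.2, b = 0.75; // commonly used BM25 parameters
        int n = docs.size();
        List<Map<String, Integer>> termFreqs = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        int[] lengths = new int[n];
        double totalLength = 0;
        for (int i = 0; i < n; i++) {
            List<String> terms = tokenize(docs.get(i));
            lengths[i] = terms.size();
            totalLength += terms.size();
            Map<String, Integer> tf = new HashMap<>();
            for (String t : terms) tf.merge(t, 1, Integer::sum);
            for (String t : tf.keySet()) docFreq.merge(t, 1, Integer::sum);
            termFreqs.add(tf);
        }
        double avgLength = n == 0 ? 0 : totalLength / n;
        double[] scores = new double[n];
        // Each distinct query term contributes independently (a disjunction).
        for (String q : new LinkedHashSet<>(tokenize(queryText))) {
            int d = docFreq.getOrDefault(q, 0);
            if (d == 0) continue; // term occurs in no document
            double idf = Math.log(1 + (n - d + 0.5) / (d + 0.5));
            for (int i = 0; i < n; i++) {
                int f = termFreqs.get(i).getOrDefault(q, 0);
                if (f == 0) continue;
                double norm = f + k1 * (1 - b + b * lengths[i] / avgLength);
                scores[i] += idf * f * (k1 + 1) / norm;
            }
        }
        return scores;
    }

    // Document indices ordered from highest to lowest score.
    static Integer[] rank(double[] scores) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (x, y) -> Double.compare(scores[y], scores[x]));
        return order;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "the quick brown fox jumps over the lazy dog",
                "a quick brown dog",
                "stock markets fell sharply today");
        double[] scores = score(docs, "quick brown fox");
        for (int i : rank(scores)) {
            System.out.println(i + "\t" + scores[i]);
        }
    }
}
```

In this toy collection, the second document has no "fox" but still receives a positive score, and the third shares nothing with the query yet is still ranked (last, with a score of 0). That is the kind of full ordering I would like to get back from the index.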