Have you already looked at Solr's MoreLikeThis? See http://wiki.apache.org/solr/MoreLikeThisHandler and http://wiki.apache.org/solr/MoreLikeThis. The problem you describe is similar to the use case for that component, so if there is something to hack, Solr's MoreLikeThis is the place to start.
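
To make the "score for a pair of documents" idea concrete, here is a minimal plain-Java sketch of a TF-IDF cosine similarity between two texts. It is an illustration only and does not use Lucene or reproduce its scoring: the class name PairwiseSimilarity, its helper methods, and the sample corpus are all hypothetical. The idf weight uses a smoothed form, 1 + ln(N / (df + 1)), chosen here as an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustration only: plain TF-IDF cosine similarity, not Lucene's scoring code.
final class PairwiseSimilarity {

    // Count how often each lowercase token occurs in the text.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Count the number of corpus documents that contain each term.
    static Map<String, Integer> documentFrequencies(List<String> corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (String doc : corpus) {
            for (String term : termFrequencies(doc).keySet()) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Weight each term by tf * idf, with idf = 1 + ln(N / (df + 1)) (assumed form).
    static Map<String, Double> tfIdfVector(String text, Map<String, Integer> df, int numDocs) {
        Map<String, Double> vector = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFrequencies(text).entrySet()) {
            int docFreq = df.getOrDefault(e.getKey(), 0);
            double idf = 1.0 + Math.log((double) numDocs / (docFreq + 1));
            vector.put(e.getKey(), e.getValue() * idf);
        }
        return vector;
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between two sparse vectors; 0.0 if either is empty.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    // Score two texts against each other, using the corpus only for idf statistics.
    static double similarity(String docA, String docB, List<String> corpus) {
        Map<String, Integer> df = documentFrequencies(corpus);
        int n = corpus.size();
        return cosine(tfIdfVector(docA, df, n), tfIdfVector(docB, df, n));
    }

    public static void main(String[] args) {
        // Hypothetical sample documents.
        List<String> corpus = Arrays.asList(
                "lucene similarity scoring for search",
                "solr more like this uses lucene",
                "a recipe for pasta with tomatoes");
        System.out.println(similarity(corpus.get(0), corpus.get(1), corpus));
        System.out.println(similarity(corpus.get(0), corpus.get(2), corpus));
    }
}
```

A function of this shape gives one number per pair of documents without running a search, but its numbers will not match what any of the Lucene 4.x Similarity classes produce, so it is useful only for seeing the mechanics.
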

Lucene's Similarity is a low-level class used by some queries (for example TermQuery), but from what you describe I don't think you need anything that low level.

Thanks
Emmanuel

2013/3/6 Michael O'Leary <mich...@seomoz.org>

> Is there an api in Lucene for finding the similarity score for two
> documents that have been randomly pulled from an index? What about for a
> query and a randomly selected document?
>
> I realize this isn't the standard purpose of Lucene, but I was given a task
> to compare similarity scores for the Similarity classes defined in Lucene
> 4.x using a somewhat large predefined set of documents and query strings,
> and I am finding that collecting the results by indexing the documents in
> separate indexes with each of the Similarity classes, searching using the
> query strings, locating the subset of documents in the results that I am
> interested in and recording the scores is taking quite a long time.
>
> I am about to look through the Lucene source code to see how the Similarity
> classes are used in normal use cases such as search and more-like-this, but
> if someone could direct me on where to look, or, even better, knows of an
> api function that takes a pair of documents, or a query and a document, and
> returns a similarity score for them, I would greatly appreciate it.
> Thanks,
> Mike