I tend to agree with you, Marvin: the different scoring mechanisms need different information to be available, and that is the problem.
Although, last I checked, one hard part of BM25 revolves around fields versus documents, e.g. BM25's IDF calculation. But maybe this is just an extreme form of your example :)

On Wed, Feb 17, 2010 at 11:39 AM, Marvin Humphrey <mar...@rectangular.com> wrote:

> On Wed, Feb 17, 2010 at 10:31:19AM -0500, Robert Muir wrote:
> > yet if we don't do the hard work up front to make it easy to plug in things
> > like BM25, then no one will implement additional scoring formulas for
> > Lucene, we currently make it terribly difficult to do this.
>
> FWIW... Similarity and posting format spec are so closely tied that I'm
> considering linking them in Lucy.
>
>     Schema schema = new Schema();
>     FullTextType bm25Type = new FullTextType(new BM25Similarity());
>     schema.specField("content", bm25Type);
>     schema.specField("title", bm25Type);
>     StringType matchType = new StringType(new MatchSimilarity());
>     schema.specField("category", matchType);
>
> That way, custom scoring implementations can guarantee that they always have
> the posting information they need available to make their similarity
> judgments. Similarity also becomes a more generalized notion, with the
> TF/IDF-specific functionality moving into a subclass.
>
> Maybe something similar could be made to work in Lucene. Dunno how McCandless
> has things set up for spec'ing codecs on the flex branch.
>
> Marvin Humphrey

--
Robert Muir
rcm...@gmail.com
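To make the "fields versus documents" point about BM25's IDF concrete, here is a minimal Java sketch that computes a smoothed BM25 IDF for one term in two scopes on a tiny toy corpus: per field (N = documents that have the field, n = documents whose field contains the term) and per document (N = all documents, n = documents containing the term in any field). Everything here is an assumption for illustration: the class and method names (`Bm25IdfScope`, `idf`, `fieldIdf`, `documentIdf`, `doc`, `toyCorpus`) are hypothetical and not Lucene or Lucy APIs, fields are tokenized by splitting on single spaces, and the IDF uses one common smoothed form, ln(1 + (N - n + 0.5) / (n + 0.5)).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names are illustrative only, not Lucene or Lucy APIs.
public class Bm25IdfScope {

    // One common smoothed BM25 IDF: ln(1 + (N - n + 0.5) / (n + 0.5)); never negative.
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Field scope: N = docs that have this field, n = docs whose field contains the term.
    static double fieldIdf(List<Map<String, List<String>>> docs, String field, String term) {
        long withField = 0;
        long n = 0;
        for (Map<String, List<String>> doc : docs) {
            List<String> tokens = doc.get(field);
            if (tokens == null) {
                continue; // this document has no value for the field
            }
            withField++;
            if (tokens.contains(term)) {
                n++;
            }
        }
        return idf(withField, n);
    }

    // Document scope: N = all docs, n = docs containing the term in any field.
    static double documentIdf(List<Map<String, List<String>>> docs, String term) {
        long n = 0;
        for (Map<String, List<String>> doc : docs) {
            for (List<String> tokens : doc.values()) {
                if (tokens.contains(term)) {
                    n++;
                    break; // count each document at most once
                }
            }
        }
        return idf(docs.size(), n);
    }

    // Builds a toy two-field document; splitting on single spaces is an assumption.
    static Map<String, List<String>> doc(String title, String content) {
        Map<String, List<String>> d = new HashMap<>();
        d.put("title", Arrays.asList(title.split(" ")));
        d.put("content", Arrays.asList(content.split(" ")));
        return d;
    }

    // Hypothetical four-document corpus used only for illustration.
    static List<Map<String, List<String>>> toyCorpus() {
        List<Map<String, List<String>>> docs = new ArrayList<>();
        docs.add(doc("lucene scoring", "bm25 needs field length statistics"));
        docs.add(doc("flex branch", "codecs for lucene postings"));
        docs.add(doc("lucy schema", "per field similarity in lucene and lucy"));
        docs.add(doc("release notes", "bug fixes"));
        return docs;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> docs = toyCorpus();
        String term = "lucene";
        System.out.printf("title-field IDF:   %.4f%n", fieldIdf(docs, "title", term));
        System.out.printf("content-field IDF: %.4f%n", fieldIdf(docs, "content", term));
        System.out.printf("document IDF:      %.4f%n", documentIdf(docs, term));
    }
}
```

On this toy corpus the three values come out to roughly 1.20, 0.69, and 0.36 for the same term, so a BM25 implementation has to know which N and n it is being handed. That is the kind of guarantee Marvin's idea of tying the similarity to the field type is meant to provide.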