On Wed, Feb 17, 2010 at 10:31:19AM -0500, Robert Muir wrote:
> yet if we don't do the hard work up front to make it easy to plug in things
> like BM25, then no one will implement additional scoring formulas for
> Lucene, we currently make it terribly difficult to do this.

FWIW... Similarity and the posting format spec are so closely tied that I'm
considering linking them in Lucy, so that each field type is constructed with
the Similarity it will be scored by:

  Schema schema = new Schema();
  // Full-text fields share one type, scored with BM25.
  FullTextType bm25Type = new FullTextType(new BM25Similarity());
  schema.specField("content", bm25Type);
  schema.specField("title", bm25Type);
  // Exact-match field: only needs to know whether a document matched.
  StringType matchType = new StringType(new MatchSimilarity());
  schema.specField("category", matchType);

That way, custom scoring implementations can guarantee that they always have
the posting information they need available to make their similarity
judgments.  Similarity also becomes a more generalized notion, with the
TF/IDF-specific functionality moving into a subclass.
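
A rough sketch of how that class layout might look in Java. Every name below
(PostingData, requiredPostingData, FieldType, the score signature) is a
hypothetical stand-in, not Lucy's or Lucene's actual API, and the TF/IDF
formula is deliberately simplified. The point is only that each Similarity
declares the posting data it needs, and the field type derives its posting
format from that declaration:

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: none of these types are Lucy's or Lucene's real API.

/** Kinds of data a posting format might record (assumed names). */
enum PostingData { DOC_ID, TERM_FREQ, DOC_LENGTH }

/** Generalized similarity: declares the posting data it needs, then scores. */
abstract class Similarity {
    abstract Set<PostingData> requiredPostingData();

    /** Scores one term in one document; implementations ignore unused inputs. */
    abstract double score(int termFreq, int docLength, double avgDocLength, double idf);
}

/** Match-only scoring: the existence of a posting is all that matters. */
class MatchSimilarity extends Similarity {
    @Override Set<PostingData> requiredPostingData() {
        return EnumSet.of(PostingData.DOC_ID);
    }
    @Override double score(int termFreq, int docLength, double avgDocLength, double idf) {
        return 1.0;
    }
}

/** TF/IDF-specific logic lives in a subclass (simplified to sqrt(tf) * idf). */
class TfIdfSimilarity extends Similarity {
    @Override Set<PostingData> requiredPostingData() {
        return EnumSet.of(PostingData.DOC_ID, PostingData.TERM_FREQ);
    }
    @Override double score(int termFreq, int docLength, double avgDocLength, double idf) {
        return Math.sqrt(termFreq) * idf;
    }
}

/** BM25 also needs document length, so it asks for it up front. */
class BM25Similarity extends Similarity {
    private static final double K1 = 1.2;   // common default, assumed here
    private static final double B = 0.75;   // common default, assumed here

    @Override Set<PostingData> requiredPostingData() {
        return EnumSet.of(PostingData.DOC_ID, PostingData.TERM_FREQ, PostingData.DOC_LENGTH);
    }
    @Override double score(int termFreq, int docLength, double avgDocLength, double idf) {
        double lengthNorm = 1.0 - B + B * (docLength / avgDocLength);
        return idf * termFreq * (K1 + 1.0) / (termFreq + K1 * lengthNorm);
    }
}

/** A field type bound to a similarity; its posting format follows from it. */
class FieldType {
    private final Similarity similarity;

    FieldType(Similarity similarity) { this.similarity = similarity; }

    Set<PostingData> postingFormat() { return similarity.requiredPostingData(); }
}
```

With this arrangement, new FieldType(new BM25Similarity()).postingFormat()
includes DOC_LENGTH, while a match-only field records nothing beyond DOC_ID.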

Maybe something similar could be made to work in Lucene.  Dunno how McCandless
has things set up for spec'ing codecs on the flex branch.

Marvin Humphrey

