Re: Scoring over Multiple Indexes

2015-10-22 Thread McKinley, James T
ering from the low statistics problem Erick described. We use an FST (see org.apache.lucene.util.fst.Builder) to hold the stats in memory so that the lookups are fast. Jim From: Erick Erickson Sent: 22 October 2015 15:15 To: java-user Subject: Re: Scoring

Re: Scoring over Multiple Indexes

2015-10-22 Thread Erick Erickson
bq: Given that the content loaded for these indexes represents individually curated terminologies, I think we can argue to our users that what comes from combined queries over the latter is as meaningful in its own right as those run over the monolithic index If one assumes that the individually

Re: Scoring over Multiple Indexes

2015-10-22 Thread Bauer, Herbert S. (Scott)
Thanks for your reply. We've recently moved from a single large index to multiple indexes. Given that the content loaded for these indexes represents individually curated terminologies, I think we can argue to our users that what comes from combined queries over the latter is as meaningful in its

Re: Scoring over Multiple Indexes

2015-10-22 Thread Erick Erickson
In a word, no. At least not that I've heard of. "normalizing scores" is one of those things that sounds reasonable on the surface, but is really meaningless. Scores don't really _tell_ you anything about the abstract "goodness" of a doc, they just tell you that doc1 is likely better than doc2 _with

Scoring over Multiple Indexes

2015-10-22 Thread Bauer, Herbert S. (Scott)
We have a test case that boosts a set of terms. Something along the lines of “term1^2 AND term2^3 AND term3^4” and this query runs over two content-distinct indexes. Our expectation is that the terms would be returned to us as term3, term2 and term1. Instead we get something along the lines
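
For readers following the thread, here is a minimal sketch of why boosts alone may not fix the ordering when the terms live in different indexes. It is plain Java with no Lucene dependency. The idf formula has the same shape as the one in Lucene's classic TF-IDF similarity, but the class name IdfSketch, the weight helper, and all of the document counts are hypothetical, and term frequency and length norms are left out entirely. It is not the scoring code anyone in the thread is actually running.

```java
// Illustrative sketch only: plain Java, no Lucene dependency.
// IdfSketch and weight() are hypothetical names invented for this example.
final class IdfSketch {

    // Same shape as the classic TF-IDF idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    // Simplified term weight: boost * idf. Term frequency and norms are ignored.
    static double weight(double boost, long docFreq, long numDocs) {
        return boost * idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        // Assumed statistics: a rare term in index A, a very common term in index B.
        double lowBoostRareTerm = weight(2.0, 9, 1_000_000);
        double highBoostCommonTerm = weight(4.0, 499_999, 1_000_000);

        // Each index computes idf from its own statistics, so the lower boost
        // can still produce the larger weight.
        System.out.println("boost 2, rare term in index A:   " + lowBoostRareTerm);
        System.out.println("boost 4, common term in index B: " + highBoostCommonTerm);
    }
}
```

With these assumed numbers the term boosted by 2 outweighs the term boosted by 4, because each index contributes its own document frequencies. Looking the statistics up from one shared source, as the reply about keeping stats in an FST suggests, is one way to make the weights comparable across indexes.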