ering from the low statistics problem Erick
described. We use an FST (see org.apache.lucene.util.fst.Builder) to hold the
stats in memory so that the lookups are fast.
Jim
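
A minimal sketch of the idea above, not the actual implementation. It assumes the "stats" are per-term document frequencies merged across all indexes, which the message does not spell out. A plain HashMap stands in for the FST to keep the example self-contained; the class GlobalTermStats and its methods are hypothetical names. The idf formula is the classic TF-IDF form, 1 + ln(numDocs / (docFreq + 1)), which as far as I know is what Lucene's default similarity used in the 5.x line.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: merges per-index term statistics into one global table. */
class GlobalTermStats {
    // Stand-in for the in-memory FST: term -> document frequency summed over all indexes.
    private final Map<String, Long> mergedDocFreq = new HashMap<>();
    private long totalDocs = 0;

    /** Adds one index's document count to the global total. */
    void addIndex(long indexNumDocs) {
        totalDocs += indexNumDocs;
    }

    /** Adds one index's document frequency for a term to the merged table. */
    void addDocFreq(String term, long df) {
        mergedDocFreq.merge(term, df, Long::sum);
    }

    /** Merged document frequency, or 0 for a term that was never added. */
    long docFreq(String term) {
        return mergedDocFreq.getOrDefault(term, 0L);
    }

    long numDocs() {
        return totalDocs;
    }

    /** Classic TF-IDF idf: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /** idf computed from the merged statistics rather than from a single index. */
    double globalIdf(String term) {
        return idf(docFreq(term), totalDocs);
    }
}
```

With these made-up numbers, a term found in 9 of 10 docs in a small index and in 99 of 1000 docs in a large one gets a different idf in each index, so the same match scores differently depending on where it lives. Using globalIdf for every index gives the term a single weight everywhere.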
From: Erick Erickson
Sent: 22 October 2015 15:15
To: java-user
Subject: Re: Scoring
bq: Given that the content loaded for these indexes
represents individually curated terminologies, I think we can argue to our
users that what comes from combined queries over the latter is as
meaningful in its own right as those run over the monolithic index
If one assumes that the individually
Thanks for your reply. We've recently moved from a single large index to
multiple indexes. Given that the content loaded for these indexes
represents individually curated terminologies, I think we can argue to our
users that what comes from combined queries over the latter is as
meaningful in its
In a word, no. At least not that I've heard of. "normalizing scores" is one
of those things that sounds reasonable on the surface, but is really
meaningless. Scores don't really _tell_ you anything about the abstract
"goodness" of a doc, they just tell you that doc1 is likely better than
doc2 _with
We have a test case that boosts a set of terms. Something along the lines of
“term1^2 AND term2^3 AND term3^4”, and this query runs over two
content-distinct indexes. Our expectation is that the terms would be returned
to us as term3, term2 and term1. Instead we get something along the lines