Hi Hoss, I didn't end up writing my own query (well, I did, but all it does is rewrite into another query). I found DisjunctionMaxQuery, which seemed a good fit for what I was trying to do. Instead of TermQuery, I used ConstantScoreQuery combined with TermsFilter to create queries whose scores don't depend on the individual terms' scores. For each ConstantScoreQuery, I set the boost much as you suggested.
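For illustration, the scoring behaviour of that combination can be modelled in a few lines of plain Python. This is not Lucene code and none of the function names below are Lucene APIs. It is a minimal sketch that assumes each constant-score clause contributes exactly its boost, that the DisjunctionMaxQuery tie-breaker is 0, and that queryNorm can be ignored because it scales every document equally. The documents (10-20 and 20-30), the search range [19,30], and the boost scheme (last number in the range gets 1, the one before it 2, and so on) are taken from the thread quoted below.

```python
def range_boosts(lo, hi):
    """Hypothetical helper: boost per query term, hi -> 1, hi-1 -> 2, ..."""
    return {n: float(hi - n + 1) for n in range(lo, hi + 1)}


def dismax_score(doc_terms, boosts, tie_breaker=0.0):
    """Model of a disjunction-max over constant-score clauses:
    best matching clause plus tie_breaker times the remaining ones."""
    matched = sorted((b for t, b in boosts.items() if t in doc_terms),
                     reverse=True)
    if not matched:
        return 0.0
    return matched[0] + tie_breaker * sum(matched[1:])


def sum_score(doc_terms, boosts):
    """Model of a boolean OR with coord disabled: matching clauses are summed."""
    return sum(b for t, b in boosts.items() if t in doc_terms)


boosts = range_boosts(19, 30)   # query terms 19..30
doc_a = set(range(10, 20))      # document "10-20" holds terms 10..19
doc_b = set(range(20, 30))      # document "20-30" holds terms 20..29

print("dismax:", dismax_score(doc_a, boosts), dismax_score(doc_b, boosts))
print("sum:   ", sum_score(doc_a, boosts), sum_score(doc_b, boosts))
```

Under these assumptions the max-based model ranks the 10-20 document first, because only the best matching clause counts, while the summing model still favours the 20-30 document, because it matches more clauses. Real scores will differ by whatever normalisation factors Lucene applies.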
What's the difference in this case between using a DisjunctionMaxQuery, which is what I've done, and using a BooleanQuery with coord disabled? And, if I omit norms, will TermQuery essentially return constant scores for terms? Does the use of DMQ + CSQ + TermsFilter throw up any red flags in your experience?

Thanks again,
Jeremy

On Tue, Apr 27, 2010 at 2:14 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

> First off: if you haven't already, make sure you OMIT_NORMS when indexing
> this field, that way you don't have to worry about docs with "lots" of
> numbers scoring low purely because of the fieldNorm.
>
> Second: i wouldn't bother with a custom query, i would stick with your
> BooleanQuery approach, but make sure you do two things:
>
> 1) give each of your TermQueries a boost based on how far it is from the
> end of the range. so if you have a range like [10 19] give the 19 clause
> a boost of 1, the 18 clause a boost of 2, the 17 clause a boost of 3,
> etc...
>
> 2) disable the coord. there is an option on BooleanQuery to do this, and
> it will make sure docs that only match one clause in your BooleanQuery
> don't get a penalty compared to docs that match many clauses in your
> BooleanQuery -- which is going to be important in ensuring that your
> boosts are useful.
>
> That should get you what you want, and if not then take a look at the
> score explanations and see if anything obvious jumps out -- post a
> followup with your code and the score explanations if you can't solve it
> to your liking.
>
> : I have a field in my document that contains a range of numbers. Say, for
> : example, the universe of numbers is the range of integers from 0-100. My
> : field represents a subrange of those numbers in a token stream. So, for
> : example, if one document contains 20-30, its token stream contains the
> : terms [20, 21, 22, ..., 29]. Now I can quickly find all documents that
> : contain some number.
> :
> : The next part of the problem is searching for all documents that
> : intersect with some subrange of numbers. Somewhat like a range query,
> : but not exactly. Say I want to search for all documents that touch the
> : range [10, 30]. My original implementation was to simply create a
> : BooleanQuery full of TermQueries for each term in the range I was
> : searching for. While this returned the proper results, it did so with
> : skewed scores. I'd prefer documents containing numbers towards the
> : beginning of my search range to be scored higher than docs towards the
> : end. So, if I had two documents, one with 10-20, and one with 20-30, and
> : I searched for [19,30], both documents would be returned, but the second
> : would be much more highly scored due to its higher number of matched
> : terms.
> :
> : So, my plan is to write a custom query which matches documents in
> : my range in a way such as:
>
> -Hoss