I have a situation similar to the following that I'm trying to solve: I have a field in my document that contains a range of numbers. Say, for example, the universe of numbers is the range of integers from 0-100. My field represents a subrange of those numbers as a token stream. So, for example, if one document contains 20-30, its token stream contains the terms [20, 21, 22, ..., 29]. Now I can quickly find all documents that contain some number.
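For concreteness, here is a minimal sketch of how such a range could be expanded into terms. This is plain Java, not my actual analyzer; the class and method names are made up for illustration, and I'm assuming the end of the range is exclusive, as in the 20-30 example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: expands a numeric range
// into the list of terms that would be emitted for the field.
final class RangeTokens {
    // Assumption: the end of the range is exclusive, so 20-30 yields 20..29.
    static List<String> expand(int start, int endExclusive) {
        List<String> terms = new ArrayList<>();
        for (int n = start; n < endExclusive; n++) {
            terms.add(Integer.toString(n));
        }
        return terms;
    }
}
```

With this, RangeTokens.expand(20, 30) gives the ten terms "20" through "29".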
The next part of the problem is searching for all documents that intersect some subrange of numbers. This is somewhat like a range query, but not exactly. Say I want to search for all documents that touch the range [10, 30]. My original implementation simply created a BooleanQuery full of TermQuerys, one for each term in the range I was searching for. While this returned the proper results, it did so with skewed scores. I'd prefer documents containing numbers towards the beginning of my search range to be scored higher than docs towards the end. So, if I had two documents, one with 10-20 and one with 20-30, and I searched for [19, 30], both documents would be returned, but the second would be scored much higher due to its higher number of matched terms.

So, my plan is to write a custom query which matches documents in my range in a way such as:

    for (term : queryRange) {
        TermDocs td = searcher.termDocs(term);
        while (td.next()) { ... }
    }

And for each document, set the score to some value that decreases with the matching term's distance from the beginning of the queried range.

My question is: what score should I start at, and what score should I end at? If I assume that all documents matching the first term in my queried range have score scoreMax, all documents matching the last term have score scoreMin, and all documents matching in-between terms have a score between scoreMax and scoreMin proportional to where they fall within the range, what should scoreMax and scoreMin be?

My current thought is to start with the value passed to my Weight's normalize() method, and work down to 0.0.

Thanks,
Jeremy
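
P.S. To make the interpolation I have in mind concrete, here is a rough sketch in plain Java with no Lucene types. The class and method names are made up for illustration, and the scoreMax and scoreMin arguments are placeholders, since choosing them is exactly what I'm asking about.

```java
// Hypothetical helper, for illustration only: linearly interpolates a score
// for a matching term based on its position within the queried range.
final class RangeScore {
    // rangeStart and rangeEnd are the first and last terms of the queried
    // range (both inclusive). A match on rangeStart gets scoreMax, a match
    // on rangeEnd gets scoreMin, and terms in between fall linearly between.
    static float scoreFor(int term, int rangeStart, int rangeEnd,
                          float scoreMax, float scoreMin) {
        if (rangeEnd == rangeStart) {
            return scoreMax; // single-term range: nothing to interpolate
        }
        float fraction = (term - rangeStart) / (float) (rangeEnd - rangeStart);
        return scoreMax - fraction * (scoreMax - scoreMin);
    }
}
```

Since a document can match several terms in the range, I'm assuming the score that counts is the one for the lowest matching term, i.e. the first hit for that document when the terms are visited in ascending order.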