Erick,
This RangeQuery approach looks promising. It might be a bit hacky, but it
will most probably do the job within the given time and framework.
Thanks a lot,
Niels
Erick Erickson wrote:
Well, assuming that token_count is an indexed field
in your documents (i.e. not something you're
computing on the fly), just use a RangeQuery for the numeric
part. Actually, you probably want to use
ConstantScoreRangeQuery...
The only thing you have to watch is that Lucene does a
lexical comparison, so you have to index your numbers
as comparable strings, probably by left-padding them with
zeros to some fixed width; see NumberTools.
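
To illustrate the padding point, here is a minimal sketch in plain Java, with no Lucene dependency. The class name, the width of 10 and both helper methods are made up for this example; NumberTools.longToString does a similar job with its own fixed-length encoding. The second helper only builds a query string in the QueryParser range syntax (curly braces meaning exclusive bounds), on the assumption that token_count was indexed with the same padding.

```java
import java.util.Locale;

class PaddedNumbers {
    // Assumed fixed width; it must cover the largest value you expect to index.
    static final int WIDTH = 10;

    // Left-pads a non-negative number with zeros so that
    // string order matches numeric order.
    static String pad(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        String padded = String.format(Locale.ROOT, "%0" + WIDTH + "d", value);
        if (padded.length() > WIDTH) {
            throw new IllegalArgumentException("value does not fit into " + WIDTH + " digits");
        }
        return padded;
    }

    // Hypothetical helper: builds a QueryParser-style string for
    // "phrase AND field > lowerExclusive". The upper bound is simply the
    // largest value that fits into WIDTH digits.
    static String phraseAndGreaterThan(String phrase, String field, long lowerExclusive) {
        StringBuilder max = new StringBuilder();
        for (int i = 0; i < WIDTH; i++) {
            max.append('9');
        }
        return "+\"" + phrase + "\" +" + field
                + ":{" + pad(lowerExclusive) + " TO " + max + "}";
    }

    public static void main(String[] args) {
        // Unpadded strings compare character by character, so "1000" sorts before "250".
        System.out.println("1000".compareTo("250") < 0);
        // With padding, the string comparison agrees with the numeric one.
        System.out.println(pad(1000).compareTo(pad(250)) > 0);
        System.out.println(phraseAndGreaterThan("jack johnson", "token_count", 250));
    }
}
```

The same pad() call has to be applied at index time and at query time, otherwise the range bounds will not line up with the indexed terms.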
Best
Erick
On Thu, Oct 23, 2008 at 8:27 AM, Niels Ott <[EMAIL PROTECTED]> wrote:
Hi everybody,
I need to query documents not only by search terms but also by
numeric values (or other general types). Let me try to explain with a
hypothetical example.
Assuming there is a value for the number of words in each document (or the
number of person names, or whatever), I would want to formulate a query
like "Give me documents containing 'jack johnson' AND with token_count >
250".
I've been working with Lucene before and the keyword part is easy, but
what would be a good solution to query for numbers etc.?
My first idea was to store the numbers (which are basically a
HashMap<String,Double>) in the index in some way or other. But it is
not at all obvious to me how to query them then.
Another thing I could think of would be using a separate database of any
type, but then how to bring those two together in a way that makes sense?
Any pointers to useful resources and any types of hints are welcome! :-)
Best,
Niels
--
Niels Ott
Computational Linguist (B.A.)
http://www.drni.de/niels/