On Jun 16, 2005, at 3:03 PM, Sean O'Connor wrote:
> The big stumbling block I have at the moment is understanding whether Terms can be used to find something like a phrase query, proximity query, or boolean query. I think the answer is no, two different concepts.

Terms are the heart of searching. All queries boil down to finding specific terms. A PhraseQuery finds terms exactly side-by-side position-wise (or nearby, within some slop factor). BooleanQuery is an aggregation of other queries, so it's sort of special in that it doesn't deal with terms directly, but the primitive queries it aggregates do. A PrefixQuery is not a "primitive" query: it rewrites itself to a BooleanQuery that ORs together all the matching terms. So the terms are the "atoms" of Lucene. Hopefully this addresses the conceptual stumbling block.
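
To make the "terms plus positions" idea concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: the class ToyPositionalIndex and its methods are hypothetical names invented for illustration, and the tokenizer is a crude split on non-word characters. It only shows how an exact (slop 0) phrase match reduces to checking that the individual terms sit at consecutive positions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy single-document index: NOT Lucene code, just an illustration.
public class ToyPositionalIndex {
    // term -> positions at which that term occurs in the document
    private final Map<String, List<Integer>> positions = new HashMap<>();

    public ToyPositionalIndex(String text) {
        int pos = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue; // split can yield a leading empty token
            positions.computeIfAbsent(token, k -> new ArrayList<>()).add(pos++);
        }
    }

    private List<Integer> positionsOf(String term) {
        List<Integer> found = positions.get(term.toLowerCase());
        return found != null ? found : Collections.<Integer>emptyList();
    }

    // How many times a single term occurs.
    public int termFreq(String term) {
        return positionsOf(term).size();
    }

    // How many times the terms occur at consecutive positions (slop 0).
    public int phraseFreq(String... terms) {
        if (terms.length == 0) return 0;
        int count = 0;
        for (int start : positionsOf(terms[0])) {
            boolean match = true;
            for (int i = 1; i < terms.length && match; i++) {
                match = positionsOf(terms[i]).contains(start + i);
            }
            if (match) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        ToyPositionalIndex doc = new ToyPositionalIndex(
            "Lucene in Action covers Lucene. I read Lucene in Action twice.");
        System.out.println(doc.termFreq("lucene"));                   // 3
        System.out.println(doc.phraseFreq("lucene", "in", "action")); // 2
    }
}
```

A real index stores this information far more compactly and across many documents, but the principle is the same: the phrase is never indexed as such, only its terms and their positions.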

> But I also tend to think that the wheel has already been invented to find how many times a phrase (i.e. "Lucene in Action") appears in a document. Before I go digging through the source code, and possibly creating some rather embarrassing hack(s), I thought I would check to see if there is a 'right' way to go about this.

Deep within PhraseQuery, the number of occurrences is tracked and used in scoring, but that detail is not exposed at the high level of searching. Using IndexSearcher.explain to see the tf() of a PhraseQuery may help, but by default it will be the square root of the number of occurrences. (At least I hope I'm on track with that explanation; please correct me if I've misspoken.) I'd love to hear how others have done this sort of thing, i.e. found the number of occurrences of a phrase.
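
As a rough sketch of the arithmetic only: if the default tf() really is the square root of the phrase frequency, then squaring the tf value reported in an explanation should recover the count. The class and method names below (PhraseTf, tf, occurrencesFromTf) are made up for this example and are not Lucene APIs, and the sketch assumes the default similarity and an exact phrase (no slop). With a non-zero slop, as far as I know, near matches contribute fractional amounts to the frequency, so the squared value would no longer be a plain count.

```java
// Illustration of the tf <-> occurrence-count relationship; NOT Lucene code.
public class PhraseTf {
    // Assumed default: tf is the square root of the raw frequency.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Invert the assumed default: square the tf and round to the nearest
    // integer to absorb floating-point error.
    static int occurrencesFromTf(float tf) {
        return Math.round(tf * tf);
    }

    public static void main(String[] args) {
        float reported = tf(4f);                         // what an explanation might show
        System.out.println(reported);                    // 2.0
        System.out.println(occurrencesFromTf(reported)); // 4
    }
}
```

If a custom Similarity overrides tf(), this inversion does not apply, and the tf value would have to be read by hand from the explanation output.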

> Alternatively, any suggestions on what to google, or where to look to educate myself would be welcome as well.

You're in the right place. Keep asking - we're all here to learn and share what we know.

    Erik


