Before you go too far down this path, please consider what a "hit" is. It's more complicated than you think <G>.
If all you want to do is count up the number of times any term appears in the document, it's not too hard. You should be able to use a TermEnum/TermDocs process to count them. TermDocs should work: just seek to a term, skip to the document number (which you'll have to get somewhere else), and keep adding to your count while the doc ID is the same as your target. Repeat for each term.

But it's a much more complicated story if you want to accurately reflect a query. For instance, consider a near query, that is, terms within, say, 3 of each other. If you do something like the above, you'll present "hits" that aren't real. For instance...

a b c d e f g h i j a

If you search for a and c within 3 of each other, is this one hit? Two? It definitely isn't three, which is what you'd get if you just counted the occurrences of the terms a and c.

What about a NOT clause? How does a phrase query get counted?

There have been several discussions of various aspects of this issue, but often in the context of highlighting. You'll probably get some good information from the following threads...

Counting terms' hits from phrases
Counting hits in a document

as well as searching the archive on highlighting and/or hitcount.

Best
Erick

On 2/7/07, csahat <[EMAIL PROTECTED]> wrote:
Hi all,

I'm sorry if this question has already been answered on this list, but I searched the list and couldn't find the answer.

This is what I want to do: when the user types in a query, for example "WebSphere Java", Lucene should show not only the score but also the term count per document, like this:

doc1 0.8333 websphere=3, Java=2
doc2 0.817 websphere=2, Java=2

I already tried to implement this with TermFreqVector, but TermFreqVector shows all the terms in the field, whereas I want only the terms that occur in the query. I tried using TermDocs as well, but it always gave a result of 0. I also tried the Explanation class, using its toString method, but then I have to "clean" the information.

Is there any "direct" way to do this in Lucene? Or perhaps someone can give me a hint?

Thanks in advance
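
To make the a/c proximity example from the reply concrete, here is a minimal sketch in plain Java. It makes no Lucene calls; the class and method names (NearHitDemo, naiveCount, nearCount) are hypothetical, and "within 3" is assumed to mean that the two token positions differ by at most 3, which is not necessarily how Lucene's own slop is defined. Counting matching pairs is also only one possible definition of a "hit".

```java
import java.util.ArrayList;
import java.util.List;

public class NearHitDemo {

    // 0-based positions at which `term` occurs in the token stream.
    public static List<Integer> positions(String[] tokens, String term) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                result.add(i);
            }
        }
        return result;
    }

    // Naive count: every occurrence of either term is treated as a "hit".
    public static int naiveCount(String[] tokens, String first, String second) {
        return positions(tokens, first).size() + positions(tokens, second).size();
    }

    // Proximity count (assumed definition): number of (first, second) pairs
    // whose positions differ by at most `slop`.
    public static int nearCount(String[] tokens, String first, String second, int slop) {
        int pairs = 0;
        for (int p : positions(tokens, first)) {
            for (int q : positions(tokens, second)) {
                if (Math.abs(p - q) <= slop) {
                    pairs++;
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] tokens = "a b c d e f g h i j a".split(" ");
        System.out.println("naive count: " + naiveCount(tokens, "a", "c"));
        System.out.println("near count:  " + nearCount(tokens, "a", "c", 3));
    }
}
```

On the sample text the naive count and the proximity count disagree, which is the gap between term frequency and "hits" that the reply describes. A NOT clause or a phrase query would each need its own definition on top of this.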