Erick Erickson wrote:
Um, to return all counts of all terms in a field, what other option
*is* there except to walk the whole thing?

Have you looked at TermEnum, TermDocs, and TermFreqVector?
For that matter, TermPositionVector might also be of some use.

It would be easier to provide some help if you
1> mentioned what you'd tried already
2> mentioned what's inadequate about what you've tried.

Sorry for not being clear about what I am trying to achieve. I am storing
documents in my index that are made up of 5 Fields. One of the Fields
contains keywords that describe the document. Now I need a fast
way of retrieving these keywords, together with their frequencies, from
the index.

My current solution is to use IndexReader#terms() to walk over all
terms and count the ones that appear in the keyword-Field.

As you can imagine, this does not scale well. The content of the keywords
field is usually quite small, but the other fields may hold
thousands of terms.

What I am asking for is a way to walk all the terms of just the keyword-field
in order to avoid having to walk all terms in all fields.
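
A rough sketch of that idea, not Lucene code: it models the term dictionary
as a sorted map ordered by field name and then by term text, seeks to the
first term of the wanted field, and stops as soon as the field changes. The
names FieldTermWalk, TermKey and countsForField are made up for illustration,
and the ordering is an assumption about how the dictionary is laid out. In
Lucene of this vintage the matching calls should be IndexReader#terms(Term),
which positions a TermEnum at the first term at or after the one given, and
TermEnum#docFreq(); please check that against the javadocs for your version.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy model of a term dictionary; all names here are hypothetical. */
class FieldTermWalk {

    /** A (field, text) pair, ordered by field first and term text second. */
    static final class TermKey implements Comparable<TermKey> {
        final String field;
        final String text;

        TermKey(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public int compareTo(TermKey other) {
            int byField = field.compareTo(other.field);
            return byField != 0 ? byField : text.compareTo(other.text);
        }
    }

    /**
     * Returns term text -> document count for one field only.
     * Terms of other fields that sort after the wanted field are never
     * visited, apart from the single term that ends the walk.
     */
    static Map<String, Integer> countsForField(SortedMap<TermKey, Integer> dictionary,
                                               String field) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // tailMap plays the role of the seek: it starts at the first key
        // that is >= (field, ""), i.e. the first term of the field if any.
        TermKey start = new TermKey(field, "");
        for (Map.Entry<TermKey, Integer> entry : dictionary.tailMap(start).entrySet()) {
            if (!entry.getKey().field.equals(field)) {
                break; // walked past the last term of the wanted field
            }
            counts.put(entry.getKey().text, entry.getValue());
        }
        return counts;
    }

    public static void main(String[] args) {
        SortedMap<TermKey, Integer> dictionary = new TreeMap<>();
        dictionary.put(new TermKey("author", "foo"), 5);
        dictionary.put(new TermKey("author", "bar"), 5);
        dictionary.put(new TermKey("body", "lucene"), 9);
        dictionary.put(new TermKey("body", "index"), 7);

        // prints {bar=5, foo=5}
        System.out.println(countsForField(dictionary, "author"));
    }
}
```

The cost of the walk then depends on the number of terms in the keyword
field rather than on the total number of terms in the index.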

Of course, even better would be some API that would return a TermVector for
the keyword-field. But I guess TermVectors are only supported at the
per-Document level and not at the index level?

Regards,
Ben

Best
Erick

On 6/9/07, Benjamin Pasero <[EMAIL PROTECTED]> wrote:

Hi,

I wonder if this is possible:

Return all Terms of a Field in the Index together with the number of
occurrences in all documents.

E.g., given 10 Documents with the Field "author" in the index, 5 of them
having the value "foo" and 5 "bar", I would like to build a map with:

[foo] -> 5
[bar] -> 5

I looked at what Luke is doing to show the top terms of a given field in
the index and it seems to iterate over all terms (using
IndexReader#terms()). Isn't that quite inefficient? I would at least
expect a method IndexReader#terms(String field) to limit the terms to
the desired field.

Thanks for helping,
Ben


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



