Maybe I'm missing the boat, but I don't understand why TermEnum doesn't work for you. Try something like...
TermEnum te = this.reader.terms(); te.skipTo(new Term("keyword", "")); while (te.next()) { Term term = te.term(); if (! term.field().equals("keyword")) { break; } System.out.println(term.text()); } On 6/10/07, Benjamin Pasero <[EMAIL PROTECTED]> wrote:
Erick Erickson wrote: > Um, to return all counts of all terms in a field, what other option > *is* there except to walk the whole thing? > > Have you looked at TermEnum, TermDocs, and TermFreqVector? > For that matter, TermPositionVector might also be of some use. > > It would be easier to provide some help if you > 1> mentioned what you'd tried already > 2> mentioned what's inadequate about what you've tried. Sorry for not being clear what I am trying to achieve. I am storing documents in my index that are made of 5 Fields. One of the Fields contains keywords that describe the document. Now, I need a fast way of retrieving these keywords together with their frequency from the index. My current solution is to use IndexReader#terms() to walk over all terms and count the ones that appear in the keyword-Field. As you can assume, this is not scaling well. The content in the keywords field is usually quite small, however, the other fields may store up to thousands of terms. What I am asking for is a way to walk all the terms of just the keyword-field in order to avoid having to walk all terms in all fields. Of course, even better would be some API that would return a TermVector from the keyword-field. But I guess TermVectors are only supported on a per Document level and not index level? Regards, Ben > > Best > Erick > > On 6/9/07, Benjamin Pasero <[EMAIL PROTECTED]> wrote: >> >> Hi, >> >> I wonder if this is possible: >> >> Return all Terms of a Field in the Index together with the number of >> occurances >> in all documents. >> >> E.g. have 10 Documents with the Field "author" in the index, 5 of them >> having >> the value "foo" and 5 "bar" I would like to build a map with: >> >> [foo] -> 5 >> [bar] -> 5 >> >> I looked at what Luke is doing to show the top terms of a given field in >> the >> index and it seems to iterate over all terms (using >> IndexReader#terms()). Isnt >> that quite un-efficient? I would at least expect a method >> IndexReader#terms(String field) >> to limit the terms on the desired field. >> >> Thanks for helping, >> Ben >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [EMAIL PROTECTED] >> For additional commands, e-mail: [EMAIL PROTECTED] >> >> > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]