Re: Retrieving TermVectors from a Field over the full index?

Benjamin Pasero Mon, 11 Jun 2007 00:20:13 -0700


This is what I coded up till now:

TermEnum terms = reader.terms();
while (terms.next()) {
 String field = terms.term().field();
 if ("keywords".equals(field))
   keywords.put(terms.term().text(), terms.docFreq());
}

Your solution is doing more or less the same right?

Ben

Maybe I'm missing the boat, but I don't understand why TermEnum doesn't
work for you.
Try something like...

      TermEnum te = this.reader.terms();
       te.skipTo(new Term("keyword", ""));
       while (te.next()) {
           Term term = te.term();
           if (! term.field().equals("keyword")) {
               break;
           }
           System.out.println(term.text());
       }



On 6/10/07, Benjamin Pasero <[EMAIL PROTECTED]> wrote:

Erick Erickson wrote:
> Um, to return all counts of all terms in a field, what other option
> *is* there except to walk the whole thing?
>
> Have you looked at TermEnum, TermDocs, and TermFreqVector?
> For that matter, TermPositionVector might also be of some use.
>
> It would be easier to provide some help if you
> 1> mentioned what you'd tried already
> 2> mentioned what's inadequate about what you've tried.
Sorry for not being clear what I am trying to achieve. I am storing
documents in my index that are made of 5 Fields. One of the Fields
contains keywords that describe the document. Now, I need a fast
way of retrieving these keywords together with their frequency from
the index.

My current solution is to use IndexReader#terms() to walk over all
terms and count the ones that appear in the keyword-Field.

As you can assume, this is not scaling well. The content in the keywords
field is usually quite small, however, the other fields may store
up to thousands of terms.

What I am asking for is a way to walk all the terms of just the
keyword-field
in order to avoid having to walk all terms in all fields.

Of course, even better would be some API that would return a TermVector
from
the keyword-field. But I guess TermVectors are only supported on a per
Document level and not index level?

Regards,
Ben
>
> Best
> Erick
>
> On 6/9/07, Benjamin Pasero <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>> I wonder if this is possible:
>>
>> Return all Terms of a Field in the Index together with the number of
>> occurances
>> in all documents.
>>

>> E.g. have 10 Documents with the Field "author" in the index, 5 ofthem

>> having
>> the value "foo" and 5 "bar" I would like to build a map with:
>>
>> [foo] -> 5
>> [bar] -> 5
>>
>> I looked at what Luke is doing to show the top terms of a given field
in
>> the
>> index and it seems to iterate over all terms (using
>> IndexReader#terms()). Isnt
>> that quite un-efficient? I would at least expect a method
>> IndexReader#terms(String field)
>> to limit the terms on the desired field.
>>
>> Thanks for helping,
>> Ben
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Retrieving TermVectors from a Field over the full index?

Reply via email to