Re: updating index

Erick Erickson Sun, 25 Feb 2007 07:05:48 -0800

Yes, I'm pretty sure you have to index the field (UN_TOKENIZED) to be able
to find it with TermDocs/TermEnum. The loop I posted works like this:


for each term in the index for the field
   if  this is one I want to update
        use a TermDocs to get to that document and operate on it.


But this is actually pretty silly. Your loop uses a better approach, except
you're not using TermDocs correctly. Try

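    // Get one unpositioned TermDocs from the reader and seek to each key in turn.
    TermDocs tDocs = reader.termDocs();
    for (Business biz : updates)
    {
        Term t = new Term("id", biz.getId());
        tDocs.seek(t);
        while (tDocs.next())
        {
            Document doc = reader.document(tDocs.doc());
            // operate on doc here
        }
    }
    tDocs.close();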

But TermDocs/TermEnum is looking at terms in the index. If you haven't
indexed the term, you won't find it, so your Field.Index.NO is really
hurting you here.
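
To make the stored-versus-indexed distinction concrete, here is a toy model in
plain Java. It is NOT Lucene code and not how Lucene is implemented; the class
ToyIndex and its methods are names I made up for illustration. The only
assumption is the one stated above: a term lookup can only see values that
were put into the term dictionary at indexing time, while stored values live
somewhere else and are reachable only by document number.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only, NOT Lucene. All names here are hypothetical.
class ToyIndex {
    // Stored field values by document number (roughly what reader.document(n) gives back).
    private final List<Map<String, String>> stored = new ArrayList<>();
    // Term dictionary: "field:value" -> document numbers (roughly what TermDocs walks).
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    // Adds a one-field document. 'indexed' stands in for
    // Field.Index.UN_TOKENIZED (true) versus Field.Index.NO (false).
    int add(String field, String value, boolean indexed) {
        int docNum = stored.size();
        Map<String, String> doc = new HashMap<>();
        doc.put(field, value);
        stored.add(doc);
        if (indexed) {
            postings.computeIfAbsent(field + ":" + value, k -> new ArrayList<>()).add(docNum);
        }
        return docNum;
    }

    // Term lookup: consults only the term dictionary, never the stored values.
    List<Integer> termDocs(String field, String value) {
        return postings.getOrDefault(field + ":" + value, new ArrayList<>());
    }

    // Stored-value lookup: needs a document number, not a term.
    String storedValue(int docNum, String field) {
        return stored.get(docNum).get(field);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int a = index.add("id", "42", false); // stored only
        index.add("id", "43", true);          // stored and indexed
        System.out.println(index.termDocs("id", "42")); // prints []
        System.out.println(index.termDocs("id", "43")); // prints [1]
        System.out.println(index.storedValue(a, "id")); // prints 42
    }
}
```

The value "42" is still there if you already know the document number, but no
term lookup will ever lead you to it, which matches the empty TermDocs you are
seeing.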

Best
Erick

On 2/24/07, no spam <[EMAIL PROTECTED]> wrote:


I didn't fully understand your last post, or why I would want to call
IndexReader.terms() and then IndexReader.termDocs().  Won't something like
this work?

        for (Business biz : updates)
        {
            Term t = new Term("id", biz.getId()+"");
            TermDocs tDocs = reader.termDocs(t);

            while (tDocs.next())
            {
                Document doc = reader.document(tDocs.doc());
            }
        }

But tDocs never contains any docs.   Is this because I've indexed my pk
like this:

doc.add(new Field("id", biz.getId(), Field.Store.YES, Field.Index.NO));

instead of

doc.add(new Field("id", biz.getId(), Field.Store.YES,
        Field.Index.UN_TOKENIZED));

Mark

On 2/21/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
>
> I think you can get MUCH better efficiency by using TermEnum/TermDocs. But
> I think you need to index (UN_TOKENIZED) your primary key, although now I'm
> not sure. I'd be surprised if TermEnum worked with un-indexed data. Still,
> it'd be worth trying, but I've always assumed that TermEnums only worked
> on indexed fields.
>
> Anyway, your loop looks more like this...
>
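> TermEnum terms = reader.terms(new Term("primarykey", ""));
> TermDocs tDocs = reader.termDocs();
>
> while (terms.next()) {
>    if (docsToUpdate.contains(terms.term().text())) {
>        tDocs.seek(terms.term());
>        if (tDocs.next()) {  // doc() is only valid after next() succeeds
>            // pseudocode: the real call is updateDocument(Term, Document)
>            writer.updateDocument(tDocs.doc());
>        }
>    }
> }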
>
> NOTE: I've been fast and loose with edge conditions, like ensuring that
> while (terms.next()) doesn't skip the first term, so caveat emptor....
> This loop also assumes that there is one and only one document in your
> index with the primary key. Otherwise, you have to do some more work with
> the TermDocs class to process each document that has your primary key...
>
> This is similar to creating Lucene filters, which is very fast....
>
> Hope this helps
> Erick
>
>
>
>
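
The edge condition in the NOTE quoted above (while (terms.next()) skipping the
first term) is easy to see with a toy cursor. This is a plain-Java sketch, NOT
Lucene's TermEnum; PositionedCursor and its methods are hypothetical names. It
assumes only what the NOTE implies, which is also my understanding of
IndexReader.terms(Term): the enumeration is handed back already positioned on
its first element, so advancing before reading loses that element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy cursor, NOT Lucene's TermEnum. It only mimics an enumeration that is
// already positioned on its first element when the caller receives it.
class PositionedCursor {
    private final List<String> items;
    private int pos = 0; // already on the first element

    PositionedCursor(List<String> items) {
        this.items = items;
    }

    // Current element, or null when the cursor is exhausted.
    String current() {
        return pos < items.size() ? items.get(pos) : null;
    }

    // Advances by one; returns true if an element is available afterwards.
    boolean next() {
        pos++;
        return pos < items.size();
    }

    // Advance-then-read: the first element is never seen.
    static List<String> readWithWhile(List<String> items) {
        PositionedCursor c = new PositionedCursor(items);
        List<String> seen = new ArrayList<>();
        while (c.next()) {
            seen.add(c.current());
        }
        return seen;
    }

    // Read-then-advance: every element is seen.
    static List<String> readWithDoWhile(List<String> items) {
        PositionedCursor c = new PositionedCursor(items);
        List<String> seen = new ArrayList<>();
        if (c.current() != null) {
            do {
                seen.add(c.current());
            } while (c.next());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c");
        System.out.println(readWithWhile(keys));   // prints [b, c]
        System.out.println(readWithDoWhile(keys)); // prints [a, b, c]
    }
}
```

With a real TermEnum the same read-then-advance shape applies, with the extra
check that the current term still belongs to the field you asked for.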
