BTW Erick this works brilliantly with UN_TOKENIZED. SUPER fast :)
On 2/25/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
Yes, I'm pretty sure you have to index the field (UN_TOKENIZED) to be able
to fetch it with TermDocs/TermEnum! The loop I posted works like this....
for each term in the index for the field
if this is one I want to update
use a TermDocs to get to that document and operate on it.
But this is actually pretty silly. Your loop uses a better approach,
except
you're not using TermDocs correctly. Try
TermDocs tDocs = new IndexReader.TermDocs()
for (Business biz : updates)
{
Term t = new Term("id", biz.getId());
tDocs.seek(t);
while (tDocs.next())
{
Document doc = reader.document(tDocs.doc());
}
}
But TermDocs/TermEnum is looking at terms in the index. If you haven't
indexed the term, you won't find it, so your Field.Index.NO is really
hurting you here.
Best
Erick
On 2/24/07, no spam <[EMAIL PROTECTED]> wrote:
>
> I didn't fully understand your last post and why I wanted to do
> IndexReader.terms() then IndexReader.termDocs(). Won't something like
> this
> work?
>
> for (Business biz : updates)
> {
> Term t = new Term("id", biz.getId()+"");
> TermDocs tDocs = reader.termDocs(t);
>
> while (tDocs.next())
> {
> Document doc = reader.document(tDocs.doc());
> }
> }
>
> But tDocs never contains any docs. Is this because I've indexed my pk
> like
> this:
>
> doc.add(new Field("id", biz.getId(), Field.Store.YES, Field.Index.NO));
>
> instead of
>
> doc.add(new Field("id", biz.getId(), Field.Store.YES,
> Field.Index.UNTOKENIZED));
>
> Mark
>
> On 2/21/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
> >
> > I think you can get MUCH better efficiency by using TermEnum/TermDocs.
> But
> > I
> > think you need to index (UN_TOKENIZED) your primary key (although now
> I'm
> > not sure. But I'd be surprised if TermEnum worked with un-indexed
data.
> > Still, it'd be worth trying but I've always assumed that TermEnums
only
> > worked on indexed fields....).....
> >
> > Anyway, your loop looks more like this...
> >
> > TermEnum terms = IndexReader.terms(new Term("primarykey", ""));
> > TermDocs tDocs = IndexRreader.termDocs();
> >
> > while (terms.next()) {
> > if (docsToUpdate.contains(terms.text()) {
> > tDocs.seek(terms.term());
> > writer.updateDocument(tDocs.doc());
> > }
> > }
> >
> > NOTE: I've been fast and loose with edge conditions, like insuring
that
> > while (terms.next()) doesn't skip the first term, so caveat emptor....
> > This
> > loop also assumes that there is one and only one document in your
index
> > with
> > the primary key. Otherwise, you have to do some more work with the
> > TermDocs
> > class to process each document that has your primary key...
> >
> > This is similar to creating Lucene filters, which is very fast....
> >
> > Hope this helps
> > Erick
> >
> >
> >
> >
>