Thanks for that info.  These indexes will be large, in the 10s of millions.
The id field is unique and is 29 bytes.  I guess that's still a lot of data to
trawl through to get to the term.

Have you tested how long it takes to look up docs from your id?

Not in indexes of that size in a live environment, as I don't have the hardware to run those sorts of tests :( although I know that, in general, lookup is fast.

Couldn't you just give the base & full docs different ids?  Then you
can independently choose which one to update?

I considered that, but the normal case will not need to worry about this scenario.

There is only ever one instance of a mail Doc, whether it is a root mail or part of a forward chain, and a root mail can of course become part of a forward chain at some point. So it should be optimal to fetch the one Document for the mail Id directly, without first trying the true Id and then some pseudo Id if it isn't found.
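To make the lookup-count argument concrete, here is a rough sketch. It is not Lucene code: a plain HashMap stands in for the index, and the class name, method names and the "base:" pseudo-Id prefix are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for an index keyed by the unique id term (assumption: not Lucene API).
class MailIdLookup {
    private final Map<String, String> index = new HashMap<>();
    private int lookups = 0;

    public void add(String id, String doc) {
        index.put(id, doc);
    }

    // Every probe of the index is counted so the two schemes can be compared.
    private String get(String id) {
        lookups++;
        return index.get(id);
    }

    // One Document per mail Id: a single probe, hit or miss.
    public String fetchSingleId(String mailId) {
        return get(mailId);
    }

    // Separate ids for base and full docs: try the true Id first,
    // then fall back to a hypothetical pseudo Id if nothing was found.
    public String fetchTwoIds(String mailId) {
        String doc = get(mailId);
        return doc != null ? doc : get("base:" + mailId);
    }

    public int lookups() {
        return lookups;
    }
}
```

With one Id per mail, every fetch costs one probe. With separate Ids, any mail that is stored only under its pseudo Id costs two.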

Unfortunately, I'm having to solve this problem in my Lucene app as the tool that's generating this data is unable to know what has or has not been handled previously.

I'm implementing it using the IndexReader approach for now and will try to get some performance data. Thanks for your comments, Mike.

Antony