Thanks for that info. These indexes will be large, in the tens of millions.
The id field is unique and is 29 bytes long. I guess that's still a lot of
data to trawl through to get to the term.
> Have you tested how long it takes to look up docs from your id?
Not in indexes that size in a live environment, as I don't have the hardware to
run those sorts of tests :( although I know that, in general, lookup is fast.
> Couldn't you just give the base & full docs different ids? Then you
> can independently choose which one to update?
I considered that, but the normal case will not need to worry about this
scenario.
There is only ever one instance of a mail Doc, whether it is a root mail or part
of a forward chain, and a root mail can of course become part of a forward chain
at some point. So it should be optimal to just fetch the one Document for the
mail Id, without first trying the true Id and then some pseudo Id if it isn't
found.
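
A minimal sketch of the difference, in plain Java with an in-memory map
standing in for the index. This is not Lucene code: the class name, the method
names and the "base:" pseudo-Id prefix are all made up for illustration, and it
only counts how many id lookups each scheme needs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for an index keyed by a unique id field.
class MailIndexSketch {
    private final Map<String, String> docsById = new HashMap<>();
    private int lookups = 0; // number of id lookups performed so far

    void put(String id, String doc) {
        docsById.put(id, doc);
    }

    // One lookup against the "index"; increments the counter.
    private Optional<String> lookup(String id) {
        lookups++;
        return Optional.ofNullable(docsById.get(id));
    }

    // Single-Id scheme: one Document per mail Id, so always one lookup.
    Optional<String> fetchSingleId(String mailId) {
        return lookup(mailId);
    }

    // Two-Id scheme: try the true Id first, then an assumed pseudo Id.
    Optional<String> fetchWithFallback(String mailId) {
        Optional<String> hit = lookup(mailId);
        return hit.isPresent() ? hit : lookup("base:" + mailId);
    }

    int lookups() {
        return lookups;
    }

    public static void main(String[] args) {
        MailIndexSketch index = new MailIndexSketch();
        index.put("base:mail-42", "forward-chain stub");
        // Misses on the true Id, then hits on the pseudo Id.
        index.fetchWithFallback("mail-42");
        System.out.println("lookups with fallback: " + index.lookups());
    }
}
```

With a single Id every fetch costs one lookup; with the fallback scheme, any
mail stored only under the pseudo Id costs two.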
Unfortunately, I'm having to solve this problem in my Lucene app as the tool
that's generating this data is unable to know what has or has not been handled
previously.
I'm implementing it using the IndexReader approach for now and will try to get
some performance data. Thanks for your comments, Mike.
Antony