On Tue, May 5, 2009 at 7:24 PM, Antony Bowesman <a...@teamware.com> wrote: > Michael McCandless wrote: >> >> Lucene doesn't provide any way to do this, except opening a reader. >> >> Opening a reader is not "that" expensive if you use it for this >> purpose. EG neither norms nor FieldCache will be loaded if you just >> enumerate the term docs. > > Thanks for that info. These indexes will be large, in the 10s of millions. > id field is unique and is 29 bytes. I guess that's still a lot of data to > trawl through to get to the term.
Have you tested how long it takes to look up docs from your id? >> But, you can let Lucene do the same thing for you by just always using >> updateDocument, which'll remove the old doc if it's present. > > That's precisely what I don't want to occur. I have two forms of a > Document, which represent mail items. One 'full' version containing all > index and stored data, which represents a searchable mail item and one > 'base', which is simply a marker Document which represents a mail in a > forwarded mail chain, with just a couple of stored fields containing the > mail meta data. > > Under normal circumstances there are no problems as mails arrive in sequence > and are never handled twice, but there is one case, during a reindex op, > when the arrival of those mails can come out of sequence, i.e. a full mail > is indexed first, but that mail is later processed as part of a forwarded > mail chain of another mail. > > It is the second time that mail is handled as a base mail that I do not want > it to overwrite the full version. > > Would it be technically difficult to support something like this in the > IndexWriter API and if not, would it end up being more efficient that using > a reader/terms to check this? Couldn't you just give the base & full docs different ids? Then you can independently choose which one to update? Mike --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org