On Thu, Jan 2, 2014 at 7:53 PM, Derek Lewis <de...@lewisd.com> wrote:
> Sorry for the delay responding. Holidays and all that. :)
No problem.

> The retry approach did work, our process finished in the end. At some
> point, I suppose we'll just live with the chance this might happen and dump
> a bunch of exceptions into the log, if the effort to fix it is too high.
> Being pragmatic and all.

Fair enough :) I do think retry is a valid approach.

> You are correct that preventing the duplicate indexing is hard. We do have
> things in place to try to prevent it, emphasis on the "try". Occasionally,
> things go wrong and we get a small number of duplicates, but on at least one
> occasion that number was anything but small. ;)
>
> I'm as sure as I can be that there were no merges running, since we're
> locking that directory before running this process. All our things that
> index use that same lock, so unless merges happen in a background thread
> within Lucene, rather than the calling thread that's adding new documents
> to the index, there should be no merges going on outside of this lock. In
> that case, calling waitForMerges shouldn't have any effect.

Merging does run in a background thread by default
(ConcurrentMergeScheduler), and a still-running merge could be ongoing
when you "lock that directory". I don't think IndexWriter kicks off
merges on init today, but it's free to (it's an impl detail). Net/net
one should not rely on when merges might happen...

> I know you've mentioned the infoStream a couple times :) But I don't think
> turning it on would be a good idea, in our case. We've only had this
> problem crop up once, so there's no guarantee at all that it'll happen
> again, and the infoStream logging would be a lot of data with all the
> indexing we're doing. Unfortunately, I just don't think it's feasible.

In fact infoStream doesn't generate THAT much data: it doesn't log for
every added doc. Only when segment changes happen (a flush, a merge,
deletes applied, etc.). And it can be very useful in post-mortem to
figure out what happened when something goes wrong.

> Thanks very much for the suggestion about FilterIndexReader with
> addIndices. That sounds very promising. I'm going to investigate doing
> our duplicate filtering that way instead.
>
> Thanks again for the help. Cheers :)

You're welcome!

Mike McCandless

http://blog.mikemccandless.com
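PS: in case it helps, here are a few rough, untested sketches of the things
above, written against the Lucene 4.x APIs (adjust the Version constant and
class names for your release); the paths and class names I use here
(MergeSafeIndexing, /path/to/index, etc.) are just placeholders. First, two
ways to make "no merges outside our lock" actually hold: run merges on the
calling thread with SerialMergeScheduler, or keep the default
ConcurrentMergeScheduler and call waitForMerges() before releasing the lock:

    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.SerialMergeScheduler;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class MergeSafeIndexing {
      public static void main(String[] args) throws IOException {
        Directory dir = FSDirectory.open(new File("/path/to/index"));  // placeholder path
        IndexWriterConfig iwc = new IndexWriterConfig(
            Version.LUCENE_46, new StandardAnalyzer(Version.LUCENE_46));

        // Option 1: run merges on the thread that calls addDocument/commit, so an
        // external lock held by that thread really does cover merging too:
        iwc.setMergeScheduler(new SerialMergeScheduler());

        IndexWriter writer = new IndexWriter(dir, iwc);
        // ... acquire your external lock, add/update documents ...

        // Option 2: keep the default ConcurrentMergeScheduler but block until any
        // background merges finish before releasing your lock:
        writer.waitForMerges();
        writer.commit();
        writer.close();
      }
    }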
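Second, turning on infoStream is a single setter on IndexWriterConfig, and
you can point it at a dedicated log file instead of stdout; again just a
sketch, with a made-up log path:

    import java.io.File;
    import java.io.IOException;
    import java.io.PrintStream;

    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.util.PrintStreamInfoStream;

    public class InfoStreamSetup {
      static void enableInfoStream(IndexWriterConfig iwc) throws IOException {
        // infoStream only records segment-level events (flushes, merges, applying
        // deletes), not every added document, so a dedicated log file stays
        // manageable even under heavy indexing:
        PrintStream log = new PrintStream(new File("/var/log/myapp/iw-infostream.log"), "UTF-8");
        iwc.setInfoStream(new PrintStreamInfoStream(log));

        // Or, for a quick experiment, just send it to stdout:
        // iwc.setInfoStream(System.out);
      }
    }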
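Finally, a very rough sketch of the FilterIndexReader + addIndexes idea (in
4.x the class is called FilterAtomicReader): wrap each segment reader so the
duplicate docIDs look deleted, then feed the wrapped readers to
IndexWriter.addIndexes. findNonDuplicates() below is a stand-in for however
you detect duplicates (it must also leave bits clear for docs already
deleted in the source index); untested:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.lucene.index.AtomicReader;
    import org.apache.lucene.index.AtomicReaderContext;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.FilterAtomicReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.util.Bits;
    import org.apache.lucene.util.FixedBitSet;

    // Exposes duplicate documents as if they were deleted, so addIndexes skips them.
    class DedupReader extends FilterAtomicReader {
      private final FixedBitSet keep;  // bit set = keep this doc

      DedupReader(AtomicReader in, FixedBitSet keep) {
        super(in);
        this.keep = keep;
      }

      @Override
      public Bits getLiveDocs() {
        return keep;
      }

      @Override
      public int numDocs() {
        return keep.cardinality();
      }
    }

    class DedupAddIndexes {
      // Copies srcDir into destWriter, dropping docs not marked "keep".
      static void addDeduped(Directory srcDir, IndexWriter destWriter) throws IOException {
        DirectoryReader src = DirectoryReader.open(srcDir);
        try {
          List<IndexReader> wrapped = new ArrayList<IndexReader>();
          for (AtomicReaderContext ctx : src.leaves()) {
            AtomicReader leaf = ctx.reader();
            // Set a bit for every doc to keep; leave duplicates and
            // already-deleted docs clear.
            FixedBitSet keep = findNonDuplicates(leaf);
            wrapped.add(new DedupReader(leaf, keep));
          }
          destWriter.addIndexes(wrapped.toArray(new IndexReader[wrapped.size()]));
        } finally {
          src.close();
        }
      }

      static FixedBitSet findNonDuplicates(AtomicReader leaf) throws IOException {
        // Placeholder: e.g. walk the terms of your unique-key field and keep only
        // the first (or last) occurrence of each key. Entirely application-specific.
        throw new UnsupportedOperationException("fill in your duplicate detection");
      }
    }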