Re: Preventing index corruption

2008-06-29 Thread Eran Sevi
Thanks for the information. From what I read in other posts it's better to avoid using RAMDirectory, since the same result can be achieved by using autoCommit=false as you suggested. I'm using 2.3.1, so I guess I'll have to wait for 2.4 or take the latest trunk in order to benefit from these

Re: Preventing index corruption

2008-06-27 Thread Michael McCandless
If you open your IndexWriter with autoCommit=false, then no changes will be visible in the index until you call commit() or close(). Added documents can still be flushed to disk as new segments when the RAM buffer is full, but these segments are not referenced (by a new segments_N file) until commi
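
The behaviour described above (segments flushed to disk but not referenced until a commit) can be illustrated with a small toy model. This is only a sketch and does not use the Lucene API: the class `StagedIndex` and its methods are hypothetical names invented for illustration, and segments are plain strings rather than files.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of autoCommit=false semantics. Hypothetical class, NOT the Lucene API. */
public class StagedIndex {
    // Segments referenced by the current commit point (what readers can see).
    private List<String> committed = new ArrayList<>();
    // Segments flushed since the last commit: they exist, but nothing references them yet.
    private final List<String> pending = new ArrayList<>();

    /** Simulates a RAM-buffer flush: the segment is written but stays invisible. */
    public void flush(String segment) {
        pending.add(segment);
    }

    /** Publishes all pending segments in one step, analogous to writing a new segments_N file. */
    public void commit() {
        List<String> next = new ArrayList<>(committed);
        next.addAll(pending);
        committed = next;
        pending.clear();
    }

    /** Discards everything flushed since the last commit. */
    public void rollback() {
        pending.clear();
    }

    /** Returns a copy of the segments a newly opened reader would see. */
    public List<String> visibleSegments() {
        return new ArrayList<>(committed);
    }
}
```

In this model a batch that fails halfway is handled by calling `rollback()` instead of `commit()`, so readers never observe a partial batch.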

Re: Preventing index corruption

2008-06-27 Thread John Byrne
Hi, Rather than disabling the merging, have you considered putting the documents in a separate index, possibly in memory, and then deciding when to merge them with the main index yourself? That way, you can change your mind and simply not merge the new documents if you want. To do this, yo

Re: Preventing index corruption

2008-06-26 Thread Eran Sevi
Thanks Erick. You might be joking, but one of our clients indeed had all his servers destroyed in a flood. Of course in this rare case, a solution would be to keep the backup on another site. However I'm still confused about normal scenarios: Let's say that in the middle of the batch I got an exc

Re: Preventing index corruption

2008-06-26 Thread Erick Erickson
How big is your index? The simpleminded way would be to copy things around as your batches come in and only switch to the *real* one after the additions were verified. You could also just maintain two indexes but only update one at a time. In the 99.99% case where things went well, it would just b
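
A rough sketch of the copy-then-switch idea, using only the Java standard library. Several things here are assumptions rather than anything stated in the thread: the class name `IndexSwitcher`, the use of a small pointer file naming the active index directory, and a flat index directory containing only regular files. Whether `ATOMIC_MOVE` replaces an existing target is platform-dependent, so treat the switch step as illustrative.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

/** Hypothetical helper: stage a copy of the index, then switch a pointer file to it. */
public class IndexSwitcher {
    /** Copies every regular file of a flat index directory into the staging directory. */
    public static Path stage(Path live, Path staging) throws IOException {
        Files.createDirectories(staging);
        try (Stream<Path> files = Files.list(live)) {
            for (Path f : (Iterable<Path>) files::iterator) {
                if (Files.isRegularFile(f)) {
                    Files.copy(f, staging.resolve(f.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        return staging;
    }

    /** Points the marker file at the verified directory by writing a temp file and renaming it. */
    public static void activate(Path pointer, Path verified) throws IOException {
        Path tmp = pointer.resolveSibling(pointer.getFileName() + ".tmp");
        Files.write(tmp, verified.toString().getBytes(StandardCharsets.UTF_8));
        // Assumption: the platform's atomic rename replaces an existing pointer file.
        Files.move(tmp, pointer, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Reads which index directory is currently active. */
    public static Path current(Path pointer) throws IOException {
        return Paths.get(new String(Files.readAllBytes(pointer), StandardCharsets.UTF_8));
    }
}
```

The batch would be added to the staged copy, and `activate` called only after the additions were verified; if verification fails, the staged directory is simply deleted and the pointer is left untouched.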

Preventing index corruption

2008-06-26 Thread Eran Sevi
Hi, I'm looking for the correct way to create an index given the following restrictions: 1. The documents are received in batches of variable sizes (no more than 100 docs in a batch). 2. The batch insertion must be transactional - either the whole batch is added to the index (exists physically o