On 2010-05-12 14:29, Ivan Vasilev wrote: > Hi Michael, > Thanks for your answer. > What we do now: > 1. Splitting indexes. We do it not by reading indexes and distributing > docs in separate indexes like in MultiPassIndexSplitter. We do it by > binary copping segments to different folders and then recreate segment > descriptor file for each one (we have created tool for this). The > decision of which segment to which new index to go is taken by taking > segment sizes and calculating so that to have almost equal indexes. If > we have .cfx file this would be an obstacle for current logic of division. > I saw the class MultiPassIndexSplitter. It offers splitting index by > docs (not by segments). It has a big advantage - index could be split > better (to more similar in size parts). It would be done even if index > was just optimized and we have only one big segment. But it has also > disadvantages. Index is read as many times as the number of new indexes > is (it is bad for ~40Gb indexes). Also the original index remains all > the time this means if we do the split in one and the same partition we > need double disk space. > May be we should offer both index split approaches to the user... this > depends on higher levels :)
Hi, I wrote the MultiPassIndexSplitter. Yes, multi-pass is problematic with large indexes. I'm currently working on a single-pass TrueSplitter :) which should be ready within a couple weeks. However, even this new tool will make a copy of the original index, so you will need twice as much space. But in this case perhaps you could put the original index on a network FS, and split it into the target partition - the data would be read just once. -- Best regards, Andrzej Bialecki <>< ___. ___ ___ ___ _ _ __________________________________ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org