Thanks for picking up on this, Anshum and Uwe. I used the following approach to convert my 2.3 index (which, yes, was already optimised) to 3.0...
Using Lucene 3.0, I created a new empty index with my IndexWriter. I opened my 2.3 index with an IndexReader. I added the 2.3 index with writer.addIndexes(reader) and then optimized and committed. I assume that counts as 2 segments being optimized, despite the fact that my new segment would have been empty.

Of my 3 indexes, I noticed a small growth in the index which has no compressed fields and a very small shrink in the two indexes which did have compressed fields. So it looks like it wasn't a no-op, but it also looks like I was compressing <1K fields, as Uwe suspected. Typically these were synopsis fields with 3-sentence extracts from the texts being indexed.

I hadn't realised that the threshold was as high as 1K for compression to pay dividends. I would have been better off not compressing those fields. It looks like I'll benefit from Lucene 3 stopping me from abusing compression! 8-)

Many thanks!

-----Original Message-----
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: 11 December 2009 18:43
To: java-user@lucene.apache.org
Subject: RE: Lucene 3.0.0 writer with a Lucene 2.3.1 index

The index *should* grow after merging/optimizing, but it will only do this if the fields you had compressed were not bigger than they would have been without compression. One of the tests showed: a string field with 80 ASCII chars needed about 250 bytes when compressed, which is 3 times the uncompressed size (as chars are UTF-8 encoded). So it was always a bad idea to compress only short fields; compression for, say, fields <1024 chars is simply a waste of time and disk space.

So maybe you hit this issue: some fields were so small that the compressed representation was larger than the uncompressed one, and for others it was the other way round. This leads to no net change.

By the way, if your index was already optimized in 2.3 and you try to optimize it in 3.0, it will be a no-op, as optimization needs at least two segments to merge.
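The point above about short fields can be illustrated with a minimal sketch. This is not the test mentioned in the thread, and it does not reproduce Lucene's stored-field format: it assumes plain java.util.zip.Deflater from the JDK as a stand-in compressor, and the class name, method names and sample strings are hypothetical. It only compares the UTF-8 size of a string with the size of its deflated form.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Illustration only: compares raw UTF-8 size with zlib-deflated size.
// Assumption: Deflater stands in for whatever compression the index used;
// any per-field overhead of the index format itself is not modelled.
public class CompressionOverheadSketch {

    // Number of bytes the text occupies when encoded as UTF-8.
    public static int rawSize(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes produced by deflating the UTF-8 encoding of the text.
    public static int deflatedSize(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int total = 0;
        // Drain the compressor; only the byte count is needed here.
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical short field, similar in spirit to a brief synopsis.
        String shortField = "A short synopsis.";

        // Hypothetical long, repetitive field that compresses well.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append("lucene ");
        }
        String longField = sb.toString();

        System.out.println("short: raw=" + rawSize(shortField)
                + " deflated=" + deflatedSize(shortField));
        System.out.println("long:  raw=" + rawSize(longField)
                + " deflated=" + deflatedSize(longField));
    }
}
```

For very short inputs the fixed cost of the zlib header and checksum alone can exceed any savings, which is the effect described above; the exact break-even size depends on the content and on the storage format, so the numbers printed here should not be read as Lucene's.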
Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Anshum [mailto:ansh...@gmail.com]
> Sent: Friday, December 11, 2009 7:31 PM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene 3.0.0 writer with a Lucene 2.3.1 index
>
> Hi Tom,
> Pt 3: As per my knowledge, it wouldn't be a 'mixture' of 2 index types.
> Rather, as soon as you optimize (or do an IndexWriter operation on the
> current index), it would expand the index to a non-compressed format. I
> read somewhere in the release notes that on doing so, a growth in the
> index size should be anticipated and handled.
>
> --
> Anshum Gupta
> Naukri Labs!
> http://ai-cafe.blogspot.com
>
> The facts expressed here belong to everybody, the opinions to me. The
> distinction is yours to draw............
>
> On Fri, Dec 11, 2009 at 10:50 PM, Rob Staveley (Tom)
> <rstave...@seseit.com> wrote:
>
> > I'm upgrading from 2.3.1 to 3.0.0. I have 3.0.0 index readers ready to
> > go into production and writers in the process of upgrading to 3.0.0.
> >
> > I think I understand the implications of
> > http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats
> > for the upgrade, but I'd love it if someone could validate my following
> > assumptions.
> >
> > 1. My 2.3.1 indexes have compressed fields in them, which the 3.0.0
> > readers work nicely with, as expected. I should assume that my 3.0.0
> > readers will continue to handle 2.3.1 indexes OK.
> >
> > 2. Presumably all future Lucene 3.x index readers will continue to
> > handle compressed fields and we should only anticipate Lucene 4.x
> > choking on them.
> >
> > I was naively expecting my index directories to grow when my 3.0.0
> > index writer merged the 2.3.1 indexes and/or optimize()'d them,
> > converting them to 3.0.0. However, I don't see that. Presumably that
> > means that....
> >
> > 3. Documents added to existing 2.3.1 indexes will be added conforming
> > to 3.0.0, but existing documents in the index will continue to have
> > compressed content and old documents can coexist happily with the new
> > ones, and my indexes will become a mixture of 2.3.1 and 3.0.0.
> >
> > 4. I should use
> > http://lucene.apache.org/java/2_9_1/api/all/org/apache/lucene/util/Version.html#LUCENE_23
> > for the StandardAnalyzer and QueryParser in mixed indexes in 3.0.0 if I
> > want to handle analysis consistently, or go for LUCENE_CURRENT if I
> > want to handle the new content "better" (bearing in mind that the new
> > content will eventually replace the old content anyhow).
> >
> > 5. I should use
> > http://lucene.apache.org/java/3_0_0/api/all/org/apache/lucene/analysis/StopFilter.html#StopFilter%28boolean,%20org.apache.lucene.analysis.TokenStream,%20java.util.Set%29
> > with enablePositionIncrements=false in mixed indexes in 3.0.0 if I want
> > to handle analysis consistently, or go for
> > enablePositionIncrements=true if I want to handle the new content
> > "better" (bearing in mind that the new content will eventually replace
> > the old content anyhow).
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org