Thanks again. Disabling norms, wherever that was possible without affecting search quality, has solved the problem:
- The unoptimized index has become smaller.
- The optimized index is now practically the same size as the unoptimized one.
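
For anyone reading this thread later, the arithmetic behind Mike's estimate (quoted below) can be sketched roughly as follows. This is only a back-of-the-envelope illustration, assuming 1 byte per document per norms-enabled field as he describes. The class and method names are made up for the example and are not part of Lucene.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: estimates the space taken by norms
// in a fully optimized (single-segment) index.
final class NormsSizeEstimate {

    // Assumption taken from the quoted explanation: 1 byte per document
    // for every field that has norms enabled.
    static final long BYTES_PER_NORM = 1L;

    static long normsBytes(long numDocs, long fieldsWithNorms) {
        return numDocs * fieldsWithNorms * BYTES_PER_NORM;
    }

    public static void main(String[] args) {
        long numDocs = 20_845_906L; // "Nr of documents" from this thread
        long numFields = 559L;      // "Nr of fields" from this thread

        long perField = normsBytes(numDocs, 1);
        long total = normsBytes(numDocs, numFields);

        // Decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes).
        System.out.printf(Locale.ROOT, "per field:  %.1f MB%n", perField / 1e6);
        System.out.printf(Locale.ROOT, "all fields: %.2f GB%n", total / 1e9);
    }
}
```

With our numbers this comes to about 20.8 MB per field and about 11.65 GB for all 559 fields, which roughly matches the gap between the 4.1G unoptimized and the 15G optimized index.
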
Yuliya

> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Friday, January 8, 2010 14:38
> To: java-user@lucene.apache.org
> Subject: Re: Lucene 2.9 and 3.0: Optimized index is thrice as large as the not optimized index
>
> Lucene stores 1 byte (on disk, and in RAM when searching that field) per document for any field that has norms enabled, even for documents that do not contain that field.
>
> In your case, that's ~20 MB per field (once optimize is done), times 559 fields = ~11 GB of storage.
>
> You should index these fields with Field.Index.ANALYZED_NO_NORMS to turn off norms. But this means field/doc boosting, and the length normalization Lucene normally does (shorter documents get a better score), will be silently disabled. Also: you must fully re-index from scratch, otherwise the norms will turn themselves back on when segments merge together.
>
> Mike
>
> On Fri, Jan 8, 2010 at 7:55 AM, Yuliya Palchaninava <y...@solute.de> wrote:
> > Thanks Michael.
> >
> > You are probably right.
> >
> > Not optimized size is 4.1G, optimized index is about 15G.
> >
> > Yes, our documents do have many different indexed fields and norms are enabled.
> > Nr of fields: 559
> > Nr of documents: 20845906
> > Nr of terms: 25615389
> >
> > Could you please give me a more detailed explanation of how the storage of norms affects the size of an index?
> > What exactly do you mean by "norms are not stored sparsely"?
> >
> > Thanks,
> > Yuliya
> >
> >> -----Original Message-----
> >> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> >> Sent: Thursday, January 7, 2010 18:00
> >> To: java-user@lucene.apache.org
> >> Subject: Re: Lucene 2.9 and 3.0: Optimized index is thrice as large as the not optimized index
> >>
> >> Do your documents have many different indexed fields? If you do, and norms are enabled, that could be the cause (norms are not stored sparsely).
> >>
> >> But: what actual sizes are we talking about?
> >>
> >> Mike
> >>
> >> On Thu, Jan 7, 2010 at 11:50 AM, Yuliya Palchaninava <y...@solute.de> wrote:
> >> > Otis,
> >> >
> >> > thanks for the answer.
> >> >
> >> > Unfortunately the index *directory* remains larger *after* the optimization.
> >> > In our case the optimization completed successfully and, as you say, there is only one segment in the directory.
> >> >
> >> > Any other ideas?
> >> >
> >> > Thanks,
> >> > Yuliya
> >> >
> >> >> -----Original Message-----
> >> >> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> >> >> Sent: Thursday, January 7, 2010 17:35
> >> >> To: java-user@lucene.apache.org
> >> >> Subject: Re: Lucene 2.9 and 3.0: Optimized index is thrice as large as the not optimized index
> >> >>
> >> >> Yuliya,
> >> >>
> >> >> The index *directory* will be larger *while* you are optimizing. After the optimization is completed successfully, the index directory will be smaller. It is possible that your index directory is large(r) because you have some left-over segments (e.g. from some earlier failed/interrupted optimizations) that are not really a part of the index. After optimizing, you should have only 1 segment, so if you see more than 1 segment, look at the ones with older timestamps. Those can be (re)moved.
> >> >>
> >> >> Otis
> >> >> --
> >> >> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> >> >>
> >> >> ----- Original Message ----
> >> >> > From: Yuliya Palchaninava <y...@solute.de>
> >> >> > To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
> >> >> > Sent: Thu, January 7, 2010 11:23:08 AM
> >> >> > Subject: Lucene 2.9 and 3.0: Optimized index is thrice as large as the not optimized index
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > According to the API documentation: "In general, once the optimize completes, the total size of the index will be less than the size of the starting index. It could be quite a bit smaller (if there were many pending deletes) or just slightly smaller". In our case the index becomes not smaller but larger, namely thrice as large.
> >> >> >
> >> >> > The unoptimized index doesn't contain compressed fields, which might otherwise have explained the growth of the index during optimization. So we cannot explain what happens.
> >> >> >
> >> >> > Does someone have an explanation for the index growth due to the optimization?
> >> >> >
> >> >> > Thanks,
> >> >> > Yuliya
> >> >> >
> >> >> > ---------------------------------------------------------------------
> >> >> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> >> > For additional commands, e-mail: java-user-h...@lucene.apache.org