Super! Thanks for bringing closure. Mike
On Mon, Jan 11, 2010 at 12:55 PM, Yuliya Palchaninava <y...@solute.de> wrote: > Thanks again. > > Disabling norms, where it was possible without influencing the search quality, > has solved the problem: > - The not optimized version of the index has become smaller. > - The optimized index has practically the same size as the not optimized one. > > Yuliya > >> -----Ursprüngliche Nachricht----- >> Von: Michael McCandless [mailto:luc...@mikemccandless.com] >> Gesendet: Freitag, 8. Januar 2010 14:38 >> An: java-user@lucene.apache.org >> Betreff: Re: Lucene 2.9 and 3.0: Optimized index is thrice as >> large as the not optimized index >> >> Lucene stores 1 byte (disk and RAM, when searching that >> field) per document for any field that has norms enabled, >> even for documents that do not contain that field. >> >> In your case, that's ~20 MB per field (once optimize is done), times >> 559 fields = ~11TB of storage. >> >> You should index these fields with >> Field.Index.ANALYZED_NO_NORMS to turn off norms. But, this >> means field/doc boosting, and the normal length boosting >> Lucene normally does (shorter documents get a better score), >> will be silently disabled. Also: you must fully re-index >> from scratch, otherwise the norms will turn themselves back >> on when segments merge together. >> >> Mike >> >> On Fri, Jan 8, 2010 at 7:55 AM, Yuliya Palchaninava >> <y...@solute.de> wrote: >> > Thanks Michael. >> > >> > You are probably wright. >> > >> > Not optimized size is 4.1G, optimized index is about 15G. >> > >> > Yes, our documents do have many different indexed fields >> and norms are enabled. >> > Nr of fields: 559 >> > Nr of documents: 20845906 >> > Nr of terms: 25615389 >> > >> > Could you please give me a more detailled explanation, how >> the storage of norms effects the size of an index. >> > What do you mean exactly with "norms are not stored sparsely"? >> > >> > Thanks, >> > Yuliya >> > >> >> -----Ursprüngliche Nachricht----- >> >> Von: Michael McCandless [mailto:luc...@mikemccandless.com] >> >> Gesendet: Donnerstag, 7. Januar 2010 18:00 >> >> An: java-user@lucene.apache.org >> >> Betreff: Re: Lucene 2.9 and 3.0: Optimized index is thrice >> as large >> >> as the not optimized index >> >> >> >> Do your documents have many different indexed fields? If >> you do, and >> >> norms are enabled, that could be the cause (norms are not stored >> >> sparsely). >> >> >> >> But: what actual sizes are we talking about? >> >> >> >> Mike >> >> >> >> On Thu, Jan 7, 2010 at 11:50 AM, Yuliya Palchaninava >> <y...@solute.de> >> >> wrote: >> >> > Otis, >> >> > >> >> > thanks for the answer. >> >> > >> >> > Unfortunatelly the index *directory* remains larger *after" >> >> the optimization. >> >> > In our case the otimization was/is completed successfully >> >> and, as you >> >> > say, there is only one segment in the directory. >> >> > >> >> > Some other ideas? >> >> > >> >> > Thanks, >> >> > Yuliya >> >> > >> >> >> -----Ursprüngliche Nachricht----- >> >> >> Von: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] >> >> >> Gesendet: Donnerstag, 7. Januar 2010 17:35 >> >> >> An: java-user@lucene.apache.org >> >> >> Betreff: Re: Lucene 2.9 and 3.0: Optimized index is thrice >> >> as large >> >> >> as the not optimized index >> >> >> >> >> >> Yuliya, >> >> >> >> >> >> The index *directory* will be larger *while* you are optimizing. >> >> >> After the optimization is completed successfully, the >> >> index directory >> >> >> will be smaller. It is possible that your index directory is >> >> >> large(r) because you have some left-over segments (e.g. >> from some >> >> >> earlier failed/interrupted optimizations) that are not >> >> really a part >> >> >> of the index. After optimizing, you should have only 1 >> >> segment, so >> >> >> if you see more than 1 segment, look at the ones with older >> >> >> timestamps. Those can be (re)moved. >> >> >> >> >> >> Otis >> >> >> -- >> >> >> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch >> >> >> >> >> >> >> >> >> >> >> >> ----- Original Message ---- >> >> >> > From: Yuliya Palchaninava <y...@solute.de> >> >> >> > To: "java-user@lucene.apache.org" >> <java-user@lucene.apache.org> >> >> >> > Sent: Thu, January 7, 2010 11:23:08 AM >> >> >> > Subject: Lucene 2.9 and 3.0: Optimized index is thrice as >> >> >> large as the >> >> >> > not optimized index >> >> >> > >> >> >> > Hi, >> >> >> > >> >> >> > According to the api documentation: "In general, once >> >> the optimize >> >> >> > completes, the total size of the index will be less than >> >> >> the size of >> >> >> > the starting index. It could be quite a bit smaller (if >> >> there were >> >> >> > many pending deletes) or just slightly smaller". In our >> >> >> case the index >> >> >> > becomes not smaller but larger, namely thrice as large. >> >> >> > >> >> >> > The not optimized index doesn't contain compressed fields, >> >> >> what could >> >> >> > have caused the growth of the index due to the >> >> otimization. So we >> >> >> > cannot explain what happens. >> >> >> > >> >> >> > Does someone have an explanation for the index growth due >> >> >> to the optimization? >> >> >> > >> >> >> > Thanks, >> >> >> > Yuliya >> >> >> > >> >> >> > >> >> >> > >> >> >> >> >> >> --------------------------------------------------------------------- >> >> >> > To unsubscribe, e-mail: >> java-user-unsubscr...@lucene.apache.org >> >> >> > For additional commands, e-mail: >> >> >> > java-user-h...@lucene.apache.org >> >> >> >> >> >> >> >> >> >> >> >> --------------------------------------------------------------------- >> >> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> >> >> For additional commands, e-mail: >> java-user-h...@lucene.apache.org >> >> >> >> >> >> >> >> > >> >> >> --------------------------------------------------------------------- >> >> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> >> > For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> > >> >> > >> >> >> >> >> --------------------------------------------------------------------- >> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> >> >> >> > >> --------------------------------------------------------------------- >> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> > For additional commands, e-mail: java-user-h...@lucene.apache.org >> > >> > >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org