Super!  Thanks for bringing closure.

Mike

On Mon, Jan 11, 2010 at 12:55 PM, Yuliya Palchaninava <y...@solute.de> wrote:
> Thanks again.
>
> Disabling norms, where it was possible without influencing the search quality,
> has solved the problem:
> - The not optimized version of the index has become smaller.
> - The optimized index has practically the same size as the not optimized one.
>
> Yuliya
>
>> -----Ursprüngliche Nachricht-----
>> Von: Michael McCandless [mailto:luc...@mikemccandless.com]
>> Gesendet: Freitag, 8. Januar 2010 14:38
>> An: java-user@lucene.apache.org
>> Betreff: Re: Lucene 2.9 and 3.0: Optimized index is thrice as
>> large as the not optimized index
>>
>> Lucene stores 1 byte (disk and RAM, when searching that
>> field) per document for any field that has norms enabled,
>> even for documents that do not contain that field.
>>
>> In your case, that's ~20 MB per field (once optimize is done), times
>> 559 fields = ~11TB of storage.
>>
>> You should index these fields with
>> Field.Index.ANALYZED_NO_NORMS to turn off norms.  But, this
>> means field/doc boosting, and the normal length boosting
>> Lucene normally does (shorter documents get a better score),
>> will be silently disabled.  Also: you must fully re-index
>> from scratch, otherwise the norms will turn themselves back
>> on when segments merge together.
>>
>> Mike
>>
>> On Fri, Jan 8, 2010 at 7:55 AM, Yuliya Palchaninava
>> <y...@solute.de> wrote:
>> > Thanks Michael.
>> >
>> > You are probably wright.
>> >
>> > Not optimized size is 4.1G, optimized index is about 15G.
>> >
>> > Yes, our documents do have many different indexed fields
>> and norms are enabled.
>> > Nr of fields: 559
>> > Nr of documents: 20845906
>> > Nr of terms: 25615389
>> >
>> > Could you please give me a more detailled explanation, how
>> the storage of norms effects the size of an index.
>> > What do you mean exactly with "norms are not stored sparsely"?
>> >
>> > Thanks,
>> > Yuliya
>> >
>> >> -----Ursprüngliche Nachricht-----
>> >> Von: Michael McCandless [mailto:luc...@mikemccandless.com]
>> >> Gesendet: Donnerstag, 7. Januar 2010 18:00
>> >> An: java-user@lucene.apache.org
>> >> Betreff: Re: Lucene 2.9 and 3.0: Optimized index is thrice
>> as large
>> >> as the not optimized index
>> >>
>> >> Do your documents have many different indexed fields?  If
>> you do, and
>> >> norms are enabled, that could be the cause (norms are not stored
>> >> sparsely).
>> >>
>> >> But: what actual sizes are we talking about?
>> >>
>> >> Mike
>> >>
>> >> On Thu, Jan 7, 2010 at 11:50 AM, Yuliya Palchaninava
>> <y...@solute.de>
>> >> wrote:
>> >> > Otis,
>> >> >
>> >> > thanks for the answer.
>> >> >
>> >> > Unfortunatelly the index *directory* remains larger *after"
>> >> the optimization.
>> >> > In our case the otimization was/is completed successfully
>> >> and, as you
>> >> > say, there is only one segment in the directory.
>> >> >
>> >> > Some other ideas?
>> >> >
>> >> > Thanks,
>> >> > Yuliya
>> >> >
>> >> >> -----Ursprüngliche Nachricht-----
>> >> >> Von: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>> >> >> Gesendet: Donnerstag, 7. Januar 2010 17:35
>> >> >> An: java-user@lucene.apache.org
>> >> >> Betreff: Re: Lucene 2.9 and 3.0: Optimized index is thrice
>> >> as large
>> >> >> as the not optimized index
>> >> >>
>> >> >> Yuliya,
>> >> >>
>> >> >> The index *directory* will be larger *while* you are optimizing.
>> >> >> After the optimization is completed successfully, the
>> >> index directory
>> >> >> will be smaller.  It is possible that your index directory is
>> >> >> large(r) because you have some left-over segments (e.g.
>> from some
>> >> >> earlier failed/interrupted optimizations) that are not
>> >> really a part
>> >> >> of the index.  After optimizing, you should have only 1
>> >> segment, so
>> >> >> if you see more than 1 segment, look at the ones with older
>> >> >> timestamps.  Those can be (re)moved.
>> >> >>
>> >> >>  Otis
>> >> >> --
>> >> >> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>> >> >>
>> >> >>
>> >> >>
>> >> >> ----- Original Message ----
>> >> >> > From: Yuliya Palchaninava <y...@solute.de>
>> >> >> > To: "java-user@lucene.apache.org"
>> <java-user@lucene.apache.org>
>> >> >> > Sent: Thu, January 7, 2010 11:23:08 AM
>> >> >> > Subject: Lucene 2.9 and 3.0: Optimized index is thrice as
>> >> >> large as the
>> >> >> > not optimized index
>> >> >> >
>> >> >> > Hi,
>> >> >> >
>> >> >> > According to the api documentation: "In general, once
>> >> the optimize
>> >> >> > completes, the total size of the index will be less than
>> >> >> the size of
>> >> >> > the starting index. It could be quite a bit smaller (if
>> >> there were
>> >> >> > many pending deletes) or just slightly smaller". In our
>> >> >> case the index
>> >> >> > becomes not smaller but larger, namely thrice as large.
>> >> >> >
>> >> >> > The not optimized index doesn't contain compressed fields,
>> >> >> what could
>> >> >> > have caused the growth of the index due to the
>> >> otimization. So we
>> >> >> > cannot explain what happens.
>> >> >> >
>> >> >> > Does someone have an explanation for the index growth due
>> >> >> to the optimization?
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Yuliya
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >>
>> >>
>> ---------------------------------------------------------------------
>> >> >> > To unsubscribe, e-mail:
>> java-user-unsubscr...@lucene.apache.org
>> >> >> > For additional commands, e-mail:
>> >> >> > java-user-h...@lucene.apache.org
>> >> >>
>> >> >>
>> >> >>
>> >>
>> ---------------------------------------------------------------------
>> >> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> >> >> For additional commands, e-mail:
>> java-user-h...@lucene.apache.org
>> >> >>
>> >> >>
>> >> >
>> >>
>> ---------------------------------------------------------------------
>> >> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> >> > For additional commands, e-mail: java-user-h...@lucene.apache.org
>> >> >
>> >> >
>> >>
>> >>
>> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
>> >>
>> >>
>> >
>> ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: java-user-h...@lucene.apache.org
>> >
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to