hi Erick,

Well, typically my application will start with some hundreds of
indexes...and then grow at a rate of several per day, for ever. At
some point I know I can do some merging etc if needed.

Size is dependant on the customer, could be up to a 1G per index. That
is way I would like to minimize them. I am not worried with search
performance.

I dont understand how not stemming can reduce the size of an index...I
would think it happens the other way, does not stemming makes the
words shorter? (I dont stemm, so I never looked into it)

thanks
On 3/14/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
Store as little as possible, index as little as possible <G>.....

How big is your index, and how much do you expect it to grow?
I ask this because it's probably not worth your time to try to
reduce the index size below some threshold... I found that
reducing my index from 8G to 4G (through not stemming) gave
me about a 10% performance improvement, so at some point
it's just not worth the effort. Also, if you posted the index size,
it would give folks a chance to say "there's not much you can
gain by reducing things more". As it is, I don't have a clue
whether your index is 100M or 100T. The former is in the
"don't waste your time" class, and the latter is...er...
different....

I wouldn't bother compressing for 1%....

Question for "the guys" so I can check an assumption....
Is there any difference between these two?
Field(Name, Value, Store, index)
*<file:///C:/lucene-2.1.0/docs/api/org/apache/lucene/document/Field.html#Field%28java.lang.String,%20java.lang.String,%20org.apache.lucene.document.Field.Store,%20org.apache.lucene.document.Field.Index,%20org.apache.lucene.document.Field.TermVector%29>
*Field(Name, Value, Store, index, Field.TermVector.NO)


Best
Erick

On 3/14/07, jm <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I want to make my index as small as possible. I noticed about
> field.setOmitNorms(true), I read in the list the diff is 1 byte per
> field per doc, not huge but hey...is the only effect the score being
> different? I hardly mind about the score so that would be ok.
>
> And can I add to an index without norms when it has previous doc with
> norms?
>
> Any other way to minimize size of index? Most of my fields but one are
> Field.Store.NO, Field.Index.TOKENIZED and Field.TermVector.NO, one is
> Field.Store.YES, Field.Index.UN_TOKENIZED and Field.TermVector.NO. I
> tried compressing that one and size is reduced around 1% (it's a small
> field), but I guess compression means worse performance so I am not
> sure about applying that.
>
> thanks
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to