hi Erick,
Well, typically my application will start with some hundreds of indexes...and then grow at a rate of several per day, for ever. At some point I know I can do some merging etc if needed. Size is dependant on the customer, could be up to a 1G per index. That is way I would like to minimize them. I am not worried with search performance. I dont understand how not stemming can reduce the size of an index...I would think it happens the other way, does not stemming makes the words shorter? (I dont stemm, so I never looked into it) thanks On 3/14/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
Store as little as possible, index as little as possible <G>..... How big is your index, and how much do you expect it to grow? I ask this because it's probably not worth your time to try to reduce the index size below some threshold... I found that reducing my index from 8G to 4G (through not stemming) gave me about a 10% performance improvement, so at some point it's just not worth the effort. Also, if you posted the index size, it would give folks a chance to say "there's not much you can gain by reducing things more". As it is, I don't have a clue whether your index is 100M or 100T. The former is in the "don't waste your time" class, and the latter is...er... different.... I wouldn't bother compressing for 1%.... Question for "the guys" so I can check an assumption.... Is there any difference between these two? Field(Name, Value, Store, index) *<file:///C:/lucene-2.1.0/docs/api/org/apache/lucene/document/Field.html#Field%28java.lang.String,%20java.lang.String,%20org.apache.lucene.document.Field.Store,%20org.apache.lucene.document.Field.Index,%20org.apache.lucene.document.Field.TermVector%29> *Field(Name, Value, Store, index, Field.TermVector.NO) Best Erick On 3/14/07, jm <[EMAIL PROTECTED]> wrote: > > Hi, > > I want to make my index as small as possible. I noticed about > field.setOmitNorms(true), I read in the list the diff is 1 byte per > field per doc, not huge but hey...is the only effect the score being > different? I hardly mind about the score so that would be ok. > > And can I add to an index without norms when it has previous doc with > norms? > > Any other way to minimize size of index? Most of my fields but one are > Field.Store.NO, Field.Index.TOKENIZED and Field.TermVector.NO, one is > Field.Store.YES, Field.Index.UN_TOKENIZED and Field.TermVector.NO. I > tried compressing that one and size is reduced around 1% (it's a small > field), but I guess compression means worse performance so I am not > sure about applying that. > > thanks > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > >
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]