Re: Field name size and index size

Michael McCandless Sat, 22 Mar 2008 07:27:50 -0700

Summary: I think there will be no real impact if you use longer fieldnames.


Details:

Index size will be just a tiny bit bigger. There is a single file per segment (*.fnm) that resolves the field names into integer IDs, then the rest of the index uses these integer IDs. So only that file, which holds one copy of the string name of each field, will be bigger.
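
To make the size argument concrete, here is a toy model in plain Python of the name-to-ID table described above. It is only a sketch, not Lucene code: the helper names (build_field_table, name_table_bytes) are made up for illustration, and it assumes each name is stored once as UTF-8 with no per-entry overhead, which the real *.fnm format will not match exactly.

```python
def build_field_table(field_names):
    """Map each distinct field name to a small integer ID, in first-seen order."""
    table = {}
    for name in field_names:
        if name not in table:
            table[name] = len(table)
    return table


def name_table_bytes(table):
    """Bytes needed to store each name once as UTF-8 (assumed: no per-entry overhead)."""
    return sum(len(name.encode("utf-8")) for name in table)


# Field names taken from the example in the question below.
long_names = ["AAAAAAAAAA", "BBBBBBBBBB", "CCCCCCCCCC", "DDDDDDDDDD"]
short_names = ["A", "B", "C", "D"]

long_table = build_field_table(long_names)
short_table = build_field_table(short_names)

# The difference depends only on the names, not on how many documents use them.
extra_bytes = name_table_bytes(long_table) - name_table_bytes(short_table)
print(extra_bytes)  # 36
```

In this model the long names cost 36 extra bytes per copy of the table, whether the index holds ten documents or ten million, since everything else refers to fields by integer ID.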

Indexing may be a bit faster with shorter names just because Lucene does a hashtable lookup of that string (one per field per document) to find the corresponding integer. My guess is this is a very negligible impact, especially if the content of each field is long. Searching goes through a similar hashtable lookup, after which the field name is interned, but for a largish index that time is surely negligible as well. But you should test! And please report back :)


Mike

John wrote:

Hi,
Let's say my data source consists of records like so (the example is Field=Value):
  AAAAAAAAAA=Value1
  BBBBBBBBBB=Value2
  CCCCCCCCCC=Value3
  DDDDDDDDDD=Value4
And let's say I have a second copy of my data, but this time it looks like so:
  A=Value1
  B=Value2
  C=Value3
  D=Value4

I.e., same data, the only change is that the field names are now shorter.
Now if I create two Lucene indexes ... one using the long field names and one using the short field names (my data has not changed) .. will the Lucene index size be smaller for the short field name one? Will updating and optimizing the index be faster? Will searching be faster?
That is, am I better off using short field names vs. long field names?
Yes, I will do some performance analyses .. but I want to know if this matters before I do so.
Thanks in advance!

-JM


