Summary: I think there will be no real impact if you use longer field names.

Details:

Index size will be just a tiny bit bigger. There is a single file per segment (*.fnm) that resolves the field names into integer IDs, then the rest of the index uses these integer IDs. So only that file, which holds one copy of the string name of each field, will be bigger.
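To make the size argument concrete, here is a minimal sketch using a simplified, hypothetical model of a per-segment field-name table. `FieldNameTable` and `build_segment` are made-up names, and the model does not reproduce the real .fnm format. It only shows the idea: each name is stored once per segment, everything else refers to integer IDs, so with the four fields from the question the name table holds 40 bytes of names versus 4 bytes, no matter how many documents are indexed.

```python
class FieldNameTable:
    """Hypothetical, simplified stand-in for a per-segment field-name table.

    Not Lucene's real .fnm format: each distinct name is stored once and
    mapped to a small integer ID.
    """

    def __init__(self):
        self.name_to_id = {}
        self.names = []

    def get_or_add(self, name):
        # Hashtable lookup: name -> integer ID, assigning a new ID if unseen.
        field_id = self.name_to_id.get(name)
        if field_id is None:
            field_id = len(self.names)
            self.name_to_id[name] = field_id
            self.names.append(name)
        return field_id

    def name_bytes(self):
        # Only this table holds the string names, so only it grows with name length.
        return sum(len(n.encode("utf-8")) for n in self.names)


def build_segment(docs):
    table = FieldNameTable()
    postings = []  # (field_id, value) pairs: the rest of the "index" uses int IDs
    for doc in docs:
        for name, value in doc.items():
            postings.append((table.get_or_add(name), value))
    return table, postings


long_docs = [{"AAAAAAAAAA": "Value1", "BBBBBBBBBB": "Value2",
              "CCCCCCCCCC": "Value3", "DDDDDDDDDD": "Value4"}] * 1000
short_docs = [{"A": "Value1", "B": "Value2", "C": "Value3", "D": "Value4"}] * 1000

long_table, long_postings = build_segment(long_docs)
short_table, short_postings = build_segment(short_docs)
print(long_table.name_bytes(), short_table.name_bytes())  # 40 4
print(long_postings == short_postings)  # True: identical apart from the name table
```

In this model the 1000 documents produce identical postings for both naming schemes; the only difference is the 36 extra bytes in the name table.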

Indexing may be a bit faster with shorter names, just because Lucene does a hashtable lookup of that string (once per field per document) to find the corresponding integer. My guess is that this impact is negligible, especially if the content of each field is long. Searching goes through a similar hashtable lookup, after which the field name is interned, but for a largish index that time is surely negligible as well. But you should test! And please report back :)
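If you want a quick feel for the per-field lookup cost before testing with real indexes, a rough harness like the one below times one name-to-ID hashtable lookup per field per document. It is a sketch under loose assumptions: `lookup_seconds` is a hypothetical helper, names are decoded from UTF-8 bytes per document to loosely mimic them arriving with each document, and Python timings say nothing precise about Lucene's Java code. The meaningful test is still indexing your data both ways with Lucene itself.

```python
import timeit


def lookup_seconds(field_names, num_docs=100_000):
    """Time one name->ID hashtable lookup per field per document.

    Hypothetical harness: names arrive as UTF-8 bytes with each document and
    are decoded before the lookup, which only loosely approximates an indexer.
    """
    name_to_id = {name: i for i, name in enumerate(field_names)}
    raw_names = [name.encode("utf-8") for name in field_names]

    def index_all():
        checksum = 0
        for _ in range(num_docs):
            for raw in raw_names:
                checksum += name_to_id[raw.decode("utf-8")]
        return checksum

    # Run the whole simulated indexing pass once and return elapsed seconds.
    return timeit.timeit(index_all, number=1)


long_names = ["AAAAAAAAAA", "BBBBBBBBBB", "CCCCCCCCCC", "DDDDDDDDDD"]
short_names = ["A", "B", "C", "D"]
print(f"long:  {lookup_seconds(long_names):.3f}s")
print(f"short: {lookup_seconds(short_names):.3f}s")
```

Whatever gap this shows is per field per document and independent of field content, which is why it tends to disappear next to the cost of analyzing and writing long field values.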

Mike

John wrote:
Hi,

Let's say my data source consists of records like this (in the form Field=Value):

- AAAAAAAAAA=Value1
- BBBBBBBBBB=Value2
- CCCCCCCCCC=Value3
- DDDDDDDDDD=Value4

And let's say I have a second copy of my data, but this time it looks like this:

- A=Value1
- B=Value2
- C=Value3
- D=Value4

I.e., same data; the only change is that the field names are now shorter.

Now if I create two Lucene indexes, one using the long field names and one using the short field names (my data has not changed), will the index with the short field names be smaller? Will updating and optimizing the index be faster? Will searching be faster?

That is, am I better off using short field names rather than long field names?

Yes, I will do some performance analysis, but I want to know whether this matters before I do so.

Thanks in advance!

-JM

