RE: newbie seeking explanation of semantics of "Field" class

Uwe Schindler Tue, 17 Feb 2009 11:50:53 -0800

Hi Paul,

> I have copied some code and it is working for me, but I am a little
> uncertain how to decide what value of Field.Index and Field.Store to
> choose in order to get the behavior I'd like. If I read the javadocs, and
> decide to ignore all the "expert" items, it looks like this:
> 
> Field.Store.NO = I'll never see that data again; I wonder why I'd do this?


If you have the data somewhere else (for example in XML files), it makes no
sense to store it in the index as well. You would normally store one field
containing the filename or identifier and use it to retrieve the original
document. On the other hand, if you want an index-only solution, it may be
good to store the field values in the index, but this may not be needed for
all fields. For example, you have one field in which you index the whole
document contents and another field in which you index only the title (so
that you can search either inside the title only or in the whole document).
As the title is part of the whole document contents, it makes no sense to
store it additionally unless it is really needed for displaying results.

> Field.Store.YES = good, the data will be stored

Yes.

> Field.Store.COMPRESS = even better, stored and compressed; why would
> anyone do anything else?

Compression is counterproductive for small values (and it decreases
performance). Short values such as identifiers mostly "compress" to something
larger than the original. So use COMPRESS only for large document contents
where retrieval performance is not important.
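
To see the size effect for yourself, here is a minimal sketch using plain
java.util.zip.Deflater. It is not a Lucene call, and it only assumes that the
compressed store uses a deflate-style compressor; the class name CompressDemo
and the sample values are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class CompressDemo {
    // Deflates the UTF-8 bytes of text and returns the compressed size in bytes.
    static int compressedSize(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer); // count bytes, discard the data
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical short identifier: the fixed overhead outweighs any gain.
        String id = "doc-42";
        // Hypothetical large, repetitive document body: compresses well.
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            body.append("the quick brown fox jumps over the lazy dog ");
        }
        System.out.println("id:   " + id.length() + " bytes raw, "
                + compressedSize(id) + " bytes compressed");
        System.out.println("body: " + body.length() + " bytes raw, "
                + compressedSize(body.toString()) + " bytes compressed");
    }
}
```

The short identifier comes out larger than it went in, while the long body
shrinks considerably. The CPU cost of compressing and decompressing comes on
top of that.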

> ========
> 
> Field.Index.NO = I cannot search that data, but if I need its value for a
> given document (e.g., to decorate a result), I can retrieve it (use-case:
> maybe, the date the document was created -- but why not just make that
> searchable? I am having a hard time thinking of an actually useful piece
> of data that could go here and would not want to be one of ANALYZED or
> NOT_ANALYZED)

For example, we use this here to store the original XML document. The XML
document does not get indexed directly; only its text contents are indexed.
For result display, I store the XML file in a stored-only (and compressed)
field. By the way, you can also store binary data such as images (but not
index it).

> Field.Index.ANALYZED = the normal value, I would guess, except in the
> special case of stuff not searchable but used to decorate results
> (Field.Index.NO)
> 
> Field.Index.NOT_ANALYZED = I can search for this value, but it won't get
> analyzed, so it is searched for as the very same value I put in (the docco
> suggests product numbers: any other interesting use-cases anyone can
> suggest?)

All types of identifiers or primary keys, and numbers like prices, dates, ...
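
To make the difference between ANALYZED and NOT_ANALYZED concrete, here is a
toy model in plain Java. It is not Lucene code and not how Lucene is
implemented: ToyIndex and its methods are hypothetical names, and the
"analyzer" is just lowercasing plus splitting on non-alphanumeric characters.
The search method looks up one raw term, without analyzing the query.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ToyIndex {
    // term -> ids of the documents that contain this term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // ANALYZED-like: the value is broken into lowercase tokens,
    // and each token becomes a separate searchable term.
    public void addAnalyzed(int docId, String value) {
        String lower = value.toLowerCase(Locale.ROOT);
        for (String token : lower.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                add(token, docId);
            }
        }
    }

    // NOT_ANALYZED-like: the whole value is indexed as one single term.
    public void addNotAnalyzed(int docId, String value) {
        add(value, docId);
    }

    private void add(String term, int docId) {
        postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
    }

    // Exact lookup of a single raw term.
    public Set<Integer> search(String term) {
        return postings.getOrDefault(term, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        ToyIndex title = new ToyIndex();
        title.addAnalyzed(1, "Blue Widget, Large");
        System.out.println(title.search("widget"));    // matched by token

        ToyIndex sku = new ToyIndex();
        sku.addNotAnalyzed(1, "AB-1234/X");
        System.out.println(sku.search("AB-1234/X"));   // matched as a whole
        System.out.println(sku.search("ab"));          // no partial match
    }
}
```

An identifier such as the made-up "AB-1234/X" would be split into "ab", "1234"
and "x" by the toy analyzer, so an exact lookup of the full value would no
longer find it. That is why keys and similar values are kept as one term.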

Uwe


