Hi Paul,

> I have copied some code and it is working for me, but I am a little
> uncertain how to decide what value of Field.Index and Field.Store to
> choose in order to get the behavior I'd like. If I read the javadocs, and
> decide to ignore all the "expert" items, it looks like this:
>
> Field.Store.NO = I'll never see that data again; I wonder why I'd do this?

If you have the data somewhere else (for example in XML files), it makes no
sense to store it in the index as well. You would normally store one field
containing the filename or identifier and use it to retrieve the original
document. On the other hand, if you want an index-only solution, it may be
good to store the field values in the index, but this may not be needed for
all fields. E.g. you have one field in which you index the whole document
contents and another field in which you index only the title (so you can
search either inside the title only or in the whole document). As the title
is part of the whole document contents, it makes no sense to store it
additionally if it's not really needed for displaying results.

> Field.Store.YES = good, the data will be stored

Yes.

> Field.Store.COMPRESS = even better, stored and compressed; why would
> anyone do anything else?

Compression is counterproductive for small values (and decreases
performance). Short values like identifiers mostly "compress" to something
larger than the original. So only use COMPRESS if you have large document
contents and retrieval performance is not important.

> ========
>
> Field.Index.NO = I cannot search that data, but if I need its value for a
> given document (e.g., to decorate a result), I can retrieve it (use-case:
> maybe, the date the document was created -- but why not just make that
> searchable? I am having a hard time thinking of an actually useful piece
> of data that could go here and would not want to be one of ANALYZED or
> NOT_ANALYZED)

E.g. we use this here to store the original XML document. The XML documents
are not indexed directly; only their text contents are indexed. For result
display, I store the XML file in a stored-only (and compressed) field. By
the way, you can also store binary data like images (but not index it).

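To illustrate the point above about COMPRESS and short values, here is a minimal sketch that compares raw and compressed sizes using the JDK's java.util.zip.Deflater. The class name CompressionSizeDemo, the sample identifier and the sample text are made up for illustration. I am assuming that Lucene's compressed fields rely on the same kind of DEFLATE compression, so treat the numbers as indicative only, not as what Lucene writes to disk.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical helper, not part of Lucene: compares the raw size of a value
// with its size after zlib/DEFLATE compression via java.util.zip.Deflater.
class CompressionSizeDemo {

    // Number of bytes the text occupies as UTF-8.
    static int rawSize(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes produced when the UTF-8 bytes are compressed.
    static int compressedSize(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int total = 0;
        // Drain the compressor; only the byte count matters here.
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        // A short identifier (assumed example value).
        String id = "doc-4711";

        // A long, repetitive body standing in for large document contents.
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            body.append("Lucene stores and indexes fields of a document. ");
        }
        String text = body.toString();

        System.out.println("id:   raw=" + rawSize(id)
                + " compressed=" + compressedSize(id));
        System.out.println("body: raw=" + rawSize(text)
                + " compressed=" + compressedSize(text));
    }
}
```

The short identifier comes out larger than it went in, because the compressed format adds a header and a checksum, while the long body shrinks considerably. That is the reason to reserve compression for large contents.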
> Field.Index.ANALYZED = the normal value, I would guess, except in the
> special case of stuff not searchable but used to decorate results
> (Field.Index.NO)
>
> Field.Index.NOT_ANALYZED = I can search for this value, but it won't get
> analyzed, so it is searched for as the very same value I put in (the docco
> suggests product numbers: any other interesting use-cases anyone can
> suggest?)

All types of identifiers or primary keys, numbers like prices, dates, ...

Uwe
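
P.S. The rules of thumb from this thread can be written down as a small decision helper. This is only a sketch: FieldOptionChooser and its boolean parameters are hypothetical and not part of Lucene. The two enums merely mirror the names of the Field.Store and Field.Index constants discussed above, so the code runs without Lucene on the classpath.

```java
// Hypothetical helper, not part of Lucene. The enum constants only mirror the
// names of the Field.Store and Field.Index options discussed in this thread.
class FieldOptionChooser {

    enum Store { NO, YES, COMPRESS }

    enum Index { NO, ANALYZED, NOT_ANALYZED }

    // Store only what is needed for display and not available elsewhere;
    // compress only large values whose retrieval speed is unimportant.
    static Store chooseStore(boolean neededForDisplay,
                             boolean availableElsewhere,
                             boolean largeValue,
                             boolean retrievalSpeedMatters) {
        if (!neededForDisplay || availableElsewhere) {
            return Store.NO;
        }
        if (largeValue && !retrievalSpeedMatters) {
            return Store.COMPRESS;
        }
        return Store.YES;
    }

    // Exact values (identifiers, keys, prices, dates) are searched as-is;
    // free text goes through the analyzer; display-only data is not indexed.
    static Index chooseIndex(boolean searchable, boolean exactValue) {
        if (!searchable) {
            return Index.NO;
        }
        return exactValue ? Index.NOT_ANALYZED : Index.ANALYZED;
    }

    public static void main(String[] args) {
        // Identifier used to fetch the original file.
        System.out.println("id:       "
                + chooseStore(true, false, false, true) + " / "
                + chooseIndex(true, true));
        // Full text whose source lives in XML files outside the index.
        System.out.println("contents: "
                + chooseStore(false, true, true, false) + " / "
                + chooseIndex(true, false));
        // Original XML kept in the index purely for result display.
        System.out.println("xml:      "
                + chooseStore(true, false, true, false) + " / "
                + chooseIndex(false, false));
    }
}
```

Note that a field which is neither stored nor indexed serves no purpose, so at least one of the two choices should be something other than NO.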