IMO you should avoid storing any data in the index that you don't need for display. Lucene is an index (and a damn good one), not a database. If you find yourself storing large amounts of data in the index, this could be an indication that you may need to re-think your architecture.
In its simplest case, data storage is for storing data. Lucene is for indexing the data and searching it. You will certainly see performance implications with storing data in the index, particularly if you elect to have the data compressed by lucene. The lazy loading in the current trunk will help enormously with this (great work by the dev team), but I would still encourage you to design a system in which lucene is not the primary source of data. That is, if you need to re-index, get the data from its source location (or some interim location) rather than relying on storing it in lucene. I had all sorts of struggles with this very issue, and after several failed attempts came to the conclusion that whilst Lucene often forms a critical part of a good solution, it should only be used as an index/search tool.. not a database. On 8/12/06, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
Large stored fields can affect performance when you are iterating over your hits (assuming you are not interested in the value of the stored field at that point in time) for a results display since all Fields are loaded when getting the Document. The SVN trunk has a version of lazy loading that allows you to specify which fields are loaded and which ones are lazy, so you can avoid loading fields that a user will never view. -Grant On Aug 11, 2006, at 9:07 AM, Prasenjit Mukherjee wrote: > I have a requirement ( use highlighter) to store the doc content > somewhere., and I am not allowed to use a RDBMS. I am thinking of > using Lucene's Field with (Field.Store.YES and Field.Index.NO) to > store the doc content. Will it have any negative affect on my > search performance ? > I think I have read somewhere that Lucene shouldn't be used(or > misused) to provide RDBMS like storage. > > --prasen > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > -------------------------- Grant Ingersoll Sr. Software Engineer Center for Natural Language Processing Syracuse University 335 Hinds Hall Syracuse, NY 13244 http://www.cnlp.org Voice: 315-443-5484 Skype: grant_ingersoll Fax: 315-443-6886 --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]