I have a project where I need to index documents using Lucene 4.1.0.  One 
of the fields for the stored Document is the actual text from the 
document (.pdf, .docx, etc.).  I want to be able to highlight text from the 
documents in the search results.  I was looking at some older tutorials about 
storing the field with TermVectors and also storing it in the index with 
Store.COMPRESS.  However, with Lucene 4.1 they have done away with 
Store.COMPRESS.  Is there still a way to compress the field?
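
To make the compression question concrete: one possible workaround is to
compress the text by hand and store the resulting bytes as a binary stored
field.  The sketch below is only an illustration under that assumption.  It
uses nothing but java.util.zip, and the class name BodyCompression and its
methods are hypothetical, not part of Lucene.  Lucene also ships a
CompressionTools class in org.apache.lucene.document, which may be worth
checking before rolling your own.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper (not a Lucene class): packs the extracted body text
// into bytes that could be stored in place of the uncompressed string.
class BodyCompression {

    // Deflate the UTF-8 bytes of the text at the highest compression level.
    static byte[] compress(String text) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate bytes produced by compress() back into the original text.
    static String decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("Text extracted from a .pdf or .docx document. ");
        }
        String body = sb.toString();
        byte[] packed = compress(body);
        System.out.println("original chars:   " + body.length());
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("round trip ok:    " + decompress(packed).equals(body));
    }
}
```

The trade-off is that the bytes have to be decompressed again before the
text can be handed to a highlighter, so this saves disk space at the cost of
some CPU time per hit.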
    I am worried about how much space the index will take up if the "body" 
Field has to be stored uncompressed.
    Are there ways around having to store the whole Field in its original form?
    Since I am already going to be storing the actual documents on the server, 
would it be feasible (in terms of time) to not store TermVectors or store the 
field at all, and instead wait until the user searches?  At query time I could 
then re-index the top docs from the original documents in RAM and use 
Highlighter to return fragments.
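
As a rough illustration of the query-time idea: the fragments for the top
hits can be built from the original text alone, with nothing stored in the
index.  The sketch below is NOT Lucene's Highlighter.  It is a plain-Java
stand-in that matches a single term with a regular expression, and the class
and method names are hypothetical.  As far as I can tell, Lucene's own
Highlighter also works from a TokenStream plus the raw text, so the same
shape should carry over.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a real highlighter: builds fragments straight
// from the original document text at search time.
class RuntimeHighlightSketch {

    // Returns up to maxFragments windows of text around whole-word matches
    // of term, with every match inside a window wrapped in <B>...</B>.
    static List<String> fragments(String text, String term,
                                  int context, int maxFragments) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                                    Pattern.CASE_INSENSITIVE);
        List<String> result = new ArrayList<String>();
        Matcher m = p.matcher(text);
        int coveredUpTo = 0;
        while (result.size() < maxFragments && m.find()) {
            if (m.start() < coveredUpTo) {
                continue; // starts inside the previous fragment's window
            }
            int from = Math.max(0, m.start() - context);
            int to = Math.min(text.length(), m.end() + context);
            // Transparent bounds let \b see the characters outside the
            // window, so a word cut in half at the edge is not matched.
            Matcher w = p.matcher(text).useTransparentBounds(true).region(from, to);
            StringBuilder sb = new StringBuilder();
            int pos = from;
            while (w.find()) {
                sb.append(text, pos, w.start())
                  .append("<B>").append(w.group()).append("</B>");
                pos = w.end();
            }
            sb.append(text, pos, to);
            result.add(sb.toString());
            coveredUpTo = to;
        }
        return result;
    }

    public static void main(String[] args) {
        String body = "Apache Lucene is a search library. Prelucene is not a word.";
        for (String fragment : fragments(body, "lucene", 10, 3)) {
            System.out.println(fragment);
        }
    }
}
```

Whether this is fast enough depends mostly on how long it takes to extract
the text from the .pdf or .docx files again for each hit, not on the
highlighting step itself.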

Thanks
