I have a project where I need to index documents using Lucene 4.1.0. One of the fields of each stored Document is the actual text extracted from the file (.pdf, .docx, etc.). I want to be able to highlight that text in the search results.

I was looking at some older tutorials that store the field with TermVectors and also store it in the index with Store.COMPRESS. However, Store.COMPRESS no longer exists in Lucene 4.1. Is there still a way to compress the field? I am worried about how much space the index will take up if the "body" field has to be stored uncompressed. Are there ways around storing the whole field in its original form?

Since I am already going to keep the original documents on the server, would it be feasible, in terms of response time, to store neither the term vectors nor the field itself, and only deal with the text once a user's search returns a document? At query time I could re-index the top hits from the original documents into an in-memory index and use Highlighter to return fragments.
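For context, this is roughly what I mean by compressing the field myself. It is a minimal sketch, assuming I compress the extracted body text with java.util.zip (zlib) before indexing and would then put the bytes into a binary stored field, a step not shown here. BodyCompression, compress and decompress are my own hypothetical helpers, not Lucene API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: manual zlib compression of the "body" text before storing it.
// Storing the bytes in a Lucene binary stored field is assumed and not shown.
final class BodyCompression {

    // Compress UTF-8 text with Deflater at maximum compression.
    static byte[] compress(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inverse of compress(): inflate the bytes and decode them as UTF-8.
    static String decompress(byte[] data) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length * 2);
        byte[] buf = new byte[1024];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            // Guard against truncated input, which would otherwise loop forever.
            if (n == 0 && inflater.needsInput()) {
                inflater.end();
                throw new DataFormatException("truncated compressed data");
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws DataFormatException {
        // Repetitive sample text standing in for an extracted document body.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("Lucene stores the body field of each document. ");
        }
        String body = sb.toString();
        byte[] packed = compress(body);
        System.out.println("original bytes:   " + body.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("round trip ok:    " + body.equals(decompress(packed)));
    }
}
```

The catch is that the compressed bytes are opaque to Lucene, so the text would still have to be decompressed before Highlighter could work on it.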
Thanks