Thank you very much for the help, Simon. I am amazed I was able to accomplish what I wanted. I didn't store the body in the index. And I used Highlighter to return the best fragments by parsing my original document.

________________________________________
From: Simon Willnauer [simon.willna...@gmail.com]
Sent: Monday, March 25, 2013 4:07 AM
To: java-user@lucene.apache.org
Subject: Re: Compression and Highlighter
On Mon, Mar 25, 2013 at 8:13 AM, Bushman, Lamont <bus08...@byui.edu> wrote:
> I have a project where I need to index documents using Lucene 4.1.0. One
> of the fields for the stored Document is the actual text from the
> document (.pdf, .docx, etc.). I want to be able to highlight text from the
> documents in the search results. I was looking at some older tutorials
> about storing the field with TermVectors and also storing it in the index
> with Store.COMPRESS. However, with Lucene 4.1 they have done away with
> Store.COMPRESS. Is there still a way to compress the field?

Lucene 4.1 uses a compressed stored fields format under the hood. The compression is completely transparent and enabled by default. Here is some background:
http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene

> I am worried about the amount of space the index will use
> if I have to have the "body" Field stored and uncompressed.
> Are there ways around having to store the whole Field in its original
> form?
> Since I am already going to be storing the actual documents on the
> server, would it be feasible (time-wise) to not store TermVectors or store the
> field at all until the user searches for a document? Then at runtime I can
> re-index the top docs from the original documents in RAM and use Highlighter
> to return fragments?

This is what the highlighter does if you are not using the FastVectorHighlighter. You can just pass in the string value you want to highlight, whether or not you stored it in Lucene. You just need to check whether that works for you performance-wise without storing term vectors.
simon

>
> Thanks

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
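
Editor's note: the flow discussed in this thread (keep the body text out of the index, re-read the original document for each top hit, and build fragments from that string at query time) can be sketched in plain Java. This is only a stand-in to show the shape of the idea, not Lucene's Highlighter: it does no analysis or fragment scoring, it simply takes the first case-insensitive matches of a single term. The class name RuntimeFragments, the method names, and the <B> markers are assumptions made up for this illustration. In a real application the fragment selection would be delegated to Lucene's Highlighter, passing in the text as a string as Simon describes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds highlighted snippets from text that was never stored in the index.
final class RuntimeFragments {

    // Case-insensitive indexOf that works on the original string, so offsets stay valid.
    static int indexOfIgnoreCase(String text, String term, int from) {
        for (int i = from; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                return i;
            }
        }
        return -1;
    }

    // Returns up to maxFragments snippets, each with `window` characters of context
    // on either side of a match, and the match wrapped in <B>...</B>.
    static List<String> fragments(String text, String term, int window, int maxFragments) {
        List<String> out = new ArrayList<>();
        if (term.isEmpty()) {
            return out;
        }
        int from = 0; // end of the previous fragment, so fragments never overlap
        while (out.size() < maxFragments) {
            int hit = indexOfIgnoreCase(text, term, from);
            if (hit < 0) {
                break;
            }
            int end = hit + term.length();
            int start = Math.max(from, hit - window);
            int stop = Math.min(text.length(), end + window);
            out.add(text.substring(start, hit)
                    + "<B>" + text.substring(hit, end) + "</B>"
                    + text.substring(end, stop));
            from = stop;
        }
        return out;
    }

    public static void main(String[] args) {
        // In practice `body` would be extracted from the original .pdf/.docx on the server.
        String body = "Lucene 4.1 compresses stored fields. Stored fields are optional.";
        for (String fragment : fragments(body, "stored", 5, 2)) {
            System.out.println(fragment);
        }
    }
}
```

The trade-off is the one raised in the reply above: nothing extra is kept in the index, but every displayed hit costs a re-read and re-scan of the original text, so the per-query time needs to be measured on real documents.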