Thank you very much for the help, Simon.  I am amazed I was able to accomplish 
what I wanted.  I didn't store the body in the index; instead, I used 
Highlighter to return the best fragments by parsing my original document.
________________________________________
From: Simon Willnauer [simon.willna...@gmail.com]
Sent: Monday, March 25, 2013 4:07 AM
To: java-user@lucene.apache.org
Subject: Re: Compression and Highlighter

On Mon, Mar 25, 2013 at 8:13 AM, Bushman, Lamont <bus08...@byui.edu> wrote:
>     I have a project where I need to index documents using Lucene 4.1.0.  One 
> of the fields for the stored Document is the actual text from the 
> document(.pdf, .docx, etc.)  I want to be able to highlight text from the 
> documents  in the search results.  I was looking at some older tutorials 
> about storing the field with TermVectors and also storing it in the index 
> with Store.COMPRESS.  However, with Lucene 4.1 they have done away with 
> Store.COMPRESS.  Is there still a way to compress the field?

Lucene 4.1 uses a compressed stored fields format under the hood. The
compression is completely transparent and enabled by default, so there
is nothing to switch on. Here is some background:
http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene

>     I am worried about the amount of space that will be stored in the index 
> if I have to have the "body" Field stored and uncompressed.
>     Are there ways around having to store the whole Field in its original 
> form?
>     Since I am already going to be storing the actual documents on the 
> server, would it be feasible (time) to not store TermVectors or Store the 
> field at all until the user searches for a document.  Then at runtime I can 
> re-index the top docs from the original documents in RAM and use Highlighter 
> to return fragments?

This is what the Highlighter does if you are not using the
FastVectorHighlighter. You can just pass in the string value you want to
highlight, whether or not you stored it in Lucene. You just need to check
whether that works for you performance-wise without storing term vectors.
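
To make the point above concrete: highlighting without a stored field only
needs the raw text and the query terms at search time. In Lucene itself this
would be a call along the lines of
Highlighter.getBestFragments(analyzer, fieldName, text, maxNumFragments),
with the text read back from the original file. The plain-Java sketch below is
not Lucene code and does not reproduce its scoring; it only illustrates the
idea of re-scanning the text at runtime. The class name
RuntimeHighlightSketch, the method bestFragment, the <B> tags, and the simple
word tokenizer are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: a naive stand-in for what a highlighter does
// when it is handed raw text instead of a stored field.
class RuntimeHighlightSketch {

    // Very simple tokenizer: runs of letters or digits (a real analyzer does more).
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    /**
     * Returns the window of at most fragmentSize characters that contains the
     * most query-term hits, with each hit wrapped in <B>...</B>, or null if
     * the text contains none of the terms. Terms must be lowercase.
     */
    static String bestFragment(String text, Set<String> terms, int fragmentSize) {
        // Re-scan the raw text and record the offsets of every matching token.
        List<int[]> hits = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            if (terms.contains(m.group().toLowerCase(Locale.ROOT))) {
                hits.add(new int[] {m.start(), m.end()});
            }
        }
        if (hits.isEmpty()) {
            return null;
        }

        // Try a window starting at each hit; keep the one covering the most hits.
        int bestStart = 0;
        int bestCount = 0;
        for (int i = 0; i < hits.size(); i++) {
            int start = hits.get(i)[0];
            int count = 0;
            for (int j = i; j < hits.size() && hits.get(j)[1] <= start + fragmentSize; j++) {
                count++;
            }
            if (count > bestCount) {
                bestCount = count;
                bestStart = start;
            }
        }
        int bestEnd = Math.min(text.length(), bestStart + fragmentSize);

        // Copy the window, wrapping every hit that lies fully inside it.
        StringBuilder out = new StringBuilder();
        int pos = bestStart;
        for (int[] h : hits) {
            if (h[0] < bestStart || h[1] > bestEnd) {
                continue;
            }
            out.append(text, pos, h[0]).append("<B>").append(text, h[0], h[1]).append("</B>");
            pos = h[1];
        }
        out.append(text, pos, bestEnd);
        return out.toString();
    }

    public static void main(String[] args) {
        // In the scenario above, this text would be extracted from the original
        // .pdf or .docx on the server rather than read from the index.
        String body = "The index holds no body text. The highlighter re-reads the "
                + "original document and returns the best fragments for the query.";
        Set<String> terms = new HashSet<>(Arrays.asList("highlighter", "fragments"));
        System.out.println(bestFragment(body, terms, 80));
    }
}
```

The cost of this approach is that the text of each top hit has to be
extracted and tokenized again on every search, which is the performance
trade-off mentioned above.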

simon
>
> Thanks

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



