Dalton, Jeffery wrote:
You mentioned that "it will scale well in the future". Does this imply
that it doesn't scale well now?
Absolutely not. I'm saying that storing your documents
outside of Lucene is, in general, a good practice to follow. Lucene
is brilliant, but specialized. It should be used for its intended
porpoise (everything should have an intended porpoise). I wasn't
slighting the Highlighter; my project has been very much the better for
having used it.
What are the current limitations of the
Lucene Highlighter? How does it perform under high query load?
I don't know. I can run ten full page text documents through the
highlighter and generate a search results page in about 50ms, about the
same amount of time it took to get the hits object. And I have reason
to believe I may not have used the highlighter in the most effective
manner. I stopped futzing with it when I felt I was getting results back
in an acceptable amount of time.
This is just a curiosity of mine, but Nutch has a separate Summarizer:
net.nutch.search.summarizer. The Nutch summarizer looks much more
efficient (aka more simplistic) and therefore probably more scalable?
This is probably a question for the Nutch user list, but why doesn't
Nutch use the Lucene Highlighter?
Thoughts, comments?
- Jeff
-----Original Message-----
From: Dan Funk [mailto:[EMAIL PROTECTED]
Sent: Friday, September 23, 2005 8:28 AM
To: java-user@lucene.apache.org
Subject: Re: Displaying search context
What you are doing is a good, scalable practice. You need to store
those email messages somewhere outside of Lucene, and use a unique id to
correlate the two. When you want to display relevant text for a search
result, find the file on disk, and pass it through the Lucene
Highlighter (see the Lucene sandbox). This will give you what you are
looking for, and it will scale well in the future.
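As a rough illustration of the pattern above (bodies kept outside the
index, correlated by a unique id, and excerpted only at display time),
here is a minimal plain-Java sketch. It is a stand-in, not the Lucene
Highlighter API: the class SearchContext, its methods store and snippet,
the in-memory map, and the <b> markup are all hypothetical names chosen
for the example. In a real setup the id would come from a stored field
on the hit and the body would be read from disk or a database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: NOT part of Lucene. It only illustrates
// "look up the body by id, then cut out the context around the term".
class SearchContext {
    // Stand-in for the external store (files on disk, a database, ...).
    private final Map<String, String> bodies = new HashMap<>();

    // Save the full body under the same unique id that is stored in the index.
    void store(String id, String body) {
        bodies.put(id, body);
    }

    // Return up to `radius` characters of context on each side of the first
    // case-insensitive match of `term`, with the match wrapped in <b> tags.
    // Returns null if the id is unknown or the term does not occur.
    String snippet(String id, String term, int radius) {
        String body = bodies.get(id);
        if (body == null || term.isEmpty()) {
            return null;
        }
        int pos = indexOfIgnoreCase(body, term);
        if (pos < 0) {
            return null;
        }
        int matchEnd = pos + term.length();
        int start = Math.max(0, pos - radius);
        int end = Math.min(body.length(), matchEnd + radius);
        return (start > 0 ? "..." : "")
                + body.substring(start, pos)
                + "<b>" + body.substring(pos, matchEnd) + "</b>"
                + body.substring(matchEnd, end)
                + (end < body.length() ? "..." : "");
    }

    // Case-insensitive search that keeps offsets valid in the original text.
    private static int indexOfIgnoreCase(String text, String term) {
        for (int i = 0; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                return i;
            }
        }
        return -1;
    }
}
```

For each hit, you would read the id from the matching Document, call
snippet(id, queryTerm, radius), and print the result next to the hit.
Unlike the real Highlighter, this sketch does no analysis or scoring of
fragments; it only finds the first literal occurrence of a single term.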
Anand Kishore wrote:
Hi,
I am indexing emails through Lucene. The body of the mails is stored in
an "Unstored" field. I also have a search interface set up which
returns all Documents matching my query. What I need is to display a
few lines from the body of the mails where the query term was found. How
can this be achieved, given that the body is indexed but not stored?
Thanx
- Andy
http://da-tek-ee.blogspot.com
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
Dan Funk
Software Engineer
Information Technology Solutions
Battelle Charlottesville Operations
1000 Research Park Boulevard, Suite 105
Charlottesville, Virginia 22911
434.984.0951 x244
434.984.0947 (fax)
[EMAIL PROTECTED]