For what it's worth, Mark (Miller), there *is* a need for "just highlight the query terms without trying to extract excerpts" functionality, something à la Google cache (different colours... mmm, nice). I've had people ask me for this before, and I know I could use it, too. Please contribute it to contrib/ if you end up working on this.
Otis
--
Simpy -- http://www.simpy.com/ -- Tag. Search. Share.

----- Original Message ----
From: Mark Miller <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Sunday, January 28, 2007 7:39:29 AM
Subject: Re: Multiword Highlighting

markharw00d wrote:
> >> Isn't it semi trivial if you are not interested in the fragments (I
> >> swear it seems that most people are not)? I
>
> I haven't conducted a survey but it's the typical web search engine
> scenario - select only a small subset of the matching document content
> for display in SERPs. I would expect that to be a pretty commonplace
> requirement for which we should retain a solution.

No doubt. I certainly am not suggesting you ditch fragments, and I have
no evidence that more people just want to highlight a doc... it's just
that the impression I get from the mailing list is that most people
want to highlight the returned doc. I am sure plenty of people need
Google-style results too, but my experience with Lucene has not often
been in the area of web search engines. I bet a lot of users would
benefit from a highlighter that highlights actual hits and doesn't
summarize, though (both would be great). I wouldn't claim to be an
authority on any of this... take my opinion for what it's worth --
very little.

>
> Maybe a new highlighter with no attempt at summarising could more
> easily address phrase support for small pieces of content. It will
> always be hard to faithfully represent all possible query match logic
> - especially if there are NOTs, ANDs and ORs mixed in with all the
> term proximity logic e.g. NotNear. Some compromise is required. I did
> suggest that spans may be a better basis for highlighting than terms
> and pointed at some existing code to get you along this path - see
> here http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2

I have some code that you wrote that seems to turn almost any query
into a series of spans. Perhaps it is not as robust as my limited
testing made it seem.
>
> There are also a couple of other Highlighter packages contributed
> recently which I listed in my previous mail but I simply haven't had
> the time to look at in detail so they may be useful. Anyone had any
> experience of those?

None of them seem to do full span highlighting... again, based on my
limited investigation.

>
> >> every new highlight has to be compared against every previous
> >> highlight for overlap
> Yes, Analyzers that produce overlapping tokens are an added
> complication when implementing highlighting logic. I think we have a
> reasonable JUnit test containing several of the more exotic analyzer
> scenarios which you could/should use for testing any other highlighter
> implementation.

Thanks for the tip.

I appreciate your response, Mark. I will continue to look at your span
extractor... I thought that it alone was enough to do what I wanted,
but your comments seem to suggest maybe I'll need more. I hope not <g>

If I do manage something I will be sure to post my results.

- Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
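
A rough sketch of the overlap point discussed above, for the "highlight the whole doc, no fragments" case. It is not code from this thread and not part of Lucene: the class WholeTextHighlighter, the Span type, and the <b> markup are all made up for illustration. It assumes the hit offsets have already been collected (for example from token offsets or extracted spans) and that each one lies inside the text with start <= end. Sorting the hits by start offset first means each new hit only needs to be checked against the last merged one, rather than against every previous highlight.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper (not a Lucene class): highlights a whole text, no fragmenting. */
final class WholeTextHighlighter {

    /** A character range [start, end) in the original text, like a token's offsets. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Sorts spans by start offset and merges any that overlap or touch. */
    static List<Span> merge(List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt((Span s) -> s.start));
        List<Span> merged = new ArrayList<>();
        for (Span s : sorted) {
            int last = merged.size() - 1;
            if (last >= 0 && s.start <= merged.get(last).end) {
                // Overlaps or abuts the previous span: widen that one instead of adding a new one.
                Span prev = merged.get(last);
                merged.set(last, new Span(prev.start, Math.max(prev.end, s.end)));
            } else {
                merged.add(s);
            }
        }
        return merged;
    }

    /** Returns the full text with every merged span wrapped in <b>...</b>. */
    static String highlight(String text, List<Span> spans) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Span s : merge(spans)) {
            out.append(text, pos, s.start);   // untouched text before the hit
            out.append("<b>").append(text, s.start, s.end).append("</b>");
            pos = s.end;
        }
        out.append(text, pos, text.length()); // untouched tail
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "multiword highlighting in Lucene";
        // Offsets an analyzer with overlapping tokens might report:
        // "multiword" plus its sub-tokens "multi" and "word", then "Lucene".
        List<Span> hits = Arrays.asList(
                new Span(0, 9), new Span(0, 5), new Span(5, 9), new Span(26, 32));
        System.out.println(highlight(text, hits));
        // prints: <b>multiword</b> highlighting in <b>Lucene</b>
    }
}
```

Merging before writing any markup keeps the tags from nesting or interleaving when tokens overlap. Giving each query term its own colour would need a term id on each Span and a rule for what to do when differently coloured spans overlap, which this sketch leaves out.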