Re: Highlighting html pages

2012-11-06 Thread Michael Sokolov
On 11/6/2012 3:29 AM, Steve Rowe wrote: Hi Scott, HTMLStripCharFilter doesn't require that its input be valid HTML - there is no assumption of balanced tags. Also, highlighted sections could span tags, e.g. if you highlight "this phrase", and the original HTML looks like: … thisphras
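The point about highlights spanning tags can be illustrated with a small standalone sketch. This is not Lucene code: `wrap_text_runs`, the `<em>` markers, and the sample HTML are hypothetical, chosen only for illustration. It assumes the match's start and end offsets in the original HTML are already known and fall in text rather than inside a tag. Each text run is wrapped separately, so the added markup stays well nested even when the match crosses a tag.

```python
import re

# Naive tag pattern; an assumption for illustration, not a full HTML parser.
TAG = re.compile(r"<[^>]*>")

def wrap_text_runs(html, start, end, pre="<em>", post="</em>"):
    """Wrap html[start:end] in pre/post without ever enclosing a tag.

    start and end are offsets into the original HTML and are assumed
    to lie in text, not inside a tag.
    """
    out = [html[:start]]
    pos = start
    # Visit only tags that lie entirely inside the highlighted range.
    for m in TAG.finditer(html, start, end):
        if m.start() > pos:
            out.append(pre + html[pos:m.start()] + post)
        out.append(m.group(0))  # copy the tag through untouched
        pos = m.end()
    if pos < end:
        out.append(pre + html[pos:end] + post)
    out.append(html[end:])
    return "".join(out)

if __name__ == "__main__":
    sample = "<p>this <b>phrase</b> here</p>"
    # Offsets 3..17 cover "this <b>phrase" in the sample.
    print(wrap_text_runs(sample, 3, 17))
```

Wrapping per text run produces more markup than a single pair of highlight tags would, but it does not depend on the original tags being balanced.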

Re: Highlighting html pages

2012-11-06 Thread Steve Rowe
> properly nested. > > Cheers > > Scott > > -----Original Message----- > From: Scott Smith [mailto:ssm...@mainstreamdata.com] > Sent: Thursday, November 01, 2012 7:16 PM > To: Michael Sokolov; java-user@lucene.apache.org > Subject: RE: Highlighting

RE: Highlighting html pages

2012-11-05 Thread Scott Smith
tags being properly nested. Cheers Scott -Original Message- From: Scott Smith [mailto:ssm...@mainstreamdata.com] Sent: Thursday, November 01, 2012 7:16 PM To: Michael Sokolov; java-user@lucene.apache.org Subject: RE: Highlighting html pages I was trying to play with this. Am I correct in

Re: Highlighting html pages

2012-11-05 Thread Michael Sokolov
ve me after I've stripped the HTML. Suggestions? Scott -Original Message- From: Michael Sokolov [mailto:soko...@ifactory.com] Sent: Tuesday, October 23, 2012 9:04 PM To: java-user@lucene.apache.org Cc: Scott Smith Subject: Re: Highlighting html pages If you use HTMLStripCharFilter, i

RE: Highlighting html pages

2012-11-01 Thread Scott Smith
actory.com] Sent: Tuesday, October 23, 2012 9:04 PM To: java-user@lucene.apache.org Cc: Scott Smith Subject: Re: Highlighting html pages If you use HTMLStripCharFilter, it extracts the text only, leaving tags out, and remembering the word positions so that highlighting works properly. Should do ex

Re: Highlighting html pages

2012-10-23 Thread Michael Sokolov
If you use HTMLStripCharFilter, it extracts the text only, leaving tags out, and remembering the word positions so that highlighting works properly. Should do exactly what you want out of the box... On 10/23/2012 8:00 PM, Scott Smith wrote: I need to take an html page that I retrieve from m
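The idea described here (strip the tags, but remember where each character of the remaining text came from) can be sketched outside Lucene. The following is a minimal Python illustration of that offset-mapping technique, not HTMLStripCharFilter itself. The function names, the simple tag regex, and the `<em>` markers are all assumptions made for the example, and it does not handle entities, comments, or script blocks.

```python
import re

# Naive tag pattern; an assumption for illustration, not a full HTML parser.
TAG = re.compile(r"<[^>]*>")

def strip_tags_with_offsets(html):
    """Return (text, offsets), where offsets[i] is the index in html of text[i]."""
    chars, offsets = [], []
    pos = 0
    for m in TAG.finditer(html):
        for i in range(pos, m.start()):
            chars.append(html[i])
            offsets.append(i)
        pos = m.end()  # skip the tag, keeping no characters from it
    for i in range(pos, len(html)):
        chars.append(html[i])
        offsets.append(i)
    return "".join(chars), offsets

def highlight(html, term, pre="<em>", post="</em>"):
    """Highlight term in the text of html, ignoring matches inside tags."""
    if not term:
        return html
    text, offsets = strip_tags_with_offsets(html)
    out, last = [], 0
    for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        # Map the match in the stripped text back to original offsets.
        start = offsets[m.start()]
        end = offsets[m.end() - 1] + 1
        out.append(html[last:start])
        out.append(pre + html[start:end] + post)
        last = end
    out.append(html[last:])
    return "".join(out)

if __name__ == "__main__":
    page = '<p class="note">A note about notes</p>'
    print(highlight(page, "note"))
```

Because matching runs against the stripped text, the word "note" inside the `class` attribute is never a candidate. One limitation of this sketch: a match that crosses a tag is wrapped as a single span with the tag inside it.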

Highlighting html pages

2012-10-23 Thread Scott Smith
I need to take an html page that I retrieve from my lucene search and highlight all of the terms that are part of the search. I need to skip over any html tags since I don't want any words in tags which happen to match the search to be highlighted. Note that I don't want sections of the docum