On 11/6/2012 3:29 AM, Steve Rowe wrote:
Hi Scott,
HTMLStripCharFilter doesn't require that its input be valid HTML - there is no
assumption of balanced tags.
Also, highlighted sections could span tags, e.g. if you highlight "this
phrase", and the original HTML looks like:
… thisphras
tags being properly nested.
Cheers
Scott
-----Original Message-----
From: Scott Smith [mailto:ssm...@mainstreamdata.com]
Sent: Thursday, November 01, 2012 7:16 PM
To: Michael Sokolov; java-user@lucene.apache.org
Subject: RE: Highlighting html pages
I was trying to play with this. Am I correct in
ve me after I've stripped the HTML.
Suggestions?
Scott
-----Original Message-----
From: Michael Sokolov [mailto:soko...@ifactory.com]
Sent: Tuesday, October 23, 2012 9:04 PM
To: java-user@lucene.apache.org
Cc: Scott Smith
Subject: Re: Highlighting html pages
If you use HTMLStripCharFilter, it extracts the text only, leaving tags
out, and remembering the word positions so that highlighting works
properly. Should do exactly what you want out of the box...
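To make the "remembering the word positions" idea concrete, here is a minimal plain-Java sketch. It is not Lucene's HTMLStripCharFilter and does not use any Lucene API; the names OffsetStrip, strip, and highlight are made up for illustration, and the tag handling is deliberately naive (no entities, comments, or script blocks). It only shows the principle: strip the tags, keep a map from each stripped character back to its offset in the original HTML, and use that map to place highlight markers in the original page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Lucene implementation.
final class OffsetStrip {

    /** Stripped text plus, for each stripped character, its index in the original HTML. */
    static final class Stripped {
        final String text;
        final int[] origOffsets;

        Stripped(String text, int[] origOffsets) {
            this.text = text;
            this.origOffsets = origOffsets;
        }
    }

    /** Drops everything between '<' and '>' and records where each kept character came from. */
    static Stripped strip(String html) {
        StringBuilder text = new StringBuilder();
        List<Integer> offsets = new ArrayList<>();
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>' && inTag) {
                inTag = false;
            } else if (!inTag) {
                text.append(c);
                offsets.add(i);
            }
        }
        int[] map = new int[offsets.size()];
        for (int i = 0; i < map.length; i++) {
            map[i] = offsets.get(i);
        }
        return new Stripped(text.toString(), map);
    }

    /**
     * Searches the stripped text for the first occurrence of term and wraps the
     * corresponding span of the ORIGINAL html in [[ ]] markers. Text inside tags
     * (for example attribute values) is never searched, so it is never marked.
     */
    static String highlight(String html, String term) {
        if (term.isEmpty()) {
            return html;
        }
        Stripped s = strip(html);
        int start = s.text.indexOf(term);
        if (start < 0) {
            return html;
        }
        int origStart = s.origOffsets[start];
        int origEnd = s.origOffsets[start + term.length() - 1] + 1;
        return html.substring(0, origStart)
                + "[[" + html.substring(origStart, origEnd) + "]]"
                + html.substring(origEnd);
    }
}
```

With the input `<p>this <b>phrase</b></p>`, highlighting "phrase" marks only the text inside the `<b>` element, while a match on "this phrase" produces a marked span that crosses the `<b>` tag boundary. If the markers were real HTML tags, that second case is exactly where nesting problems would appear.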
On 10/23/2012 8:00 PM, Scott Smith wrote:
I need to take an html page that I retrieve from my lucene search and
highlight all of the terms that are part of the search. I need to skip over
any html tags since I don't want any words in tags which happen to match the
search to be highlighted.
Note that I don't want sections of the docum