On 11/6/2012 3:29 AM, Steve Rowe wrote:
Hi Scott,

HTMLStripCharFilter doesn't require that its input be valid HTML - there is no 
assumption of balanced tags.

Also, highlighted sections could span tags, e.g. if you highlight "this 
phrase", and the original HTML looks like:

        … this<span>phrase</span> …

the highlighting code would have to know to emit multiple highlight tags so 
that the result stays well-formed, maybe something like:

        … <b>this</b><span><b>phrase</b></span> …
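
A minimal sketch of that idea in plain Java, not Lucene's highlighter API. 
The class name TagAwareHighlighter and its highlight() method are 
hypothetical. It assumes the match offsets refer to positions in the original 
HTML, and that '<' and '>' only ever delimit tags (no comments, scripts, or 
'>' inside attribute values). The <b> wrapper is closed before every tag and 
reopened after it, so a wrapper never straddles a tag boundary:

```java
public final class TagAwareHighlighter {
    private TagAwareHighlighter() {}

    /**
     * Wraps each run of text inside [start, end) in <b>...</b>.
     * start and end are offsets into the original HTML (an assumption of
     * this sketch); tags inside the range are copied through unwrapped.
     */
    public static String highlight(String html, int start, int end) {
        StringBuilder out = new StringBuilder();
        boolean inTag = false; // currently between '<' and '>'
        boolean open = false;  // a <b> wrapper is currently open
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            // Close the wrapper before a tag starts or once the range ends.
            if (open && (i >= end || c == '<')) {
                out.append("</b>");
                open = false;
            }
            if (c == '<') {
                inTag = true;
            }
            boolean inRange = i >= start && i < end;
            // (Re)open the wrapper on the first text character in range.
            if (inRange && !inTag && !open) {
                out.append("<b>");
                open = true;
            }
            out.append(c);
            if (c == '>') {
                inTag = false;
            }
        }
        if (open) {
            out.append("</b>");
        }
        return out.toString();
    }
}
```

For example, highlight("a this<span>phrase</span> z", 2, 18) wraps "this" 
and "phrase" separately and leaves the <span> tags in place. Whitespace 
between tags inside the range would also get wrapped, which a real 
implementation might want to skip.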

If you do develop a solution here, it would be great if you could share it with 
the community.

Also, I think it would be useful to have an XML-specific stripping char filter 
- it's on my long-term to-do list :).

Steve: see https://issues.apache.org/jira/browse/SOLR-2597. I have updates for this, but since no committers took it up, I haven't bothered to keep the issue up to date with my latest code.

I would also love to see a tag-balancer for highlighting phrases. Our current solution is to use the old highlighter (not FastVectorHighlighter), which seems to tag each word in a phrase independently, rather than as an entire phrase.

-Mike
