On 11/6/2012 3:29 AM, Steve Rowe wrote:
> Hi Scott,
> HTMLStripCharFilter doesn't require that its input be valid HTML - there is no
> assumption of balanced tags.
> Also, highlighted sections could span tags, e.g. if you highlight "this
> phrase", and the original HTML looks like:
> … this<span>phrase</span> …
> the highlighting code would have to know to emit multiple tags to keep the
> output well-formed, maybe something like:
> … <b>this</b><span><b>phrase</b></span> …
> If you do develop a solution here, it would be great if you could share it with
> the community.
> Also, I think it would be useful to have an XML-specific stripping char filter
> - it's on my long-term to-do list :).
Steve: see https://issues.apache.org/jira/browse/SOLR-2597. I have
updates for this, but since no committers took it up, I haven't bothered
to keep the issue up to date with my latest code.
I would also love to see a tag-balancer for highlighting phrases. Our
current solution is to use the old highlighter (not
FastVectorHighlighter), which seems to tag each word in a phrase
independently, rather than as an entire phrase.
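A minimal sketch of the kind of tag-aware wrapping discussed above, in plain
Java with no Lucene or Solr classes. The class name TagAwareHighlighter and the
highlight method are hypothetical, not an existing API. It assumes the
highlight offsets refer to positions in the raw HTML string, and that '<' and
'>' appear only as tag delimiters (no comments, CDATA, or '>' inside attribute
values). It closes the <b> before every tag and reopens it afterwards, so the
added markup never straddles an existing tag:

```java
final class TagAwareHighlighter {

    /**
     * Wraps every run of text inside [start, end) in <b>...</b>.
     * Offsets are positions in the raw HTML string (an assumption of this
     * sketch). Existing tags are copied through unchanged and are never
     * enclosed by the added <b> elements.
     */
    static String highlight(String html, int start, int end) {
        StringBuilder out = new StringBuilder();
        boolean inTag = false; // true while between '<' and '>'
        boolean open = false;  // true while an added <b> is open
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;
            }
            boolean highlightable = !inTag && i >= start && i < end;
            if (highlightable && !open) {
                out.append("<b>");
                open = true;
            } else if (!highlightable && open) {
                out.append("</b>");
                open = false;
            }
            out.append(c);
            if (c == '>') {
                inTag = false;
            }
        }
        if (open) {
            out.append("</b>"); // highlight ran to the end of the input
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Offsets 0..16 cover "this", the <span> tag, and "phrase".
        System.out.println(highlight("this<span>phrase</span>", 0, 16));
        // prints <b>this</b><span><b>phrase</b></span>
    }
}
```

This only keeps the added <b> elements from crossing tag boundaries; it does
not repair input that is already unbalanced.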
-Mike