Hi group,
I'm trying to highlight my complete(!) XML document, which is indexed for that
purpose in a special field called "wkxmlsource". I configured the "wkxmlsource"
field like this:
<field indexed="true" multiValued="false" name="wkxmlsource" omitNorms="true"
stored="true" termPositions="true" termOffsets="true" termVectors="true"
type="text_xml"/>
The text_xml field type is almost identical to the text_en field type, but with
<charFilter class="solr.HTMLStripCharFilterFactory" /> as the first element of
the index analyzer. That prevents highlighting inside XML tags.
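To make that concrete, here is a rough sketch of what such a field type could look like. Only the charFilter line is taken from my description above; the tokenizer and filter chain are assumed from the stock example text_en and may differ from the real definition.

```xml
<!-- Sketch only: filter chain assumed from the stock example text_en -->
<fieldType name="text_xml" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Strips markup before tokenizing, so tag names and attributes are never indexed as terms -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Query text is plain text, so no charFilter here -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```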
First I tried the simple highlighter and that almost worked: I get my document
back with my search terms and phrases highlighted, and each individual term gets
its own highlight tags. The problem is that the complete value of the
"wkxmlsource" field is not returned; the bottom part is cut off, no matter how
large I set hl.fragsize.
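For reference, the parameters involved in this first attempt look roughly like the sketch below. The handler name and the values are placeholders, not my actual settings. hl.maxAnalyzedChars is included because it is a separate limit on how much of the field the highlighter analyzes (51200 characters by default); whether it explains the cut-off here is unconfirmed.

```xml
<!-- Sketch only: handler name and values are placeholders -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">true</str>
    <str name="hl.fl">wkxmlsource</str>
    <!-- 0 asks the standard highlighter to use the whole field value instead of fragments -->
    <str name="hl.fragsize">0</str>
    <!-- Independent of hl.fragsize: caps the number of characters analyzed for highlighting -->
    <str name="hl.maxAnalyzedChars">1000000</str>
  </lst>
</requestHandler>
```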
So my next try was to use the FVH (hl.useFastVectorHighlighter=true) instead.
That helped: it now returns the complete value of "wkxmlsource" with all my
search terms/phrases highlighted. But... in the case of a phrase search, it no
longer highlights each individual term; it only puts highlight tags around
the complete phrase. That could possibly lead to malformed XML. An example:
Searching for the phrase "across the country Santa Fe" highlights the document
like this:
<para align="left">...spread <em>across the country.</para><para
align="left">Santa Fe</em> Pacific... </para>
How can I make the FVH highlight individual terms instead of the complete
phrase? Ideally I would like to have something like:
<para align="left">...spread <em>across</em> <em>the</em>
<em>country</em>.</para><para align="left"><em>Santa</em> <em>Fe</em>
Pacific... </para>
which is still valid XML.
My boundary scanner is configured like this:
<boundaryScanner name="breakIterator"
                 class="solr.highlight.BreakIteratorBoundaryScanner">
  <lst name="defaults">
    <str name="hl.bs.type">WORD</str>
    <str name="hl.bs.language">en</str>
    <str name="hl.bs.country">US</str>
  </lst>
</boundaryScanner>
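The scanner above is selected by name at request time. A minimal sketch of the matching highlighting defaults follows; the surrounding handler is omitted and the values are placeholders rather than a copy of my config.

```xml
<!-- Sketch only: values are placeholders -->
<lst name="defaults">
  <str name="hl">true</str>
  <str name="hl.fl">wkxmlsource</str>
  <!-- Switch from the standard highlighter to the FastVectorHighlighter -->
  <str name="hl.useFastVectorHighlighter">true</str>
  <!-- Refers to the name attribute of the boundaryScanner element above -->
  <str name="hl.boundaryScanner">breakIterator</str>
</lst>
```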
Thanks, Tom
--
Tom Burgmans