Hi Shaun,

This is a common issue when upgrading to Solr 9. In Solr 9.0 the default 
highlighter (hl.method) changed from "original" to "unified". Your 
GapFragmenter configuration is being ignored because fragmenters are only 
used by the original highlighter; UnifiedHighlighter doesn't use them.

The UnifiedHighlighter fragments with a break iterator, which defaults to 
sentence boundaries (hl.bs.type=SENTENCE). That is why your fragments now run 
to the full stop and can pull in multiple keyword occurrences.

Quick fix: Add this to your query parameters or solrconfig.xml defaults:

hl.method=original

This tells Solr to use the original highlighter, which will respect your 
GapFragmenter settings again.
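
If you go the solrconfig.xml route, it might look roughly like the sketch 
below. The handler name /select is an assumption on my part; merge the 
hl.method line into the defaults of whichever request handler your application 
actually queries.

```xml
<!-- Sketch only: "/select" is an assumed handler name; keep your existing defaults -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Use the original highlighter so the configured fragmenter is honoured -->
    <str name="hl.method">original</str>
  </lst>
</requestHandler>
```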

Alternative: If you'd prefer to stick with UnifiedHighlighter (it's faster and 
handles multi-valued fields better), you can tune its behavior with:

hl.method=unified
hl.fragsize=100
hl.bs.type=WORD

Setting hl.bs.type=WORD breaks at word boundaries near your target size rather 
than sentence boundaries, which should give you results much closer to what you 
had before.
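
As request handler defaults, the UnifiedHighlighter settings could be written 
something like this. Again, the /select handler name is only an assumption, 
and the values are the ones from your existing fragmenter config rather than 
tuned recommendations.

```xml
<!-- Sketch only: "/select" is an assumed handler name; keep your existing defaults -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl.method">unified</str>
    <!-- Target fragment length in characters -->
    <int name="hl.fragsize">100</int>
    <!-- Break on word boundaries instead of the default SENTENCE -->
    <str name="hl.bs.type">WORD</str>
  </lst>
</requestHandler>
```

The same three parameters can also be passed per request, which is handy for 
comparing the output before changing the config.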

Hope this helps.


Opensolr.com
https://opensolr.com

> On 8 Dec 2025, at 23:12, Shaun Campbell <[email protected]> wrote:
> 
> Hi
> 
> I have an existing application based on Solr 7 and highlighting and
> fragment size worked great. I'm just upgrading Solr to 9.10 and noticed
> some of my highlighted fragments can be quite a bit longer than they
> were before.
> 
> I have a simple setup using the fragmenter the same as in the latest Solr
> documentation in my solrconfig.xml.
> 
> <fragmenter name="gap"
>             default="true"
>             class="solr.highlight.GapFragmenter">
>   <lst name="defaults">
>     <int name="hl.fragsize">100</int>
>   </lst>
> </fragmenter>
> 
> 
> In the old production Solr 7 I get this output when I search for COVID
> where I string two highlight fragments together with ... in between.
> 
> 
> Before (7)
> 
> ... Some people with *COVID*-19 experience symptoms for several weeks or
> months (Long *COVID*), while... The REACT-Long *COVID* (REACT-LC) programme
> aims to characterise the genetic, biological, social and...
> 
> 
> 
> After (9.10)
> 
> ... Some people with *COVID*-19 experience symptoms for several weeks or
> months (Long *COVID*), while others have a short illness or no symptoms.
> ... The REACT-Long *COVID* (REACT-LC) programme aims to characterise the
> genetic, biological, social and environmental signatures and pathways, and
> their inter-relationships, that underpin progression to Long *COVID*, and
> to understand the natural history and long-term sequelae post-SARS-CoV-2
> infection. ...
> 
> 
> In the latter version the first fragment is almost the same length. I think
> it's gone on to the end of a sentence with a full stop. The second fragment
> is much longer and looks like it's taken in the second mention of COVID.
> 
> 
> Any ideas on how I can get back to the earlier shorter form?
> 
> 
> Shaun
