Hi Shaun,

This is a common issue when upgrading to Solr 9. The default highlighter changed from the "original" highlighter to the "UnifiedHighlighter". Your GapFragmenter configuration is being completely ignored because UnifiedHighlighter doesn't use it.

The UnifiedHighlighter fragments based on sentence boundaries by default, which is why you're seeing fragments that extend to full stops and pull in multiple keyword occurrences.

Quick fix: Add this to your query parameters or solrconfig.xml defaults:

    hl.method=original

This tells Solr to use the original highlighter, which will respect your GapFragmenter settings again.

Alternative: If you'd prefer to stick with UnifiedHighlighter (it's faster and handles multi-valued fields better), you can tune its behavior with:

    hl.method=unified
    hl.fragsize=100
    hl.bs.type=WORD

Setting hl.bs.type=WORD breaks at word boundaries near your target size rather than sentence boundaries, which should give you results much closer to what you had before.

Hope this helps.

Opensolr.com
Your Path to AI Search
https://opensolr.com

> On 8 Dec 2025, at 23:12, Shaun Campbell wrote:
>
> Hi
>
> I have an existing application based on Solr 7 and highlighting and
> fragment size worked great. I'm just upgrading Solr to 9.10 and noticed
> some of my highlighted fragments can be quite a bit longer than they
> were before.
>
> I have a simple setup using the fragmenter the same as in the latest Solr
> documentation in my solrconfig.xml.
>
> <fragmenter name="gap"
>             default="true"
>             class="solr.highlight.GapFragmenter">
>   <lst name="defaults">
>     <int name="hl.fragsize">100</int>
>   </lst>
> </fragmenter>
>
>
> In the old production Solr 7 I get this output when I search for COVID
> where I string two highlight fragments together with ... in between.
>
>
> Before (7)
>
> ... Some people with *COVID*-19 experience symptoms for several weeks or
> months (Long *COVID*), while... The REACT-Long *COVID* (REACT-LC) programme
> aims to characterise the genetic, biological, social and...
>
>
> After (9.10)
>
> ... Some people with *COVID*-19 experience symptoms for several weeks or
> months (Long *COVID*), while others have a short illness or no symptoms.
> ... The REACT-Long *COVID* (REACT-LC) programme aims to characterise the
> genetic, biological, social and environmental signatures and pathways, and
> their inter-relationships, that underpin progression to Long *COVID*, and
> to understand the natural history and long-term sequelae post-SARS-CoV-2
> infection. ...
>
>
> In the latter version the first fragment is almost the same length. I think
> it's gone on to the end of a sentence with a full stop. The second fragment
> is much longer and looks like it's taken in the second mention of COVID.
>
>
> Any ideas on how I can get back to the earlier shorter form?
>
>
> Shaun
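
For reference, the UnifiedHighlighter settings suggested in the reply at the top could be set as request-handler defaults in solrconfig.xml roughly as follows. This is only a minimal sketch: the /select handler name is an assumption about the setup, and any defaults already present in that handler would need to be merged with these rather than replaced.

```xml
<!-- Sketch only: the handler name "/select" is assumed, not taken from the thread. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Select the highlighter explicitly instead of relying on the Solr 9 default. -->
    <str name="hl.method">unified</str>
    <!-- Approximate target fragment length, in characters. -->
    <int name="hl.fragsize">100</int>
    <!-- Break at word boundaries instead of the default sentence boundaries. -->
    <str name="hl.bs.type">WORD</str>
  </lst>
</requestHandler>
```

To fall back to the original highlighter instead, the same defaults block would carry only hl.method set to original, in which case the existing GapFragmenter configuration applies again.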
