That's excellent. Really clear explanation. Just wondering whether to keep on with the UnifiedHighlighter now.
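For reference, if the UnifiedHighlighter is kept, the three parameters suggested in the quoted reply below could be set as request handler defaults in solrconfig.xml rather than passed on every query. This is only a sketch: the handler name "/select" is an assumption (it does not appear in the thread), and the values are simply the ones proposed below, not tested settings.

```xml
<!-- Sketch only: "/select" is an assumed handler name, not taken from the thread. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Select the unified highlighter explicitly -->
    <str name="hl.method">unified</str>
    <!-- Target fragment size in characters, as in the old GapFragmenter config -->
    <int name="hl.fragsize">100</int>
    <!-- Break at word boundaries instead of sentence boundaries -->
    <str name="hl.bs.type">WORD</str>
  </lst>
</requestHandler>
```

As the quoted reply notes, the existing `<fragmenter>` block is not used by the UnifiedHighlighter, so it has no effect with these settings.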
On Mon, 8 Dec 2025 at 22:04, Chip Ryan via users <[email protected]> wrote:

> Hi Shaun,
>
> This is a common issue when upgrading to Solr 9. The default highlighter
> changed from the "original" highlighter to the "UnifiedHighlighter". Your
> GapFragmenter configuration is being completely ignored because
> UnifiedHighlighter doesn't use it.
>
> The UnifiedHighlighter fragments based on sentence boundaries by default,
> which is why you're seeing fragments that extend to full stops and pull in
> multiple keyword occurrences.
>
> *Quick fix:* Add this to your query parameters or solrconfig.xml defaults:
>
> hl.method=original
>
> This tells Solr to use the original highlighter, which will respect your
> GapFragmenter settings again.
>
> *Alternative:* If you'd prefer to stick with UnifiedHighlighter (it's
> faster and handles multi-valued fields better), you can tune its behavior
> with:
>
> hl.method=unified
> hl.fragsize=100
> hl.bs.type=WORD
>
> Setting hl.bs.type=WORD breaks at word boundaries near your target size
> rather than sentence boundaries, which should give you results much closer
> to what you had before.
>
> Hope this helps.
>
> *Opensolr.com*
> *Your Path to AI Search*
> <https://opensolr.com/faq/view/web-crawler/46/Opensolr-Web-Crawler-Site-Search-Solution>
> [email protected]
> https://opensolr.com
> VAT: RO-35410526
>
> On 8 Dec 2025, at 23:12, Shaun Campbell <[email protected]> wrote:
>
> Hi
>
> I have an existing application based on Solr 7 and highlighting and
> fragment size worked great. I'm just upgrading Solr to 9.10 and noticed
> some of my highlighted fragments can be quite a bit longer than they
> were before.
>
> I have a simple setup using the fragmenter the same as in the latest Solr
> documentation in my solrconfig.xml.
>
> <fragmenter name="gap"
>             default="true"
>             class="solr.highlight.GapFragmenter">
>   <lst name="defaults">
>     <int name="hl.fragsize">100</int>
>   </lst>
> </fragmenter>
>
> In the old production Solr 7 I get this output when I search for COVID
> where I string two highlight fragments together with ... in between.
>
> Before (7)
>
> ... Some people with *COVID*-19 experience symptoms for several weeks or
> months (Long *COVID*), while... The REACT-Long *COVID* (REACT-LC) programme
> aims to characterise the genetic, biological, social and...
>
> After (9.10)
>
> ... Some people with *COVID*-19 experience symptoms for several weeks or
> months (Long *COVID*), while others have a short illness or no symptoms.
> ... The REACT-Long *COVID* (REACT-LC) programme aims to characterise the
> genetic, biological, social and environmental signatures and pathways, and
> their inter-relationships, that underpin progression to Long *COVID*, and
> to understand the natural history and long-term sequelae post-SARS-CoV-2
> infection. ...
>
> In the latter version the first fragment is almost the same length. I think
> it's gone on to the end of a sentence with a full stop. The second fragment
> is much longer and looks like it's taken in the second mention of COVID.
>
> Any ideas on how I can get back to the earlier shorter form?
>
> Shaun
