CustomBreakIterator Performance Issues

df2832368_...@amberoad.de Mon, 08 Mar 2021 02:05:34 -0800

Hello,

I am currently trying to get a custom BreakIterator for the Unified 
Highlighter to work, and I am struggling with its performance.


I need a BreakIterator for getting nice highlights of passages. For this I want 
the start of the highlight to be a sentence-start and the end to be a word-end. 
There are also some weird edge cases.

I have already written the BreakIterator and integrated it into our custom 
UnifiedHighlighter class, but when I use this iterator the QTime of all 
requests rises from ~1000 to 12000+, which is not acceptable for this 
application.

Here is a link to my implementation. I can't really find where I am being 
horribly inefficient. (I know that these functions get called very often.)

Any suggestions are welcome, also other approaches.

Also, are there any good resources for learning more about BreakIterators? 
Digging into the code is really hard here.

Another approach I am considering is to do this highlight "trimming" after 
the final highlights have been found. This would reduce the amount of logic 
called, but I guess the scoring system of Solr wouldn't be taken into account 
the right way.
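
To make the "trimming" idea concrete, here is a rough sketch of what such a 
post-processing step could look like. It is only an illustration under 
assumptions: the class name PassageTrimmer and the method snap are made up, it 
uses the JDK's java.text.BreakIterator rather than anything from the Unified 
Highlighter, and it ignores the edge cases and the scoring concern mentioned 
above.

```java
import java.text.BreakIterator;
import java.util.Locale;

/** Hypothetical helper (not part of Lucene or Solr): adjusts a passage after highlighting. */
public final class PassageTrimmer {

    private PassageTrimmer() {}

    /**
     * Returns {start, end}, where start is moved back to the nearest sentence
     * boundary and end is moved forward to the nearest word boundary.
     * Assumes 0 <= start <= end <= text.length().
     */
    public static int[] snap(String text, int start, int end) {
        // Sentence boundaries decide where the passage may begin.
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        sentences.setText(text);
        int snappedStart = sentences.isBoundary(start) ? start : sentences.preceding(start);

        // Word boundaries decide where the passage may end.
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        words.setText(text);
        int snappedEnd = words.isBoundary(end) ? end : words.following(end);

        return new int[] {snappedStart, snappedEnd};
    }
}
```

Calling PassageTrimmer.snap(text, start, end) once per final passage keeps the 
boundary logic out of the highlighter's inner loop. In a real setup the two 
iterators would probably be created once and reused instead of being built on 
every call.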

As I said, all suggestions are welcome. Thanks in advance.

Jan Ulrich Robens
