You might be able to get something “good enough” with one of the pattern tokenizers; see https://lucene.apache.org/solr/guide/8_6/tokenizers.html.
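Here is a rough sketch of the pattern-tokenizer idea. It uses plain Java with java.util.regex rather than Lucene's own PatternTokenizer, so it runs without a Lucene dependency. The regex is a hypothetical choice: it keeps an identifier immediately followed by "()" as one token and otherwise falls back to runs of word characters. Emitting each regex match as a token is, as I understand it, closer to the pattern tokenizer's "group" mode than to its default split-on-pattern mode. Check the linked guide for the exact attributes before relying on that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PatternTokenSketch {
    // Hypothetical pattern: an identifier directly followed by "()" stays one
    // token; anything else falls back to runs of word characters.
    private static final Pattern TOKEN =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*\\(\\)|\\w+");

    // Emits every match of TOKEN as a token, in order of appearance.
    // Text between matches (spaces, punctuation) is discarded.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("the abort() function"));
        System.out.println(tokenize("the abort function"));
    }
}
```

Running it prints `[the, abort(), function]` and then `[the, abort, function]`. The same function ends up as two different terms depending on how it was written, so a search for one form won't find the other without some extra normalization.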
Won’t be 100%, of course. And Paul’s comments are well taken, especially since I’d guess your input will be inconsistent. How much do you want to bet that the same document will have “the abort() function” in one paragraph and “the abort function” in the next, with abort italicized?

Best,
Erick

> On Nov 23, 2020, at 2:42 AM, Trevor Nicholls <tre...@castingthevoid.com> wrote:
>
> the abort() function

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org