Re: Offset-Based Analysis

2023-02-21 Thread Mikhail Khludnev
Hello Luke. Using offsets seems really doubtful to me. What comes to mind is the pre-analyzed field type: https://solr.apache.org/guide/solr/latest/indexing-guide/external-files-processes.html#the-preanalyzedfield-type. That way, an external NLP service can provide ready-made tokens for straightforward indexing.
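As a rough illustration of the pre-analyzed approach, an external NLP service could emit field values in Solr's PreAnalyzedField JSON format, where each token carries its text ("t"), start and end character offsets ("s", "e") and position increment ("i"). The function name `preanalyzed_json` and the (term, start, end) tuple shape are hypothetical. This is only a sketch under those assumptions, not the service's actual output or a full implementation of the format (payloads, flags and types are omitted).

```python
import json

def preanalyzed_json(text, tokens):
    """Build a PreAnalyzedField value (JSON format, version "1").

    Hypothetical helper: `tokens` is a list of (term, start, end) tuples
    from an external NLP service; start/end are character offsets into
    `text`, so highlighting can map tokens back to the original string.
    """
    out = []
    for term, start, end in tokens:
        # Reject offsets that do not point inside the original text.
        if not (0 <= start <= end <= len(text)):
            raise ValueError(f"bad offsets for {term!r}: {start}-{end}")
        # "t" = term text, "s"/"e" = start/end offsets, "i" = position increment
        out.append({"t": term, "s": start, "e": end, "i": 1})
    # "v" = format version, "str" = stored value of the field
    return json.dumps({"v": "1", "str": text, "tokens": out}, ensure_ascii=False)

if __name__ == "__main__":
    text = "Lucene indexes text"
    toks = [("lucene", 0, 6), ("indexes", 7, 14), ("text", 15, 19)]
    print(preanalyzed_json(text, toks))
```

Keeping the offsets explicit in each token is what lets data from different NLP systems be tied back to positions in the original text without Lucene re-analyzing it.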

Offset-Based Analysis

2023-02-21 Thread Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD A)
Hi All, I am trying to enrich a Lucene-powered search index with data from various NLP systems that are distributed throughout my company. Ideally this internally-derived data could be tied back to specific positions of the original text. I’ve searched around and this is the closest t

RE: Highlighting query results, my method is too crude, but how to improve it?

2023-02-21 Thread Trevor Nicholls
Thank you David, very useful cheers T -Original Message- From: Dawid Weiss Sent: Tuesday, February 21, 2023 7:17 PM To: java-user@lucene.apache.org Subject: Re: Highlighting query results, my method is too crude, but how to improve it? You can use two different queries - the query is

Binning/Grouping large result sets efficiently

2023-02-21 Thread Matthias Mueller
Hi, I am still learning about the performance implications of Lucene's APIs when aggregating large result sets. It seems that some cases require a deeper understanding of Lucene's internals and the use of not-so-front-facing APIs. For some time I have been struggling with poor grouping/aggregation per