[I] UnifiedHighlighter incorrectly returns field 'X' was indexed without offsets [lucene]

via GitHub Wed, 14 Feb 2024 09:21:09 -0800


mayya-sharipova opened a new issue, #13103:
URL: https://github.com/apache/lucene/issues/13103


   ### Description
   
   UnifiedHighlighter with match-based highlighting (`withWeightMatches(true)`) incorrectly throws `field 'X' was indexed without offsets, cannot highlight`.
   
   Test to reproduce:
   ```java
       static final FieldType textType = new FieldType(TextField.TYPE_STORED);
       static {
           textType.setStoreTermVectors(true);
           textType.setStoreTermVectorPositions(true);
           textType.setStoreTermVectorOffsets(true);
           textType.freeze();
       }
   
       public void testHighlight() {
           String indexPath = "../lucene-test-indices/index1";
           Path path = Paths.get(indexPath);
           try {
               Directory directory = NIOFSDirectory.open(path);
               Analyzer analyzer = new ClassicAnalyzer();
               IndexWriterConfig config = new IndexWriterConfig(analyzer);
   
               try (IndexWriter writer = new IndexWriter(directory, config)) {
                   addDoc(writer, "The quick brown fox jumps over the lazy dog");
               }
   
               try (IndexReader reader = DirectoryReader.open(directory)) {
                   IndexSearcher searcher = new IndexSearcher(reader);
                   Query query = new IntervalQuery("content",
                           Intervals.analyzedText("quick brown fox jumps over the lazy dog", analyzer, "content", 0, true));
                   TopDocs topDocs = searcher.search(query, 10);
   
                   UnifiedHighlighter.Builder uhBuilder = new UnifiedHighlighter.Builder(searcher, analyzer)
                           .withWeightMatches(true);
                   UnifiedHighlighter highlighter = new UnifiedHighlighter(uhBuilder);
   
                   String[] highlights = highlighter.highlight("content", query, topDocs, 1);
                   System.out.println(Arrays.toString(highlights));
               }
           } catch (IOException e) {
               e.printStackTrace();
           }
       }
   
       private static void addDoc(IndexWriter writer, String content) throws IOException {
           Document doc = new Document();
           doc.add(new Field("content", content, textType));
           writer.addDocument(doc);
       }
   ```
   
   produces an error:
   ```
   java.lang.IllegalArgumentException: field 'content' was indexed without offsets, cannot highlight
   
        at org.apache.lucene.search.uhighlight.FieldHighlighter.highlightOffsetsEnums(FieldHighlighter.java:157)
        at org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:83)
        at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFieldsAsObjects(UnifiedHighlighter.java:944)
        at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:814)
        at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:792)
        at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlight(UnifiedHighlighter.java:725)
   ```
   
   A workaround is to disable highlighting based on matches:
   
   ```java
   UnifiedHighlighter.Builder uhBuilder = new UnifiedHighlighter.Builder(searcher, analyzer)
           .withWeightMatches(false);
   ```
   
   
   This happens because `ClassicAnalyzer` removes stop words and, as a result, `ExtendedIntervalsSource` is used, which returns -1 offsets.
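
To illustrate the stop-word part of this explanation, here is a minimal plain-Java sketch (it does not use Lucene) of how dropping a stop word leaves a position gap in the token stream. The names `StopWordGaps`, `Token`, and `analyze`, and the one-word stop set, are hypothetical and chosen only for illustration; they are not `ClassicAnalyzer`'s actual implementation or stop list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopWordGaps {
    // Hypothetical, deliberately tiny stop set (assumption for this sketch).
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the"));

    /** A surviving term plus its position increment relative to the previous surviving term. */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Lowercases, splits on whitespace, drops stop words, and records the gaps they leave. */
    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        int skipped = 0; // stop words dropped since the last emitted token
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (STOP_WORDS.contains(word)) {
                skipped++;
                continue;
            }
            // An increment greater than 1 means a stop word was removed just before this term.
            tokens.add(new Token(word, skipped + 1));
            skipped = 0;
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : analyze("quick brown fox jumps over the lazy dog")) {
            System.out.println(t.term + " +" + t.positionIncrement);
        }
        // "lazy" is printed as "lazy +2" because "the" was dropped before it.
    }
}
```

In this sketch every term has an increment of 1 except `lazy`, which gets 2. A gap of this kind in the analyzed query text is the condition the explanation above associates with `ExtendedIntervalsSource` and its -1 offsets.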
   
   ### Version and environment details
   
   Lucene v 9.9.1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

