GitHub user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/317#discussion_r165375993
  
    --- Diff: lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/Passage.java ---
    @@ -60,42 +65,35 @@ public void addMatch(int startOffset, int endOffset, BytesRef term) {
         matchEnds[numMatches] = endOffset;
         matchTerms[numMatches] = term;
         numMatches++;
    -  }
    -
    -  /** @lucene.internal */
    -  public void sort() {
    -    final int starts[] = matchStarts;
    -    final int ends[] = matchEnds;
    -    final BytesRef terms[] = matchTerms;
    -    new InPlaceMergeSorter() {
    -      @Override
    -      protected void swap(int i, int j) {
    -        int temp = starts[i];
    -        starts[i] = starts[j];
    -        starts[j] = temp;
    -
    -        temp = ends[i];
    -        ends[i] = ends[j];
    -        ends[j] = temp;
    -
    -        BytesRef tempTerm = terms[i];
    -        terms[i] = terms[j];
    -        terms[j] = tempTerm;
    -      }
    -
    -      @Override
    -      protected int compare(int i, int j) {
    -        return Integer.compare(starts[i], starts[j]);
    -      }
    -
    -    }.sort(0, numMatches);
    +    int termIndex = termsHash.add(term);
    --- End diff --
    
    Perhaps we could push as much of this as possible into
    `setScore(scorer, contentLength)`?  Imagine that the only extra field we
    add to Passage here is `termFreqsInDoc[]`, sized along with the other
    arrays (no BytesRefHash, no termFreqsInPassage).  Instead, setScore is
    where the distinct set of terms and termFreqsInPassage get computed,
    purely for the score computation and within the scope of that method.
    That keeps the score-related stuff as isolated as possible.  And IMO, as
    I stated in JIRA, we could add a method to PassageScorer to do all of
    that there.  Then a simple PassageScorer that only cared about
    first occurrence would simply use the inverse of startOffset.  Or we
    could leave that PassageScorer modification to a separate issue if you
    prefer.
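
    To illustrate the shape of this suggestion, here is a minimal,
    self-contained sketch.  It is not the actual Lucene code:
    `PassageSketch` and its nested `Scorer` interface are hypothetical
    stand-ins, terms are plain `String`s rather than `BytesRef`, and the
    `weight`/`tf` hooks are assumed purely for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for Passage (String terms, not BytesRef). */
class PassageSketch {

    /** Hypothetical scoring hooks; assumed for this sketch, not the real PassageScorer. */
    public interface Scorer {
        float weight(int contentLength, int termFreqInDoc);
        float tf(int freqInPassage, int passageLength);
    }

    private int startOffset;
    private int endOffset;
    private float score;

    private int[] matchStarts = new int[8];
    private int[] matchEnds = new int[8];
    private String[] matchTerms = new String[8];
    // The only extra per-match field: the term's frequency in the whole document.
    private int[] matchTermFreqsInDoc = new int[8];
    private int numMatches;

    public void setOffsets(int startOffset, int endOffset) {
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    public void addMatch(int start, int end, String term, int termFreqInDoc) {
        if (numMatches == matchStarts.length) {
            // Grow all parallel arrays together so they stay the same size.
            int newLength = numMatches * 2;
            matchStarts = Arrays.copyOf(matchStarts, newLength);
            matchEnds = Arrays.copyOf(matchEnds, newLength);
            matchTerms = Arrays.copyOf(matchTerms, newLength);
            matchTermFreqsInDoc = Arrays.copyOf(matchTermFreqsInDoc, newLength);
        }
        matchStarts[numMatches] = start;
        matchEnds[numMatches] = end;
        matchTerms[numMatches] = term;
        matchTermFreqsInDoc[numMatches] = termFreqInDoc;
        numMatches++;
    }

    /** Distinct terms and in-passage frequencies live only inside this method. */
    public void setScore(Scorer scorer, int contentLength) {
        // term -> { frequency in passage, frequency in document }
        Map<String, int[]> distinct = new LinkedHashMap<>();
        for (int i = 0; i < numMatches; i++) {
            int[] freqs = distinct.computeIfAbsent(matchTerms[i], t -> new int[2]);
            freqs[0]++;
            freqs[1] = matchTermFreqsInDoc[i];
        }
        int passageLength = endOffset - startOffset;
        float sum = 0f;
        for (int[] freqs : distinct.values()) {
            sum += scorer.weight(contentLength, freqs[1]) * scorer.tf(freqs[0], passageLength);
        }
        this.score = sum;
    }

    public float getScore() {
        return score;
    }

    public int getNumMatches() {
        return numMatches;
    }
}
```

    The point of the sketch is that `addMatch` only records data, while
    everything score-specific (the distinct-term map and the per-passage
    frequencies) is a local in `setScore` and leaves no extra state on the
    passage.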

