Re: [PR] Add support for index sorting with document blocks [lucene]

via GitHub Tue, 12 Dec 2023 00:16:49 -0800


s1monw commented on code in PR #12829:
URL: https://github.com/apache/lucene/pull/12829#discussion_r1423606084



##########
lucene/core/src/java/org/apache/lucene/index/IndexingChain.java:
##########
@@ -219,15 +222,33 @@ private Sorter.DocMap maybeSortSegment(SegmentWriteState state) throws IOExcepti
     }
 
     LeafReader docValuesReader = getDocValuesLeafReader();
-
+    Function<IndexSorter.DocComparator, IndexSorter.DocComparator> comparatorWrapper = in -> in;
+
+    if (state.segmentInfo.getHasBlocks() && indexSort.getParentField() != null) {
+      final DocIdSetIterator readerValues =
+          docValuesReader.getNumericDocValues(indexSort.getParentField());
+      BitSet parents = BitSet.of(readerValues, state.segmentInfo.maxDoc());
+      comparatorWrapper =
+          in ->
+              (docID1, docID2) ->
+                  in.compare(parents.nextSetBit(docID1), parents.nextSetBit(docID2));
+    }
+    assert state.segmentInfo.getHasBlocks() == false
+            || indexSort.getParentField() != null
+            || indexCreatedVersionMajor < Version.LUCENE_10_0_0.major
+        : "parent field is not set but the index has blocks. indexCreatedVersionMajor: "
+            + indexCreatedVersionMajor;
     List<IndexSorter.DocComparator> comparators = new ArrayList<>();
     for (int i = 0; i < indexSort.getSort().length; i++) {
       SortField sortField = indexSort.getSort()[i];
       IndexSorter sorter = sortField.getIndexSorter();
       if (sorter == null) {
         throw new UnsupportedOperationException("Cannot sort index using sort field " + sortField);
       }
-      comparators.add(sorter.getDocComparator(docValuesReader, state.segmentInfo.maxDoc()));
+
+      IndexSorter.DocComparator docComparator =

Review Comment:
   @msokolov This is basically what I had in my first version of this. There
are a couple of issues with it:
   
    - We can't execute arbitrary queries as a sort supplier, since the
data structures inside DWPT don't support this.
    - In fact, we can only really access doc values (DV) in such a fashion.
With a non-trivial amount of work we could likely walk a postings list, but
executing a query, i.e. having an IndexReader on top of DWPT, would be a lot
of work.
    - A custom comparator also needs a field, a type, etc. That is more to
configure and store in the index than just a field name whose type and content
IW fully controls.
    
   I am still under the impression that you think this change dictates the type
and name of a parent field for the application that uses Lucene. It doesn't. You
can think of this as a purely internal field. You don't have to use it for your
application or to model your block structure. It only marks the end of the block
so that the sort doesn't break it. It's basically an index-level guarantee that
backs the API guarantee we provide.
   We do not need to model sub-blocks here, since the order of the docs within
a block must not be changed, not even by a sort, right? If a block needs to be
sorted, then only within the block, and that can / should happen before it's
passed to the IW.
   I hope this makes sense?
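
   To illustrate the point about the parent field only marking the end of a
block: below is a minimal, self-contained sketch of the idea behind the
comparator wrapper in the diff above. It is not the Lucene code. It assumes
java.util.BitSet as a stand-in for Lucene's BitSet, a plain long[] as a
stand-in for the doc values of the sort field, and a hypothetical helper
name (sortKeepingBlocks). Every doc is compared through the parent doc that
closes its block, so all docs of a block compare as equal and a stable sort
keeps them together in their original order.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

public class BlockSortSketch {

  /**
   * Hypothetical helper: returns doc IDs ordered by the sort value of the
   * parent that closes each doc's block. Assumes the last doc is a parent,
   * so nextSetBit never returns -1.
   */
  static List<Integer> sortKeepingBlocks(long[] sortValues, BitSet parents) {
    List<Integer> docs = new ArrayList<>();
    for (int doc = 0; doc < sortValues.length; doc++) {
      docs.add(doc);
    }
    // Resolve each doc to its parent (next set bit at or after the doc),
    // then compare the parents' sort values only.
    Comparator<Integer> byParent =
        Comparator.comparingLong(doc -> sortValues[parents.nextSetBit(doc)]);
    // List.sort is stable: docs with the same parent keep their order.
    docs.sort(byParent);
    return docs;
  }

  public static void main(String[] args) {
    // Two blocks: docs 0..2 (parent is doc 2) and docs 3..4 (parent is doc 4).
    long[] sortValues = {9, 1, 5, 7, 3};
    BitSet parents = new BitSet();
    parents.set(2);
    parents.set(4);
    // Parent 4 has value 3, parent 2 has value 5, so the second block
    // moves in front of the first one as a whole.
    System.out.println(sortKeepingBlocks(sortValues, parents)); // [3, 4, 0, 1, 2]
  }
}
```

   Sorting the same docs by their own values would give 1, 4, 2, 3, 0 and
tear both blocks apart, which is what the parent lookup is there to prevent.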



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


