s1monw commented on code in PR #12829:
URL: https://github.com/apache/lucene/pull/12829#discussion_r1423606084
##########
lucene/core/src/java/org/apache/lucene/index/IndexingChain.java:
##########
@@ -219,15 +222,33 @@ private Sorter.DocMap maybeSortSegment(SegmentWriteState
state) throws IOExcepti
}
LeafReader docValuesReader = getDocValuesLeafReader();
-
+ Function<IndexSorter.DocComparator, IndexSorter.DocComparator> comparatorWrapper = in -> in;
+
+ if (state.segmentInfo.getHasBlocks() && indexSort.getParentField() != null) {
+ final DocIdSetIterator readerValues =
+ docValuesReader.getNumericDocValues(indexSort.getParentField());
+ BitSet parents = BitSet.of(readerValues, state.segmentInfo.maxDoc());
+ comparatorWrapper =
+ in ->
+ (docID1, docID2) ->
+ in.compare(parents.nextSetBit(docID1), parents.nextSetBit(docID2));
+ }
+ assert state.segmentInfo.getHasBlocks() == false
+ || indexSort.getParentField() != null
+ || indexCreatedVersionMajor < Version.LUCENE_10_0_0.major
+ : "parent field is not set but the index has blocks. indexCreatedVersionMajor: "
+ + indexCreatedVersionMajor;
List<IndexSorter.DocComparator> comparators = new ArrayList<>();
for (int i = 0; i < indexSort.getSort().length; i++) {
SortField sortField = indexSort.getSort()[i];
IndexSorter sorter = sortField.getIndexSorter();
if (sorter == null) {
throw new UnsupportedOperationException("Cannot sort index using sort field " + sortField);
}
- comparators.add(sorter.getDocComparator(docValuesReader, state.segmentInfo.maxDoc()));
+
+ IndexSorter.DocComparator docComparator =
Review Comment:
@msokolov This is basically what I had in my first version of this. There
are a couple of issues with it:
- we can't execute arbitrary queries as a sort supplier, since the
data structures inside DWPT don't support this
- in fact, we can only really access doc values (DV) in such a fashion. With a
non-trivial amount of work we would likely be able to walk a postings list, but
executing a query, i.e. having an IndexReader on top of DWPT, would be a lot of
work.
- a custom comparator also needs a field, a type, etc. That is more to
configure and store in the index than just a field name whose type and content
IW fully controls.
I am still under the impression that you think this change dictates a type
and name of a parent field for the application that uses Lucene. It doesn't. You
can think of this as a purely internal field. You don't have to use it for your
application or to model your block structure. It only marks the end of the block
so that the sort doesn't break it. It's basically an index-level guarantee that
backs the API guarantee we provide.
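
To illustrate why marking only the last doc of each block is enough, here is a
minimal standalone sketch. It uses `java.util.BitSet` instead of Lucene's own
`BitSet`, and the class name `BlockSortSketch` and the method
`sortKeepingBlocks` are hypothetical, not part of the PR. Every doc is compared
through the sort value of the next parent at or after it, so all docs of a block
compare as equal and a stable sort keeps them together in their original order.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Comparator;

class BlockSortSketch {

  /**
   * Returns doc IDs in sorted order. sortValues[i] is the sort key of doc i;
   * parents has a bit set for the last doc of every block.
   */
  static int[] sortKeepingBlocks(long[] sortValues, BitSet parents) {
    Integer[] docs = new Integer[sortValues.length];
    for (int i = 0; i < docs.length; i++) {
      docs[i] = i;
    }
    // Compare the parent of each doc rather than the doc itself, mirroring
    // in.compare(parents.nextSetBit(docID1), parents.nextSetBit(docID2)).
    Comparator<Integer> byParent =
        Comparator.comparingLong((Integer doc) -> sortValues[parents.nextSetBit(doc)]);
    // Arrays.sort on an object array is stable, so docs that share a parent
    // (and blocks whose parents tie) keep their original relative order.
    Arrays.sort(docs, byParent);
    return Arrays.stream(docs).mapToInt(Integer::intValue).toArray();
  }

  public static void main(String[] args) {
    // Three blocks: docs 0-2 (parent 2), docs 3-4 (parent 4), doc 5 (parent 5).
    BitSet parents = new BitSet(6);
    parents.set(2);
    parents.set(4);
    parents.set(5);
    // Child values (docs 0, 1, 3) are deliberately "wrong"; they are never read.
    long[] sortValues = {1, 99, 30, 50, 10, 20};
    System.out.println(Arrays.toString(sortKeepingBlocks(sortValues, parents)));
    // prints [3, 4, 5, 0, 1, 2]
  }
}
```

The child docs never contribute a sort key of their own in this sketch, which is
the point: the application is free to model its blocks however it likes, and the
marker only tells the sort where each block ends.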
We do not need to model sub-blocks here, since the order of the docs in a block
must not be changed, not even by a sort? If sorting is needed, then only within
the block, and that can / should happen before the block is passed to the IW?
I hope this makes sense?