Hi John, I did an initial test before, but I didn’t read the code at the time. Today, I spent time reviewing 0001. Overall, I believe the implementation is solid. I just got a few comments/suggestions.
> On Nov 26, 2025, at 21:11, John Naylor <[email protected]> wrote: > > > For v5 I've also added CHECK_FOR_INTERRUPTS and rewrote some comments. > > > -- > John Naylor > Amazon Web Services > <v5-0001-Use-radix-sort-when-SortTuple-contains-a-pass-by-.patch><v5-0004-Detect-common-prefix-to-avoid-wasted-work-during-.patch><v5-0003-WIP-make-some-regression-tests-sort-order-more-de.patch><v5-0002-WIP-Adjust-regression-tests.patch> 1 - 0001 ``` + /* extract the byte for this level from the normalized datum */ + current_byte = extract_byte(normalize_datum(tup->datum1, ssup), + level); + + /* save it for the permutation step */ + tup->current_byte = current_byte; ``` We recompute normalize_datum(tup->datum1, ssup) for every tuple in every level, why don’t cache the result in SortTuple. As we have cached current_byte in SortTuple, it shouldn’t be a big deal to add one more field to it. 2 - 0001 ``` + while (num_remaining > 1) + { + /* start the count over */ + num_remaining = num_partitions; ``` The swap loop always start the count over, so that sorted partitions are re-scanned as well. I think we can do an optimization like: ``` num_active = num_partitions; while (num_active > 1) { for (int i = 0; i < num_active; ) { uint8 idx = remaining_partitions[i]; // do the swaps for the partition … if (partitions[idx].offset == partitions[idx].next_offset) { remaining_partitions[i] = remaining_partitions[num_active - 1]; num_active--; } else i++; } } ``` This way we move out sorted partitions, so they will not be re-scanned. 3 - 0001 In sort_byvalue_datum, we can add a fast-path for all NULL and all NOT NULL cases, so that they won’t need to run the branchless cyclic permutation and the two “for” loops of assertions. Something like: ``` while (d1 < state->memtupcount && data[d1].isnull1 == nulls_first) d1++; null_count = d1; not_null_count = state->memtupcount - d1; /* fast paths: all on one side */ if (null_count == 0 || not_null_count == 0) { if (nulls_first) { null_start = data; not_null_start = data + null_count; } else { not_null_start = data; null_start = data + not_null_count; } /* only one partition is non-empty; sort it and return */ if (not_null_count > 1) { if (not_null_count < QSORT_THRESHOLD) qsort_tuple(not_null_start, not_null_count, state->base.comparetup, state); else radix_sort_tuple(not_null_start, not_null_count, 0, state); } else if (null_count > 1 && state->base.onlyKey == NULL) { qsort_tuple(null_start, null_count, state->base.comparetup_tiebreak, state); } return; } /* ... existing branchless cyclic permutation ... */ ``` Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
