Hi John,

I ran an initial test earlier, but I didn’t read the code at the time. Today I
spent some time reviewing 0001. Overall, I believe the implementation is solid;
I just have a few comments/suggestions.

> On Nov 26, 2025, at 21:11, John Naylor <[email protected]> wrote:
> 
> 
> For v5 I've also added CHECK_FOR_INTERRUPTS and rewrote some comments.
> 
> 
> --
> John Naylor
> Amazon Web Services
> <v5-0001-Use-radix-sort-when-SortTuple-contains-a-pass-by-.patch><v5-0004-Detect-common-prefix-to-avoid-wasted-work-during-.patch><v5-0003-WIP-make-some-regression-tests-sort-order-more-de.patch><v5-0002-WIP-Adjust-regression-tests.patch>


1 - 0001
```
+               /* extract the byte for this level from the normalized datum */
+               current_byte = extract_byte(normalize_datum(tup->datum1, ssup),
+                                                                       level);
+
+               /* save it for the permutation step */
+               tup->current_byte = current_byte;
```

We recompute normalize_datum(tup->datum1, ssup) for every tuple at every level.
Why not cache the result in SortTuple? Since we already cache current_byte
there, adding one more field shouldn’t be a big deal.

2 - 0001
```
+       while (num_remaining > 1)
+       {
+               /* start the count over */
+               num_remaining = num_partitions;
```

The swap loop always starts the count over, so partitions that are already
fully placed get re-scanned as well. I think we can optimize it like this:

```
num_active = num_partitions;
while (num_active > 1)
{
    for (int i = 0; i < num_active; )
    {
        uint8 idx = remaining_partitions[i];
        /* do the swaps for the partition … */

        if (partitions[idx].offset == partitions[idx].next_offset)
        {
            remaining_partitions[i] = remaining_partitions[num_active - 1];
            num_active--;
        }
        else
            i++;
    }
}
```

This way, finished partitions are moved out of the active set, so they are not
re-scanned on later passes.

3 - 0001

In sort_byvalue_datum, we could add a fast path for the all-NULL and
all-NOT-NULL cases, so they don’t need to run the branchless cyclic
permutation or the two assertion “for” loops. Something like:

```
while (d1 < state->memtupcount && data[d1].isnull1 == nulls_first)
    d1++;

null_count = d1;
not_null_count = state->memtupcount - d1;

/* fast paths: all on one side */
if (null_count == 0 || not_null_count == 0)
{
    if (nulls_first)
    {
        null_start = data;
        not_null_start = data + null_count;
    }
    else
    {
        not_null_start = data;
        null_start = data + not_null_count;
    }

    /* only one partition is non-empty; sort it and return */
    if (not_null_count > 1)
    {
        if (not_null_count < QSORT_THRESHOLD)
            qsort_tuple(not_null_start, not_null_count, state->base.comparetup, state);
        else
            radix_sort_tuple(not_null_start, not_null_count, 0, state);
    }
    else if (null_count > 1 && state->base.onlyKey == NULL)
    {
        qsort_tuple(null_start, null_count, state->base.comparetup_tiebreak, state);
    }
    return;
}

/* ... existing branchless cyclic permutation ... */

```

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
