Hi!

On Mon, Nov 20, 2017 at 12:24 AM, Thomas Munro <
thomas.mu...@enterprisedb.com> wrote:

> On Wed, Nov 15, 2017 at 7:42 AM, Alexander Korotkov
> <a.korot...@postgrespro.ru> wrote:
> > Sure, please find rebased patch attached.
>
> + /*
> +  * Check if first "skipCols" sort values are equal.
> +  */
> + static bool
> + cmpSortSkipCols(IncrementalSortState *node, TupleTableSlot *a,
> +
>  TupleTableSlot *b)
> + {
> +     int n, i;
> +
> +     Assert(IsA(node->ss.ps.plan, IncrementalSort));
> +
> +     n = ((IncrementalSort *) node->ss.ps.plan)->skipCols;
> +
> +     for (i = 0; i < n; i++)
> +     {
> +         Datum datumA, datumB, result;
> +         bool isnullA, isnullB;
> +         AttrNumber attno = node->skipKeys[i].attno;
> +         SkipKeyData *key;
> +
> +         datumA = slot_getattr(a, attno, &isnullA);
> +         datumB = slot_getattr(b, attno, &isnullB);
> +
> +         /* Special case for NULL-vs-NULL, else use standard comparison */
> +         if (isnullA || isnullB)
> +         {
> +             if (isnullA == isnullB)
> +                 continue;
> +             else
> +                 return false;
> +         }
> +
> +         key = &node->skipKeys[i];
> +
> +         key->fcinfo.arg[0] = datumA;
> +         key->fcinfo.arg[1] = datumB;
> +
> +         /* just for paranoia's sake, we reset isnull each time */
> +         key->fcinfo.isnull = false;
> +
> +         result = FunctionCallInvoke(&key->fcinfo);
> +
> +         /* Check for null result, since caller is clearly not expecting
> one */
> +         if (key->fcinfo.isnull)
> +             elog(ERROR, "function %u returned NULL", key->flinfo.fn_oid);
> +
> +         if (!DatumGetBool(result))
> +             return false;
> +     }
> +     return true;
> + }
>
> Is there some reason not to use ApplySortComparator for this?  I think
> you're missing out on lower-overhead comparators, and in any case it's
> probably good code reuse, no?
>

However, for incremental sort case we don't need to know here whether A > B
or B > A.  It's enough for us to know if A = B or A != B.  In some cases
it's way cheaper.  For instance, for texts equality check is basically
memcmp while comparison may use collation.

Embarrassingly, I was unaware of this patch and started prototyping
> exactly the same thing independently[1].  I hadn't got very far and
> will now abandon that, but that's one thing I did differently.  Two
> other things that may be different: I had a special case for groups of
> size 1 that skipped the sorting, and I only sorted on the suffix
> because I didn't put tuples with different prefixes into the sorter (I
> was assuming that tuplesort_reset was going to be super efficient,
> though I hadn't got around to writing that).  I gather that you have
> determined empirically that it's better to be able to sort groups of
> at least MIN_GROUP_SIZE than to be able to skip the comparisons on the
> leading attributes, but why is that the case?
>

Right.  The issue that not only case of one tuple per group could cause
overhead, but few tuples (like 2 or 3) is also case of overhead.  Also,
overhead is related not only to sorting.  While investigate of regression
case provided by Heikki [1], I've seen extra time spent mostly in extra
copying of sample tuple and comparison with that.  In order to cope this
overhead I've introduced MIN_GROUP_SIZE which allows to skip copying sample
tuples too frequently.

[1]
https://www.postgresql.org/message-id/2c59b009-61d3-9350-04ee-4b701eb93...@iki.fi

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Reply via email to