Hi! On Mon, Nov 20, 2017 at 12:24 AM, Thomas Munro < thomas.mu...@enterprisedb.com> wrote:
> On Wed, Nov 15, 2017 at 7:42 AM, Alexander Korotkov > <a.korot...@postgrespro.ru> wrote: > > Sure, please find rebased patch attached. > > + /* > + * Check if first "skipCols" sort values are equal. > + */ > + static bool > + cmpSortSkipCols(IncrementalSortState *node, TupleTableSlot *a, > + > TupleTableSlot *b) > + { > + int n, i; > + > + Assert(IsA(node->ss.ps.plan, IncrementalSort)); > + > + n = ((IncrementalSort *) node->ss.ps.plan)->skipCols; > + > + for (i = 0; i < n; i++) > + { > + Datum datumA, datumB, result; > + bool isnullA, isnullB; > + AttrNumber attno = node->skipKeys[i].attno; > + SkipKeyData *key; > + > + datumA = slot_getattr(a, attno, &isnullA); > + datumB = slot_getattr(b, attno, &isnullB); > + > + /* Special case for NULL-vs-NULL, else use standard comparison */ > + if (isnullA || isnullB) > + { > + if (isnullA == isnullB) > + continue; > + else > + return false; > + } > + > + key = &node->skipKeys[i]; > + > + key->fcinfo.arg[0] = datumA; > + key->fcinfo.arg[1] = datumB; > + > + /* just for paranoia's sake, we reset isnull each time */ > + key->fcinfo.isnull = false; > + > + result = FunctionCallInvoke(&key->fcinfo); > + > + /* Check for null result, since caller is clearly not expecting > one */ > + if (key->fcinfo.isnull) > + elog(ERROR, "function %u returned NULL", key->flinfo.fn_oid); > + > + if (!DatumGetBool(result)) > + return false; > + } > + return true; > + } > > Is there some reason not to use ApplySortComparator for this? I think > you're missing out on lower-overhead comparators, and in any case it's > probably good code reuse, no? > However, for incremental sort case we don't need to know here whether A > B or B > A. It's enough for us to know if A = B or A != B. In some cases it's way cheaper. For instance, for texts equality check is basically memcmp while comparison may use collation. Embarrassingly, I was unaware of this patch and started prototyping > exactly the same thing independently[1]. I hadn't got very far and > will now abandon that, but that's one thing I did differently. Two > other things that may be different: I had a special case for groups of > size 1 that skipped the sorting, and I only sorted on the suffix > because I didn't put tuples with different prefixes into the sorter (I > was assuming that tuplesort_reset was going to be super efficient, > though I hadn't got around to writing that). I gather that you have > determined empirically that it's better to be able to sort groups of > at least MIN_GROUP_SIZE than to be able to skip the comparisons on the > leading attributes, but why is that the case? > Right. The issue that not only case of one tuple per group could cause overhead, but few tuples (like 2 or 3) is also case of overhead. Also, overhead is related not only to sorting. While investigate of regression case provided by Heikki [1], I've seen extra time spent mostly in extra copying of sample tuple and comparison with that. In order to cope this overhead I've introduced MIN_GROUP_SIZE which allows to skip copying sample tuples too frequently. [1] https://www.postgresql.org/message-id/2c59b009-61d3-9350-04ee-4b701eb93...@iki.fi ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company