On Mon, Sep 29, 2014 at 10:34 PM, Peter Geoghegan <p...@heroku.com> wrote: > <single sortsupport state patch>.
You probably noticed that I posted an independently useful patch to make all tuplesort cases use sortsupport [1] - currently, both the B-Tree and CLUSTER cases do not use the sortsupport infrastructure more or less for no good reason. That patch was primarily written to make abbreviated keys work for all cases, though. I think that making heap tuple sorts based on a text attribute much faster is a very nice thing, but in practice most OLTP or web applications are not all that sensitive to the amount of time taken to sort heap tuples. However, the length of time it takes to build indexes is something that most busy production applications are very sensitive to, since of course apart from the large consumption of system resources often required, however long the sort takes is virtually the same amount of time as however long we hold a very heavy, disruptive relation-level ShareLock. Obviously the same applies to CLUSTER, but more so, since it must acquire an AccessExclusiveLock on the relation to be reorganized. I think almost everyone will agree that making B-Tree builds much faster is the really compelling case here, because that's where slow sorts cause by far the most pain for users in the real world. Attached patch, when applied, accelerates all tuplesort cases using abbreviated keys, building on previous work here, as well as the patch posted to that other thread. Exact instructions are in the commit message of 0004-* (i.e. where to find the pieces I haven't posted here). I also attach a minor bitrot fix commit/patch. Performance is improved for B-Tree index builds by a great deal, too. The improvements are only slightly less than those seen for comparable heap tuple sorts (that is, my earlier test cases that had client overhead removed). With larger sorts, that difference tends to get lost in the noise easily. I'm very encouraged by this. I think that being able to build B-Tree indexes on text attributes very significantly faster than previous versions of PostgreSQL is likely to be a significant feature for PostgreSQL 9.5. After all, external sorts are where improvements are most noticeable [2] - they're so much faster with this patch that they're actually sometimes faster than internal sorts *with* abbreviated keys. This would something that I found quite surprising. [1] http://www.postgresql.org/message-id/CAM3SWZTfKZHTUiWDdHg+6tcYuMsdHoC=bmuaivgmp9athj4...@mail.gmail.com [2] http://www.postgresql.org/message-id/cam3swzqvjcgme6ube-ydipu0n5bo7rmz31zrhmskdduynej...@mail.gmail.com -- Peter Geoghegan
From 72070126e707cf356ddcb3de60cbdb14658ed077 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan <p...@heroku.com> Date: Fri, 10 Oct 2014 22:26:21 -0700 Subject: [PATCH 4/4] Make B-Tree/CLUSTER sortsupport use abbreviation This commit is intended to be applied on top of several others. First, apply the abbreviated key support patch -- the revision posted here: cam3swzt+v3llrbscyuj9jovumdpbq6bupf9wofbfuywgwuh...@mail.gmail.com. Then, apply the bitrot-fixing commit that was attached alongside this patch. In addition, the B-Tree/CLUSTER sortsupport patch must then be applied. Get it from: CAM3SWZTfKZHTUiWDdHg+6tcYuMsdHoC=bmuaivgmp9athj4...@mail.gmail.com Finally, apply this patch. B-Tree sortsupport with abbreviated keys (for text) shows benefits (and costs) that are similar to the heavily tested/benchmarked MinimalTuple case. There is additional overhead when building a B-Tree index as compared to heap tuplesorts (with no client overhead), but even still the vast majority of system time is spent actually sorting index tuples. Therefore, it is not expected that the addition of B-Tree/CLUSTER support will influence the review of abbreviated keys much either way. --- src/backend/access/nbtree/nbtsort.c | 3 + src/backend/utils/sort/tuplesort.c | 385 +++++++++++++++++++++++++++--------- 2 files changed, 298 insertions(+), 90 deletions(-) diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c index a6c44a7..1ccc9fb 100644 --- a/src/backend/access/nbtree/nbtsort.c +++ b/src/backend/access/nbtree/nbtsort.c @@ -720,6 +720,9 @@ _bt_load(BTWriteState *wstate, BTSpool *btspool, BTSpool *btspool2) strategy = (scanKey->sk_flags & SK_BT_DESC) != 0 ? BTGreaterStrategyNumber : BTLessStrategyNumber; + /* Abbreviation is not supported here */ + sortKey->abbrev_state = ABBREVIATED_KEYS_NO; + PrepareSortSupportFromIndexRel(wstate->index, strategy, sortKey); } diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c index 1d57de7..c6a18dc 100644 --- a/src/backend/utils/sort/tuplesort.c +++ b/src/backend/utils/sort/tuplesort.c @@ -453,6 +453,7 @@ struct Tuplesortstate static Tuplesortstate *tuplesort_begin_common(int workMem, bool randomAccess); static void puttuple_common(Tuplesortstate *state, SortTuple *tuple); +static bool consider_abort_common(Tuplesortstate *state); static void inittapes(Tuplesortstate *state); static void selectnewtape(Tuplesortstate *state); static void mergeruns(Tuplesortstate *state); @@ -709,6 +710,8 @@ tuplesort_begin_cluster(TupleDesc tupDesc, state->copytup = copytup_cluster; state->writetup = writetup_cluster; state->readtup = readtup_cluster; + state->abbrevNext = 10; + state->abbrevRowsHint = -1; state->indexInfo = BuildIndexInfo(indexRel); @@ -754,6 +757,15 @@ tuplesort_begin_cluster(TupleDesc tupDesc, strategy = (scanKey->sk_flags & SK_BT_DESC) != 0 ? BTGreaterStrategyNumber : BTLessStrategyNumber; + /* + * Convey to sortsupport routine if abbreviation optimization is + * applicable in principle + */ + if (i == 0) + sortKey->abbrev_state = ABBREVIATED_KEYS_YES; + else + sortKey->abbrev_state = ABBREVIATED_KEYS_NO; + PrepareSortSupportFromIndexRel(indexRel, strategy, sortKey); } @@ -797,6 +809,8 @@ tuplesort_begin_index_btree(Relation heapRel, state->copytup = copytup_index; state->writetup = writetup_index; state->readtup = readtup_index; + state->abbrevNext = 10; + state->abbrevRowsHint = -1; state->heapRel = heapRel; state->indexRel = indexRel; @@ -825,6 +839,15 @@ tuplesort_begin_index_btree(Relation heapRel, strategy = (scanKey->sk_flags & SK_BT_DESC) != 0 ? BTGreaterStrategyNumber : BTLessStrategyNumber; + /* + * Convey to sortsupport routine if abbreviation optimization is + * applicable in principle + */ + if (i == 0) + sortKey->abbrev_state = ABBREVIATED_KEYS_YES; + else + sortKey->abbrev_state = ABBREVIATED_KEYS_NO; + PrepareSortSupportFromIndexRel(indexRel, strategy, sortKey); } @@ -1234,15 +1257,62 @@ tuplesort_putindextuplevalues(Tuplesortstate *state, Relation rel, { MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext); SortTuple stup; + Datum original; + IndexTuple tuple; stup.tuple = index_form_tuple(RelationGetDescr(rel), values, isnull); - ((IndexTuple) stup.tuple)->t_tid = *self; + tuple = ((IndexTuple) stup.tuple); + tuple->t_tid = *self; USEMEM(state, GetMemoryChunkSpace(stup.tuple)); /* set up first-column key value */ - stup.datum1 = index_getattr((IndexTuple) stup.tuple, - 1, - RelationGetDescr(state->indexRel), - &stup.isnull1); + original = index_getattr(tuple, + 1, + RelationGetDescr(state->indexRel), + &stup.isnull1); + + if (!state->sortKeys->abbrev_converter || stup.isnull1) + { + /* + * Store ordinary Datum representation. + * + * Converter won't expect NULL value, and cost model is not required to + * account for NULL, so only call converter for non-NULL values. + */ + stup.datum1 = original; + } + else if (!consider_abort_common(state)) + { + /* Store abbreviated key representation */ + stup.datum1 = state->sortKeys->abbrev_converter(original, + state->sortKeys); + } + else + { + /* Abort abbreviation */ + int i; + + stup.datum1 = original; + + /* + * Set state to be consistent with never trying abbreviation. + * + * Alter datum1 representation in already-copied tuples, so as to + * ensure a consistent representation (current tuple was just handled). + * Note that we rely on all tuples copied so far actually being + * contained within memtuples array. + */ + for (i = 0; i < state->memtupcount; i++) + { + SortTuple *mtup = &state->memtuples[i]; + + tuple = mtup->tuple; + mtup->datum1 = index_getattr(tuple, + 1, + RelationGetDescr(state->indexRel), + &stup.isnull1); + } + } + puttuple_common(state, &stup); MemoryContextSwitchTo(oldcontext); @@ -1407,6 +1477,46 @@ puttuple_common(Tuplesortstate *state, SortTuple *tuple) } } +static bool +consider_abort_common(Tuplesortstate *state) +{ + Assert(state->sortKeys[0].abbrev_converter != NULL); + Assert(state->sortKeys[0].auth_comparator != NULL); + + /* + * Check effectiveness of abbreviation optimization. Consider aborting + * when still within memory limit. + */ + if (state->status == TSS_INITIAL && + state->memtupcount >= state->abbrevNext) + { + state->abbrevNext *= 2; + + /* + * Check opclass-supplied abbreviation abort routine. It may + * indicate that abbreviation should not proceed. + */ + if (!state->sortKeys->abbrev_abort(state->memtupcount, + state->abbrevRowsHint, + state->sortKeys)) + return false; + + /* + * Finally, restore authoritative comparator, and indicate that + * abbreviation is not in play by setting abbrev_converter to NULL + */ + state->sortKeys[0].comparator = state->sortKeys[0].auth_comparator; + state->sortKeys[0].abbrev_converter = NULL; + /* Not strictly necessary, but be tidy */ + state->sortKeys[0].auth_comparator = NULL; + + /* Give up - expect original pass-by-value representation */ + return true; + } + + return false; +} + /* * All tuples have been provided; finish the sort. */ @@ -2978,74 +3088,50 @@ copytup_heap(Tuplesortstate *state, SortTuple *stup, void *tup) state->tupDesc, &stup->isnull1); - if (!state->sortKeys->abbrev_converter) - { - /* Store ordinary Datum representation */ - stup->datum1 = original; - } - else + if (!state->sortKeys->abbrev_converter || stup->isnull1) { /* - * Store abbreviated key representation. + * Store ordinary Datum representation. * * Converter won't expect NULL value, and cost model is not required to * account for NULL, so only call converter for non-NULL values. */ - if (stup->isnull1) - stup->datum1 = original; - else - stup->datum1 = state->sortKeys->abbrev_converter(original, - state->sortKeys); + stup->datum1 = original; + } + else if (!consider_abort_common(state)) + { + /* Store abbreviated key representation */ + stup->datum1 = state->sortKeys->abbrev_converter(original, + state->sortKeys); + } + else + { + /* Abort abbreviation */ + int i; + + stup->datum1 = original; /* - * Check effectiveness of optimization. Consider aborting when - * still within memory limit. + * Set state to be consistent with never trying abbreviation. + * + * Alter datum1 representation in already-copied tuples, so as to + * ensure a consistent representation (current tuple was just handled). + * Note that we rely on all tuples copied so far actually being + * contained within memtuples array. */ - if (state->status == TSS_INITIAL && - state->memtupcount >= state->abbrevNext) + for (i = 0; i < state->memtupcount; i++) { - int i; - - state->abbrevNext *= 2; - - /* - * Check opclass-supplied abbreviation abort routine. It may - * indicate that abbreviation should not proceed. - */ - if (!state->sortKeys->abbrev_abort(state->memtupcount, - state->abbrevRowsHint, - state->sortKeys)) - return; - - /* Give up - expect original pass-by-value representation */ - stup->datum1 = original; - - /* - * Set state to be consistent with never trying abbreviation. - * - * Alter datum1 representation in already-copied tuples, so as to - * ensure a consistent representation (current tuple was just - * handled). - */ - for (i = 0; i < state->memtupcount; i++) - { - SortTuple *mtup = &state->memtuples[i]; - - htup.t_len = ((MinimalTuple) mtup->tuple)->t_len + MINIMAL_TUPLE_OFFSET; - htup.t_data = (HeapTupleHeader) ((char *) mtup->tuple - MINIMAL_TUPLE_OFFSET); + SortTuple *mtup = &state->memtuples[i]; - mtup->datum1 = heap_getattr(&htup, - state->sortKeys[0].ssup_attno, - state->tupDesc, - &mtup->isnull1); - } + htup.t_len = ((MinimalTuple) mtup->tuple)->t_len + + MINIMAL_TUPLE_OFFSET; + htup.t_data = (HeapTupleHeader) ((char *) mtup->tuple - + MINIMAL_TUPLE_OFFSET); - /* - * Finally, restore authoritative comparator, and indicate that - * abbreviation is not in play by setting abbrev_converter to NULL - */ - state->sortKeys[0].comparator = state->sortKeys[0].auth_comparator; - state->sortKeys[0].abbrev_converter = NULL; + mtup->datum1 = heap_getattr(&htup, + state->sortKeys[0].ssup_attno, + state->tupDesc, + &mtup->isnull1); } } } @@ -3117,13 +3203,35 @@ comparetup_cluster(const SortTuple *a, const SortTuple *b, TupleDesc tupDesc; int nkey; int32 compare; + Datum datum1, + datum2; + bool isnull1, + isnull2; + AttrNumber leading = state->indexInfo->ii_KeyAttrNumbers[0]; + + /* Be prepared to compare additional sort keys */ + ltup = (HeapTuple) a->tuple; + rtup = (HeapTuple) b->tuple; + tupDesc = state->tupDesc; /* Compare the leading sort key, if it's simple */ - if (state->indexInfo->ii_KeyAttrNumbers[0] != 0) + if (leading != 0) { compare = ApplySortComparator(a->datum1, a->isnull1, b->datum1, b->isnull1, sortKey); + if (compare != 0) + return compare; + + if (sortKey->abbrev_converter) + { + datum1 = heap_getattr(ltup, leading, tupDesc, &isnull1); + datum2 = heap_getattr(rtup, leading, tupDesc, &isnull2); + + compare = ApplySortComparatorAuth(datum1, isnull1, + datum2, isnull2, + sortKey); + } if (compare != 0 || state->nKeys == 1) return compare; /* Compare additional columns the hard way */ @@ -3136,22 +3244,13 @@ comparetup_cluster(const SortTuple *a, const SortTuple *b, nkey = 0; } - /* Compare additional sort keys */ - ltup = (HeapTuple) a->tuple; - rtup = (HeapTuple) b->tuple; - if (state->indexInfo->ii_Expressions == NULL) { /* If not expression index, just compare the proper heap attrs */ - tupDesc = state->tupDesc; for (; nkey < state->nKeys; nkey++, sortKey++) { AttrNumber attno = state->indexInfo->ii_KeyAttrNumbers[nkey]; - Datum datum1, - datum2; - bool isnull1, - isnull2; datum1 = heap_getattr(ltup, attno, tupDesc, &isnull1); datum2 = heap_getattr(rtup, attno, tupDesc, &isnull2); @@ -3209,17 +3308,66 @@ static void copytup_cluster(Tuplesortstate *state, SortTuple *stup, void *tup) { HeapTuple tuple = (HeapTuple) tup; + Datum original; /* copy the tuple into sort storage */ tuple = heap_copytuple(tuple); stup->tuple = (void *) tuple; USEMEM(state, GetMemoryChunkSpace(tuple)); - /* set up first-column key value, if it's a simple column */ - if (state->indexInfo->ii_KeyAttrNumbers[0] != 0) - stup->datum1 = heap_getattr(tuple, - state->indexInfo->ii_KeyAttrNumbers[0], - state->tupDesc, - &stup->isnull1); + /* + * set up first-column key value, and potentially abbreviate, if it's a + * simple column + */ + if (state->indexInfo->ii_KeyAttrNumbers[0] == 0) + return; + + original = heap_getattr(tuple, + state->indexInfo->ii_KeyAttrNumbers[0], + state->tupDesc, + &stup->isnull1); + + if (!state->sortKeys->abbrev_converter || stup->isnull1) + { + /* + * Store ordinary Datum representation. + * + * Converter won't expect NULL value, and cost model is not required to + * account for NULL, so only call converter for non-NULL values. + */ + stup->datum1 = original; + } + else if (!consider_abort_common(state)) + { + /* Store abbreviated key representation */ + stup->datum1 = state->sortKeys->abbrev_converter(original, + state->sortKeys); + } + else + { + /* Abort abbreviation */ + int i; + + stup->datum1 = original; + + /* + * Set state to be consistent with never trying abbreviation. + * + * Alter datum1 representation in already-copied tuples, so as to + * ensure a consistent representation (current tuple was just handled). + * Note that we rely on all tuples copied so far actually being + * contained within memtuples array. + */ + for (i = 0; i < state->memtupcount; i++) + { + SortTuple *mtup = &state->memtuples[i]; + + tuple = (HeapTuple) mtup->tuple; + mtup->datum1 = heap_getattr(tuple, + state->indexInfo->ii_KeyAttrNumbers[0], + state->tupDesc, + &stup->isnull1); + } + } } static void @@ -3300,6 +3448,11 @@ comparetup_index_btree(const SortTuple *a, const SortTuple *b, bool equal_hasnull = false; int nkey; int32 compare; + Datum datum1, + datum2; + bool isnull1, + isnull2; + /* Compare the leading sort key */ compare = ApplySortComparator(a->datum1, a->isnull1, @@ -3308,23 +3461,31 @@ comparetup_index_btree(const SortTuple *a, const SortTuple *b, if (compare != 0) return compare; - /* they are equal, so we only need to examine one null flag */ - if (a->isnull1) - equal_hasnull = true; - /* Compare additional sort keys */ tuple1 = (IndexTuple) a->tuple; tuple2 = (IndexTuple) b->tuple; keysz = state->nKeys; tupDes = RelationGetDescr(state->indexRel); + + if (sortKey->abbrev_converter) + { + datum1 = index_getattr(tuple1, 1, tupDes, &isnull1); + datum2 = index_getattr(tuple2, 1, tupDes, &isnull2); + + compare = ApplySortComparatorAuth(datum1, isnull1, + datum2, isnull2, + sortKey); + if (compare != 0) + return compare; + } + + /* they are equal, so we only need to examine one null flag */ + if (a->isnull1) + equal_hasnull = true; + sortKey++; for (nkey = 2; nkey <= keysz; nkey++, sortKey++) { - Datum datum1, - datum2; - bool isnull1, - isnull2; - datum1 = index_getattr(tuple1, nkey, tupDes, &isnull1); datum2 = index_getattr(tuple2, nkey, tupDes, &isnull2); @@ -3451,6 +3612,7 @@ copytup_index(Tuplesortstate *state, SortTuple *stup, void *tup) IndexTuple tuple = (IndexTuple) tup; unsigned int tuplen = IndexTupleSize(tuple); IndexTuple newtuple; + Datum original; /* copy the tuple into sort storage */ newtuple = (IndexTuple) palloc(tuplen); @@ -3458,10 +3620,53 @@ copytup_index(Tuplesortstate *state, SortTuple *stup, void *tup) USEMEM(state, GetMemoryChunkSpace(newtuple)); stup->tuple = (void *) newtuple; /* set up first-column key value */ - stup->datum1 = index_getattr(newtuple, - 1, - RelationGetDescr(state->indexRel), - &stup->isnull1); + original = index_getattr(newtuple, + 1, + RelationGetDescr(state->indexRel), + &stup->isnull1); + + if (!state->sortKeys->abbrev_converter || stup->isnull1) + { + /* + * Store ordinary Datum representation. + * + * Converter won't expect NULL value, and cost model is not required to + * account for NULL, so only call converter for non-NULL values. + */ + stup->datum1 = original; + } + else if (!consider_abort_common(state)) + { + /* Store abbreviated key representation */ + stup->datum1 = state->sortKeys->abbrev_converter(original, + state->sortKeys); + } + else + { + /* Abort abbreviation */ + int i; + + stup->datum1 = original; + + /* + * Set state to be consistent with never trying abbreviation. + * + * Alter datum1 representation in already-copied tuples, so as to + * ensure a consistent representation (current tuple was just handled). + * Note that we rely on all tuples copied so far actually being + * contained within memtuples array. + */ + for (i = 0; i < state->memtupcount; i++) + { + SortTuple *mtup = &state->memtuples[i]; + + tuple = (IndexTuple) mtup->tuple; + mtup->datum1 = index_getattr(tuple, + 1, + RelationGetDescr(state->indexRel), + &stup->isnull1); + } + } } static void -- 1.9.1
From e06f50e3415a24b82af21a32999521451dc1cc58 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan <p...@heroku.com> Date: Sat, 11 Oct 2014 13:49:53 -0700 Subject: [PATCH 2/4] Fix PG_CACHE_LINE_SIZE bit rot --- src/backend/utils/adt/varlena.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index 29407d9..6b26967 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/backend/utils/adt/varlena.c @@ -2083,9 +2083,9 @@ retry: * in order to compensate for cases where differences are past * CACHE_LINE_SIZE bytes, so as to limit the overhead of hashing. */ - hash = hash_any((unsigned char *) tss->buf1, Min(len, CACHE_LINE_SIZE)); + hash = hash_any((unsigned char *) tss->buf1, Min(len, PG_CACHE_LINE_SIZE)); - if (len > CACHE_LINE_SIZE) + if (len > PG_CACHE_LINE_SIZE) hash ^= DatumGetUInt32(hash_uint32((uint32) len)); addHyperLogLog(&tss->full_card, hash); -- 1.9.1
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers