Tomas Vondra <tomas.von...@enterprisedb.com> writes:

Hi Tomas,

> I took a quick look on this thread/patch today, so let me share a couple
> initial thoughts. I may not have a particularly coherent/consistent
> opinion on the patch or what would be a better way to do this yet, but
> perhaps it'll start a discussion ...

Thank you for this!

> The goal of the patch (as I understand it) is essentially to cache
> detoasted values, so that the value does not need to be detoasted
> repeatedly in different parts of the plan. I think that's perfectly
> sensible and worthwhile goal - detoasting is not cheap, and complex
> plans may easily spend a lot of time on it.

Exactly.

> That being said, the approach seems somewhat invasive, and touching
> parts I wouldn't have expected to need a change to implement this. For
> example, I certainly would not have guessed the patch to need changes in
> createplan.c or setrefs.c.
>
> Perhaps it really needs to do these things, but neither the thread nor
> the comments are very enlightening as for why it's needed :-( In many
> cases I can guess, but I'm not sure my guess is correct. And comments in
> code generally describe what's happening locally / next line, not the
> bigger picture & why it's happening.

There was an explanation at [1], but it is probably too high level.
Writing proper comments is challenging for me, but I am happy to keep
trying. At the end of this mail I explain the data workflow, which I
think will be useful for reviewers.

> IIUC we walk the plan to decide which Vars should be detoasted (and
> cached) once, and which cases should not do that because it'd inflate
> the amount of data we need to keep in a Sort, Hash etc.

Exactly.

> Not sure if
> there's a better way to do this - it depends on what happens in the
> upper parts of the plan, so we can't decide while building the paths.

I'd say I did this intentionally. Deciding such things while building
the paths would be more expensive than doing it at the create_plan
stage, IMO.

> But maybe we could decide this while transforming the paths into a plan?
> (I realize the JIT thread nearby needs to do something like that in
> create_plan, and in that one I suggested maybe walking the plan would be
> a better approach, so I may be contradicting myself a little bit.).

I think that's pretty similar to what I'm doing now, except that I do it
*just after* create_plan. This is because create_plan doesn't always
transform the path into a plan in a top-down manner; the known exception
is create_mergejoin_plan, where the Sort node is created *after* the
subplan for the Sort is created. So I have to walk the plan just after
create_plan is done.

     /* Recursively process the path tree, demanding the correct tlist result */
     plan = create_plan_recurse(root, best_path, CP_EXACT_TLIST);

+    /*
+     * After the plan tree is built completed, we start to walk for which
+     * expressions should not used the shared-detoast feature.
+     */
+    set_plan_forbid_pre_detoast_vars_recurse(plan, NIL);

> In any case, set_plan_forbid_pre_detoast_vars_recurse should probably
> explain the overall strategy / reasoning in a bit more detail. Maybe
> it's somewhere in this thread, but that's not great for reviewers.

A lesson learnt, thanks. Here is a revised version of the comments from
the latest patch:

/*
 * set_plan_forbid_pre_detoast_vars_recurse
 *   Walking the Plan tree in the top-down manner to gather the vars which
 *   should be as small as possible and record them in Plan.forbid_pre_detoast_vars
 *
 * plan: the plan node to walk right now.
 * small_tlist: a list of nodes which its subplan should provide them as
 * small as possible.
 */
static void
set_plan_forbid_pre_detoast_vars_recurse(Plan *plan, List *small_tlist)

> Similar for the setrefs.c changes. It seems a bit suspicious to piggy
> back the new code into fix_scan_expr/fix_scan_list and similar code.
> Those functions have a pretty clearly defined purpose, not sure we want
> to also extend them to also deal with this new thing. (FWIW I'd 100%
> did it this way if I hacked on a PoC of this, to make it work. But I'm
> not sure it's the right solution.)

The main reason for doing so is that I want to share the same walk
effort as fix_scan_expr.
Otherwise I would have to walk the plan again for every expression. I
considered this a best practice in the past, and thought we could treat
pre_detoast_attrs as a valuable side effect :(

> I don't know what to think about the Bitset - maybe it's necessary, but
> how would I know? I don't have any way to measure the benefits, because
> the 0002 patch uses it right away.

Here is a revised version of the comments from the latest patch; the
second paragraph explains this decision.

    /*
     * The attributes whose values are the detoasted version in tts_values[*],
     * if so these memory needs some extra clean-up. These memory can't be put
     * into ecxt_per_tuple_memory since many of them needs a longer life span,
     * for example the Datum in outer join. These memory is put into
     * TupleTableSlot.tts_mcxt and be clear whenever the tts_values[*] is
     * invalidated.
     *
     * Bitset rather than Bitmapset is chosen here because when all the members
     * of Bitmapset are deleted, the allocated memory will be deallocated
     * automatically, which is too expensive in this case since we need to
     * deleted all the members in each ExecClearTuple and repopulate it again
     * when fill the detoast datum to tts_values[*]. This situation will be run
     * again and again in an execution cycle.
     *
     * These values are populated by EEOP_{INNER/OUTER/SCAN}_VAR_TOAST steps.
     */
    Bitset     *pre_detoasted_attrs;

> I think it should be done the other
> way around, i.e. the patch should introduce the main feature first
> (using the traditional Bitmapset), and then add Bitset on top of that.
> That way we could easily measure the impact and see if it's useful.

Actually v4 used the Bitmapset, and both perf and pgbench's tps
indicated that it was too expensive. After talking with David at [2], I
introduced Bitset and used it here. The test case I used comes from [1].
IIRC, there was a 5% performance difference because of this.

    create table w(a int, b numeric);
    insert into w select i, i from generate_series(1, 1000000)i;
    select b from w where b > 0;

To reproduce the difference, we can replace the bitset_clear() with

    bitset_free(slot->pre_detoasted_attrs);
    slot->pre_detoasted_attrs = bitset_init(slot->tts_tupleDescriptor->natts);

in ExecFreePreDetoastDatum; then it works the same as Bitmapset.

> On the whole, my biggest concern is memory usage & leaks. It's not
> difficult to already have problems with large detoasted values, and if
> we start keeping more of them, that may get worse. Or at least that's my
> intuition - it can't really get better by keeping the values longer, right?
>
> The other thing is the risk of leaks (in the sense of keeping detoasted
> values longer than expected). I see the values are allocated in
> tts_mcxt, and maybe that's the right solution - not sure.

About the memory usage: first, the detoasted value has the same lifespan
as tts_values[*], so it can be released pretty quickly, as soon as the
values of that tuple are no longer needed. It is true that we keep the
detoasted version longer than before, but that's a price we have to pay,
I think.

Leaks may happen since tts_mcxt is only reset at the end of the
*executor*. So if we somehow forget to release the memory when
tts_values[*] is invalidated, the memory will be leaked until the end of
the executor. I think that would be enough to cause an issue. Currently,
besides releasing such memory in ExecClearTuple, I also release it
whenever we set tts_nvalid to 0. The theory used here is:

    /*
     * tts_values is treated invalidated since tts_nvalid is set to 0, so
     * let's free the pre-detoast datum.
     */
    ExecFreePreDetoastDatum(slot);

I will do more testing on the memory leak side. Since there are so many
operations against a slot (ExecCopySlot etc.), I don't know how to test
it fully. The method I have in mind now is to use TPCH with a 10GB data
size and monitor the memory usage while the queries run.
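
To make the release rule above easier to discuss, here is a toy,
self-contained C model of it. It is only a sketch under stated
assumptions and is not code from the patch: ToySlot, toy_pre_detoast,
toy_free_pre_detoast and toy_invalidate are hypothetical names, malloc
and free stand in for allocation in tts_mcxt, and a single 64-bit mask
stands in for the Bitset. It shows the two points made above: every
slot-owned copy is freed whenever the values are invalidated, and the
ownership bits are cleared in place rather than being freed and
reallocated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NATTS 4

/* Hypothetical stand-in for TupleTableSlot; not the PostgreSQL struct. */
typedef struct ToySlot
{
    char       *values[TOY_NATTS];  /* stand-in for tts_values[*] */
    int         nvalid;             /* stand-in for tts_nvalid */
    uint64_t    owned;              /* stand-in for pre_detoasted_attrs */
    int         live_copies;        /* bookkeeping for this demo only */
} ToySlot;

/*
 * Replace values[attnum] with a slot-owned copy and remember that the
 * slot owns it. Doing it twice for the same attribute is a no-op.
 */
static void
toy_pre_detoast(ToySlot *slot, int attnum, const char *expanded)
{
    size_t      len = strlen(expanded) + 1;
    char       *copy;

    assert(attnum >= 0 && attnum < slot->nvalid);
    if (slot->owned & (UINT64_C(1) << attnum))
        return;                 /* already holds the expanded copy */

    copy = malloc(len);
    assert(copy != NULL);
    memcpy(copy, expanded, len);

    slot->values[attnum] = copy;
    slot->owned |= UINT64_C(1) << attnum;
    slot->live_copies++;
}

/* Free every slot-owned copy, then clear the ownership bits in place. */
static void
toy_free_pre_detoast(ToySlot *slot)
{
    for (int i = 0; i < TOY_NATTS; i++)
    {
        if (slot->owned & (UINT64_C(1) << i))
        {
            free(slot->values[i]);
            slot->values[i] = NULL;
            slot->live_copies--;
        }
    }
    slot->owned = 0;            /* no free/realloc of the mask itself */
}

/* Any path that resets nvalid must release the owned copies first. */
static void
toy_invalidate(ToySlot *slot)
{
    toy_free_pre_detoast(slot);
    slot->nvalid = 0;
}
```

In this model a leak would show up as live_copies staying above zero
after toy_invalidate, which is the same kind of invariant the real slot
operations would have to keep.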

> FWIW while looking at the patch, I couldn't help but to think about
> expanded datums. There's similarity in what these two features do - keep
> detoasted values for a while, so that we don't need to do the expensive
> processing if we access them repeatedly.

Could you provide some keywords or function names for expanded datums?
I have probably missed this.

> Of course, expanded datums are
> not meant to be long-lived, while "shared detoasted values" are meant to
> exist (potentially) for the query duration.

Hmm, actually the "shared detoast value" just lives in
TupleTableSlot->tts_values[*], rather than for the whole query duration.
The simple case is:

    SELECT * FROM t WHERE a_text LIKE 'abc%';

When we scan to the next tuple, the detoasted value for the previous
tuple is released.

> But maybe there's something
> we could learn from expanded datums? For example how the varlena pointer
> is leveraged to point to the expanded object.

Maybe. Currently I just use detoast_attr to get the desired version. I'd
be glad if there is a more effective way.

    if (!slot->tts_isnull[attnum] &&
        VARATT_IS_EXTENDED(slot->tts_values[attnum]))
    {
        Datum         oldDatum;
        MemoryContext old = MemoryContextSwitchTo(slot->tts_mcxt);

        oldDatum = slot->tts_values[attnum];
        slot->tts_values[attnum] = PointerGetDatum(detoast_attr(
                                       (struct varlena *) oldDatum));
        Assert(slot->tts_nvalid > attnum);
        Assert(oldDatum != slot->tts_values[attnum]);
        bitset_add_member(slot->pre_detoasted_attrs, attnum);
        MemoryContextSwitchTo(old);
    }

> For example, what if we add a "TOAST cache" as a query-level hash table,
> and modify the detoasting to first check the hash table (with the TOAST
> pointer as a key)? It'd be fairly trivial to enforce a memory limit on
> the hash table, evict values from it, etc. And it wouldn't require any
> of the createplan/setrefs changes, I think ...

Hmm, I am not sure I understand you correctly on this part.
In the current patch, to avoid a run-time (ExecExprInterp) check of
whether we should detoast and save the datum, I defined 3 extra steps so
that the *extra check itself* is not needed for attributes that don't
need it, for example a datum for an int, or a datum whose detoasted
version should not be saved back to tts_values[*] for the small_tlist
reason. However, generating these steps depends on the output of the
createplan/setrefs changes. Take INNER_VAR for example:

In ExecInitExprRec:

    switch (variable->varno)
    {
        case INNER_VAR:
            if (is_join_plan(plan) &&
                bms_is_member(attnum,
                      ((JoinState *) state->parent)->inner_pre_detoast_attrs))
            {
                scratch.opcode = EEOP_INNER_VAR_TOAST;
            }
            else
            {
                scratch.opcode = EEOP_INNER_VAR;
            }
    }

The data workflow is:

1. set_plan_forbid_pre_detoast_vars_recurse (in createplan.c) decides
   which Vars should *not* be pre-detoasted because of the small_tlist
   reason and records them in Plan.forbid_pre_detoast_vars.

2. fix_scan_expr (in setrefs.c) tracks which Vars should be detoasted
   for the specific plan node and records them in it. Currently only
   Scan and Join nodes support this feature.

    typedef struct Scan
    {
        ...
        /*
         * Records of var's varattno - 1 where the Var is accessed indirectly by
         * any expression, like a > 3. However a IS [NOT] NULL is not included
         * since it doesn't access the tts_values[*] at all.
         *
         * This is a essential information to figure out which attrs should use
         * the pre-detoast-attrs logic.
         */
        Bitmapset  *reference_attrs;
    } Scan;

    typedef struct Join
    {
        ..
        /*
         * Records of var's varattno - 1 where the Var is accessed indirectly by
         * any expression, like a > 3. However a IS [NOT] NULL is not included
         * since it doesn't access the tts_values[*] at all.
         *
         * This is a essential information to figure out which attrs should use
         * the pre-detoast-attrs logic.
         */
        Bitmapset  *outer_reference_attrs;
        Bitmapset  *inner_reference_attrs;
    } Join;

3. During the InitPlan stage, we maintain
   PlanState.xxx_pre_detoast_attrs and generate different StepOps for
   them.

4. At the ExecExprInterp stage, only the new StepOps do the extra check
   to see if the detoast should happen. Other steps don't need this
   check at all.

If we avoid the createplan.c/setrefs.c changes, probably some unrelated
StepOps would need the extra check as well?

When I worked on the UniqueKey feature, I maintained a UniqueKey.README
to summarize all the topics discussed in the threads. The README was
designed to save effort for later reviewers, and I think I should apply
the same approach to this feature.

Thank you very much for your feedback!

v7 attached, just some comments and Assert changes.

[1] https://www.postgresql.org/message-id/87il4jrk1l.fsf%40163.com
[2] https://www.postgresql.org/message-id/CAApHDvpdp9LyAoMXvS7iCX-t3VonQM3fTWCmhconEvORrQ%2BZYA%40mail.gmail.com

--
Best Regards
Andy Fan
>From f2e7772228e8a18027b9c29f10caba9c6570d934 Mon Sep 17 00:00:00 2001 From: "yizhi.fzh" <yizhi....@alibaba-inc.com> Date: Tue, 20 Feb 2024 11:11:53 +0800 Subject: [PATCH v7 1/2] Introduce a Bitset data struct. While Bitmapset is designed for variable-length of bits, Bitset is designed for fixed-length of bits, the fixed length must be specified at the bitset_init stage and keep unchanged at the whole lifespan. Because of this, some operations on Bitset is simpler than Bitmapset. The bitset_clear unsets all the bits but kept the allocated memory, this capacity is impossible for bit Bitmapset for some solid reasons and this is the main reason to add this data struct. [1] https://postgr.es/m/CAApHDvpdp9LyAoMXvS7iCX-t3VonQM3fTWCmhconEvORrQ%2BZYA%40mail.gmail.com [2] https://postgr.es/m/875xzqxbv5.fsf%40163.com --- src/backend/nodes/bitmapset.c | 200 +++++++++++++++++- src/backend/nodes/outfuncs.c | 51 +++++ src/include/nodes/bitmapset.h | 28 +++ src/include/nodes/nodes.h | 4 + src/test/modules/test_misc/Makefile | 11 + src/test/modules/test_misc/README | 4 +- .../test_misc/expected/test_bitset.out | 7 + src/test/modules/test_misc/meson.build | 17 ++ .../modules/test_misc/sql/test_bitset.sql | 3 + src/test/modules/test_misc/test_misc--1.0.sql | 5 + src/test/modules/test_misc/test_misc.c | 118 +++++++++++ src/test/modules/test_misc/test_misc.control | 4 + src/tools/pgindent/typedefs.list | 1 + 13 files changed, 441 insertions(+), 12 deletions(-) create mode 100644 src/test/modules/test_misc/expected/test_bitset.out create mode 100644 src/test/modules/test_misc/sql/test_bitset.sql create mode 100644 src/test/modules/test_misc/test_misc--1.0.sql create mode 100644 src/test/modules/test_misc/test_misc.c create mode 100644 src/test/modules/test_misc/test_misc.control diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c index 65805d4527..40cfea2308 100644 --- a/src/backend/nodes/bitmapset.c +++ b/src/backend/nodes/bitmapset.c @@ -1315,23 +1315,18 @@ 
bms_join(Bitmapset *a, Bitmapset *b) * It makes no difference in simple loop usage, but complex iteration logic * might need such an ability. */ -int -bms_next_member(const Bitmapset *a, int prevbit) + +static int +bms_next_member_internal(int nwords, const bitmapword *words, int prevbit) { - int nwords; int wordnum; bitmapword mask; - Assert(bms_is_valid_set(a)); - - if (a == NULL) - return -2; - nwords = a->nwords; prevbit++; mask = (~(bitmapword) 0) << BITNUM(prevbit); for (wordnum = WORDNUM(prevbit); wordnum < nwords; wordnum++) { - bitmapword w = a->words[wordnum]; + bitmapword w = words[wordnum]; /* ignore bits before prevbit */ w &= mask; @@ -1351,6 +1346,19 @@ bms_next_member(const Bitmapset *a, int prevbit) return -2; } +int +bms_next_member(const Bitmapset *a, int prevbit) +{ + Assert(a == NULL || IsA(a, Bitmapset)); + + Assert(bms_is_valid_set(a)); + + if (a == NULL) + return -2; + + return bms_next_member_internal(a->nwords, a->words, prevbit); +} + /* * bms_prev_member - find prev member of a set * @@ -1458,3 +1466,177 @@ bitmap_match(const void *key1, const void *key2, Size keysize) return !bms_equal(*((const Bitmapset *const *) key1), *((const Bitmapset *const *) key2)); } + +/* + * bitset_init - create a Bitset. the set will be round up to nwords; + */ +Bitset * +bitset_init(size_t size) +{ + int nword = (size + BITS_PER_BITMAPWORD - 1) / BITS_PER_BITMAPWORD; + Bitset *result; + + if (size == 0) + return NULL; + + result = (Bitset *) palloc0(sizeof(Bitset) + nword * sizeof(bitmapword)); + result->nwords = nword; + + return result; +} + +/* + * bitset_clear - clear the bits only, but the memory is still there. 
+ */ +void +bitset_clear(Bitset *a) +{ + if (a != NULL) + memset(a->words, 0, sizeof(bitmapword) * a->nwords); +} + +void +bitset_free(Bitset *a) +{ + if (a != NULL) + pfree(a); +} + +bool +bitset_is_empty(Bitset *a) +{ + int i; + + if (a == NULL) + return true; + + for (i = 0; i < a->nwords; i++) + { + bitmapword w = a->words[i]; + + if (w != 0) + return false; + } + + return true; +} + +Bitset * +bitset_copy(Bitset *a) +{ + Bitset *result; + + if (a == NULL) + return NULL; + + result = bitset_init(a->nwords * BITS_PER_BITMAPWORD); + + memcpy(result->words, a->words, sizeof(bitmapword) * a->nwords); + return result; +} + +void +bitset_add_member(Bitset *a, int x) +{ + int wordnum, + bitnum; + + Assert(x >= 0); + + wordnum = WORDNUM(x); + bitnum = BITNUM(x); + + Assert(wordnum < a->nwords); + + a->words[wordnum] |= ((bitmapword) 1 << bitnum); +} + +void +bitset_del_member(Bitset *a, int x) +{ + int wordnum, + bitnum; + + Assert(x >= 0); + + wordnum = WORDNUM(x); + bitnum = BITNUM(x); + + Assert(wordnum < a->nwords); + + a->words[wordnum] &= ~((bitmapword) 1 << bitnum); +} + +int +bitset_is_member(int x, Bitset *a) +{ + int wordnum, + bitnum; + + /* used in expression engine */ + Assert(x >= 0); + + wordnum = WORDNUM(x); + bitnum = BITNUM(x); + + if (a == NULL) + return false; + + if (wordnum >= a->nwords) + return false; + + return (a->words[wordnum] & ((bitmapword) 1 << bitnum)) != 0; +} + +int +bitset_next_member(const Bitset *a, int prevbit) +{ + if (a == NULL) + return -2; + + return bms_next_member_internal(a->nwords, a->words, prevbit); +} + + +/* + * bitset_to_bitmap - build a legal bitmapset from bitset. 
+ */ +Bitmapset * +bitset_to_bitmap(Bitset *a) +{ + int n; + + bool found = false; /* any non-empty bits */ + Bitmapset *result; + int i; + + if (a == NULL) + return NULL; + + n = a->nwords - 1; + do + { + if (a->words[n] > 0) + { + found = true; + break; + } + } while (--n >= 0); + + if (!found) + return NULL; + + result = (Bitmapset *) palloc0(BITMAPSET_SIZE(n + 1)); + result->type = T_Bitmapset; + result->nwords = n + 1; + + Assert(result->nwords <= a->nwords); + + i = 0; + do + { + result->words[i] = a->words[i]; + } while (++i < result->nwords); + + return result; +} diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 25171864db..e144b62418 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -331,6 +331,43 @@ outBitmapset(StringInfo str, const Bitmapset *bms) appendStringInfoChar(str, ')'); } + + +/* + * outBitset - + * similar to outBitmapset, but for Bitset. + */ +static void +outBitsetInternal(struct StringInfoData *str, + const struct Bitset *bs, + bool asBitmap) +{ + int x; + + appendStringInfoChar(str, '('); + if (asBitmap) + appendStringInfoChar(str, 'b'); + else + appendStringInfo(str, "bs"); + x = -1; + while ((x = bitset_next_member(bs, x)) >= 0) + appendStringInfo(str, " %d", x); + appendStringInfoChar(str, ')'); +} + + +/* + * outBitset - + * similar to outBitmapset, but for Bitset. + */ +void +outBitset(struct StringInfoData *str, + const struct Bitset *bs) +{ + outBitsetInternal(str, bs, false); +} + + /* * Print the value of a Datum given its type. 
*/ @@ -911,3 +948,17 @@ bmsToString(const Bitmapset *bms) outBitmapset(&str, bms); return str.data; } + +/* + * bitsetToString - + * similar to bmsToString, but for Bitset + */ +char * +bitsetToString(const struct Bitset *bs, bool asBitmap) +{ + StringInfoData str; + + initStringInfo(&str); + outBitsetInternal(&str, bs, asBitmap); + return str.data; +} diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h index 906e8dcc15..95ff37c6e9 100644 --- a/src/include/nodes/bitmapset.h +++ b/src/include/nodes/bitmapset.h @@ -55,6 +55,24 @@ typedef struct Bitmapset bitmapword words[FLEXIBLE_ARRAY_MEMBER]; /* really [nwords] */ } Bitmapset; +/* + * While Bitmapset is designed for variable-length of bits, Bitset is + * designed for fixed-length of bits, the fixed length must be specified at + * the bitset_init stage and keep unchanged at the whole lifespan. Because + * of this, some operations on Bitset is simpler than Bitmapset. + * + * The bitset_clear unsets all the bits but kept the allocated memory, this + * capacity is impossible for bit Bitmapset for some solid reasons. + * + * Also for performance aspect, the functions for Bitset removed some + * unlikely checks, instead with some Asserts. 
+ */ + +typedef struct Bitset +{ + int nwords; /* number of words in array */ + bitmapword words[FLEXIBLE_ARRAY_MEMBER]; /* really [nwords] */ +} Bitset; /* result of bms_subset_compare */ typedef enum @@ -124,4 +142,14 @@ extern uint32 bms_hash_value(const Bitmapset *a); extern uint32 bitmap_hash(const void *key, Size keysize); extern int bitmap_match(const void *key1, const void *key2, Size keysize); +extern Bitset *bitset_init(size_t size); +extern void bitset_clear(Bitset *a); +extern void bitset_free(Bitset *a); +extern bool bitset_is_empty(Bitset *a); +extern Bitset *bitset_copy(Bitset *a); +extern void bitset_add_member(Bitset *a, int x); +extern void bitset_del_member(Bitset *a, int x); +extern int bitset_is_member(int bit, Bitset *a); +extern int bitset_next_member(const Bitset *a, int prevbit); +extern Bitmapset *bitset_to_bitmap(Bitset *a); #endif /* BITMAPSET_H */ diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h index 2969dd831b..4d13107990 100644 --- a/src/include/nodes/nodes.h +++ b/src/include/nodes/nodes.h @@ -186,16 +186,20 @@ castNodeImpl(NodeTag type, void *ptr) * nodes/{outfuncs.c,print.c} */ struct Bitmapset; /* not to include bitmapset.h here */ +struct Bitset; /* not to include bitmapset.h here */ struct StringInfoData; /* not to include stringinfo.h here */ extern void outNode(struct StringInfoData *str, const void *obj); extern void outToken(struct StringInfoData *str, const char *s); extern void outBitmapset(struct StringInfoData *str, const struct Bitmapset *bms); +extern void outBitset(struct StringInfoData *str, const struct Bitset *bs); + extern void outDatum(struct StringInfoData *str, uintptr_t value, int typlen, bool typbyval); extern char *nodeToString(const void *obj); extern char *bmsToString(const struct Bitmapset *bms); +extern char *bitsetToString(const struct Bitset *bs, bool asBitmap); /* * nodes/{readfuncs.c,read.c} diff --git a/src/test/modules/test_misc/Makefile b/src/test/modules/test_misc/Makefile 
index 39c6c2014a..af96604096 100644 --- a/src/test/modules/test_misc/Makefile +++ b/src/test/modules/test_misc/Makefile @@ -2,6 +2,17 @@ TAP_TESTS = 1 +MODULE_big = test_misc +OBJS = \ + $(WIN32RES) \ + test_misc.o +PGFILEDESC = "test_misc" + +EXTENSION = test_misc +DATA = test_misc--1.0.sql + +REGRESS = test_bitset + ifdef USE_PGXS PG_CONFIG = pg_config PGXS := $(shell $(PG_CONFIG) --pgxs) diff --git a/src/test/modules/test_misc/README b/src/test/modules/test_misc/README index 4876733fa2..ec426c4ad5 100644 --- a/src/test/modules/test_misc/README +++ b/src/test/modules/test_misc/README @@ -1,4 +1,2 @@ -This directory doesn't actually contain any extension module. - -What it is is a home for otherwise-unclassified TAP tests that exercise core +What it is is a home for otherwise-unclassified tests that exercise core server features. We might equally well have called it, say, src/test/misc. diff --git a/src/test/modules/test_misc/expected/test_bitset.out b/src/test/modules/test_misc/expected/test_bitset.out new file mode 100644 index 0000000000..3d0302d30d --- /dev/null +++ b/src/test/modules/test_misc/expected/test_bitset.out @@ -0,0 +1,7 @@ +CREATE EXTENSION test_misc; +SELECT test_bitset(); + test_bitset +------------- + +(1 row) + diff --git a/src/test/modules/test_misc/meson.build b/src/test/modules/test_misc/meson.build index 964d95db26..a23f3e3f47 100644 --- a/src/test/modules/test_misc/meson.build +++ b/src/test/modules/test_misc/meson.build @@ -1,5 +1,22 @@ # Copyright (c) 2022-2024, PostgreSQL Global Development Group +test_misc_sources = files( + 'test_misc.c', +) + +if host_system == 'windows' + test_misc_sources += rc_lib_gen.process(win32ver_rc, extra_args: [ + '--NAME', 'test_misc', + '--FILEDESC', 'test_misc - ',]) +endif + +test_misc = shared_module('test_misc', + test_misc_sources, + kwargs: pg_test_mod_args, +) + +test_install_libs += test_misc + tests += { 'name': 'test_misc', 'sd': meson.current_source_dir(), diff --git 
a/src/test/modules/test_misc/sql/test_bitset.sql b/src/test/modules/test_misc/sql/test_bitset.sql new file mode 100644 index 0000000000..0f73bbf532 --- /dev/null +++ b/src/test/modules/test_misc/sql/test_bitset.sql @@ -0,0 +1,3 @@ +CREATE EXTENSION test_misc; + +SELECT test_bitset(); diff --git a/src/test/modules/test_misc/test_misc--1.0.sql b/src/test/modules/test_misc/test_misc--1.0.sql new file mode 100644 index 0000000000..79afaa6263 --- /dev/null +++ b/src/test/modules/test_misc/test_misc--1.0.sql @@ -0,0 +1,5 @@ +\echo Use "CREATE EXTENSION test_misc" to load this file. \quit + +CREATE FUNCTION test_bitset() + RETURNS pg_catalog.void + AS 'MODULE_PATHNAME' LANGUAGE C; diff --git a/src/test/modules/test_misc/test_misc.c b/src/test/modules/test_misc/test_misc.c new file mode 100644 index 0000000000..70d0255ada --- /dev/null +++ b/src/test/modules/test_misc/test_misc.c @@ -0,0 +1,118 @@ +/*-------------------------------------------------------------------------- + * + * test_misc.c + * + * Copyright (c) 2022-2024, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/test/modules/test_dsa/test_misc.c + * + * ------------------------------------------------------------------------- + */ +#include "postgres.h" +#include "fmgr.h" +#include "nodes/bitmapset.h" +#include "nodes/nodes.h" +#define BIT_ADD 0 +#define BIT_DEL 1 +#ifdef USE_ASSERT_CHECKING +static void compare_bms_bs(Bitmapset **bms, Bitset *bs, int member, int op); +#endif +PG_MODULE_MAGIC; +/* Test basic DSA functionality */ +PG_FUNCTION_INFO_V1(test_bitset); +Datum +test_bitset(PG_FUNCTION_ARGS) +{ +#ifdef USE_ASSERT_CHECKING + Bitset *bs; + Bitset *bs2; + char *str1, + *str2, + *empty_str; + Bitmapset *bms = NULL; + int i; + + empty_str = bmsToString(NULL); + /* size = 0 */ + bs = bitset_init(0); + Assert(bs == NULL); + bitset_clear(bs); + Assert(bitset_is_empty(bs)); + /* bitset_add_member(bs, 0); // crash. */ + /* bitset_del_member(bs, 0); // crash. 
*/ + Assert(!bitset_is_member(0, bs)); + Assert(bitset_next_member(bs, -1) == -2); + bs2 = bitset_copy(bs); + Assert(bs2 == NULL); + bitset_free(bs); + bitset_free(bs2); + /* size == 68, nword == 2 */ + bs = bitset_init(68); + for (i = 0; i < 68; i = i + 3) + { + compare_bms_bs(&bms, bs, i, BIT_ADD); + } + Assert(!bitset_is_empty(bs)); + for (i = 0; i < 68; i = i + 3) + { + compare_bms_bs(&bms, bs, i, BIT_DEL); + } + Assert(bitset_is_empty(bs)); + bitset_clear(bs); + str1 = bitsetToString(bs, true); + Assert(strcmp(str1, empty_str) == 0); + bms = bitset_to_bitmap(bs); + str2 = bmsToString(bms); + Assert(strcmp(str1, str2) == 0); + bms = bitset_to_bitmap(NULL); + Assert(strcmp(bmsToString(bms), empty_str) == 0); + bitset_free(bs); +#endif + PG_RETURN_VOID(); +} +#ifdef USE_ASSERT_CHECKING +static void +compare_bms_bs(Bitmapset **bms, Bitset *bs, int member, int op) +{ + char *str1, + *str2, + *str3, + *str4; + Bitmapset *bms3; + Bitset *bs4; + + if (op == BIT_ADD) + { + *bms = bms_add_member(*bms, member); + bitset_add_member(bs, member); + Assert(bms_is_member(member, *bms)); + Assert(bitset_is_member(member, bs)); + } + else if (op == BIT_DEL) + { + *bms = bms_del_member(*bms, member); + bitset_del_member(bs, member); + Assert(!bms_is_member(member, *bms)); + Assert(!bitset_is_member(member, bs)); + } + else + Assert(false); + /* compare the rest existing bit */ + str1 = bmsToString(*bms); + str2 = bitsetToString(bs, true); + Assert(strcmp(str1, str2) == 0); + /* test bitset_to_bitmap */ + bms3 = bitset_to_bitmap(bs); + str3 = bmsToString(bms3); + Assert(strcmp(str3, str2) == 0); + /* test bitset_copy */ + bs4 = bitset_copy(bs); + str4 = bitsetToString(bs4, true); + Assert(strcmp(str3, str4) == 0); + pfree(str1); + pfree(str2); + pfree(str3); + pfree(str4); +} +#endif diff --git a/src/test/modules/test_misc/test_misc.control b/src/test/modules/test_misc/test_misc.control new file mode 100644 index 0000000000..48fd08758f --- /dev/null +++ 
b/src/test/modules/test_misc/test_misc.control @@ -0,0 +1,4 @@ +comment = 'Test misc' +default_version = '1.0' +module_pathname = '$libdir/test_misc' +relocatable = true diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index d808aad8b0..dcba329ee4 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -268,6 +268,7 @@ BitmapOr BitmapOrPath BitmapOrState Bitmapset +Bitset Block BlockId BlockIdData -- 2.34.1
From d37b53cdaeb41989ff53e81da33cf15f5f08a450 Mon Sep 17 00:00:00 2001
From: "yizhi.fzh" <yizhi....@alibaba-inc.com>
Date: Tue, 20 Feb 2024 14:16:10 +0800
Subject: [PATCH v7 2/2] Shared detoast feature.

details at https://postgr.es/m/87il4jrk1l.fsf%40163.com
---
 src/backend/executor/execExpr.c         |  35 +-
 src/backend/executor/execExprInterp.c   | 180 ++++++++
 src/backend/executor/execTuples.c       | 134 ++++++
 src/backend/executor/execUtils.c        |   2 +
 src/backend/executor/nodeHashjoin.c     |   2 +
 src/backend/executor/nodeMergejoin.c    |   2 +
 src/backend/executor/nodeNestloop.c     |   1 +
 src/backend/jit/llvm/llvmjit_expr.c     |  26 +-
 src/backend/jit/llvm/llvmjit_types.c    |   1 +
 src/backend/optimizer/plan/createplan.c | 107 ++++-
 src/backend/optimizer/plan/setrefs.c    | 551 +++++++++++++++++++-----
 src/include/executor/execExpr.h         |  12 +
 src/include/executor/tuptable.h         |  67 +++
 src/include/nodes/execnodes.h           |  14 +
 src/include/nodes/plannodes.h           |  53 +++
 src/tools/pgindent/typedefs.list        |   2 +
 16 files changed, 1083 insertions(+), 106 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 3181b1136a..779fcfaab1 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -932,22 +932,51 @@ ExecInitExprRec(Expr *node, ExprState *state,
 				}
 				else
 				{
+					int attnum;
+					Plan *plan = state->parent ? state->parent->plan : NULL;
+
 					/* regular user column */
 					scratch.d.var.attnum = variable->varattno - 1;
 					scratch.d.var.vartype = variable->vartype;
+					attnum = scratch.d.var.attnum;
+
 					switch (variable->varno)
 					{
 						case INNER_VAR:
-							scratch.opcode = EEOP_INNER_VAR;
+
+							if (is_join_plan(plan) &&
+								bms_is_member(attnum,
+									((JoinState *) state->parent)->inner_pre_detoast_attrs))
+							{
+								scratch.opcode = EEOP_INNER_VAR_TOAST;
+							}
+							else
+							{
+								scratch.opcode = EEOP_INNER_VAR;
+							}
 							break;
 						case OUTER_VAR:
-							scratch.opcode = EEOP_OUTER_VAR;
+							if (is_join_plan(plan) &&
+								bms_is_member(attnum,
+									((JoinState *) state->parent)->outer_pre_detoast_attrs))
+							{
+								scratch.opcode = EEOP_OUTER_VAR_TOAST;
+							}
+							else
+								scratch.opcode = EEOP_OUTER_VAR;
 							break;

 							/* INDEX_VAR is handled by default case */

 						default:
-							scratch.opcode = EEOP_SCAN_VAR;
+							if (is_scan_plan(plan) && bms_is_member(
+									attnum,
+									((ScanState *) state->parent)->scan_pre_detoast_attrs))
+							{
+								scratch.opcode = EEOP_SCAN_VAR_TOAST;
+							}
+							else
+								scratch.opcode = EEOP_SCAN_VAR;
 							break;
 					}
 				}
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 3f20f1dd31..878235ae76 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -57,6 +57,7 @@
 #include "postgres.h"

 #include "access/heaptoast.h"
+#include "access/detoast.h"
 #include "catalog/pg_type.h"
 #include "commands/sequence.h"
 #include "executor/execExpr.h"
@@ -158,6 +159,9 @@ static void ExecEvalRowNullInt(ExprState *state, ExprEvalStep *op,
 static Datum ExecJustInnerVar(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustOuterVar(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustScanVar(ExprState *state, ExprContext *econtext, bool *isnull);
+static Datum ExecJustInnerVarToast(ExprState *state, ExprContext *econtext, bool *isnull);
+static Datum ExecJustOuterVarToast(ExprState *state, ExprContext *econtext, bool *isnull);
+static Datum ExecJustScanVarToast(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustAssignInnerVar(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustAssignOuterVar(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustAssignScanVar(ExprState *state, ExprContext *econtext, bool *isnull);
@@ -166,6 +170,9 @@ static Datum ExecJustConst(ExprState *state, ExprContext *econtext, bool *isnull
 static Datum ExecJustInnerVarVirt(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustOuterVarVirt(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustScanVarVirt(ExprState *state, ExprContext *econtext, bool *isnull);
+static Datum ExecJustInnerVarVirtToast(ExprState *state, ExprContext *econtext, bool *isnull);
+static Datum ExecJustOuterVarVirtToast(ExprState *state, ExprContext *econtext, bool *isnull);
+static Datum ExecJustScanVarVirtToast(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustAssignInnerVarVirt(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustAssignOuterVarVirt(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustAssignScanVarVirt(ExprState *state, ExprContext *econtext, bool *isnull);
@@ -181,6 +188,43 @@ static pg_attribute_always_inline void ExecAggPlainTransByRef(AggState *aggstate
 															  AggStatePerGroup pergroup,
 															  ExprContext *aggcontext,
 															  int setno);
+static inline void
+ExecSlotDetoastDatum(TupleTableSlot *slot, int attnum)
+{
+	if (!slot->tts_isnull[attnum] &&
+		VARATT_IS_EXTENDED(slot->tts_values[attnum]))
+	{
+		Datum oldDatum;
+		MemoryContext old = MemoryContextSwitchTo(slot->tts_mcxt);
+
+		oldDatum = slot->tts_values[attnum];
+		slot->tts_values[attnum] = PointerGetDatum(detoast_attr(
+			(struct varlena *) oldDatum));
+		Assert(slot->tts_nvalid > attnum);
+		Assert(oldDatum != slot->tts_values[attnum]);
+		bitset_add_member(slot->pre_detoasted_attrs, attnum);
+		MemoryContextSwitchTo(old);
+	}
+}
+
+/* JIT requires a non-static (and external?) function */
+void
+ExecSlotDetoastDatumExternal(TupleTableSlot *slot, int attnum)
+{
+	ExecSlotDetoastDatum(slot, attnum);
+}
+
+
+static inline void
+ExecEvalToastVar(TupleTableSlot *slot,
+				 ExprEvalStep *op,
+				 int attnum)
+{
+	ExecSlotDetoastDatum(slot, attnum);
+
+	*op->resvalue = slot->tts_values[attnum];
+	*op->resnull = slot->tts_isnull[attnum];
+}

 /*
  * ScalarArrayOpExprHashEntry
@@ -296,6 +340,24 @@ ExecReadyInterpretedExpr(ExprState *state)
 			state->evalfunc_private = (void *) ExecJustScanVar;
 			return;
 		}
+		if (step0 == EEOP_INNER_FETCHSOME &&
+			step1 == EEOP_INNER_VAR_TOAST)
+		{
+			state->evalfunc_private = (void *) ExecJustInnerVarToast;
+			return;
+		}
+		else if (step0 == EEOP_OUTER_FETCHSOME &&
+				 step1 == EEOP_OUTER_VAR_TOAST)
+		{
+			state->evalfunc_private = (void *) ExecJustOuterVarToast;
+			return;
+		}
+		else if (step0 == EEOP_SCAN_FETCHSOME &&
+				 step1 == EEOP_SCAN_VAR_TOAST)
+		{
+			state->evalfunc_private = (void *) ExecJustScanVarToast;
+			return;
+		}
 		else if (step0 == EEOP_INNER_FETCHSOME &&
 				 step1 == EEOP_ASSIGN_INNER_VAR)
 		{
@@ -346,6 +408,21 @@ ExecReadyInterpretedExpr(ExprState *state)
 			state->evalfunc_private = (void *) ExecJustScanVarVirt;
 			return;
 		}
+		else if (step0 == EEOP_INNER_VAR_TOAST)
+		{
+			state->evalfunc_private = (void *) ExecJustInnerVarVirtToast;
+			return;
+		}
+		else if (step0 == EEOP_OUTER_VAR_TOAST)
+		{
+			state->evalfunc_private = (void *) ExecJustOuterVarVirtToast;
+			return;
+		}
+		else if (step0 == EEOP_SCAN_VAR_TOAST)
+		{
+			state->evalfunc_private = (void *) ExecJustScanVarVirtToast;
+			return;
+		}
 		else if (step0 == EEOP_ASSIGN_INNER_VAR)
 		{
 			state->evalfunc_private = (void *) ExecJustAssignInnerVarVirt;
@@ -413,6 +490,9 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 		&&CASE_EEOP_INNER_VAR,
 		&&CASE_EEOP_OUTER_VAR,
 		&&CASE_EEOP_SCAN_VAR,
+		&&CASE_EEOP_INNER_VAR_TOAST,
+		&&CASE_EEOP_OUTER_VAR_TOAST,
+		&&CASE_EEOP_SCAN_VAR_TOAST,
 		&&CASE_EEOP_INNER_SYSVAR,
 		&&CASE_EEOP_OUTER_SYSVAR,
 		&&CASE_EEOP_SCAN_SYSVAR,
@@ -597,6 +677,25 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 			Assert(attnum >= 0 && attnum < scanslot->tts_nvalid);
 			*op->resvalue = scanslot->tts_values[attnum];
 			*op->resnull = scanslot->tts_isnull[attnum];
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_INNER_VAR_TOAST)
+		{
+			ExecEvalToastVar(innerslot, op, op->d.var.attnum);
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_OUTER_VAR_TOAST)
+		{
+			ExecEvalToastVar(outerslot, op, op->d.var.attnum);
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_SCAN_VAR_TOAST)
+		{
+			ExecEvalToastVar(scanslot, op, op->d.var.attnum);

 			EEO_NEXT();
 		}
@@ -2137,6 +2236,42 @@ ExecJustScanVar(ExprState *state, ExprContext *econtext, bool *isnull)
 	return ExecJustVarImpl(state, econtext->ecxt_scantuple, isnull);
 }

+static pg_attribute_always_inline Datum
+ExecJustVarImplToast(ExprState *state, TupleTableSlot *slot, bool *isnull)
+{
+	ExprEvalStep *op = &state->steps[1];
+	int attnum = op->d.var.attnum;
+
+	CheckOpSlotCompatibility(&state->steps[0], slot);
+
+	slot_getattr(slot, attnum + 1, isnull);
+
+	ExecSlotDetoastDatum(slot, attnum);
+
+	return slot->tts_values[attnum];
+}
+
+/* Simple reference to inner Var */
+static Datum
+ExecJustInnerVarToast(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	return ExecJustVarImplToast(state, econtext->ecxt_innertuple, isnull);
+}
+
+/* Simple reference to outer Var */
+static Datum
+ExecJustOuterVarToast(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	return ExecJustVarImplToast(state, econtext->ecxt_outertuple, isnull);
+}
+
+/* Simple reference to scan Var */
+static Datum
+ExecJustScanVarToast(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	return ExecJustVarImplToast(state, econtext->ecxt_scantuple, isnull);
+}
+
 /* implementation of ExecJustAssign(Inner|Outer|Scan)Var */
 static pg_attribute_always_inline Datum
 ExecJustAssignVarImpl(ExprState *state, TupleTableSlot *inslot, bool *isnull)
@@ -2275,6 +2410,51 @@ ExecJustScanVarVirt(ExprState *state, ExprContext *econtext, bool *isnull)
 	return ExecJustVarVirtImpl(state, econtext->ecxt_scantuple, isnull);
 }

+/* implementation of ExecJust(Inner|Outer|Scan)VarVirt */
+static pg_attribute_always_inline Datum
+ExecJustVarVirtImplToast(ExprState *state, TupleTableSlot *slot, bool *isnull)
+{
+	ExprEvalStep *op = &state->steps[0];
+	int attnum = op->d.var.attnum;
+
+	/*
+	 * As it is guaranteed that a virtual slot is used, there never is a need
+	 * to perform tuple deforming (nor would it be possible). Therefore
+	 * execExpr.c has not emitted an EEOP_*_FETCHSOME step. Verify, as much as
+	 * possible, that that determination was accurate.
+	 */
+	Assert(TTS_IS_VIRTUAL(slot));
+	Assert(TTS_FIXED(slot));
+	Assert(attnum >= 0 && attnum < slot->tts_nvalid);
+
+	*isnull = slot->tts_isnull[attnum];
+
+	ExecSlotDetoastDatum(slot, attnum);
+
+	return slot->tts_values[attnum];
+}
+
+/* Like ExecJustInnerVar, optimized for virtual slots */
+static Datum
+ExecJustInnerVarVirtToast(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	return ExecJustVarVirtImplToast(state, econtext->ecxt_innertuple, isnull);
+}
+
+/* Like ExecJustOuterVar, optimized for virtual slots */
+static Datum
+ExecJustOuterVarVirtToast(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	return ExecJustVarVirtImplToast(state, econtext->ecxt_outertuple, isnull);
+}
+
+/* Like ExecJustScanVar, optimized for virtual slots */
+static Datum
+ExecJustScanVarVirtToast(ExprState *state, ExprContext *econtext, bool *isnull)
+{
+	return ExecJustVarVirtImplToast(state, econtext->ecxt_scantuple, isnull);
+}
+
 /* implementation of ExecJustAssign(Inner|Outer|Scan)VarVirt */
 static pg_attribute_always_inline Datum
 ExecJustAssignVarVirtImpl(ExprState *state, TupleTableSlot *inslot, bool *isnull)
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index a7aa2ee02b..830e1d4ea6 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -79,6 +79,9 @@ static inline void tts_buffer_heap_store_tuple(TupleTableSlot *slot,
 											   bool transfer_pin);
 static void tts_heap_store_tuple(TupleTableSlot *slot, HeapTuple tuple, bool shouldFree);

+static Bitmapset *cal_final_pre_detoast_attrs(Bitmapset *reference_attrs,
+											  TupleDesc tupleDesc,
+											  List *forbid_pre_detoast_vars);

 const TupleTableSlotOps TTSOpsVirtual;
 const TupleTableSlotOps TTSOpsHeapTuple;
@@ -176,6 +179,10 @@ tts_virtual_materialize(TupleTableSlot *slot)
 		if (att->attbyval || slot->tts_isnull[natt])
 			continue;

+		if (bitset_is_member(natt, slot->pre_detoasted_attrs))
+			/* the datum already lives in slot->tts_mcxt */
+			continue;
+
 		val = slot->tts_values[natt];

 		if (att->attlen == -1 &&
@@ -392,6 +399,13 @@ tts_heap_materialize(TupleTableSlot *slot)
 	slot->tts_flags |= TTS_FLAG_SHOULDFREE;

 	MemoryContextSwitchTo(oldContext);
+
+	/*
+	 * tts_values is treated as invalidated since tts_nvalid is set to 0, so
+	 * free the pre-detoasted datums.
+	 */
+	ExecFreePreDetoastDatum(slot);
+
 }

 static void
@@ -457,6 +471,9 @@ tts_heap_store_tuple(TupleTableSlot *slot, HeapTuple tuple, bool shouldFree)

 	if (shouldFree)
 		slot->tts_flags |= TTS_FLAG_SHOULDFREE;
+
+	/* tts_nvalid = 0 */
+	ExecFreePreDetoastDatum(slot);
 }


@@ -567,6 +584,9 @@ tts_minimal_materialize(TupleTableSlot *slot)
 	mslot->minhdr.t_data = (HeapTupleHeader) ((char *) mslot->mintuple - MINIMAL_TUPLE_OFFSET);

 	MemoryContextSwitchTo(oldContext);
+
+	/* tts_nvalid = 0 */
+	ExecFreePreDetoastDatum(slot);
 }

 static void
@@ -637,6 +657,9 @@ tts_minimal_store_tuple(TupleTableSlot *slot, MinimalTuple mtup, bool shouldFree

 	if (shouldFree)
 		slot->tts_flags |= TTS_FLAG_SHOULDFREE;
+
+	/* tts_nvalid = 0 */
+	ExecFreePreDetoastDatum(slot);
 }


@@ -771,6 +794,9 @@ tts_buffer_heap_materialize(TupleTableSlot *slot)
 	slot->tts_flags |= TTS_FLAG_SHOULDFREE;

 	MemoryContextSwitchTo(oldContext);
+
+	/* tts_nvalid = 0 */
+	ExecFreePreDetoastDatum(slot);
 }

 static void
@@ -904,6 +930,9 @@ tts_buffer_heap_store_tuple(TupleTableSlot *slot, HeapTuple tuple,
 		 */
 		ReleaseBuffer(buffer);
 	}
+
+	/* tts_nvalid = 0 */
+	ExecFreePreDetoastDatum(slot);
 }

 /*
@@ -1150,7 +1179,10 @@ MakeTupleTableSlot(TupleDesc tupleDesc,
 			 + MAXALIGN(tupleDesc->natts * sizeof(Datum)));

 		PinTupleDesc(tupleDesc);
+		slot->pre_detoasted_attrs = bitset_init(tupleDesc->natts);
 	}
+	else
+		slot->pre_detoasted_attrs = NULL;

 	/*
 	 * And allow slot type specific initialization.
@@ -1288,6 +1320,8 @@ void
 ExecSetSlotDescriptor(TupleTableSlot *slot, /* slot to change */
 					  TupleDesc tupdesc)	/* new tuple descriptor */
 {
+	MemoryContext old;
+
 	Assert(!TTS_FIXED(slot));

 	/* For safety, make sure slot is empty before changing it */
@@ -1304,6 +1338,8 @@ ExecSetSlotDescriptor(TupleTableSlot *slot, /* slot to change */
 		pfree(slot->tts_values);
 	if (slot->tts_isnull)
 		pfree(slot->tts_isnull);
+	if (slot->pre_detoasted_attrs)
+		bitset_free(slot->pre_detoasted_attrs);

 	/*
 	 * Install the new descriptor; if it's refcounted, bump its refcount.
@@ -1319,6 +1355,10 @@ ExecSetSlotDescriptor(TupleTableSlot *slot, /* slot to change */
 		MemoryContextAlloc(slot->tts_mcxt, tupdesc->natts * sizeof(Datum));
 	slot->tts_isnull = (bool *)
 		MemoryContextAlloc(slot->tts_mcxt, tupdesc->natts * sizeof(bool));
+
+	old = MemoryContextSwitchTo(slot->tts_mcxt);
+	slot->pre_detoasted_attrs = bitset_init(tupdesc->natts);
+	MemoryContextSwitchTo(old);
 }

 /* --------------------------------
@@ -1810,12 +1850,26 @@ void
 ExecInitScanTupleSlot(EState *estate, ScanState *scanstate,
 					  TupleDesc tupledesc, const TupleTableSlotOps *tts_ops)
 {
+	Scan *splan = (Scan *) scanstate->ps.plan;
+
 	scanstate->ss_ScanTupleSlot = ExecAllocTableSlot(&estate->es_tupleTable,
 													 tupledesc, tts_ops);
 	scanstate->ps.scandesc = tupledesc;
 	scanstate->ps.scanopsfixed = tupledesc != NULL;
 	scanstate->ps.scanops = tts_ops;
 	scanstate->ps.scanopsset = true;
+
+	if (is_scan_plan((Plan *) splan))
+	{
+		/*
+		 * Detoasting may happen in the qual or in the projection, but both
+		 * operate on ss_ScanTupleSlot rather than ps_ResultTupleSlot, so
+		 * only ss_ScanTupleSlot needs to be taken care of.
+		 */
+		scanstate->scan_pre_detoast_attrs = cal_final_pre_detoast_attrs(splan->reference_attrs,
+																		tupledesc,
+																		splan->plan.forbid_pre_detoast_vars);
+	}
 }

 /* ----------------
@@ -2336,3 +2390,83 @@ end_tup_output(TupOutputState *tstate)
 	ExecDropSingleTupleTableSlot(tstate->slot);
 	pfree(tstate);
 }
+
+/*
+ * cal_final_pre_detoast_attrs
+ *		Calculate the final attributes for which pre-detoasting is helpful.
+ *
+ * reference_attrs: the attributes which will be detoasted at this plan level.
+ * Due to an implementation limitation, some non-toastable attributes may be
+ * included; they are filtered out using tupleDesc.
+ *
+ * forbid_pre_detoast_vars: the vars which must not be pre-detoasted because
+ * of the small_tlist restriction.
+ */
+static Bitmapset *
+cal_final_pre_detoast_attrs(Bitmapset *reference_attrs,
+							TupleDesc tupleDesc,
+							List *forbid_pre_detoast_vars)
+{
+	Bitmapset *final = NULL,
+			   *toast_attrs = NULL,
+			   *forbid_pre_detoast_attrs = NULL;
+
+	int i;
+	ListCell *lc;
+
+	if (bms_is_empty(reference_attrs))
+		return NULL;
+
+	/*
+	 * The exact data types are not known at the create_plan or set_plan_refs
+	 * stage, so reference_attrs may contain attributes that are not
+	 * toastable at all; those are removed here.
+	 */
+	for (i = 0; i < tupleDesc->natts; i++)
+	{
+		Form_pg_attribute attr = TupleDescAttr(tupleDesc, i);
+
+		if (attr->attlen == -1 && attr->attstorage != TYPSTORAGE_PLAIN)
+			toast_attrs = bms_add_member(toast_attrs, attr->attnum - 1);
+	}
+
+	/* Filter out the non-toastable attributes. */
+	final = bms_intersect(reference_attrs, toast_attrs);
+
+	/*
+	 * Detoasting a datum makes the tuple bigger, which is bad for nodes
+	 * like Sort/Hash. To avoid a performance regression, such attributes
+	 * are removed as well.
+	 */
+	foreach(lc, forbid_pre_detoast_vars)
+	{
+		Var *var = lfirst_node(Var, lc);
+
+		forbid_pre_detoast_attrs = bms_add_member(forbid_pre_detoast_attrs, var->varattno - 1);
+	}
+
+	final = bms_del_members(final, forbid_pre_detoast_attrs);
+
+	bms_free(toast_attrs);
+	bms_free(forbid_pre_detoast_attrs);
+
+	return final;
+}
+
+
+void
+SetPredetoastAttrsForJoin(JoinState *j)
+{
+	PlanState *outerstate = outerPlanState(j);
+	PlanState *innerstate = innerPlanState(j);
+
+	j->outer_pre_detoast_attrs = cal_final_pre_detoast_attrs(
+		((Join *) j->ps.plan)->outer_reference_attrs,
+		outerstate->ps_ResultTupleDesc,
+		outerstate->plan->forbid_pre_detoast_vars);
+
+	j->inner_pre_detoast_attrs = cal_final_pre_detoast_attrs(
+		((Join *) j->ps.plan)->inner_reference_attrs,
+		innerstate->ps_ResultTupleDesc,
+		innerstate->plan->forbid_pre_detoast_vars);
+}
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index cff5dc723e..a8646ded02 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -572,6 +572,8 @@ ExecConditionalAssignProjectionInfo(PlanState *planstate, TupleDesc inputDesc,
 		planstate->resultopsset = planstate->scanopsset;
 		planstate->resultopsfixed = planstate->scanopsfixed;
 		planstate->resultops = planstate->scanops;
+
+		Assert(planstate->ps_ResultTupleDesc != NULL);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 1cbec4647c..19a05ed624 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -756,6 +756,8 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
 	innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
 	innerDesc = ExecGetResultType(innerPlanState(hjstate));

+	SetPredetoastAttrsForJoin((JoinState *) hjstate);
+
 	/*
 	 * Initialize result slot, type and projection.
 	 */
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index c1a8ca2464..be7cbd7f30 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1497,6 +1497,8 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
 											  (eflags | EXEC_FLAG_MARK));
 	innerDesc = ExecGetResultType(innerPlanState(mergestate));

+	SetPredetoastAttrsForJoin((JoinState *) mergestate);
+
 	/*
 	 * For certain types of inner child nodes, it is advantageous to issue
 	 * MARK every time we advance past an inner tuple we will never return to.
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index 06fa0a9b31..2d40d19192 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -306,6 +306,7 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
 	 */
 	ExecInitResultTupleSlotTL(&nlstate->js.ps, &TTSOpsVirtual);
 	ExecAssignProjectionInfo(&nlstate->js.ps, NULL);
+	SetPredetoastAttrsForJoin((JoinState *) nlstate);

 	/*
 	 * initialize child expressions
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 0c448422e2..74563c3454 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -396,30 +396,52 @@ llvm_compile_expr(ExprState *state)
 			case EEOP_INNER_VAR:
 			case EEOP_OUTER_VAR:
 			case EEOP_SCAN_VAR:
+			case EEOP_INNER_VAR_TOAST:
+			case EEOP_OUTER_VAR_TOAST:
+			case EEOP_SCAN_VAR_TOAST:
 				{
 					LLVMValueRef value,
 								isnull;
 					LLVMValueRef v_attnum;
 					LLVMValueRef v_values;
 					LLVMValueRef v_nulls;
+					LLVMValueRef v_slot;

-					if (opcode == EEOP_INNER_VAR)
+					if (opcode == EEOP_INNER_VAR || opcode == EEOP_INNER_VAR_TOAST)
 					{
+						v_slot = v_innerslot;
 						v_values = v_innervalues;
 						v_nulls = v_innernulls;
 					}
-					else if (opcode == EEOP_OUTER_VAR)
+					else if (opcode == EEOP_OUTER_VAR || opcode == EEOP_OUTER_VAR_TOAST)
 					{
+						v_slot = v_outerslot;
 						v_values = v_outervalues;
 						v_nulls = v_outernulls;
 					}
 					else
 					{
+						v_slot = v_scanslot;
 						v_values = v_scanvalues;
 						v_nulls = v_scannulls;
 					}

 					v_attnum = l_int32_const(lc, op->d.var.attnum);
+
+					if (opcode == EEOP_INNER_VAR_TOAST ||
+						opcode == EEOP_OUTER_VAR_TOAST ||
+						opcode == EEOP_SCAN_VAR_TOAST)
+					{
+						LLVMValueRef params[2];
+
+						params[0] = v_slot;
+						params[1] = l_int32_const(lc, op->d.var.attnum);
+						l_call(b,
+							   llvm_pg_var_func_type("ExecSlotDetoastDatumExternal"),
+							   llvm_pg_func(mod, "ExecSlotDetoastDatumExternal"),
+							   params, lengthof(params), "");
+					}
+
 					value = l_load_gep1(b, TypeSizeT, v_values, v_attnum, "");
 					isnull = l_load_gep1(b, TypeStorageBool, v_nulls, v_attnum, "");
 					LLVMBuildStore(b, value, v_resvaluep);
diff --git a/src/backend/jit/llvm/llvmjit_types.c b/src/backend/jit/llvm/llvmjit_types.c
index 47c9daf402..1dcf0c2fd8 100644
--- a/src/backend/jit/llvm/llvmjit_types.c
+++ b/src/backend/jit/llvm/llvmjit_types.c
@@ -178,4 +178,5 @@ void	   *referenced_functions[] =
 	strlen,
 	varsize_any,
 	ExecInterpExprStillValid,
+	ExecSlotDetoastDatumExternal,
 };
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 610f4a56d6..8acb48240e 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -314,7 +314,9 @@ static ModifyTable *make_modifytable(PlannerInfo *root, Plan *subplan,
 									 List *mergeActionLists, int epqParam);
 static GatherMerge *create_gather_merge_plan(PlannerInfo *root,
 											 GatherMergePath *best_path);
-
+static void set_plan_forbid_pre_detoast_vars_recurse(Plan *plan,
+													 List *small_tlist);
+static void set_plan_not_pre_detoast_vars(Plan *plan, List *small_tlist);

 /*
  * create_plan
@@ -346,6 +348,12 @@ create_plan(PlannerInfo *root, Path *best_path)
 	/* Recursively process the path tree, demanding the correct tlist result */
 	plan = create_plan_recurse(root, best_path, CP_EXACT_TLIST);

+	/*
+	 * Once the plan tree is completely built, walk it to find which
+	 * expressions should not use the shared-detoast feature.
+	 */
+	set_plan_forbid_pre_detoast_vars_recurse(plan, NIL);
+
 	/*
 	 * Make sure the topmost plan node's targetlist exposes the original
 	 * column names and other decorative info.  Targetlists generated within
@@ -378,6 +386,101 @@ create_plan(PlannerInfo *root, Path *best_path)
 	return plan;
 }

+/*
+ * set_plan_forbid_pre_detoast_vars_recurse
+ *	 Walk the Plan tree top-down to gather the vars which should be kept
+ *	 as small as possible, and record them in Plan.forbid_pre_detoast_vars.
+ *
+ * plan: the plan node being walked.
+ * small_tlist: a list of nodes which the subplan should provide in as
+ * small a form as possible.
+ */
+static void
+set_plan_forbid_pre_detoast_vars_recurse(Plan *plan, List *small_tlist)
+{
+	if (plan == NULL)
+		return;
+
+	set_plan_not_pre_detoast_vars(plan, small_tlist);
+
+	/* Recurse into the subplans. */
+	if (IsA(plan, Sort) || IsA(plan, Memoize) || IsA(plan, WindowAgg) ||
+		IsA(plan, Hash) || IsA(plan, Material) || IsA(plan, IncrementalSort))
+	{
+		List *small_tlist = get_tlist_exprs(plan->lefttree->targetlist, true);
+
+		/*
+		 * For sort-like nodes, we want the output of the subplan to be as
+		 * small as possible, but the subplan's other expressions, such as
+		 * its quals, have no such restriction since they are not output to
+		 * the upper nodes. So set small_tlist to the subplan's targetlist.
+		 */
+		set_plan_forbid_pre_detoast_vars_recurse(plan->lefttree, small_tlist);
+	}
+	else if (IsA(plan, HashJoin) && castNode(HashJoin, plan)->left_small_tlist)
+	{
+		List *small_tlist = get_tlist_exprs(plan->lefttree->targetlist, true);
+
+		/*
+		 * If left_small_tlist asks for a tlist that is as small as possible,
+		 * handle the left node the same way as for sort-like nodes.
+		 */
+		set_plan_forbid_pre_detoast_vars_recurse(plan->lefttree, small_tlist);
+
+		/*
+		 * The righttree is a Hash node, which is handled by its own rule, so
+		 * the small_tlist provided is not important; we just need to recurse
+		 * to its subplan.
+		 */
+		set_plan_forbid_pre_detoast_vars_recurse(plan->righttree, plan->forbid_pre_detoast_vars);
+	}
+	else
+	{
+		/*
+		 * Otherwise just push the forbid_pre_detoast_vars down to the
+		 * children.
+		 */
+		set_plan_forbid_pre_detoast_vars_recurse(plan->lefttree, plan->forbid_pre_detoast_vars);
+		set_plan_forbid_pre_detoast_vars_recurse(plan->righttree, plan->forbid_pre_detoast_vars);
+	}
+}
+
+/*
+ * set_plan_not_pre_detoast_vars
+ *
+ *	Set Plan.forbid_pre_detoast_vars according to the small_tlist information.
+ *
+ * small_tlist = NIL means nothing is forbidden; otherwise, a Var that belongs
+ * to small_tlist must not be pre-detoasted.
+ */
+static void
+set_plan_not_pre_detoast_vars(Plan *plan, List *small_tlist)
+{
+	ListCell *lc;
+	Var *var;
+
+	/*
+	 * Fast path: without a small_tlist, no Var in the targetlist can be a
+	 * member of it. This is expected to be a pretty common case.
+	 */
+	if (small_tlist == NIL)
+		return;
+
+	foreach(lc, plan->targetlist)
+	{
+		TargetEntry *te = lfirst_node(TargetEntry, lc);
+
+		if (!IsA(te->expr, Var))
+			continue;
+		var = castNode(Var, te->expr);
+		if (var->varattno <= 0)
+			continue;
+		if (list_member(small_tlist, var))
+			/* pass the recheck */
+			plan->forbid_pre_detoast_vars = lappend(plan->forbid_pre_detoast_vars, var);
+	}
+}
+
 /*
  * create_plan_recurse
  *	  Recursive guts of create_plan().
@@ -4893,6 +4996,8 @@ create_hashjoin_plan(PlannerInfo *root,

 	copy_generic_path_info(&join_plan->join.plan, &best_path->jpath.path);

+	join_plan->left_small_tlist = (best_path->num_batches > 1);
+
 	return join_plan;
 }

diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 22a1fa29f3..f9b30903c2 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -27,6 +27,7 @@
 #include "optimizer/tlist.h"
 #include "parser/parse_relation.h"
 #include "tcop/utility.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"

@@ -55,11 +56,48 @@ typedef struct
 	tlist_vinfo vars[FLEXIBLE_ARRAY_MEMBER];	/* has num_vars entries */
 } indexed_tlist;

+/*
+ * Decide which attrs are detoasted at the expression level; this is judged
+ * at the fix_scan/join_expr stage. The recursion level is tracked while we
+ * walk down to a Var; if the level is greater than 1, it means the
+ * Var needs a detoast in this expression list. There are some exceptions
+ * here, see increase_level_for_pre_detoast for details.
+ */
+typedef struct
+{
+	/* whether the level is added during a certain walk. */
+	bool level_added;
+	/* the current level during the walk. */
+	int level;
+} intermediate_level_context;
+
+/*
+ * Context to hold the detoast attributes within an expression.
+ *
+ * XXX: this design was intended to avoid the pre-detoast logic if the Var
+ * only needs to be detoasted *once*, but for now, this context is only
+ * maintained at the expression level rather than the plan tree level, so it
+ * can't detect if a Var will be detoasted 2+ times at the plan level.
+ * Recording the number of times a Var is detoasted at the plan tree level is
+ * complex, so before we decide it is a must, I am not willing to make too
+ * many changes here.
+ */
+typedef struct
+{
+	/* var is accessed for the first time. */
+	Bitmapset *existing_attrs;
+	/* var is accessed 2+ times. */
+	Bitmapset **final_ref_attrs;
+} intermediate_var_ref_context;
+
+
 typedef struct
 {
 	PlannerInfo *root;
 	int rtoffset;
 	double num_exec;
+	intermediate_level_context level_ctx;
+	intermediate_var_ref_context scan_reference_attrs;
 } fix_scan_expr_context;

 typedef struct
@@ -71,6 +109,9 @@ typedef struct
 	int rtoffset;
 	NullingRelsMatch nrm_match;
 	double num_exec;
+	intermediate_level_context level_ctx;
+	intermediate_var_ref_context outer_reference_attrs;
+	intermediate_var_ref_context inner_reference_attrs;
 } fix_join_expr_context;

 typedef struct
@@ -127,8 +168,8 @@ typedef struct
 	(((con)->consttype == REGCLASSOID || (con)->consttype == OIDOID) && \
 	 !(con)->constisnull)

-#define fix_scan_list(root, lst, rtoffset, num_exec) \
-	((List *) fix_scan_expr(root, (Node *) (lst), rtoffset, num_exec))
+#define fix_scan_list(root, lst, rtoffset, num_exec, pre_detoast_attrs) \
+	((List *) fix_scan_expr(root, (Node *) (lst), rtoffset, num_exec, pre_detoast_attrs))

 static void add_rtes_to_flat_rtable(PlannerInfo *root, bool recursing);
 static void flatten_unplanned_rtes(PlannerGlobal *glob, RangeTblEntry *rte);
@@ -158,7 +199,8 @@ static Plan *set_mergeappend_references(PlannerInfo *root,
 static void set_hash_references(PlannerInfo *root, Plan *plan, int rtoffset);
 static Relids offset_relid_set(Relids relids, int rtoffset);
 static Node *fix_scan_expr(PlannerInfo *root, Node *node,
-						   int rtoffset, double num_exec);
+						   int rtoffset, double num_exec,
+						   Bitmapset **scan_reference_attrs);
 static Node *fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context);
 static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context);
 static void set_join_references(PlannerInfo *root, Join *join, int rtoffset);
@@ -190,7 +232,10 @@ static List *fix_join_expr(PlannerInfo *root,
 						   Index acceptable_rel,
 						   int rtoffset,
 						   NullingRelsMatch nrm_match,
-						   double num_exec);
+						   double num_exec,
+						   Bitmapset **outer_reference_attrs,
+						   Bitmapset **inner_reference_attrs);
+
 static Node *fix_join_expr_mutator(Node *node,
 								   fix_join_expr_context *context);
 static Node *fix_upper_expr(PlannerInfo *root,
@@ -628,10 +673,16 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist,
-								  rtoffset, NUM_EXEC_TLIST(plan));
+								  rtoffset, NUM_EXEC_TLIST(plan),
+								  &splan->scan.reference_attrs);
 				splan->scan.plan.qual =
 					fix_scan_list(root, splan->scan.plan.qual,
-								  rtoffset, NUM_EXEC_QUAL(plan));
+								  rtoffset, NUM_EXEC_QUAL(plan),
+								  &splan->scan.reference_attrs);
+
+				splan->scan.plan.forbid_pre_detoast_vars =
+					fix_scan_list(root, splan->scan.plan.forbid_pre_detoast_vars,
+								  rtoffset, NUM_EXEC_TLIST(plan), NULL);
 			}
 			break;
 		case T_SampleScan:
@@ -641,13 +692,20 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist,
-								  rtoffset, NUM_EXEC_TLIST(plan));
+								  rtoffset, NUM_EXEC_TLIST(plan),
+								  &splan->scan.reference_attrs
+						);
 				splan->scan.plan.qual =
 					fix_scan_list(root, splan->scan.plan.qual,
-								  rtoffset, NUM_EXEC_QUAL(plan));
+								  rtoffset, NUM_EXEC_QUAL(plan),
+								  &splan->scan.reference_attrs);
 				splan->tablesample = (TableSampleClause *)
 					fix_scan_expr(root, (Node *) splan->tablesample,
-								  rtoffset, 1);
+								  rtoffset, 1,
+								  &splan->scan.reference_attrs);
+				splan->scan.plan.forbid_pre_detoast_vars =
+					fix_scan_list(root, splan->scan.plan.forbid_pre_detoast_vars,
+								  rtoffset, NUM_EXEC_TLIST(plan), NULL);
 			}
 			break;
 		case T_IndexScan:
@@ -657,28 +715,40 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist,
-								  rtoffset, NUM_EXEC_TLIST(plan));
+								  rtoffset, NUM_EXEC_TLIST(plan),
+								  &splan->scan.reference_attrs);
+
 				splan->scan.plan.qual =
 					fix_scan_list(root, splan->scan.plan.qual,
-								  rtoffset, NUM_EXEC_QUAL(plan));
+								  rtoffset, NUM_EXEC_QUAL(plan),
+								  &splan->scan.reference_attrs);
+
 				splan->indexqual =
 					fix_scan_list(root, splan->indexqual,
-								  rtoffset, 1);
+								  rtoffset, 1, &splan->scan.reference_attrs);
 				splan->indexqualorig =
 					fix_scan_list(root, splan->indexqualorig,
-								  rtoffset, NUM_EXEC_QUAL(plan));
+								  rtoffset, NUM_EXEC_QUAL(plan),
+								  &splan->scan.reference_attrs);
 				splan->indexorderby =
 					fix_scan_list(root, splan->indexorderby,
-								  rtoffset, 1);
+								  rtoffset, 1, &splan->scan.reference_attrs);
 				splan->indexorderbyorig =
 					fix_scan_list(root, splan->indexorderbyorig,
-								  rtoffset, NUM_EXEC_QUAL(plan));
+								  rtoffset, NUM_EXEC_QUAL(plan), &splan->scan.reference_attrs);
+				splan->scan.plan.forbid_pre_detoast_vars =
+					fix_scan_list(root, splan->scan.plan.forbid_pre_detoast_vars,
+								  rtoffset, NUM_EXEC_TLIST(plan), NULL);
 			}
 			break;
 		case T_IndexOnlyScan:
 			{
 				IndexOnlyScan *splan = (IndexOnlyScan *) plan;

+				splan->scan.plan.forbid_pre_detoast_vars =
+					fix_scan_list(root, splan->scan.plan.forbid_pre_detoast_vars,
+								  rtoffset, NUM_EXEC_TLIST(plan), NULL);
+
 				return set_indexonlyscan_references(root, splan, rtoffset);
 			}
 			break;
@@ -691,10 +761,15 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 				Assert(splan->scan.plan.targetlist == NIL);
 				Assert(splan->scan.plan.qual == NIL);
 				splan->indexqual =
-					fix_scan_list(root, splan->indexqual, rtoffset, 1);
+					fix_scan_list(root, splan->indexqual, rtoffset, 1,
+								  &splan->scan.reference_attrs);
 				splan->indexqualorig =
 					fix_scan_list(root, splan->indexqualorig,
-								  rtoffset, NUM_EXEC_QUAL(plan));
+								  rtoffset, NUM_EXEC_QUAL(plan),
+								  &splan->scan.reference_attrs);
+				splan->scan.plan.forbid_pre_detoast_vars =
+					fix_scan_list(root, splan->scan.plan.forbid_pre_detoast_vars,
+								  rtoffset, NUM_EXEC_TLIST(plan), NULL);
 			}
 			break;
 		case T_BitmapHeapScan:
@@ -704,13 +779,20 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist,
-								  rtoffset, NUM_EXEC_TLIST(plan));
+								  rtoffset, NUM_EXEC_TLIST(plan),
+								  &splan->scan.reference_attrs);
 				splan->scan.plan.qual =
 					fix_scan_list(root, splan->scan.plan.qual,
-								  rtoffset, NUM_EXEC_QUAL(plan));
+								  rtoffset, NUM_EXEC_QUAL(plan),
+								  &splan->scan.reference_attrs);
 				splan->bitmapqualorig =
 					fix_scan_list(root, splan->bitmapqualorig,
-								  rtoffset, NUM_EXEC_QUAL(plan));
+								  rtoffset, NUM_EXEC_QUAL(plan),
+								  &splan->scan.reference_attrs);
+				splan->scan.plan.forbid_pre_detoast_vars =
+					fix_scan_list(root, splan->scan.plan.forbid_pre_detoast_vars,
+								  rtoffset, NUM_EXEC_TLIST(plan),
+								  NULL);
 			}
 			break;
 		case T_TidScan:
@@ -720,13 +802,20 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist,
-								  rtoffset, NUM_EXEC_TLIST(plan));
+								  rtoffset, NUM_EXEC_TLIST(plan),
+								  &splan->scan.reference_attrs);
 				splan->scan.plan.qual =
 					fix_scan_list(root, splan->scan.plan.qual,
-								  rtoffset, NUM_EXEC_QUAL(plan));
+								  rtoffset, NUM_EXEC_QUAL(plan),
+								  &splan->scan.reference_attrs);
 				splan->tidquals =
 					fix_scan_list(root, splan->tidquals,
-								  rtoffset, 1);
+								  rtoffset, 1,
+								  &splan->scan.reference_attrs);
+				splan->scan.plan.forbid_pre_detoast_vars =
+					fix_scan_list(root, splan->scan.plan.forbid_pre_detoast_vars,
+								  rtoffset, NUM_EXEC_TLIST(plan),
+								  NULL);
 			}
 			break;
 		case T_TidRangeScan:
@@ -736,13 +825,20 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist,
-								  rtoffset, NUM_EXEC_TLIST(plan));
+								  rtoffset, NUM_EXEC_TLIST(plan),
+								  &splan->scan.reference_attrs);
 				splan->scan.plan.qual =
 					fix_scan_list(root, splan->scan.plan.qual,
-								  rtoffset, NUM_EXEC_QUAL(plan));
+								  rtoffset, NUM_EXEC_QUAL(plan),
+								  &splan->scan.reference_attrs);
 				splan->tidrangequals =
 					fix_scan_list(root, splan->tidrangequals,
-								  rtoffset, 1);
+								  rtoffset, 1,
+								  &splan->scan.reference_attrs);
+				splan->scan.plan.forbid_pre_detoast_vars =
+					fix_scan_list(root, splan->scan.plan.forbid_pre_detoast_vars,
+								  rtoffset, NUM_EXEC_TLIST(plan),
+								  NULL);
 			}
 			break;
 		case T_SubqueryScan:
@@ -757,12 +853,16 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist,
-								  rtoffset, NUM_EXEC_TLIST(plan));
+								  rtoffset, NUM_EXEC_TLIST(plan),
+								  &splan->scan.reference_attrs);
 				splan->scan.plan.qual =
 					fix_scan_list(root, splan->scan.plan.qual,
-								  rtoffset, NUM_EXEC_QUAL(plan));
+								  rtoffset, NUM_EXEC_QUAL(plan),
+								  &splan->scan.reference_attrs);
 				splan->functions =
-					fix_scan_list(root, splan->functions, rtoffset, 1);
+					fix_scan_list(root, splan->functions, rtoffset, 1,
+								  &splan->scan.reference_attrs);
+
 			}
 			break;
 		case T_TableFuncScan:
@@ -772,13 +872,17 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist,
-								  rtoffset, NUM_EXEC_TLIST(plan));
+								  rtoffset, NUM_EXEC_TLIST(plan),
+								  &splan->scan.reference_attrs);
 				splan->scan.plan.qual =
 					fix_scan_list(root, splan->scan.plan.qual,
-								  rtoffset, NUM_EXEC_QUAL(plan));
+								  rtoffset, NUM_EXEC_QUAL(plan),
+								  &splan->scan.reference_attrs);
+
 				splan->tablefunc = (TableFunc *)
 					fix_scan_expr(root, (Node *) splan->tablefunc,
-								  rtoffset, 1);
+								  rtoffset, 1,
+								  &splan->scan.reference_attrs);
 			}
 			break;
 		case T_ValuesScan:
@@ -788,13 +892,16 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist,
-								  rtoffset, NUM_EXEC_TLIST(plan));
+								  rtoffset, NUM_EXEC_TLIST(plan),
+								  &splan->scan.reference_attrs);
 				splan->scan.plan.qual =
 					fix_scan_list(root, splan->scan.plan.qual,
-								  rtoffset, NUM_EXEC_QUAL(plan));
+								  rtoffset, NUM_EXEC_QUAL(plan),
+								  &splan->scan.reference_attrs);
 				splan->values_lists =
 					fix_scan_list(root, splan->values_lists,
-								  rtoffset, 1);
+								  rtoffset, 1,
+								  &splan->scan.reference_attrs);
} break; case T_CteScan: @@ -804,10 +911,16 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) splan->scan.scanrelid += rtoffset; splan->scan.plan.targetlist = fix_scan_list(root, splan->scan.plan.targetlist, - rtoffset, NUM_EXEC_TLIST(plan)); + rtoffset, NUM_EXEC_TLIST(plan), + &splan->scan.reference_attrs); splan->scan.plan.qual = fix_scan_list(root, splan->scan.plan.qual, - rtoffset, NUM_EXEC_QUAL(plan)); + rtoffset, NUM_EXEC_QUAL(plan), + &splan->scan.reference_attrs); + splan->scan.plan.forbid_pre_detoast_vars = + fix_scan_list(root, splan->scan.plan.forbid_pre_detoast_vars, + rtoffset, NUM_EXEC_TLIST(plan), + NULL); } break; case T_NamedTuplestoreScan: @@ -817,10 +930,12 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) splan->scan.scanrelid += rtoffset; splan->scan.plan.targetlist = fix_scan_list(root, splan->scan.plan.targetlist, - rtoffset, NUM_EXEC_TLIST(plan)); + rtoffset, NUM_EXEC_TLIST(plan), + &splan->scan.reference_attrs); splan->scan.plan.qual = fix_scan_list(root, splan->scan.plan.qual, - rtoffset, NUM_EXEC_QUAL(plan)); + rtoffset, NUM_EXEC_QUAL(plan), + &splan->scan.reference_attrs); } break; case T_WorkTableScan: @@ -830,10 +945,12 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) splan->scan.scanrelid += rtoffset; splan->scan.plan.targetlist = fix_scan_list(root, splan->scan.plan.targetlist, - rtoffset, NUM_EXEC_TLIST(plan)); + rtoffset, NUM_EXEC_TLIST(plan), + &splan->scan.reference_attrs); splan->scan.plan.qual = fix_scan_list(root, splan->scan.plan.qual, - rtoffset, NUM_EXEC_QUAL(plan)); + rtoffset, NUM_EXEC_QUAL(plan), + &splan->scan.reference_attrs); } break; case T_ForeignScan: @@ -873,7 +990,8 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) mplan->param_exprs = fix_scan_list(root, mplan->param_exprs, rtoffset, - NUM_EXEC_TLIST(plan)); + NUM_EXEC_TLIST(plan), + NULL); break; } @@ -933,9 +1051,9 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) Assert(splan->plan.qual == NIL); 
splan->limitOffset = - fix_scan_expr(root, splan->limitOffset, rtoffset, 1); + fix_scan_expr(root, splan->limitOffset, rtoffset, 1, NULL); splan->limitCount = - fix_scan_expr(root, splan->limitCount, rtoffset, 1); + fix_scan_expr(root, splan->limitCount, rtoffset, 1, NULL); } break; case T_Agg: @@ -988,17 +1106,17 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) * variable refs, so fix_scan_expr works for them. */ wplan->startOffset = - fix_scan_expr(root, wplan->startOffset, rtoffset, 1); + fix_scan_expr(root, wplan->startOffset, rtoffset, 1, NULL); wplan->endOffset = - fix_scan_expr(root, wplan->endOffset, rtoffset, 1); + fix_scan_expr(root, wplan->endOffset, rtoffset, 1, NULL); wplan->runCondition = fix_scan_list(root, wplan->runCondition, rtoffset, - NUM_EXEC_TLIST(plan)); + NUM_EXEC_TLIST(plan), NULL); wplan->runConditionOrig = fix_scan_list(root, wplan->runConditionOrig, rtoffset, - NUM_EXEC_TLIST(plan)); + NUM_EXEC_TLIST(plan), NULL); } break; case T_Result: @@ -1038,14 +1156,14 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) splan->plan.targetlist = fix_scan_list(root, splan->plan.targetlist, - rtoffset, NUM_EXEC_TLIST(plan)); + rtoffset, NUM_EXEC_TLIST(plan), NULL); splan->plan.qual = fix_scan_list(root, splan->plan.qual, - rtoffset, NUM_EXEC_QUAL(plan)); + rtoffset, NUM_EXEC_QUAL(plan), NULL); } /* resconstantqual can't contain any subplan variable refs */ splan->resconstantqual = - fix_scan_expr(root, splan->resconstantqual, rtoffset, 1); + fix_scan_expr(root, splan->resconstantqual, rtoffset, 1, NULL); } break; case T_ProjectSet: @@ -1061,7 +1179,7 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) splan->withCheckOptionLists = fix_scan_list(root, splan->withCheckOptionLists, - rtoffset, 1); + rtoffset, 1, NULL); if (splan->returningLists) { @@ -1118,18 +1236,20 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) fix_join_expr(root, splan->onConflictSet, NULL, itlist, linitial_int(splan->resultRelations), 
- rtoffset, NRM_EQUAL, NUM_EXEC_QUAL(plan)); + rtoffset, NRM_EQUAL, NUM_EXEC_QUAL(plan), + NULL, NULL); splan->onConflictWhere = (Node *) fix_join_expr(root, (List *) splan->onConflictWhere, NULL, itlist, linitial_int(splan->resultRelations), - rtoffset, NRM_EQUAL, NUM_EXEC_QUAL(plan)); + rtoffset, NRM_EQUAL, NUM_EXEC_QUAL(plan), + NULL, NULL); pfree(itlist); splan->exclRelTlist = - fix_scan_list(root, splan->exclRelTlist, rtoffset, 1); + fix_scan_list(root, splan->exclRelTlist, rtoffset, 1, NULL); } /* @@ -1182,7 +1302,8 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) resultrel, rtoffset, NRM_EQUAL, - NUM_EXEC_TLIST(plan)); + NUM_EXEC_TLIST(plan), + NULL, NULL); /* Fix quals too. */ action->qual = (Node *) fix_join_expr(root, @@ -1191,7 +1312,8 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) resultrel, rtoffset, NRM_EQUAL, - NUM_EXEC_QUAL(plan)); + NUM_EXEC_QUAL(plan), + NULL, NULL); } } } @@ -1356,13 +1478,16 @@ set_indexonlyscan_references(PlannerInfo *root, NUM_EXEC_QUAL((Plan *) plan)); /* indexqual is already transformed to reference index columns */ plan->indexqual = fix_scan_list(root, plan->indexqual, - rtoffset, 1); + rtoffset, 1, + &plan->scan.reference_attrs); /* indexorderby is already transformed to reference index columns */ plan->indexorderby = fix_scan_list(root, plan->indexorderby, - rtoffset, 1); + rtoffset, 1, + &plan->scan.reference_attrs); /* indextlist must NOT be transformed to reference index columns */ plan->indextlist = fix_scan_list(root, plan->indextlist, - rtoffset, NUM_EXEC_TLIST((Plan *) plan)); + rtoffset, NUM_EXEC_TLIST((Plan *) plan), + &plan->scan.reference_attrs); pfree(index_itlist); @@ -1409,10 +1534,10 @@ set_subqueryscan_references(PlannerInfo *root, plan->scan.scanrelid += rtoffset; plan->scan.plan.targetlist = fix_scan_list(root, plan->scan.plan.targetlist, - rtoffset, NUM_EXEC_TLIST((Plan *) plan)); + rtoffset, NUM_EXEC_TLIST((Plan *) plan), NULL); plan->scan.plan.qual = fix_scan_list(root, 
plan->scan.plan.qual, - rtoffset, NUM_EXEC_QUAL((Plan *) plan)); + rtoffset, NUM_EXEC_QUAL((Plan *) plan), NULL); result = (Plan *) plan; } @@ -1612,7 +1737,7 @@ set_foreignscan_references(PlannerInfo *root, /* fdw_scan_tlist itself just needs fix_scan_list() adjustments */ fscan->fdw_scan_tlist = fix_scan_list(root, fscan->fdw_scan_tlist, - rtoffset, NUM_EXEC_TLIST((Plan *) fscan)); + rtoffset, NUM_EXEC_TLIST((Plan *) fscan), NULL); } else { @@ -1622,16 +1747,16 @@ set_foreignscan_references(PlannerInfo *root, */ fscan->scan.plan.targetlist = fix_scan_list(root, fscan->scan.plan.targetlist, - rtoffset, NUM_EXEC_TLIST((Plan *) fscan)); + rtoffset, NUM_EXEC_TLIST((Plan *) fscan), NULL); fscan->scan.plan.qual = fix_scan_list(root, fscan->scan.plan.qual, - rtoffset, NUM_EXEC_QUAL((Plan *) fscan)); + rtoffset, NUM_EXEC_QUAL((Plan *) fscan), NULL); fscan->fdw_exprs = fix_scan_list(root, fscan->fdw_exprs, - rtoffset, NUM_EXEC_QUAL((Plan *) fscan)); + rtoffset, NUM_EXEC_QUAL((Plan *) fscan), NULL); fscan->fdw_recheck_quals = fix_scan_list(root, fscan->fdw_recheck_quals, - rtoffset, NUM_EXEC_QUAL((Plan *) fscan)); + rtoffset, NUM_EXEC_QUAL((Plan *) fscan), NULL); } fscan->fs_relids = offset_relid_set(fscan->fs_relids, rtoffset); @@ -1690,20 +1815,20 @@ set_customscan_references(PlannerInfo *root, /* custom_scan_tlist itself just needs fix_scan_list() adjustments */ cscan->custom_scan_tlist = fix_scan_list(root, cscan->custom_scan_tlist, - rtoffset, NUM_EXEC_TLIST((Plan *) cscan)); + rtoffset, NUM_EXEC_TLIST((Plan *) cscan), NULL); } else { /* Adjust tlist, qual, custom_exprs in the standard way */ cscan->scan.plan.targetlist = fix_scan_list(root, cscan->scan.plan.targetlist, - rtoffset, NUM_EXEC_TLIST((Plan *) cscan)); + rtoffset, NUM_EXEC_TLIST((Plan *) cscan), NULL); cscan->scan.plan.qual = fix_scan_list(root, cscan->scan.plan.qual, - rtoffset, NUM_EXEC_QUAL((Plan *) cscan)); + rtoffset, NUM_EXEC_QUAL((Plan *) cscan), NULL); cscan->custom_exprs = fix_scan_list(root, 
cscan->custom_exprs, - rtoffset, NUM_EXEC_QUAL((Plan *) cscan)); + rtoffset, NUM_EXEC_QUAL((Plan *) cscan), NULL); } /* Adjust child plan-nodes recursively, if needed */ @@ -2111,6 +2236,95 @@ fix_alternative_subplan(PlannerInfo *root, AlternativeSubPlan *asplan, return (Node *) bestplan; } + +static inline void +setup_intermediate_level_ctx(intermediate_level_context *ctx) +{ + ctx->level = 0; + ctx->level_added = false; +} + +static inline void +setup_intermediate_var_ref_ctx(intermediate_var_ref_context *ctx, Bitmapset **final_ref_attrs) +{ + ctx->existing_attrs = NULL; + ctx->final_ref_attrs = final_ref_attrs; +} + +/* + * increase_level_for_pre_detoast + * Check whether the given node could detoast a Var directly. If so, + * increase the level and set level_added; otherwise clear level_added. + */ +static inline void +increase_level_for_pre_detoast(Node *node, intermediate_level_context *ctx) +{ + /* The following nodes cannot detoast a Var directly. */ + if (IsA(node, List) || IsA(node, TargetEntry) || IsA(node, NullTest)) + { + ctx->level_added = false; + } + else if (IsA(node, FuncExpr) && castNode(FuncExpr, node)->funcid == F_PG_COLUMN_COMPRESSION) + { + /* don't detoast first, so that pg_column_compression still works. */ + ctx->level_added = false; + } + else + { + ctx->level_added = true; + ctx->level += 1; + } +} + +static inline void +decreased_level_for_pre_detoast(intermediate_level_context *ctx) +{ + if (ctx->level_added) + ctx->level -= 1; + + ctx->level_added = false; +} + +/* + * add_pre_detoast_vars + * add the var's information to pre_detoast_attrs when the checks pass. 
+ */ +static inline void +add_pre_detoast_vars(intermediate_level_context *level_ctx, + intermediate_var_ref_context *ctx, + Var *var) +{ + int attno; + + if (level_ctx->level <= 1 || ctx->final_ref_attrs == NULL || var->varattno <= 0) + return; + + attno = var->varattno - 1; + if (bms_is_member(attno, ctx->existing_attrs)) + { + /* not the first access; add it to the final result. */ + *ctx->final_ref_attrs = bms_add_member(*ctx->final_ref_attrs, attno); + } + else + { + /* first access. */ + ctx->existing_attrs = bms_add_member(ctx->existing_attrs, attno); + + /* + * XXX: + * + * The above strategy doesn't detect whether a Var is detoasted + * twice, for two reasons: 1. the context is not maintained at the Plan + * node level, so a Var detoasted in both targetlist and qual goes + * undetected. 2. even if it were kept per plan node, that still would + * not cover the cross-node case. + * + * So for now, I just disable it. + */ + *ctx->final_ref_attrs = bms_add_member(*ctx->final_ref_attrs, attno); + } +} + /* * fix_scan_expr * Do set_plan_references processing on a scan-level expression @@ -2125,18 +2339,23 @@ fix_alternative_subplan(PlannerInfo *root, AlternativeSubPlan *asplan, * 'node': the expression to be modified * 'rtoffset': how much to increment varnos by * 'num_exec': estimated number of executions of expression + * 'scan_reference_attrs': collects the Vars that this expr may detoast; + * NULL means the caller is not interested in them. * * The expression tree is either copied-and-modified, or modified in-place * if that seems safe. 
*/ static Node * -fix_scan_expr(PlannerInfo *root, Node *node, int rtoffset, double num_exec) +fix_scan_expr(PlannerInfo *root, Node *node, int rtoffset, + double num_exec, Bitmapset **scan_reference_attrs) { fix_scan_expr_context context; context.root = root; context.rtoffset = rtoffset; context.num_exec = num_exec; + setup_intermediate_level_ctx(&context.level_ctx); + setup_intermediate_var_ref_ctx(&context.scan_reference_attrs, scan_reference_attrs); if (rtoffset != 0 || root->multiexpr_params != NIL || @@ -2167,8 +2386,13 @@ fix_scan_expr(PlannerInfo *root, Node *node, int rtoffset, double num_exec) static Node * fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context) { + Node *n; + if (node == NULL) return NULL; + + increase_level_for_pre_detoast(node, &context->level_ctx); + if (IsA(node, Var)) { Var *var = copyVar((Var *) node); @@ -2186,10 +2410,16 @@ fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context) var->varno += context->rtoffset; if (var->varnosyn > 0) var->varnosyn += context->rtoffset; + + add_pre_detoast_vars(&context->level_ctx, &context->scan_reference_attrs, var); + decreased_level_for_pre_detoast(&context->level_ctx); return (Node *) var; } if (IsA(node, Param)) + { + decreased_level_for_pre_detoast(&context->level_ctx); return fix_param_node(context->root, (Param *) node); + } if (IsA(node, Aggref)) { Aggref *aggref = (Aggref *) node; @@ -2199,8 +2429,10 @@ fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context) aggparam = find_minmax_agg_replacement_param(context->root, aggref); if (aggparam != NULL) { + decreased_level_for_pre_detoast(&context->level_ctx); /* Make a copy of the Param for paranoia's sake */ return (Node *) copyObject(aggparam); + } /* If no match, just fall through to process it normally */ } @@ -2210,6 +2442,7 @@ fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context) Assert(!IS_SPECIAL_VARNO(cexpr->cvarno)); cexpr->cvarno += context->rtoffset; + 
decreased_level_for_pre_detoast(&context->level_ctx); return (Node *) cexpr; } if (IsA(node, PlaceHolderVar)) @@ -2218,29 +2451,52 @@ fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context) PlaceHolderVar *phv = (PlaceHolderVar *) node; /* XXX can we assert something about phnullingrels? */ - return fix_scan_expr_mutator((Node *) phv->phexpr, context); + Node *n2 = fix_scan_expr_mutator((Node *) phv->phexpr, context); + + decreased_level_for_pre_detoast(&context->level_ctx); + return n2; } if (IsA(node, AlternativeSubPlan)) - return fix_scan_expr_mutator(fix_alternative_subplan(context->root, - (AlternativeSubPlan *) node, - context->num_exec), - context); + { + Node *n2 = fix_scan_expr_mutator(fix_alternative_subplan(context->root, + (AlternativeSubPlan *) node, + context->num_exec), + context); + + decreased_level_for_pre_detoast(&context->level_ctx); + return n2; + } fix_expr_common(context->root, node); - return expression_tree_mutator(node, fix_scan_expr_mutator, - (void *) context); + n = expression_tree_mutator(node, fix_scan_expr_mutator, (void *) context); + decreased_level_for_pre_detoast(&context->level_ctx); + return n; } static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context) { + bool ret; + if (node == NULL) return false; + + increase_level_for_pre_detoast(node, &context->level_ctx); + + if (IsA(node, Var)) + { + add_pre_detoast_vars(&context->level_ctx, + &context->scan_reference_attrs, + castNode(Var, node)); + } Assert(!(IsA(node, Var) && ((Var *) node)->varno == ROWID_VAR)); Assert(!IsA(node, PlaceHolderVar)); Assert(!IsA(node, AlternativeSubPlan)); fix_expr_common(context->root, node); - return expression_tree_walker(node, fix_scan_expr_walker, - (void *) context); + ret = expression_tree_walker(node, fix_scan_expr_walker, + (void *) context); + + decreased_level_for_pre_detoast(&context->level_ctx); + return ret; } /* @@ -2276,7 +2532,10 @@ set_join_references(PlannerInfo *root, Join *join, int rtoffset) (Index) 0, 
rtoffset, NRM_EQUAL, - NUM_EXEC_QUAL((Plan *) join)); + NUM_EXEC_QUAL((Plan *) join), + &join->outer_reference_attrs, + &join->inner_reference_attrs + ); /* Now do join-type-specific stuff */ if (IsA(join, NestLoop)) @@ -2323,7 +2582,9 @@ set_join_references(PlannerInfo *root, Join *join, int rtoffset) (Index) 0, rtoffset, NRM_EQUAL, - NUM_EXEC_QUAL((Plan *) join)); + NUM_EXEC_QUAL((Plan *) join), + &join->outer_reference_attrs, + &join->inner_reference_attrs); } else if (IsA(join, HashJoin)) { @@ -2336,7 +2597,9 @@ set_join_references(PlannerInfo *root, Join *join, int rtoffset) (Index) 0, rtoffset, NRM_EQUAL, - NUM_EXEC_QUAL((Plan *) join)); + NUM_EXEC_QUAL((Plan *) join), + &join->outer_reference_attrs, + &join->inner_reference_attrs); /* * HashJoin's hashkeys are used to look for matching tuples from its @@ -2368,7 +2631,9 @@ set_join_references(PlannerInfo *root, Join *join, int rtoffset) (Index) 0, rtoffset, (join->jointype == JOIN_INNER ? NRM_EQUAL : NRM_SUPERSET), - NUM_EXEC_TLIST((Plan *) join)); + NUM_EXEC_TLIST((Plan *) join), + &join->outer_reference_attrs, + &join->inner_reference_attrs); join->plan.qual = fix_join_expr(root, join->plan.qual, outer_itlist, @@ -2376,8 +2641,20 @@ set_join_references(PlannerInfo *root, Join *join, int rtoffset) (Index) 0, rtoffset, (join->jointype == JOIN_INNER ? NRM_EQUAL : NRM_SUPERSET), - NUM_EXEC_QUAL((Plan *) join)); - + NUM_EXEC_QUAL((Plan *) join), + &join->outer_reference_attrs, + &join->inner_reference_attrs); + + join->plan.forbid_pre_detoast_vars = fix_join_expr(root, + join->plan.forbid_pre_detoast_vars, + outer_itlist, + inner_itlist, + (Index) 0, + rtoffset, + (join->jointype == JOIN_INNER ? 
NRM_EQUAL : NRM_SUPERSET), + NUM_EXEC_TLIST((Plan *) join), + NULL, + NULL); pfree(outer_itlist); pfree(inner_itlist); } @@ -3010,9 +3287,12 @@ fix_join_expr(PlannerInfo *root, Index acceptable_rel, int rtoffset, NullingRelsMatch nrm_match, - double num_exec) + double num_exec, + Bitmapset **outer_reference_attrs, + Bitmapset **inner_reference_attrs) { fix_join_expr_context context; + List *ret; context.root = root; context.outer_itlist = outer_itlist; @@ -3021,16 +3301,30 @@ fix_join_expr(PlannerInfo *root, context.rtoffset = rtoffset; context.nrm_match = nrm_match; context.num_exec = num_exec; - return (List *) fix_join_expr_mutator((Node *) clauses, &context); + + setup_intermediate_level_ctx(&context.level_ctx); + setup_intermediate_var_ref_ctx(&context.outer_reference_attrs, outer_reference_attrs); + setup_intermediate_var_ref_ctx(&context.inner_reference_attrs, inner_reference_attrs); + + ret = (List *) fix_join_expr_mutator((Node *) clauses, &context); + + bms_free(context.outer_reference_attrs.existing_attrs); + bms_free(context.inner_reference_attrs.existing_attrs); + + return ret; } static Node * fix_join_expr_mutator(Node *node, fix_join_expr_context *context) { Var *newvar; + Node *ret_node; if (node == NULL) return NULL; + + increase_level_for_pre_detoast(node, &context->level_ctx); + if (IsA(node, Var)) { Var *var = (Var *) node; @@ -3044,7 +3338,13 @@ fix_join_expr_mutator(Node *node, fix_join_expr_context *context) context->rtoffset, context->nrm_match); if (newvar) + { + add_pre_detoast_vars(&context->level_ctx, + &context->outer_reference_attrs, + newvar); + decreased_level_for_pre_detoast(&context->level_ctx); return (Node *) newvar; + } } /* then in the inner. 
*/ @@ -3056,7 +3356,13 @@ fix_join_expr_mutator(Node *node, fix_join_expr_context *context) context->rtoffset, context->nrm_match); if (newvar) + { + add_pre_detoast_vars(&context->level_ctx, + &context->inner_reference_attrs, + newvar); + decreased_level_for_pre_detoast(&context->level_ctx); return (Node *) newvar; + } } /* If it's for acceptable_rel, adjust and return it */ @@ -3066,6 +3372,9 @@ fix_join_expr_mutator(Node *node, fix_join_expr_context *context) var->varno += context->rtoffset; if (var->varnosyn > 0) var->varnosyn += context->rtoffset; + /* XXX acceptable_rel? we can ignore it for safety. */ + decreased_level_for_pre_detoast(&context->level_ctx); + return (Node *) var; } @@ -3084,22 +3393,38 @@ fix_join_expr_mutator(Node *node, fix_join_expr_context *context) OUTER_VAR, context->nrm_match); if (newvar) + { + add_pre_detoast_vars(&context->level_ctx, + &context->outer_reference_attrs, + newvar); + decreased_level_for_pre_detoast(&context->level_ctx); return (Node *) newvar; + } } if (context->inner_itlist && context->inner_itlist->has_ph_vars) { + newvar = search_indexed_tlist_for_phv(phv, context->inner_itlist, INNER_VAR, context->nrm_match); if (newvar) + { + add_pre_detoast_vars(&context->level_ctx, + &context->inner_reference_attrs, + newvar); + decreased_level_for_pre_detoast(&context->level_ctx); return (Node *) newvar; + } } /* If not supplied by input plans, evaluate the contained expr */ /* XXX can we assert something about phnullingrels? 
*/ - return fix_join_expr_mutator((Node *) phv->phexpr, context); + ret_node = fix_join_expr_mutator((Node *) phv->phexpr, context); + decreased_level_for_pre_detoast(&context->level_ctx); + return ret_node; } + /* Try matching more complex expressions too, if tlists have any */ if (context->outer_itlist && context->outer_itlist->has_non_vars) { @@ -3107,7 +3432,13 @@ fix_join_expr_mutator(Node *node, fix_join_expr_context *context) context->outer_itlist, OUTER_VAR); if (newvar) + { + add_pre_detoast_vars(&context->level_ctx, + &context->outer_reference_attrs, + newvar); + decreased_level_for_pre_detoast(&context->level_ctx); return (Node *) newvar; + } } if (context->inner_itlist && context->inner_itlist->has_non_vars) { @@ -3115,20 +3446,36 @@ fix_join_expr_mutator(Node *node, fix_join_expr_context *context) context->inner_itlist, INNER_VAR); if (newvar) + { + add_pre_detoast_vars(&context->level_ctx, + &context->inner_reference_attrs, + newvar); + decreased_level_for_pre_detoast(&context->level_ctx); return (Node *) newvar; + } } /* Special cases (apply only AFTER failing to match to lower tlist) */ if (IsA(node, Param)) - return fix_param_node(context->root, (Param *) node); + { + ret_node = fix_param_node(context->root, (Param *) node); + decreased_level_for_pre_detoast(&context->level_ctx); + return ret_node; + } if (IsA(node, AlternativeSubPlan)) - return fix_join_expr_mutator(fix_alternative_subplan(context->root, - (AlternativeSubPlan *) node, - context->num_exec), - context); + { + ret_node = fix_join_expr_mutator(fix_alternative_subplan(context->root, + (AlternativeSubPlan *) node, + context->num_exec), + context); + decreased_level_for_pre_detoast(&context->level_ctx); + return ret_node; + } fix_expr_common(context->root, node); - return expression_tree_mutator(node, - fix_join_expr_mutator, - (void *) context); + ret_node = expression_tree_mutator(node, + fix_join_expr_mutator, + (void *) context); + 
decreased_level_for_pre_detoast(&context->level_ctx); + return ret_node; } /* @@ -3163,7 +3510,8 @@ fix_join_expr_mutator(Node *node, fix_join_expr_context *context) * varno = newvarno, varattno = resno of corresponding targetlist element. * The original tree is not modified. */ -static Node * +static Node * /* XXX: shall I care about this for shared + * detoast optimization? */ fix_upper_expr(PlannerInfo *root, Node *node, indexed_tlist *subplan_itlist, @@ -3318,7 +3666,10 @@ set_returning_clause_references(PlannerInfo *root, resultRelation, rtoffset, NRM_EQUAL, - NUM_EXEC_TLIST(topplan)); + NUM_EXEC_TLIST(topplan), + NULL, + NULL + ); pfree(itlist); diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h index a28ddcdd77..9304786bb2 100644 --- a/src/include/executor/execExpr.h +++ b/src/include/executor/execExpr.h @@ -78,6 +78,17 @@ typedef enum ExprEvalOp EEOP_OUTER_VAR, EEOP_SCAN_VAR, + /* + * compute non-system Var value with shared-detoast-datum logic. Dedicated + * steps are used, rather than extra logic in the existing steps, for + * performance: this way we decide whether the extra logic is needed + * once, at the ExecInitExpr stage, rather than on every + * ExecInterpExpr call. 
+ */ + EEOP_INNER_VAR_TOAST, + EEOP_OUTER_VAR_TOAST, + EEOP_SCAN_VAR_TOAST, + /* compute system Var value */ EEOP_INNER_SYSVAR, EEOP_OUTER_SYSVAR, @@ -830,5 +841,6 @@ extern void ExecEvalAggOrderedTransDatum(ExprState *state, ExprEvalStep *op, ExprContext *econtext); extern void ExecEvalAggOrderedTransTuple(ExprState *state, ExprEvalStep *op, ExprContext *econtext); +extern void ExecSlotDetoastDatumExternal(TupleTableSlot *slot, int attnum); #endif /* EXEC_EXPR_H */ diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h index 6133dbcd0a..d87751e460 100644 --- a/src/include/executor/tuptable.h +++ b/src/include/executor/tuptable.h @@ -18,6 +18,7 @@ #include "access/htup_details.h" #include "access/sysattr.h" #include "access/tupdesc.h" +#include "nodes/bitmapset.h" #include "storage/buf.h" /*---------- @@ -128,6 +129,25 @@ typedef struct TupleTableSlot MemoryContext tts_mcxt; /* slot itself is in this context */ ItemPointerData tts_tid; /* stored tuple's tid */ Oid tts_tableOid; /* table oid of tuple */ + + /* + * The attributes whose values in tts_values[*] are the detoasted version. + * This memory needs extra clean-up. It can't be put into + * ecxt_per_tuple_memory since much of it needs a longer life span, for + * example the Datum in an outer join. It is allocated in + * TupleTableSlot.tts_mcxt and freed whenever tts_values[*] is + * invalidated. + * + * Bitset rather than Bitmapset is chosen here because when all the + * members of a Bitmapset are deleted, the allocated memory is + * deallocated automatically, which is too expensive in this case since we + * need to delete all the members in each ExecClearTuple and repopulate + * them when filling the detoasted datums into tts_values[*]. This + * happens again and again in an execution cycle. + * + * These values are populated by EEOP_{INNER/OUTER/SCAN}_VAR_TOAST steps. 
+ */ + Bitset *pre_detoasted_attrs; } TupleTableSlot; /* routines for a TupleTableSlot implementation */ @@ -426,12 +446,36 @@ slot_getsysattr(TupleTableSlot *slot, int attnum, bool *isnull) return slot->tts_ops->getsysattr(slot, attnum, isnull); } +/* + * ExecFreePreDetoastDatum - free the memory allocated for pre-detoasted datums. + */ +static inline void +ExecFreePreDetoastDatum(TupleTableSlot *slot) +{ + int attnum; + + attnum = -1; + while ((attnum = bitset_next_member(slot->pre_detoasted_attrs, attnum)) >= 0) + { + pfree((void *) slot->tts_values[attnum]); + } + + /* + * unset the bits but keep the memory for later use; this is important + * for performance. + */ + bitset_clear(slot->pre_detoasted_attrs); +} + + /* * ExecClearTuple - clear the slot's contents */ static inline TupleTableSlot * ExecClearTuple(TupleTableSlot *slot) { + ExecFreePreDetoastDatum(slot); + slot->tts_ops->clear(slot); return slot; @@ -450,6 +494,10 @@ ExecClearTuple(TupleTableSlot *slot) static inline void ExecMaterializeSlot(TupleTableSlot *slot) { + /* + * XXX: pre_detoasted_attrs doesn't depend on any external storage, so + * nothing needs to be done here. 
+ */ slot->tts_ops->materialize(slot); } @@ -494,6 +542,25 @@ ExecCopySlot(TupleTableSlot *dstslot, TupleTableSlot *srcslot) dstslot->tts_ops->copyslot(dstslot, srcslot); + if (dstslot->tts_nvalid > 0 && srcslot->tts_nvalid > 0) + { + int attnum = -1; + MemoryContext old = MemoryContextSwitchTo(dstslot->tts_mcxt); + + dstslot->pre_detoasted_attrs = bitset_copy(srcslot->pre_detoasted_attrs); + + while ((attnum = bitset_next_member(dstslot->pre_detoasted_attrs, attnum)) >= 0) + { + struct varlena *datum = (struct varlena *) srcslot->tts_values[attnum]; + Size len; + + Assert(!VARATT_IS_EXTENDED(datum)); + len = VARSIZE(datum); + dstslot->tts_values[attnum] = (Datum) palloc(len); + memcpy((void *) dstslot->tts_values[attnum], datum, len); + } + MemoryContextSwitchTo(old); + } return dstslot; } diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index 444a5f0fd5..30fdb37d1c 100644 --- a/src/include/nodes/execnodes.h +++ b/src/include/nodes/execnodes.h @@ -1481,6 +1481,12 @@ typedef struct ScanState Relation ss_currentRelation; struct TableScanDescData *ss_currentScanDesc; TupleTableSlot *ss_ScanTupleSlot; + + /* + * The final attributes which should apply the pre-detoast-attrs logic on + * the Scan nodes. + */ + Bitmapset *scan_pre_detoast_attrs; } ScanState; /* ---------------- @@ -2010,6 +2016,13 @@ typedef struct JoinState bool single_match; /* True if we should skip to next outer tuple * after finding one inner match */ ExprState *joinqual; /* JOIN quals (in addition to ps.qual) */ + + /* + * The final attributes which should apply the pre-detoast-attrs logic on + * the join nodes. 
+ */ + Bitmapset *outer_pre_detoast_attrs; + Bitmapset *inner_pre_detoast_attrs; } JoinState; /* ---------------- @@ -2771,4 +2784,5 @@ typedef struct LimitState TupleTableSlot *last_slot; /* slot for evaluation of ties */ } LimitState; +extern void SetPredetoastAttrsForJoin(JoinState *joinstate); #endif /* EXECNODES_H */ diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h index b4ef6bc44c..ea5033aaa0 100644 --- a/src/include/nodes/plannodes.h +++ b/src/include/nodes/plannodes.h @@ -169,6 +169,13 @@ typedef struct Plan */ Bitmapset *extParam; Bitmapset *allParam; + + /* + * A list of Vars to which the shared-detoast-datum logic should not be + * applied, since upper nodes like Sort/Hash want them as small as possible. + * It's a subset of the targetlist in each Plan node. + */ + List *forbid_pre_detoast_vars; } Plan; /* ---------------- @@ -385,6 +392,16 @@ typedef struct Scan Plan plan; Index scanrelid; /* relid is index into the range table */ + + /* + * Records varattno - 1 for each Var that is accessed inside an + * expression, like a > 3. However, a IS [NOT] NULL is not included + * since it doesn't access tts_values[*] at all. + * + * This is essential information for figuring out which attrs should use + * the pre-detoast-attrs logic. + */ + Bitmapset *reference_attrs; } Scan; /* ---------------- @@ -789,6 +806,17 @@ typedef struct Join JoinType jointype; bool inner_unique; List *joinqual; /* JOIN quals (in addition to plan.qual) */ + + /* + * Records varattno - 1 for each Var that is accessed inside an + * expression, like a > 3. However, a IS [NOT] NULL is not included + * since it doesn't access tts_values[*] at all. + * + * This is essential information for figuring out which attrs should use + * the pre-detoast-attrs logic. 
+ */ + Bitmapset *outer_reference_attrs; + Bitmapset *inner_reference_attrs; } Join; /* ---------------- @@ -869,6 +897,11 @@ typedef struct HashJoin * perform lookups in the hashtable over the inner plan. */ List *hashkeys; + + /* + * Whether the left plan tree should use a SMALL_TLIST. + */ + bool left_small_tlist; } HashJoin; /* ---------------- @@ -1588,4 +1621,24 @@ typedef enum MonotonicFunction MONOTONICFUNC_BOTH = MONOTONICFUNC_INCREASING | MONOTONICFUNC_DECREASING, } MonotonicFunction; +static inline bool +is_join_plan(Plan *plan) +{ + return (plan != NULL) && (IsA(plan, NestLoop) || IsA(plan, HashJoin) || IsA(plan, MergeJoin)); +} + +static inline bool +is_scan_plan(Plan *plan) +{ + return (plan != NULL) && + (IsA(plan, SeqScan) || + IsA(plan, SampleScan) || + IsA(plan, IndexScan) || + IsA(plan, IndexOnlyScan) || + IsA(plan, BitmapIndexScan) || + IsA(plan, BitmapHeapScan) || + IsA(plan, TidScan) || + IsA(plan, SubqueryScan)); +} + #endif /* PLANNODES_H */ diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index dcba329ee4..308f19d61a 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -4047,6 +4047,8 @@ cb_cleanup_dir cb_options cb_tablespace cb_tablespace_mapping +intermediate_var_ref_context +intermediate_level_context manifest_data manifest_writer rfile -- 2.34.1
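For anyone following the level tracking done by increase_level_for_pre_detoast() and add_pre_detoast_vars() in the setrefs.c part of the patch, here is a minimal standalone sketch of the idea. It is not the patch's code. All Toy* and toy_* names are hypothetical stand-ins for the PostgreSQL node types, a uint64_t stands in for the Bitmapset, and the nesting level is passed by value rather than kept in a shared context (a simplification of mine). The rule it illustrates: a Var is recorded only when an enclosing expression reads its value, so IS NULL tests and bare target-list entries do not count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy node kinds; not PostgreSQL's NodeTag values. */
typedef enum
{
	TOY_VAR,
	TOY_CONST,
	TOY_OP,
	TOY_NULLTEST,
	TOY_TARGETENTRY
} ToyKind;

typedef struct ToyNode
{
	ToyKind		kind;
	int			attno;		/* 1-based attribute number, for TOY_VAR only */
	const struct ToyNode *left;
	const struct ToyNode *right;
} ToyNode;

/* Wrapper nodes that never read a Var's value do not raise the level. */
static bool
toy_raises_level(const ToyNode *node)
{
	return node->kind != TOY_NULLTEST && node->kind != TOY_TARGETENTRY;
}

/*
 * Walk the tree. A Var is added to *attrs (bit attno - 1) only when its
 * level is above 1, i.e. at least one enclosing node reads its value.
 */
static void
toy_collect(const ToyNode *node, int level, uint64_t *attrs)
{
	if (node == NULL)
		return;

	if (toy_raises_level(node))
		level++;			/* local copy, so nothing to undo on return */

	if (node->kind == TOY_VAR)
	{
		if (level > 1 && node->attno > 0 && node->attno <= 64)
			*attrs |= UINT64_C(1) << (node->attno - 1);
		return;
	}

	toy_collect(node->left, level, attrs);
	toy_collect(node->right, level, attrs);
}

/* Three toy expressions: "a > 3", "b IS NULL", and a bare target entry "c". */
static uint64_t
toy_demo(void)
{
	ToyNode		a = {TOY_VAR, 1, NULL, NULL};
	ToyNode		three = {TOY_CONST, 0, NULL, NULL};
	ToyNode		gt = {TOY_OP, 0, &a, &three};

	ToyNode		b = {TOY_VAR, 2, NULL, NULL};
	ToyNode		isnull = {TOY_NULLTEST, 0, &b, NULL};

	ToyNode		c = {TOY_VAR, 3, NULL, NULL};
	ToyNode		tle = {TOY_TARGETENTRY, 0, &c, NULL};

	uint64_t	attrs = 0;

	toy_collect(&gt, 0, &attrs);
	toy_collect(&isnull, 0, &attrs);
	toy_collect(&tle, 0, &attrs);
	return attrs;
}
```

Calling toy_demo() yields a mask with only bit 0 set: attribute 1 is read by the comparison, while attributes 2 and 3 are only tested for NULL or passed through, so they would not be candidates for pre-detoasting in this toy model.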