Re: Internal key management system
Hello Masahiko-san,

> I've started a new separate thread from the previous long thread[1] for
> internal key management system to PostgreSQL. As I mentioned in the
> previous mail[2], we've decided to step back and focus on only internal
> key management system for PG13. The internal key management system
> introduces the functionality to PostgreSQL that allows user to encrypt
> and decrypt data without knowing the actual key. Besides, it will be
> able to be integrated with transparent data encryption in the future.
>
> The basic idea is that PostgreSQL generates the master encryption key
> which is further protected by the user-provided passphrase. The key
> management system provides two functions to wrap and unwrap the secret
> by the master encryption key. A user generates a secret key locally

I understand that the secret key is sent in the clear for being encrypted
by a master key.

> and send it to PostgreSQL to wrap it using pg_kmgr_wrap() and save it
> somewhere. Then the user can use the encrypted secret key to encrypt
> data and decrypt data by something like:
>
>     INSERT INTO tbl VALUES (pg_encrypt('user data', pg_kmgr_unwrap('x')));
>     SELECT pg_decrypt(secret_column, pg_kmgr_unwrap('x')) FROM tbl;
>
> Where 'x' is the result of pg_kmgr_wrap function.

I'm lost. If pg_{en,de}crypt and pg_kmgr_unwrap are functions, what
prevents users from doing:

    SELECT pg_kmgr_unwrap('');

so as to recover the key, somehow in contradiction to "allows user to
encrypt and decrypt data without knowing the actual key".

When dealing with cryptography and key management, I can only recommend
extreme caution.

--
Fabien.
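For readers skimming the thread, a condensed sketch of the proposed flow
(pg_kmgr_wrap, pg_kmgr_unwrap, pg_encrypt and pg_decrypt are the functions
proposed in the quoted mail, not existing PostgreSQL functions; the literal
secret below is purely illustrative):

    -- 1. Wrap a locally generated secret once; the client keeps the wrapped form 'x'.
    SELECT pg_kmgr_wrap('my-locally-generated-secret');

    -- 2. Use the wrapped secret for encryption and decryption of column data.
    INSERT INTO tbl VALUES (pg_encrypt('user data', pg_kmgr_unwrap('x')));
    SELECT pg_decrypt(secret_column, pg_kmgr_unwrap('x')) FROM tbl;

    -- Fabien's objection in a nutshell: nothing stops the same role from
    -- running SELECT pg_kmgr_unwrap('x') directly and reading the secret.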
Re: psql - add SHOW_ALL_RESULTS option
Hello Tomas,

> This patch was marked as ready for committer, but clearly there's an
> ongoing discussion about what should be the default behavior, if this
> breaks existing apps etc. So I've marked it as "needs review" and moved
> it to the next CF.

The issue is that root (aka Tom) seems to be against the feature, and would
like to keep the current behavior. Although my opinion is that the previous
behavior is close to insane, I'm ready to resurrect the guc to control the
behavior so that the old behavior would still be possible, or even the
default.

Right now I'm waiting for an "I will not veto it on principle" from Tom
(I'm okay with a reject based on bad implementation) before spending more
time on it: although my time is given for free, that is not a good reason
to send it down the drain if there is a reject coming whatever I do.

Tom, would you consider the feature acceptable with a guc to control it?

--
Fabien.
Re: PATCH: add support for IN and @> in functional-dependency statistics use
On Saturday, February 1, 2020 3:24:46 PM CET Tomas Vondra wrote:
> On Sat, Feb 01, 2020 at 08:51:04AM +0100, Pierre Ducroquet wrote:
> >Hello
> >
> >At my current job, we have a lot of multi-tenant databases, thus with
> >tables containing a tenant_id column. Such a column introduces a severe
> >bias in statistics estimation since any other FK in the next columns is
> >very likely to have a functional dependency on the tenant id. We found
> >several queries where this functional dependency messed up the estimations
> >so much the optimizer chose wrong plans.
> >When we tried to use extended statistics with CREATE STATISTIC on
> >tenant_id, other_id, we noticed that the current implementation for
> >detecting functional dependency lacks two features (at least in our use
> >case):
> >- support for IN clauses
> >- support for the array contains operator (that could be considered as a
> >special case of IN)
>
> Thanks for the patch. I don't think the lack of support for these clause
> types is an oversight - we haven't done them because we were not quite
> sure the functional dependencies can actually apply to them. But maybe
> we can support them, I'm not against that in principle.
>
> >After digging in the source code, I think the lack of support for IN
> >clauses is an oversight and due to the fact that IN clauses are
> >ScalarArrayOpExpr instead of OpExpr. The attached patch fixes this by
> >simply copying the code path for OpExpr and changing the type name. It
> >compiles and the results are correct, with a dependency being correctly
> >taken into consideration when estimating rows. If you think such a copy
> >paste is bad and should be factored in another static bool function,
> >please say so and I will happily provide an updated patch.
>
> Hmmm. Consider a query like this:
>
>    ... WHERE tenant_id = 1 AND another_column IN (2,3)
>
> which kinda contradicts the idea of functional dependencies that knowing
> a value in one column, tells us something about a value in a second
> column. But that assumes a single value, which is not quite true here.
>
> The above query is essentially the same thing as
>
>    ... WHERE (tenant_id=1 AND (another_column=2 OR another_column=3))
>
> and also
>
>    ... WHERE (tenant_id=1 AND another_column=2)
>           OR (tenant_id=1 AND another_column=3)
>
> at which point we could apply functional dependencies - but we'd do it
> once for each AND-clause, and then combine the results to compute
> selectivity for the OR clause.
>
> But this means that if we apply functional dependencies directly to the
> original clause, it'll be inconsistent. Which seems a bit unfortunate.
>
> Or do I get this wrong?

In my tests, I've got a table with two columns a and b, generated this way:

CREATE TABLE test (a INT, b INT) AS
    SELECT i, i/10 FROM generate_series(1, 10) s(i), generate_series(1, 5) x;

with statistics defined on the a, b columns.

Here are the estimated selectivity results without any patch:

SELECT * FROM test WHERE a = 1 AND b = 1                        : 5
SELECT * FROM test WHERE a = 1 AND (b = 1 OR b = 2)             : 1
SELECT * FROM test WHERE (a = 1 AND b = 1) OR (a = 1 AND b = 2) : 1
SELECT * FROM test WHERE a = 1 AND b IN (1, 2)                  : 1

With the patch, the estimated rows of the last query goes back to 5, which
is more logical. The other ones don't change.

> BTW the code added in the 0001 patch is the same as for is_opclause, so
> maybe we can simply do
>
>     if (is_opclause(rinfo->clause) ||
>         IsA(rinfo->clause, ScalarArrayOpExpr))
>     {
>         ...
>     }
>
> instead of just duplicating the code.
I would love doing that, but the ScalarArrayOpExpr and OpExpr are not binary
compatible for the members used here. In ScalarArrayOpExpr, on AMD64, args is
at offset 24 and opno at 4, while they are at 32 and 4 in OpExpr. I can work
around with this kind of code, but I don't like it much:

    List   *args;
    Oid     opno;

    if (IsA(rinfo->clause, OpExpr))
    {
        /* If it's an opclause, we will check for Var = Const or Const = Var. */
        OpExpr *expr = (OpExpr *) rinfo->clause;

        args = expr->args;
        opno = expr->opno;
    }
    else if (IsA(rinfo->clause, ScalarArrayOpExpr))
    {
        /* If it's a ScalarArrayOpExpr, check for Var IN Const. */
        ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) rinfo->clause;

        args = expr->args;
        opno = expr->opno;
    }

Or I can rewrite it in C++ to play with templates... :)

> We also need at least some regression tests, testing functional
> dependencies with this clause type.

Agreed.

> >The lack of support for @> operator, on the other hand, seems to be a
> >decision taken when writing the initial code, but I can not find any
> >mathematical nor technical reason for it. The current code restricts
> >dependency calculation to the = operator, obviously because inequality
> >operators are not going to work... but array contains is just several =
> >operators grouped, thus the same for the
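For context, a minimal sketch of the kind of setup under discussion (table,
statistics and column names here are hypothetical, not from the patch;
CREATE STATISTICS ... (dependencies) is the standard syntax since v10):

    -- Hypothetical multi-tenant table with correlated columns.
    CREATE TABLE orders (tenant_id int, customer_id int, payload text);

    -- Functional-dependency statistics on the correlated pair.
    CREATE STATISTICS orders_dep (dependencies) ON tenant_id, customer_id FROM orders;
    ANALYZE orders;

    -- Estimated using the dependency today:
    EXPLAIN SELECT * FROM orders WHERE tenant_id = 1 AND customer_id = 42;

    -- The clause shape the patch adds support for; without it, the IN form
    -- is estimated as if the two columns were independent:
    EXPLAIN SELECT * FROM orders WHERE tenant_id = 1 AND customer_id IN (42, 43);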
Re: BUG #16171: Potential malformed JSON in explain output
> On 1 Feb 2020, at 20:37, Tom Lane wrote:
>
> Hamid Akhtar writes:
>> I've reviewed and verified this patch and IMHO, this is ready to be
>> committed.
>
> I took a look at this and I don't think it's really going in the right
> direction. ISTM the clear intent of this code was to attach the "Subplans
> Removed" item as a field of the parent [Merge]Append node, but the author
> forgot about the intermediate "Plans" array. So I think that, rather than
> doubling down on a mistake, we ought to move where the field is generated
> so that it *is* a field of the parent node.

Right, that makes sense; +1 on the attached 0001 patch.

> This does lead to some field
> order rearrangement in text mode, as per the regression test changes,
> but I think that's not a big deal. (A change can only happen if there
> are initplan(s) attached to the parent node.)

Does that prevent backpatching this, or are we Ok with EXPLAIN text output
not being stable across minors? AFAICT Pg::Explain still works fine with
this change, but mileage may vary for other parsers.

> 0002 attached isn't committable, because nobody would want the overhead
> in production, but it seems like a good trick to keep up our sleeves.

That's a neat trick. I wonder if it would be worth maintaining a curated
list of these tricks in a README under src/test to help others avoid/reduce
wheel reinventing?

cheers ./daniel
ALTER tbl rewrite loses CLUSTER ON index
Other options are preserved by ALTER (and CLUSTER ON is and most obviously
should be preserved by CLUSTER's rewrite), so I think (SET) CLUSTER should be
preserved by ALTER, too. As far as I can see, this should be the
responsibility of something in the vicinity of
ATPostAlterTypeParse/RememberIndexForRebuilding.

Attached patch sketches a fix.

ts=# SET client_min_messages=debug; DROP TABLE t; CREATE TABLE t(i int); CREATE INDEX ON t(i) WITH (fillfactor=11, vacuum_cleanup_index_scale_factor=12); CLUSTER t USING t_i_key; ALTER TABLE t ALTER i TYPE bigint; \d t
SET
DEBUG:  drop auto-cascades to type t
DEBUG:  drop auto-cascades to type t[]
DEBUG:  drop auto-cascades to index t_i_idx
DROP TABLE
CREATE TABLE
DEBUG:  building index "t_i_idx" on table "t" serially
CREATE INDEX
ERROR:  index "t_i_key" for table "t" does not exist
DEBUG:  rewriting table "t"
DEBUG:  building index "t_i_idx" on table "t" serially
DEBUG:  drop auto-cascades to type pg_temp_3091172777
DEBUG:  drop auto-cascades to type pg_temp_3091172777[]
ALTER TABLE
                 Table "public.t"
 Column |  Type  | Collation | Nullable | Default
--------+--------+-----------+----------+---------
 i      | bigint |           |          |
Indexes:
    "t_i_idx" btree (i) WITH (fillfactor='11', vacuum_cleanup_index_scale_factor='12')

From f235a691722a464059358cd6b1d744f75d7bf92f Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Sun, 2 Feb 2020 09:49:57 -0600
Subject: [PATCH v1] preserve CLUSTER ON during ALTER TABLE

---
 src/backend/commands/tablecmds.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f599393..c4e6cbd 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -11616,6 +11616,7 @@ RememberIndexForRebuilding(Oid indoid, AlteredTableInfo *tab)
 	}
 	else
 	{
+		Relation	indrel;
 		/* OK, capture the index's existing definition string */
 		char	   *defstring = pg_get_indexdef_string(indoid);
 
@@ -11623,6 +11624,18 @@ RememberIndexForRebuilding(Oid indoid, AlteredTableInfo *tab)
 											 indoid);
 		tab->changedIndexDefs = lappend(tab->changedIndexDefs,
 										defstring);
+		/* Preserve CLUSTER ON if set */
+		indrel = index_open(indoid, AccessShareLock);
+		if (indrel->rd_index->indisclustered) {
+			char buf[3*NAMEDATALEN + 24];
+			sprintf(buf, "ALTER TABLE %s CLUSTER ON %s",
+					get_rel_name(tab->relid), get_rel_name(indoid)); // XXX: schema
+			tab->changedIndexOids = lappend_oid(tab->changedIndexOids,
+												indoid);
+			tab->changedIndexDefs = lappend(tab->changedIndexDefs,
+											pstrdup(buf));
+		}
+		index_close(indrel, NoLock);
 	}
 }
 
@@ -11901,6 +11914,11 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
 			 * the new table definition.
 			 */
 		}
+		else if (cmd->subtype == AT_ClusterOn)
+		{
+			tab->subcmds[AT_PASS_OLD_INDEX] =
+				lappend(tab->subcmds[AT_PASS_OLD_INDEX], cmd);
+		}
 		else
 			elog(ERROR, "unexpected statement subtype: %d",
 				 (int) cmd->subtype);
-- 
2.7.4
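As a side note, one way to verify whether the CLUSTER marking survives the
type-changing rewrite is to look at the catalog directly (a standard query
against pg_index, not part of the patch):

    SELECT indexrelid::regclass, indisclustered
    FROM pg_index
    WHERE indrelid = 't'::regclass;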
Re: BUG #16171: Potential malformed JSON in explain output
[ cc'ing depesz to see what he thinks about this ] Daniel Gustafsson writes: > On 1 Feb 2020, at 20:37, Tom Lane wrote: >> This does lead to some field >> order rearrangement in text mode, as per the regression test changes, >> but I think that's not a big deal. (A change can only happen if there >> are initplan(s) attached to the parent node.) > Does that prevent backpatching this, or are we Ok with EXPLAIN text output not > being stable across minors? AFAICT Pg::Explain still works fine with this > change, but mileage may vary for other parsers. I'm not sure about that either. It should be a clear win for parsers of the non-text formats, because now we're generating valid JSON-or-whatever where we were not before. But it's not too hard to imagine that someone's ad-hoc parser of text output would break, depending on how much it relies on field order rather than indentation to make sense of things. In the background of all this is that cases where it matters must be pretty thin on the ground so far, else we'd have gotten complaints sooner. So we shouldn't really assume that everyone's parser handles such cases at all yet. I'm a little bit inclined to back-patch, on the grounds that JSON output is hopelessly broken without this, and any text-mode parsers that need work would need work by September anyway. But I could easily be argued into not back-patching. Another approach we could consider is putting your patch in the back branches and mine in HEAD. I'm not sure that's a good idea; it trades short-term stability of the text format for a long-term mess in the non-text formats. But maybe somebody will argue for it. Thoughts? regards, tom lane
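For anyone wanting to reproduce the plan shape in question, a hypothetical
example that yields an Append node with both an initplan and a "Subplans
Removed" annotation (whether run-time pruning actually kicks in can vary by
version and settings; the table and partition names are made up):

    CREATE TABLE measurements (id int, ts date) PARTITION BY RANGE (ts);
    CREATE TABLE measurements_2019 PARTITION OF measurements
        FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');
    CREATE TABLE measurements_2020 PARTITION OF measurements
        FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');

    -- The scalar subquery becomes an InitPlan, so pruning happens at executor
    -- startup, which is what produces "Subplans Removed":
    EXPLAIN (FORMAT JSON)
    SELECT * FROM measurements WHERE ts = (SELECT '2020-01-15'::date);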
TestLib condition for deleting temporary directories
Forking thread "logical decoding : exceeded maxAllocatedDescs for .spill
files" for this side issue:

On Wed, Jan 08, 2020 at 09:37:04PM -0800, Noah Misch wrote:
> v10 deletes PostgresNode base directories at the end of this test file,
> despite the failure[1].

> [1] It has the all_tests_passing() logic in an attempt to stop this.  I'm
> guessing it didn't help because the file failed by calling die "connection
> error: ...", not by reporting a failure to Test::More via ok(0) or similar.

That is what happened.  We should test the exit status to decide whether to
keep temporaries, as attached.  PostgresNode does that, since commit 90627cf
(thread https://postgr.es/m/flat/6205.1492883490%40sss.pgh.pa.us).  That
thread already discussed $SUBJECT[1] and the __DIE__ handler being
redundant[2].  I plan to back-patch, since it's most useful for v10 and v9.6.

[1] https://postgr.es/m/CAMsr+YFyFU=+mvfzqhthfmw22x5-h517e6ck6et+dt+x4bu...@mail.gmail.com
[2] https://postgr.es/m/fea925b2-c3ae-4ba9-9194-5f5616ad0...@yesql.se

Author:     Noah Misch
Commit:     Noah Misch

    When a TAP file has non-zero exit status, retain temporary directories.

    PostgresNode already retained base directories in such cases.  Stop using
    $SIG{__DIE__}, which is redundant with the exit status check, in lieu of
    proliferating it to TestLib.  Back-patch to 9.6, where commit
    88802e068017bee8cea7a5502a712794e761c7b5 introduced retention on failure.

    Reviewed by FIXME.

    Discussion: https://postgr.es/m/FIXME

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 2e0cf4a..ff972d1 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1206,20 +1206,6 @@ sub can_bind
 	return $ret;
 }
 
-# Retain the errno on die() if set, else assume a generic errno of 1.
-# This will instruct the END handler on how to handle artifacts left
-# behind from tests.
-$SIG{__DIE__} = sub {
-	if ($!)
-	{
-		$died = $!;
-	}
-	else
-	{
-		$died = 1;
-	}
-};
-
 # Automatically shut down any still-running nodes when the test script exits.
 # Note that this just stops the postmasters (in the same order the nodes were
 # created in).  Any temporary directories are deleted, in an unspecified
@@ -1238,8 +1224,7 @@ END
 		next if defined $ENV{'PG_TEST_NOCLEAN'};
 
 		# clean basedir on clean test invocation
-		$node->clean_node
-		  if TestLib::all_tests_passing() && !defined $died && !$exit_code;
+		$node->clean_node if $exit_code == 0 && TestLib::all_tests_passing();
 	}
 
 	$? = $exit_code;

diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index 458b801..65ee042 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -183,8 +183,13 @@ INIT
 
 END
 {
-	# Preserve temporary directory for this test on failure
-	$File::Temp::KEEP_ALL = 1 unless all_tests_passing();
+	# Test files have several ways of causing prove_check to fail:
+	# 1. Exit with a non-zero status.
+	# 2. Call ok(0) or similar, indicating that a constituent test failed.
+	# 3. Deviate from the planned number of tests.
+	#
+	# Preserve temporary directories after (1) and after (2).
+	$File::Temp::KEEP_ALL = 1 unless $? == 0 && all_tests_passing();
 }
 
 =pod
Re: TestLib condition for deleting temporary directories
> On 2 Feb 2020, at 18:01, Noah Misch wrote: > > Forking thread "logical decoding : exceeded maxAllocatedDescs for .spill > files" for this side issue: Thanks, I hadn't seen this. > On Wed, Jan 08, 2020 at 09:37:04PM -0800, Noah Misch wrote: >> v10 >> deletes PostgresNode base directories at the end of this test file, despite >> the failure[1]. > >> [1] It has the all_tests_passing() logic in an attempt to stop this. I'm >> guessing it didn't help because the file failed by calling die "connection >> error: ...", not by reporting a failure to Test::More via ok(0) or similar. > > That is what happened. We should test the exit status to decide whether to > keep temporaries, as attached. PostgresNode does that, since commit 90627cf > (thread https://postgr.es/m/flat/6205.1492883490%40sss.pgh.pa.us). That > thread already discussed $SUBJECT[1] and the __DIE__ handler being > redundant[2]. I plan to back-patch, since it's most useful for v10 and v9.6. I'm travelling and haven't been able to test, but this makes sense from reading. +1 on backpatching. cheers ./daniel
[PATCH] Fix for slow GIN index queries when "gin_fuzzy_search_limit" setting is relatively small for large tables
Hello all,

Like the title says, using "gin_fuzzy_search_limit" degrades speed when it
has a relatively low setting. What do I mean by "relatively low"? An example
would be when a table with a GIN index has many millions of rows and a
particular keyword search has 1,000,000 possible results because the keyword
is very common (or it's just that the table is so supremely large that even a
somewhat common keyword appears enough to return one million results).
However, you only want to return around 100 random results from that one
million, so you set gin_fuzzy_search_limit to 100. That limit is relatively
low when you look at the ratio of the limit value to the possible results:
100 / 1,000,000 = 0.0001. You'll find the query is very slow for such a low
ratio. It isn't so slow if gin_fuzzy_search_limit is 100 but the keyword
search has only a total of 10,000 possible results (resulting in a higher
ratio of 0.01).

This would explain why the documentation says that "From experience, values
in the thousands (e.g., 5000 — 20000) work well". It's not so common to have
queries that return large enough result sets such that gin_fuzzy_search_limit
values between 5,000 and 20,000 would result in low ratios and so result in
the performance issue I've observed (these gin_fuzzy_search_limit values have
relatively high ratios between 0.005 and 0.02 if you have 1,000,000 results
for a keyword search). However, if you desire a lower gin_fuzzy_search_limit
such as 100, while also having a relatively larger table, you'll find this
slowness issue.

I discussed this issue more and the reason for it in my original bug report:
https://www.postgresql.org/message-id/16220-1a0a4f0cb67ca...@postgresql.org

Attached is SQL to test and observe this issue and also attached is a patch I
want to eventually submit to a commitfest.

Best regards,
Adé

gin_fuzzy_search_limit_test.sql
Description: Binary data

ginget.patch
Description: Binary data
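A minimal sketch of the scenario described above, assuming a tsvector column
with a GIN index (names, keyword and data are illustrative; actual timings
obviously depend on the data set and on the attached test SQL):

    CREATE TABLE docs (id serial, body tsvector);
    CREATE INDEX docs_body_gin ON docs USING gin (body);

    -- A very common keyword with around a million matches, of which we only
    -- want roughly 100:
    SET gin_fuzzy_search_limit = 100;
    EXPLAIN ANALYZE
    SELECT count(*) FROM docs WHERE body @@ to_tsquery('common_keyword');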
Re: PATCH: add support for IN and @> in functional-dependency statistics use
On Sun, Feb 02, 2020 at 10:59:32AM +0100, Pierre Ducroquet wrote: On Saturday, February 1, 2020 3:24:46 PM CET Tomas Vondra wrote: On Sat, Feb 01, 2020 at 08:51:04AM +0100, Pierre Ducroquet wrote: >Hello > >At my current job, we have a lot of multi-tenant databases, thus with >tables containing a tenant_id column. Such a column introduces a severe >bias in statistics estimation since any other FK in the next columns is >very likely to have a functional dependency on the tenant id. We found >several queries where this functional dependency messed up the estimations >so much the optimizer chose wrong plans. >When we tried to use extended statistics with CREATE STATISTIC on >tenant_id, other_id, we noticed that the current implementation for >detecting functional dependency lacks two features (at least in our use >case): >- support for IN clauses >- support for the array contains operator (that could be considered as a >special case of IN) Thanks for the patch. I don't think the lack of support for these clause types is an oversight - we haven't done them because we were not quite sure the functional dependencies can actually apply to them. But maybe we can support them, I'm not against that in principle. >After digging in the source code, I think the lack of support for IN >clauses is an oversight and due to the fact that IN clauses are >ScalarArrayOpExpr instead of OpExpr. The attached patch fixes this by >simply copying the code- path for OpExpr and changing the type name. It >compiles and the results are correct, with a dependency being correctly >taken into consideration when estimating rows. If you think such a copy >paste is bad and should be factored in another static bool function, >please say so and I will happily provide an updated patch. Hmmm. Consider a query like this: ... WHERE tenant_id = 1 AND another_column IN (2,3) which kinda contradicts the idea of functional dependencies that knowing a value in one column, tells us something about a value in a second column. But that assumes a single value, which is not quite true here. The above query is essentially the same thing as ... WHERE (tenant_id=1 AND (another_column=2 OR another_column=3)) and also ... WHERE (tenant_id=1 AND another_column=2) OR (tenant_id=1 AND another_column=3) at wchich point we could apply functional dependencies - but we'd do it once for each AND-clause, and then combine the results to compute selectivity for the OR clause. But this means that if we apply functional dependencies directly to the original clause, it'll be inconsistent. Which seems a bit unfortunate. Or do I get this wrong? In my tests, I've got a table with two columns a and b, generated this way: CREATE TABLE test (a INT, b INT) AS SELECT i, i/10 FROM generate_series(1, 10) s(i), generate_series(1, 5) x; With statistics defined on the a, b columns Here are the estimated selectivity results without any patch: SELECT * FROM test WHERE a = 1 AND b = 1 : 5 SELECT * FROM test WHERE a = 1 AND (b = 1 OR b = 2) : 1 SELECT * FROM test WHERE (a = 1 AND b = 1) OR (a = 1 AND b = 2) : 1 SELECT * FROM test WHERE a = 1 AND b IN (1, 2) : 1 With the patch, the estimated rows of the last query goes back to 5, which is more logical. The other ones don't change. Yes, I think you're right. I've been playing with this a bit more, and I think you're right we can support it the way you propose. I'm still a bit annoyed by the inconsistency this might/does introduce. Consider for example these clauses: a) ... WHERE a = 10 AND b IN (100, 200) b) ... 
WHERE a = 10 AND (b = 100 OR b = 200) c) ... WHERE (a = 10 AND b = 100) OR (a = 10 AND b = 200) All three cases are logically equivalent and do return the same set of rows. But we estimate them differently, arriving at different estimates. Case (a) is the one you improve in your patch. Case (c) is actually not possible in practice, because we rewrite it as (b) during planning. But (b) is estimated very differently, because we don't recognize the OR clause as supported by functional dependencies. On the one hand I'm sure it's not the first case where we already estimate equivalent clauses differently. On the other hand I wonder how difficult would it be to support this specific type of OR clause (with all expressions having the form "Var = Const" and all Vars referencing the same rel). I'm not going to block the patch because of this, of course. Similarly, it'd be nice to add support for ScalarArrayOpExpr to MCV stats, not just functional dependencies ... BTW the code added in the 0001 patch is the same as for is_opclause, so maybe we can simply do if (is_opclause(rinfo->clause) || IsA(rinfo->clause, ScalarArrayOpExpr)) { ... } instead of just duplicating the code. I would love doing that, but the ScalarArrayOpExpr and OpExpr are not binary compatible for the members used here. In ScalarArrayOpExpr, on AM
Re: Prevent pg_basebackup running as root
On 2020/02/01 18:34, Michael Paquier wrote: On Thu, Jan 30, 2020 at 04:00:40PM +0900, Michael Paquier wrote: Anyway, your patch looks like a good idea to me, so let's see if others have opinions or objections about it. Seeing nothing, committed v2. Thanks! Regards Ian Barwick -- Ian Barwick https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: table partitioning and access privileges
On Fri, Jan 31, 2020 at 9:39 PM Fujii Masao wrote:
> On 2020/01/31 13:38, Amit Langote wrote:
> > On Fri, Jan 31, 2020 at 1:28 AM Fujii Masao wrote:
> >> Fair enough. I finally did back-patch because the behavior is clearly
> >> documented and I failed to hear the opinions to object the back-patch.
> >> But I should have heard and discussed such risks more.
> >>
> >> I'm OK to revert all those back-patch. Instead, probably the document
> >> should be updated in old branches.
> >
> > I could find only this paragraph in the section on inheritance that
> > talks about how access permissions work:
> >
> > 9.4:
> >
> > "Note how table access permissions are handled. Querying a parent
> > table can automatically access data in child tables without further
> > access privilege checking. This preserves the appearance that the data
> > is (also) in the parent table. Accessing the child tables directly is,
> > however, not automatically allowed and would require further
> > privileges to be granted."
> >
> > 9.5-12:
> >
> > "Inherited queries perform access permission checks on the parent
> > table only. Thus, for example, granting UPDATE permission on the
> > cities table implies permission to update rows in the capitals table
> > as well, when they are accessed through cities. This preserves the
> > appearance that the data is (also) in the parent table. But the
> > capitals table could not be updated directly without an additional
> > grant. In a similar way, the parent table's row security policies (see
> > Section 5.7) are applied to rows coming from child tables during an
> > inherited query. A child table's policies, if any, are applied only
> > when it is the table explicitly named in the query; and in that case,
> > any policies attached to its parent(s) are ignored."
> >
> > Do you mean that the TRUNCATE exception should be noted here?
>
> Yes, that's what I was thinking.

Okay. How about the attached?

Maybe, we should also note the LOCK TABLE exception?

Regards,
Amit

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 8d2908c34d..d274d048ec 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2323,7 +2323,10 @@ VALUES ('New York', NULL, NULL, 'NY');
    access privilege checking.  This preserves the appearance that the data
    is (also) in the parent table.  Accessing the child tables directly is,
    however, not automatically allowed and would require
-   further privileges to be granted.
+   further privileges to be granted.  One exception to this rule is
+   TRUNCATE command, where permissions on the child tables
+   are always checked, whether they are truncated directly or recursively via
+   TRUNCATE performed on the parent table.

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 6ab37e7354..cf7b4fd891 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2762,7 +2762,10 @@ VALUES ('Albany', NULL, NULL, 'NY');
    rows coming from child tables during an inherited query.  A child
    table's policies, if any, are applied only when it is the table
    explicitly named in the query; and in that case, any policies attached
    to its parent(s) are
-   ignored.
+   ignored.  One exception to this rule is TRUNCATE command,
+   where permissions on the child tables are always checked, whether they are
+   truncated directly or recursively via TRUNCATE performed
+   on the parent table.
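To illustrate the exception the patch documents, a hypothetical session on a
branch where TRUNCATE checks child-table permissions as described above
(object and role names are made up; cities/capitals follow the documentation
example):

    CREATE TABLE cities (name text, population int);
    CREATE TABLE capitals (state char(2)) INHERITS (cities);

    CREATE ROLE report_user;
    GRANT SELECT, TRUNCATE ON cities TO report_user;

    SET ROLE report_user;
    SELECT count(*) FROM cities;   -- allowed: inherited read checks only the parent
    TRUNCATE cities;               -- rejected: permissions on capitals are also checked
    RESET ROLE;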
Re: Portal->commandTag as an enum
Mark Dilger writes: > I put the CommandTag enum in src/common because there wasn’t any reason > not to do so. It seems plausible that frontend test frameworks might want > access to this enum. Au contraire, that's an absolutely fundamental mistake. There is zero chance of this enum holding still across PG versions, so if we expose it to frontend code, we're going to have big problems with cross-version compatibility. See our historical problems with assuming the enum for character set encodings was the same between frontend and backend ... and that set is orders of magnitude more stable than this one. As I recall, the hardest problem with de-string-ifying this is the fact that for certain tags we include a rowcount in the string. I'd like to see that undone --- we have to keep it like that on-the-wire to avoid a protocol break, but it'd be best if noplace inside the backend did it that way, and we converted at the last moment before sending a CommandComplete to the client. Your references to "completion tags" make it sound like you've only half done the conversion (though I've not read the patch in enough detail to verify). BTW, the size of the patch is rather depressing, especially if it's only half done. Unlike Andres, I'm not even a little bit convinced that this is worth the amount of code churn it'll cause. Have you done any code-size or speed comparisons? regards, tom lane
Re: Internal key management system
Hi; So I actually have tried to do carefully encrypted data in Postgres via pg_crypto. I think the key management problems in PostgreSQL are separable from table-level encryption. In particular the largest problem right now with having encrypted attributes is accidental key disclosure. I think if we solve key management in a way that works for encrypted attributes first, we can then add encrypted tables later. Additionally big headaches come with key rotation. So here are my thoughts here. This is a fairly big topic. And I am not sure it can be done incrementally as much as that seems to doom big things in the community, but I think it could be done with a major push by a combination of big players, such as Second Quadrant. On Sun, Feb 2, 2020 at 3:02 AM Masahiko Sawada < masahiko.saw...@2ndquadrant.com> wrote: > Hi, > > I've started a new separate thread from the previous long thread[1] > for internal key management system to PostgreSQL. As I mentioned in > the previous mail[2], we've decided to step back and focus on only > internal key management system for PG13. The internal key management > system introduces the functionality to PostgreSQL that allows user to > encrypt and decrypt data without knowing the actual key. Besides, it > will be able to be integrated with transparent data encryption in the > future. > > The basic idea is that PostgreSQL generates the master encryption key > which is further protected by the user-provided passphrase. The key > management system provides two functions to wrap and unwrap the secret > by the master encryption key. A user generates a secret key locally > and send it to PostgreSQL to wrap it using by pg_kmgr_wrap() and save > it somewhere. Then the user can use the encrypted secret key to > encrypt data and decrypt data by something like: > So my understanding is that you would then need something like: 1. Symmetric keys for actual data storage. These could never be stored in the clear. 2. User public/private keys to use to access data storage keys. The private key would need to be encrypted with passphrases. And the server needs to access the private key. 3. Symmetric secret keys to encrypt private keys 4. A key management public/private key pair used to exchange the password for the private key. > > INSERT INTO tbl VALUES (pg_encrypt('user data', pg_kmgr_unwrap('x')); > SELECT pg_decrypt(secret_column, pg_kmgr_unwrap('x')) FROM tbl; > If you get anything wrong you risk logs being useful to break tne encryption keys and make data access easy. You don't want pg_kmgr_unwrap('') in your logs. Here what I would suggest is a protocol extension to do the key exchange. In other words, protocol messages to: 1. Request data exchange server public key. 2. Send server public-key encrypted symmetric key. Make sure it is properly padded etc. These are safe still only over SSL with sslmode=full_verify since otherwise you might be vulnerable to an MITM attack. Then the keys should be stored in something like CacheMemoryContext and pg_encrypt()/pg_decrypt() would have access to them along with appropriate catalogs needed to get to the storage keys themselves. > > Where 'x' is the result of pg_kmgr_wrap function. > > That way we can get something encrypted and decrypted without ever > knowing the actual key that was used to encrypt it. > > I'm currently updating the patch and will submit it. > The above though is only a small part of the problem. What we also need are a variety of new DDL commands specifically for key management. 
This is needed because without commands of this sort, we cannot make absolutely sure that the commands are never logged. These commands MUST not have keys logged and therefore must have keys stripped prior to logging. If I were designing this: 1. Users on an SSL connection would be able to: CREATE ENCRYPTION USER KEY PAIR WITH PASSWORD 'xyz' which would automatically rotate keys. 2. Superusers could: ALTER SYSTEM ROTATE ENCRYPTION EXCHANGE KEY PAIR; 3. Add an ENCRYPTED attribute to columns and disallow indexing of ENCRYPTED columns. This would store keys for the columns encrypted with user public keys where they have access. 4. Allow superusers to ALTER TABLE foo ALTER encrypted_column ROTATE KEYS; which would naturally require a full table rewrite. Now, what that proposal does not provide is the use of encryption to enforce finer-grained access such as per-row keys but that's another topic and maybe something we don't need. However I hope that explains what I see as a version of a minimum viable infrastructure here. > > On Sun, 2 Feb 2020 at 00:37, Sehrope Sarkuni wrote: > > > > On Fri, Jan 31, 2020 at 1:21 AM Masahiko Sawada > > wrote: > > > On Thu, 30 Jan 2020 at 20:36, Sehrope Sarkuni > wrote: > > > > That > > > > would allow the internal usage to have a fixed output length of > > > > LEN(IV) + LEN(HMAC) + LEN(DATA) = 16 + 32 + 64 = 112 bytes. > > > > > > Probably you meant LEN(DAT
Re: Autovacuum on partitioned table
On Sun, Feb 2, 2020 at 12:53 PM Masahiko Sawada wrote:
> On Wed, 29 Jan 2020 at 17:56, Amit Langote wrote:
> > On Wed, Jan 29, 2020 at 11:29 AM yuzuko wrote:
> > > > How are you going to track changes_since_analyze of partitioned table?
> > > > It's just an idea but we can accumulate changes_since_analyze of
> > > > partitioned table by adding child tables's value after analyzing each
> > > > child table. And compare the partitioned tables value to the threshold
> > > > that is computed by (autovacuum_analyze_threshold + total rows
> > > > including all child tables * autovacuum_analyze_scale_factor).
> > > >
> > > The idea Sawada-san mentioned is similar to mine.
> >
> > So if I understand this idea correctly, a partitioned table's analyze
> > will only be triggered when partitions are analyzed. That is,
> > inserts, updates, deletes of tuples in partitions will be tracked by
> > pgstat, which in turn is used by autovacuum to trigger analyze on
> > partitions. Then, partitions' changes_since_analyze is added into the
> > parent's changes_since_analyze, which in turn *may* trigger analyze on
> > the parent. I said "may", because it would take multiple partition
> > analyzes to accumulate enough changes to trigger one on the parent.
> > Am I getting that right?
>
> Yeah that is what I meant. In addition, adding partition's
> changes_since_analyze to its parent needs to be done recursively as
> the parent table could also be a partitioned table.

That's a good point. So, changes_since_analyze increments are essentially
propagated from leaf partitions all the way up to the root table, including
any intermediate partitioned tables. We'll need to consider whether we should
propagate only one level at a time (from the bottom of the tree) or update
all parents up to the root, every time a leaf partition is analyzed. If we do
the latter, that might end up triggering analyze on all the parents at the
same time, causing repeated scanning of the same child tables in close
intervals, although setting the analyze threshold and scale factor of the
parent tables of the respective levels wisely can help avoid any negative
impact of that.

Thanks,
Amit
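For reference, the analyze threshold quoted above can be computed for any
table with a standard catalog query; note that the propagation of
changes_since_analyze to partitioned parents is only a proposal at this
point, and 'some_partition' below is a placeholder name:

    SELECT c.relname,
           s.n_mod_since_analyze,
           current_setting('autovacuum_analyze_threshold')::float
             + current_setting('autovacuum_analyze_scale_factor')::float * c.reltuples
             AS analyze_threshold
    FROM pg_class c
    JOIN pg_stat_user_tables s ON s.relid = c.oid
    WHERE c.relname = 'some_partition';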
Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
On 2020/02/02 14:59, Masahiko Sawada wrote: On Fri, 31 Jan 2020 at 02:29, Fujii Masao wrote: On 2020/01/30 12:58, Kyotaro Horiguchi wrote: At Wed, 29 Jan 2020 23:16:08 +0900, Fujii Masao wrote in Hi, pg_basebackup reports the backup progress if --progress option is specified, and we can monitor it in the client side. I think that it's useful if we can monitor the progress information also in the server side because, for example, we can easily check that by using SQL. So I'd like to propose pg_stat_progress_basebackup view that allows us to monitor the progress of pg_basebackup, in the server side. Thought? POC patch is attached. | postgres=# \d pg_stat_progress_basebackup | View "pg_catalog.pg_stat_progress_basebackup" |Column| Type | Collation | Nullable | Default | -+-+---+--+- | pid | integer | | | | phase | text| | | | backup_total| bigint | | | | backup_streamed | bigint | | | | tablespace_total| bigint | | | | tablespace_streamed | bigint | | | I think the view needs client identity such like host/port pair besides pid (host/port pair fails identify client in the case of unix-sockets.). I don't think this is job of a progress reporting. If those information is necessary, we can join this view with pg_stat_replication using pid column as the join key. Also elapsed time from session start might be useful. +1 I think that not only pg_stat_progress_basebackup but also all the other progress views should report the time when the target command was started and the time when the phase was last changed. Those times would be useful to estimate the remaining execution time from the progress infromation. pg_stat_progress_acuum has datid, datname and relid. + if (backup_total > 0 && backup_streamed > backup_total) + { + backup_total = backup_streamed; Do we need the condition "backup_total > 0"? I added the condition for the case where --progress option is not supplied in pg_basebackup, i.e., the case where the total amount of backup is not estimated at the beginning of the backup. In this case, total_backup is always 0. + int tblspc_streamed = 0; + + pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE, + PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP); This starts "streaming backup" phase with backup_total = 0. Coudln't we move to that phase after setting backup total and tablespace total? That is, just after calling SendBackupHeader(). OK, that's a bit less confusing for users. I will change in that way. Fixed. Attached is the updated version of the patch. I also fixed the regression test failure. +WHEN 3 THEN 'stopping backup'::text I'm not sure, but the "stop" seems suggesting the backup is terminated before completion. If it is following the name of the function pg_stop_backup, I think the name is suggesting to stop "the state for performing backup", not a backup. In the first place, the progress is about "backup" so it seems strange that we have another phase after the "stopping backup" phase. It might be better that it is "finishing file transfer" or such. "initializing" -> "starting file transfer" -> "transferring files" -> "finishing file transfer" -> "transaferring WAL" Better name is always welcome! If "stopping back" is confusing, what about "performing pg_stop_backup"? So initializing performing pg_start_backup streaming database files performing pg_stop_backup transfering WAL files Another idea I came up with is to show steps that take time instead of pg_start_backup/pg_stop_backup, for better understanding the situation. 
That is, "performing checkpoint" and "performing WAL archive" etc, which engage the most of the time of these functions. Yeah, that's an idea. ISTM that "waiting for WAL archiving" sounds better than "performing WAL archive". Thought? I've not applied this change in the patch yet, but if there is no other idea, I'd like to adopt this. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 8839699079..136fcbc2af 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -376,6 +376,14 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + pg_stat_progress_basebackuppg_stat_progress_basebackup + One row for each WAL sender process streaming a base backup, + showing current progress. + See . + + + @@ -3515,7 +3523,10 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, certain
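Following the suggestion upthread to join the proposed view with
pg_stat_replication on pid, a monitoring query might look like this (the
column names are those of the proposed patch, not of a released view):

    SELECT r.client_addr,
           p.phase,
           p.backup_streamed,
           p.backup_total,
           round(100.0 * p.backup_streamed / nullif(p.backup_total, 0), 1) AS "%done"
    FROM pg_stat_progress_basebackup p
    JOIN pg_stat_replication r USING (pid);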
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Fri, Jan 10, 2020 at 10:53 AM Dilip Kumar wrote: > > On Mon, Dec 30, 2019 at 3:43 PM Amit Kapila wrote: > > > > > > > 2. During commit time in DecodeCommit we check whether we need to skip > > > the changes of the transaction or not by calling > > > SnapBuildXactNeedsSkip but since now we support streaming so it's > > > possible that before we decode the commit WAL, we might have already > > > sent the changes to the output plugin even though we could have > > > skipped those changes. So my question is instead of checking at the > > > commit time can't we check before adding to ReorderBuffer itself > > > > > > > I think if we can do that then the same will be true for current code > > irrespective of this patch. I think it is possible that we can't take > > that decision while decoding because we haven't assembled a consistent > > snapshot yet. I think we might be able to do that while we try to > > stream the changes. I think we need to take care of all the > > conditions during streaming (when the logical_decoding_workmem limit > > is reached) as we do in DecodeCommit. This needs a bit more study. > > I have analyzed this further and I think we can not decide all the > conditions even while streaming. Because IMHO once we get the > SNAPBUILD_FULL_SNAPSHOT we can add the changes to the reorder buffer > so that if we get the commit of the transaction after we reach to the > SNAPBUILD_CONSISTENT. However, if we get the commit before we reach > to SNAPBUILD_CONSISTENT then we need to ignore this transaction. Now, > even if we have SNAPBUILD_FULL_SNAPSHOT we can stream the changes > which might get dropped later but that we can not decide while > streaming. > This makes sense to me, but we should add a comment for the same when we are streaming to say we can't skip similar to how we do during commit time because of the above reason described by you. Also, what about other conditions where we can skip the transaction, basically cases like (a) when the transaction happened in another database, (b) when the output plugin is not interested in the origin and (c) when we are doing fast-forwarding -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: table partitioning and access privileges
On 2020/02/03 11:05, Amit Langote wrote:
> On Fri, Jan 31, 2020 at 9:39 PM Fujii Masao wrote:
>> On 2020/01/31 13:38, Amit Langote wrote:
>>> On Fri, Jan 31, 2020 at 1:28 AM Fujii Masao wrote:
>>>> Fair enough. I finally did back-patch because the behavior is clearly
>>>> documented and I failed to hear the opinions to object the back-patch.
>>>> But I should have heard and discussed such risks more.
>>>>
>>>> I'm OK to revert all those back-patch. Instead, probably the document
>>>> should be updated in old branches.
>>>
>>> I could find only this paragraph in the section on inheritance that
>>> talks about how access permissions work:
>>>
>>> 9.4:
>>>
>>> "Note how table access permissions are handled. Querying a parent
>>> table can automatically access data in child tables without further
>>> access privilege checking. This preserves the appearance that the data
>>> is (also) in the parent table. Accessing the child tables directly is,
>>> however, not automatically allowed and would require further
>>> privileges to be granted."
>>>
>>> 9.5-12:
>>>
>>> "Inherited queries perform access permission checks on the parent
>>> table only. Thus, for example, granting UPDATE permission on the
>>> cities table implies permission to update rows in the capitals table
>>> as well, when they are accessed through cities. This preserves the
>>> appearance that the data is (also) in the parent table. But the
>>> capitals table could not be updated directly without an additional
>>> grant. In a similar way, the parent table's row security policies (see
>>> Section 5.7) are applied to rows coming from child tables during an
>>> inherited query. A child table's policies, if any, are applied only
>>> when it is the table explicitly named in the query; and in that case,
>>> any policies attached to its parent(s) are ignored."
>>>
>>> Do you mean that the TRUNCATE exception should be noted here?
>>
>> Yes, that's what I was thinking.
>
> Okay. How about the attached?

Thanks for the patches! You added the note just after the description
about row level security on inherited table, but isn't it better to
add it before that? Attached patch does that. Thought?

> Maybe, we should also note the LOCK TABLE exception?

Yes.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index d88651df9e..7550d03f27 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3384,7 +3384,15 @@ VALUES ('Albany', NULL, NULL, 'NY');
    accessed through cities.  This preserves the appearance
    that the data is (also) in the parent table.  But the
    capitals table could not be updated directly
-   without an additional grant.  In a similar way, the parent table's row
+   without an additional grant.  Two exceptions to this rule are
+   TRUNCATE and LOCK TABLE,
+   where permissions on the child tables are always checked,
+   whether they are processed directly or recursively via those commands
+   performed on the parent table.
+  </para>
+
+  <para>
+   In a similar way, the parent table's row
    security policies (see ) are applied to rows coming from
    child tables during an inherited query.  A child table's policies, if
    any, are applied only when it is the table explicitly named
Re: table partitioning and access privileges
On Mon, Feb 3, 2020 at 2:07 PM Fujii Masao wrote:
> On 2020/02/03 11:05, Amit Langote wrote:
> > Okay. How about the attached?
>
> Thanks for the patches! You added the note just after the description
> about row level security on inherited table, but isn't it better to
> add it before that? Attached patch does that. Thought?

Yeah, that might be a better flow for that paragraph.

> > Maybe, we should also note the LOCK TABLE exception?
>
> Yes.

Note that, unlike the TRUNCATE one, the LOCK TABLE exception exists in HEAD
too, but maybe you're aware of that.

Thanks,
Amit
Re: Proposal: Add more compile-time asserts to expose inconsistencies.
On Fri, Jan 31, 2020 at 02:46:16PM +0900, Michael Paquier wrote: > Actually, thinking more about it, I'd rather remove this part as well, > keeping only the change in the third paragraph of this comment block. I have tweaked a bit the comments in this area, and applied the patch. I'll begin a new thread with the rest of the refactoring. There are a couple of things I'd like to double-check first. -- Michael signature.asc Description: PGP signature
Re: Add %x to PROMPT1 and PROMPT2
On Wed, Jan 29, 2020 at 11:51:10PM +0100, Vik Fearing wrote: > Thanks for the review! > > Would you mind changing the status in the commitfest app? > https://commitfest.postgresql.org/27/2427/ FWIW, I am not really in favor of changing a default old enough that it could vote (a45195a). -- Michael signature.asc Description: PGP signature
Re: Tid scan increments value of pg_stat_all_tables.seq_scan. (but not seq_tup_read)
On 2020/02/01 16:05, Kasahara Tatsuhito wrote:
> On Thu, Jan 30, 2020 at 1:49 PM Kyotaro Horiguchi wrote:
>> At Thu, 30 Jan 2020 13:30:56 +0900, Kasahara Tatsuhito wrote in
>>> TID scan : yes , seq_scan, ,
>>
>> Here is wrong, because TID scan came to have SO_TYPE_SEQSCAN flags from
>> commit 147e3722f7. It is reflecting the discussion below, which means TID
>> scan doesn't have a corresponding SO_TYPE_ value. Currently it is setting
>> SO_TYPE_SEQSCAN by accident.
>
> Ah, sorry I misunderstood..
> Upon further investigation, the SO_TYPE_SEQSCAN flag was also used at
> heap_beginscan() to determine whether a predicate lock was taken on the
> entire relation.
>
>     if (scan->rs_base.rs_flags & (SO_TYPE_SEQSCAN | SO_TYPE_SAMPLESCAN))
>     {
>         /*
>          * Ensure a missing snapshot is noticed reliably, even if the
>          * isolation mode means predicate locking isn't performed (and
>          * therefore the snapshot isn't used here).
>          */
>         Assert(snapshot);
>         PredicateLockRelation(relation, snapshot);
>     }
>
> Therefore, we can not simply remove the SO_TYPE_SEQSCAN flag from a TID scan.
> To keep the old behavior, I think it would be better to add a new
> SO_TYPE_TIDSCAN flag and take a predicate lock on the entire relation.

But in the old behavior, PredicateLockRelation() was not called in the
TidScan case because its flag was not SO_TYPE_SEQSCAN. No?

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
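For anyone wanting to observe the behavior under discussion, a sketch of how
to check the counter (whether seq_scan moves for a TID scan depends on the
server version, per the discussion above; the table name is illustrative):

    CREATE TABLE tidtest AS SELECT generate_series(1, 1000) AS i;

    SELECT seq_scan, seq_tup_read FROM pg_stat_all_tables WHERE relname = 'tidtest';
    EXPLAIN ANALYZE SELECT * FROM tidtest WHERE ctid = '(0,1)';
    -- give the statistics collector a moment to catch up, then re-check:
    SELECT seq_scan, seq_tup_read FROM pg_stat_all_tables WHERE relname = 'tidtest';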
Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
On Mon, Feb 3, 2020 at 1:17 PM Fujii Masao wrote: > On 2020/02/02 14:59, Masahiko Sawada wrote: > > On Fri, 31 Jan 2020 at 02:29, Fujii Masao > > wrote: > >> On 2020/01/30 12:58, Kyotaro Horiguchi wrote: > >>> +WHEN 3 THEN 'stopping backup'::text > >>> > >>> I'm not sure, but the "stop" seems suggesting the backup is terminated > >>> before completion. If it is following the name of the function > >>> pg_stop_backup, I think the name is suggesting to stop "the state for > >>> performing backup", not a backup. > >>> > >>> In the first place, the progress is about "backup" so it seems strange > >>> that we have another phase after the "stopping backup" phase. It > >>> might be better that it is "finishing file transfer" or such. > >>> > >>> "initializing" > >>> -> "starting file transfer" > >>> -> "transferring files" > >>> -> "finishing file transfer" > >>> -> "transaferring WAL" > >> > >> Better name is always welcome! If "stopping back" is confusing, > >> what about "performing pg_stop_backup"? So > >> > >> initializing > >> performing pg_start_backup > >> streaming database files > >> performing pg_stop_backup > >> transfering WAL files > > > > Another idea I came up with is to show steps that take time instead of > > pg_start_backup/pg_stop_backup, for better understanding the > > situation. That is, "performing checkpoint" and "performing WAL > > archive" etc, which engage the most of the time of these functions. > > Yeah, that's an idea. ISTM that "waiting for WAL archiving" sounds > better than "performing WAL archive". Thought? > I've not applied this change in the patch yet, but if there is no > other idea, I'd like to adopt this. If we are trying to "pg_stop_backup" in phase name, maybe we should avoid "pg_start_backup"? Then maybe: initializing starting backup / waiting for [ backup start ] checkpoint to finish transferring database files finishing backup / waiting for WAL archiving to finish transferring WAL files ? Some comments on documentation changes in v2 patch: + Amount of data already streamed. "already" may be redundant. For example, in pg_start_progress_vacuum, heap_blks_scanned is described as "...blocks scanned", not "...blocks already scanned". + tablespace_total + tablespace_streamed Better to use plural tablespaces_total and tablespaces_streamed for consistency? + The WAL sender process is currently performing + pg_start_backup and setting up for + making a base backup. How about "taking" instead of "making" in the above sentence? - + I don't see any new text in the documentation patch that uses above xref, so no need to define it? Thanks, Amit
Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
On Mon, Feb 3, 2020 at 4:28 PM Amit Langote wrote: > If we are trying to "pg_stop_backup" in phase name, maybe we should > avoid "pg_start_backup"? Then maybe: Sorry, I meant to write: If we are trying to avoid "pg_stop_backup" in phase name, maybe we should avoid "pg_start_backup"? Then maybe: Thanks, Amit
Re: Tid scan increments value of pg_stat_all_tables.seq_scan. (but not seq_tup_read)
On Mon, Feb 3, 2020 at 4:22 PM Fujii Masao wrote:
> On 2020/02/01 16:05, Kasahara Tatsuhito wrote:
> > if (scan->rs_base.rs_flags & (SO_TYPE_SEQSCAN | SO_TYPE_SAMPLESCAN))
> > {
> >     /*
> >      * Ensure a missing snapshot is noticed reliably, even if the
> >      * isolation mode means predicate locking isn't performed (and
> >      * therefore the snapshot isn't used here).
> >      */
> >     Assert(snapshot);
> >     PredicateLockRelation(relation, snapshot);
> > }
> >
> > Therefore, we can not simply remove the SO_TYPE_SEQSCAN flag from a TID scan.
> > To keep the old behavior, I think it would be better to add a new
> > SO_TYPE_TIDSCAN flag and take a predicate lock on the entire relation.
>
> But in the old behavior, PredicateLockRelation() was not called in the
> TidScan case because its flag was not SO_TYPE_SEQSCAN. No?

No. Tid scan called PredicateLockRelation() both in previous versions and
currently.

In the current code (v12 and HEAD), Tid scan has the SO_TYPE_SEQSCAN flag, so
PredicateLockRelation() is called in Tid scan.

In previous versions (- v11), heap_beginscan_internal() checks the
is_bitmapscan flag; if is_bitmapscan is set to false, it calls
PredicateLockRelation():

(- v11)
    if (!is_bitmapscan)
        PredicateLockRelation(relation, snapshot);

And in the Tid scan, is_bitmapscan is set to false, so that
PredicateLockRelation() is called in Tid scan.

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com
Re: WIP: Aggregation push-down
legrand legrand wrote:
> set enable_agg_pushdown to true;
> isn't displayed in EXPLAIN (SETTINGS) syntax.
>
> The following modification seems to fix that:
>
> src/backend/utils/misc/guc.c
>
> 	{"enable_agg_pushdown", PGC_USERSET, QUERY_TUNING_METHOD,
> 		gettext_noop("Enables aggregation push-down."),
> 		NULL,
> 		GUC_EXPLAIN        --- line Added ---
> 	},
> 	&enable_agg_pushdown,
> 	false,
> 	NULL, NULL, NULL
> 	},

Thanks. I'll include this change in the next version.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com
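With that change applied, the setting should show up like any other
non-default planner GUC (enable_agg_pushdown is the GUC proposed by the
patch, and some_table below is a placeholder; EXPLAIN (SETTINGS) itself is
standard since v12):

    SET enable_agg_pushdown = true;
    EXPLAIN (SETTINGS, COSTS OFF)
    SELECT sum(x) FROM some_table GROUP BY y;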