Proposal: Deferred Replica Filtering for PostgreSQL Logical Replication
Hi hackers,

I'd like to propose an enhancement to PostgreSQL's logical replication system: Deferred Replica Filtering (DRF). The goal of this feature is to provide more granular control over which rows are replicated, by applying publication filters *after* the WAL decoding process, before sending data to subscribers.

Currently, PostgreSQL's logical replication filters apply deterministically. Deferred filtering, however, operates after the WAL has been decoded, giving it access to the complete row data and allowing filtering decisions based on mutable values. Additionally, record columns may be omitted by the filter.

This opens up several possibilities for granular control. Consider the following examples. Alice and Bob subscribe to changes on a table with RLS enabled, allowing CRUD operations based on users' IDs.

1. Alice needs to know the timestamp at which Bob updated the table. With DRF, we can omit all columns except for the timestamp.

2. Bob wants to track DELETEs on the table. Without DRF, Bob can see all columns of any deleted row, potentially exposing complete records he isn't authorized to view. DRF can filter these rows out.

Deferred replica filtering allows for session-specific, per-row, and per-column filtering - features not supported by existing replication filters - enhancing security and data privacy.

I look forward to hearing your thoughts!

Best,
Dean S
Re: Proposal: Deferred Replica Filtering for PostgreSQL Logical Replication
Unfortunately, neither column lists nor row filters can provide the level of control I'm proposing. These revised examples might help illustrate the use case for DRF.

Alice, Bob, and Eve subscribe to changes on a `friend_requests` table. Row-level security ensures CRUD access based on user IDs.

1. Per-subscriber column control: Bob makes a change to the table. Alice should receive the entire record, while Eve should only receive the timestamp - no other columns.
   Why DRF is needed: column lists are static and apply equally to all subscribers, so we can't distinguish Alice's subscription from Eve's.

2. Per-subscriber row control: Bob DELETEs a row from the table. Alice should see the DELETE event, while Eve should not even be aware that an event occurred.
   Why DRF is needed: the deterministic nature of row filters makes them unsuitable for per-subscriber filtering based on session data.

The goal of DRF is to allow per-subscriber variations in change broadcasts, enabling granular control over what data is sent to each subscriber based on their session context.

Best,
Dean S

On Mon, Mar 17, 2025 at 4:32 AM Amit Kapila wrote:
>
> On Sun, Mar 16, 2025 at 12:59 AM Dean wrote:
> >
> > I'd like to propose an enhancement to PostgreSQL's logical replication
> > system: Deferred Replica Filtering (DRF). The goal of this feature is to
> > provide more granular control over which rows are replicated by applying
> > publication filters after the WAL decoding process, before sending data to
> > subscribers.
> >
> > Currently, PostgreSQL's logical replication filters apply
> > deterministically. Deferred filtering, however, operates after the WAL has
> > been decoded, giving it access to the complete row data and making
> > filtering decisions based on mutable values. Additionally, record columns
> > may be omitted by the filter.
> >
> > This opens up several possibilities for granular control. Consider the
> > following examples:
> > Alice and Bob subscribe to changes on a table with RLS enabled, allowing
> > CRUD operations based on user's IDs.
> > 1. Alice needs to know the timestamp at which Bob updated the table.
> > With DRF, we can omit all columns except for the timestamp.
> > 2. Bob wants to track DELETEs on the table. Without DRF, Bob can see all
> > columns on any deleted row, potentially exposing complete records he
> > shouldn't be authorized to view. DRF can filter these rows out.
> >
> > Deferred replica filtering allows for session-specific, per-row, and
> > per-column filtering - features currently not supported by existing
> > replication filters, enhancing security and data privacy.
> >
>
> We provide column lists [1] and row filters [2]. Doesn't that suffice
> the need, if not, kindly let us know what exactly you need with some
> examples.
>
> [1] - https://www.postgresql.org/docs/devel/logical-replication-col-lists.html
> [2] - https://www.postgresql.org/docs/devel/logical-replication-row-filter.html
>
> --
> With Regards,
> Amit Kapila.
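(For reference, the column-list and row-filter syntax that Amit points to above looks roughly like the following; the table, column list, and filter condition are made-up examples. Both are fixed at publication-definition time and apply identically to every subscriber, which is the limitation DRF aims to address:)

CREATE PUBLICATION pub_requests
    FOR TABLE friend_requests (sender_id, receiver_id, created_at)
    WHERE (status <> 'blocked');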
Some optimisations for numeric division
Attached are 3 small patches that improve the performance of numeric division. Patch 0002 seems to have the biggest impact, but I think they're all worth including, since they're quite simple changes with noticeable performance gains.

Patch 0001 vectorises the inner loop of div_var_fast(). This loop is almost identical to the inner loop of mul_var(), which was vectorised by commit 8870917623. The thing preventing the div_var_fast() loop from being vectorised is the assignment to div[qi + i], and simply replacing that with div_qi[i], where div_qi = &div[qi], in the same style as mul_var(), fixes that. One difference between this and the mul_var() code is that it is also necessary to add an explicit cast to get the compiler to generate matching output and give the best results. This is because in mul_var() the compiler recognises that var1digit is actually 16-bit, rather than 32-bit, and so it doesn't need to multiply the high part, but in div_var_fast() it's not obvious to the compiler that qdigit also fits in 16 bits, hence the cast.

The actual performance gain is typically quite modest, since div_var_fast() is only ever a part of some more complex numeric computation, but it can be seen in cases like raising numbers to negative integer powers, e.g.:

CREATE TEMP TABLE num_test(x numeric);
SELECT setseed(0);
INSERT INTO num_test
  SELECT (SELECT ('1.'||string_agg((random()*9)::int::text, '')||x)::numeric
          FROM generate_series(1,100))
  FROM generate_series(1,10) g(x);

SELECT sum(x^(-2)) FROM num_test;

Time: 112.949 ms  (HEAD)
Time: 98.537 ms   (with patch)

Patch 0002 simplifies the inner loop of div_var() (the guts of the public-facing numeric division operator) by more closely combining the multiplication and subtraction operations and folding the separate "carry" and "borrow" variables into a single "borrow", as suggested by the old code comment. IMO this makes the code easier to understand, as well as giving more significant performance gains:

CREATE TEMP TABLE div_test(x numeric);
SELECT setseed(0);
INSERT INTO div_test
  SELECT (SELECT ('1.'||string_agg((random()*9)::int::text, ''))::numeric + x
          FROM generate_series(1,50))
  FROM generate_series(1,1000) g(x);

SELECT sum(x/y) FROM div_test t1(x), div_test t2(y);

Time: 1477.340 ms  (HEAD)
Time: 826.748 ms   (with patch)

Patch 0003 replaces some uses of div_var_fast() with div_var(). Specifically, when the divisor has just one or two base-NBASE digits, div_var() is faster. This is especially true for 1-digit divisors, due to the "fast path" code in div_var() to handle that. It's also just about true for 2-digit divisors, and it occurs to me that that case could potentially be optimised further with similar fast-path code in div_var(), but I haven't tried that yet.

One-digit divisors are quite common. For example, they occur in the Taylor series expansions in exp_var() and ln_var(), since the number of Taylor series terms never exceeds a few hundred in practice. Also, exp_var()'s argument reduction step divides by 2^ndiv2, which is roughly 100 times the input, rounded up to a power of two, and valid inputs are less than 6000, so this will always be one or two digits.

Testing this with a bunch of random exp() and ln() computations, I saw a speed-up of 15-20%, and it reduced the run time of the numeric-big regression test by around 10%, which seems worth having.
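(For illustration, a test of the general shape described in the last paragraph - a bunch of random exp() and ln() computations - might look like the following. The table name, row count, and value range are arbitrary and are not the exact test used:)

CREATE TEMP TABLE exp_test(x numeric);
SELECT setseed(0);
INSERT INTO exp_test
  SELECT (random() * 10 + 0.1)::numeric(10,6)
  FROM generate_series(1, 100000);

SELECT sum(exp(x)), sum(ln(x)) FROM exp_test;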
Regards,
Dean

From 41732ad9a44dcd12e52d823fb59cb23cce4fe217 Mon Sep 17 00:00:00 2001
From: Dean Rasheed
Date: Sun, 20 Feb 2022 12:45:01 +
Subject: [PATCH 1/3] Apply auto-vectorization to the inner loop of
 div_var_fast().

This loop is basically the same as the inner loop of mul_var(), which
was auto-vectorized in commit 8870917623, but the compiler will only
consider auto-vectorizing the div_var_fast() loop if the assignment
target div[qi + i] is replaced by div_qi[i], where div_qi = &div[qi].

Additionally, since the compiler doesn't know that qdigit is guaranteed
to fit in a 16-bit NumericDigit, cast it to NumericDigit before
multiplying to make the resulting auto-vectorized code more efficient.
---
 src/backend/utils/adt/numeric.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 3208789f75..fc43d2a456 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -8908,13 +8908,22 @@ div_var_fast(const NumericVar *var1, const NumericVar *var2,
 			 * which would make the new value simply div[qi] mod vardigits[0].
 			 * The lower-order terms in qdigit can change this result by not
 			 * more than about twice INT_MAX/NBASE, so overflow is impossible.
+			 *
+			 * This inner loop is the performance bottleneck for division, so
+			 * code it in the same way as the inner loop of mul_var() so that
+
Re: Some optimisations for numeric division
On Wed, 23 Feb 2022 at 20:55, Tom Lane wrote:
>
> I took a quick look through these (just eyeball, didn't try to verify
> your performance statements).

Thanks for looking!

> I'm +1 on 0001 and 0002, but 0003 feels
> a bit ad-hoc. It certainly *looks* weird for the allegedly faster
> function to be handing off to the allegedly slower one. I also wonder
> if we're leaving anything on the table by not exploiting
> div_var_fast's weaker roundoff guarantees in this case. Should we
> think about a more thoroughgoing redesign of these functions' APIs?

Hmm, I'm not sure what kind of thing you had in mind.

One thought that occurred to me was that it's a bit silly that exp_var() and ln_var() have to use a NumericVar for what could just be an int, if we had a div_var_int() function that could divide by an int. Then both div_var() and div_var_fast() could hand off to it for one and two digit divisors.

Regards,
Dean
Re: Some optimisations for numeric division
On Wed, 23 Feb 2022 at 22:55, Tom Lane wrote:
>
> Dean Rasheed writes:
>
> > One thought that occurred to me was that it's a bit silly that
> > exp_var() and ln_var() have to use a NumericVar for what could just be
> > an int, if we had a div_var_int() function that could divide by an
> > int. Then both div_var() and div_var_fast() could hand off to it for
> > one and two digit divisors.
>
> Oooh, that seems like a good idea.
>

OK, I've replaced the 0003 patch with one that does that instead. The div_var_int() API is slightly different in that it also accepts a divisor weight argument, but the alternative would have been for the callers to have to adjust both the result weight and rscale, which would have been uglier.

There's a large block of code in div_var() that needs re-indenting, but I think it would be better to leave that to a later pgindent run.

The performance results are quite pleasing. It's slightly faster than the old one-digit div_var() code because it manages to avoid some digit array copying, and for two-digit divisors it's much faster:

CREATE TEMP TABLE div_test(x numeric, y numeric);
SELECT setseed(0);
INSERT INTO div_test
  SELECT (SELECT (((x%9)+1)||string_agg((random()*9)::int::text, ''))::numeric
          FROM generate_series(1,50)),
         (SELECT ('1.'||string_agg((random()*9)::int::text, '')||(x%10)||'e3')::numeric
          FROM generate_series(1,6))
  FROM generate_series(1,5000) g(x);

select * from div_test limit 10;

SELECT sum(t1.x/t2.y) FROM div_test t1, div_test t2;

Time: 11600.034 ms  (HEAD)
Time: 9890.788 ms   (with 0002)
Time: 6202.851 ms   (with 0003)

And obviously it'll be a larger relative gain for div_var_fast(), since that was slower to begin with in such cases.

This makes me think that it might also be worthwhile to follow this with a similar div_var_int64() function on platforms with 128-bit integers, which could then be used to handle 3- and 4-digit divisors, which are probably quite common in practice.

Attached is the updated patch series (0001 and 0002 unchanged).

Regards,
Dean

From 481b7c0044afbe72adff1961624a1bb515791b96 Mon Sep 17 00:00:00 2001
From: Dean Rasheed
Date: Sun, 20 Feb 2022 17:15:49 +
Subject: [PATCH 2/3] Simplify the inner loop of numeric division in div_var().

In the standard numeric division algorithm, the inner loop multiplies
the divisor by the next quotient digit and subtracts that from the
working dividend. As suggested by the original code comment, the
separate "carry" and "borrow" variables (from the multiplication and
subtraction steps respectively) can be folded together into a single
variable. Doing so significantly improves performance, as well as
simplifying the code.
---
 src/backend/utils/adt/numeric.c | 36 ++---
 1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index fc43d2a456..909f4eed74 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -8605,31 +8605,25 @@ div_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 		/* As above, need do nothing more when quotient digit is 0 */
 		if (qhat > 0)
 		{
+			NumericDigit *dividend_j = &dividend[j];
+
 			/*
 			 * Multiply the divisor by qhat, and subtract that from the
-			 * working dividend. "carry" tracks the multiplication,
-			 * "borrow" the subtraction (could we fold these together?)
+			 * working dividend. The multiplication and subtraction are
+			 * folded together here, noting that qhat <= NBASE (since it
+			 * might be one too large), and so the intermediate result
+			 * "tmp_result" is in the range [-NBASE^2, NBASE - 1], and
+			 * "borrow" is in the range [0, NBASE].
 			 */
-			carry = 0;
 			borrow = 0;
 			for (i = var2ndigits; i >= 0; i--)
 			{
-				carry += divisor[i] * qhat;
-				borrow -= carry % NBASE;
-				carry = carry / NBASE;
-				borrow += dividend[j + i];
-				if (borrow < 0)
-				{
-					dividend[j + i] = borrow + NBASE;
-					borrow = -1;
-				}
-				else
-				{
-					dividend[j + i] = borrow;
-					borrow = 0;
-				}
+				int		tmp_result;
+
+				tmp_result = dividend_j[i] - borrow - divisor[i] * qhat;
+				borrow = (NBASE - 1 - tmp_result) / NBASE;
+				dividend_j[i] = tmp_result + borrow * NBASE;
 			}
-			Assert(carry == 0);

 			/*
 			 * If we got a borrow out of the top dividend digit, then
@@ -8645,15 +8639,15 @@ div_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 				carry = 0;
 				for (i = var2ndigits; i >= 0; i--)
 				{
-					carry += dividend[j + i] + divisor[i];
+					carry += dividend_j[i] + divisor[i];
 					if (carry >= NBASE)
 					{
-						dividend[j + i] = carry - NBASE;
+
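(For context on the 3- and 4-digit-divisor idea above: with NBASE = 10000, each base-NBASE digit covers 4 decimal digits, so such a divisor has roughly 9-16 significant decimal digits. An illustrative query with a 4-digit divisor, reusing the div_test table created earlier - the constant is arbitrary:)

SELECT sum(x / 123456789012.345) FROM div_test;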
Re: Some optimisations for numeric division
On Fri, 25 Feb 2022 at 10:45, Dean Rasheed wrote:
>
> Attached is the updated patch series (0001 and 0002 unchanged).
>

And another update following feedback from the cfbot.

Regards,
Dean

From 41732ad9a44dcd12e52d823fb59cb23cce4fe217 Mon Sep 17 00:00:00 2001
From: Dean Rasheed
Date: Sun, 20 Feb 2022 12:45:01 +
Subject: [PATCH 1/3] Apply auto-vectorization to the inner loop of
 div_var_fast().

This loop is basically the same as the inner loop of mul_var(), which
was auto-vectorized in commit 8870917623, but the compiler will only
consider auto-vectorizing the div_var_fast() loop if the assignment
target div[qi + i] is replaced by div_qi[i], where div_qi = &div[qi].

Additionally, since the compiler doesn't know that qdigit is guaranteed
to fit in a 16-bit NumericDigit, cast it to NumericDigit before
multiplying to make the resulting auto-vectorized code more efficient.
---
 src/backend/utils/adt/numeric.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 3208789f75..fc43d2a456 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -8908,13 +8908,22 @@ div_var_fast(const NumericVar *var1, const NumericVar *var2,
 			 * which would make the new value simply div[qi] mod vardigits[0].
 			 * The lower-order terms in qdigit can change this result by not
 			 * more than about twice INT_MAX/NBASE, so overflow is impossible.
+			 *
+			 * This inner loop is the performance bottleneck for division, so
+			 * code it in the same way as the inner loop of mul_var() so that
+			 * it can be auto-vectorized. We cast qdigit to NumericDigit
+			 * before multiplying to allow the compiler to generate more
+			 * efficient code (using 16-bit multiplication), which is safe
+			 * since we know that the quotient digit is off by at most one, so
+			 * there is no overflow risk.
 			 */
 			if (qdigit != 0)
 			{
 				int		istop = Min(var2ndigits, div_ndigits - qi + 1);
+				int	   *div_qi = &div[qi];

 				for (i = 0; i < istop; i++)
-					div[qi + i] -= qdigit * var2digits[i];
+					div_qi[i] -= ((NumericDigit) qdigit) * var2digits[i];
 			}
 		}
--
2.26.2

From 481b7c0044afbe72adff1961624a1bb515791b96 Mon Sep 17 00:00:00 2001
From: Dean Rasheed
Date: Sun, 20 Feb 2022 17:15:49 +
Subject: [PATCH 2/3] Simplify the inner loop of numeric division in div_var().

In the standard numeric division algorithm, the inner loop multiplies
the divisor by the next quotient digit and subtracts that from the
working dividend. As suggested by the original code comment, the
separate "carry" and "borrow" variables (from the multiplication and
subtraction steps respectively) can be folded together into a single
variable. Doing so significantly improves performance, as well as
simplifying the code.
---
 src/backend/utils/adt/numeric.c | 36 ++---
 1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index fc43d2a456..909f4eed74 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -8605,31 +8605,25 @@ div_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 		/* As above, need do nothing more when quotient digit is 0 */
 		if (qhat > 0)
 		{
+			NumericDigit *dividend_j = &dividend[j];
+
 			/*
 			 * Multiply the divisor by qhat, and subtract that from the
-			 * working dividend. "carry" tracks the multiplication,
-			 * "borrow" the subtraction (could we fold these together?)
+			 * working dividend. The multiplication and subtraction are
+			 * folded together here, noting that qhat <= NBASE (since it
+			 * might be one too large), and so the intermediate result
+			 * "tmp_result" is in the range [-NBASE^2, NBASE - 1], and
+			 * "borrow" is in the range [0, NBASE].
 			 */
-			carry = 0;
 			borrow = 0;
 			for (i = var2ndigits; i >= 0; i--)
 			{
-				carry += divisor[i] * qhat;
-				borrow -= carry % NBASE;
-				carry = carry / NBASE;
-				borrow += dividend[j + i];
-				if (borrow < 0)
-				{
-					dividend[j + i] = borrow + NBASE;
-					borrow = -1;
-				}
-				else
-				{
-					dividend[j + i] = borrow;
-					borrow = 0;
-				}
+				int		tmp_result;
+
+				tmp_result = dividend_j[i] - borrow - divisor[i] * qhat;
+				borrow = (NBASE - 1 - tmp_result) / NBASE;
+				dividend_j[i] = tmp_result + borrow * NBASE;
 			}
-			Assert(carry == 0);

 			/*
 			 * If we got a borrow out of the top dividend digit, then
@@ -8645,15 +8639,15 @@ div_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 				carry = 0;
 				for (i = var2ndigits; i >= 0; i--)
 				{
-					carry += dividend[j + i] + divisor[i];
+					carry += dividend_j[i] + divisor[i];
 					if (carry >= NBASE)
Re: [PATCH] Add reloption for views to enable RLS
On Fri, 18 Feb 2022 at 14:57, Laurenz Albe wrote:
>
> Here is a new version, with improved documentation and the option renamed
> to "check_permissions_owner". I just prefer the shorter form.
>

Re-reading this thread, I think I preferred the name "security_invoker". The main objection seemed to come from the potential confusion with SECURITY INVOKER/DEFINER functions, but I think that's really a different thing. As long as the documentation for the default behaviour is clear (which I think it was), then it should be easy to explain how a security invoker view behaves differently. Also, there's value in using the same terminology as other databases, because many users will already be familiar with the feature from those databases.

Some other review comments:

1). This new comment:

+
+    Be aware that USAGE privileges on schemas containing
+    the underlying base relations are not checked.
+

is not entirely accurate. It's more accurate to say that a user creating or replacing a view must have CREATE privileges on the schema containing the view and USAGE privileges on any schemas referred to in the view query, whereas a user using the view only needs USAGE privileges on the schema containing the view. (Note that, for the view creator, USAGE is required on any schema referred to in the query -- e.g., schemas containing functions as well as base relations.) An illustrative set of GRANTs is sketched at the end of this message.

2). The patch is adding a new field to RangeTblEntry which seems to be unnecessary -- it's set, and copied around, but never read, so it should just be removed.

3). Looking at this change:

-        setRuleCheckAsUser((Node *) rule->actions, relation->rd_rel->relowner);
-        setRuleCheckAsUser(rule->qual, relation->rd_rel->relowner);
+        if (!(relation->rd_rel->relkind == RELKIND_VIEW
+              && !RelationSubqueryCheckPermsOwner(relation)))
+        {
+            setRuleCheckAsUser((Node *) rule->actions, relation->rd_rel->relowner);
+            setRuleCheckAsUser(rule->qual, relation->rd_rel->relowner);
+        }

I think it should call setRuleCheckAsUser() in all cases. It might be true that the rule fetched has checkAsUser set to InvalidOid throughout its action and quals, but it seems unwise to rely on that -- better to code defensively and explicitly set it in all cases.

4). In the same code block, I think the new behaviour should be applied to SELECT rules only. The view may have other non-SELECT rules (just as a table may have non-SELECT rules), created using CREATE RULE, but their actions are independent of the view definition. Currently their permissions are checked as the view/table owner, and if anyone wanted to change that, it should be an option on the rule, not the view (just as triggers can be made security definer or invoker, depending on how the trigger function is defined).

(Note: I'm not suggesting that anyone actually spend any time adding such an option to rules. Given all the pitfalls associated with rules, I think their use should be discouraged, and no development effort should be expended enhancing them.)

5). In the same function, the block of code that fetches rules and triggers has been moved. I think it would be worth adding a comment to explain why it's now important to extract the reloptions *before* fetching the relation's rules and triggers.

6). The second set of tests added to rowsecurity.sql seems to have nothing to do with RLS, and probably belongs in updatable_views.sql, and I think it would be worth adding a few more tests for things like views on top of views.

Regards,
Dean
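(To illustrate review point (1), the privilege requirements described there amount to something like the following; the schema and role names are invented:)

-- "alice" defines a view in schema "app" over a table in schema "data":
GRANT CREATE ON SCHEMA app TO alice;    -- needed to create/replace the view
GRANT USAGE ON SCHEMA data TO alice;    -- needed for any schema referred to in the view query

-- "bob" only queries the view, so he just needs USAGE on the view's schema:
GRANT USAGE ON SCHEMA app TO bob;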
Re: Some optimisations for numeric division
On Fri, 25 Feb 2022 at 18:30, Tom Lane wrote:
>
> Dean Rasheed writes:
> > And another update following feedback from the cfbot.
>
> This patchset LGTM. On my machine there doesn't seem to be any
> measurable performance change for the numeric regression test,
> but numeric_big gets about 15% faster.
>

Yes, that matches my observations. Thanks for reviewing and testing.

> The only additional thought I have, based on your comments about
> 0001, is that maybe in mul_var we should declare var1digit as
> being NumericDigit not int. I tried that here and didn't see
> any material change in the generated assembly code (using gcc
> 8.5.0), so you're right that modern gcc doesn't need any help
> there; but I wonder if it could help on other compiler versions.
>

Yes, that makes sense. Done that way.

Regards,
Dean
Re: [PATCH] Add reloption for views to enable RLS
On Tue, 1 Mar 2022 at 16:40, Christoph Heiss wrote:
>
> That is also the main reason I preferred naming it "security_invoker" -
> it is consistent with other databases and eases transition from such
> systems.
>
> I kept "check_permissions_owner" for now. Constantly changing it around
> with each iteration doesn't really bring any value IMHO, I'd rather have
> a final consensus on how to name the option and *then* change it for good.
>

Yes indeed, it's annoying to keep changing the name between patch versions, so let's try to get a consensus now.

For my part, I find myself more and more convinced that "security_invoker" is the right name, because it matches the terminology used for functions, and in other database systems. I think the parallels between security invoker functions and security invoker views are quite strong.

There are a couple of additional considerations that lend weight to that choice of name, though not uniquely to it:

1). There is a slight advantage to having an option that defaults to false/off, like the existing "security_barrier" option -- it allows a shorthand to turn the option on, because the system automatically turns "WITH (security_barrier)" into "WITH (security_barrier=true)". (See the example at the end of this message.)

2). Grammatically, a name like this works better, because it serves both as the name of the boolean option and as an adjective that can be used to describe and name the feature -- as in "security barrier views are cool" -- making it easier to talk about the feature. "check_permissions_owner=false" doesn't work as well in either regard, and just feels much more clumsy.

When we come to write the release notes for this feature, saying that this version of PG now supports security invoker views is going to mean a lot more to people who already use that feature in other databases.

What are other people's opinions?

Regards,
Dean
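(The shorthand mentioned in point (1) above, shown with an invented view definition:)

CREATE VIEW v1 WITH (security_barrier) AS SELECT * FROM t WHERE visible;         -- shorthand...
CREATE VIEW v2 WITH (security_barrier = true) AS SELECT * FROM t WHERE visible;  -- ...equivalent to this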
Re: [PATCH] Add reloption for views to enable RLS
On Mon, 14 Mar 2022 at 16:16, Laurenz Albe wrote:
>
> The patch is fine from my point of view.
>
> It passes "make check-world".
>
> I'll mark it as "ready for committer".
>

Cool, thanks. I think this will make a useful addition to PG15.

I have been hacking on it a bit, and attached is an updated version. Aside from some general copy editing, the most notable changes are:

In the updatable_views tests, I have moved the new tests to immediately after the existing permission-checking tests, which seems like a more logical place to put them, and modified them to use the same style as those existing tests. IMO, this test style makes the task of writing tests simpler, since the expected output is a little more obvious.

Similarly, in the rowsecurity tests, I have moved the new tests to immediately after the existing tests for RLS policies on tables accessed via views, and added a few new tests in the same style, including verifying permission checks on relations in subqueries in RLS policies, when the table is accessed via a view.

I wasn't happy with the overall level of test coverage for this new feature, so I have expanded on it quite a bit. This includes tests for a bug in rewriteTargetView() -- it wasn't consistently handling the case of an update involving an ordinary view on top of a security invoker view.

I have added explicit documentation for the fact that a security invoker view always does permission checks as the current user, even if it is accessed from a non-security-invoker view, since that was the cause of some discussion on this thread. I've also added some more detailed documentation describing how all this affects RLS, since that's likely to be a common use case.

I've done a fairly extensive doc search, and I *think* I've identified all the other places that needed updating. One additional thing that had been missed was that the LOCK command can be used to lock views, which includes locking all underlying base relations, after checking permissions as the view owner. The logical/consistent thing to do for security invoker views is to do the permission checks as the invoking user, so I've done that.

Barring any other comments or objections, I'll push this in a couple of days or so, after a bit more proof-reading.

Regards,
Dean

diff --git a/doc/src/sgml/ref/alter_view.sgml b/doc/src/sgml/ref/alter_view.sgml
new file mode 100644
index 98c312c..8bdc90a
--- a/doc/src/sgml/ref/alter_view.sgml
+++ b/doc/src/sgml/ref/alter_view.sgml
@@ -156,7 +156,17 @@ ALTER VIEW [ IF EXISTS ]
       Changes the security-barrier property of the view. The value must
-      be Boolean value, such as true
+      be a Boolean value, such as true
+      or false.
+
+
+   security_invoker (boolean)
+
+
+      Changes the security-invoker property of the view. The value must
+      be a Boolean value, such as true
+      or false.

diff --git a/doc/src/sgml/ref/create_policy.sgml b/doc/src/sgml/ref/create_policy.sgml
new file mode 100644
index 9f53206..f898b7a
--- a/doc/src/sgml/ref/create_policy.sgml
+++ b/doc/src/sgml/ref/create_policy.sgml
@@ -608,7 +608,9 @@ AND
    This does not change how views work, however. As with normal queries
    and views, permission checks and policies for the tables which are
    referenced by a view will use the view
-   owner's rights and any policies which apply to the view owner.
+   owner's rights and any policies which apply to the view owner, except if
+   the view is defined using the security_invoker option
+   (see CREATE VIEW).

diff --git a/doc/src/sgml/ref/create_view.sgml b/doc/src/sgml/ref/create_view.sgml
new file mode 100644
index bf03287..72a92bb
--- a/doc/src/sgml/ref/create_view.sgml
+++ b/doc/src/sgml/ref/create_view.sgml
@@ -137,8 +137,6 @@ CREATE VIEW [ schema
    local or cascaded, and is equivalent to specifying
    WITH [ CASCADED | LOCAL ] CHECK OPTION (see below).
-   This option can be changed on existing views using ALTER VIEW.
@@ -152,7 +150,22 @@ CREATE VIEW [ schema
-
+
+   security_invoker (boolean)
+
+
+      This option causes the underlying base relations to be checked
+      against the privileges of the user of the view rather than the view
+      owner. See the notes below for full details.
+
+
+
+   All of the above options can be changed on existing views using ALTER VIEW.
+
@@ -265,18 +278,74 @@ CREATE VIEW vista AS SELECT text 'Hello
-Access to tables referenced in the view is determined by permissions of
-the view owner. In some cases, this can be used to provide secure but
-restricted access to the underlying
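(A sketch of how the committed option and the LOCK behaviour described above are exercised; the table, column, and view names are invented:)

CREATE TABLE employees (name text, department text, salary int);

CREATE VIEW emp_public WITH (security_invoker = true) AS
    SELECT name, department FROM employees;

ALTER VIEW emp_public SET (security_invoker = false);   -- can be changed later

BEGIN;
LOCK TABLE emp_public;   -- also locks "employees"; for a security invoker view the
                         -- permission checks are done as the invoking user
COMMIT;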
Re: [PATCH] Add reloption for views to enable RLS
On Mon, 21 Mar 2022 at 09:47, Laurenz Albe wrote:
>
> Thanks for your diligent work on this, and the patch looks good to me.

Thanks for looking again. Pushed.

Regards,
Dean
Re: Additional improvements to extended statistics
On Sun, 29 Nov 2020 at 21:02, Tomas Vondra wrote:
>
> Those are fairly minor issues. I don't have any deeper objections, and
> it seems committable. Do you plan to do that sometime soon?
>

OK, I've updated the patch status in the CF app, and I should be able to push it in the next day or so.

Regards,
Dean
Re: Additional improvements to extended statistics
makes that code more future-proof. I also think it's neater because now the signature of mcv_clause_selectivity_or() is more natural --- its primary return value is now the clause's MCV selectivity, as suggested by the function's name, rather than the overlap selectivity that the previous version was returning.

Also, after your "Var Op Var" patch is applied, I think it would be possible to construct queries that would benefit from this, so it would be good to get that committed too.

Barring any further comments, I'll push this sometime soon.

Regards,
Dean

diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
new file mode 100644
index 37a735b..b88b29e
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -44,6 +44,12 @@ static void addRangeClause(RangeQueryCla
 			bool varonleft, bool isLTsel, Selectivity s2);
 static RelOptInfo *find_single_rel_for_clauses(PlannerInfo *root,
 			List *clauses);
+static Selectivity clauselist_selectivity_or(PlannerInfo *root,
+			List *clauses,
+			int varRelid,
+			JoinType jointype,
+			SpecialJoinInfo *sjinfo,
+			bool use_extended_stats);

 /
  * ROUTINES TO COMPUTE SELECTIVITIES
@@ -61,64 +67,10 @@ static RelOptInfo *find_single_rel_for_c
  *
  * The basic approach is to apply extended statistics first, on as many
  * clauses as possible, in order to capture cross-column dependencies etc.
- * The remaining clauses are then estimated using regular statistics tracked
- * for individual columns. This is done by simply passing the clauses to
- * clauselist_selectivity_simple.
- */
-Selectivity
-clauselist_selectivity(PlannerInfo *root,
-			List *clauses,
-			int varRelid,
-			JoinType jointype,
-			SpecialJoinInfo *sjinfo)
-{
-	Selectivity s1 = 1.0;
-	RelOptInfo *rel;
-	Bitmapset  *estimatedclauses = NULL;
-
-	/*
-	 * Determine if these clauses reference a single relation. If so, and if
-	 * it has extended statistics, try to apply those.
-	 */
-	rel = find_single_rel_for_clauses(root, clauses);
-	if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
-	{
-		/*
-		 * Estimate as many clauses as possible using extended statistics.
-		 *
-		 * 'estimatedclauses' tracks the 0-based list position index of
-		 * clauses that we've estimated using extended statistics, and that
-		 * should be ignored.
-		 */
-		s1 *= statext_clauselist_selectivity(root, clauses, varRelid,
-					jointype, sjinfo, rel,
-					&estimatedclauses);
-	}
-
-	/*
-	 * Apply normal selectivity estimates for the remaining clauses, passing
-	 * 'estimatedclauses' so that it skips already estimated ones.
-	 */
-	return s1 * clauselist_selectivity_simple(root, clauses, varRelid,
-					jointype, sjinfo,
-					estimatedclauses);
-}
-
-/*
- * clauselist_selectivity_simple -
- *	  Compute the selectivity of an implicitly-ANDed list of boolean
- *	  expression clauses. The list can be empty, in which case 1.0
- *	  must be returned. List elements may be either RestrictInfos
- *	  or bare expression clauses --- the former is preferred since
- *	  it allows caching of results. The estimatedclauses bitmap tracks
- *	  clauses that have already been estimated by other means.
- *
- * See clause_selectivity() for the meaning of the additional parameters.
- *
- * Our basic approach is to take the product of the selectivities of the
- * subclauses. However, that's only right if the subclauses have independent
- * probabilities, and in reality they are often NOT independent. So,
- * we want to be smarter where we can.
+ * The remaining clauses are then estimated by taking the product of their
+ * selectivities, but that's only right if they have independent
+ * probabilities, and in reality they are often NOT independent even if they
+ * only refer to a single column. So, we want to be smarter where we can.
  *
  * We also recognize "range queries", such as "x > 34 AND x < 42". Clauses
  * are recognized as possible range query components if they are restriction
@@ -147,28 +99,68 @@ clauselist_selectivity(PlannerInfo *root
  * selectivity functions; perhaps some day we can generalize the approach.
  */
 Selectivity
-clauselist_selectivity_simple(PlannerInfo *root,
-			List *clauses,
-			int varRelid,
-			JoinType jointype,
-			SpecialJoinInfo *sjinfo,
-			Bitmapset *estimatedclauses)
+clauselist_selectivity(PlannerInfo *root,
+			List *clauses,
+			int varRelid,
+			JoinType jointype,
+			SpecialJoinInfo *sjinfo)
+{
+	return clauselist_selectivity_ext(root, clauses, varRelid,
+					jointype, sjinfo, true);
+}
+
+/*
+ * clauselist_selectivity_ext -
+ *	  Extended ver
Re: Additional improvements to extended statistics
On Wed, 2 Dec 2020 at 16:34, Tomas Vondra wrote:
>
> On 12/2/20 4:51 PM, Dean Rasheed wrote:
> >
> > Barring any further comments, I'll push this sometime soon.
>
> +1
>

Pushed.

Regards,
Dean
Re: PoC/WIP: Extended statistics on expressions
On Thu, 3 Dec 2020 at 15:23, Tomas Vondra wrote:
>
> Attached is a patch series rebased on top of 25a9e54d2d.

After reading this thread and [1], I think I prefer the name "standard" rather than "expressions", because it is meant to describe the kind of statistics being built rather than what they apply to, but maybe that name doesn't actually need to be exposed to the end user.

Looking at the current behaviour, there are a couple of things that seem a little odd, even though they are understandable. For example, the fact that

  CREATE STATISTICS s (expressions) ON (expr), col FROM tbl;

fails, but

  CREATE STATISTICS s (expressions, mcv) ON (expr), col FROM tbl;

succeeds and creates both "expressions" and "mcv" statistics. Also, the syntax

  CREATE STATISTICS s (expressions) ON (expr1), (expr2) FROM tbl;

tends to suggest that it's going to create statistics on the pair of expressions, describing their correlation, when actually it builds 2 independent statistics. Also, this error text isn't entirely accurate:

  CREATE STATISTICS s ON col FROM tbl;
  ERROR:  extended statistics require at least 2 columns

because extended statistics don't always require 2 columns; they can also just have an expression, or multiple expressions and 0 or 1 columns.

I think a lot of this stems from treating "expressions" in the same way as the other (multi-column) stats kinds, and it might actually be neater to have separate documented syntaxes for single- and multi-column statistics (see the examples after this message):

  CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
    ON (expression)
    FROM table_name

  CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
    [ ( statistics_kind [, ... ] ) ]
    ON { column_name | (expression) } , { column_name | (expression) } [, ...]
    FROM table_name

The first syntax would create single-column stats, and wouldn't accept a statistics_kind argument, because there is only one kind of single-column statistic. Maybe that might change in the future, but if so, it's likely that the kinds of single-column stats will be different from the kinds of multi-column stats.

In the second syntax, the only accepted kinds would be the current multi-column stats kinds (ndistinct, dependencies, and mcv), and it would always build stats describing the correlations between the columns listed. It would continue to build standard/expression stats on any expressions in the list, but that's more of an implementation detail.

It would no longer be possible to do "CREATE STATISTICS s (expressions) ON (expr1), (expr2) FROM tbl". Instead, you'd have to issue 2 separate "CREATE STATISTICS" commands, but that seems more logical, because they're independent stats.

The parsing code might not change much, but some of the errors would be different. For example, the errors "building only extended expression statistics on simple columns not allowed" and "extended expression statistics require at least one expression" would go away, and the error "extended statistics require at least 2 columns" might become more specific, depending on the stats kind.

Regards,
Dean

[1] https://www.postgresql.org/message-id/flat/1009.1579038764%40sss.pgh.pa.us#8624792a20ae595683b574f5933dae53
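(Concretely, the two documented forms proposed above would allow statements like the following; table and column names are invented:)

-- Single-expression statistics (no kind list is accepted):
CREATE STATISTICS s_lower_name ON (lower(name)) FROM tbl;

-- Multi-column statistics, mixing columns and expressions:
CREATE STATISTICS s_ab (ndistinct, dependencies, mcv) ON a, b, (a * b) FROM tbl;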
Re: PoC/WIP: Extended statistics on expressions
On Mon, 7 Dec 2020 at 14:15, Tomas Vondra wrote:
>
> On 12/7/20 10:56 AM, Dean Rasheed wrote:
> > it might actually be
> > neater to have separate documented syntaxes for single- and
> > multi-column statistics:
> >
> > CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
> >   ON (expression)
> >   FROM table_name
> >
> > CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
> >   [ ( statistics_kind [, ... ] ) ]
> >   ON { column_name | (expression) } , { column_name | (expression) } [, ...]
> >   FROM table_name
>
> I think it makes sense in general. I see two issues with this approach,
> though:
>
> * By adding expression/standard stats for individual statistics, it
> makes the list of statistics longer - I wonder if this might have
> measurable impact on lookups in this list.
>
> * I'm not sure it's a good idea that the second syntax would always
> build the per-expression stats. Firstly, it seems a bit strange that it
> behaves differently than the other kinds. Secondly, I wonder if there
> are cases where it'd be desirable to explicitly disable building these
> per-expression stats. For example, what if we have multiple extended
> statistics objects, overlapping on a couple expressions. It seems
> pointless to build the stats for all of them.
>

Hmm, I'm not sure it would really be a good idea to build MCV stats on expressions without also building the standard stats for those expressions, otherwise the assumptions that mcv_combine_selectivities() makes about simple_sel and mcv_basesel wouldn't really hold.

But then, if multiple MCV stats shared the same expression, it would be quite wasteful to build standard stats on the expression more than once. It feels like it should build a single extended stats object for each unique expression, with appropriate dependencies for any MCV stats that used those expressions, but I'm not sure how complex that would be. Dropping the last MCV stat object using a standard expression stat object might reasonably drop the expression stats ... except if they were explicitly created by the user, independently of any MCV stats. That could get quite messy.

Regards,
Dean
Re: Additional improvements to extended statistics
On Wed, 2 Dec 2020 at 15:51, Dean Rasheed wrote:
>
> The sort of queries I had in mind were things like this:
>
>   WHERE (a = 1 AND b = 1) OR (a = 2 AND b = 2)
>
> However, the new code doesn't apply the extended stats directly using
> clauselist_selectivity_or() for this kind of query because there are
> no RestrictInfos for the nested AND clauses, so
> find_single_rel_for_clauses() (and similarly
> statext_is_compatible_clause()) regards those clauses as not
> compatible with extended stats. So what ends up happening is that
> extended stats are used only when we descend down to the two AND
> clauses, and their results are combined using the original "s1 + s2 -
> s1 * s2" formula. That actually works OK in this case, because there
> is no overlap between the two AND clauses, but it wouldn't work so
> well if there was.
>
> I'm pretty sure that can be fixed by teaching
> find_single_rel_for_clauses() and statext_is_compatible_clause() to
> handle BoolExpr clauses, looking for RestrictInfos underneath them,
> but I think that should be left for a follow-on patch.

Attached is a patch doing that, which improves a couple of the estimates for queries with AND clauses underneath OR clauses, as expected.

This also revealed a minor bug in the way that the estimates for multiple statistics objects were combined while processing an OR clause -- the estimates for the overlaps between clauses only apply to the current statistics object, so we really have to combine the estimates for each set of clauses for each statistics object as if they were independent of one another.

0001 fixes the multiple-extended-stats issue for OR clauses, and 0002 improves the estimates for sub-AND clauses underneath OR clauses. These are both quite small patches that hopefully won't interfere with any of the other extended stats patches.

Regards,
Dean

0001-Improve-estimation-of-OR-clauses-using-multiple-exte.patch
Description: Binary data

0002-Improve-estimation-of-ANDs-under-ORs-using-extended-.patch
Description: Binary data
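(A minimal way to see the kind of query being targeted - not the exact regression tests in the patches; the table name, data distribution, and sizes are arbitrary:)

CREATE TABLE ab (a int, b int);
INSERT INTO ab SELECT x % 10, x % 10 FROM generate_series(1, 10000) g(x);
CREATE STATISTICS ab_s (mcv) ON a, b FROM ab;
ANALYZE ab;

EXPLAIN (ANALYZE) SELECT * FROM ab WHERE (a = 1 AND b = 1) OR (a = 2 AND b = 2);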
Re: PoC/WIP: Extended statistics on expressions
On Tue, 8 Dec 2020 at 12:44, Tomas Vondra wrote:
>
> Possibly. But I don't think it's worth the extra complexity. I don't
> expect people to have a lot of overlapping stats, so the amount of
> wasted space and CPU time is expected to be fairly limited.
>
> So I don't think it's worth spending too much time on this now. Let's
> just do what you proposed, and revisit this later if needed.
>

Yes, I think that's a reasonable approach to take.

As long as the documentation makes it clear that building MCV stats also causes standard expression stats to be built on any expressions included in the list, then the user will know and can avoid duplication most of the time. I don't think there's any need for code to try to prevent that -- just as we don't bother with code to prevent a user building multiple indexes on the same column.

The only case where duplication won't be avoidable is where there are multiple MCV stats sharing the same expression (see the sketch below), but that's probably quite unlikely in practice, and it seems acceptable to leave improving that as a possible future optimisation.

Regards,
Dean
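(The unavoidable-duplication case mentioned above would be something like this, with invented names -- both statistics objects would independently build standard stats for (a * b):)

CREATE STATISTICS s1 (mcv) ON a, (a * b) FROM tbl;
CREATE STATISTICS s2 (mcv) ON c, (a * b) FROM tbl;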
Re: PoC/WIP: Extended statistics on expressions
On Fri, 11 Dec 2020 at 20:17, Tomas Vondra wrote:
>
> OK. Attached is an updated version, reworking it this way.

Cool. I think this is an exciting development, so I hope it makes it into the next release.

I have started looking at it. So far I have only looked at the catalog, parser and client changes, but I thought it's worth posting my comments so far.

> I tried tweaking the grammar to differentiate these two syntax variants,
> but that led to shift/reduce conflicts with the existing ones. I tried
> fixing that, but I ended up doing that in CreateStatistics().

Yeah, that makes sense. I wasn't expecting the grammar to change.

> The other thing is that we probably can't tie this to just MCV, because
> functional dependencies need the per-expression stats too. So I simply
> build expression stats whenever there's at least one expression.

Makes sense.

> I also decided to keep the "expressions" statistics kind - it's not
> allowed to specify it in CREATE STATISTICS, but it's useful internally
> as it allows deciding whether to build the stats in a single place.
> Otherwise we'd need to do that every time we build the statistics, etc.

Yes, I thought that would be the easiest way to do it. Essentially the "expressions" stats kind is an internal implementation detail, hidden from the user, because it's built automatically when required, so you don't need to (and can't) explicitly ask for it. This new behaviour seems much more logical to me.

> I added a brief explanation to the sgml docs, not sure if that's good
> enough - maybe it needs more details.

Yes, I think that could use a little tidying up, but I haven't looked too closely yet.

Some other comments:

* I'm not sure I understand the need for 0001. Wasn't there an earlier version of this patch that just did it by re-populating the type array, but which still had it as an array rather than turning it into a list? Making it a list falsifies some of the comments and function/variable name choices in that file.

* There's a comment typo in catalog/Makefile -- "are are reputedly other...", which should be "there are reputedly other...".

* Looking at the pg_stats_ext view, I think perhaps expression stats should be omitted entirely from that view, since it doesn't show any useful information about them. So it could remove "e" from the "kinds" array, and exclude rows whose only kind is "e", since such rows have no interesting data in them. Essentially, the new view pg_stats_ext_exprs makes having any expression stats in pg_stats_ext redundant. Hiding this data in pg_stats_ext would also be consistent with making the "expressions" stats kind hidden from the user.

* In gram.y, it wasn't quite obvious why you converted the column list for CREATE STATISTICS from an expr_list to a stats_params list. I figured it out, and it makes sense, but I think it could use a comment, perhaps something along the lines of the one for index_elem, e.g.:

  /*
   * Statistics attributes can be either simple column references, or arbitrary
   * expressions in parens.  For compatibility with index attributes permitted
   * in CREATE INDEX, we allow an expression that's just a function call to be
   * written without parens.
   */

* In parse_func.c and parse_agg.c, there are a few new error strings that use the abbreviation "stats expressions", whereas most other errors refer to "statistics expressions". For consistency, I think they should all be the latter.

* In generateClonedExtStatsStmt(), I think the "expressions" stats kind needs to be explicitly excluded, otherwise CREATE TABLE (LIKE ...) fails if the source table has expression stats. (An example of the failing case is sketched after this message.)

* CreateStatistics() uses ShareUpdateExclusiveLock, but in tcop/utility.c the relation is opened with a ShareLock. ISTM that the latter lock mode should be made to match CreateStatistics().

* Why does the new code in tcop/utility.c not use RangeVarGetRelidExtended together with RangeVarCallbackOwnsRelation? That seems preferable to doing the ACL check in CreateStatistics(). For one thing, as it stands, it allows the lock to be taken even if the user doesn't own the table. Is it intentional that the current code allows extended stats to be created on system catalogs? That would be one thing that using RangeVarCallbackOwnsRelation would change, but I can't see a reason to allow it.

* In src/bin/psql/describe.c, I think the \d output should also exclude the "expressions" stats kind and just list the other kinds (or have no kinds list at all, if there are no other kinds), to make it consistent with the CREATE STATISTICS syntax.

* The pg_dump output for a stats object whose only kind is "expressions" is broken -- it includes a spurious "()" for the kinds list.

That's it for now. I'll look at the optimiser changes next, and try to post more comments later this week.

Regards,
Dean
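(The generateClonedExtStatsStmt() problem mentioned in the review above can be reproduced with something like the following, using invented names; the LIKE clause has to clone the statistics object, including its internal "expressions" kind:)

CREATE TABLE t1 (a int, b int);
CREATE STATISTICS t1_s (mcv) ON a, (a + b) FROM t1;

CREATE TABLE t2 (LIKE t1 INCLUDING STATISTICS);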
Bug in numeric_power() if exponent is INT_MIN
(Amusingly, I only found this after discovering that Windows Calculator has a similar bug which causes it to crash if you try to raise a number to the power INT_MIN.)

On my machine, numeric_power() loses all precision if the exponent is INT_MIN, though the actual failure mode might well be platform-dependent:

SELECT n, 1.0123^n AS pow
  FROM (VALUES (-2147483647), (-2147483648), (-2147483649)) AS v(n);

      n      |        pow
-------------+--------------------
 -2147483647 | 0.7678656557347558
 -2147483648 | 1.0000000000000000
 -2147483649 | 0.7678656555458609
(3 rows)

The issue is in this line from power_var_int():

    sig_digits += (int) log(Abs(exp)) + 8;

because "exp" is a signed int, so Abs(exp) leaves INT_MIN unchanged. The most straightforward fix is to use fabs() instead, so that "exp" is cast to double *before* the absolute value is taken, as in the attached patch.

This code was added in 7d9a4737c2, which first appeared in PG 9.6, so barring objections, I'll push and back-patch this fix that far.

Regards,
Dean

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
new file mode 100644
index 68d8791..7cf5656
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -10290,7 +10290,7 @@ power_var_int(const NumericVar *base, in
 	 * to around log10(abs(exp)) digits, so work with this many extra digits
 	 * of precision (plus a few more for good measure).
 	 */
-	sig_digits += (int) log(Abs(exp)) + 8;
+	sig_digits += (int) log(fabs(exp)) + 8;

 	/*
 	 * Now we can proceed with the multiplications.
diff --git a/src/test/regress/expected/numeric.out b/src/test/regress/expected/numeric.out
new file mode 100644
index 56e7799..30a5642
--- a/src/test/regress/expected/numeric.out
+++ b/src/test/regress/expected/numeric.out
@@ -2321,6 +2321,12 @@ select 0.12 ^ (-20);
  2608405330458882702.5529619561355838
(1 row)

+select 1.0123 ^ (-2147483648);
+ ?column?
+
+ 0.7678656556403084
+(1 row)
+
 -- cases that used to error out
 select 0.12 ^ (-25);
  ?column?
diff --git a/src/test/regress/sql/numeric.sql b/src/test/regress/sql/numeric.sql
new file mode 100644
index f19793a..db812c8
--- a/src/test/regress/sql/numeric.sql
+++ b/src/test/regress/sql/numeric.sql
@@ -1089,6 +1089,7 @@ select 3.789 ^ 21;
 select 3.789 ^ 35;
 select 1.2 ^ 345;
 select 0.12 ^ (-20);
+select 1.0123 ^ (-2147483648);
 -- cases that used to error out
 select 0.12 ^ (-25);
Re: PoC/WIP: Extended statistics on expressions
On Tue, 5 Jan 2021 at 00:45, Tomas Vondra wrote:
>
> On 1/4/21 4:34 PM, Dean Rasheed wrote:
> >
> > * In src/bin/psql/describe.c, I think the \d output should also
> > exclude the "expressions" stats kind and just list the other kinds (or
> > have no kinds list at all, if there are no other kinds), to make it
> > consistent with the CREATE STATISTICS syntax.
>
> Not sure I understand. Why would this make it consistent with CREATE
> STATISTICS? Can you elaborate?
>

This isn't absolutely essential, but I think it would be neater. For example, if I have a table with stats like this:

  CREATE TABLE foo(a int, b int);
  CREATE STATISTICS foo_s_ab (mcv) ON a,b FROM foo;

then the \d output is as follows:

  \d foo
                   Table "public.foo"
   Column |  Type   | Collation | Nullable | Default
  --------+---------+-----------+----------+---------
   a      | integer |           |          |
   b      | integer |           |          |
  Statistics objects:
      "public"."foo_s_ab" (mcv) ON a, b FROM foo

and the stats line matches the DDL used to create the stats. It could, for example, be copy-pasted and tweaked to create similar stats on another table, but even if that's not very likely, it's neat that it reflects how the stats were created.

OTOH, if there are expressions in the list, it produces something like this:

                   Table "public.foo"
   Column |  Type   | Collation | Nullable | Default
  --------+---------+-----------+----------+---------
   a      | integer |           |          |
   b      | integer |           |          |
  Statistics objects:
      "public"."foo_s_ab" (mcv, expressions) ON a, b, ((a * b)) FROM foo

which no longer matches the DDL used, and isn't part of an accepted syntax, so seems a bit inconsistent.

In general, if we're making the "expressions" kind an internal implementation detail that just gets built automatically when needed, then I think we should hide it from this sort of output, so the list of kinds matches the list that the user used when the stats were created.

Regards,
Dean
Re: PoC/WIP: Extended statistics on expressions
Looking over the statscmds.c changes, there are a few XXX's and FIXME's that need resolving, and I had a couple of other minor comments:

+   /*
+    * An expression using mutable functions is probably wrong,
+    * since if you aren't going to get the same result for the
+    * same data every time, it's not clear what the index entries
+    * mean at all.
+    */
+   if (CheckMutability((Expr *) expr))
+       ereport(ERROR,

That comment is presumably copied from the index code, so needs updating.

+   /*
+    * Disallow data types without a less-than operator
+    *
+    * XXX Maybe allow this, but only for EXPRESSIONS stats and
+    * prevent building e.g. MCV etc.
+    */
+   atttype = exprType(expr);
+   type = lookup_type_cache(atttype, TYPECACHE_LT_OPR);
+   if (type->lt_opr == InvalidOid)
+       ereport(ERROR,
+               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                errmsg("expression cannot be used in statistics because its type %s has no default btree operator class",
+                       format_type_be(atttype;

As the comment suggests, it's probably worth skipping this check if numcols is 1, so that single-column stats can be built for more types of expressions. (I'm assuming that it's basically no more effort to make that work, so I think it falls into the might-as-well-do-it category.)

+   /*
+    * Parse the statistics kinds. Firstly, check that this is not the
+    * variant building statistics for a single expression, in which case
+    * we don't allow specifying any statistis kinds. The simple variant
+    * only has one expression, and does not allow statistics kinds.
+    */
+   if ((list_length(stmt->exprs) == 1) && (list_length(stxexprs) == 1))
+   {

Typo: "statistis"

Nit-picking, this test could just be:

+   if ((numcols == 1) && (list_length(stxexprs) == 1))

which IMO is a little more readable, and matches a similar test a little further down.

+   /*
+    * If there are no simply-referenced columns, give the statistics an
+    * auto dependency on the whole table. In most cases, this will
+    * be redundant, but it might not be if the statistics expressions
+    * contain no Vars (which might seem strange but possible).
+    *
+    * XXX This is copied from index_create, not sure if it's applicable
+    * to extended statistics too.
+    */

Seems right to me.

+   /*
+    * FIXME use 'expr' for expressions, which have empty column names.
+    * For indexes this is handled in ChooseIndexColumnNames, but we
+    * have no such function for stats.
+    */
+   if (!name)
+       name = "expr";

In theory, this function could be made to duplicate the logic used for indexes, creating names like "expr1", "expr2", etc. To be honest though, I don't think it's worth the effort. The code for indexes isn't really bulletproof anyway -- for example, there might be a column called "expr" that is or isn't included in the index, which would make the generated name ambiguous. And in any case, a name like "tbl_cola_expr_colb_expr1_colc_stat" isn't really any more useful than "tbl_cola_expr_colb_expr_colc_stat". So I'd be tempted to leave that code as it is.

+
+/*
+ * CheckMutability
+ *      Test whether given expression is mutable
+ *
+ * FIXME copied from indexcmds.c, maybe use some shared function?
+ */
+static bool
+CheckMutability(Expr *expr)
+{

As the comment says, it's quite messy duplicating this code, but I'm wondering whether it would be OK to just skip this check entirely. I think someone else suggested that elsewhere, and I think it might not be a bad idea. For indexes, it could easily lead to wrong query results, but for stats the most likely problem is that the stats would get out of date (which they tend to do all by themselves anyway) and need rebuilding.

If you ignore intentionally crazy examples (which are still possible even with this check), then there are probably many legitimate cases where someone might want to use non-immutable functions in stats, and this check just forces them to create an immutable wrapper function (a sketch of that workaround follows this message).

Regards,
Dean
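(The "immutable wrapper" workaround referred to above is the usual trick of wrapping a non-immutable expression in a function that the user declares IMMUTABLE; the function, table, and column names here are invented:)

CREATE FUNCTION order_day(t timestamptz) RETURNS date
    LANGUAGE sql IMMUTABLE
    AS $$ SELECT (t AT TIME ZONE 'UTC')::date $$;

CREATE STATISTICS s_day_status ON (order_day(created_at)), status FROM orders;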
Re: PoC/WIP: Extended statistics on expressions
Starting to look at the planner code, I found an oversight in the way expression stats are read at the start of planning -- it is necessary to call ChangeVarNodes() on any expressions if the relid isn't 1, otherwise the stats expressions may contain Var nodes referring to the wrong relation. Possibly the easiest place to do that would be in get_relation_statistics(), if rel->relid != 1. Here's a simple test case: CREATE TABLE foo AS SELECT x FROM generate_series(1,10) g(x); CREATE STATISTICS foo_s ON (x%10) FROM foo; ANALYSE foo; EXPLAIN SELECT * FROM foo WHERE x%10 = 0; EXPLAIN SELECT * FROM (SELECT 1) t, foo WHERE x%10 = 0; (in the second query, the stats don't get applied). Regards, Dean
Re: PoC/WIP: Extended statistics on expressions
On Thu, 4 Mar 2021 at 22:16, Tomas Vondra wrote: > > Attached is a slightly improved version of the patch series, addressing > most of the issues raised in the previous message. Cool. Sorry for the delay replying. > 0003-Extended-statistics-on-expressions-20210304.patch > > Mostly unchanged, The one improvement is removing some duplicate code in > in mvc.c. > > 0004-WIP-rework-tracking-of-expressions-20210304.patch > > This is mostly unchanged of the patch reworking how we assign artificial > attnums to expressions (negative instead of (MaxHeapAttributeNumber+i)). Looks good. I see you undid the change to get_relation_statistics() in plancat.c, which offset the attnums of plain attributes in the StatisticExtInfo struct. I was going to suggest that as a simplification to the previous 0004 patch. Related to that, is this comment in dependencies_clauselist_selectivity(): /* * Count matching attributes - we have to undo two attnum offsets. * First, the dependency is offset using the number of expressions * for that statistics, and then (if it's a plain attribute) we * need to apply the same offset as above, by unique_exprs_cnt. */ which needs updating, since there is now just one attnum offset, not two. Only the unique_exprs_cnt offset is relevant now. Also, related to that change, I don't think that stat_covers_attributes() is needed anymore. I think that the code that calls it can now just be reverted back to using bms_is_subset(), since that bitmapset holds plain attributes that aren't offset. > 0005-WIP-unify-handling-of-attributes-and-expres-20210304.patch > > This reworks how we build statistics on attributes and expressions. > Instead of treating attributes and expressions separately, this allows > handling them uniformly. > > Until now, the various "build" functions (for different statistics > kinds) extracted attribute values from sampled tuples, but expressions > were pre-calculated in a separate array. Firstly to save CPU time (not > having to evaluate expensive expressions repeatedly) and to keep the > different stats consistent (there might be volatile functions etc.). > > So the build functions had to look at the attnum, determine if it's > attribute or expression, and in some cases it was tricky / easy to get > wrong. > > This patch replaces this "split" view with a simple "consistent" > representation merging values from attributes and expressions, and just > passes that to the build functions. There's no need to check the attnum, > and handle expressions in some special way, so the build functions are > much simpler / easier to understand (at least I think so). > > The build data is represented by "StatsBuildData" struct - not sure if > there's a better name. > > I'm mostly happy with how this turned out. I'm sure there's a bit more > cleanup needed (e.g. the merging/remapping of dependencies needs some > refactoring, I think) but overall this seems reasonable. Agreed. That's a nice improvement. I wonder if dependency_is_compatible_expression() can be merged with dependency_is_compatible_clause() to reduce code duplication. It probably also ought to be possible to support "Expr IN Array" there, in a similar way to the other code in statext_is_compatible_clause(). Also, should this check rinfo->clause_relids against the passed-in relid to rule out clauses referencing other relations, in the same way that statext_is_compatible_clause() does? > I did some performance testing, I don't think there's any measurable > performance degradation. 
> I'm actually wondering if we need to transform > the AttrNumber arrays into bitmaps in various places - maybe we should > just do a plain linear search. We don't really expect many elements, as > each statistics has 8 attnums at most. So maybe building the bitmapsets > is a net loss? The one exception might be functional dependencies, where > we can "merge" multiple statistics together. But even then it'd require > many statistics objects to make a difference. Possibly. There's a danger in trying to change too much at once though. As it stands, I think it's fairly close to being committable, with just a little more tidying up. Regards, Dean
Re: pgbench - add pseudo-random permutation function
On Thu, 11 Mar 2021 at 00:58, Bruce Momjian wrote: > > Maybe Dean Rasheed can help because of his math background --- CC'ing him. > Reading the thread I can see how such a function might be useful to scatter non-uniformly random values. The implementation looks plausible too, though it adds quite a large amount of new code. The main thing that concerns me is justifying the code. With this kind of thing, it's all too easy to overlook corner cases and end up with trivial sequences in certain special cases. I'd feel better about that if we were implementing a standard algorithm with known pedigree. Thinking about the use case for this, it seems that it's basically designed to turn a set of non-uniform random numbers (produced by random_exponential() et al.) into another set of non-uniform random numbers, where the non-uniformity is scattered so that the more/less common values aren't all clumped together. I'm wondering whether that could be done more simply by passing the non-uniform random numbers through the uniform random number generator to scatter them uniformly across some range -- e.g., given an integer n, return the n'th value from the sequence produced by random(), starting from some initial seed -- i.e., implement nth_random(lb, ub, seed, n). That would actually be pretty straightforward to implement using O(log(n)) time to execute (see the attached python example), though it wouldn't generate a permutation, so it'd need a bit of thought to see if it met the requirements. Regards, Dean

import math

# 48-bit LCG constants used by erand48()
a = 0x5deece66d
c = 0xb
m = 0xffffffffffff

def erand48(seed):
    x = (seed[2] << 32) | (seed[1] << 16) | seed[0]
    x = (x * a + c) & m
    seed[0] = x & 0xffff
    seed[1] = (x >> 16) & 0xffff
    seed[2] = (x >> 32) & 0xffff
    return math.ldexp(x, -48)

def nth_erand48(seed, n):
    x = (seed[2] << 32) | (seed[1] << 16) | seed[0]
    # Binary exponentiation to compute a^n and the sum of the geometric
    # series gs = 1+a+...+a^(n-1)
    a_pow_n = 1
    gs = 0
    t = 1
    e = a
    while n > 0:
        if n & 1 == 1:
            a_pow_n = (a_pow_n * e) & m
            gs = (gs * e + t) & m
        t = ((e+1)*t) & m
        e = (e*e) & m
        n = n/2
    # n'th value in erand48 sequence
    x = (x * a_pow_n + gs * c) & m
    return math.ldexp(x, -48)

iseed = [1234, 5678, 9012]
seed = list(iseed)
for n in range(1,21):
    r = erand48(seed)
    print n, seed, r, nth_erand48(iseed, n)
Re: pgbench - add pseudo-random permutation function
On Thu, 11 Mar 2021 at 19:06, David Bowen wrote: > > The algorithm for generating a random permutation with a uniform distribution > across all permutations is easy: > > for (i=0; i<n; i++) { >     swap a[n-i] with a[rand(n-i+1)] > } > > where 0 <= rand(x) < x and a[i] is initially i (see Knuth, Section 3.4.2 > Algorithm P) > True, but n can be very large, so that might use a lot of memory and involve a lot of pre-processing. Regards, Dean
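To put a rough number on the memory concern above: at pgbench scale factor 10000 the accounts table has 10^9 rows, so materializing a shuffled array of 64-bit values for it would need on the order of 8GB of memory plus O(n) setup time before the first permuted value could be produced, whereas the proposed permute() computes each value on demand in constant space.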
Re: Additional improvements to extended statistics
On Thu, 12 Nov 2020 at 14:18, Tomas Vondra wrote: > > Here is an improved WIP version of the patch series, modified to address > the issue with repeatedly applying the extended statistics, as discussed > with Dean in this thread. It's a bit rough and not committable, but I > need some feedback so I'm posting it in this state. > Cool. I haven't forgotten that I promised to look at this. > Dean, does this address the issue you had in mind? Can you come up with > an example of that issue in the form of a regression test or something? > I'm quite busy with my day job at the moment, but I expect to have time to review this next week. Regards, Dean
Re: Additional improvements to extended statistics
On Thu, 12 Nov 2020 at 14:18, Tomas Vondra wrote: > > Here is an improved WIP version of the patch series, modified to address > the issue with repeatedly applying the extended statistics, as discussed > with Dean in this thread. It's a bit rough and not committable, but I > need some feedback so I'm posting it in this state. As it stands, it doesn't compile if 0003 is applied, because it missed one of the callers of clauselist_selectivity_simple(), but that's easily fixed. > 0001 is the original patch improving estimates of OR clauses > > 0002 adds thin wrappers for clause[list]_selectivity, with "internal" > functions allowing to specify whether to keep considering extended stats > > 0003 does the same for the "simple" functions > > > I've kept it like this to demonstrate that 0002 is not sufficient. In my > response from March 24 I wrote this: > > > Isn't it the case that clauselist_selectivity_simple (and the OR > > variant) should ignore extended stats entirely? That is, we'd need > > to add a flag (or _simple variant) to clause_selectivity, so that it > > calls causelist_selectivity_simple_or. > But that's actually wrong, as 0002 shows (as it breaks a couple of > regression tests), because of the way we handle OR clauses. At the top > level, an OR-clause is actually just a single clause and it may get > passed to clauselist_selectivity_simple. So entirely disabling extended > stats for the "simple" functions would also mean disabling extended > stats for a large number of OR clauses. Which is clearly wrong. > > So 0003 addresses that, by adding a flag to the two "simple" functions. > Ultimately, this should probably do the same thing as 0002 and add thin > wrappers, because the existing functions are part of the public API. I agree that, taken together, these patches fix the multiple-extended-stats-evaluation issue. However: I think this has ended up with too many variants of these functions, since we now have "_internal" and "_simple" variants, and you're proposing adding more. The original purpose of the "_simple" variants was to compute selectivities without looking at extended stats, and now the "_internal" variants compute selectivities with an additional "use_extended_stats" flag to control whether or not to look at extended stats. Thus they're basically the same, and could be rolled together. Additionally, it's worth noting that the "_simple" variants expose the "estimatedclauses" bitmap as an argument, which IMO is a bit messy as an API. All callers of the "_simple" functions outside of clausesel.c actually pass in estimatedclauses=NULL, so it's possible to refactor and get rid of that, turning estimatedclauses into a purely internal variable. Also, it's quite messy that clauselist_selectivity_simple_or() needs to be passed a Selectivity input (the final argument) that is the selectivity of any already-estimated clauses, or the value to return if no not-already-estimated clauses are found, and must be 0.0 when called from the extended stats code. Attached is the kind of thing I had in mind (as a single patch, since I don't think it's worth splitting up). This replaces the "_simple" and "_internal" variants of these functions with "_opt_ext_stats" variants whose signatures match the originals except for having the single extra "use_extended_stats" boolean parameter. 
Additionally, the "_simple" functions are merged into the originals (making them more like they were in PG11) so that the "estimatedclauses" bitmap and partial-OR-list Selectivity become internal details, no longer exposed in the API. Regards, Dean diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c new file mode 100644 index 37a735b..ab16a4c --- a/src/backend/optimizer/path/clausesel.c +++ b/src/backend/optimizer/path/clausesel.c @@ -44,6 +44,12 @@ static void addRangeClause(RangeQueryCla bool varonleft, bool isLTsel, Selectivity s2); static RelOptInfo *find_single_rel_for_clauses(PlannerInfo *root, List *clauses); +static Selectivity clause_selectivity_opt_ext_stats(PlannerInfo *root, + Node *clause, + int varRelid, + JoinType jointype, + SpecialJoinInfo *sjinfo, + bool use_extended_stats); / * ROUTINES TO COMPUTE SELECTIVITIES @@ -61,64 +67,8 @@ static RelOptInfo *find_single_rel_for_c * * The basic approach is to apply extended statistics first, on as many * clauses as possible, in order to capture cross-column dependencies etc. - * The remaining clauses
Re: Additional improvements to extended statistics
On Wed, 18 Nov 2020 at 22:37, Tomas Vondra wrote: > > Seems fine to me, although the "_opt_ext_stats" is rather cryptic. > AFAICS we use "_internal" for similar functions. > There's precedent for using "_opt_xxx" for function variants that add an option to existing functions, but I agree that in this case it's a bit of a mouthful. I don't think "_internal" is appropriate though, since the clauselist function isn't internal. Perhaps using just "_ext" would be OK. Regards, Dean
Re: [bug+patch] Inserting DEFAULT into generated columns from VALUES RTE
On Sun, 6 Sept 2020 at 22:42, Tom Lane wrote: > > I think you'd be better off to make transformInsertStmt(), specifically > its multi-VALUES-rows code path, check for all-DEFAULT columns and adjust > the tlist itself. Doing it there might be a good bit less inefficient > for very long VALUES lists, too, which is a case that we do worry about. > Since that's already iterating through the sub-lists, you could track > whether all rows seen so far contain SetToDefault in each column position, > and avoid extra scans of the sublists. (A BitmapSet might be a convenient > representation of that, though you could also use a bool array I suppose.) > > I do not care for what you did in rewriteValuesRTE() either: just removing > a sanity check isn't OK, unless you've done something to make the sanity > check unnecessary which you surely have not. Perhaps you could extend > the initial scan of the tlist (lines 1294-1310) to notice SetToDefault > nodes as well as Var nodes and keep track of which columns have those. > Then you could cross-check that one or the other case applies whenever > you see a SetToDefault in the VALUES lists. That's not quite right because by the time rewriteValuesRTE() sees the tlist, it will contain already-rewritten generated column expressions, not SetToDefault nodes. If we're going to keep that sanity check (and I think that we should), I think that the way to do it is to have rewriteTargetListIU() record which columns it has expanded defaults for, and pass that information to rewriteValuesRTE(). Those columns of the VALUES RTE are no longer used in the query, so the sanity check can be amended to ignore them while continuing to check the other columns. Incidentally, there is another way of causing that sanity check to fail -- an INSERT ... OVERRIDING USER VALUE query with some DEFAULTS in the VALUES RTE (but not necessarily all DEFAULTs) will trigger it. So even if the parser completely removed any all-DEFAULT columns from the VALUES RTE, some work in the rewriter would still be necessary. > BTW, another thing that needs checking is whether a rule containing > an INSERT like this will reverse-list sanely. The whole idea of > replacing some of the Vars might not work so far as ruleutils.c is > concerned. In that case I think we might have to implement this > by having transformInsertStmt restructure the VALUES lists to > physically remove the all-DEFAULT column, and adjust the target column > list accordingly --- that is, make a parse-time transformation of > INSERT INTO gtest0 VALUES (1, DEFAULT), (2, DEFAULT); > into > INSERT INTO gtest0(a) VALUES (1), (2); > That'd have the advantage that you'd not have to hack up the > rewriter at all. I think it's actually easier to just do it all in the rewriter -- at the point where we see that we're about to insert potentially illegal values from a VALUES RTE into a generated column, scan it to see if all the values in that column are DEFAULTs, and if so trigger the existing code to replace the Var in the tlist with the generated column expression. That way we only do extra work in the case for which we're currently throwing an error, rather than for every query. Also, I think that scanning the VALUES lists in this way is likely to be cheaper than rebuilding them to remove a column. Attached is a patch doing it that way, along with additional regression tests that trigger both the original error and the sanity-check error triggered by INSERT ... OVERRIDING USER VALUES. 
I also added a few additional comments where I found the existing code a little non-obvious. I haven't touched the existing error messages. I think that's a subject for a separate patch. Regards, Dean diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c new file mode 100644 index 41dd670..966cab5 --- a/src/backend/rewrite/rewriteHandler.c +++ b/src/backend/rewrite/rewriteHandler.c @@ -69,13 +69,18 @@ static List *rewriteTargetListIU(List *t CmdType commandType, OverridingKind override, Relation target_relation, - int result_rti); + int result_rti, + RangeTblEntry *values_rte, + int values_rte_index, + Bitmapset **unused_values_attrnos); static TargetEntry *process_matched_tle(TargetEntry *src_tle, TargetEntry *prior_tle, const char *attrName); static Node *get_assignment_input(Node *node); +static Bitmapset *findDefaultOnlyColumns(RangeTblEntry *rte); static bool rewriteValuesRTE(Query *parsetree, RangeTblEntry *rte, int rti, - Relation target_relation, bool force_nulls); + Relation target_relation, bool force_nulls, + Bitmapset *unused_cols); static void markQueryForLocking(Query *qry, Node *jtnode, LockClauseStrength strength, LockWait
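For reference, a sketch of the two problem statements being discussed; the table definitions are illustrative rather than the actual regression-test tables, and the second one uses an identity column, which is where OVERRIDING USER VALUE applies:

-- 1. An all-DEFAULT column for a stored generated column -- the original bug:
CREATE TABLE gtest (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
INSERT INTO gtest VALUES (1, DEFAULT), (2, DEFAULT);

-- 2. OVERRIDING USER VALUE with a mix of DEFAULT and explicit values for an
--    identity column, which hits the rewriteValuesRTE() sanity check even
--    though the statement itself is legal:
CREATE TABLE itest (id int GENERATED BY DEFAULT AS IDENTITY, val text);
INSERT INTO itest OVERRIDING USER VALUE VALUES (DEFAULT, 'one'), (42, 'two');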
Re: [bug+patch] Inserting DEFAULT into generated columns from VALUES RTE
On Sun, 22 Nov 2020 at 20:58, Tom Lane wrote: > > I found only one nitpicky bug: in > findDefaultOnlyColumns, the test must be bms_is_empty(default_only_cols) > not just default_only_cols == NULL, or it will fail to fall out early > as intended when the first row contains some DEFAULTs but later rows > don't. Ah, good point. Thanks for fixing that. > > I haven't touched the existing error messages. I think that's a > > subject for a separate patch. > > Fair. After looking around a bit, I think that getting an error > cursor as I'd speculated about is more trouble than it's worth. > For starters, we'd have to pass down the query string into this > code, and there might be some ticklish issues about whether a given > chunk of parsetree came from that string or from some rule or view. > However, I think that just adjusting the error string would be > helpful, as attached. +1 > (I'm also wondering why the second case is generic ERRCODE_SYNTAX_ERROR > and not ERRCODE_GENERATED_ALWAYS. Didn't change it here, though.) I can't see any reason for it to be different, and ERRCODE_GENERATED_ALWAYS seems like the right code to use for both cases. Regards, Dean
Re: proposal: possibility to read dumped table's name from file
On Thu, 19 Nov 2020 at 19:57, Pavel Stehule wrote: > > minor update - fixed handling of processing names with double quotes inside > I see this is marked RFC, but reading the thread it doesn't feel like we have reached consensus on the design for this feature. I agree that being able to configure pg_dump via a config file would be very useful, but the syntax proposed here feels much more like a hacked-up syntax designed to meet this one use case, rather than a good general-purpose design that can be easily extended. IMO, a pg_dump config file should be able to specify all options currently supported through the command line, and vice versa (subject to command line length limits), with a single common code path for handling options. That way, any new options we add will work on the command line and in config files. Likewise, the user should only need to learn one set of options, and have the choice of specifying them on the command line or in a config file (or a mix of both). I can imagine eventually supporting multiple different file formats, each just being a different representation of the same data, so perhaps this could work with 2 new options: --option-file-format=plain|yaml|json|... --option-file=filename with "plain" being the default initial implementation, which might be something like our current postgresql.conf file format. Also, I think we should allow multiple "--option-file" arguments (e.g., to list different object types in different files), and for a config file to contain its own "--option-file" arguments, to allow config files to include other config files. The current design feels far too limited to me, and requires new code and new syntax to be added each time we extend it, so I'm -1 on this patch as it stands. Regards, Dean
Re: proposal: possibility to read dumped table's name from file
On Thu, 26 Nov 2020 at 06:43, Pavel Stehule wrote: > > st 25. 11. 2020 v 21:00 odesílatel Tom Lane napsal: >> >> (One thing to consider is >> how painful will it be for people to quote table names containing >> funny characters, for instance. On the command line, we largely >> depend on the shell's quoting behavior to solve that, but we'd not >> have that infrastructure when reading from a file.) > > This is not a problem with the current patch - and the last version of this > patch supports well obscure names. > Actually, that raises a different possible benefit of passing options in an options file -- if the user wants to pass in a table name pattern, it can be a nuisance if the shell's argument processing does additional unwanted things like globbing and environment variable substitutions. Using an options file could provide a handy way to ensure that any option values are interpreted exactly as written, without any additional mangling. > There was a requirement for supporting all and future pg_dump options - ok it > can make sense. I have not a problem to use instead a line format > > "option argument" or "long-option=argument" > > This format - can it be a solution? I'll try to rewrite the parser for this > format. > Yes, that's the sort of thing I was thinking of, to make the feature more general-purpose. > It is implementable, but this is in collision with Stephen's requirement for > human well readable format designed for handy writing. There are requests > that have no intersection. Well readable format needs a more complex parser. > And machine generating in this format needs more fork - generating flat file > is more simple and more robust than generating JSON or YAML. > To be clear, I wasn't suggesting that this patch implement multiple formats. Just the "plain" format above. Other formats like YAML might well be more human-readable, and be able to take advantage of values that are lists to avoid repeating option names, and they would have the advantage of being readable and writable by other standard tools, which might be useful. But I think such things would be for the future. Maybe no one will ever add support for other formats, or maybe someone will just write a separate external tool to convert YAML or JSON to our plain format. I don't know. But supporting all pg_dump options makes more things possible. >> I wasn't very excited about multiple switch files either, though >> depending on how the implementation is done, that could be simple >> enough to be in the might-as-well category. >> That's what I was hoping. Regards, Dean
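Putting the pieces of this sub-thread together, a hypothetical invocation might look something like the following. None of this is implemented; the --option-file flag and the long-option=argument line format come from the proposals above, while the file names, the particular options chosen and the "#" comment syntax are just assumptions for illustration:

pg_dump --option-file=dump.opts mydb

# dump.opts -- one long option per line, mirroring the command-line options
schema-only
table=public.orders
exclude-table=public.audit_log
option-file=more-objects.opts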
Re: Additional improvements to extended statistics
> On Wed, 18 Nov 2020 at 22:37, Tomas Vondra > wrote: > > > > Seems fine to me, although the "_opt_ext_stats" is rather cryptic. > > AFAICS we use "_internal" for similar functions. > > I have been thinking about this some more. The one part of this that I still wasn't happy with was the way that base frequencies were used to compute the selectivity correction to apply. As noted in [1], using base frequencies in this way isn't really appropriate for clauses combined using "OR". The reason is that an item's base frequency is computed as the product of the per-column selectivities, so that (freq - base_freq) is the right correction to apply for a set of clauses combined with "AND", but it doesn't really work properly for clauses combined with "OR". This is why a number of the estimates in the regression tests end up being significant over-estimates. I speculated in [1] that we might fix that by tracking which columns of the match bitmap actually matched the clauses being estimated, and then only use those base frequencies. Unfortunately that would also mean changing the format of the stats that we store, and so would be a rather invasive change. It occurred to me though, that there is another, much more straightforward way to do it. We can rewrite the "OR" clauses, and turn them into "AND" clauses using the fact that P(A OR B) = P(A) + P(B) - P(A AND B) and then use the multivariate stats to estimate the P(A AND B) part in the usual way. Attached is the resulting patch doing it that way. The main change is in the way that statext_mcv_clauselist_selectivity() works, combined with a new function mcv_clause_selectivity_or() that does the necessary MCV bitmap manipulations. Doing it this way also means that clausesel.c doesn't need to export clauselist_selectivity_or(), and the new set of exported functions seem a bit neater now. A handful of regression test results change, and in all cases except one the new estimates are much better. One estimate is made worse, but in that case we only have 2 sets of partial stats: SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0 with stats on (a,b) and (c,d) so it's not surprising that combining (a = 0 OR b = 0) with (c = 0 OR d = 0) mis-estimates a bit. I suspect the old MV stats estimate was more down to chance in this case. Regards, Dean [1] https://www.postgresql.org/message-id/CAEZATCX8u9bZzcWEzqA_t7f_OQHu2oxeTUGnFHNEOXnJo35AQg%40mail.gmail.com diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c new file mode 100644 index 37a735b..7d6e678 --- a/src/backend/optimizer/path/clausesel.c +++ b/src/backend/optimizer/path/clausesel.c @@ -44,6 +44,12 @@ static void addRangeClause(RangeQueryCla bool varonleft, bool isLTsel, Selectivity s2); static RelOptInfo *find_single_rel_for_clauses(PlannerInfo *root, List *clauses); +static Selectivity clauselist_selectivity_or(PlannerInfo *root, + List *clauses, + int varRelid, + JoinType jointype, + SpecialJoinInfo *sjinfo, + bool use_extended_stats); / * ROUTINES TO COMPUTE SELECTIVITIES @@ -61,64 +67,8 @@ static RelOptInfo *find_single_rel_for_c * * The basic approach is to apply extended statistics first, on as many * clauses as possible, in order to capture cross-column dependencies etc. - * The remaining clauses are then estimated using regular statistics tracked - * for individual columns. This is done by simply passing the clauses to - * clauselist_selectivity_simple. 
- */ -Selectivity -clauselist_selectivity(PlannerInfo *root, - List *clauses, - int varRelid, - JoinType jointype, - SpecialJoinInfo *sjinfo) -{ - Selectivity s1 = 1.0; - RelOptInfo *rel; - Bitmapset *estimatedclauses = NULL; - - /* - * Determine if these clauses reference a single relation. If so, and if - * it has extended statistics, try to apply those. - */ - rel = find_single_rel_for_clauses(root, clauses); - if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL) - { - /* - * Estimate as many clauses as possible using extended statistics. - * - * 'estimatedclauses' tracks the 0-based list position index of - * clauses that we've estimated using extended statistics, and that - * should be ignored. - */ - s1 *= statext_clauselist_selectivity(root, clauses, varRelid, - jointype, sjinfo, rel, - &estimatedclauses); - } - - /* - * Apply normal selectivity estimates for the remaining clauses, passing - * 'estimatedclauses' so that it skips already estimated ones. - */ - return s1 * clauselist_selectivity_simple(root, clauses, varRelid, - jointype, sjin
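To illustrate the OR-clause rewrite described above with some made-up numbers: suppose the per-column estimates are P(a = 0) = 0.1 and P(b = 0) = 0.1, and the MCV list on (a,b) gives P(a = 0 AND b = 0) = 0.08 because the columns are strongly correlated. The OR clause is then estimated as

    P(a = 0 OR b = 0) = P(a = 0) + P(b = 0) - P(a = 0 AND b = 0)
                      = 0.1 + 0.1 - 0.08
                      = 0.12

so the extended stats are only ever applied to the AND part, where (freq - base_freq) is the right correction, rather than directly to the OR clause. Under an independence assumption the same clause would have been estimated as 0.1 + 0.1 - 0.01 = 0.19.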
Re: PoC/WIP: Extended statistics on expressions
Looking through extended_stats.c, I found a corner case that can lead to a seg-fault: CREATE TABLE foo(); CREATE STATISTICS s ON (1) FROM foo; ANALYSE foo; This crashes in lookup_var_attr_stats(), because it isn't expecting nvacatts to be 0. I can't think of any case where building stats on a table with no analysable columns is useful, so it should probably just exit early in that case. In BuildRelationExtStatistics(), it looks like min_attrs should be declared assert-only. In evaluate_expressions(): + /* set the pointers */ + result = (ExprInfo *) ptr; + ptr += sizeof(ExprInfo); I think that should probably have a MAXALIGN(). A slightly bigger issue that I don't like is the way it assigns attribute numbers for expressions starting from MaxHeapAttributeNumber+1, so the first expression has an attnum of 1601. That leads to pretty inefficient use of Bitmapsets, since most tables only contain a handful of columns, and there's a large block of unused space in the middle the Bitmapset. An alternative approach might be to use regular attnums for columns and use negative indexes -1, -2, -3, ... for expressions in the stored stats. Then when storing and retrieving attnums from Bitmapsets, it could just offset by STATS_MAX_DIMENSIONS (8) to avoid negative values in the Bitmapsets, since there can't be more than that many expressions (just like other code stores system attributes using FirstLowInvalidHeapAttributeNumber). That would be a somewhat bigger change, but hopefully fairly mechanical, and then some code like add_expressions_to_attributes() would go away. Looking at the new view pg_stats_ext_exprs, I noticed that it fails to show expressions until the statistics have been built. For example: CREATE TABLE foo(a int, b int); CREATE STATISTICS s ON (a+b), (a*b) FROM foo; SELECT statistics_name, tablename, expr, n_distinct FROM pg_stats_ext_exprs; statistics_name | tablename | expr | n_distinct -+---+--+ s | foo | | (1 row) but after populating and analysing the table, this becomes: statistics_name | tablename | expr | n_distinct -+---+-+ s | foo | (a + b) | 11 s | foo | (a * b) | 11 (2 rows) I think it should show the expressions even before the stats have been built. Another issue is that it returns rows for non-expression stats as well. For example: CREATE TABLE foo(a int, b int); CREATE STATISTICS s ON a, b FROM foo; SELECT statistics_name, tablename, expr, n_distinct FROM pg_stats_ext_exprs; statistics_name | tablename | expr | n_distinct -+---+--+ s | foo | | (1 row) and those values will never be populated, since they're not expressions, so I would expect them to not be shown in the view. So basically, instead of + LEFT JOIN LATERAL ( + SELECT + * + FROM ( + SELECT + unnest(pg_get_statisticsobjdef_expressions(s.oid)) AS expr, + unnest(sd.stxdexpr)::pg_statistic AS a + ) x + ) stat ON sd.stxdexpr IS NOT NULL; perhaps just + JOIN LATERAL ( + SELECT + * + FROM ( + SELECT + unnest(pg_get_statisticsobjdef_expressions(s.oid)) AS expr, + unnest(sd.stxdexpr)::pg_statistic AS a + ) x + ) stat ON true; Regards, Dean
Re: PoC/WIP: Extended statistics on expressions
On Tue, 19 Jan 2021 at 01:57, Tomas Vondra wrote: > > > A slightly bigger issue that I don't like is the way it assigns > > attribute numbers for expressions starting from > > MaxHeapAttributeNumber+1, so the first expression has an attnum of > > 1601. That leads to pretty inefficient use of Bitmapsets, since most > > tables only contain a handful of columns, and there's a large block of > > unused space in the middle the Bitmapset. > > > > An alternative approach might be to use regular attnums for columns > > and use negative indexes -1, -2, -3, ... for expressions in the stored > > stats. Then when storing and retrieving attnums from Bitmapsets, it > > could just offset by STATS_MAX_DIMENSIONS (8) to avoid negative values > > in the Bitmapsets, since there can't be more than that many > > expressions (just like other code stores system attributes using > > FirstLowInvalidHeapAttributeNumber). > > Well, I tried this but unfortunately it's not that simple. We still need > to build the bitmaps, so I don't think add_expression_to_attributes can > be just removed. I mean, we need to do the offsetting somewhere, even if > we change how we do it. Hmm, I was imagining that the offsetting would happen in each place that adds or retrieves an attnum from a Bitmapset, much like a lot of other code does for system attributes, and then you'd know you had an expression if the resulting attnum was negative. I was also thinking that it would be these negative attnums that would be stored in the stats data, so instead of something like "1, 2 => 1601", it would be "1, 2 => -1", so in some sense "-1" would be the "real" attnum associated with the expression. > But the main issue is that in some cases the number of expressions is > not really limited by STATS_MAX_DIMENSIONS - for example when applying > functional dependencies, we "merge" multiple statistics, so we may end > up with more expressions. So we can't just use STATS_MAX_DIMENSIONS. Ah, I see. I hadn't really fully understood what that code was doing. ISTM though that this is really an internal problem to the dependencies code, in that these "merged" Bitmapsets containing attrs from multiple different stats objects do not (and must not) ever go outside that local code that uses them. So that code would be free to use a different offset for its own purposes -- e..g., collect all the distinct expressions across all the stats objects and then offset by the number of distinct expressions. > Also, if we offset regular attnums by STATS_MAX_DIMENSIONS, that inverts > the order of processing (so far we've assumed expressions are after > regular attnums). So the changes are more extensive - I tried doing that > anyway, and I'm still struggling with crashes and regression failures. > Of course, that doesn't mean we shouldn't do it, but it's far from > mechanical. (Some of that is probably a sign this code needs a bit more > work to polish.) Interesting. What code assumes expressions come after attributes? Ideally, I think it would be cleaner if no code assumed any particular order, but I can believe that it might be convenient in some circumstances. > But I wonder if it'd be easier to just calculate the actual max attnum > and then use it instead of MaxHeapAttributeNumber ... Hmm, I'm not sure how that would work. There still needs to be an attnum that gets stored in the database, and it has to continue to work if the user adds columns to the table. 
That's why I was advocating storing negative values, though I haven't actually tried it to see what might go wrong. Regards, Dean
Re: PoC/WIP: Extended statistics on expressions
On Fri, 22 Jan 2021 at 04:46, Justin Pryzby wrote: > > I think you'll maybe have to do something better - this seems a bit too weird: > > | postgres=# CREATE STATISTICS s2 ON (i+1) ,i FROM t; > | postgres=# \d t > | ... > | "public"."s2" (ndistinct, dependencies, mcv) ON i FROM t > I guess that's not surprising, given that old psql knows nothing about expressions in stats. In general, I think connecting old versions of psql to newer servers is not supported. You're lucky if \d works at all. So it shouldn't be this patch's responsibility to make that output nicer. Regards, Dean
Re: Schema variables - new implementation for Postgres 15
On Wed, 3 Nov 2021 at 13:05, Tomas Vondra wrote: > > 2) I find this a bit confusing: > > SELECT non_existent_variable; > test=# select s; > ERROR: column "non_existent_variable" does not exist > LINE 1: select non_existent_variable; > > I wonder if this means using SELECT to read variables is a bad idea, and > we should have a separate command, just like we have LET (instead of > just using UPDATE in some way). > Hmm. This way of reading variables worries me for a different reason -- I think it makes it all too easy to break existing applications by inadvertently (or deliberately) defining variables that conflict with column names referred to in existing queries. For example, if I define a variable called "relkind", then psql's \sv meta-command is broken because the query it performs can't distinguish between the column and the variable. Similarly, there's ambiguity between alias.colname and schema.variablename. So, for example, if I do the following: CREATE SCHEMA n; CREATE VARIABLE n.nspname AS int; then lots of things are broken, including pg_dump and a number of psql meta-commands. I don't think it's acceptable to make it so easy for a user to break the system in this way. Those are examples that a malicious user might use, but even without such examples, I think it would be far too easy to inadvertently break a large application by defining a variable that conflicted with a column name you didn't know about. Regards, Dean

Re: Schema variables - new implementation for Postgres 15
On Thu, 13 Jan 2022 at 17:42, Pavel Stehule wrote: > > I like the idea of prioritizing tables over variables, with warnings when a > collision is detected. It cannot break anything, and it allows short > identifiers to be used when there is no collision. Yeah, that seems OK, as long as it's clearly documented. I don't think a warning is necessary. (FWIW, testing with dbfiddle, that appears to match Db2's behaviour). Regards, Dean
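As an illustration of that precedence rule, using the syntax from the patch under discussion (all names invented):

CREATE TABLE jobs (status text);
CREATE VARIABLE status AS text;
LET status = 'idle';

SELECT status FROM jobs;   -- resolves to the column: the table wins
SELECT status;             -- no matching column in scope, reads the variable

Documented that way, an existing query can never silently change meaning just because someone later creates a variable with a clashing name.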
Re: PoC/WIP: Extended statistics on expressions
On Sun, 7 Mar 2021 at 21:10, Tomas Vondra wrote: > > 2) ndistinct > > There's one thing that's bugging me, in how we handle "partial" matches. > For each expression we track both the original expression and the Vars > we extract from it. If we can't find a statistics matching the whole > expression, we try to match those individual Vars, and we remove the > matching ones from the list. And in the end we multiply the estimates > for the remaining Vars. > > This works fine with one matching ndistinct statistics. Consider for example > > GROUP BY (a+b), (c+d) > > with statistics on [(a+b),c] - that is, expression and one column. I've just been going over this example, and I think it actually works slightly differently from how you described, though I haven't worked out the full general implications of that. > We parse the expressions into two GroupExprInfo > > {expr: (a+b), vars: [a, b]} > {expr: (c+d), vars: [c, d]} > Here, I think what you actually get, in the presence of stats on [(a+b),c] is actually the following two GroupExprInfos: {expr: (a+b), vars: []} {expr: (c+d), vars: [c, d]} because of the code immediately after this comment in estimate_num_groups(): /* * If examine_variable is able to deduce anything about the GROUP BY * expression, treat it as a single variable even if it's really more * complicated. */ As it happens, that makes no difference in this case, where there is just this one stats object, but it does change things when there are two stats objects. > and the statistics matches the first item exactly (the expression). The > second expression is not in the statistics, but we match "c". So we end > up with an estimate for "(a+b), c" and have one remaining GroupExprInfo: > > {expr: (c+d), vars: [d]} Right. > Without any other statistics we estimate that as ndistinct for "d", so > we end up with > > ndistinct((a+b), c) * ndistinct(d) > > which mostly makes sense. It assumes ndistinct(c+d) is product of the > ndistinct estimates, but that's kinda what we've been always doing. Yes, that appears to be what happens, and it's probably the best that can be done with the available stats. > But now consider we have another statistics on just (c+d). In the second > loop we end up matching this expression exactly, so we end up with > > ndistinct((a+b), c) * ndistinct((c+d)) In this case, with stats on (c+d) as well, the two GroupExprInfos built at the start change to: {expr: (a+b), vars: []} {expr: (c+d), vars: []} so it actually ends up not using any multivariate stats, but instead uses the two univariate expression stats, giving ndistinct((a+b)) * ndistinct((c+d)) which actually seems pretty good, and gave very good estimates in the simple test case I tried. > i.e. we kinda use the "c" twice. Which is a bit unfortunate. I think > what we should do after the first loop is just discarding the whole > expression and "expand" into per-variable GroupExprInfo, so in the > second step we would not match the (c+d) statistics. Not using the (c+d) stats would give either ndistinct((a+b)) * ndistinct(c) * ndistinct(d) or ndistinct((a+b), c) * ndistinct(d) depending on exactly how the algorithm was changed. In my test, these both gave worse estimates, but there are probably other datasets for which they might do better. It all depends on how much correlation there is between (a+b) and c. I suspect that there is no optimal strategy for handling overlapping stats that works best for all datasets, but the current algorithm seems to do a pretty decent job. 
> Of course, maybe there's a better way to pick the statistics, but I > think our conclusion so far was that people should just create > statistics covering all the columns in the query, to not have to match > multiple statistics like this. Yes, I think that's always likely to work better, especially for ndistinct stats, where all possible permutations of subsets of the columns are included, so a single ndistinct stat can work well for a range of different queries. Regards, Dean
Re: PoC/WIP: Extended statistics on expressions
On Wed, 17 Mar 2021 at 17:26, Tomas Vondra wrote: > > My concern is that the current behavior (where we prefer expression > stats over multi-column stats to some extent) works fine as long as the > parts are independent, but once there's dependency it's probably more > likely to produce underestimates. I think underestimates for grouping > estimates were a risk in the past, so let's not make that worse. > I'm not sure the current behaviour really is preferring expression stats over multi-column stats. In this example, where we're grouping by (a+b), (c+d) and have stats on [(a+b),c] and (c+d), neither of those multi-column stats actually match more than one column/expression. If anything, I'd go the other way and say that it was wrong to use the [(a+b),c] stats in the first case, where they were the only stats available, since those stats aren't really applicable to (c+d), which probably ought to be treated as independent. IOW, it might have been better to estimate the first case as ndistinct((a+b)) * ndistinct(c) * ndistinct(d) and the second case as ndistinct((a+b)) * ndistinct((c+d)) Regards, Dean
Re: PoC/WIP: Extended statistics on expressions
On Wed, 17 Mar 2021 at 19:07, Tomas Vondra wrote: > > On 3/17/21 7:54 PM, Dean Rasheed wrote: > > > > it might have been better to estimate the first case as > > > > ndistinct((a+b)) * ndistinct(c) * ndistinct(d) > > > > and the second case as > > > > ndistinct((a+b)) * ndistinct((c+d)) > > OK. I might be confused, but isn't that what the algorithm currently > does? Or am I just confused about what the first/second case refers to? > No, it currently estimates the first case as ndistinct((a+b),c) * ndistinct(d). Having said that, maybe that's OK after all. It at least makes an effort to account for any correlation between (a+b) and (c+d), using the known correlation between (a+b) and c. For reference, here is the test case I was using (which isn't really very good for catching dependence between columns): DROP TABLE IF EXISTS foo; CREATE TABLE foo (a int, b int, c int, d int); INSERT INTO foo SELECT x%10, x%11, x%12, x%13 FROM generate_series(1,10) x; SELECT COUNT(DISTINCT a) FROM foo; -- 10 SELECT COUNT(DISTINCT b) FROM foo; -- 11 SELECT COUNT(DISTINCT c) FROM foo; -- 12 SELECT COUNT(DISTINCT d) FROM foo; -- 13 SELECT COUNT(DISTINCT (a+b)) FROM foo; -- 20 SELECT COUNT(DISTINCT (c+d)) FROM foo; -- 24 SELECT COUNT(DISTINCT ((a+b),c)) FROM foo; -- 228 SELECT COUNT(DISTINCT ((a+b),(c+d))) FROM foo; -- 478 -- First case: stats on [(a+b),c] CREATE STATISTICS s1(ndistinct) ON (a+b),c FROM foo; ANALYSE foo; EXPLAIN ANALYSE SELECT (a+b), (c+d) FROM foo GROUP BY (a+b), (c+d); -- Estimate = 2964, Actual = 478 -- This estimate is ndistinct((a+b),c) * ndistinct(d) = 228*13 -- Second case: stats on (c+d) as well CREATE STATISTICS s2 ON (c+d) FROM foo; ANALYSE foo; EXPLAIN ANALYSE SELECT (a+b), (c+d) FROM foo GROUP BY (a+b), (c+d); -- Estimate = 480, Actual = 478 -- This estimate is ndistinct((a+b)) * ndistinct((c+d)) = 20*24 I think that's probably pretty reasonable behaviour, given incomplete stats (the estimate with no extended stats is capped at 1). Regards, Dean
Re: PoC/WIP: Extended statistics on expressions
On Wed, 17 Mar 2021 at 20:48, Dean Rasheed wrote: > > For reference, here is the test case I was using (which isn't really very > good for > catching dependence between columns): > And here's a test case with much more dependence between the columns: DROP TABLE IF EXISTS foo; CREATE TABLE foo (a int, b int, c int, d int); INSERT INTO foo SELECT x%2, x%5, x%10, x%15 FROM generate_series(1,10) x; SELECT COUNT(DISTINCT a) FROM foo; -- 2 SELECT COUNT(DISTINCT b) FROM foo; -- 5 SELECT COUNT(DISTINCT c) FROM foo; -- 10 SELECT COUNT(DISTINCT d) FROM foo; -- 15 SELECT COUNT(DISTINCT (a+b)) FROM foo; -- 6 SELECT COUNT(DISTINCT (c+d)) FROM foo; -- 20 SELECT COUNT(DISTINCT ((a+b),c)) FROM foo; -- 10 SELECT COUNT(DISTINCT ((a+b),(c+d))) FROM foo; -- 30 -- First case: stats on [(a+b),c] CREATE STATISTICS s1(ndistinct) ON (a+b),c FROM foo; ANALYSE foo; EXPLAIN ANALYSE SELECT (a+b), (c+d) FROM foo GROUP BY (a+b), (c+d); -- Estimate = 150, Actual = 30 -- This estimate is ndistinct((a+b),c) * ndistinct(d) = 10*15, -- which is much better than ndistinct((a+b)) * ndistinct(c) * ndistinct(d) = 6*10*15 = 900 -- Estimate with no stats = 1500 -- Second case: stats on (c+d) as well CREATE STATISTICS s2 ON (c+d) FROM foo; ANALYSE foo; EXPLAIN ANALYSE SELECT (a+b), (c+d) FROM foo GROUP BY (a+b), (c+d); -- Estimate = 120, Actual = 30 -- This estimate is ndistinct((a+b)) * ndistinct((c+d)) = 6*20 Again, I'd say the current behaviour is pretty good. Regards, Dean
Re: PoC/WIP: Extended statistics on expressions
On Wed, 17 Mar 2021 at 21:31, Tomas Vondra wrote: > > I agree applying at least the [(a+b),c] stats is probably the right > approach, as it means we're considering at least the available > information about dependence between the columns. > > I think to improve this, we'll need to teach the code to use overlapping > statistics, a bit like conditional probability. In this case we might do > something like this: > > ndistinct((a+b),c) * (ndistinct((c+d)) / ndistinct(c)) Yes, I was thinking the same thing. That would be equivalent to applying a multiplicative "correction" factor of ndistinct(a,b,c,...) / ( ndistinct(a) * ndistinct(b) * ndistinct(c) * ... ) for each multivariate stat applicable to more than one column/expression, regardless of whether those columns were already covered by other multivariate stats. That might well simplify the implementation, as well as probably produce better estimates. > But that's clearly a matter for a future patch, and I'm sure there are > cases where this will produce worse estimates. Agreed. > Anyway, I plan to go over the patches one more time, and start pushing > them sometime early next week. I don't want to leave it until the very > last moment in the CF. +1. I think they're in good enough shape for that process to start. Regards, Dean
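Plugging the numbers from the second test case above into that formula gives a feel for the potential improvement: ndistinct((a+b),c) * (ndistinct((c+d)) / ndistinct(c)) = 10 * (20 / 10) = 20, compared with the current estimates of 150 (stats on [(a+b),c] only) and 120 (both statistics objects), against an actual group count of 30. Still an under-estimate in this example, but a noticeably better one.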
Re: pgbench - add pseudo-random permutation function
On Sun, 14 Mar 2021 at 16:08, Fabien COELHO wrote: > > > My main question on this now is, do you have a scholar reference for > > this algorithm? > > Nope, otherwise I would have put a reference. I'm a scholar though, if > it helps:-) > > I could not find any algorithm that fitted the bill. The usual approach > (eg benchmarking designs) is too use some hash function and assume that it > is not a permutation, too bad. > > Basically the algorithm mimics a few rounds of cryptographic encryption > adapted to any size and simple operators, whereas encryption function are > restricted to power of two blocks (eg the Feistel network). The structure > is the same AES with its AddRoundKey the xor-ing stage (adapted to non > power of two in whiten above), MixColumns which does the scattering, and > for key expansion, I used Donald Knuth generator. Basically one could say > that permute is an inexpensive and insecure AES:-) > I spent a little time playing around with this, trying to come up with a reasonable way to test it. It's apparent from the code and associated comments that the transformation is bijective and so will produce a permutation, but it's less obvious how random that permutation will be. Obviously the Fisher-Yates algorithm would give the most random permutation, but that's not really practical in this context. Any other algorithm will, in some sense, be less random than that, so I think it's really just a question of testing that it's random enough to satisfy the intended use case. The approach to testing I tried was to use the Kolmogorov-Smirnov test to test how uniformly random a couple of quantities are: 1). For a given size and fixed input value, and a large number of randomly chosen seed values, is the output uniformly random? 2). For a given size and a pair of nearby input values, are the two output values uniformly randomly spaced apart? This second test seems desirable to ensure sufficient shuffling so that inputs coming from some non-uniform random distribution are scattered sufficiently to distribute the maxima/minima (x -> x + rand mod size would pass test 1, so that by itself is insufficient). I tested this using the attached (somewhat ugly) stand-alone test C program, and for the most part these 2 tests seemed to pass. The program also tests that the output really is a permutation, just to be sure, and this was confirmed in all cases. However, I noticed that the results are less good when size is a power of 2. In this case, the results seem significantly less uniform, suggesting that for such sizes, there is an increased chance that the permuted output might still be skewed. Based on the above, I also experimented with a different permutation algorithm (permute2() in the test), which attempts to inject more randomness into the bijection, and between pairs of inputs, to make the output less predictable and less likely to be accidentally non-uniform. It's possible that it suffers from other problems, but it did do better in these 2 tests. Maybe some other tests might be useful, but I really don't know. As noted above, any algorithm is likely to lead to some pattern in the output (e.g., it looks like both these algorithms cause adjacent inputs to always end up separated by an odd number), so a judgement call will be required to decide what is random enough. 
Regards, Dean #include #include #include typedef unsigned char uint8; typedef long int64; typedef unsigned long uint64; typedef unsigned __int128 uint128; typedef int64 (*permute_fn)(const int64, const int64, const int64); #define PRP_PRIMES 16 static uint64 primes[PRP_PRIMES] = { 8388617, 8912921, 9437189, 9961487, 10485767, 11010059, 11534351, 12058679, 12582917, 13107229, 13631489, 14155777, 14680067, 15204391, 15728681, 16252967 }; #define PRP_ROUNDS 4 static uint64 compute_mask(uint64 n) { n |= n >> 1; n |= n >> 2; n |= n >> 4; n |= n >> 8; n |= n >> 16; n |= n >> 32; return n; } static uint64 modular_multiply(uint64 x, uint64 y, const uint64 m) { return (uint128) x * (uint128) y % (uint128) m; } #define DK_LCG_MUL 6364136223846793005L #define DK_LCG_INC 1442695040888963407L #define LCG_SHIFT 13 static int64 permute(const int64 data, const int64 isize, const int64 seed) { uint64 size = (uint64) isize; uint64 v = (uint64) data % size; uint64 key = (uint64) seed; uint64 mask = compute_mask(size - 1) >> 1; if (isize == 1) return 0; for (unsigned int i = 0, p = key % PRP_PRIMES; i < PRP_ROUNDS; i++, p = (p + 1) % PRP_PRIMES) { uint64 t; key = key * DK_LCG_MUL + DK_LCG_INC; if (v <= mask) v ^= (key >> LCG_SHIFT) & mask; key
Re: PoC/WIP: Extended statistics on expressions
On Wed, 24 Mar 2021 at 10:22, Tomas Vondra wrote: > > Thanks, it seems to be some thinko in handling in PlaceHolderVars, which > seem to break the code's assumptions about varnos. This fixes it for me, > but I need to look at it more closely. > I think that makes sense. Reviewing the docs, I noticed a couple of omissions, and had a few other suggestions (attached). Regards, Dean diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml new file mode 100644 index dadca67..382cbd7 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -7377,6 +7377,15 @@ SCRAM-SHA-256$<iteration m for most common values (MCV) list statistics + + + + stxexprs pg_node_tree + + + A list of any expressions covered by this statistics object. + + @@ -7474,6 +7483,16 @@ SCRAM-SHA-256$<iteration pg_mcv_list type + + + + stxdexpr pg_statistic[] + + + Per-expression statistics, serialized as an array of + pg_statistic type + + @@ -12843,7 +12862,8 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_p The view pg_stats_ext provides access to - the information stored in the pg_statistic_ext and pg_statistic_ext_data catalogs. This view allows access only to rows of @@ -12930,7 +12950,16 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_p (references pg_attribute.attname) - Names of the columns the extended statistics is defined on + Names of the columns included in the extended statistics + + + + + + exprs text[] + + + Expressions included in the extended statistics @@ -13033,7 +13062,8 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_p The view pg_stats_ext_exprs provides access to - the information stored in the pg_statistic_ext and pg_statistic_ext_data catalogs. This view allows access only to rows of @@ -13119,7 +13149,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_p expr text - Expression the extended statistics is defined on + Expression included in the extended statistics diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml new file mode 100644 index 5f3aefd..f561599 --- a/doc/src/sgml/ref/create_statistics.sgml +++ b/doc/src/sgml/ref/create_statistics.sgml @@ -27,7 +27,7 @@ CREATE STATISTICS [ IF NOT EXISTS ] statistics_name [ ( statistics_kind [, ... ] ) ] -ON { column_name | ( expression ) } [, ...] +ON { column_name | ( expression ) }, { column_name | ( expression ) } [, ...] FROM table_name @@ -45,12 +45,15 @@ CREATE STATISTICS [ IF NOT EXISTS ] The CREATE STATISTICS command has two basic forms. The - simple variant allows building statistics for a single expression, does - not allow specifying any statistics kinds and provides benefits similar - to an expression index. The full variant allows defining statistics objects - on multiple columns and expressions, and selecting which statistics kinds will - be built. The per-expression statistics are built automatically when there - is at least one expression. + first form allows univariate statistics for a single expression to be + collected, providing benefits similar to an expression index without the + overhead of index maintenance. This form does not allow the statistics + kind to be specified, since the various statistics kinds refer only to + multivariate statistics. The second form of the command allows + multivariate statistics on multiple columns and/or expressions to be + collected, optionally specifying which statistics kinds to include. This + form will also automatically cause univariate statistics to be collected on + any expressions included in the list. 
@@ -93,16 +96,16 @@ CREATE STATISTICS [ IF NOT EXISTS ] statistics_kind - A statistics kind to be computed in this statistics object. + A multivariate statistics kind to be computed in this statistics object. Currently supported kinds are ndistinct, which enables n-distinct statistics, dependencies, which enables functional dependency statistics, and mcv which enables most-common values lists. If this clause is omitted, all supported statistics kinds are - included in the statistics object. Expression statistics are built - automatically when the statistics definition includes complex - expressions and not just simple column references. + included in the statistics object. Univariate expression statistics are + built automatically if the statistics definition includes any complex + expressions rather than just simple column references. For more information, see and . @@ -114,8 +117,9 @@ CREATE
Re: PoC/WIP: Extended statistics on expressions
On Wed, 24 Mar 2021 at 14:48, Tomas Vondra wrote: > > AFAIK the primary issue here is that the two places disagree. While > estimate_num_groups does this > > varnos = pull_varnos(root, (Node *) varshere); > if (bms_membership(varnos) == BMS_SINGLETON) > { ... } > > the add_unique_group_expr does this > > varnos = pull_varnos(root, (Node *) groupexpr); > > That is, one looks at the group expression, while the other looks at vars > extracted from it by pull_var_clause(). Apparently for PlaceHolderVar > this can differ, causing the crash. > > So we need to change one of those places - my fix tweaked the second > place to also look at the vars, but maybe we should change the other > place? Or maybe it's not the right fix for PlaceHolderVars ... > I think it doesn't make any difference which place is changed. This is a case of an expression with no stats. With your change, you'll get a single GroupExprInfo containing a list of VariableStatData entries for each of its Vars, whereas if you changed it the other way, you'd get a separate GroupExprInfo for each Var. But I think they'd both end up being treated the same by estimate_multivariate_ndistinct(), since there wouldn't be any stats matching the expression, only the individual Vars. Maybe changing the first place would be the more bulletproof fix, though. Regards, Dean
Re: PoC/WIP: Extended statistics on expressions
On Wed, 24 Mar 2021 at 16:48, Tomas Vondra wrote: > > As for the changes proposed in the create_statistics, do we really want > to use univariate / multivariate there? Yes, the terms are correct, but > I'm not sure how many people looking at CREATE STATISTICS will > understand them. > Hmm, I think "univariate" and "multivariate" are pretty ubiquitous, when used to describe statistics. You could use "single-column" and "multi-column", but then "column" isn't really right anymore, since it might be a column or an expression. I can't think of any other terms that fit. Regards, Dean
Re: PoC/WIP: Extended statistics on expressions
On Thu, 25 Mar 2021 at 00:05, Tomas Vondra wrote: > > Actually, I think we don't need that block at all - there's no point in > keeping the exact expression, because if there was a statistics object matching > it, it'd be matched by examine_variable. So if we get here, we have > to just split it into the vars anyway. So the second block is entirely > useless. Good point. > That however means we don't need the processing with GroupExprInfo and > GroupVarInfo lists, i.e. we can revert back to the original simpler > processing, with a bit of extra logic to match expressions, that's all. > > The patch 0003 does this (it's a bit crude, but hopefully enough to > demonstrate). Cool. I did wonder about that, but I didn't fully think it through. I'll take a look. > 0002 is an attempt to fix an issue I noticed today - we need to handle > type changes. > > I think we have two options: > > a) Make UpdateStatisticsForTypeChange smarter to also transform and > update the expression string, and reset pg_statistics[] data. > > b) Just recreate the statistics, just like we do for indexes. Currently > this does not force analyze, so it just resets all the stats. Maybe it > should do analyze, though. I'd vote for (b) without an analyse, and I agree with getting rid of UpdateStatisticsForTypeChange(). I've always been a bit skeptical about trying to preserve extended statistics after a type change, when we don't preserve regular per-column stats. > BTW I wonder how useful the updated statistics actually is. Consider > this example: > ... > the expression now looks like this: > > > "public"."s" (ndistinct) ON ((a + b)), b)::numeric)::double > precision + c)) FROM t > > > But we're matching it to (((b)::double precision + c)), so that fails. > > This is not specific to extended statistics - indexes have exactly the > same issue. Not sure how common this is in practice. Hmm, that's unfortunate. Maybe it's not that common in practice though. I'm not sure if there is any practical way to fix it, but if there is, I guess we'd want to apply the same fix to both stats and indexes, and that certainly seems out of scope for this patch. Regards, Dean
Re: PoC/WIP: Extended statistics on expressions
On Thu, 25 Mar 2021 at 00:05, Tomas Vondra wrote: > > here's an updated patch. 0001 The change to the way that CreateStatistics() records dependencies isn't quite right -- recordDependencyOnSingleRelExpr() will not create any dependencies if the expression uses only a whole-row Var. However, pull_varattnos() will include whole-row Vars, and so nattnums_exprs will be non-zero, and CreateStatistics() will not create a whole-table dependency when it should. I suppose that could be fixed up by inspecting the bitmapset returned by pull_varattnos() in more detail, but I think it's probably safer to revert to the previous code, which matched what index_create() did. Regards, Dean
Re: PoC/WIP: Extended statistics on expressions
On Thu, 25 Mar 2021 at 19:59, Tomas Vondra wrote: > > Attached is an updated patch series, with all the changes discussed > here. I've cleaned up the ndistinct stuff a bit more (essentially > reverting back from GroupExprInfo to GroupVarInfo name), and got rid of > the UpdateStatisticsForTypeChange. > I've looked over all that and I think it's in pretty good shape. I particularly like how much simpler the ndistinct code has now become. Some (hopefully final) review comments: 1). I don't think index.c is the right place for StatisticsGetRelation(). I appreciate that it is very similar to the adjacent IndexGetRelation() function, but index.c is really only for index-related code, so I think StatisticsGetRelation() should go in statscmds.c 2). Perhaps the error message at statscmds.c:293 should read "expression cannot be used in multivariate statistics because its type %s has no default btree operator class" (i.e., add the word "multivariate", since the same expression *can* be used in univariate statistics even though it has no less-than operator). 3). The comment for ATExecAddStatistics() should probably mention that "ALTER TABLE ADD STATISTICS" isn't a command in the grammar, in a similar way to other similar functions, e.g.: /* * ALTER TABLE ADD STATISTICS * * This is no such command in the grammar, but we use this internally to add * AT_ReAddStatistics subcommands to rebuild extended statistics after a table * column type change. */ 4). The comment at the start of ATPostAlterTypeParse() needs updating to mention CREATE STATISTICS statements. 5). I think ATPostAlterTypeParse() should also attempt to preserve any COMMENTs attached to statistics objects, i.e., something like: --- src/backend/commands/tablecmds.c.orig2021-03-26 10:39:38.328631864 + +++ src/backend/commands/tablecmds.c2021-03-26 10:47:21.042279580 + @@ -12619,6 +12619,9 @@ CreateStatsStmt *stmt = (CreateStatsStmt *) stm; AlterTableCmd *newcmd; +/* keep the statistics object's comment */ +stmt->stxcomment = GetComment(oldId, StatisticExtRelationId, 0); + newcmd = makeNode(AlterTableCmd); newcmd->subtype = AT_ReAddStatistics; newcmd->def = (Node *) stmt; 6). Comment typo at extended_stats.c:2532 - s/statitics/statistics/ 7). I don't think that the big XXX comment near the start of estimate_multivariate_ndistinct() is really relevant anymore, now that the code has been simplified and we no longer extract Vars from expressions, so perhaps it can just be deleted. Regards, Dean
Re: pgbench - add pseudo-random permutation function
On Mon, 22 Mar 2021 at 13:43, Dean Rasheed wrote: > > On Sun, 14 Mar 2021 at 16:08, Fabien COELHO wrote: > > > > > My main question on this now is, do you have a scholar reference for > > > this algorithm? > > > > Nope, otherwise I would have put a reference. I'm a scholar though, if > > it helps:-) > > > > I could not find any algorithm that fitted the bill. The usual approach > > (eg benchmarking designs) is to use some hash function and assume that it > > is not a permutation, too bad. > > > > Basically the algorithm mimics a few rounds of cryptographic encryption > > adapted to any size and simple operators, whereas encryption functions are > > restricted to power-of-two blocks (eg the Feistel network). The structure > > is the same as AES, with AddRoundKey being the xor-ing stage (adapted to non > > power of two in whiten above), MixColumns doing the scattering, and > > for key expansion I used Donald Knuth's generator. Basically one could say > > that permute is an inexpensive and insecure AES:-) > > > > I spent a little time playing around with this, trying to come up with > a reasonable way to test it. I spent more time testing this -- the previous testing was mostly for large values of size, so I decided to also test it for small sizes. The theoretical number of possible permutations grows rapidly with size (size!), and the actual number of permutations observed grew almost as quickly:

  size      size!    distinct perms
     2          2                 2
     3          6                 6
     4         24                24
     5        120               120
     6        720               720
     7       5040              5040
     8      40320             24382
     9     362880            181440
    10    3628800           3627355

My test script stopped once the first permutation had been seen 100 times, so it might have seen more permutations had it kept going for longer. However, looking at the actual output, there is a very significant non-uniformity in the probability of particular permutations being chosen, especially at certain sizes like 8 and 10. For example, at size = 8, the identity permutation is significantly more likely than almost any other permutation (roughly 13 times more likely than it would be by random chance). Additionally, the signs are that this non-uniformity tends to increase with size. At size = 10, the average number of occurrences of each permutation in the test was 148, but there were some that didn't appear at all, many that only appeared 2 or 3 times, and then some that appeared nearly 5000 times. Also, there appears to be an unfortunate dependence on the seed -- if the seed is even and the size is a power of 2, it looks like the number of distinct permutations produced is just size/2, which is a tiny fraction of size!. This, together with the results from the previous K-S uniformity tests at larger sizes, suggests that the current algorithm may not be random enough to scatter values and remove correlations from a non-uniform distribution. In an attempt to do better, I tweaked the algorithm in the attached patch, which makes use of erand48() to inject more randomness into the permutation choice. Repeating the tests with the updated algorithm shows that it has a number of advantages: 1). For small sizes (2-10), each of the size! possible permutations is now produced with roughly equal probability. No single permutation was much more likely than expected. 2). The loss of randomness for even seeds is gone. 3). For any given input, the output is more uniformly randomly distributed for different seeds. 4). For any pair of nearby inputs, the corresponding outputs are more uniformly randomly separated from one another. 5).
The updated algorithm no longer uses modular_multiply(), so it works the same on all platforms. I still feel slightly uneasy about inventing our own algorithm here, but I wasn't able to find anything else suitable, and I've now tested this quite extensively. All the indications are that this updated algorithm works well at all sizes and produces sufficiently random results, though if anyone else can think of other approaches to testing the core algorithm, that would be useful. For reference, I attach 2 standalone test C programs I used for testing, which include copies of the old and new algorithms. I also did a quick copy editing pass over the docs, and I think the patch is in pretty good shape. Barring objections, I'll see about committing it later this week. Regards, Dean diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml new file mode 100644 index 50cf22b..84d9566 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -1057,7 +1057,7 @@ pgbench options< default_seed - seed used in hash functions by default + seed used in hash and pseudorandom permutation functions by default @@ -1866,6 +1866,24 @@ SELECT
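The two standalone test C programs mentioned above are not included here. Purely by way of illustration, the following is a minimal sketch of the kind of small-size counting harness described (how many distinct permutations appear across many seeds, and how often each one occurs). The permute() in this sketch is a deliberately trivial placeholder -- a simple cyclic shift, which can only ever produce "size" distinct permutations -- and is not the pgbench algorithm under discussion; the real test programs substitute the actual algorithm at that point.

/*
 * Sketch of a small-size permutation-counting harness.  NOTE: permute()
 * below is a trivial placeholder (a cyclic shift), not the algorithm being
 * tested, so it can only ever produce SIZE distinct permutations.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#define SIZE      4
#define NSEEDS    100000
#define MAXPERMS  10000

static int64_t
permute(int64_t i, int64_t size, int64_t seed)
{
    return (i + seed) % size;   /* placeholder permutation only */
}

int
main(void)
{
    static int  perms[MAXPERMS][SIZE];  /* distinct permutations seen */
    static long counts[MAXPERMS];       /* occurrences of each one */
    int         nperms = 0;
    long        fact = 1;

    for (int i = 2; i <= SIZE; i++)
        fact *= i;                      /* size! */

    for (long seed = 0; seed < NSEEDS; seed++)
    {
        int     p[SIZE];
        int     found = -1;

        for (int i = 0; i < SIZE; i++)
            p[i] = (int) permute(i, SIZE, seed);

        /* look the permutation up in the list of ones already seen */
        for (int k = 0; k < nperms; k++)
        {
            if (memcmp(perms[k], p, sizeof(p)) == 0)
            {
                found = k;
                break;
            }
        }
        if (found < 0 && nperms < MAXPERMS)
        {
            memcpy(perms[nperms], p, sizeof(p));
            found = nperms++;
        }
        if (found >= 0)
            counts[found]++;
    }

    printf("size %d: %d distinct permutations seen (size! = %ld)\n",
           SIZE, nperms, fact);
    for (int k = 0; k < nperms; k++)
        printf("  permutation %d seen %ld times\n", k, counts[k]);

    return 0;
}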
Re: pgbench - add pseudo-random permutation function
On Tue, 30 Mar 2021 at 19:26, Fabien COELHO wrote: > > First, I have a thing against erand48. Yeah, that's probably a fair point. However, all the existing pgbench random functions are using it, so I think it's fair enough for permute() to do the same (and actually 2^48 is pretty huge). Switching to a 64-bit PRNG might not be a bad idea, but I think that's something we'd want to do across the board, and so I think it should be out of scope for this patch. > Second, I have a significant reservation about the very structure of the > transformation in this version: > >loop 4 times : > > // FIRST HALF STEER > m/r = pseudo randoms > if v in first "half" > v = ((v * m) ^ r) & mask; > rotate1(v) > > // FULL SHIFT 1 > r = pseudo random > v = (v + r) % size > > // SECOND HALF STEER > m/r = pseudo randoms > if v in second "half" > same as previous on second half > > // FULL SHIFT 2 > r = pseudo random > v = (v + r) % size > > I'm really at odds with FULL SHIFT 1, because it means that up to 1/256 of > values are kept out of STEERING. Whole chunks of values could be kept > unshuffled because they would only have SHIFTS apply to them and each time > fall in the not steered half. It should be an essential part of the design > that at least one steer is applied on a value at each round, and if two > are applied then fine, but certainly not zero. So basically I think that > the design would be significantly improved by removing "FULL SHIFT 1". Ah, that's a good point. Something else that also concerned me there was that it might lead to 2 consecutive full shifts with nothing in between, which would lead to less uniform randomness (like the Irwin-Hall distribution). I just did a quick test without the first full shift, and the results do appear to be better, so removing that looks like a good idea. > Third, I think that the rotate code can be simplified, in particular the > ?: should be avoided because it may induce branches quite damaging to > processor performance. Yeah, I wondered about that. Perhaps there's a "trick" that can be used to simplify it. Pre-computing the number of bits in the mask would probably help. I'll give it some thought. Regards, Dean
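For what it's worth, one possible branch-free formulation of the rotate step, assuming the number of bits in the mask has been precomputed as suggested above, is sketched below. This is only an illustration of the idea, not the code that ended up in the patch.

#include <stdio.h>
#include <stdint.h>

/*
 * Branch-free left-rotate of v by one bit within the low "mask_bits" bits.
 * Assumes v <= mask and mask = (1 << mask_bits) - 1, with mask_bits
 * precomputed once up front (in PostgreSQL, pg_leftmost_one_pos64() could
 * be used for that).
 */
static uint64_t
rotate1(uint64_t v, int mask_bits, uint64_t mask)
{
    return ((v << 1) | (v >> (mask_bits - 1))) & mask;
}

int
main(void)
{
    uint64_t    mask = 0xF;     /* 4-bit mask */
    int         mask_bits = 4;

    for (uint64_t v = 0; v <= mask; v++)
        printf("%llx -> %llx\n",
               (unsigned long long) v,
               (unsigned long long) rotate1(v, mask_bits, mask));
    return 0;
}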
Re: pgbench - add pseudo-random permutation function
On Tue, 30 Mar 2021 at 20:31, Dean Rasheed wrote: > > Yeah, that's probably a fair point. However, all the existing pgbench > random functions are using it, so I think it's fair enough for > permute() to do the same (and actually 2^48 is pretty huge). Switching > to a 64-bit PRNG might not be a bad idea, but I think that's something > we'd want to do across the board, and so I think it should be out of > scope for this patch. > Of course the immediate counter-argument to changing the existing random functions would be that doing so would break lots of people's tests, and no one would thank us for that. Still, I think that, since the existing random functions use a 48-bit PRNG, it's not unreasonable for permute() to do the same. Regards, Dean
Re: pgbench - add pseudo-random permutation function
On Wed, 31 Mar 2021 at 09:02, Fabien COELHO wrote: > > >> First, I have a thing against erand48. > > > Also, there is a 64 bits seed provided to the function which instantly > ignores 16 of them, which looks pretty silly to me. > Yeah, that was copied from set_random_seed(). > At least, I suggest that two 48-bits prng could be initialized with parts > of the seed and used in different places, eg for r & m. > That could work. I'd certainly feel better about that than implementing a whole new PRNG. > Also, the seed could be used to adjust the rotation, maybe. > Perhaps. I'm not sure it's really necessary though. > >> I'm really at odds with FULL SHIFT 1, because it means that up to 1/256 of > >> values are kept out of STEERING. [...] > > > > Ah, that's a good point. Something else that also concerned me there was > > that it might lead to 2 consecutive full shifts with nothing in between, > > which would lead to less uniform randomness (like the Irwin-Hall > > distribution). I just did a quick test without the first full shift, and > > the results do appear to be better, > > Indeed, it makes sense to me. > OK, attached is an update making this change and simplifying the rotate code, which hopefully just leaves the question of what (if anything) to do with pg_erand48(). > >> Third, I think that the rotate code can be simplified, in particular > >> the ?: should be avoided because it may induce branches quite damaging > >> to processor performance. > > > > Yeah, I wondered about that. Perhaps there's a "trick" that can be > > used to simplify it. Pre-computing the number of bits in the mask > > would probably help. > > See pg_popcount64(). > Actually, I used pg_leftmost_one_pos64() to calculate the mask length, allowing the mask to be computed from that, so there is no longer a need for compute_mask(), which seems like a neat little simplification. Regards, Dean diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml new file mode 100644 index 50cf22b..84d9566 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -1057,7 +1057,7 @@ pgbench options< default_seed - seed used in hash functions by default + seed used in hash and pseudorandom permutation functions by default @@ -1866,6 +1866,24 @@ SELECT 4 AS four \; SELECT 5 AS five \as +permute ( i, size [, seed ] ) +integer + + +Permuted value of i, in the range +[0, size). This is the new position of +i (modulo size) in a +pseudorandom permutation of the integers 0...size-1, +parameterized by seed. + + +permute(0, 4) +an integer between 0 and 3 + + + + + pi () double @@ -2071,29 +2089,70 @@ f(x) = PHI(2.0 * parameter * (x - mu) / + + + When designing a benchmark which selects rows non-uniformly, be aware + that the rows chosen may be correlated with other data such as IDs from + a sequence or the physical row ordering, which may skew performance + measurements. + + + To avoid this, you may wish to use the permute + function, or some other additional step with similar effect, to shuffle + the selected rows and remove such correlations. + + + Hash functions hash, hash_murmur2 and hash_fnv1a accept an input value and an optional seed parameter. In case the seed isn't provided the value of :default_seed is used, which is initialized randomly unless set by the command-line --D option. Hash functions can be used to scatter the -distribution of random functions such as random_zipfian or -random_exponential. 
For instance, the following pgbench -script simulates possible real world workload typical for social media and -blogging platforms where few accounts generate excessive load: +-D option. + + + +permute accepts an input value, a size, and an optional +seed parameter. It generates a pseudorandom permutation of integers in +the range [0, size), and returns the index of the input +value in the permuted values. The permutation chosen is parameterized by +the seed, which defaults to :default_seed, if not +specified. Unlike the hash functions, permute ensures +that there are no collisions or holes in the output values. Input values +outside the interval are interpreted modulo the size. The function raises +an error if the size is not positive. permute can be +used to scatter the distribution of non-uniform random functions such as +random_zipfian or random_exponential +so that values drawn more often are n
Re: pgbench - add pseudo-random permutation function
On Wed, 31 Mar 2021 at 18:53, Fabien COELHO wrote: > > While looking at it, I have some doubts on this part: > > m = (uint64) (pg_erand48(random_state.xseed) * (mask + 1)) | 1; > r = (uint64) (pg_erand48(random_state.xseed) * (mask + 1)); > r = (uint64) (pg_erand48(random_state.xseed) * size); > > I do not understand why the random values are multiplied by anything in > the first place… > These are just random integers in the range [0,mask] and [0,size-1], formed in exactly the same way as getrand(). > This one looks like a no-op : > > r = (uint64) (pg_erand48(random_state.xseed) * size); > v = (v + r) % size; > > v = (v + r) % size > = (v + rand * size) % size > =? (v % size + rand * size % size) % size > =? (v % size + 0) % size > = v % size > = v > rand * size % size is not zero because rand is a floating point number in the range [0,1), so actually rand * size % size = rand * size. Similarly in the other case, you're forgetting that rand is not an integer. Thinking more about our use of erand48(), the only real impact it has is to limit the number of possible permutations produced, and actually 2^48 is so huge (roughly 281 million million) that I can't ever see that being an issue in practice. (In a quick dummy test, replacing erand48() with a silly "erand8()" function that only returned one of 256 distinct values, permute() still worked fine at any size, except for the fact that only up to 256 distinct permutations were produced. In other words, limitations on the source of randomness don't prevent it from producing permutations of any size, they just limit the number of distinct permutations possible. And since 2^48 is so big, that shouldn't be an issue.) Also, I think the source of the input seed is most likely to be either manually hand-picked integers or pgbench's own random() function, so the only real issue I can see is that by ignoring the upper 16-bits, there's a very small chance of always using the same random sequence if some hand-picked numbers only vary in the top 16 bits, though I think that's highly unlikely in practice. Nonetheless, it's not much more effort to make another random state and use those remaining bits of the seed and get more internal random states, so here's an update doing that. I intentionally chose to reuse the lower 16 bits of the seed in the second random function (in a different slot of the random state), since those are probably the ones most likely to vary in practice. This doesn't actually make any measurable difference to any of the tests, but it closes that potential loophole of ignoring part of the seed. In all my tests, the biggest improvement was between v23 and v24 of the patch. By comparison, the later versions have been relatively small improvements, and it's probably now "random enough" for the intended purposes. Regards, Dean diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml new file mode 100644 index 50cf22b..84d9566 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -1057,7 +1057,7 @@ pgbench options< default_seed - seed used in hash functions by default + seed used in hash and pseudorandom permutation functions by default @@ -1866,6 +1866,24 @@ SELECT 4 AS four \; SELECT 5 AS five \as +permute ( i, size [, seed ] ) +integer + + +Permuted value of i, in the range +[0, size). This is the new position of +i (modulo size) in a +pseudorandom permutation of the integers 0...size-1, +parameterized by seed. 
+ + +permute(0, 4) +an integer between 0 and 3 + + + + + pi () double @@ -2071,29 +2089,70 @@ f(x) = PHI(2.0 * parameter * (x - mu) / + + + When designing a benchmark which selects rows non-uniformly, be aware + that the rows chosen may be correlated with other data such as IDs from + a sequence or the physical row ordering, which may skew performance + measurements. + + + To avoid this, you may wish to use the permute + function, or some other additional step with similar effect, to shuffle + the selected rows and remove such correlations. + + + Hash functions hash, hash_murmur2 and hash_fnv1a accept an input value and an optional seed parameter. In case the seed isn't provided the value of :default_seed is used, which is initialized randomly unless set by the command-line --D option. Hash functions can be used to scatter the -distribution of random functions such as random_zipfian or -random_exponential. For instance, the following pgbench -script simulates possible real w
Re: pgbench - add pseudo-random permutation function
ice some slight non-uniformity in the way adjacent inputs were separated from another when the size was just under a power of 2. I think that's the hardest case for this algorithm, because there's very little overlap between the 2 halves. Increasing the number of rounds from 4 to 6 ironed out that non-uniformity (and as mentioned above, is still cheaper than using randu64() with 4 rounds), so I think we should go with that. > I still think that *rand48 is a poor (relatively small state) and > inefficient (the implementation includes packing and unpacking 16 bits > ints to build a 64 bits int) choice. > I can somewhat understand your dislike of *rand48(), but in the context of pgbench I think that it is perfectly adequate. Since we now use 2 RandomStates, I don't think the state size is an issue anymore, if it ever really was. Using erand48() gives better performance than jrand48() because it returns 48 bits rather than 32, so fewer calls are needed, which allows more rounds for the same cost. Additionally, following the same pattern as existing code reduces the risk of bugs, and builds on proven, tested code. You may wish to submit a separate patch to replace pgbench's use of *rand48() with something else, and that would be discussed on its own merits, but I don't see why that should hold up adding permute(). Regards, Dean
Re: pgbench - add pseudo-random permutation function
On Mon, 5 Apr 2021 at 13:07, Fabien COELHO wrote: > > Attached a v28 which I hope fixes the many above issues and stays with > ints. The randu64 is still some kind of a joke, I artificially reduced the > cost by calling jrand48 once and extending it to 64 bits, so it could give > an idea of the cost endured if a 64-bit prng was used. > > Now you are the committer, you can do as you please, I'm just stating my > (mathematical) opinions about using floating point computations for that. > I think that apart from this point of principle/philosophy the permute > performance and implementation are reasonable, and better than my initial > version because it avoids int128 computations and the large prime number > business. > Pushed. I decided not to go with the "joke" randu64() function, but instead used getrand() directly. This at least removes any *direct* use of doubles in permute() (though of course they're still used indirectly), and means that if getrand() is improved in the future, permute() will benefit too. Regards, Dean
Re: Wrong results from in_range() tests with infinite offset
On Tue, 21 Jul 2020 at 03:06, Tom Lane wrote: > > Pushed, but I chickened out of back-patching. The improvement in what > happens for finite comparison values seems somewhat counterbalanced by > the possibility that someone might not like the definition we arrived > at for infinities. So, it's not quite an open-and-shut bug fix, so > I just put it in HEAD (for now anyway). > Yeah, that seems fair enough, and it's quite an obscure corner-case that has gone unnoticed for quite some time. Regards, Dean
Re: Infinities in type numeric
On Tue, 21 Jul 2020 at 23:18, Tom Lane wrote: > > Here's a v4 that syncs numeric in_range() with the new behavior of > float in_range(), and addresses your other comments too. > LGTM. Regards, Dean
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On 18 March 2018 at 23:57, Tomas Vondra wrote: > Attached is an updated version of the patch series, addressing issues > pointed out by Alvaro. I'm just starting to look at this now, and I think I'll post individual comments/questions as I get to them rather than trying to review the whole thing, because it's quite a large patch. Apologies if some of this has already been discussed. Looking at the changes to UpdateStatisticsForTypeChange(): + memset(nulls, 1, Natts_pg_statistic_ext * sizeof(bool)); why the "1" there -- is it just a typo? A wider concern I have is that I think this function is trying to be too clever by only resetting selected stats. IMO it should just reset all stats unconditionally when the column type changes, which would be consistent with what we do for regular stats. Consider, for example, what would happen if a column was changed from real to int -- all the data values will be coerced to integers, losing precision, and any ndistinct and dependency stats would likely be completely wrong afterwards. IMO that's a bug, and should be back-patched independently of these new types of extended stats. Thoughts? Regards, Dean
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On 26 March 2018 at 14:08, Tomas Vondra wrote: > On 03/26/2018 12:31 PM, Dean Rasheed wrote: >> A wider concern I have is that I think this function is trying to be >> too clever by only resetting selected stats. IMO it should just reset >> all stats unconditionally when the column type changes, which would >> be consistent with what we do for regular stats. >> > The argument a year ago was that it's more plausible that the semantics > remains the same. I think the question is how the type change affects > precision - had the type change in the opposite direction (int to real) > there would be no problem, because both ndistinct and dependencies would > produce the same statistics. > > In my experience people are far more likely to change data types in a > way that preserves precision, so I think the current behavior is OK. Hmm, I don't really buy that argument. Altering a column's type allows the data in it to be rewritten in arbitrary ways, and I don't think we should presume that the statistics will still be valid just because the user *probably* won't do something that changes the data much. Regards, Dean
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On 18 March 2018 at 23:57, Tomas Vondra wrote: > Attached is an updated version of the patch series, addressing issues > pointed out by Alvaro. I've just been reading the new code in statext_clauselist_selectivity() and mcv_clauselist_selectivity(), and I'm having a hard time convincing myself that it's correct. This code in statext_clauselist_selectivity() looks a bit odd: /* * Evaluate the MCV selectivity. See if we got a full match and the * minimal selectivity. */ if (stat->kind == STATS_EXT_MCV) s1 = mcv_clauselist_selectivity(root, stat, clauses, varRelid, jointype, sjinfo, rel, &fullmatch, &mcv_lowsel); /* * If we got a full equality match on the MCV list, we're done (and the * estimate is likely pretty good). */ if (fullmatch && (s1 > 0.0)) return s1; /* * If it's a full match (equalities on all columns) but we haven't found * it in the MCV, then we limit the selectivity by frequency of the last * MCV item. Otherwise reset it to 1.0. */ if (!fullmatch) mcv_lowsel = 1.0; return Min(s1, mcv_lowsel); So if fullmatch is true and s1 is greater than 0, it will return s1. If fullmatch is true and s1 equals 0, it will return Min(s1, mcv_lowsel) which will also be s1. If fullmatch is false, mcv_lowsel will be set to 1 and it will return Min(s1, mcv_lowsel) which will also be s1. So it always just returns s1, no? Maybe there's no point in computing fullmatch. Also, wouldn't mcv_lowsel potentially be a significant overestimate anyway? Perhaps 1 minus the sum of the MCV frequencies might be closer, but even that ought to take into account the number of distinct values remaining, although that information may not always be available. Also, just above that, in statext_clauselist_selectivity(), it computes the list stat_clauses, then doesn't appear to use it anywhere. I think that would have been the appropriate thing to pass to mcv_clauselist_selectivity(). Otherwise, passing unrelated clauses into mcv_clauselist_selectivity() will cause it to fail to find any matches and then underestimate. I've also come across a few incorrect/out-of-date comments: /* * mcv_clauselist_selectivity * Return the estimated selectivity of the given clauses using MCV list * statistics, or 1.0 if no useful MCV list statistic exists. */ -- I can't see any code path that returns 1.0 if there are no MCV stats. The last part of that comment is probably more appropriate to statext_clauselist_selectivity() /* * mcv_update_match_bitmap * [snip] * The function returns the number of items currently marked as 'match', and * ... -- it doesn't seem to return the number of items marked as 'match'. Then inside that function, this comment is wrong (copied from the preceding comment): /* AND clauses assume nothing matches, initially */ memset(bool_matches, STATS_MATCH_FULL, sizeof(char) * mcvlist->nitems); Still reading... Regards, Dean
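To make that argument concrete, here is a tiny standalone restatement of the quoted return logic (not the patch code itself). For any fullmatch, and any s1 and mcv_lowsel in [0, 1], it returns s1:

#include <stdio.h>

#define Min(a,b)    ((a) < (b) ? (a) : (b))

/*
 * Standalone restatement of the quoted logic: given 0 <= s1 <= 1 and
 * 0 <= mcv_lowsel <= 1, the result is always s1, whatever fullmatch is.
 */
static double
combine(int fullmatch, double s1, double mcv_lowsel)
{
    if (fullmatch && s1 > 0.0)
        return s1;
    if (!fullmatch)
        mcv_lowsel = 1.0;
    return Min(s1, mcv_lowsel);
}

int
main(void)
{
    double  vals[] = {0.0, 0.25, 0.5, 1.0};
    int     ndiffs = 0;

    for (int f = 0; f <= 1; f++)
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++)
                if (combine(f, vals[i], vals[j]) != vals[i])
                    ndiffs++;

    printf("cases where the result differs from s1: %d\n", ndiffs);    /* prints 0 */
    return 0;
}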
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On 26 March 2018 at 20:17, Tomas Vondra wrote: > On 03/26/2018 09:01 PM, Dean Rasheed wrote: >> Also, just above that, in statext_clauselist_selectivity(), it >> computes the list stat_clauses, then doesn't appear to use it >> anywhere. I think that would have been the appropriate thing to pass >> to mcv_clauselist_selectivity(). Otherwise, passing unrelated clauses >> into mcv_clauselist_selectivity() will cause it to fail to find any >> matches and then underestimate. > > Will check. > Here's a test case demonstrating this bug: drop table if exists foo; create table foo(a int, b int, c int); insert into foo select 0,0,0 from generate_series(1,10); insert into foo select 1,1,1 from generate_series(1,1); insert into foo select 2,2,2 from generate_series(1,1000); insert into foo select 3,3,3 from generate_series(1,100); insert into foo select x,x,x from generate_series(4,1000) g(x); insert into foo select x,x,x from generate_series(4,1000) g(x); insert into foo select x,x,x from generate_series(4,1000) g(x); insert into foo select x,x,x from generate_series(4,1000) g(x); insert into foo select x,x,x from generate_series(4,1000) g(x); analyse foo; explain analyse select * from foo where a=1 and b=1 and c=1; create statistics foo_mcv_ab (mcv) on a,b from foo; analyse foo; explain analyse select * from foo where a=1 and b=1 and c=1; With the multivariate MCV statistics, the estimate gets worse because it passes the c=1 clause to mcv_clauselist_selectivity(), and nothing matches. There's also another bug, arising from the fact that statext_is_compatible_clause() says that NOT clauses are supported, but mcv_clauselist_selectivity() doesn't support them. So with the above table: select * from foo where (a=0 or b=0) and not (b in (1,2)); ERROR: unknown clause type: 111 Regards, Dean
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On 27 March 2018 at 01:36, Tomas Vondra wrote: > 4) handling of NOT clauses in MCV lists (and in histograms) > > The query you posted does not fail anymore... > Ah, it turns out the previous query wasn't actually failing for the reason I thought it was -- it was failing because it had a ScalarArrayOpExpr that was being passed to mcv_clauselist_selectivity() because of the wrong list being passed to it. I could see from the code that a NOT clause would have tripped it up, but most NOT clauses actually get rewritten by negate_clause() so they end up not being NOT clauses. One way to get a NOT clause, is with a boolean column, and this reveals another couple of problems: drop table if exists foo; create table foo(a int, b boolean); insert into foo values(1,true); insert into foo values(1,true); insert into foo values(1,false); create statistics foo_mcv_ab (mcv) on a,b from foo; analyse foo; select * from foo where a=1 and b; ERROR: unknown clause type: 99 This fails because the clause is now a Var, which statext_is_compatible_clause() lets through, but mcv_clauselist_selectivity() doesn't support. So it's important to keep those 2 functions in sync, and it might be worth having comments in each to emphasise that. And, if a NOT clause is used: select * from foo where a=1 and not b; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. This is an Assert failure in mcv_update_match_bitmap()'s BoolExpr handling block: Assert(bool_clauses != NIL); Assert(list_length(bool_clauses) >= 2); The first of those Asserts is actually redundant with the second, but the second fails because a NOT clause always only has one argument. Regards, Dean
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On 27 March 2018 at 01:36, Tomas Vondra wrote: > BTW I think there's a bug in handling the fullmatch flag - it should not > be passed to AND/OR subclauses the way it is, because then > > WHERE a=1 OR (a=2 AND b=2) > > will probably set it to 'true' because of (a=2 AND b=2). Which will > short-circuit the statext_clauselist_selectivity, forcing it to ignore > the non-MCV part. > I'm not sure that's true. Won't the outer call to mcv_update_match_bitmap() overwrite the value of fullmatch returned by the nested call, and set fullmatch to false because it has only seen 1 attribute equality match? I think that's the correct result, but I think that's just luck. The dubious part is the way fullmatch is calculated for OR clauses -- I think for an OR clause we want to know the attributes matched in *every* subclause, rather than in *any* subclause, as we do for AND. So I think the only way an OR clause at the top-level should return a full match is if every sub-clause was a full match, for example: WHERE (a=1 AND b=2) OR (a=2 AND b=1) But then consider this: WHERE a=1 AND (b=1 OR b=2) That should also potentially be a full match, but that can only work if mcv_update_match_bitmap() returned the set of matching attributes (eqmatches), rather than fullmatch, so that it can be merged appropriately in the caller. So for an OR clause, it needs to return eqmatches containing the list of attributes for which every sub-clause matched with equality against the MCV list, and in an outer AND clause that can be added to the outer eqmatches list, which is the list of attributes for which any sub-clause matched with equality. Regards, Dean
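To illustrate the merging rule being suggested here (a sketch of the idea only, not proposed code): represent the set of attributes matched with equality in each subclause as a bitmask, union those sets across the subclauses of an AND, intersect them across the arms of an OR, and report a full match only if the merged set covers every column of the statistics object.

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
    uint64_t    all_attrs = 0x3;        /* two stats columns: a = bit 0, b = bit 1 */
    uint64_t    A = UINT64_C(1) << 0;
    uint64_t    B = UINT64_C(1) << 1;

    /* WHERE (a=1 AND b=2) OR (a=2 AND b=1): each OR arm matches {a,b} */
    uint64_t    merged1 = (A | B) & (A | B);    /* intersect the OR arms */

    printf("example 1 full match: %s\n", merged1 == all_attrs ? "yes" : "no");

    /* WHERE a=1 AND (b=1 OR b=2): the inner OR matches {b} in both arms */
    uint64_t    inner = B & B;          /* intersect the OR arms */
    uint64_t    merged2 = A | inner;    /* union for the outer AND */

    printf("example 2 full match: %s\n", merged2 == all_attrs ? "yes" : "no");

    return 0;
}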
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On 27 March 2018 at 14:58, Dean Rasheed wrote: > On 27 March 2018 at 01:36, Tomas Vondra wrote: >> 4) handling of NOT clauses in MCV lists (and in histograms) >> >> The query you posted does not fail anymore... >> > Ah, it turns out the previous query wasn't actually failing for the > reason I thought it was -- it was failing because it had a > ScalarArrayOpExpr that was being passed to > mcv_clauselist_selectivity() because of the wrong list being passed to > it. I could see from the code that a NOT clause would have tripped it > up, but most NOT clauses actually get rewritten by negate_clause() so > they end up not being NOT clauses. > Thinking about that some, I think that the only NOT clauses this needs to actually worry about are NOTs of boolean Vars. Anything else that this code supports will have been transformed into something other than a NOT before reaching this point. Thus it might be much simpler to handle that as a special case in statext_is_compatible_clause() and mcv_update_match_bitmap(), rather than trying to support general NOT clauses, and going through a recursive call to mcv_update_match_bitmap(), and then having to merge bitmaps. NOT of a boolean Var could then be treated just like var=false, setting the appropriate attribute match entry if it's found in the MCV list. This would allow clauses like (a=1 and NOT b) to be supported, which I don't think currently works, because fullmatch won't get set. Regards, Dean
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On 28 March 2018 at 01:34, Tomas Vondra wrote: > Attached is a patch fixing this. In the end I've decided to keep both > branches - one handling boolean Vars and one for NOT clauses. I think > you're right we can only see (NOT var) cases, but I'm not sure about that. > > For example, what if an operator does not have a negator? Then we can't > transform NOT (a AND b) => (NOT a OR NOT b), I guess. So I kept this for > now, and we can remove this later. > OK, but it's going to have to work harder to set "fullmatch" correctly. If you have a boolean Var clause, which is identical to "bool_var = true", it ought to add to "eqmatches" if true is found in the MCV list. Likewise a boolean Var under a NOT clause is identical to "bool_var = false", so it ought to add to "eqmatches" if false is found in the MCV list. Both those cases would be easy to handle, if general NOT support wasn't required, and you just special-cased "NOT bool_var". If you're going to handle the general case of an arbitrary clause under a NOT, then the recursive call to mcv_update_match_bitmap() would seem to need to know that it's under a NOT (a new "is_not" parameter?), to invert the logic around adding to "eqmatches". That applies to other general OpExpr's too -- for example, "NOT (box_var = ?)" won't be rewritten because there is no box_ne operator, but when mcv_update_match_bitmap() is called recursively with the "box_var = ?", it shouldn't add to "eqmatches", despite this being an EQSEL operator. As mentioned before, I think this whole thing only works if mcv_update_match_bitmap() returns the "eqmatches" list, so that if it is called recursively, it can be merged with the caller's list. What isn't immediately obvious to me is what happens to a NOT clause under another NOT clause, possibly with an AND or OR in-between. Would the "is_not" parameter just flip back to false again? There's also an interesting question around the NullTest clause. Since NULLs are being recorded in the MCV list, shouldn't "IS NULL" be treated as semantically like an equality clause, and cause that attribute to be added to "eqmatches" if NULL is found in the MCV list? > I've also realized that the "fullmatch" flag is somewhat confused, > because some places interpreted it as "there is equality on each > attribute" but in fact it also required an actual MCV match. Yes, I was having similar thoughts. I think "eqmatches" / "fullmatch" probably just wants to track whether there was an exact comparison on all the attributes, not whether or not the value was in the MCV list, because the latter is already available in the "matches" bitmap. Knowing that complete, exact comparisons happened, and it wasn't in the MCV list, makes the "(1 - mcv_totalsel)) / otherdistinct" estimate reasonable. However, I don't think that tracking "eqmatches" or "fullmatch" is sufficient for the general case. For example, for other operators like "!=", "<", "<=", all (or maybe half) the "1 - mcv_totalsel" ought to count towards the selectivity, plus possibly part of the MCV list (e.g., for "<=", using the sum of the matching MCV frequencies plus half the sum of the non-MCV frequencies might be reasonable -- c.f. scalarineqsel()). For an OR clause, you might want to count the number of non-MCV matches, because logically each one adds another "(1 - mcv_totalsel)) / otherdistinct" to the total selectivity. It's not immediately obvious how that can be made to fit into the current code structure. Perhaps it could be made to work by tracking the overall selectivity as it goes along. 
Or perhaps it could track the count/proportion of non-MCV matches. Regards, Dean
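As a toy numeric illustration of the scalarineqsel-style heuristic mentioned above for a clause like "col <= c" (all numbers made up, and none of this is code from the patch): sum the frequencies of the MCV items that satisfy the clause, and assume that half of the remaining non-MCV mass does too.

#include <stdio.h>

int
main(void)
{
    double  mcv_freqs[]   = {0.30, 0.20, 0.10, 0.05};   /* hypothetical MCV list */
    int     mcv_matches[] = {1, 1, 0, 0};   /* which items satisfy "col <= c" */
    int     n = sizeof(mcv_freqs) / sizeof(mcv_freqs[0]);
    double  mcv_total = 0.0;
    double  mcv_match = 0.0;

    for (int i = 0; i < n; i++)
    {
        mcv_total += mcv_freqs[i];
        if (mcv_matches[i])
            mcv_match += mcv_freqs[i];
    }

    /* matching MCV mass, plus half of the non-MCV mass */
    double  sel = mcv_match + 0.5 * (1.0 - mcv_total);

    printf("estimated selectivity = %.3f\n", sel);  /* 0.50 + 0.5 * 0.35 = 0.675 */
    return 0;
}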
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On 28 March 2018 at 15:50, Tomas Vondra wrote: > After thinking about this a bit more, I'm not sure if updating the info > based on recursive calls makes sense. The fullmatch flag was supposed to > answer a simple question - can there be just a single matching item? > > If there are equality conditions on all columns, there can be just a > single matching item - if we have found it in the MCV (i.e. s1 > 0.0), > then we don't need to inspect the non-MCV part. > > But handling this in recursive manner breaks this entirely, because with > something like > >(a=1) AND (b=1 OR b=2) > > you suddenly can have multiple matching items. Which makes the fullmatch > flag somewhat useless. > > So I think we should be looking at top-level equality clauses only, just > like number_of_groups() does. > I'm not quite sure what you mean by that, but it sounds a bit limiting in terms of the kinds of user queries that would be supported. > I think we can remove the fullmatch flag from mcv_update_bitmap > entirely. All we need to know is the presence of equality clauses and > whether there was a match in MCV (which we know from s1 > 0.0). > I agree with removing the fullmatch flag, but I don't think we actually need to know about the presence of equality clauses: The way that mcv_update_bitmap() recursively computes the set of matching MCVs seems to be correct. That gives us a value (call it mcv_matchsel) for the proportion of the table's rows that are in the MCV list and satisfy the clauses in stat_clauses. We can also estimate that there are (1-mcv_totalsel)*N rows that are not in the MCV list, for which the MCV stats therefore tell us nothing. The best way to estimate those rows would seem to be to use the logic from the guts of clauselist_selectivity(), without consulting any extended MCV stats (but still consulting other extended stats, I think). Doing that would return a selectivity value (call it nonmcv_sel) for those remaining rows. Then a reasonable estimate for the overall selectivity would seem to be mcv_matchsel + (1-mcv_totalsel) * nonmcv_sel and there would be no need for mcv_update_bitmap() to track eqmatches or return fullmatch, and it wouldn't actually matter whether or not we had equality clauses or if all the MCV columns were used. Regards, Dean
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On 7 April 2018 at 15:12, Bruce Momjian wrote: > Uh, where are we on this patch? It isn't going to make it into PG 11? > Feature development for this has been going on for years. Unfortunately, I think there's no way that this will be ready for PG11. So far, I have only read parts of the patch, focusing mainly on the planner changes, and how it will make use of the new statistics. I think there are still issues to work out in that area, although I haven't read the latest version yet, I have some doubts about the new approach described. Looking back at the history of this patch, it appears to have moved through all 4 PG11 commitfests with fairly minimal review, never becoming Ready for Committer. That's a real shame because I agree that it's an important feature, but IMO it's not yet in a committable state. I feel bad saying that, because the patch hasn't really had a fair chance to-date, despite Tomas' efforts and quick responses to all review comments. At this stage though, all I can say is that I'll make every effort to keep reviewing it, and I hope Tomas will persist, so that it has a better chance in PG12. Regards, Dean
Re: BF failure: could not open relation with OID XXXX while querying pg_views
On Sat, 14 Sep 2019 at 05:25, Tom Lane wrote: > > Tomas Vondra writes: > > On Wed, Aug 14, 2019 at 05:24:26PM +1200, Thomas Munro wrote: > >> On Wed, Aug 14, 2019 at 5:06 PM Tom Lane wrote: > >>> Oh, hmm --- yeah, that should mean it's safe. Maybe somebody incautiously > >>> changed one of the other tests that run concurrently with "rules"? > > >> Looks like stats_ext.sql could be the problem. It creates and drops > >> priv_test_view, not in a schema. Adding Dean, author of commit > >> d7f8d26d. > > > Yeah, that seems like it might be the cause. I'll take a look at fixing > > this, probably by creating the view in a different schema. > > Ping? We're still getting intermittent failures of this ilk, eg > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2019-09-14%2003%3A37%3A03 > > With v12 release approaching, I'd like to not have failures > like this in a released branch. > Ah sorry, I missed this thread before. As author of that commit, it's really on me to fix it, and the cause seems pretty clear-cut, so I'll aim to get that done today. Regards, Dean
Re: BF failure: could not open relation with OID XXXX while querying pg_views
On Sun, 15 Sep 2019 at 11:11, Tomas Vondra wrote: > > On Sun, Sep 15, 2019 at 10:16:30AM +0100, Dean Rasheed wrote: > > > >Ah sorry, I missed this thread before. As author of that commit, it's > >really on me to fix it, and the cause seems pretty clear-cut, so I'll > >aim to get that done today. > > FWIW here is a draft patch that I was going to propose - it simply moves > the table+view into a "tststats" schema. I suppose that's rougly what we > discussed earlier in this thread. > Yes, that matches what I just drafted. Do you want to push it, or shall I? Regards, Dean
Re: BF failure: could not open relation with OID XXXX while querying pg_views
On Sun, 15 Sep 2019 at 12:20, Tomas Vondra wrote: > > On Sun, Sep 15, 2019 at 11:27:19AM +0100, Dean Rasheed wrote: > >On Sun, 15 Sep 2019 at 11:11, Tomas Vondra > >wrote: > >> > >> On Sun, Sep 15, 2019 at 10:16:30AM +0100, Dean Rasheed wrote: > >> > > >> >Ah sorry, I missed this thread before. As author of that commit, it's > >> >really on me to fix it, and the cause seems pretty clear-cut, so I'll > >> >aim to get that done today. > >> > >> FWIW here is a draft patch that I was going to propose - it simply moves > >> the table+view into a "tststats" schema. I suppose that's rougly what we > >> discussed earlier in this thread. > >> > > > >Yes, that matches what I just drafted. > >Do you want to push it, or shall I? > > Please go ahead and push. I'm temporarily without commit access. > OK, pushed. Regards, Dean
Re: BUG #15293: Stored Procedure Triggered by Logical Replication is Unable to use Notification Events
If you are trying to get around the issue for now, what my team did was cron an insert statement on the database server. We have a queue table that has some of these triggers setup so it was easy to write a no-op row to the queue. This had the side effect of flushing the notification queue. We haven't had any issues with this approach. I can also volunteer to help test out patches. Thanks, -mdj On Fri, Feb 22, 2019 at 11:37 AM Robert Welin wrote: > I have reproduced this bug on PostgreSQL version 10.7 and 11.2 using the > steps described in Michael Powers' original report. The issue also still > seems to be present even with the patch provided by Sergei Kornilov. > > Are there plans to address this issue any time soon or is there some way > I can assist in fixing it? It would be great to have notifications from > logical replication. > > Regards, > Robert Welin > >
Re: BUG #15623: Inconsistent use of default for updatable view
On Thu, 28 Feb 2019 at 07:47, Amit Langote wrote: > > +if (attrno == 0) > +elog(ERROR, "Cannot set value in column %d to > DEFAULT", i); > > Maybe: s/Cannot/cannot/g > Ah yes, you're right. That is the convention. > +Assert(list_length(sublist) == numattrs); > > Isn't this Assert useless, because we're setting numattrs to > list_length() and transformInsertStmt ensures that all > sublists are same length? > Well possibly I'm being over-paranoid, but given that it may have come via a previous invocation of rewriteValuesRTE() that may have completely rebuilt the lists, it seemed best to be sure that it hadn't done something unexpected. It's about to use that to read from the attrnos array, so it might read beyond the array bounds if any of the prior logic was faulty. Thanks for looking. Regards, Dean
Re: Row Level Security − leakproof-ness and performance implications
On Thu, 28 Feb 2019 at 14:13, Robert Haas wrote: > A wild idea might be to let > proleakproof take on three values: yes, no, and maybe. When 'maybe' > functions are involved, we tell them whether or not the current query > involves any security barriers, and if so they self-censor. > Does self-censoring mean that they might still throw an error for some inputs, but that error won't reveal any information about the input values? That's not entirely consistent with my understanding of the definition of leakproof, but maybe there are situations where that amount of information leakage would be OK. So maybe we could have "strictly leakproof" functions that never throw errors and "weakly leakproof" functions (needs a better name) that can throw errors, as long as those errors don't include data values. Then we could allow strict and weak security barriers on a per-table basis, depending on how sensitive the data is in each table (I'm not a fan of using GUCs to control this). Regards, Dean
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Thu, 28 Feb 2019 at 19:56, Tomas Vondra wrote: > Attached is an updated version of this patch series. Here are some random review comments. I'll add more later, but I'm out of energy for today. 1). src/test/regress/expected/type_sanity.out has bit-rotted. 2). Duplicate OIDs (3425). 3). It looks a bit odd that clauselist_selectivity() calls statext_clauselist_selectivity(), which does MCV stats and will do histograms, but it doesn't do dependencies, so clauselist_selectivity() has to then separately call dependencies_clauselist_selectivity(). It would seem neater if statext_clauselist_selectivity() took care of calling dependencies_clauselist_selectivity(), since dependencies are just another kind of extended stats. 4). There are no tests for pg_mcv_list_items(). Given a table with a small enough amount of data, so that it's all sampled, it ought to be possible to get predictable MCV stats. 5). It's not obvious what some of the new test cases in the "stats_ext" tests are intended to show. For example, the first test creates a table with 5000 rows and a couple of indexes, does a couple of queries, builds some MCV stats, and then repeats the queries, but the results seem to be the same with and without the stats. I wonder if it's possible to write smaller, more targeted tests. Currently "stats_ext" is by far the slowest test in its group, and I'm not sure that some of those tests add much. It ought to be possible to write a function that calls EXPLAIN and returns a query's row estimate, and then you could write tests to confirm the effect of the new stats by verifying the row estimates change as expected. 6). This enum isn't needed for MCVs:

/*
 * Degree of how much MCV item matches a clause.
 * This is then considered when computing the selectivity.
 */
#define STATS_MATCH_NONE     0   /* no match at all */
#define STATS_MATCH_PARTIAL  1   /* partial match */
#define STATS_MATCH_FULL     2   /* full match */

STATS_MATCH_PARTIAL is never used for MCVs, so you may as well just use booleans instead of this enum. If those are needed for histograms, they can probably be made local to histogram.c. 7). estimate_num_groups_simple() isn't needed in this patch. 8). In README.mcv, s/clauselist_mv_selectivity_mcvlist/mcv_clauselist_selectivity/. 9). In the list of supported clause types that follows, (e) is the same as (c), but with a more general description. 10). It looks like most of the subsequent description of the algorithm is out of date and needs rewriting. All the stuff about full matches and the use of ndistinct is now obsolete. Regards, Dean
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Sat, 9 Mar 2019 at 18:33, Dean Rasheed wrote: > > On Thu, 28 Feb 2019 at 19:56, Tomas Vondra > wrote: > > Attached is an updated version of this patch series. > > Here are some random review comments. I'll add more later, but I'm out > of energy for today. > Here are some more comments: 11). In dependency_degree():

-    /* sort the items so that we can detect the groups */
-    qsort_arg((void *) items, numrows, sizeof(SortItem),
-              multi_sort_compare, mss);
+    /*
+     * build an array of SortItem(s) sorted using the multi-sort support
+     *
+     * XXX This relies on all stats entries pointing to the same tuple
+     * descriptor. Not sure if that might not be the case.
+     */
+    items = build_sorted_items(numrows, rows, stats[0]->tupDesc,
+                               mss, k, attnums_dep);

That XXX comment puzzled me for a while. Actually it's OK though, unless/until we try to support stats across multiple relations, which will require a much larger refactoring of this code. For now though, the stats entries all point to the same tuple descriptor from the onerel passed to BuildRelationExtStatistics(), so it's OK to just use the first tuple descriptor in this way. The comment should be updated to explain that. 12). bms_member_index() should surely be in bitmapset.c. It could be more efficient by just traversing the bitmap words and making use of bmw_popcount(). Also, its second argument should be of type 'int' for consistency with other bms_* functions. 13). estimate_ndistinct() has been moved from mvdistinct.c to extended_stats.c and changed from static to extern, but it is only called from mvdistinct.c, so that change is unnecessary (at least as far as this patch is concerned). 14). The attnums Bitmapset passed to statext_is_compatible_clause_internal() is an input/output argument that it updates. That should probably be documented. When it calls itself recursively for AND/OR/NOT clauses, it could just pass the original Bitmapset through to be updated (rather than creating a new one and merging it), as it does for other types of clause. On the other hand, the outer function statext_is_compatible_clause() does need to return a new bitmap, which may or may not be used by its caller, so it would be cleaner to make that a strictly "out" parameter and initialise it to NULL in that function, rather than in its caller. 15). As I said yesterday, I don't think that there is a clean separation of concerns between the functions clauselist_selectivity(), statext_clauselist_selectivity(), dependencies_clauselist_selectivity() and mcv_clauselist_selectivity(). I think things could be re-arranged as follows: statext_clauselist_selectivity() - as the name suggests - should take care of *all* extended stats estimation, not just MCVs and histograms. So make it a fairly small function, calling mcv_clauselist_selectivity() and dependencies_clauselist_selectivity(), and histograms when that gets added. Most of the current code in statext_clauselist_selectivity() is really MCV-specific, so move that to mcv_clauselist_selectivity(). Amongst other things, that would move the call to choose_best_statistics() to mcv_clauselist_selectivity() (just as dependencies_clauselist_selectivity() calls choose_best_statistics() to get the best dependencies statistics). Then, when histograms are added later, you won't have the problem pointed out before where it can't apply both MCV and histogram stats if they're on different STATISTICS objects. Most of the comments for statext_clauselist_selectivity() are also MCV-related. Those too would move to mcv_clauselist_selectivity().
Regards, Dean
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Sun, 10 Mar 2019 at 13:09, Dean Rasheed wrote:
> Here are some more comments:
>

One more thing --- the comment for statext_clauselist_selectivity() says:

 * So (simple_selectivity - base_selectivity) may be seen as a correction for
 * the part not covered by the MCV list.

That's not quite right. It should really say that (simple_selectivity -
base_selectivity) is an estimate for the part not covered by the MCV
list, or that (mcv_selectivity - base_selectivity) is a correction for
the part covered by the MCV list. Those 2 statements are actually
equivalent, and different from what you wrote.

Perhaps the easiest way to see it is to work through a simple example.
Suppose you had the following clauses:

  a = 1 AND b >= 0 AND b <= 10

The per-column stats might be expected to give reasonable independent
estimates for the following 2 things:

  P(a = 1)
  P(b >= 0 AND b <= 10)  -- in general, greater than P(b >= 0) * P(b <= 10)

but the overall estimate produced by clauselist_selectivity_simple()
would then just be the product of those 2 things:

  simple_sel = P(a = 1) * P(b >= 0 AND b <= 10)

which might not be so good if the columns were correlated.

Now suppose you had MCV stats, which included MCV items for the
following specific values:

  (a=1,b=1), (a=1,b=2), (a=1,b=3)

but no other relevant MCV entries. (There might be lots of other MCV
items that don't match the original clauses, but they're irrelevant
for this discussion.) That would mean that we could get reasonable
estimates for the following 2 quantities:

  mcv_sel = P(a = 1 AND b IN (1,2,3))
          = P(a = 1 AND b = 1) + P(a = 1 AND b = 2) + P(a = 1 AND b = 3)

  mcv_basesel = base_freq(a = 1 AND b IN (1,2,3))
              = P(a = 1) * (P(b = 1) + P(b = 2) + P(b = 3))

So how is that useful? Well, returning to the quantity that we're
actually trying to compute, it can be split into MCV and non-MCV
parts, and since they're mutually exclusive possibilities, their
probabilities just add up. Thus we can write:

  P(a = 1 AND b >= 0 AND b <= 10)
    = P(a = 1 AND b IN (1,2,3))                             -- MCV part
    + P(a = 1 AND b >= 0 AND b <= 10 AND b NOT IN (1,2,3))  -- non-MCV part
    = mcv_sel + other_sel

So the first term is easy -- it's just mcv_sel, from above. The second
term is trickier though, since we have no information about the
correlation between a and b in the non-MCV region. Just about the best
we can do is assume that they're independent, which gives:

  other_sel = P(a = 1 AND b >= 0 AND b <= 10 AND b NOT IN (1,2,3))
           ~= P(a = 1) * P(b >= 0 AND b <= 10 AND b NOT IN (1,2,3))

and that can now be written in terms of things that we know:

  other_sel ~= P(a = 1) * P(b >= 0 AND b <= 10 AND b NOT IN (1,2,3))
             = P(a = 1) * P(b >= 0 AND b <= 10)
             - P(a = 1) * P(b IN (1,2,3))   -- mutually exclusive possibilities
             = simple_sel - mcv_basesel

So, as I said above, (simple_selectivity - base_selectivity) is an
estimate for the part not covered by the MCV list.
Another way to look at it is to split the original per-column estimate
up into MCV and non-MCV parts, and correct the MCV part using the MCV
stats:

  simple_sel = P(a = 1) * P(b >= 0 AND b <= 10)
             = P(a = 1) * P(b IN (1,2,3))
             + P(a = 1) * P(b >= 0 AND b <= 10 AND b NOT IN (1,2,3))

The first term is just mcv_basesel, so we can define other_sel to be
the other term, giving:

  simple_sel = mcv_basesel   -- MCV part
             + other_sel     -- non-MCV part

Clearly mcv_basesel isn't the best estimate for the MCV part, and it
should really be mcv_sel, so we can improve upon simple_sel by
applying a correction of (mcv_sel - mcv_basesel) to it:

  better estimate = simple_sel + (mcv_sel - mcv_basesel)
                  = mcv_sel + other_sel
    (where other_sel = simple_sel - mcv_basesel)

Of course, that's totally equivalent, but looking at it this way,
(mcv_selectivity - base_selectivity) can be seen as a correction for
the part covered by the MCV list.

All of that generalises to arbitrary clauses, because the matching
items in the MCV list are independent possibilities that sum up, and
the MCV and non-MCV parts are mutually exclusive. That's also why the
basesel calculation in mcv_clauselist_selectivity() must only include
matching MCV items, and the following XXX comment is wrong:

+	for (i = 0; i < mcv->nitems; i++)
+	{
+		*totalsel += mcv->items[i]->frequency;
+
+		if (matches[i] != STATS_MATCH_NONE)
+		{
+			/* XXX Shouldn't the basesel be outside the if condition? */
+			*basesel += mcv->items[i]->base_frequency;
+			s += mcv->items[i]->frequency;
+		}
+	}

So I believe that that code is correct, as written.

Regards,
Dean
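To make the bottom line of the two derivations above concrete, here is a
small standalone C sketch (illustrative numbers only, and not the actual
mcv_clauselist_selectivity() code) showing how simple_sel, mcv_sel and
mcv_basesel combine into the corrected estimate:

/*
 * Standalone sketch of how the quantities discussed above combine.
 * The input selectivities are made up for illustration.
 */
#include <stdio.h>

/* clamp a selectivity to the valid probability range [0, 1] */
#define CLAMP_PROBABILITY(p) \
	do { if ((p) < 0.0) (p) = 0.0; else if ((p) > 1.0) (p) = 1.0; } while (0)

int
main(void)
{
	/* P(a = 1) * P(b >= 0 AND b <= 10), assuming column independence */
	double		simple_sel = 0.04;

	/* sum of frequencies of the MCV items matching the clauses */
	double		mcv_sel = 0.03;

	/* sum of base frequencies (per-column products) of those same items */
	double		mcv_basesel = 0.01;

	/* estimate for the part of the clauses not covered by the MCV list */
	double		other_sel = simple_sel - mcv_basesel;

	/* final estimate: MCV part plus non-MCV part */
	double		sel = mcv_sel + other_sel;

	CLAMP_PROBABILITY(sel);

	printf("simple_sel = %g, corrected estimate = %g\n", simple_sel, sel);
	return 0;
}

Whether the terms are grouped as mcv_sel + (simple_sel - mcv_basesel) or
as simple_sel + (mcv_sel - mcv_basesel), the result is the same, which is
the point of the derivation above.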
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Sun, 10 Mar 2019 at 17:36, Tomas Vondra wrote: > On 3/10/19 2:09 PM, Dean Rasheed wrote: > > 14). The attnums Bitmapset passed to > > statext_is_compatible_clause_internal() is an input/output argument > > that it updates. That should probably be documented. When it calls > > itself recursively for AND/OR/NOT clauses, it could just pass the > > original Bitmapset through to be updated (rather than creating a new > > one and merging it), as it does for other types of clause. > > I don't think it's really possible, because the AND/OR/NOT clause is > considered compatible only when all the pieces are compatible. So we > can't tweak the original bitmapset directly in case the incompatible > clause is not the very first one. > In the case where the overall clause is incompatible, you don't actually care about the attnums returned. Right now it will return an empty set (NULL). With this change it would return all the attnums encountered before the incompatible piece, but that wouldn't matter. In fact, you could easily preserve the current behaviour just by having the outer statext_is_compatible_clause() function set attnums back to NULL if the result is false. Regards, Dean
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Sun, 10 Mar 2019 at 22:28, David Rowley wrote:
>
> On Mon, 11 Mar 2019 at 06:36, Tomas Vondra wrote:
> >
> > On 3/9/19 7:33 PM, Dean Rasheed wrote:
> > > I wonder if it's possible to write smaller, more targeted tests.
> > > Currently "stats_ext" is by far the slowest test in its group, and I'm
> > > not sure that some of those tests add much. It ought to be possible to
> > > write a function that calls EXPLAIN and returns a query's row
> > > estimate, and then you could write tests to confirm the effect of the
> > > new stats by verifying the row estimates change as expected.
> >
> > Sure, if we can write more targeted tests, that would be good. But it's
> > not quite clear to me how wrapping EXPLAIN in a function makes those
> > tests any faster?
>
> I've not looked at the tests in question, but if they're checking that an
> inferior plan is used when no extended stats exist, then maybe that's
> why they're slow.
>
> I think Dean might mean to create a function similar to
> explain_parallel_append() in partition_prune.sql then write tests that
> check the row estimate with EXPLAIN (COSTS ON) but strip out the other
> costing stuff instead of validating that the poor plan was chosen.
>

Yeah, that's the sort of thing I was thinking of. I think it might be
possible to write simpler and faster tests by inserting far fewer rows
and relying on ANALYSE having sampled everything, so the row estimates
should be predictable. It may be the case that, with just a handful of
rows, the extended stats don't affect the plan, but you'd still see a
difference in the row estimates, and that could be a sufficient test I
think.

> > On 3/10/19 2:09 PM, Dean Rasheed wrote:
> > > 12). bms_member_index() should surely be in bitmapset.c. It could be
> > > more efficient by just traversing the bitmap words and making use of
> > > bmw_popcount(). Also, its second argument should be of type 'int' for
> > > consistency with other bms_* functions.
> >
> > Yes, moving to bitmapset.c definitely makes sense. I don't see how it
> > could use bms_popcount() though.
>
> I think it could be done by first checking if the parameter is a
> member of the set, and then if so, count all the bits that come on and
> before that member. You can use bmw_popcount() for whole words before
> the specific member's word then just bitwise-and a bit mask of a
> bitmapword that has all bits set for all bits on and before your
> parameter's BITNUM(), and add the bmw_popcount of the final word
> bitwise-anding the mask. bms_add_range() has some masking code you
> could copy.

Yep, that's what I was imagining. Except I think that to get a 0-based
index result you'd want the mask to have all bits set for bits *before*
the parameter's BITNUM(), rather than on and before. So I think the
mask would simply be

    ((bitmapword) 1 << bitnum) - 1

Regards,
Dean
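For illustration, here is a small self-contained program demonstrating the
popcount-based index with exactly that mask. The typedef and macros are
hypothetical stand-ins: the real function would live in bitmapset.c and use
the existing Bitmapset struct plus the WORDNUM/BITNUM/bmw_popcount()
internals there, so this is only a sketch of the technique being discussed:

#include <stdint.h>
#include <stdio.h>

typedef uint64_t bitmapword;			/* stand-in for PostgreSQL's bitmapword */
#define BITS_PER_BITMAPWORD 64
#define WORDNUM(x)	((x) / BITS_PER_BITMAPWORD)
#define BITNUM(x)	((x) % BITS_PER_BITMAPWORD)
#define bmw_popcount(w)	__builtin_popcountll(w)	/* GCC/Clang builtin stand-in */

/* Return the 0-based index of member x within the set, or -1 if absent. */
static int
member_index(const bitmapword *words, int nwords, int x)
{
	int			wordnum = WORDNUM(x);
	int			bitnum = BITNUM(x);
	bitmapword	mask;
	int			result = 0;

	if (wordnum >= nwords ||
		(words[wordnum] & ((bitmapword) 1 << bitnum)) == 0)
		return -1;				/* not a member */

	/* count set bits in all whole words before x's word */
	for (int i = 0; i < wordnum; i++)
		result += bmw_popcount(words[i]);

	/* ... plus the bits strictly before x within its own word */
	mask = ((bitmapword) 1 << bitnum) - 1;
	result += bmw_popcount(words[wordnum] & mask);

	return result;
}

int
main(void)
{
	bitmapword	set[2] = {0, 0};
	int			members[] = {3, 10, 64, 70};

	for (int i = 0; i < 4; i++)
		set[WORDNUM(members[i])] |= (bitmapword) 1 << BITNUM(members[i]);

	for (int i = 0; i < 4; i++)
		printf("member %d -> index %d\n", members[i],
			   member_index(set, 2, members[i]));
	printf("non-member 5 -> index %d\n", member_index(set, 2, 5));
	return 0;
}

Running it prints indexes 0 through 3 for the members and -1 for the
non-member, matching the 0-based semantics described above.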
Re: [Patch] Log10 and hyperbolic functions for SQL:2016 compliance
On Sun, 3 Feb 2019 at 15:12, Tom Lane wrote:
>
> Andrew Gierth writes:
> > The spec doesn't require the inverse functions (asinh, acosh, atanh),
> > but surely there is no principled reason to omit them?
>
> +1 --- AFAICS, the C library has offered all six since C89.
>

+1 for including the inverse functions. However, it looks to me like
the inverse functions are C99-specific, so they might not be available
on all supported platforms. If they're not, we may need to provide our
own implementations.

I did a bit of research and had a play. It looks like it might not be
too hard to provide our own implementations, but it does take a bit of
care to avoid rounding and overflow errors. Attached are some
standalone C programs where I tested various algorithms. A decent
approach seems to be to either use log1p() (which is itself
C99-specific, hence that's the first thing I played with) or to use a
single round of Newton-Raphson to correct rounding errors, in a
similar way to how we implement cbrt() on platforms that don't have
that.

Of course that may all be moot -- those functions may in fact be
available everywhere we care about, but it was interesting to play
around with them anyway.

Regards,
Dean

#include <math.h>
#include <float.h>
#include <stdio.h>

/*
 * A commonly used algorithm, due to Kahan (citation needed).
 *
 * Max error: 1 ulp.
 * Avg error: 0.158 ulp.
 *
 * It's not obvious, but this does appear to be monotonic at the cutover point
 * (at least on my machine). Can that be relied upon on all platforms?
 *
 * -5.5511151231257851673e-17 -> -5.5511151231257851673e-17 (u != 1 branch)
 * -5.5511151231257839347e-17 -> -5.5511151231257839347e-17 (u != 1 branch)
 * -5.5511151231257827021e-17 -> -5.5511151231257827021e-17 (u == 1 branch)
 * -5.5511151231257820858e-17 -> -5.5511151231257820858e-17 (u == 1 branch)
 *
 * 1.1102230246251564172e-16 -> 1.1102230246251564172e-16 (u == 1 branch)
 * 1.1102230246251565404e-16 -> 1.1102230246251565404e-16 (u == 1 branch)
 * 1.1102230246251567869e-16 -> 1.1102230246251565404e-16 (u != 1 branch)
 * 1.1102230246251570335e-16 -> 1.1102230246251567869e-16 (u != 1 branch)
 *
 * However, it doesn't appear to be monotonic at various other points.
 */
double kahan_log1p(double x)
{
	volatile double u = 1+x;

	return u == 1 ? x : x * (log(u) / (u-1));
}

/*
 * Algorithm used in the GNU Scientific Library.
 *
 * Max error: 1 ulp.
 * Avg error: 0.086 ulp.
 *
 * Where does this formula come from? Licensing?
 * Doesn't return -0 when x is -0, but that could be fixed up.
 * Looks to be monotonic everywhere.
 */
double gsl_log1p(double x)
{
	volatile double y, z;

	y = 1 + x;
	z = y - 1;
	return log(y) - (z-x)/y;
}

/*
 * Alternative algorithm using one round of Newton-Raphson.
 *
 * Max error: 1 ulp.
 * Avg error: 0.094 ulp.
 *
 * Works well for all inputs.
 * Looks to be monotonic everywhere.
 */
double nr_log1p(double x)
{
	double y, e;

	if (x == 0)
		return x;
	y = log(1+x);
	e = exp(y);
	return y - (e-1-x)/e;
}

/* === TESTS === */

int num_tests = 0;
double max_kahan_err = 0.0;
double max_gsl_err = 0.0;
double max_nr_err = 0.0;
double total_kahan_err = 0.0;
double total_gsl_err = 0.0;
double total_nr_err = 0.0;

double err(double x, double y)
{
	if (x < y)
	{
		double diff = y - x;
		double ulp = nextafter(x, y) - x;

		return diff / ulp;
	}
	if (x > y)
	{
		double diff = x - y;
		double ulp = nextafter(y, x) - y;

		return diff / ulp;
	}
	return 0.0;
}

void test_log1p(double x)
{
	double y = log1p(x);
	double kahan_y = kahan_log1p(x);
	double gsl_y = gsl_log1p(x);
	double nr_y = nr_log1p(x);
	double kahan_err = err(y, kahan_y);
	double gsl_err = err(y, gsl_y);
	double nr_err = err(y, nr_y);
	double prev_x = nextafter(x, -DBL_MAX);
	double next_x = nextafter(x, DBL_MAX);

#define monotonic(fn) \
	( prev_x == -1 || (fn)(prev_x) <= (fn)(x) ) && \
	( next_x == DBL_MAX || (fn)(next_x) >= (fn)(x) ) ? \
	"Monotonic" : "Not monotonic"

	printf("x = %.20g\n", x);
	printf("Standard result: %.20g %s\n", y, monotonic(log1p));
	printf("Kahan log1p(): %.20g, err=%f %s\n", kahan_y, kahan_err, monotonic(kahan_log1p));
	printf("GSL log1p(): %.20g, err=%f %s\n", gsl_y, gsl_err, monotonic(gsl_log1p));
	printf("NR log1p(): %.20g, err=%f %s\n", nr_y, nr_err, monotonic(nr_log1p));

	if (kahan_err > max_kahan_err) max_kahan_err = kahan_err;
	if (gsl_err > max_gsl_err) max_gsl_err = gsl_err;
	if (nr_err > max_nr_err) max_nr_err = nr_err;

	total_kahan_err += kahan_err;
	total_gsl_err += gsl_err;
	total_nr_err += nr_err;
	num_tests++;
}

int main()
{
	test_log1p(nextafter(-1, 0));
	test_log1p(3.141e-16-1);
	test_log1p(3.141e-15-1);
	test_log1p(3.141e-14-1);
	test_log1p(3.141e-13-1);
	test_log1p(3.141e-12-1);
	test_log1p(3.141e-11-1);
	test_log1p(3.141e-10-1);
	test_log1p(3.141e-9-1);
	test_log1p(3.141e-8-1);
Re: pgsql: Add support for hyperbolic functions, as well as log10().
On Wed, 13 Mar 2019, 21:56 Tom Lane, wrote: > > Of these, probably the least bad is #3, even though it might require > a few rounds of experimentation to find the best extra_float_digits > setting to use. I'll go try it without any roundoff, just to see > what the raw results look like ... > Yeah, that seems like a reasonable thing to try. I'm amazed that jacana's asinh() returned -0 for an input of +0. I'm not aware of any implementation that does that. I'd be quite interested to know what it returned for an input like 1e-20. If that returned any variety of zero, I'd say that it's worse than useless. Another interesting test case would be whether or not it satisfies asinh(-x) = -asinh(x) for a variety of different values of x, because that's something that commonly breaks down badly with naive implementations. It's not unreasonable to expect these functions to be accurate to within the last 1 or 2 digits, so testing with extra_float_digits or whatever seems reasonable, but I think a wider variety of test inputs is required. I also wonder if we should be doing what we do for the regular trig functions and explicitly handle special cases like Inf and NaN to ensure POSIX compatibility on all platforms. Regards, Dean >
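To make the suggested spot checks concrete, a throwaway standalone test
program along these lines (just an illustration, not proposed regression
test content) would reveal both the sign-of-zero issue and any
asinh(-x) != -asinh(x) asymmetry:

/*
 * Quick sanity checks for asinh(): sign of zero, behaviour for tiny inputs
 * (asinh(x) ~ x as x -> 0), and the symmetry asinh(-x) = -asinh(x).
 */
#include <math.h>
#include <stdio.h>

int
main(void)
{
	double		xs[] = {1e-20, 1e-10, 0.5, 1.0, 10.0, 1e10, 1e20};
	int			i;

	printf("asinh(+0) = %g, asinh(-0) = %g\n", asinh(0.0), asinh(-0.0));
	printf("asinh(1e-20) = %.20g (expect ~1e-20)\n", asinh(1e-20));

	for (i = 0; i < (int) (sizeof(xs) / sizeof(xs[0])); i++)
	{
		double		x = xs[i];

		if (asinh(-x) != -asinh(x))
			printf("asymmetry at x = %g: asinh(-x) = %.17g, -asinh(x) = %.17g\n",
				   x, asinh(-x), -asinh(x));
	}

	printf("done\n");
	return 0;
}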
Re: pgsql: Add support for hyperbolic functions, as well as log10().
On Thu, 14 Mar 2019 at 04:41, Tom Lane wrote: > > Dean Rasheed writes: > > I'm amazed that jacana's asinh() returned -0 for an input of +0. > > Even more amusingly, it returns NaN for acosh('infinity'), cf > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2019-03-14%2003%3A00%3A34 > > Presumably that means they calculated "infinity - infinity" at some > point, but why? > Given the -0 result, I don't find that particularly surprising. I suspect lots of formulae would end up doing that without proper special-case handling upfront. It looks like that's the only platform that isn't POSIX compliant though, so maybe it's not worth worrying about. Regards, Dean
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Fri, 15 Mar 2019 at 00:06, Tomas Vondra wrote: > I've noticed an annoying thing when modifying type of column not > included in a statistics... > > That is, we don't remove the statistics, but the estimate still changes. > But that's because the ALTER TABLE also resets reltuples/relpages: > > That's a bit unfortunate, and it kinda makes the whole effort to not > drop the statistics unnecessarily kinda pointless :-( > Well not entirely. Repeating that test with 100,000 rows, I get an initial estimate of 9850 (actual 10,000), which then drops to 2451 after altering the column. But if you drop the dependency statistics, the estimate drops to 241, so clearly there is some benefit in keeping them in that case. Besides, I thought there was no extra effort in keeping the extended statistics in this case -- isn't it just using the column dependencies, so in this case UpdateStatisticsForTypeChange() never gets called anyway? Regards, Dean
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Would it run out of memory before getting to that test? Even if not, it
would likely take an excessive amount of time.

I think this part of the patch needs a bit of a rethink. My first
thought is to do something similar to what happens for per-column MCVs,
and set an upper limit on the size of each value that is ever considered
for inclusion in the stats (c.f. WIDTH_THRESHOLD and toowide_cnt in
analyze.c). Over-wide values should be excluded early on, and it will
need to track whether or not any such values were excluded, because then
it wouldn't be appropriate to treat the stats as complete and keep the
entire list, without calling get_mincount_for_mcv_list().

That's it for now.

Regards,
Dean
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Sat, 16 Mar 2019 at 23:44, Tomas Vondra wrote: > > > 21). For consistency with other bms_ functions, I think the name of > > the Bitmapset argument for bms_member_index() should just be called > > "a". Nitpicking, I'd also put bms_member_index() immediately after > > bms_is_member() in the source, to match the header. > > I think I've already done the renames in the last patch I submitted (are > you looking at an older version of the code, perhaps?). I've moved it > right after bms_is_member - good idea. > Ah OK, I was on the 20190315 patch yesterday. I've just updated to the 20190317 patch. It looks like you forgot to update the argument name in the header file though. Regards, Dean
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Sat, 16 Mar 2019 at 23:44, Tomas Vondra wrote:
>
> > 28). I just spotted the 1MB limit on the serialised MCV list size. I
> > think this is going to be too limiting. For example, if the stats
> > target is at its maximum of 10000, that only leaves around 100 bytes
> > for each item's values, which is easily exceeded. In fact, I think
> > this approach for limiting the MCV list size isn't a good one --
> > consider what would happen if there were lots of very large values.
> > Would it run out of memory before getting to that test? Even if not,
> > it would likely take an excessive amount of time.
>
> True. I don't have a very good argument for a specific value, or even
> having an explicit limit at all. I've initially added it mostly as a
> safety for development purposes, but I think you're right we can just
> get rid of it. I don't think it'd run out of memory before hitting the
> limit, but I haven't tried very hard (but I recall running into the 1MB
> limit in the past).
>

I've just been playing around a little with this and found that it
isn't safely dealing with toasted values. For example, consider the
following test:

create or replace function random_string(x int) returns text
as $$
  select substr(string_agg(md5(random()::text), ''), 1, x)
    from generate_series(1,(x+31)/32);
$$ language sql;

drop table if exists t;
create table t(a int, b text);
insert into t values (1, random_string(10000000));
create statistics s (mcv) on a,b from t;
analyse t;

select length(b), left(b,5), right(b,5) from t;

select length(stxmcv), length((m.values::text[])[2]),
       left((m.values::text[])[2], 5), right((m.values::text[])[2],5)
  from pg_statistic_ext, pg_mcv_list_items(stxmcv) m
 where stxrelid = 't'::regclass;

The final query returns the following:

 length |  length  | left  | right
--------+----------+-------+-------
    250 | 10000000 | c2667 | 71492
(1 row)

suggesting that there's something odd about the stxmcv value. Note,
also, that it doesn't hit the 1MB limit, even though the value is much
bigger than that.

If I then delete the value from the table, without changing the stats,
and repeat the final query, it falls over:

delete from t where a=1;

select length(stxmcv), length((m.values::text[])[2]),
       left((m.values::text[])[2], 5), right((m.values::text[])[2],5)
  from pg_statistic_ext, pg_mcv_list_items(stxmcv) m
 where stxrelid = 't'::regclass;

ERROR:  unexpected chunk number 5008 (expected 0) for toast value 16486 in pg_toast_16480

So I suspect it was using the toast data from the table t, although
I've not tried to investigate further.

> > I think this part of the patch needs a bit of a rethink. My first
> > thought is to do something similar to what happens for per-column
> > MCVs, and set an upper limit on the size of each value that is ever
> > considered for inclusion in the stats (c.f. WIDTH_THRESHOLD and
> > toowide_cnt in analyze.c). Over-wide values should be excluded early
> > on, and it will need to track whether or not any such values were
> > excluded, because then it wouldn't be appropriate to treat the stats
> > as complete and keep the entire list, without calling
> > get_mincount_for_mcv_list().
>
> Which part? Serialization / deserialization? Or how we handle long
> values when building the MCV list?
>

I was thinking (roughly) of something like the following:

* When building the values array for the MCV list, strip out rows with
values wider than some threshold (probably something like the
WIDTH_THRESHOLD = 1024 from analyze.c would be reasonable).

* When building the MCV list, if some over-wide values were previously
stripped out, always go into the get_mincount_for_mcv_list() block,
even if nitems == ngroups (for the same reason a similar thing happens
for per-column stats -- if some items were stripped out, we're already
saying that not all items will go in the MCV list, and it's not safe
to assume that the remaining items are common enough to give accurate
estimates).

* In the serialisation code, remove the size limit entirely. We know
that each value is now at most 1024 bytes, and there are at most 10000
items, and at most 8 columns, so the total size is already reasonably
well bounded. In the worst case, it might be around 80MB, but in
practice, it's always likely to be much much smaller than that.

Regards,
Dean
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Sat, 16 Mar 2019 at 23:44, Tomas Vondra wrote:
>
> > 16). This regression test fails for me:
> >
> > @@ -654,11 +654,11 @@
> >  -- check change of unrelated column type does not reset the MCV statistics
> >  ALTER TABLE mcv_lists ALTER COLUMN d TYPE VARCHAR(64);
> >  SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 AND b = ''1''');
> >   estimated | actual
> >  -----------+--------
> > -        50 |     50
> > +        11 |     50
> >  (1 row)
> >
> > Maybe that's platform-dependent, given what you said about
> > reltuples/relpages being reset. An easy workaround for this would be
> > to modify this test (and perhaps the one that follows) to just query
> > pg_statistic_ext to see if the MCV statistics have been reset.
>
> Ah, sorry for not explaining this bit - the failure is expected, due to
> the reset of relpages/reltuples I mentioned. We do keep the extended
> stats, but the relsize estimate changes a bit. It surprised me a bit,
> and this test made the behavior apparent. The last patchset included a
> piece that changes that - if we decide not to change this, I think we
> can simply accept the actual output.
>

I don't think changing the way reltuples is reset ought to be within
the scope of this patch. There might be good reasons for it being the
way it is. Perhaps open a discussion on a separate thread?

As far as this test goes, how about just doing this:

-- check change of unrelated column type does not reset the MCV statistics
ALTER TABLE mcv_lists ALTER COLUMN d TYPE VARCHAR(64);

SELECT stxmcv IS NOT NULL AS has_mcv
  FROM pg_statistic_ext WHERE stxrelid = 'mcv_lists'::regclass;

-- check change of column type resets the MCV statistics
ALTER TABLE mcv_lists ALTER COLUMN c TYPE numeric;

SELECT stxmcv IS NOT NULL AS has_mcv
  FROM pg_statistic_ext WHERE stxrelid = 'mcv_lists'::regclass;

Regards,
Dean
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Sun, 24 Mar 2019 at 00:17, David Rowley wrote: > > On Sun, 24 Mar 2019 at 12:41, Tomas Vondra > wrote: > > > > On 3/21/19 4:05 PM, David Rowley wrote: > > > > 29. Looking at the tests I see you're testing that you get bad > > > estimates without extended stats. That does not really seem like > > > something that should be done in tests that are meant for extended > > > statistics. > > > > > > > True, it might be a bit unnecessary. Initially the tests were meant to > > show old/new estimates for development purposes, but it might not be > > appropriate for regression tests. I don't think it's a big issue, it's > > not like it'd slow down the tests significantly. Opinions? > > My thoughts were that if someone did something to improve non-MV > stats, then is it right for these tests to fail? What should the > developer do in the case? update the expected result? remove the test? > It's not so clear. > I think the tests are fine as they are. Don't think of these as "good" and "bad" estimates. They should both be "good" estimates, but under different assumptions -- one assuming no correlation between columns, and one taking into account the relationship between the columns. If someone does do something to "improve" the non-MV stats, then the former tests ought to tell us whether it really was an improvement. If so, then the test result can be updated and perhaps whatever was done ought to be factored into the MV-stats' calculation of base frequencies. If not, the test is providing valuable feedback that perhaps it wasn't such a good improvement after all. Regards, Dean
Re: BUG #15708: RLS 'using' running as wrong user when called from a view
On Thu, 21 Mar 2019 at 00:39, PG Bug reporting form wrote: > > This fails, seemingly because the RLS on 'bar' is being checked by alice, > instead of the view owner bob: > Yes I agree, that appears to be a bug. The subquery in the RLS policy should be checked as the view owner -- i.e., we need to propagate the checkAsUser for the RTE with RLS to any subqueries in its RLS policies. It looks like the best place to fix it is in get_policies_for_relation(), since that's where all the policies to be applied for a given RTE are pulled together. Patch attached. Regards, Dean rls-perm-check-fix.patch Description: Binary data
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Mon, 25 Mar 2019 at 23:36, Tomas Vondra wrote: > > Attached is an updated patch, fixing all the issues pointed out so far. > Unless there are some objections, I plan to commit the 0001 part by the > end of this CF. Part 0002 is a matter for PG13, as previously agreed. > Yes, I think that's reasonable. It looks to be in pretty good shape. I have reviewed most of the actual code, but note that I haven't reviewed the docs changes and I didn't spend much time reading code comments. It might benefit from a quick once-over comment tidy up. I just looked through the latest set of changes and I have a couple of additional review comments: In the comment about WIDTH_THRESHOLD, s/pg_statistic/pg_statistic_ext/. In statext_mcv_build(), I'm not convinced by the logic around keeping the whole MCV list if it fits. Suppose there were a small number of very common values, and then a bunch of uniformly distributed less common values. The sample might consist of all the common values, plus one or two instances of some of the uncommon ones, leading to a list that would fit, but it would not be appropriate to keep the uncommon values on the basis of having seen them only one or two times. The fact that the list of items seen fits doesn't by itself mean that they're all common enough to justify being kept. In the per-column stats case, there are a bunch of other checks that have to pass, which are intended to test not just that the list fits, but that it believes that those are all the items in the table. For MV stats, you don't have that, and so I think it would be best to just remove that test (the "if (ngroups > nitems)" test) and *always* call get_mincount_for_mcv_list() to determine how many MCV items to keep. Otherwise there is a risk of keeping too many MCV items, with the ones at the tail end of the list producing poor estimates. Regards, Dean
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Tue, 26 Mar 2019 at 11:59, Dean Rasheed wrote:
>
> On Mon, 25 Mar 2019 at 23:36, Tomas Vondra wrote:
> >
> > Attached is an updated patch...
>
> I just looked through the latest set of changes and I have a couple of
> additional review comments:
>

I just spotted another issue while reading the code: It's possible to
build an MCV list with more than STATS_MCVLIST_MAX_ITEMS = 8192 items,
which then causes an error when the code tries to read it back in:

create temp table foo(a int, b int);
insert into foo select x,x from generate_series(1,10000) g(x);
insert into foo select x,x from generate_series(1,10000) g(x);
alter table foo alter column a set statistics 10000;
alter table foo alter column b set statistics 10000;
create statistics s (mcv) on a,b from foo;
analyse foo;
select * from foo where a=1 and b=1;

ERROR:  invalid length (10000) item array in MCVList

So this needs to be checked when building the MCV list. In fact, the
stats targets for table columns can be as large as 10000 (a hard-coded
constant in tablecmds.c, which is pretty ugly, but that's a different
matter), so I think STATS_MCVLIST_MAX_ITEMS probably ought to match
that. There are also a couple of comments that refer to the 8k limit,
which would need updating, if you change it.

Regards,
Dean
Re: BUG #15708: RLS 'using' running as wrong user when called from a view
On Mon, 25 Mar 2019 at 20:27, Stephen Frost wrote:
>
> * Dean Rasheed (dean.a.rash...@gmail.com) wrote:
> > It looks like the best place to fix it is in
> > get_policies_for_relation(), since that's where all the policies to be
> > applied for a given RTE are pulled together. Patch attached.
>
> Yes, on a quick review, that looks like a good solution to me as well.
>

On second thoughts, it actually needs to be in
get_row_security_policies(), after making copies of the quals from the
policies, otherwise it would be scribbling on the copies from the
relcache. Actually that makes the code change a bit simpler too.

Regards,
Dean

diff --git a/src/backend/rewrite/rowsecurity.c b/src/backend/rewrite/rowsecurity.c
new file mode 100644
index 835540c..e9f532c
--- a/src/backend/rewrite/rowsecurity.c
+++ b/src/backend/rewrite/rowsecurity.c
@@ -47,6 +47,7 @@
 #include "nodes/pg_list.h"
 #include "nodes/plannodes.h"
 #include "parser/parsetree.h"
+#include "rewrite/rewriteDefine.h"
 #include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
 #include "rewrite/rowsecurity.h"
@@ -382,6 +383,13 @@ get_row_security_policies(Query *root, R
 	table_close(rel, NoLock);
 
 	/*
+	 * Copy checkAsUser to the row security quals and WithCheckOption checks,
+	 * in case they contain any subqueries referring to other relations.
+	 */
+	setRuleCheckAsUser((Node *) *securityQuals, rte->checkAsUser);
+	setRuleCheckAsUser((Node *) *withCheckOptions, rte->checkAsUser);
+
+	/*
 	 * Mark this query as having row security, so plancache can invalidate it
 	 * when necessary (eg: role changes)
 	 */
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
new file mode 100644
index 4b53401..d764991
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -3910,6 +3910,33 @@ DROP OWNED BY regress_rls_dob_role1;
 DROP POLICY p1 ON dob_t2; -- should succeed
 DROP USER regress_rls_dob_role1;
 DROP USER regress_rls_dob_role2;
+-- Bug #15708: view + table with RLS should check policies as view owner
+CREATE TABLE ref_tbl (a int);
+INSERT INTO ref_tbl VALUES (1);
+CREATE TABLE rls_tbl (a int);
+INSERT INTO rls_tbl VALUES (10);
+ALTER TABLE rls_tbl ENABLE ROW LEVEL SECURITY;
+CREATE POLICY p1 ON rls_tbl USING (EXISTS (SELECT 1 FROM ref_tbl));
+GRANT SELECT ON ref_tbl TO regress_rls_bob;
+GRANT SELECT ON rls_tbl TO regress_rls_bob;
+CREATE VIEW rls_view AS SELECT * FROM rls_tbl;
+ALTER VIEW rls_view OWNER TO regress_rls_bob;
+GRANT SELECT ON rls_view TO regress_rls_alice;
+SET SESSION AUTHORIZATION regress_rls_alice;
+SELECT * FROM ref_tbl; -- Permission denied
+ERROR:  permission denied for table ref_tbl
+SELECT * FROM rls_tbl; -- Permission denied
+ERROR:  permission denied for table rls_tbl
+SELECT * FROM rls_view; -- OK
+ a
+----
+ 10
+(1 row)
+
+RESET SESSION AUTHORIZATION;
+DROP VIEW rls_view;
+DROP TABLE rls_tbl;
+DROP TABLE ref_tbl;
 --
 -- Clean up objects
 --
diff --git a/src/test/regress/sql/rowsecurity.sql b/src/test/regress/sql/rowsecurity.sql
new file mode 100644
index ea83153..df8fe11
--- a/src/test/regress/sql/rowsecurity.sql
+++ b/src/test/regress/sql/rowsecurity.sql
@@ -1764,6 +1764,32 @@ DROP POLICY p1 ON dob_t2; -- should succ
 DROP USER regress_rls_dob_role1;
 DROP USER regress_rls_dob_role2;
 
+-- Bug #15708: view + table with RLS should check policies as view owner
+CREATE TABLE ref_tbl (a int);
+INSERT INTO ref_tbl VALUES (1);
+
+CREATE TABLE rls_tbl (a int);
+INSERT INTO rls_tbl VALUES (10);
+ALTER TABLE rls_tbl ENABLE ROW LEVEL SECURITY;
+CREATE POLICY p1 ON rls_tbl USING (EXISTS (SELECT 1 FROM ref_tbl));
+
+GRANT SELECT ON ref_tbl TO regress_rls_bob;
+GRANT SELECT ON rls_tbl TO regress_rls_bob;
+
+CREATE VIEW rls_view AS SELECT * FROM rls_tbl;
+ALTER VIEW rls_view OWNER TO regress_rls_bob;
+GRANT SELECT ON rls_view TO regress_rls_alice;
+
+SET SESSION AUTHORIZATION regress_rls_alice;
+SELECT * FROM ref_tbl; -- Permission denied
+SELECT * FROM rls_tbl; -- Permission denied
+SELECT * FROM rls_view; -- OK
+RESET SESSION AUTHORIZATION;
+
+DROP VIEW rls_view;
+DROP TABLE rls_tbl;
+DROP TABLE ref_tbl;
+
 --
 -- Clean up objects
 --
Re: ANALYZE: ERROR: tuple already updated by self
On Sun, 28 Jul 2019 at 11:15, Tomas Vondra wrote: > > Attached is a patch fixing the error by not building extended stats for > the inh=true case (as already proposed in this thread). That's about the > simplest way to resolve this issue for v12. It should add a simple > regression test too, I guess. > Seems like a reasonable thing to do for 10, 11 and possibly also 12 (actually, as you noted, I think it's the only thing that can be done for 10 and 11). > To fix this properly we need to add a flag similar to stainherit > somewhere. And I've started working on that, thinking that maybe we > could do that even for v12 - it'd be yet another catversion bump, but > there's already been one since beta2 so maybe it would be OK. > Yeah, I think that makes sense, if it's not too hard. Since v12 is where the stats definition is split out from the stats data, this might work out quite neatly, since the inh flag would apply only to the stats data. > But it's actually a bit more complicated than just adding a column to > the catalog, for two reasons: > > 1) The optimizer part has to be tweaked to pick the right object, with > the flag set to either true/false. Not trivial, but doable. > Isn't it just a matter of passing the inh flag to get_relation_statistics() from get_relation_info(), so then the optimiser would get the right kind of stats data, depending on whether or not inheritance was requested in the query. Regards, Dean
Re: Multivariate MCV list vs. statistics target
On Thu, 1 Aug 2019 at 11:30, Tomas Vondra wrote: > > I'll move it to the next CF. Aside from the issues pointed by Kyotaro-san > in his review, I still haven't made my mind about whether to base the use > statistics targets set for the attributes. That's what we're doing now, > but I'm not sure it's a good idea after adding separate statistics target. > I wonder what Dean's opinion on this is, as he added the current logic. > If this were being released in the same version as MCV stats first appeared, I'd say that there's not much point basing the default multivariate stats target on the per-column targets, when it has its own knob to control it. However, since this won't be released for a year, those per-column-based defaults will be in the field for that long, and so I'd say that we shouldn't change the default when adding this, otherwise users who don't use this new feature might be surprised by the change in behaviour. Regards, Dean
Re: MCV lists for highly skewed distributions
On 21 January 2018 at 07:26, John Naylor wrote: > I spent a few hours hacking on this, and it turns out calculating the > right number of MCVs taking into account both uniform and highly > non-uniform distributions is too delicate a problem for me to solve > right now. The logic suggested by Dean Rasheed in [1] always produces > no MCVs for a perfectly uniform distribution (which is good), but very > often also for other distributions, which is not good. My efforts to > tweak that didn't work, so I didn't get as far as adapting it for the > problem Jeff is trying to solve. Hmm, Tom suggested that the test based on the average frequency over all values might be too strict because the estimated number of distinct values is often too low, so that might explain what you're seeing. It occurs to me that maybe a better test to exclude a value from the MCV list would be to demand that its relative standard error not be too high. Such a test, in addition to the existing tests, might be sufficient to solve the opposite problem of too many values in the MCV list, because the real problem there is including a value after having seen relatively few occurrences of it in the sample, and thus having a wildly inaccurate estimate for it. Setting a bound on the relative standard error would mean that we could have a reasonable degree of confidence in estimates produced from the sample. Regards, Dean
Re: stricter MCV tests for uniform distributions (was Re: MCV lists for highly skewed distributions)
On 22 January 2018 at 08:07, John Naylor wrote:
> On 1/21/18, Dean Rasheed wrote:
>> It occurs to me that maybe a better test to exclude a value from the
>> MCV list would be to demand that its relative standard error not be
>> too high. Such a test, in addition to the existing tests, might be
>> sufficient to solve the opposite problem of too many values in the MCV
>> list, because the real problem there is including a value after having
>> seen relatively few occurrences of it in the sample, and thus having a
>> wildly inaccurate estimate for it. Setting a bound on the relative
>> standard error would mean that we could have a reasonable degree of
>> confidence in estimates produced from the sample.
>
> If you don't mind, what would the math look like for that?

Using the same syntax as before:

  N = Total rows in table (population size)
  n = Number of rows sampled
  x = Number of times a particular value appears in the sample
  p = x/n = Frequency of the value in the sample

So that the standard error of the proportion is

  SE = sqrt(p*(1-p)/n) * sqrt((N-n)/(N-1))

Then the relative standard error (which is usually expressed as a
percentage) is

  RSE = 100 * SE / p

So if we were to demand that the relative standard error was less
than, say, 10%, then the constraint would just be

  SE < 0.1 * p

Note:

* This formula is not valid if x is very small (see what I wrote about
being able to approximate the distribution of p with a normal
distribution). So we should also enforce the "rule of thumb" x >= 10.

* The frequency p that we're storing is the count divided by the
overall sample size. So the values for N and n above should not be the
counts that exclude NULLs. As far as this logic is concerned, NULLs
(and too-wide values) are just values not equal to the value under
consideration. Thus it appears, from just a quick glance at your
patch, that you're using the wrong values for N and n.

* The RSE is a monotonically decreasing function of p in the range
[0,1], so an upper bound on the RSE equates to a lower bound on the
number of occurrences of the value.

This last point gives me a different idea. Instead of applying this
test *in addition to* the existing tests, how about applying it
*instead of* the existing tests? That is, we keep all MCVs that appear
sufficiently often that we can be reasonably confident in the
estimates produced from their sample frequencies, regardless of
whether or not they are more common than average (the average being
fairly meaningless for a highly skewed distribution anyway).

This is still keeping the most common values, but now we'd be saying
that we keep any value that appears sufficiently often in the sample
that we can extrapolate its sample frequency to produce a reasonably
accurate estimate of the population frequency, and discard values for
which the estimate is likely to be inaccurate.

I have not tested this idea at all, but it seems like it might be
worth considering. It has the nice property that the test depends
entirely on how often the value appears, rather than on other
previously computed statistics, such as Ndistinct.
Doing a quick test in pure SQL, using the highly skewed distribution
Jeff Janes gave in the other thread, with the default sample size of
30,000:

with samples as (
  select floor(-log(random())/log(2))::int as who
    from generate_series(1,30000)
),
freqs as (
  select who, count(*) as x, count(*)/30000::float8 as p
    from samples
   group by who
),
stats as (
  select *,
         sqrt(p*(1-p)/30000) * sqrt((10000000-30000)::float8/(10000000-1)) as se
    from freqs
)
select *,
       (10000000*p)::int||'+/-'||(2*se*10000000)::int as "95% interval",
       case when x >= 10 and se < 0.1*p then 'KEEP' else 'DISCARD' end
  from stats
 order by p desc
 limit 100;

it pretty consistently keeps the 8 most common values:

 who |   x   |     p      |          se          |  95% interval   |  case
-----+-------+------------+----------------------+-----------------+---------
   0 | 15017 | 0.5005667  | 0.00288241625942075  | 5005667+/-57648 | KEEP
   1 |  7607 | 0.2535667  | 0.00250800597590887  | 2535667+/-50160 | KEEP
   2 |  3713 | 0.1237667  | 0.0018984483         | 1237667+/-37969 | KEEP
   3 |  1855 | 0.0618     | 0.0013884757600711   | 618333+/-27770  | KEEP
   4 |   914 | 0.03046667 | 0.000990788179299791 | 304667+/-19816  | KEEP
   5 |   448 | 0.0149     | 0.000699194759916533 | 149333+/-13984  | KEEP
   6 |   229 | 0.00763    | 0.000501741670388358 | 76333+/-10035   | KEEP
   7 |   108 | 0.0036     | 0.000345267009604061 | 36000+/-6905    | KEEP
   8 |    46 | 0.00153    | 0.000225565173744715 | 15333+/-4511    | DISCARD
   9 |    34 | 0.00113    | 0.000193963300230354 | 11333+/-3879    | DISCARD
  10 |    17 |
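The SE < 0.1 * p bound above can also be expressed as a minimum number of
occurrences in the sample: rearranging gives p > 1 / (1 + 0.01 * n * (N-1)/(N-n)).
The small standalone C program below (again just an illustration of the
formulas above, not patch code) shows that for this example the cutoff works
out at roughly 100 occurrences, which matches where KEEP switches to DISCARD
in the output:

#include <stdio.h>

/*
 * Minimum number of sample occurrences for which the relative standard
 * error of the sample proportion stays below max_rse, using
 *   SE = sqrt(p*(1-p)/n) * sqrt((N-n)/(N-1))  and  SE < max_rse * p,
 * which rearranges to p > 1 / (1 + K) with K = max_rse^2 * n * (N-1)/(N-n).
 */
static double
min_count_for_rse(double n, double N, double max_rse)
{
	double		K = max_rse * max_rse * n * (N - 1) / (N - n);

	return n / (1 + K);			/* equals n * p_min */
}

int
main(void)
{
	double		N = 10000000;	/* table size from the example above */
	double		n = 30000;		/* default sample size */
	double		mincount = min_count_for_rse(n, N, 0.1);

	printf("minimum sample count for RSE < 10%%: %.1f\n", mincount);

	/* cross-check against the rows in the SQL output above */
	printf("x = 108 -> %s\n", 108 >= mincount ? "KEEP" : "DISCARD");
	printf("x = 46  -> %s\n", 46 >= mincount ? "KEEP" : "DISCARD");
	return 0;
}

(The x >= 10 rule of thumb is comfortably subsumed by that cutoff at this
sample size.)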
Re: WINDOW RANGE patch versus leakproofness
On 30 January 2018 at 16:42, Tom Lane wrote: > So I'm thinking that (a) we do not need to check for leaky functions used > in window support, and (b) therefore there's no need to avoid leaky > behavior in in_range support functions. Objections? > Yes, I concur. Since window functions can only appear in the SELECT target list and ORDER BY clauses, they should never appear in a qual that gets considered for push down, and thus contain_leaked_vars() should never see a window function. Moreover, contain_leaked_vars() is intentionally coded defensively, so if it ever does somehow see a window function (or any other unexpected node type) it will return true and the resulting qual/restrictinfo will be marked leaky, and not pushed through security barriers. Regards, Dean
Re: WINDOW RANGE patch versus leakproofness
On 31 January 2018 at 21:51, Robert Haas wrote: > On Wed, Jan 31, 2018 at 5:52 AM, Dean Rasheed > wrote: >> On 30 January 2018 at 16:42, Tom Lane wrote: >>> So I'm thinking that (a) we do not need to check for leaky functions used >>> in window support, and (b) therefore there's no need to avoid leaky >>> behavior in in_range support functions. Objections? >> >> Yes, I concur. Since window functions can only appear in the SELECT >> target list and ORDER BY clauses, they should never appear in a qual >> that gets considered for push down, and thus contain_leaked_vars() >> should never see a window function. > > What about a query that uses window functions within a subquery? > A qual containing a subquery is treated as not push down safe, so it wouldn't even be considered for pushing down into a security barrier view. On a table with RLS, contain_leaked_vars() would see a subplan on the restrictinfo's clause and return true, causing the restrictinfo to be treated as leaky. So in each case, the qual with the subquery would be executed after any security quals, regardless of what it contained. The contents of the subquery aren't currently examined, it just defaults to leaky. If they were to be examined, the window function would be found and it would still default to being leaky. Regards, Dean