Re: Order of execution for permissive RLS policies
On 24 July 2018 at 15:25, Simon Brent wrote: > I've been using postgres for a while now, and have just started looking in > to row level security. I have found something that I think is a bit strange, > and wanted to know if anyone knows how/why it is the case. > > I have a table with multiple policies, each with a USING statement. When I > run EXPLAIN ANALYSE SELECT * FROM [table], I see that the policies are OR'd > together in reverse alphabetical name order. It doesn't matter which order I > create the policies in - the order they are checked is always (for example) > zz OR yy OR xx OR ww. Hmm, the fact that permissive policies sometimes appear to be checked in reverse alphabetical order looks like an implementation artefact -- that's the order in which RLS policies are read when loading a table's metadata (see RelationBuildRowSecurity() in src/backend/commands/policy.c). I don't believe that was intentional, but any case, PostgreSQL makes no guarantees about the order of evaluation of clauses under an OR clause, and the query optimiser is free to re-order them for efficiency, provided that doing so doesn't affect the query result. A trivial example where permissive policies won't be evaluated in reverse alphabetical order would be a policy that said USING (a=1 AND b=1), and another one that said USING (a=1 AND b=2). The query optimiser would merge those together to produce a=1 AND (b=1 OR b=2), and then consider any indexes on 'a' and/or 'b'. So there is, in general, no well-defined order of evaluation of the policies. The only case where order can affect the result of a query is multiple restrictive policies used to check new data inserted into a table using INSERT or UPDATE. In that case, the error message for new data violating one of the policies depends on the order in which they are checked, and it's useful to be able to predict what the error message will be. This is why restrictive policies are sorted by name. (This is a little like multiple CHECK constraints on a table, which are also checked in name order). Again, that name-ordering of restrictive policies only applies to the checks run on new data; the clauses added to the WHERE clause to check permission to access existing data may be rearranged by the query optimiser and evaluated in any order. Regards, Dean
Re: Will there ever be support for Row Level Security on Materialized Views?
On 28 August 2018 at 01:49, Alvaro Herrera wrote: > On 2018-Aug-27, Ken Tanzer wrote: >>- In the scheme of things, is it a lot of work or not so much? > > Probably not much. > Yeah, it doesn't seem like it would be particularly difficult, but it would probably still be a reasonable amount of work to go round finding all the places in code that would need updating. I.e., I think it would be more mechanical work, than anything fundamentally challenging. On the face of it, it seems like it might be a reasonable thing to support, but I wonder, is this really just syntactic sugar for creating a security barrier view on top of the materialized view? When RLS was originally implemented, that same question was asked for tables, but the answer was "no" because RLS on a table gives you fine-grained (row-level) control over what data in the table can be modified as well as read, which a SB view doesn't give you. But for a MV view, that's not a consideration, so what would RLS on a MV actually give you? Regards, Dean
Re: Postgres trigger side-effect is occurring out of order with row-level security select policy
The real issue here is to do with the visibility of the data inserted by the trigger function from within the same command. In general, data inserted by a command is not visible from within that same command. The easiest way to see what's going on is with a simple example. Consider the following (based on the original example, but without any RLS): DROP TABLE IF EXISTS a,b; CREATE TABLE a (id text); CREATE TABLE b (id text); CREATE OR REPLACE FUNCTION reproHandler() RETURNS TRIGGER AS $$ BEGIN RAISE NOTICE USING MESSAGE = 'inside trigger handler'; INSERT INTO b (id) VALUES (NEW.id); RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER reproTrigger BEFORE INSERT ON a FOR EACH ROW EXECUTE PROCEDURE reproHandler(); CREATE OR REPLACE FUNCTION check_b1(text) RETURNS boolean AS $$ BEGIN RETURN (EXISTS (SELECT * FROM b WHERE b.id = $1)); END $$ LANGUAGE plpgsql STABLE; CREATE OR REPLACE FUNCTION check_b2(text) RETURNS boolean AS $$ BEGIN RETURN (EXISTS (SELECT * FROM b WHERE b.id = $1)); END $$ LANGUAGE plpgsql VOLATILE; INSERT INTO a VALUES ('xxx') RETURNING id, check_b1(id), check_b2(id), (EXISTS (SELECT * FROM b WHERE b.id = a.id)); NOTICE: inside trigger handler id | check_b1 | check_b2 | exists -+--+--+ xxx | f| t| f (1 row) INSERT 0 1 Notice that the functions check_b1() and check_b2() are identical, except that check_b1() is declared STABLE and check_b2() is declared VOLATILE, and that makes all the difference. Quoting from the documentation for function volatility [1]: For functions written in SQL or in any of the standard procedural languages, there is a second important property determined by the volatility category, namely the visibility of any data changes that have been made by the SQL command that is calling the function. A VOLATILE function will see such changes, a STABLE or IMMUTABLE function will not. [1] https://www.postgresql.org/docs/10/static/xfunc-volatility.html Also notice that the inline EXISTS query behaves in the same way as the STABLE function -- i.e., it does not see changes made in the current query. So returning to the RLS example, because the RLS SELECT policy is defined using inline SQL, it cannot see the changes made by the trigger. If you want to see such changes, you need to define a VOLATILE function to do the RLS check. Regards, Dean
Re: Postgres trigger side-effect is occurring out of order with row-level security select policy
On Mon, 1 Oct 2018 at 21:45, Carl Sverre wrote: > Dean, > Thank you for the pointer towards visibility/volatility. I think that > completely explains the effect that I am seeing in my repro. I experimented > with using a VOLATILE function for the SELECT RLS using statement and while > it completely solves my issue, it incurs too high a cost for query execution > due to the RLS policy no longer being inlined into the scan. > > I have documented your answer and my experimentation on the stack overflow > answer: > https://stackoverflow.com/questions/52565720/postgres-trigger-side-effect-is-occurring-out-of-order-with-row-level-security-s > I had a quick look at that and found a bug in your implementation. The RLS check function is defined as follows: CREATE OR REPLACE FUNCTION rlsCheck(id text) RETURNS TABLE (id text) AS $$ select * from b where b.id = id $$ LANGUAGE sql VOLATILE; which is incorrect because of the ambiguous reference to "id". That final "id" will, by default, refer to the table column b.id, not the parameter "id". Thus that function will return every row of b, and your check won't be doing what you want. That's also going to hurt performance, but you didn't provide enough information to diagnose the actual performance problem that you are seeing. In any case, the above needs to be written as CREATE OR REPLACE FUNCTION rlsCheck(text) RETURNS TABLE (id text) AS $$ select id from b where b.id = $1 $$ LANGUAGE sql VOLATILE; to work as expected. Regards, Dean
Re: Exponentiation confusion
On Thu, 13 Oct 2022 at 18:17, Tom Lane wrote: > > I'm inclined to think that we should push the responsibility for choosing > its rscale into power_var_int(), because internally that already does > estimate the result weight, so with a little code re-ordering we won't > need duplicative estimates. Don't have time to work on that right now > though ... Dean, are you interested in fixing this? > OK, I'll take a look. The most obvious thing to do is to try to make power_var_int() choose the same result rscale as power_var() so that the results are consistent regardless of whether the exponent is an integer. It's worth noting, however, that that will cause in a *reduction* in the output rscale rather than an increase in some cases, since the power_var_int() code path currently always chooses an rscale of at least 16, whereas the other code path in power_var() uses the rscales of the 2 inputs, and produces a minimum of 16 significant digits, rather than 16 digits after the decimal point. For example: select power(5.678, 18.0001::numeric); power - 37628507689498.14987457 (1 row) select power(5.678, 18::numeric); power - 37628507036041.8454541428979479 (1 row) Regards, Dean
Re: regr_slope returning NULL
On Sun, 24 Mar 2019 at 08:01, Steve Baldwin wrote: > > Thanks Tom, > > I've tried this on 11.2 (OS X 10.14.3, installed locally) and 10.6 (AWS RDS) > instances with identical results. The values you show are identical to those > returned by Oracle so that's great but why am I seeing different results? > This is caused by the large magnitude of the ts values, which causes a cancellation error in the Sxx calculation, which is what commit e954a727f0 fixed in HEAD, and will be available in PG12 [1]. You can see that by including regr_sxx in the results. With PG11, this gives the following: select id, regr_slope(elapsed, ts) as trend, regr_sxx(elapsed, ts) as sxx from sb1 group by id; id |trend | sxx --+--+- c742 | | 0 317e | | 0 5fe6 | 5.78750952760444e-06 | 19905896448 3441 | | 0 (4 rows) Those zeros for Sxx are the result of calculating the sum of the squares of ts values and then subtracting off the square of the mean, which results in a complete loss of accuracy because the intermediate values are so large they don't differ according to double precision arithmetic. A workaround in PG11 is to just offset the ts values by something close to their mean (offsetting the ts values by a constant amount shouldn't affect the mathematical result, but does eliminate the cancellation errors): select id, regr_slope(elapsed, ts-1552892914) as trend, regr_sxx(elapsed, ts-1552892914) as sxx from sb1 group by id; id |trend |sxx --+--+ c742 | 19.6077357654714 | 0.0468182563781738 317e |-1.08385104429772 | 59.2381523980035 5fe6 | 5.78750948360697e-06 | 19905896596.7403 3441 |-3.82839508895523 | 20.1098628044128 (4 rows) For PG12 the algorithm for calculating these quantities has been changed by e954a727f0, so the result should be more accurate regardless of the offset: select id, regr_slope(elapsed, ts) as trend, regr_sxx(elapsed, ts) as sxx from sb1 group by id; id |trend |sxx --+--+ c742 | 19.6078587812905 | 0.0468179252929986 317e | -1.0838511987809 | 59.2381423694815 5fe6 | 5.78750948358674e-06 | 19905896596.7605 3441 |-3.82839546309736 | 20.1098619909822 (4 rows) select id, regr_slope(elapsed, ts-1552892914) as trend, regr_sxx(elapsed, ts-1552892914) as sxx from sb1 group by id; id |trend |sxx --+--+ c742 | 19.6078431374563 | 0.046817990382 317e |-1.08385109620679 | 59.2381495556381 5fe6 | 5.78750948360693e-06 | 19905896596.7403 3441 |-3.82839509931361 |20.109862749992 (4 rows) Regards, Dean [1] https://github.com/postgres/postgres/commit/e954a727f0
Re: security invoker review need full select (all columns) to do DML?
On Wed, 21 Aug 2024 at 10:08, jian he wrote: > > the following setup is extract from src/test/regress/sql/updatable_views.sql > you can search keywords: "-- ordinary view on top of security invoker > view permissions" > > CREATE TABLE base_tbl(a int, b text, c float); > INSERT INTO base_tbl VALUES (1, 'Row 1', 1.0); > > SET SESSION AUTHORIZATION regress_view_user1; > CREATE VIEW rw_view1 AS SELECT b AS bb, c AS cc, a AS aa FROM base_tbl; > ALTER VIEW rw_view1 SET (security_invoker = true); > > RESET SESSION AUTHORIZATION; > GRANT SELECT(a,b) ON base_tbl TO regress_view_user1; In updatable_views.sql that GRANT is actually GRANT SELECT ON base_tbl TO regress_view_user1; Without that, the view is effectively unusable by regress_view_user1 because it selects from column c of base_tbl, and regress_view_user1 lacks permissions on that column. This is consistent with simple subqueries: select a, b from (select a,b from base_tbl); -- ok a | b ---+--- 1 | Row 1 (1 row) select a, b from (select a,b,c from base_tbl); -- not allowed ERROR: permission denied for table base_tbl The user must have select permissions on all columns selected by the subquery/view, because we don't go through the outer query to check which columns are actually referred to. Now maybe we could, but I suspect that would be quite a lot of effort because you'd need to be sure that the column wasn't referred to anywhere in either the outer query or the subquery itself (e.g., in WHERE clauses, etc.). Regards, Dean
Re: Undocumented array_val[generate_series(...)] functionality?
On Mon, 12 Jul 2021 at 02:39, David G. Johnston wrote: > > One, the select generate_series(1,3) function call causes multiple rows to be > generated where there would usually be only one. Yes. > Two, composition results in an inside-to-outside execution order: the SRF is > evaluated first, the additional rows added, then the outer function (abs or > the subscript function respectively in these examples) is evaluated for > whatever rows are now present in the result. Yes. > Is the above something one can learn from our documentation? Yes, but only if you know where to look. > Is this syntax we are discouraging users from using and thus intentionally > not documenting it? On the contrary, I would say that this is the expected behaviour, and that it is documented, though not in the most obvious place: https://www.postgresql.org/docs/current/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET That's probably not the first place one would go looking for it, and might also (wrongly) imply that it only works for functions written in language SQL. BTW, this is something that started working in PG10 (in 9.6, an error is thrown), and I think that it's a result of this release note item, which matches your conclusions: Change the implementation of set-returning functions appearing in a query's SELECT list (Andres Freund) Set-returning functions are now evaluated before evaluation of scalar expressions in the SELECT list, much as though they had been placed in a LATERAL FROM-clause item. This allows saner semantics for cases where multiple set-returning functions are present. If they return different numbers of rows, the shorter results are extended to match the longest result by adding nulls. Previously the results were cycled until they all terminated at the same time, producing a number of rows equal to the least common multiple of the functions' periods. In addition, set-returning functions are now disallowed within CASE and COALESCE constructs. For more information see Section 37.4.8. Another interesting consequence of that is that it's possible to do a similar thing with the array slice syntax, and a pair of generate_series() calls, for example: SELECT (array[1,2,3,4]::int[])[generate_series(1,4) : generate_series(2,4)]; array --- {1,2} {2,3} {3,4} (4 rows) Note: there are 4 rows in that result, and the last one is NULL, which is also consistent with the documentation, and the fact that the array slice function returns NULL if either subscript is NULL. I'd agree that there's an opportunity to improve the docs here. Regards, Dean