Re: Order of execution for permissive RLS policies

2018-07-25 Thread Dean Rasheed
On 24 July 2018 at 15:25, Simon Brent  wrote:
> I've been using postgres for a while now, and have just started looking in
> to row level security. I have found something that I think is a bit strange,
> and wanted to know if anyone knows how/why it is the case.
>
> I have a table with multiple policies, each with a USING statement. When I
> run EXPLAIN ANALYSE SELECT * FROM [table], I see that the policies are OR'd
> together in reverse alphabetical name order. It doesn't matter which order I
> create the policies in - the order they are checked is always (for example)
> zz OR yy OR xx OR ww.

Hmm, the fact that permissive policies sometimes appear to be checked
in reverse alphabetical order looks like an implementation artefact --
that's the order in which RLS policies are read when loading a table's
metadata (see RelationBuildRowSecurity() in
src/backend/commands/policy.c). I don't believe that was intentional,
but any case, PostgreSQL makes no guarantees about the order of
evaluation of clauses under an OR clause, and the query optimiser is
free to re-order them for efficiency, provided that doing so doesn't
affect the query result.

A trivial example where permissive policies won't be evaluated in
reverse alphabetical order would be a policy that said USING (a=1 AND
b=1), and another one that said USING (a=1 AND b=2). The query
optimiser would merge those together to produce a=1 AND (b=1 OR b=2),
and then consider any indexes on 'a' and/or 'b'. So there is, in
general, no well-defined order of evaluation of the policies.

The only case where order can affect the result of a query is multiple
restrictive policies used to check new data inserted into a table
using INSERT or UPDATE. In that case, the error message for new data
violating one of the policies depends on the order in which they are
checked, and it's useful to be able to predict what the error message
will be. This is why restrictive policies are sorted by name. (This is
a little like multiple CHECK constraints on a table, which are also
checked in name order). Again, that name-ordering of restrictive
policies only applies to the checks run on new data; the clauses added
to the WHERE clause to check permission to access existing data may be
rearranged by the query optimiser and evaluated in any order.

Regards,
Dean



Re: Will there ever be support for Row Level Security on Materialized Views?

2018-08-28 Thread Dean Rasheed
On 28 August 2018 at 01:49, Alvaro Herrera  wrote:
> On 2018-Aug-27, Ken Tanzer wrote:
>>- In the scheme of things, is it a lot of work or not so much?
>
> Probably not much.
>

Yeah, it doesn't seem like it would be particularly difficult, but it
would probably still be a reasonable amount of work to go round
finding all the places in code that would need updating. I.e., I think
it would be more mechanical work, than anything fundamentally
challenging.

On the face of it, it seems like it might be a reasonable thing to
support, but I wonder, is this really just syntactic sugar for
creating a security barrier view on top of the materialized view?

When RLS was originally implemented, that same question was asked for
tables, but the answer was "no" because RLS on a table gives you
fine-grained (row-level) control over what data in the table can be
modified as well as read, which a SB view doesn't give you. But for a
MV view, that's not a consideration, so what would RLS on a MV
actually give you?

Regards,
Dean



Re: Postgres trigger side-effect is occurring out of order with row-level security select policy

2018-10-01 Thread Dean Rasheed
The real issue here is to do with the visibility of the data inserted
by the trigger function from within the same command. In general, data
inserted by a command is not visible from within that same command.

The easiest way to see what's going on is with a simple example.
Consider the following (based on the original example, but without any
RLS):


DROP TABLE IF EXISTS a,b;

CREATE TABLE a (id text);
CREATE TABLE b (id text);

CREATE OR REPLACE FUNCTION reproHandler() RETURNS TRIGGER AS $$
BEGIN
RAISE NOTICE USING MESSAGE = 'inside trigger handler';
INSERT INTO b (id) VALUES (NEW.id);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER reproTrigger BEFORE INSERT ON a
FOR EACH ROW EXECUTE PROCEDURE reproHandler();

CREATE OR REPLACE FUNCTION check_b1(text) RETURNS boolean AS $$
BEGIN
  RETURN (EXISTS (SELECT * FROM b WHERE b.id = $1));
END
$$ LANGUAGE plpgsql STABLE;

CREATE OR REPLACE FUNCTION check_b2(text) RETURNS boolean AS $$
BEGIN
  RETURN (EXISTS (SELECT * FROM b WHERE b.id = $1));
END
$$ LANGUAGE plpgsql VOLATILE;

INSERT INTO a VALUES ('xxx')
  RETURNING id, check_b1(id), check_b2(id),
(EXISTS (SELECT * FROM b WHERE b.id = a.id));

NOTICE:  inside trigger handler
 id  | check_b1 | check_b2 | exists
-+--+--+
 xxx | f| t| f
(1 row)

INSERT 0 1


Notice that the functions check_b1() and check_b2() are identical,
except that check_b1() is declared STABLE and check_b2() is declared
VOLATILE, and that makes all the difference. Quoting from the
documentation for function volatility [1]:

For functions written in SQL or in any of the standard procedural
languages, there is a second important property determined by the
volatility category, namely the visibility of any data changes that
have been made by the SQL command that is calling the function. A
VOLATILE function will see such changes, a STABLE or IMMUTABLE
function will not.

[1] https://www.postgresql.org/docs/10/static/xfunc-volatility.html

Also notice that the inline EXISTS query behaves in the same way as
the STABLE function -- i.e., it does not see changes made in the
current query.

So returning to the RLS example, because the RLS SELECT policy is
defined using inline SQL, it cannot see the changes made by the
trigger. If you want to see such changes, you need to define a
VOLATILE function to do the RLS check.

Regards,
Dean



Re: Postgres trigger side-effect is occurring out of order with row-level security select policy

2018-10-02 Thread Dean Rasheed
On Mon, 1 Oct 2018 at 21:45, Carl Sverre  wrote:
> Dean,
> Thank you for the pointer towards visibility/volatility.  I think that 
> completely explains the effect that I am seeing in my repro.  I experimented 
> with using a VOLATILE function for the SELECT RLS using statement and while 
> it completely solves my issue, it incurs too high a cost for query execution 
> due to the RLS policy no longer being inlined into the scan.
>
> I have documented your answer and my experimentation on the stack overflow 
> answer:
> https://stackoverflow.com/questions/52565720/postgres-trigger-side-effect-is-occurring-out-of-order-with-row-level-security-s
>

I had a quick look at that and found a bug in your implementation. The
RLS check function is defined as follows:

CREATE OR REPLACE FUNCTION rlsCheck(id text) RETURNS TABLE (id text) AS $$
select * from b where b.id = id
$$ LANGUAGE sql VOLATILE;

which is incorrect because of the ambiguous reference to "id". That
final "id" will, by default, refer to the table column b.id, not the
parameter "id". Thus that function will return every row of b, and
your check won't be doing what you want. That's also going to hurt
performance, but you didn't provide enough information to diagnose the
actual performance problem that you are seeing.

In any case, the above needs to be written as

CREATE OR REPLACE FUNCTION rlsCheck(text) RETURNS TABLE (id text) AS $$
select id from b where b.id = $1
$$ LANGUAGE sql VOLATILE;

to work as expected.

Regards,
Dean



Re: Exponentiation confusion

2022-10-13 Thread Dean Rasheed
On Thu, 13 Oct 2022 at 18:17, Tom Lane  wrote:
>
> I'm inclined to think that we should push the responsibility for choosing
> its rscale into power_var_int(), because internally that already does
> estimate the result weight, so with a little code re-ordering we won't
> need duplicative estimates.  Don't have time to work on that right now
> though ... Dean, are you interested in fixing this?
>

OK, I'll take a look.

The most obvious thing to do is to try to make power_var_int() choose
the same result rscale as power_var() so that the results are
consistent regardless of whether the exponent is an integer.

It's worth noting, however, that that will cause in a *reduction* in
the output rscale rather than an increase in some cases, since the
power_var_int() code path currently always chooses an rscale of at
least 16, whereas the other code path in power_var() uses the rscales
of the 2 inputs, and produces a minimum of 16 significant digits,
rather than 16 digits after the decimal point. For example:

select power(5.678, 18.0001::numeric);
  power
-
 37628507689498.14987457
(1 row)

select power(5.678, 18::numeric);
  power
-
 37628507036041.8454541428979479
(1 row)

Regards,
Dean




Re: regr_slope returning NULL

2019-03-24 Thread Dean Rasheed
On Sun, 24 Mar 2019 at 08:01, Steve Baldwin  wrote:
>
> Thanks Tom,
>
> I've tried this on 11.2 (OS X 10.14.3, installed locally) and 10.6 (AWS RDS) 
> instances with identical results.  The values you show are identical to those 
> returned by Oracle so that's great but why am I seeing different results?
>

This is caused by the large magnitude of the ts values, which causes a
cancellation error in the Sxx calculation, which is what commit
e954a727f0 fixed in HEAD, and will be available in PG12 [1].

You can see that by including regr_sxx in the results. With PG11, this
gives the following:

select id, regr_slope(elapsed, ts) as trend, regr_sxx(elapsed, ts) as sxx
  from sb1 group by id;

  id  |trend | sxx
--+--+-
 c742 |  |   0
 317e |  |   0
 5fe6 | 5.78750952760444e-06 | 19905896448
 3441 |  |   0
(4 rows)

Those zeros for Sxx are the result of calculating the sum of the
squares of ts values and then subtracting off the square of the mean,
which results in a complete loss of accuracy because the intermediate
values are so large they don't differ according to double precision
arithmetic.

A workaround in PG11 is to just offset the ts values by something
close to their mean (offsetting the ts values by a constant amount
shouldn't affect the mathematical result, but does eliminate the
cancellation errors):

select id, regr_slope(elapsed, ts-1552892914) as trend,
   regr_sxx(elapsed, ts-1552892914) as sxx
  from sb1 group by id;

  id  |trend |sxx
--+--+
 c742 | 19.6077357654714 | 0.0468182563781738
 317e |-1.08385104429772 |   59.2381523980035
 5fe6 | 5.78750948360697e-06 |   19905896596.7403
 3441 |-3.82839508895523 |   20.1098628044128
(4 rows)


For PG12 the algorithm for calculating these quantities has been
changed by e954a727f0, so the result should be more accurate
regardless of the offset:

select id, regr_slope(elapsed, ts) as trend, regr_sxx(elapsed, ts) as sxx
  from sb1 group by id;

  id  |trend |sxx
--+--+
 c742 | 19.6078587812905 | 0.0468179252929986
 317e | -1.0838511987809 |   59.2381423694815
 5fe6 | 5.78750948358674e-06 |   19905896596.7605
 3441 |-3.82839546309736 |   20.1098619909822
(4 rows)

select id, regr_slope(elapsed, ts-1552892914) as trend,
   regr_sxx(elapsed, ts-1552892914) as sxx
  from sb1 group by id;

  id  |trend |sxx
--+--+
 c742 | 19.6078431374563 | 0.046817990382
 317e |-1.08385109620679 |   59.2381495556381
 5fe6 | 5.78750948360693e-06 |   19905896596.7403
 3441 |-3.82839509931361 |20.109862749992
(4 rows)

Regards,
Dean

[1] https://github.com/postgres/postgres/commit/e954a727f0



Re: security invoker review need full select (all columns) to do DML?

2024-08-21 Thread Dean Rasheed
On Wed, 21 Aug 2024 at 10:08, jian he  wrote:
>
> the following setup is extract from src/test/regress/sql/updatable_views.sql
> you can search keywords: "-- ordinary view on top of security invoker
> view permissions"
>
> CREATE TABLE base_tbl(a int, b text, c float);
> INSERT INTO base_tbl VALUES (1, 'Row 1', 1.0);
>
> SET SESSION AUTHORIZATION regress_view_user1;
> CREATE VIEW rw_view1 AS SELECT b AS bb, c AS cc, a AS aa FROM base_tbl;
> ALTER VIEW rw_view1 SET (security_invoker = true);
>
> RESET SESSION AUTHORIZATION;
> GRANT SELECT(a,b) ON base_tbl TO regress_view_user1;

In updatable_views.sql that GRANT is actually

GRANT SELECT ON base_tbl TO regress_view_user1;

Without that, the view is effectively unusable by regress_view_user1
because it selects from column c of base_tbl, and regress_view_user1
lacks permissions on that column.

This is consistent with simple subqueries:

select a, b from (select a,b from base_tbl); -- ok
 a |   b
---+---
 1 | Row 1
(1 row)

select a, b from (select a,b,c from base_tbl); -- not allowed
ERROR:  permission denied for table base_tbl

The user must have select permissions on all columns selected by the
subquery/view, because we don't go through the outer query to check
which columns are actually referred to. Now maybe we could, but I
suspect that would be quite a lot of effort because you'd need to be
sure that the column wasn't referred to anywhere in either the outer
query or the subquery itself (e.g., in WHERE clauses, etc.).

Regards,
Dean




Re: Undocumented array_val[generate_series(...)] functionality?

2021-07-12 Thread Dean Rasheed
On Mon, 12 Jul 2021 at 02:39, David G. Johnston
 wrote:
>
> One, the select generate_series(1,3) function call causes multiple rows to be 
> generated where there would usually be only one.

Yes.

> Two, composition results in an inside-to-outside execution order: the SRF is 
> evaluated first, the additional rows added, then the outer function (abs or 
> the subscript function respectively in these examples) is evaluated for 
> whatever rows are now present in the result.

Yes.

> Is the above something one can learn from our documentation?

Yes, but only if you know where to look.

> Is this syntax we are discouraging users from using and thus intentionally 
> not documenting it?

On the contrary, I would say that this is the expected behaviour, and
that it is documented, though not in the most obvious place:

https://www.postgresql.org/docs/current/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET

That's probably not the first place one would go looking for it, and
might also (wrongly) imply that it only works for functions written in
language SQL.

BTW, this is something that started working in PG10 (in 9.6, an error
is thrown), and I think that it's a result of this release note item,
which matches your conclusions:

  Change the implementation of set-returning functions appearing in
  a query's SELECT list (Andres Freund)

  Set-returning functions are now evaluated before evaluation of
  scalar expressions in the SELECT list, much as though they had been
  placed in a LATERAL FROM-clause item. This allows saner semantics for
  cases where multiple set-returning functions are present. If they
  return different numbers of rows, the shorter results are extended to
  match the longest result by adding nulls. Previously the results were
  cycled until they all terminated at the same time, producing a number
  of rows equal to the least common multiple of the functions'
  periods. In addition, set-returning functions are now disallowed
  within CASE and COALESCE constructs. For more information see Section
  37.4.8.

Another interesting consequence of that is that it's possible to do a
similar thing with the array slice syntax, and a pair of
generate_series() calls, for example:

SELECT (array[1,2,3,4]::int[])[generate_series(1,4) : generate_series(2,4)];

 array
---
 {1,2}
 {2,3}
 {3,4}

(4 rows)

Note: there are 4 rows in that result, and the last one is NULL, which
is also consistent with the documentation, and the fact that the array
slice function returns NULL if either subscript is NULL.

I'd agree that there's an opportunity to improve the docs here.

Regards,
Dean