Joe Conway <[email protected]> writes:
> Possibly try using/abusing RLS?

Cute idea, but I think it doesn't reliably address the problem of
wanting to identify the specific rows that were read.  In your toy
example it'd work, because the generated plan is

regression=> explain verbose select * from t1 where c1=42;
                         QUERY PLAN                         
------------------------------------------------------------
 Seq Scan on public.t1  (cost=0.00..343.38 rows=2 width=36)
   Output: c1, c2
   Filter: ((t1.c1 = 42) AND audit(t1.c1))
(3 rows)

so the WHERE clause is applied before the RLS filter.  But in any
slightly-more-complicated situation, like a non-leakproof WHERE
condition, the order would be reversed so the log would overstate
which rows were read.

If the application's behavior is simple and well-defined, this might
be good enough, of course.

I thought of a way that could possibly do this reliably, but it's
vastly more work than the use-case seems worth:

1. Convert the SELECTs into SELECT FOR UPDATE (you could do this
without changing the application, by interposing a view).  SELECT
FOR SHARE might be good enough, not sure.

2. Write a logical replication output plugin that parses the WAL log
well enough to identify the tuple locks taken by FOR UPDATE.

This should work to log only the rows actually read, because FOR
UPDATE is postponed to the top of the query plan, unlike RLS.

                        regards, tom lane


Reply via email to