Re: [HACKERS] [v9.2] Fix Leaky View Problem

Kohei KaiGai Sun, 16 Oct 2011 01:54:03 -0700

Hi Robert,

I'm a bit confusing about this sentence.


> If you can make this work, I think it could be a pretty sweet plannner
> optimization even apart from the implications for security views.
> Consider a query of this form:
>
> A LEFT JOIN B LEFT JOIN C
>
> where B is a view defined as:
>
> B1 JOIN B2 JOIN B3 LEFT JOIN B4 LEFT JOIN B5
>
> Now let's suppose that from_collapse_limit/join_collapse_limit are set
> low enough that we decline to fold these subproblems together.  If
> there happens to be a qual B.x = 1, where B.x is really B1.x, then the
> generated plan sucks, because it will basically lose the ability to
> filter B1 early, very possibly on, say, a unique index.  Or at least a
> highly selective index.
>

I tried to reproduce the scenario with enough small from/join_collapse_limit
(typically 1), but it allows to push down qualifiers into the least scan plan.

E.g)
mytest=# SET from_collapse_limit = 1;
mytest=# SET join_collapse_limit = 1;
mytest=# CREATE VIEW B AS SELECT B1.* FROM B1,B2,B3 WHERE B1.x = B2.x
AND B2.x = B3.x;
mytest=# EXPLAIN SELECT * FROM A,B,C WHERE A.x=B.x AND B.x=C.x AND f_leak(B.y);
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Merge Join  (cost=381.80..9597.97 rows=586624 width=108)
   Merge Cond: (a.x = b1.x)
   ->  Merge Join  (cost=170.85..290.46 rows=7564 width=72)
         Merge Cond: (a.x = c.x)
         ->  Sort  (cost=85.43..88.50 rows=1230 width=36)
               Sort Key: a.x
               ->  Seq Scan on a  (cost=0.00..22.30 rows=1230 width=36)
         ->  Sort  (cost=85.43..88.50 rows=1230 width=36)
               Sort Key: c.x
               ->  Seq Scan on c  (cost=0.00..22.30 rows=1230 width=36)
   ->  Materialize  (cost=210.95..528.56 rows=15510 width=44)
         ->  Merge Join  (cost=210.95..489.78 rows=15510 width=44)
               Merge Cond: (b1.x = b3.x)
               ->  Merge Join  (cost=125.52..165.40 rows=2522 width=40)
                     Merge Cond: (b1.x = b2.x)
                     ->  Sort  (cost=40.09..41.12 rows=410 width=36)
                           Sort Key: b1.x
                           ->  Seq Scan on b1  (cost=0.00..22.30
rows=410 width=36)
                                 Filter: f_leak(y)
                     ->  Sort  (cost=85.43..88.50 rows=1230 width=4)
                           Sort Key: b2.x
                           ->  Seq Scan on b2  (cost=0.00..22.30
rows=1230 width=4)
               ->  Sort  (cost=85.43..88.50 rows=1230 width=4)
                     Sort Key: b3.x
                     ->  Seq Scan on b3  (cost=0.00..22.30 rows=1230 width=4)
(25 rows)

In this example, f_leak() takes an argument come from B1 table within B view,
and it was correctly distributed to SeqScan on B1.

>From perspective of the code, the *_collapse_limit affects the contents of
joinlist being returned from deconstruct_jointree() whether its sub-portion is
flatten, or not.
However, the qualifiers are distributed on distribute_restrictinfo_to_rels() to
RelOptInfo based on its dependency of relations being referenced by
arguments. Thus, the above f_leak() was distributed to B1, not B, because
its arguments come from only B1.


I agree with the following approach to tackle this problem in 100%.
However, I'm unclear how from/join_collapse_limit affects to keep
sub-queries unflatten. It seems to me it is determined based on
the result of is_simple_subquery().

> 1. Let quals percolate down into subqueries.
> 2. Add the notion of a security view, which prevents flattening and
> disables the optimization of patch #1
> 3. Add the notion of a leakproof function, which can benefit from the
> optimization of #1 even when the view involved is a security view as
> introduced in #2
>

Thanks,

2011/10/11 Robert Haas <[email protected]>:
> On Mon, Oct 10, 2011 at 4:28 PM, Kohei KaiGai <[email protected]> wrote:
>> I agreed. We have been on the standpoint that tries to prevent
>> leakable functions to reference a portion of join-tree being already
>> flatten, however, it has been a tough work.
>> It seems to me it is much simple approach that enables to push
>> down only non-leaky functions into inside of sub-queries.
>>
>> An idea is to add a hack on distribute_qual_to_rels() to relocate
>> a qualifier into inside of the sub-query, when it references only
>> a particular sub-query being come from a security view, and
>> when the sub-query satisfies is_simple_subquery(), for example.
>
> If you can make this work, I think it could be a pretty sweet plannner
> optimization even apart from the implications for security views.
> Consider a query of this form:
>
> A LEFT JOIN B LEFT JOIN C
>
> where B is a view defined as:
>
> B1 JOIN B2 JOIN B3 LEFT JOIN B4 LEFT JOIN B5
>
> Now let's suppose that from_collapse_limit/join_collapse_limit are set
> low enough that we decline to fold these subproblems together.  If
> there happens to be a qual B.x = 1, where B.x is really B1.x, then the
> generated plan sucks, because it will basically lose the ability to
> filter B1 early, very possibly on, say, a unique index.  Or at least a
> highly selective index.  If we could allow the B.x qual to trickle
> down inside of the subquery, we'd get a much better plan.  Of course,
> it's still not as good as flattening, because it won't allow us to
> consider as many possible join orders - but the whole point of having
> from_collapse_limit/join_collapse_limit in the first place is that we
> can't consider all the join orders without having planning time and
> memory usage balloon wildly out of control.  And in many real-world
> cases, I think that this would probably mitigate the effects of
> exceeding from_collapse_limit/join_collapse_limit quite a bit.
>
> In order to make it work, though, you'd need to arrange things so that
> we distribute quals to rels in the parent query, then let some of them
> filter down into the subquery, then distribute quals to rels in the
> subquery (possibly adjusting RTE indexes?), then finish planning the
> subquery, then finish planning the parent query.  Not sure how
> possible/straightforward that is.
>
> It's probably a good idea to deal with this part first, because if you
> can't make it work then the whole approach is in trouble.  I'm almost
> imagining that we could break this into three independent patches,
> like this:
>
> 1. Let quals percolate down into subqueries.
> 2. Add the notion of a security view, which prevents flattening and
> disables the optimization of patch #1
> 3. Add the notion of a leakproof function, which can benefit from the
> optimization of #1 even when the view involved is a security view as
> introduced in #2
>
> Unlike the way you have it now, I think those patches could be
> independently committable.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
KaiGai Kohei <[email protected]>

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [v9.2] Fix Leaky View Problem

Reply via email to