(2010/07/21 19:35), KaiGai Kohei wrote: > (2010/07/21 19:26), Robert Haas wrote: >> 2010/7/21 KaiGai Kohei<kai...@ak.jp.nec.com>: >>>> On the other hand, if it's enough from a performance >>>> point of view to review and mark only a few built-in functions like >>>> index operators, maybe it's ok. >>>> >>> I also think it is a worthful idea to try as a proof-of-concept. >> >> Yeah. So, should we mark this patch as Returned with Feedback, and >> you can submit a proof-of-concept patch for the next CF? >> > Yes, it's fair enough. > The attached patch is a proof-of-concept one according to the suggestion from Heikki before.
Right now, it stands on a strict assumption that considers operators implemented with built-in functions are safe; it does not have no possibility to leak supplied arguments anywhere. This patch modifies the logic that the planner decides where the given qualifier clause should be distributed. If the clause does not contain any "leakable" functions, nothing were changed. In this case, the clause shall be pushed down into inside of the join, if it depends on one-side of the join. Elsewhere, if the clause contains any "leakable" functions, this patch prevents to push down the clause into join loop, because the clause need to be evaluated after the condition of join. If it would not be prevented, the "leakable" function may expose the contents to be invisible for users; due to the VIEWs for row-level security purpose. Example -------------------------------------------- postgres=# CREATE OR REPLACE FUNCTION f_policy(int) RETURNS bool LANGUAGE 'plpgsql' AS 'begin return $1 % 2 = 0; end;'; CREATE FUNCTION postgres=# CREATE OR REPLACE FUNCTION f_malicious(text) RETURNS bool LANGUAGE 'plpgsql' COST 0.001 AS 'begin raise notice ''leak: %'', $1; return true; end'; CREATE FUNCTION postgres=# CREATE TABLE t1 (a int primary key, b text); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t1_pkey" for table "t1" CREATE TABLE postgres=# CREATE TABLE t2 (x int primary key, y text); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t2_pkey" for table "t2" CREATE TABLE postgres=# INSERT INTO t1 VALUES (1,'aaa'), (2,'bbb'), (3,'ccc'); INSERT 0 3 postgres=# INSERT INTO t2 VALUES (2,'xxx'), (3,'yyy'), (4,'zzz'); INSERT 0 3 postgres=# CREATE VIEW v AS SELECT * FROM t1 JOIN t2 ON f_policy(a+x); CREATE VIEW * SELECT * FROM v WHERE f_malicious(b); [without this patch] postgres=# EXPLAIN SELECT * FROM v WHERE f_malicious(b); QUERY PLAN ------------------------------------------------------------------ Nested Loop (cost=0.00..133685.13 rows=168100 width=72) Join Filter: f_policy((t1.a + t2.x)) -> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36) -> Materialize (cost=0.00..24.35 rows=410 width=36) -> Seq Scan on t1 (cost=0.00..22.30 rows=410 width=36) Filter: f_malicious(b) (6 rows) ==> f_malicious() may be raise a notice about invisible tuples. [with this patch] postgres=# EXPLAIN SELECT * FROM v WHERE f_malicious(b); QUERY PLAN ------------------------------------------------------------------- Nested Loop (cost=0.00..400969.96 rows=168100 width=72) Join Filter: (f_malicious(t1.b) AND f_policy((t1.a + t2.x))) -> Seq Scan on t1 (cost=0.00..22.30 rows=1230 width=36) -> Materialize (cost=0.00..28.45 rows=1230 width=36) -> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36) (5 rows) ==> f_malicious() is moved to outside of the join. It is evaluated earlier than f_policy() in same level due to the function cost, but it is another matter. * SELECT * FROM v WHERE a = 2; [without this patch] postgres=# EXPLAIN SELECT * FROM v WHERE a = 2; QUERY PLAN ------------------------------------------------------------------------- Nested Loop (cost=0.00..353.44 rows=410 width=72) Join Filter: f_policy((t1.a + t2.x)) -> Index Scan using t1_pkey on t1 (cost=0.00..8.27 rows=1 width=36) Index Cond: (a = 2) -> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36) (5 rows) [with this patch] postgres=# EXPLAIN SELECT * FROM v WHERE a = 2; QUERY PLAN ------------------------------------------------------------------------- Nested Loop (cost=0.00..353.44 rows=410 width=72) Join Filter: f_policy((t1.a + t2.x)) -> Index Scan using t1_pkey on t1 (cost=0.00..8.27 rows=1 width=36) Index Cond: (a = 2) -> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36) (5 rows) ==> "a = 2" is a built-in operator, so we assume it is safe. This clause was pushed down into the join loop, then utilized to index scan. * SELECT * FROM v WHERE a::text = 'a'; [without this patch] postgres=# EXPLAIN SELECT * FROM v WHERE a::text = '2'; QUERY PLAN ---------------------------------------------------------------- Nested Loop (cost=0.00..2009.54 rows=2460 width=72) Join Filter: f_policy((t1.a + t2.x)) -> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36) -> Materialize (cost=0.00..31.55 rows=6 width=36) -> Seq Scan on t1 (cost=0.00..31.52 rows=6 width=36) Filter: ((a)::text = '2'::text) (6 rows) ==> "a::text = 'a'" was pushed down into the join loop. [with this patch] postgres=# EXPLAIN SELECT * FROM v WHERE a::text = '2'; QUERY PLAN ------------------------------------------------------------------------- Nested Loop (cost=0.00..412312.92 rows=2521 width=72) Join Filter: (((t1.a)::text = '2'::text) AND f_policy((t1.a + t2.x))) -> Seq Scan on t1 (cost=0.00..22.30 rows=1230 width=36) -> Materialize (cost=0.00..28.45 rows=1230 width=36) -> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36) (5 rows) ==> The "a::text = '2'" clause contains CoerceViaIO node, so this patch assumes it is not safe. Please note that this patch does not case about a case when a function inside a view and a function outside a view are distributed into same level and the later function has lower cost value. To fix this problem definitely, we also need to mark nest-level of the clause being originally supplied, and need to order the clauses by the nest-level with higher priority than cost. If we found f_malicious() and f_policy() in same level at the above example, f_policy() should be executed earlier because it was originally supplied in the deeper nest level. Thanks, -- KaiGai Kohei <kai...@ak.jp.nec.com>
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 9cc7b8a..82f7bfb 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -762,6 +762,7 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause, bool pseudoconstant = false; bool maybe_equivalence; bool maybe_outer_join; + bool maybe_leakable_clause = false; Relids nullable_relids; RestrictInfo *restrictinfo; @@ -780,6 +781,19 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause, elog(ERROR, "JOIN qualification cannot refer to other relations"); /* + * If the clause contains a leakable function, it may be used to + * bypass row-level security using views. In this case, we don't + * push down the clause not to evaluate the leakable clause prior + * to the row-level policy functions. + */ + if (!bms_is_empty(relids)) + { + maybe_leakable_clause = contain_leakable_functions(clause); + if (maybe_leakable_clause) + relids = (!ojscope ? bms_copy(qualscope) : bms_copy(ojscope)); + } + + /* * If the clause is variable-free, our normal heuristic for pushing it * down to just the mentioned rels doesn't work, because there are none. * @@ -1030,7 +1044,7 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause, * If none of the above hold, pass it off to * distribute_restrictinfo_to_rels(). */ - if (restrictinfo->mergeopfamilies) + if (!maybe_leakable_clause && restrictinfo->mergeopfamilies) { if (maybe_equivalence) { diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index c8b2585..74fc64a 100644 --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -86,6 +86,7 @@ static bool contain_subplans_walker(Node *node, void *context); static bool contain_mutable_functions_walker(Node *node, void *context); static bool contain_volatile_functions_walker(Node *node, void *context); static bool contain_nonstrict_functions_walker(Node *node, void *context); +static bool contain_leakable_functions_walker(Node *node, void *context); static Relids find_nonnullable_rels_walker(Node *node, bool top_level); static List *find_nonnullable_vars_walker(Node *node, bool top_level); static bool is_strict_saop(ScalarArrayOpExpr *expr, bool falseOK); @@ -1129,6 +1130,120 @@ contain_nonstrict_functions_walker(Node *node, void *context) } + +/***************************************************************************** + * Check clauses for leakable functions + *****************************************************************************/ + +/* + * contain_leakable_functions + * Recursively search for leakable functions within a clause. + * + * Returns true if any function call with side-effect is found. + * ie, some type-input/output handler will raise an error when given + * argument does not have a valid format. + * + * When people uses views for row-level security purpose, given qualifiers + * come from outside of the view should not be pushed down into the views, + * if they have side-effect, because contents of tuples to be filtered out + * may be leaked via side-effectable functions within the qualifiers. + * + * The idea here is that the planner restrain a part of optimization when + * the qualifiers contains leakable functions. + * This routine checks whether the given clause contains leakable functions, + * or not. If we return false, then the clause is clean. + */ +bool +contain_leakable_functions(Node *clause) +{ + return contain_leakable_functions_walker(clause, NULL); +} + +static bool +contain_leakable_functions_walker(Node *node, void *context) +{ + if (node == NULL) + return false; + + if (IsA(node, FuncExpr)) + { + /* + * currently, we have no way to distinguish a safe function and + * a leakable one, so all the function call shall be considered + * as leakable one. + */ + return true; + } + else if (IsA(node, OpExpr)) + { + OpExpr *expr = (OpExpr *) node; + + /* + * we assume built-in functions to implement operators are not + * leakable, so don't need to prevent optimization. + */ + set_opfuncid(expr); + if (get_func_lang(expr->opfuncid) != INTERNALlanguageId) + return true; + /* else fall through to check args */ + } + else if (IsA(node, DistinctExpr)) + { + DistinctExpr *expr = (DistinctExpr *) node; + + set_opfuncid((OpExpr *) expr); + if (get_func_lang(expr->opfuncid) != INTERNALlanguageId) + return true; + /* else fall through to check args */ + } + else if (IsA(node, ScalarArrayOpExpr)) + { + ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) node; + + set_sa_opfuncid(expr); + if (get_func_lang(expr->opfuncid) != INTERNALlanguageId) + return true; + /* else fall through to check args */ + } + else if (IsA(node, CoerceViaIO) || + IsA(node, ArrayCoerceExpr)) + { + /* + * we assume type-in/out handlers are leakable, even if built-in + * functions. + * ie, int4in() raises an error message with given argument, + * if it does not have valid format for numeric value. + */ + return true; + } + else if (IsA(node, NullIfExpr)) + { + NullIfExpr *expr = (NullIfExpr *) node; + + set_opfuncid((OpExpr *) expr); /* rely on struct equivalence */ + if (get_func_lang(expr->opfuncid) != INTERNALlanguageId) + return true; + /* else fall through to check args */ + } + else if (IsA(node, RowCompareExpr)) + { + /* RowCompare probably can't have volatile ops, but check anyway */ + RowCompareExpr *rcexpr = (RowCompareExpr *) node; + ListCell *opid; + + foreach(opid, rcexpr->opnos) + { + Oid funcId = get_opcode(lfirst_oid(opid)); + + if (get_func_lang(funcId) != INTERNALlanguageId) + return true; + } + /* else fall through to check args */ + } + return expression_tree_walker(node, contain_leakable_functions_walker, + context); +} + /* * find_nonnullable_rels * Determine which base rels are forced nonnullable by given clause. diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index 19a4a45..196dddf 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -1275,6 +1275,25 @@ get_func_namespace(Oid funcid) } /* + * get_func_lang + * Given procedure id, return the function's language + */ +Oid +get_func_lang(Oid funcid) +{ + HeapTuple tp; + Oid result; + + tp = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid)); + if (!HeapTupleIsValid(tp)) + elog(ERROR, "cache lookup failed for function %u", funcid); + + result = ((Form_pg_proc) GETSTRUCT(tp))->prolang; + ReleaseSysCache(tp); + return result; +} + +/* * get_func_rettype * Given procedure id, return the function's result type. */ diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h index e979596..2f6d86d 100644 --- a/src/include/optimizer/clauses.h +++ b/src/include/optimizer/clauses.h @@ -67,6 +67,7 @@ extern bool contain_subplans(Node *clause); extern bool contain_mutable_functions(Node *clause); extern bool contain_volatile_functions(Node *clause); extern bool contain_nonstrict_functions(Node *clause); +extern bool contain_leakable_functions(Node *clause); extern Relids find_nonnullable_rels(Node *clause); extern List *find_nonnullable_vars(Node *clause); extern List *find_forced_null_vars(Node *clause); diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index 066ad76..efffbc7 100644 --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -77,6 +77,7 @@ extern RegProcedure get_oprrest(Oid opno); extern RegProcedure get_oprjoin(Oid opno); extern char *get_func_name(Oid funcid); extern Oid get_func_namespace(Oid funcid); +extern Oid get_func_lang(Oid funcid); extern Oid get_func_rettype(Oid funcid); extern int get_func_nargs(Oid funcid); extern Oid get_func_signature(Oid funcid, Oid **argtypes, int *nargs);
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers