Hi Hackers,
Here is a proof-of-concept patch to inline set-returning functions (SRFs) besides those written in
SQL. We already try to inline SQL-language functions,[1] but that means you must have a static SQL
query. There is no way to get an inline-able query by dynamically building the sql in, say, plpgsql.
We also have a SupportRequestSimplify request type for functions that use SUPPORT to declare a
support function, and it can replace the FuncExpr with an arbitrary nodetree.[2] I think this was
intended for constant-substitution, but we can also use it to let functions generate dynamic SQL and
then inline it. In this patch, if a SRF replaces itself with a Query node, then
inline_set_returning_function will use that.
So far there are no tests or docs; I'm hoping to hear feedback on the idea
before going further.
Here is my concrete use-case: I wrote a function to do a temporal semijoin,[3] and I want it to be
inlined. There is a support function that builds the same SQL and lets Postgres parse it into a
Query.[4] (In practice I would rewrite the main function in C too, so it could share the
SQL-building code there, but this is just a POC.) If you build and install that extension on its
`inlined` branch,[5] then you can do this:
```
\i bench.sql
explain select * from temporal_semijoin('employees', 'id', 'valid_at', 'positions', 'employee_id',
'valid_at') j(id bigint, valid_at daterange);
explain select * from temporal_semijoin('employees', 'id', 'valid_at', 'positions', 'employee_id',
'valid_at') j(id bigint, valid_at daterange) where j.id = 10::bigint;
```
Without this patch, you get `ERROR: unrecognized node type: 58`. But with this patch you get these
plans:
```
postgres=# explain select * from temporal_semijoin('employees', 'id', 'valid_at', 'positions',
'employee_id', 'valid_at') j(id bigint, valid_at daterange);
QUERY PLAN
----------------------------------------------------------------------------------------------
ProjectSet (cost=4918.47..6177.06 rows=22300 width=40)
-> Hash Join (cost=4918.47..6062.77 rows=223 width=53)
Hash Cond: (employees.id = j.employee_id)
Join Filter: (employees.valid_at && j.valid_at)
-> Seq Scan on employees (cost=0.00..1027.39 rows=44539 width=21)
-> Hash (cost=4799.15..4799.15 rows=9545 width=40)
-> Subquery Scan on j (cost=4067.61..4799.15 rows=9545
width=40)
-> HashAggregate (cost=4067.61..4703.70 rows=9545
width=40)
Group Key: positions.employee_id
Planned Partitions: 16
-> Seq Scan on positions (cost=0.00..897.99
rows=44099 width=21)
(11 rows)
postgres=# explain select * from temporal_semijoin('employees', 'id', 'valid_at', 'positions',
'employee_id', 'valid_at') j(id bigint, valid_at daterange) where j.id = 10::bigint;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
ProjectSet (cost=0.56..9.22 rows=100 width=40)
-> Nested Loop (cost=0.56..8.71 rows=1 width=53)
-> GroupAggregate (cost=0.28..4.39 rows=1 width=40)
-> Index Only Scan using idx_positions_on_employee_id on positions
(cost=0.28..4.36 rows=5 width=21)
Index Cond: (employee_id = '10'::bigint)
-> Index Only Scan using employees_pkey on employees
(cost=0.28..4.30 rows=1 width=21)
Index Cond: ((id = '10'::bigint) AND (valid_at &&
(range_agg(positions.valid_at))))
(7 rows)
```
In particular I'm excited to see in the second plan that the predicate gets
pushed into the subquery.
If it seems good to let people use SupportRequestSimplify to make their SRFs be inlineable, I'm
happy to add tests and docs. We should really document the idea of inlined functions in general, so
I'll do that too.
Another approach I considered is using a separate support request, e.g. SupportRequestInlineSRF, and
just calling it from inline_set_returning_function. I didn't like having two support requests that
did almost exactly the same thing. OTOH my current approach means you'll get an error if you do this:
```
postgres=# select temporal_semijoin('employees', 'id', 'valid_at', 'positions', 'employee_id',
'valid_at');
ERROR: unrecognized node type: 66
```
I'll look into ways to fix that.
I think SupportRequestSimplify is a really cool feature. It is nearly like
having macros.
I'm dreaming about other ways I can (ab)use it. Just making inline-able SRFs has many applications.
From my own client work, I could use this for a big permissions query or a query with complicated
pricing logic.
The sad part though is that SUPPORT functions must be written in C. That means few people will use
them, especially these days when so many are in the cloud. Since they take a Node and return a Node,
maybe there is no other way. But I would love to have a different mechanism that receives the
function's arguments (evaluated) and returns a string, which we parse as a SQL query and then
inline. The arguments would have to be const-reducible to strings, of course. You could specify that
function with a new INLINE keyword when you create your target function. That feature would be less
powerful, but with broader reach.
I'd be glad to hear your thoughts!
[1] https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions (I couldn't find any mention in our
docs though, so we should add that.)
[2] https://www.postgresql.org/docs/current/xfunc-optimization.html
[3] https://github.com/pjungwir/temporal_ops/blob/master/temporal_ops--1.0.0.sql
[4] https://github.com/pjungwir/temporal_ops/blob/inlined/temporal_ops.c
[5] https://github.com/pjungwir/temporal_ops/tree/inlined
Yours,
--
Paul ~{:-)
p...@illuminatedcomputing.com
From dcd58bbe27e715688d49b3d0244524c817c5c6c3 Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" <p...@illuminatedcomputing.com>
Date: Thu, 27 Jun 2024 10:17:35 -0700
Subject: [PATCH v1] Inline non-SQL SRFs using SupportRequestSimplify
If the support request returns a Query node, then
inline_set_returning_function will use it in place of the FuncExpr.
---
src/backend/optimizer/util/clauses.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index b4e085e9d4b..d412033dba6 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -5085,11 +5085,17 @@ inline_set_returning_function(PlannerInfo *root, RangeTblEntry *rte)
if (rte->funcordinality)
return NULL;
- /* Fail if RTE isn't a single, simple FuncExpr */
if (list_length(rte->functions) != 1)
return NULL;
rtfunc = (RangeTblFunction *) linitial(rte->functions);
+ /*
+ * If the SRF was inlined using SupportRequestInline, then we expect a
+ * Query, otherwise a FuncExpr. Either way, only one item.
+ */
+ if (IsA(rtfunc->funcexpr, Query))
+ return (Query *)rtfunc->funcexpr;
+
if (!IsA(rtfunc->funcexpr, FuncExpr))
return NULL;
fexpr = (FuncExpr *) rtfunc->funcexpr;
--
2.42.0