Dear Hackers,

I realized this email was sent at release time, which is the worst time to ask for thoughts on a subject, IMHO. Anyway, I hope this message will bump the topic back to the top of the stack! Thank you!
On Fri, Sep 8, 2023 at 4:41 PM Jean-Christophe Arnu <jca...@gmail.com> wrote:

> Dear hackers,
>
> I recently found a weird behaviour involving FDW (postgres_fdw) and
> planning.
>
> Here is a simplified use case.
>
> Given a remote table (say on server2) with the following definition:
>
>     CREATE TABLE t1(
>         ts timestamp without time zone,
>         x  bigint,
>         x2 text
>     );
>
>     -- Then populate t1:
>     INSERT INTO t1
>     SELECT
>         current_timestamp - 1000*random()*'1 day'::interval,
>         x,
>         ''||x
>     FROM generate_series(1,100000) AS x;
>
> This table is imported on server1 into a dedicated schema, keeping the
> name t1 (we do not use use_remote_estimate).
>
> On server1:
>
>     CREATE SERVER server2
>         FOREIGN DATA WRAPPER postgres_fdw
>         OPTIONS (
>             host '127.0.0.1',
>             port '9002',
>             dbname 'postgres',
>             use_remote_estimate 'false'
>         );
>
>     CREATE USER MAPPING FOR jc
>         SERVER server2
>         OPTIONS (user 'jc');
>
>     CREATE SCHEMA remote;
>
>     IMPORT FOREIGN SCHEMA public
>         FROM SERVER server2
>         INTO remote;
>
> On a stock PostgreSQL 15, the following query using date_trunc() produces
> this plan:
>
> jc=# explain (verbose, analyze) select date_trunc('day',ts), count(1)
>      from remote.t1 group by date_trunc('day',ts) order by 1;
>                                                            QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=216.14..216.64 rows=200 width=16) (actual time=116.699..116.727 rows=1001 loops=1)
>    Output: (date_trunc('day'::text, ts)), (count(1))
>    Sort Key: (date_trunc('day'::text, t1.ts))
>    Sort Method: quicksort  Memory: 79kB
>    ->  HashAggregate  (cost=206.00..208.50 rows=200 width=16) (actual time=116.452..116.532 rows=1001 loops=1)
>          Output: (date_trunc('day'::text, ts)), count(1)
>          Group Key: date_trunc('day'::text, t1.ts)
>          Batches: 1  Memory Usage: 209kB
>          ->  Foreign Scan on remote.t1  (cost=100.00..193.20 rows=2560 width=8) (actual time=0.384..106.225 rows=100000 loops=1)
>                Output: date_trunc('day'::text, ts)
>                Remote SQL: SELECT ts FROM public.t1
>  Planning Time: 0.077 ms
>  Execution Time: 117.028 ms
>
> whereas the same query with date_bin():
>
> jc=# explain (verbose, analyze) select date_bin('1day',ts,'2023-01-01'),
>      count(1) from remote.t1 group by 1 order by 1;
>                                                            QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------
>  Foreign Scan  (cost=113.44..164.17 rows=200 width=16) (actual time=11.297..16.312 rows=1001 loops=1)
>    Output: (date_bin('1 day'::interval, ts, '2023-01-01 00:00:00'::timestamp without time zone)), (count(1))
>    Relations: Aggregate on (remote.t1)
>    Remote SQL: SELECT date_bin('1 day'::interval, ts, '2023-01-01 00:00:00'::timestamp without time zone), count(1) FROM public.t1 GROUP BY 1 ORDER BY date_bin('1 day'::interval, ts, '2023-01-01 00:00:00'::timestamp without time zone) ASC NULLS LAST
>  Planning Time: 0.114 ms
>  Execution Time: 16.599 ms
>
> With date_bin() the whole expression is pushed down to the remote server,
> whereas with date_trunc() it is not.
>
> I dived into the code and did some live debugging. It turns out that the
> decision to push down (or not) a whole expression depends on many factors,
> such as volatility and collation. In the date_trunc() case, the problem is
> all about collation (date_trunc() on timestamp without time zone).
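
As a side note while bumping this: the difference in argument types between the
two functions can be seen directly from the catalogs. This is only a quick
sketch (runnable on any PostgreSQL instance); date_trunc() takes a collatable
text argument, while date_bin() takes only non-collatable timestamp/interval
arguments:

    -- List the signatures of both functions; the text argument of
    -- date_trunc() is the only collatable input involved here.
    SELECT oid::regprocedure AS signature
    FROM pg_proc
    WHERE proname IN ('date_trunc', 'date_bin')
    ORDER BY proname;
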
> The decision is made in foreign_expr_walker() in deparse.c:
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=contrib/postgres_fdw/deparse.c;h=efaf387890e3f85c419748ec3af5d1e9696c9c4c;hb=86648dcdaec67b83cec20a9d25b45ec089a7c624#l468
>
> First, the function is tested for shippability (whether it can be pushed
> down); both date_trunc() and date_bin() are shippable.
>
> Then the argument sub-expressions are checked for collation and
> shippability, and they all pass for both functions.
>
> Then we arrive at this code portion:
>
>     if (fe->inputcollid == InvalidOid)
>         /* OK, inputs are all noncollatable */ ;
>     else if (inner_cxt.state != FDW_COLLATE_SAFE ||
>              fe->inputcollid != inner_cxt.collation)
>         return false;
>
> For the date_trunc() function:
>
> - The fe variable carries the constraints merged from the
>   sub-expressions/arguments, such as fe->inputcollid. That field evaluates
>   to 100 (the default collation) because of the text argument 'day', so
>   control jumps to the else branch and evaluates its predicate.
>
> - inner_cxt.state is not FDW_COLLATE_SAFE and inner_cxt.collation contains
>   0 (InvalidOid), so the walker returns false and the expression cannot be
>   pushed down.
>
> For the date_bin() function:
>
> - fe->inputcollid evaluates to 0 (InvalidOid), so the else branch is
>   skipped and the walker carries on.
>
> For date_bin(), all arguments are of non-collatable types (timestamp
> without time zone and interval).
>
> So the situation is that date_trunc(), itself a non-collatable function,
> fails to be pushed down even though it would probably be a good idea to
> do so.
>
> Maybe we could add another condition to the first if statement so that a
> non-collatable function can be pushed down even if it has collatable
> parameters. I am not sure about possible behavioural regressions of this
> change, but it seems to work fine with date_trunc() and date_part() (which
> suffers from the same problem).
>
> Here is the change:
>
>     /*
>      * If function's input collation is not derived from a foreign
>      * Var, it can't be sent to remote.
>      */
>     if (fe->inputcollid == InvalidOid ||
>         fe->funccollid == InvalidOid)
>         /* OK, inputs are all noncollatable */ ;
>     else if (inner_cxt.state != FDW_COLLATE_SAFE ||
>              fe->inputcollid != inner_cxt.collation)
>         return false;
>
> I don't presume this patch is free from side effects or fits all use cases.
>
> A tiny patch is attached to this email. It applies against master/HEAD at
> the time of writing.
>
> Thank you for any thoughts.
>
> --
> Jean-Christophe Arnu

--
Jean-Christophe Arnu
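
PS: for anyone who wants to try the attached patch, one way to check its
effect against the setup quoted above (just a suggestion, reusing the
remote.t1 foreign table from the original message) is to re-run the
date_trunc() query and look for the aggregate showing up in the "Remote SQL"
line of the Foreign Scan:

    EXPLAIN (VERBOSE, COSTS OFF)
    SELECT date_trunc('day', ts), count(1)
    FROM remote.t1
    GROUP BY 1
    ORDER BY 1;
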