On 2016/07/28 22:11, Kouhei Kaigai wrote:
I wrote:
That may be so, but my point is that the target relations involved in
the foreign join (ie, ft1 and ft2) should be printed somewhere in the
EXPLAIN output by core, as in EXPLAIN for a simple foreign table scan.
Why? According to your rule, Hash Join should take "on t0,t1,t2".
postgres=# explain select id from t0 natural join t1 natural join t2;
QUERY PLAN
-----------------------------------------------------------------------------
Hash Join (cost=6370.00..4560826.24 rows=98784048 width=4)
Hash Cond: (t0.aid = t1.aid)
-> Hash Join (cost=3185.00..3199360.58 rows=98784048 width=8)
Hash Cond: (t0.bid = t2.bid)
-> Seq Scan on t0 (cost=0.00..1833334.80 rows=100000080 width=12)
-> Hash (cost=1935.00..1935.00 rows=100000 width=4)
-> Seq Scan on t2 (cost=0.00..1935.00 rows=100000 width=4)
-> Hash (cost=1935.00..1935.00 rows=100000 width=4)
-> Seq Scan on t1 (cost=0.00..1935.00 rows=100000 width=4)
(9 rows)
I don't think it needs "on t0,t1,t2", because we can see joining
relations from inner/outer plans in that case. In a foreign-join case,
however, we can't see such relations from the EXPLAIN printed *by core*.
postgres_fdw avoids this issue by adding such relations to the EXPLAIN
using ExplainForeignScan as shown in the below example, but since such
relations are essential, I think that information should be shown by
core itself.
postgres=# explain select * from (select ft1.a from ft1 left join ft2 on
ft1.a = ft2.a where ft1.b = 1) ss1(a) full join (select ft3.a from ft3
left join ft4 on ft3.a = ft4.a where ft3.b = 1) ss2(a) on ss1.a = ss2.a;
QUERY PLAN
----------------------------------------------------------------
Hash Full Join (cost=202.06..204.12 rows=1 width=8)
Hash Cond: (ft1.a = ft3.a)
-> Foreign Scan (cost=100.00..102.05 rows=1 width=4)
Relations: (public.ft1) LEFT JOIN (public.ft2)
-> Hash (cost=102.05..102.05 rows=1 width=4)
-> Foreign Scan (cost=100.00..102.05 rows=1 width=4)
Relations: (public.ft3) LEFT JOIN (public.ft4)
(7 rows)
From the Relations line shown by postgres_fdw, we can see which foreign
join joins which foreign tables, but if no such lines, we couldn't.
I wrote:
Probably something like this:
Foreign Processing
Remote Operations: ...
In the Remote Operations line, the FDW/extension could print any info
about remote operations, eg, "Scan/Join + Aggregate".
"Foreign" implies this node is processed by FDW, but "Procesing" gives us
no extra information; seems to me redundant.
I intentionally chose that word and thought we could leave detailed
descriptions about remote operations to the FDW/extension; a broader
word like "Processing" seems to work well because we allow various kinds
of operations to the remote side, in addition to scans/joins, to be
performed in that one Foreign Scan node indicated by "Foreign
Processing", such as aggregation, window functions, distinct, order by,
row locking, table modification, or combinations of them.
Prior to the new invention, please explain why you don't want to by my
suggestion first? Annoying is a feel of you, but not a logic to persuade
others.
I'm not saying that the idea I proposed is better than your suggestion.
Just brain storming. I want to know what options we have and the pros
and cons of each approach.
postgres=# explain select id from t0 natural join t1 natural join t2;
QUERY PLAN
---------------------------------------------------------------------------
Custom Scan (GpuJoin) (cost=12385.67..291245.35 rows=9815465 width=4)
GPU Projection: t0.id
Depth 1: GpuHashJoin, HashKeys: (t0.bid)
JoinQuals: (t0.bid = t2.bid)
Nrows (in/out: 98.15%), KDS-Hash (size: 13.47MB, nbatches: 1)
Depth 2: GpuHashJoin, HashKeys: (t0.aid)
JoinQuals: (t0.aid = t1.aid)
Nrows (in/out: 100.00%), KDS-Hash (size: 13.47MB, nbatches: 1)
-> Seq Scan on t0 (cost=0.00..183333.96 rows=9999996 width=12)
-> Seq Scan on t2 (cost=0.00..1935.00 rows=100000 width=4)
-> Seq Scan on t1 (cost=0.00..1935.00 rows=100000 width=4)
(11 rows)
My largest concern for you proposition is, ForeignScan/CustomScan node is
enforced to print name of underlying relations, regardless of its actual
behavior. The above GpuJoin never scans tables at least, thus, it mislead
users if we have no choice to print underlying relation names.
OK, I understand we would need special handling for such custom joins.
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers