On 2016/07/28 22:11, Kouhei Kaigai wrote:

I wrote:
That may be so, but my point is that the target relations involved in
the foreign join (ie, ft1 and ft2) should be printed somewhere in the
EXPLAIN output by core, as in EXPLAIN for a simple foreign table scan.

Why? According to your rule, Hash Join should take "on t0,t1,t2".

postgres=# explain select id from t0 natural join t1 natural join t2;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Hash Join  (cost=6370.00..4560826.24 rows=98784048 width=4)
   Hash Cond: (t0.aid = t1.aid)
   ->  Hash Join  (cost=3185.00..3199360.58 rows=98784048 width=8)
         Hash Cond: (t0.bid = t2.bid)
         ->  Seq Scan on t0  (cost=0.00..1833334.80 rows=100000080 width=12)
         ->  Hash  (cost=1935.00..1935.00 rows=100000 width=4)
               ->  Seq Scan on t2  (cost=0.00..1935.00 rows=100000 width=4)
   ->  Hash  (cost=1935.00..1935.00 rows=100000 width=4)
         ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=4)
(9 rows)

I don't think it needs "on t0,t1,t2", because we can see joining relations from inner/outer plans in that case. In a foreign-join case, however, we can't see such relations from the EXPLAIN printed *by core*. postgres_fdw avoids this issue by adding such relations to the EXPLAIN using ExplainForeignScan as shown in the below example, but since such relations are essential, I think that information should be shown by core itself.

postgres=# explain select * from (select ft1.a from ft1 left join ft2 on ft1.a = ft2.a where ft1.b = 1) ss1(a) full join (select ft3.a from ft3 left join ft4 on ft3.a = ft4.a where ft3.b = 1) ss2(a) on ss1.a = ss2.a;
                           QUERY PLAN
----------------------------------------------------------------
 Hash Full Join  (cost=202.06..204.12 rows=1 width=8)
   Hash Cond: (ft1.a = ft3.a)
   ->  Foreign Scan  (cost=100.00..102.05 rows=1 width=4)
         Relations: (public.ft1) LEFT JOIN (public.ft2)
   ->  Hash  (cost=102.05..102.05 rows=1 width=4)
         ->  Foreign Scan  (cost=100.00..102.05 rows=1 width=4)
               Relations: (public.ft3) LEFT JOIN (public.ft4)
(7 rows)

From the Relations line shown by postgres_fdw, we can see which foreign join joins which foreign tables, but if no such lines, we couldn't.

I wrote:
Probably something like this:

   Foreign Processing
     Remote Operations: ...

In the Remote Operations line, the FDW/extension could print any info
about remote operations, eg, "Scan/Join + Aggregate".

"Foreign" implies this node is processed by FDW, but "Procesing" gives us
no extra information; seems to me redundant.

I intentionally chose that word and thought we could leave detailed descriptions about remote operations to the FDW/extension; a broader word like "Processing" seems to work well because we allow various kinds of operations to the remote side, in addition to scans/joins, to be performed in that one Foreign Scan node indicated by "Foreign Processing", such as aggregation, window functions, distinct, order by, row locking, table modification, or combinations of them.

Prior to the new invention, please explain why you don't want to by my
suggestion first? Annoying is a feel of you, but not a logic to persuade
others.

I'm not saying that the idea I proposed is better than your suggestion. Just brain storming. I want to know what options we have and the pros and cons of each approach.

postgres=# explain select id from t0 natural join t1 natural join t2;
                                QUERY PLAN
---------------------------------------------------------------------------
 Custom Scan (GpuJoin)  (cost=12385.67..291245.35 rows=9815465 width=4)
   GPU Projection: t0.id
   Depth 1: GpuHashJoin, HashKeys: (t0.bid)
            JoinQuals: (t0.bid = t2.bid)
            Nrows (in/out: 98.15%), KDS-Hash (size: 13.47MB, nbatches: 1)
   Depth 2: GpuHashJoin, HashKeys: (t0.aid)
            JoinQuals: (t0.aid = t1.aid)
            Nrows (in/out: 100.00%), KDS-Hash (size: 13.47MB, nbatches: 1)
   ->  Seq Scan on t0  (cost=0.00..183333.96 rows=9999996 width=12)
   ->  Seq Scan on t2  (cost=0.00..1935.00 rows=100000 width=4)
   ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=4)
(11 rows)

My largest concern for you proposition is, ForeignScan/CustomScan node is
enforced to print name of underlying relations, regardless of its actual
behavior. The above GpuJoin never scans tables at least, thus, it mislead
users if we have no choice to print underlying relation names.

OK, I understand we would need special handling for such custom joins.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to