(2018/09/18 21:14), Kyotaro HORIGUCHI wrote:
At Fri, 14 Sep 2018 22:01:39 +0900, Etsuro Fujita<fujita.ets...@lab.ntt.co.jp>  wrote 
in<5b9bb133.1060...@lab.ntt.co.jp>
@@ -126,8 +173,18 @@ get_relation_info(PlannerInfo *root, Oid
relationObjectId,\
  bool inhparent,
                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                  errmsg("cannot access temporary or unlogged relations during
                  r\
ecovery")));

+   max_attrnum = RelationGetNumberOfAttributes(relation);
+
+ /* Foreign table may have exanded this relation with junk columns */
+ if (root->simple_rte_array[varno]->relkind == RELKIND_FOREIGN_TABLE)
+   {
+ AttrNumber maxattno = max_varattno(root->parse->targetList, varno);
+       if (max_attrnum<  maxattno)
+           max_attrnum = maxattno;
+   }
+
     rel->min_attr = FirstLowInvalidHeapAttributeNumber + 1;
-   rel->max_attr = RelationGetNumberOfAttributes(relation);
+   rel->max_attr = max_attrnum;
     rel->reltablespace = RelationGetForm(relation)->reltablespace;

This breaks the fundamental assumption that rel->max_attr is equal to
RelationGetNumberOfAttributes of that table.  My concern is: this
change would probably be a wart, so it would be bug-prone in future
versions.

Hmm. I believe that once RelOptInfo is created all attributes
defined in it is safely accessed. Is it a wrong assumption?

The patch you proposed seems to fix the issue well for the current version of PG, but I'm a bit scared to have such an assumption (ie, to include columns in a rel's tlist that are not defined anywhere in the system catalogs). In future we might add eg, a lsyscache.c routine for some planning use that are given the attr number of a column as an argument, like get_attavgwidth, and if so, it would be easily conceivable that that routine would error out for such an undefined column. (get_attavgwidth would return 0, not erroring out, though.)

Actually RelationGetNumberOfAttributes is used in few distinct
places while planning.

build_physical_tlist is not used for
foreign relations.

For UPDATE/DELETE, that function would not be called for a foreign target in the posetgres_fdw case, as CTID is requested (see use_physical_tlist), but otherwise that function may be called if possible. No?

If we don't accept the expanded tupdesc for base relations, the
another way I can find is transforming the foreign relation into
something another like a subquery, or allowing expansion of
attribute list of a base relation...

Sorry, I don't understand this fully, but there seems to be the same concern as mentioned above.

Another thing on the new version:

@@ -1575,6 +1632,19 @@ build_physical_tlist(PlannerInfo *root,
RelOptInfo *rel)
             relation = heap_open(rte->relid, NoLock);

             numattrs = RelationGetNumberOfAttributes(relation);
+
+           /*
+ * Foreign tables may have expanded with some junk columns. Punt
+            * in the case.
...
I think this would disable the optimization on projection in foreign
scans, causing performance regression.

Well, in update/delete cases, create_plan_recurse on foreign scan
is called with CP_EXACT_TLIST in create_modifytable_plan

That's not necessarily true; consider UPDATE/DELETE on a local join; in that case the topmost plan node for a subplan of a ModifyTable would be a join, and if that's a NestLoop, create_plan_recurse would call create_nestloop_plan, which would recursively call create_plan_recurse for its inner/outer subplans with flag=0, not CP_EXACT_TLIST.

so the
code path is not actually used.

I think this is true for the postgres_fdw case; because use_physical_tlist would decide not to do build_physical_tlist for the reason mentioned above. BUT my question here is: why do we need the change to build_physical_tlist?

Since this uses fdw_scan_tlist so it is theoretically
back-patchable back to 9.6.

IIRC, the fdw_scan_tlist stuff was introduced in PG9.5 as part of join
pushdown infrastructure, so I think your patch can be back-patched to
PG9.5, but I don't think that's enough; IIRC, this issue was
introduced in PG9.3, so a solution for this should be back-patch-able
to PG9.3, I think.

In the previous version, fdw_scan_tlist is used to hold only
additional (junk) columns. I think that we can get rid of the
variable by scanning the full tlist for junk columns. Apparently
it's differnt patch for such versions. I'm not sure how much it
is invasive for now but will consider.

Sorry, I don't fully understand this. Could you elaborate a bit more on this?

0001-Add-test-for-postgres_fdw-foreign-parition-update.patch

   This should fail for unpatched postgres_fdw. (Just for demonstration)

+CREATE TABLE p1 (a int, b int);
+CREATE TABLE c1 (LIKE p1) INHERITS (p1);
+CREATE TABLE c2 (LIKE p1) INHERITS (p1);
+CREATE FOREIGN TABLE fp1 (a int, b int)
+ SERVER loopback OPTIONS (table_name 'p1');
+INSERT INTO c1 VALUES (0, 1);
+INSERT INTO c2 VALUES (1, 1);
+SELECT tableoid::int - (SELECT min(tableoid) FROM fp1)::int AS
toiddiff, ctid, * FROM fp1;

Does it make sense to evaluate toiddiff?  I think that should always
be 0.

Right. it is checking that the values are not those of remote
table oids.

Sorry, my explanation was not enough, but that seems to me more complicated than necessary. How about just evaluating tableoid::regclass, instead of toiddiff?

=======
0003-Fix-of-foreign-update-bug-of-PgFDW.patch

   Fix of postgres_fdw for this problem.

Sorry, I have not looked at it closely yet, but before that I'd like
to discuss the direction we go in.  I'm not convinced that your
approach is the right direction, so as promised, I wrote a patch using
the Param-based approach, and compared the two approaches.  Attached
is a WIP patch for that, which includes the 0003 patch.  I don't think
there would be any warts as discussed above in the Param-based
approach for now.  (That approach modifies the planner so that the
targetrel's tlist would contain Params as well as Vars/PHVs, so
actually, it breaks the planner assumption that a rel's tlist would
only include Vars/PHVs, but I don't find any issues on that at least
for now.  Will look into that in more detail.)  And I don't think
there would be any concern about performance regression, either.
Maybe I'm missing something, though.

What do you think about that?

Hmm. It is beyond my understanding. Great work (for me)!

I just implemented Tom's idea.  I hope I did that correctly.

I confirmed that a FOREIGN_PARAM_EXEC is evaluated and stored
into the parent node. For the mentioned Merge/Sort/ForeignScan
case, Sort node takes the parameter value via projection. I
didn't know PARAM_EXEC works that way. I consulted nodeNestLoop
but not fully understood.

So I think it works.  I still don't think expanded tupledesc is
not wart but this is smarter than that.  Addition to that, it
seems back-patchable. I must admit that yours is better.

As mentioned above, I'm a bit scared of the idea that we include columns not defined anywhere in the system catalogs in a rel's tlist. For the reason mentioned above, I think we should avoid such a thing, IMO.

Note about the attached: I tried to invent a utility for
generate_new_param like SS_make_initplan_output_param as mentioned in
[1], but since the FDW API doesn't pass PlannerInfo to the FDW, I
think the FDW can't call the utility the same way.  Instead, I
modified the planner so that 1) the FDW adds Params without setting
PARAM_EXEC Param IDs using a new function, and then 2) the core fixes
the IDs.

Agreed on not having PlannerInfo. I'll re-study this.  Some
comments on this right now are the follows.

Thanks for the comments!

It seems reserving the name remotetableoid, which doen't seem to
be used by users but not never.

This has also been suggested by Tom [2].

Maybe paramid space of FOREIGN_PARAM is not necessarily be the
same with ordinary params that needs signalling aid.

Yeah, but I modified the planner so that it can distinguish one from the other; because I think it's better to avoid unneeded SS_finalize_plan processing when only generating foreign Params, and/or minimize the cost in set_plan_references by only converting foreign Params into simple Vars using search_indexed_tlist_for_non_var, which are both expensive.

One thing I noticed is: in any approach, I think use_physical_tlist needs to be modified so that it disables doing build_physical_tlist for a foreign scan in the case where the FDW added resjunk columns for UPDATE/DELETE that are different from user/system columns of the foreign table; else such columns would not be emitted from the foreign scan.

Best regards,
Etsuro Fujita

[2] https://www.postgresql.org/message-id/8627.1526591849%40sss.pgh.pa.us

Reply via email to