RE: [POC] Fast COPY FROM command for the table with foreign partitions

tsunakawa.ta...@fujitsu.com Sun, 18 Oct 2020 21:13:04 -0700

Hello Andrey-san,


Thank you for challenging an interesting feature.  Below are my review comments.


(1)
-       /* for use by copy.c when performing multi-inserts */
+       /*
+        * The following fields are currently only relevant to copy.c.
+        *
+        * True if okay to use multi-insert on this relation
+        */
+       bool ri_usesMultiInsert;
+
+       /* Buffer allocated to this relation when using multi-insert mode */
        struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;

It's better to place the new bool member next to an existing bool member, so 
that the structure doesn't get larger.


(2)
+       Assert(rri->ri_usesMultiInsert == false);

As the above assertion represents, I'm afraid the semantics of 
ExecRelationAllowsMultiInsert() and ResultRelInfo->ri_usesMultiInsert are 
unclear.  In CopyFrom(), ri_usesMultiInsert is set by also considering the 
COPY-specific conditions:

+       if (!cstate->volatile_defexprs &&
+               !contain_volatile_functions(cstate->whereClause) &&
+               ExecRelationAllowsMultiInsert(target_resultRelInfo, NULL))
+               target_resultRelInfo->ri_usesMultiInsert = true;

On the other hand, in below ExecInitPartitionInfo(), ri_usesMultiInsert is set 
purely based on the relation's characteristics.

+       leaf_part_rri->ri_usesMultiInsert =
+               ExecRelationAllowsMultiInsert(leaf_part_rri, rootResultRelInfo);

In addition to these differences, I think it's a bit confusing that the 
function itself doesn't record the check result in ri_usesMultiInsert.

It's probably easy to understand to not add ri_usesMultiInsert, and the 
function just encapsulates the check logic based solely on the relation 
characteristics and returns the result.  So, the argument is just one 
ResultRelInfo.  The caller (e.g. COPY) combines the function result with other 
specific conditions.


(3)
+typedef void (*BeginForeignCopy_function) (ModifyTableState *mtstate,
+                                                                               
         ResultRelInfo *rinfo);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+                                                                               
   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+                                                                               
                           TupleTableSlot **slots,
+                                                                               
                           int nslots);

To align with other function groups, it's better to place the functions in 
order of Begin, Exec, and End.


(4)
+       /* COPY a bulk of tuples into a foreign relation */
+       BeginForeignCopy_function BeginForeignCopy;
+       EndForeignCopy_function EndForeignCopy;
+       ExecForeignCopy_function ExecForeignCopy;

To align with the other functions' comment, the comment should be:
        /* Support functions for COPY */


(5)
+<programlisting>
+TupleTableSlot *
+ExecForeignCopy(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a bulk of tuples into the foreign table.
+     <literal>estate</literal> is global execution state for the query.

The return type is void.


(6)
+     <literal>nslots</literal> cis a number of tuples in the 
<literal>slots</literal>

cis -> is


(7)
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will 
fail
+     with an error message.
+    </para>

"attempts to insert into" should be "attempts to run COPY on", because it's 
used for COPY.
Furthermore, if ExecForeignCopy is NULL, COPY should use ExecForeignInsert() 
instead, right?  Otherwise, existing FDWs would become unable to be used for 
COPY.


(8)
+       bool            pipe = (filename == NULL) && (data_dest_cb == NULL);

The above pipe in BeginCopyTo() is changed to not match pipe in DoCopyTo(), 
which only refers to filename.  Should pipe in DoCopyTo() also be changed?  If 
no, the use of the same variable name for different conditions is confusing.


(9)
-        * partitions will be done later.
+-       * partitions will be done later.

This is an unintended addition of '-'?


(10)
-       if (resultRelInfo->ri_FdwRoutine != NULL &&
-               resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-               resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-                                                                               
                                 resultRelInfo);
+       if (target_resultRelInfo->ri_FdwRoutine != NULL)
+       {
+               if (target_resultRelInfo->ri_usesMultiInsert)
+                       
target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate,
+                                                                               
                                                  resultRelInfo);
+               else if 
(target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+                       
target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+                                                                               
                                                        resultRelInfo);
+       }

BeginForeignCopy() should be called if it's defined, because BeginForeignCopy() 
is an optional function.


(11) 
+               oldcontext = 
MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+               table_multi_insert(resultRelInfo->ri_RelationDesc,
+                                                  slots,

The extra empty line seems unintended.


(12)
@@ -585,7 +583,8 @@ CopySendEndOfRow(CopyState cstate)
                        (void) pq_putmessage('d', fe_msgbuf->data, 
fe_msgbuf->len);
                        break;
                case COPY_CALLBACK:
-                       Assert(false);          /* Not yet supported. */
+                       CopySendChar(cstate, '\n');
+                       cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);

As in the COPY_FILENAME case, shouldn't the line terminator be sent only in 
text format, and be changed to \r\n on Windows?  I'm asking this as I'm 
probably a bit confused about in what situation COPY_CALLBACK could be used.  I 
thought the binary format and \r\n line terminator could be necessary depending 
on the FDW implementation.


(13)
@@ -1001,9 +1001,13 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
         * If the partition is a foreign table, let the FDW init itself for
         * routing tuples to the partition.
         */
-       if (partRelInfo->ri_FdwRoutine != NULL &&
-               partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-               partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, 
partRelInfo);
+       if (partRelInfo->ri_FdwRoutine != NULL)
+       {
+               if (partRelInfo->ri_usesMultiInsert)
+                       partRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate, 
partRelInfo);
+               else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+                       partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, 
partRelInfo);
+       }

BeginForeignCopy() should be called only if it's defined, because 
BeginForeignCopy() is an optional function.


(14)
@@ -1205,10 +1209,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
                ResultRelInfo *resultRelInfo = proute->partitions[i];
 
                /* Allow any FDWs to shut down */
-               if (resultRelInfo->ri_FdwRoutine != NULL &&
-                       resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-                       
resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-                                                                               
                                   resultRelInfo);
+               if (resultRelInfo->ri_FdwRoutine != NULL)
+               {
+                       if (resultRelInfo->ri_usesMultiInsert)
+                       {
+                               
Assert(resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL);
+                               
resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+                                                                               
                                           resultRelInfo);
+                       }
+                       else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert 
!= NULL)
+                               
resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+                                                                               
                                           resultRelInfo);
+               }

EndForeignCopy() is an optional function, isn't it?  That is, it's called if 
it's defined.


(15)
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+       PGconn *conn = copy_fmstate->conn;
+
+       if (PQputCopyData(conn, (char *) buf, len) <= 0)
+       {
+               PGresult *res = PQgetResult(conn);
+
+               pgfdw_report_error(ERROR, res, conn, true, copy_fmstate->query);
+       }
+}

The following page says "Use PQerrorMessage to retrieve details if the return 
value is -1."  So, it's correct to not use PGresult here and pass NULL as the 
second argument to pgfdw_report_error().

https://www.postgresql.org/docs/devel/libpq-copy.html


(16)
+               for (i = 0; i < nslots; i++)
+                       CopyOneRowTo(fmstate->cstate, slots[i]);
+
+               status = true;
+       }

I'm afraid it's not intuitive what "status is true" means.  I think 
copy_data_sent or copy_send_success would be better for the variable name.


(17)
+               if (PQputCopyEnd(conn, status ? NULL : _("canceled by server")) 
<= 0 ||
+                       PQflush(conn))
+                       ereport(ERROR,
+                                       (errmsg("error returned by 
PQputCopyEnd: %s",
+                                                       PQerrorMessage(conn))));

As the places that call PQsendQuery(), it seems preferrable to call 
pgfdw_report_error() here too.


Regards
Takayuki Tsunakawa

RE: [POC] Fast COPY FROM command for the table with foreign partitions

Reply via email to