On Tue, Jul 26, 2022 at 7:19 PM Etsuro Fujita <etsuro.fuj...@gmail.com> wrote: > Yeah, I think the patch is getting better, but I noticed some issues, > so I'm working on them. I think I can post a new version in the next > few days.
* When running AFTER ROW triggers in CopyMultiInsertBufferFlush(), the patch uses the slots passed to ExecForeignBatchInsert(), not the ones returned by the callback function, but I don't think that that is always correct, as the documentation about the callback function says: The return value is an array of slots containing the data that was actually inserted (this might differ from the data supplied, for example as a result of trigger actions.) The passed-in <literal>slots</literal> can be re-used for this purpose. postgres_fdw re-uses the passed-in slots, but other FDWs might not, so I modified the patch to reference the returned slots when running the AFTER ROW triggers. I also modified the patch to initialize the tts_tableOid. Attached is an updated patch, in which I made some minor adjustments to CopyMultiInsertBufferFlush() as well. * The patch produces incorrect error context information: create extension postgres_fdw; create server loopback foreign data wrapper postgres_fdw options (dbname 'postgres'); create user mapping for current_user server loopback; create table t1 (f1 int, f2 text); create foreign table ft1 (f1 int, f2 text) server loopback options (table_name 't1'); alter table t1 add constraint t1_f1positive check (f1 >= 0); alter foreign table ft1 add constraint ft1_f1positive check (f1 >= 0); — single insert copy ft1 from stdin; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> -1 foo >> 1 bar >> \. ERROR: new row for relation "t1" violates check constraint "t1_f1positive" DETAIL: Failing row contains (-1, foo). CONTEXT: remote SQL command: INSERT INTO public.t1(f1, f2) VALUES ($1, $2) COPY ft1, line 1: "-1 foo" — batch insert alter server loopback options (add batch_size '2'); copy ft1 from stdin; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> -1 foo >> 1 bar >> \. ERROR: new row for relation "t1" violates check constraint "t1_f1positive" DETAIL: Failing row contains (-1, foo). CONTEXT: remote SQL command: INSERT INTO public.t1(f1, f2) VALUES ($1, $2), ($3, $4) COPY ft1, line 3 In single-insert mode the error context information is correct, but in batch-insert mode it isn’t (i.e., the line number isn’t correct). The error occurs on the remote side, so I'm not sure if there is a simple fix. What I came up with is to just suppress error context information other than the relation name, like the attached. What do you think about that? (In CopyMultiInsertBufferFlush() your patch sets cstate->cur_lineno to buffer->linenos[i] even when running AFTER ROW triggers for the i-th row returned by ExecForeignBatchInsert(), but that wouldn’t always be correct, as the i-th returned row might not correspond to the i-th row originally stored in the buffer as the callback function returns only the rows that were actually inserted on the remote side. I think the proposed fix would address this issue as well.) * The patch produces incorrect row count in cases where some/all of the rows passed to ExecForeignBatchInsert() weren’t inserted on the remote side: create function trig_null() returns trigger as $$ begin return NULL; end $$ language plpgsql; create trigger trig_null before insert on t1 for each row execute function trig_null(); — single insert alter server loopback options (drop batch_size); copy ft1 from stdin; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> 0 foo >> 1 bar >> \. COPY 0 — batch insert alter server loopback options (add batch_size '2'); copy ft1 from stdin; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> 0 foo >> 1 bar >> \. COPY 2 The row count is correct in single-insert mode, but isn’t in batch-insert mode. The reason is that in batch-insert mode the row counter is updated immediately after adding the row to the buffer, not after doing ExecForeignBatchInsert(), which might ignore the row. To fix, I modified the patch to delay updating the row counter (and the progress of the COPY command) until after doing the callback function. For consistency, I also modified the patch to delay it even when batching into plain tables. IMO I think that that would be more consistent with the single-insert mode, as in that mode we update them after writing the tuple out to the table or sending it to the remote side. * I modified the patch so that when batching into foreign tables we skip useless steps in CopyMultiInsertBufferInit() and CopyMultiInsertBufferCleanup(). That’s all I have for now. Sorry for the delay. Best regards, Etsuro Fujita
v4-0001-Implementation-of-a-Bulk-COPY-FROM-efujita-2.patch
Description: Binary data