On 2022-08-25 01:54, Damir Belyalov wrote:
+ /* Buffer was filled, commit subtransaction and prepare to replay */
+ ReleaseCurrentSubTransaction();
What is actually being committed by this ReleaseCurrentSubTransaction()?
It seems to me that nothing but safecstate->replay_buffer has been
filled before this commit.
All tuples are collected in replay_buffer, which is created in the
'oldcontext' of CopyFrom() (not in the subtransaction's context). That's
why it isn't cleaned up when RollbackAndReleaseCurrentSubTransaction()
is called.
Subtransactions are needed for better safety. No error occurs while
copying rows from the file into replay_buffer, but an error may occur at
the next stage, when a tuple is inserted into the table. There may also
be other errors that are not obvious at first glance.
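To illustrate the lifetime issue, here is a minimal sketch (not the
patch's exact code; BeginInternalSubTransaction() and friends are the
real xact.h APIs, the surrounding names are schematic):
```
/* replay_buffer is allocated in CopyFrom()'s context, outside any
 * subtransaction, so a subtransaction rollback does not free it. */
oldcontext = CurrentMemoryContext;
safecstate->replay_buffer = (HeapTuple *)
    MemoryContextAlloc(oldcontext,
                       REPLAY_BUFFER_SIZE * sizeof(HeapTuple));

/* Each batch of inserts runs in its own subtransaction. */
BeginInternalSubTransaction(NULL);
PG_TRY();
{
    /* ... insert the buffered tuples into the table ... */
    ReleaseCurrentSubTransaction();     /* commit the batch */
}
PG_CATCH();
{
    /* An insert failed: undo the batch.  replay_buffer survives
     * because it lives in oldcontext. */
    RollbackAndReleaseCurrentSubTransaction();
}
PG_END_TRY();
```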
Thanks for the explanation and the updated patch.
I now understand that the data being COPYed are not the target of the
COMMIT.
In past discussions[1] the data to be COPYed seem to have been subject
to COMMIT as well, but I understand this patch has adopted a different
design.
+ /* Buffer was filled, commit subtransaction and prepare to replay */
+ ReleaseCurrentSubTransaction();
+ CurrentResourceOwner = safecstate->oldowner;
+
+ safecstate->replay_is_active = true;
BTW in the v4 patch, rows are loaded into the buffer one by one, and
when the buffer fills up, the rows in it are 'replayed', also one by
one, right?
Wouldn't this have more overhead than a normal COPY?
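For reference, the per-row flow I am picturing is roughly this
(illustrative only; NextCopyFrom() and heap_form_tuple() are real APIs,
everything else here is made up):
```
for (;;)
{
    /* read one row from the input file */
    if (!NextCopyFrom(cstate, econtext, values, nulls))
        break;

    /* stash it in the replay buffer, one tuple at a time */
    replay_buffer[nbuffered++] = heap_form_tuple(tupdesc, values, nulls);

    if (nbuffered == REPLAY_BUFFER_SIZE)
    {
        /* commit the filling subtransaction, then insert the
         * buffered tuples into the table, again one by one */
        replay_buffered_tuples();       /* hypothetical helper */
        nbuffered = 0;
    }
}
```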
As a test, I COPYed slightly larger data with and without the
ignore_errors option.
There might be other causes, but I did find a performance difference.
```
=# copy test from '/tmp/10000000.data' ;
COPY 10000000
Time: 6001.325 ms (00:06.001)
=# copy test from '/tmp/10000000.data' with (ignore_errors);
NOTICE: FIND 0 ERRORS
COPY 10000000
Time: 7711.555 ms (00:07.712)
```
I feel it might be better to have it as a parameter rather than fixed
at 1000.
Yes, I thought about that too, so I created another version with a GUC
parameter, replay_buffer_size, and attached it. But I think users won't
need to change replay_buffer_size.
Also, replay_buffer does the same thing as the MultiInsert buffer does,
and the MultiInsert buffer size is defined by a constant of 1000.
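For reference, an in-core integer GUC entry in guc.c would look roughly
like this (the group, description, and limits below are my assumptions,
not taken from the attached patch):
```
{
    {"replay_buffer_size", PGC_USERSET, CLIENT_CONN_STATEMENT,
        gettext_noop("Sets the number of rows buffered by COPY ... "
                     "(ignore_errors) before they are replayed."),
        NULL
    },
    &replay_buffer_size,
    1000, 1, INT_MAX,
    NULL, NULL, NULL
},
```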
Yeah, when the data being COPYed are not the target of COMMIT, I also
think users won't need to change it.
NextCopyFrom() is in copyfromparse.c while safeNextCopyFrom() is in
copyfrom.c.
Since safeNextCopyFrom() is an analog of NextCopyFrom(), as the comment
says, wouldn't it be natural to put them in the same file?
Sure, corrected it.
Thanks.
+ safecstate->errors++;
+ if (safecstate->errors <= 100)
+     ereport(WARNING,
+             (errcode(errdata->sqlerrcode),
+              errmsg("%s", errdata->context)));
It would be better to state in the manual that errors beyond the first
100 are not displayed.
Or, it might be acceptable to output all errors, including those past
100.
It would be too complicated to create a new parameter just to control
how many errors are shown. I think 100 is an optimal size.
Yeah, I may have misled you, but I also don't think this needs a new
parameter.
I just thought either of the following would be better (see the sketch
below):
- Document that warnings are not output past the first 100 errors.
- Output all the warnings.
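For the first option, something like this at the end of the COPY could
also make the cutoff visible (just a sketch; the message wording is
mine):
```
/* Hypothetical summary once the COPY has finished. */
if (safecstate->errors > 100)
    ereport(NOTICE,
            (errmsg("%d more error(s) were not reported individually",
                    safecstate->errors - 100)));
```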
It would be helpful to add comments about skip_row, etc.
Corrected it.
Thanks.
WARNING: FIND 0 ERRORS
When there were no errors, this WARNING seems unnecessary.
Corrected it.
Thanks.
I applied the v4 patch, and when I canceled the COPY, there was a case
where I found myself left inside a transaction.
Should this situation be prevented from occurring?
```
=# copy test from '/tmp/10000000.data' with (ignore_errors );
^CCancel request sent
ERROR: canceling statement due to user request
=# truncate test;
ERROR: current transaction is aborted, commands ignored until end of
transaction block
```
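I have not dug into the cause, but my guess is that the subtransaction
opened for the buffer is not rolled back when the cancel error escapes.
A guard of roughly this shape is what I would expect (sketch only, an
assumption on my part):
```
PG_CATCH();
{
    /* If any error (including a query cancel) escapes while the
     * buffering subtransaction is open, roll it back before
     * propagating the error, so the session is not left inside
     * an aborted transaction. */
    RollbackAndReleaseCurrentSubTransaction();
    CurrentResourceOwner = safecstate->oldowner;
    PG_RE_THROW();
}
PG_END_TRY();
```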
[1]
https://www.postgresql.org/message-id/1197677930.1536.18.camel%40dell.linuxdev.us.dell.com
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION