Re: [PATCH] Performance Improvement For Copy From Binary Files

Amit Langote Thu, 09 Jul 2020 20:22:11 -0700

Hi Bharath,

On Thu, Jul 9, 2020 at 7:33 PM Bharath Rupireddy
<bharath.rupireddyforpostg...@gmail.com> wrote:
> Thanks Amit for buying into the idea. I agree that your patch looks
> clean and simple compared to mine and I'm okay with your patch.
>
> I reviewed and tested your patch, below are few comments:


Thanks for checking it out.

> I think we can remove(and delete the function from the code) the
> CopyGetInt16() and have the code directly to save the function call
> cost. It gets called for each attribute/column for each row/tuple to
> just call CopyReadFromRawBuf() and set the byte order. From a
> readability perspective it's okay to have this function, but cost wise
> I feel no need for that function at all. In one of our earlier
> work(parallel copy), we observed that having a new function or few
> extra statements in this copy from path which gets hit for each row,
> incurs noticeable execution cost.
>
> The same way, we can also avoid using CopyGetInt32() function call in
> CopyReadBinaryAttribute() for the same reason stated above.

I agree that removing the function call overhead in this case is worth
the slight loss of readability.

> In CopyReadFromRawBuf(), can the "saveTo" parameter be named "dest"
> and use that with (char *) typecast directly, instead of having a
> local variable? Though it may/may not be a standard practice, let's
> have the parameter name all lower case to keep it consistent with
> other function's parameters in the copy.c file.

Agreed.

> Seems like there's a bug in below part of the code. Use case is
> simple, have some junk value at the end of the binary file, then with
> your patch the query succeeds, but it should report the below error.
> Here, on fld_count == -1 instead of reading from file, we must be
> reading it from the buffer, as we would have already read all the data
> from the file into the buffer.
>            if (cstate->copy_dest != COPY_OLD_FE &&
>                 CopyGetData(cstate, &dummy, 1, 1) > 0)
>                 ereport(ERROR,
>                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
>                          errmsg("received copy data after EOF marker")));
>
> I also tried with some intentionally corrupted binary datasets, (apart
> from above issue) patch works fine.

Yeah, I see the bug.  I should've checked all the call sites of
CopyGetData() and made sure there is only one left, that is,
CopyLoadRawBuffer().

> For the case where required nbytes may not fit into the buffer in
> CopyReadFromRawBuf, I'm sure this can happen only for field data,
> (field count , and field size are of fixed length and can fit in the
> buffer), instead of reading them in parts of buff size into the buffer
> (using CopyLoadRawBuf) and then DRAIN_COPY_RAW_BUF() to the
> destination, I think we can detect this condition using requested
> bytes and the buffer size and directly read from the file to the
> destination buffer and then reload the raw_buffer for further
> processing. I think this way, it will be good.

Hmm, I'm afraid that this will make the code more complex for
apparently small benefit.  Is this really that much of a problem
performance wise?

> I have few synthesized test cases where fields can be of larger size.
> I executed them on your patch, but didn't debug to see whether
> actually we hit the code where required nbytes can't fit in the entire
> buffer. I will try this on the next version of the patch.
>
> >
> >             HEAD    patched
> > foo5        8.5     6.5
> > foo10       14      10
> > foo20       25      18
> >
>
> Numbers might improve a bit, if we remove the extra function calls as
> stated above.

Here the numbers with the updated patch:

            HEAD    patched (v2)
foo5         8.5        6.1
foo10         14        9.4
foo20         25       16.7

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

CopyFrom-binary-buffering_v2.patch
Description: Binary data

Re: [PATCH] Performance Improvement For Copy From Binary Files

Reply via email to