Hi Bharath, On Mon, Jun 29, 2020 at 2:21 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote: > For Copy From Binary files, there exists below information for each tuple/row. > 1. field count(number of columns) > 2. for every field, field size(column data length) > 3. field data of field size(actual column data) > > Currently, all the above data required at each step is read directly from > file using fread() and this happens for all the tuples/rows. > > One observation is that in the total execution time of a copy from binary > file, the fread() call is taking upto 20% of time and the fread() function > call count is also too high. > > For instance, with a dataset of size 5.3GB, 10million tuples with 10 columns, > total exec time in sec total time taken for fread() fread() function call > count > 101.193 21.33 210000005 > 101.345 21.436 210000005 > > The total time taken for fread() and the corresponding function call count > may increase if we have more number of columns for instance 1000. > > One solution to this problem is to read data from binary file in > RAW_BUF_SIZE(64KB) chunks to avoid repeatedly calling fread()(thus possibly > avoiding few disk IOs). This is similar to the approach followed for csv/text > files.
I agree that having the buffer in front of the file makes sense, although we do now have an extra memcpy, that is, from raw_buf to attribute_buf.data. Currently, fread() reads directly into attribute_buf.data. But maybe that's okay as I don't see the new copy being all that bad. > Attaching a patch, implementing the above solution for binary format files. > > Below is the improvement gained. > total exec time in sec total time taken for fread() fread() function call > count > 75.757 2.73 160884 > 75.351 2.742 160884 > > Execution is 1.36X times faster, fread() time is reduced by 87%, fread() call > count is reduced by 99%. > > Request the community to take this patch for review if this approach and > improvement seem beneficial. > > Any suggestions to improve further are most welcome. Noticed the following misbehaviors when trying to test the patch: create table foo5 (a text, b text, c text, d text, e text); insert into foo5 select repeat('a', (random()*100)::int), 'bbb', 'cc', 'd', 'eee' from generate_series(1, 10000000); copy foo5 to '/tmp/foo5.bin' binary; truncate foo5; copy foo5 from '/tmp/foo5.bin' binary; ERROR: unexpected EOF in COPY data CONTEXT: COPY foo5, line 33, column a create table bar (a numeric); insert into bar select sqrt(a) from generate_series(1, 10000) a; copy bar to '/tmp/bar.bin' binary; copy bar from '/tmp/bar.bin' binary; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. Trying to figure what was going wrong in each of these cases, I found the new code a bit hard to navigate and debug :(. Here are a couple of points that I think could have made things a bit easier: * Avoid spreading the new buffering logic in multiple existing functions, with similar pieces of code repeated in multiple places. I would add a single new function that takes care of the various buffering details and call it where CopyGetData() is being used currently. * You could've reused CopyLoadRawBuffer()'s functionality instead of reimplementing it. I also see multiple instances of special case handling, which often suggests that bugs are lurking. Considering these points, I came up with the attached patch with a much smaller footprint. As far as I can see, it implements the same basic idea as your patch. With it, I was able to see an improvement in loading time consistent with your numbers. I measured the time of loading 10 million rows into tables with 5, 10, 20 text columns as follows: create table foo5 (a text, b text, c text, d text, e text); insert into foo5 select repeat('a', (random()*100)::int), 'bbb', 'cc', 'd', 'eee' from generate_series(1, 10000000); copy foo5 to '/tmp/foo5.bin' binary; truncate foo5; copy foo5 from '/tmp/foo5.bin' binary; create table foo10 (a text, b text, c text, d text, e text, f text, g text, h text, i text, j text); insert into foo10 select repeat('a', (random()*100)::int), 'bbb', 'cc', 'd', 'eee', 'f', 'gg', 'hh', 'i', 'jjj' from generate_series(1, 10000000); copy foo10 to '/tmp/foo10.bin' binary; truncate foo10; copy foo10 from '/tmp/foo10.bin' binary; create table foo20 (a text, b text, c text, d text, e text, f numeric, g text, h text, i text, j text, k text, l text, m text, n text, o text, p text, q text, r text, s text, t text); insert into foo20 select repeat('a', (random()*100)::int), 'bbb', 'cc', 'd', 'eee', '123.456', 'gg', 'hh', 'ii', 'jjjj', 'kkk', 'llll', 'mm', 'n', 'ooooo', 'pppp', 'q', 'rrrrr', 'ss', 'tttttttttttt' from generate_series(1, 10000000); copy foo20 to '/tmp/foo20.bin' binary; truncate foo20; copy foo20 from '/tmp/foo20.bin' binary; The median times for the COPY FROM commands above, with and without the patch, are as follows: HEAD patched foo5 8.5 6.5 foo10 14 10 foo20 25 18 A few more points to remember in the future: * Commenting style: + /* If readbytes are lesser than the requested bytes, then initialize the + * remaining bytes in the raw_buf to 0. This will be useful for checking + * error "received copy data after EOF marker". + */ Multi-line comments are started like this: /* * <Start here> */ * As also mentioned above, it's a good idea in general to avoid having special cases like these in the code: + if (cstate->cur_lineno == 1) { - /* EOF detected (end of file, or protocol-level EOF) */ - return false; + /* This is for the first time, so read in buff size amount + * of data from file. + */ ... + + /* Move bytes can either be 0, 1, or 2. */ + movebytes = RAW_BUF_SIZE - cstate->raw_buf_index; ... + uint8 movebytes = 0; + + /* Move bytes can either be 0, 1, 2, 3 or 4. */ + movebytes = RAW_BUF_SIZE - cstate->raw_buf_index; * Please try to make variable names short if you can or follow the guidelines around long names: + int32 remainingbytesincurrdatablock = RAW_BUF_SIZE - cstate->raw_buf_index; Maybe, remaining_bytes would've sufficed here, because "in the current data block" might be clear to most readers by looking at the surrounding code. * The above point also helps avoid long code lines that don't fit within 78 characters, like these: + memcpy(&cstate->attribute_buf.data[0], &cstate->raw_buf[cstate->raw_buf_index], remainingbytesincurrdatablock); + + if (CopyGetData(cstate, &cstate->attribute_buf.data[remainingbytesincurrdatablock], + (fld_size - remainingbytesincurrdatablock), (fld_size - remainingbytesincurrdatablock)) != (fld_size - remainingbytesincurrdatablock)) -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
CopyFrom-binary-buffering.patch
Description: Binary data