> > > This is due to the recent commit > > cd22d3cdb9bd9963c694c01a8c0232bbae3ddcfb, in which we restricted the > > raw_buf and line_buf allocations for binary files. Since we are using > > raw_buf for this performance improvement feature, now, it's enough to > > restrict only line_buf for binary files. I made the changes > > accordingly in the v3 patch attached here. > > > > Regression tests(make check & make check-world) ran cleanly with the v3 > > patch. > > Thank you Bharath. I was a bit surprised that you had also submitted > a patch to NOT allocate raw_buf for COPY FROM ... BINARY. :-) >
Yes that was by me, before I started to work on this feature. I think we can backpatch that change(assuming we don't backpatch this feature), I will make the request accordingly. Anyways, now we don't allow line_buf allocation for binary files, which is also a good thing. > > > > > For the case where required nbytes may not fit into the buffer in > > > > CopyReadFromRawBuf, I'm sure this can happen only for field data, > > > > (field count , and field size are of fixed length and can fit in the > > > > buffer), instead of reading them in parts of buff size into the buffer > > > > (using CopyLoadRawBuf) and then DRAIN_COPY_RAW_BUF() to the > > > > destination, I think we can detect this condition using requested > > > > bytes and the buffer size and directly read from the file to the > > > > destination buffer and then reload the raw_buffer for further > > > > processing. I think this way, it will be good. > > > > > > Hmm, I'm afraid that this will make the code more complex for > > > apparently small benefit. Is this really that much of a problem > > > performance wise? > > > > > > > Yes it makes CopyReadFromRawBuf(), code a bit complex from a > > readability perspective. I'm convinced not to have the > > abovementioned(by me) change, due to 3 reasons,1) the > > readability/understandability 2) how many use cases can we have where > > requested field size greater than RAW_BUF_SIZE(64KB)? I think very few > > cases. I may be wrong here. 3) Performance wise it may not be much as > > we do one extra memcpy only in situations where field sizes are > > greater than 64KB(as we have already seen and observed by you as well > > in one of the response [1]) that memcpy cost for this case may be > > negligible. > > Actually, an extra memcpy is incurred on every call of > CopyReadFromRawBuf(), but I haven't seen it to be very problematic. > Yes. > > CopyReadFromRawBuf as a name for the new function might be misleading > as long as we are only using it for binary data. Maybe > CopyReadBinaryData is more appropriate? See attached v4 with these > and a few other cosmetic changes. > CopyReadBinaryData() looks meaningful. +1. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com