On Sun, Jul 26, 2020 at 6:06 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > Amit Langote <amitlangot...@gmail.com> writes: > > [ v7-0001-Improve-performance-of-binary-COPY-FROM-with-buff.patch ] > > Pushed with cosmetic changes.
Thanks for that. > I'd always supposed that stdio does enough internal buffering that short > fread()s shouldn't be much worse than memcpy(). But I reproduced your > result of ~30% speedup for data with a lot of narrow text columns, using > RHEL 8.2. Thinking this an indictment of glibc, I also tried it on > current macOS, and saw an even bigger speedup, approaching 50%. So > there's definitely something to this. I wonder if we ought to check > other I/O-constrained users of fread and fwrite, like pg_dump/pg_restore. Ah, maybe a good idea to check that. > A point that I did not see addressed in the thread is whether this > has any negative impact on the copy-from-frontend code path, where > there's no fread() to avoid; short reads from CopyGetData() are > already going to be satisfied by memcpy'ing from the fe_msgbuf. > However, a quick check suggests that this patch is still a small > win for that case too --- apparently the control overhead in > CopyGetData() is not negligible. Indeed. > So the patch seems fine functionally, but there were some cosmetic > things I didn't like: > > * Removing CopyGetInt32 and CopyGetInt16 seemed like a pretty bad > idea, because it made the callers much uglier and more error-prone. > This is a particularly bad example: > > /* Header extension length */ > - if (!CopyGetInt32(cstate, &tmp) || > - tmp < 0) > + if (CopyReadBinaryData(cstate, (char *) &tmp, sizeof(tmp)) != > + sizeof(tmp) || (tmp = (int32) pg_ntoh32(tmp)) < 0) > > Putting side-effects into late stages of an if-condition is just > awful coding practice. They're easy for a reader to miss and they > are magnets for bugs, because of the possibility that control doesn't > reach that part of the condition. > > You can get the exact same speedup without any of those disadvantages > by marking these two functions "inline", so that's what I did. > > * I dropped the DRAIN_COPY_RAW_BUF macro too, as in my estimation it was > a net negative for readability. With only two use-cases, having it made > the code longer not shorter; I was also pretty unconvinced about the > wisdom of having some of the loop's control logic inside the macro and > some outside. > > * BTW, the macro definitions weren't particularly per project style > anyway. We generally put at least one space before line-ending > backslashes. I don't think pgindent will fix this for you; IME > it doesn't touch macro definitions at all. > > * Did some more work on the comments. Thanks for these changes. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com