On 14 September 2017 at 07:34, Andres Freund <and...@anarazel.de> wrote:
> Hi,
>
> When running workloads that include fast queries with a lot of columns,
> SendRowDescriptionMessage(), and the routines it calls, become a
> bottleneck. Besides syscache lookups (see [1] and [2]), a major cost is
> manipulation of the StringBuffer and the version-specific branches in
> the per-attribute loop. As so often, the performance differential of
> this patch gets bigger when the other performance patches are applied.
>
> The issues in SendRowDescriptionMessage() are the following:
>
> 1) All the pq_sendint calls, and also pq_sendstring, are expensive.
>    The amount of calculation done for a single 2/4-byte addition to
>    the stringbuffer (particularly enlargeStringInfo()) is problematic,
>    as are the reallocations themselves.
>
>    I addressed this by adding pq_send*_pre() wrappers, implemented as
>    inline functions, that require that memory is pre-allocated.
>    Combining that with doing an enlargeStringInfo() in
>    SendRowDescriptionMessage() that pre-allocates the maximum required
>    memory yields a pretty good speedup.
>
>    I'm not yet super sure about the implementation. For one, I'm not
>    sure these shouldn't instead be stringinfo.h functions, with very
>    tiny pqformat.h wrappers. But conversely, I think it'd make a lot
>    of sense for the pqformat integer functions to get rid of the
>    continually maintained trailing null byte - I was hoping the
>    compiler could optimize that away, but alas, no luck. As soon as a
>    single integer is sent, you can't rely on 0-terminated strings
>    anyway.
>
> 2) It creates a new StringInfo in every iteration. That causes
>    noticeable memory-management overhead, and restarts the size of the
>    buffer every time. When the description is relatively large, that
>    leads to a number of reallocations for every
>    SendRowDescriptionMessage() call.
>    My solution here was to create a persistent StringInfo for
>    SendRowDescriptionMessage() that never gets deallocated (just
>    reset). That, in combination with new versions of
>    pq_beginmessage/endmessage that keep the buffer alive, yields a
>    nice speedup.
>
>    Currently I'm using a static variable to allocate a string buffer
>    for the function. It'd probably be better to manage that outside of
>    a single function - I'm also wondering why we're allocating a good
>    number of stringinfos in various places, rather than reusing them.
>    Most of them can't be entered recursively, and even if that's a
>    concern, we could have one reusable buffer per portal or such.
>
> 3) The v2/v3 branches in the attribute loop are noticeable (others
>    too, but well...). I solved that by splitting out the v2 and v3
>    per-attribute loops into separate functions. Imo that's also a good
>    chunk more readable.
>
> Comments?
I've run a fairly basic test with a table with 101 columns, selecting a
single row from the table, and I get the following results. All runs used
pgbench against /tmp/test.sql (simple query mode, scaling factor 1,
80 clients, 80 threads, i.e. 80 jobs and 80 connections):

Columns with 1-character names:

          duration   transactions   avg latency   tps (incl. conn.)   tps (excl. conn.)
master      60 s        559275       8.596 ms       9306.984058         11144.622096
patched     60 s        585526       8.210 ms       9744.240847         11454.339301
master     120 s       1108312       8.668 ms       9229.356994          9987.385281
patched    120 s       1130313       8.499 ms       9412.904876         10319.404302

Columns with at least 42-character names:

          duration   transactions   avg latency   tps (incl. conn.)   tps (excl. conn.)
master      60 s        197815      24.308 ms       3291.157486          3825.309489
patched     60 s        198549      24.214 ms       3303.896651          3895.738024
master     120 s        391312      24.551 ms       3258.493026          3497.581844
patched    120 s        391733      24.526 ms       3261.805060          3552.604408

This is just using the patches you posted on this thread. Does this
exercise the patch in the way you intended?

Regards

Thom

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers