Hey,

So I finally found the culprit. It turned out to be transparent hugepages (THP) fighting with itself.

After running the following on Ubuntu:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

the load average instantly dropped from about 30 to 3. Keep in mind these writes don't survive a reboot, so the settings have to be re-applied every time the machine comes back up.
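One way to make that stick is a small systemd oneshot unit that re-runs the two writes at boot, something along these lines (the unit name and the Before=postgresql.service ordering are my own choices here; adjust for your setup):

# /etc/systemd/system/disable-thp.service
[Unit]
Description=Disable transparent hugepages
# order it before the database so the backends never see THP enabled;
# your PostgreSQL unit name may differ
Before=postgresql.service

[Service]
Type=oneshot
# the same two writes as above
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/defrag'
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

Then: systemctl daemon-reload && systemctl enable --now disable-thp.service. Booting with transparent_hugepage=never on the kernel command line also covers the "enabled" knob.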
Anyway, just wanted to give a follow-up on the issue in case anyone else is having the same problem (there's also a rough pgbouncer sketch at the very bottom, for the pooling idea Jeff mentions below).

On Mon, Mar 4, 2019 at 12:03 PM Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Wed, Feb 27, 2019 at 5:01 PM Michael Lewis <mle...@entrata.com> wrote:
>
>>> If those 50-100 connections are all active at once, yes, that is high. They can easily spend more time fighting each other over LWLocks, spinlocks, or cachelines rather than doing useful work. This can be exacerbated when you have multiple sockets rather than all cores in a single socket. And these problems are likely to present as high Sys times.
>>>
>>> Perhaps you can put up a connection pooler which will allow 100 connections to all think they are connected at once, but forces only 12 or so to actually be active at one time, making the others transparently queue.
>>
>> Can you expound on this or refer me to someplace to read up on this?
>
> Just based on my own experimentation. This is not a blanket recommendation, but specific to the situation where we already suspect there is contention and the server is too old to have the pg_stat_activity.wait_event column.
>
>> Context, I don't want to thread-jack though: I think I am seeing similar behavior in our environment at times, with queries that normally take seconds taking 5+ minutes under high load. I see many queries showing buffer_mapping as the LWLock type in snapshots, but don't know if that may be expected.
>
> It sounds like your processes are fighting to reserve buffers in shared_buffers in which to read data pages. But those data pages are probably already in the OS page cache, otherwise reading them from disk would be slow enough that you would be seeing some type of IO wait, or buffer_io rather than buffer_mapping as the dominant wait type. So I think that means you have most of your data in RAM, but not enough of it in shared_buffers. You might be in a rare situation where setting shared_buffers to a high fraction of RAM, rather than the usual low fraction, is called for. Increasing NUM_BUFFER_PARTITIONS might also be useful, but that requires recompiling the server. But do these spikes correlate with anything known at the application level? A change in the mix of queries, or a long report or maintenance operation? Maybe the query plans briefly toggle over to using seq scans rather than index scans, or vice versa, which drastically changes the block access patterns?
>
>> In our environment PgBouncer will accept several hundred connections and allow up to 100 at a time to be active on the database; the database servers are VMs with ~16 CPUs allocated (some more, some less, multi-tenant and manually sharded). It sounds like you are advocating for a connection maximum very close to the number of cores. I'd like to better understand the pros/cons of that decision.
>
> There are good reasons to allow more than that. For example, your application holds some transactions open briefly while it does some cogitation on the application side, rather than immediately committing and so returning the connection to the connection pool.
> Or your server has a very high IO capacity and benefits from lots of read requests in the queue at the same time, so it can keep every spindle busy and every rotation productive. But if you have no reason to believe that any of those situations apply to you, and you do have evidence of lock contention between processes, then I think that limiting the number of active processes to the number of cores is a good starting point.
>
> Cheers,
>
> Jeff

--
T: @Thaumion
IG: Thaumion
scot...@gmail.com
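P.S. For anyone who wants to try the pooling approach Jeff describes above (let a few hundred clients think they're connected while only roughly a core's worth are actually active), the relevant knobs in pgbouncer.ini look roughly like the sketch below. The host, database name, and numbers are placeholders for a ~12-core box, not values from this thread:

[databases]
; clients connect to "mydb" through pgbouncer, which opens the real server connections
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling lets many clients share a small set of server connections
pool_mode = transaction
; clients allowed to connect to pgbouncer at once
max_client_conn = 200
; server connections actually opened to Postgres per database/user pair;
; this is the "only 12 or so active at one time" knob
default_pool_size = 12

Keep in mind transaction pooling breaks session-level state (session prepared statements, temp tables, advisory locks held across transactions), so check the application before turning it on.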