Hey,
So I finally found the culprit. It turned out to be transparent huge pages
(THP) fighting with itself.

After running this on Ubuntu:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
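
You can double-check that it actually took with

cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag

(the value shown in brackets is the active one, e.g. "always madvise [never]")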

The load average instantly dropped from 30 to 3.

Also make sure you re-apply these settings on reboot, since they don't persist.
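
One way to do that (just a rough sketch; a systemd unit or tuned profile
works too, and details vary by distro) is to put the same two lines in a
boot script such as /etc/rc.local, if your system still honors it:

#!/bin/sh
# re-disable THP at every boot, before postgres starts
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
exit 0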

Anyway, just wanted to give a follow-up on the issue in case anyone else is
having the same problem.
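
P.S. For anyone who wants to try the connection pooler idea Jeff describes
below, a minimal pgbouncer setup along those lines might look roughly like
this (the database name and numbers are placeholders; size the pool to your
cores and workload):

; pgbouncer.ini (sketch)
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 200    ; clients that may stay connected
default_pool_size = 12   ; roughly one active backend per core

Clients then connect to port 6432 instead of 5432, and pgbouncer transparently
queues anything beyond the pool size.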

On Mon, Mar 4, 2019 at 12:03 PM Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Wed, Feb 27, 2019 at 5:01 PM Michael Lewis <mle...@entrata.com> wrote:
>
>> If those 50-100 connections are all active at once, yes, that is high.
>>> They can easily spend more time fighting each other over LWLocks,
>>> spinlocks, or cachelines rather than doing useful work.  This can be
>>> exacerbated when you have multiple sockets rather than all cores in a
>>> single socket.  And these problems are likely to present as high Sys times.
>>>
>>> Perhaps you can put up a connection pooler which will allow 100
>>> connections to all think they are connected at once, but forces only 12 or
>>> so to actually be active at one time, making the others transparently queue.
>>>
>>
>> Can you expound on this or refer me to someplace to read up on this?
>>
>
> Just based on my own experimentation.  This is not a blanket
> recommendation, but specific to the situation where we already suspect
> there is contention, and the server is too old to have the
> pg_stat_activity.wait_event column.
>
>
>> For context (I don't want to thread-jack, though): I think I am seeing
>> similar behavior in our environment, with queries that normally take
>> seconds taking 5+ minutes at times of high load. I see many queries showing
>> buffer_mapping as the LWLock type in snapshots, but don't know if that may
>> be expected.
>>
>
> It sounds like your processes are fighting to reserve buffers in
> shared_buffers in which to read data pages.  But those data pages are
> probably already in the OS page cache; otherwise reading them from disk would
> be slow enough that you would be seeing some type of IO wait, or buffer_io,
> rather than buffer_mapping as the dominant wait type.  So I think that
> means you have most of your data in RAM, but not enough of it in
> shared_buffers.  You might be in a rare situation where setting
> shared_buffers to a high fraction of RAM, rather than the usual low
> fraction, is called for.  Increasing NUM_BUFFER_PARTITIONS might also be
> useful, but that requires a recompilation of the server.  But do these
> spikes correlate with anything known at the application level?  A change in
> the mix of queries, or a long report or maintenance operation?  Maybe the
> query plans briefly toggle over to using seq scans rather than index scans
> or vice versa, which drastically changes the block access patterns?
>
>
>> In our environment PgBouncer will accept several hundred connections and
>> allow up to 100 at a time to be active on the databases, which are VMs with
>> ~16 CPUs allocated (some more, some less, multi-tenant and manually
>> sharded). It sounds like you are advocating for a connection max very close
>> to the number of cores. I'd like to better understand the pros/cons of that
>> decision.
>>
>
> There are good reasons to allow more than that.  For example, your
> application holds some transactions open briefly while it does some
> cogitation on the application side, rather than immediately committing and
> so returning the connection to the connection pool.  Or your server has a
> very high IO capacity and benefits from lots of read requests in the queue
> at the same time, so it can keep every spindle busy and every rotation
> productive.  But if you have no reason to believe that any of those
> situations apply to you, and do have evidence of lock contention
> between processes, then I think that limiting the number of active
> processes to the number of cores is a good starting point.
>
> Cheers,
>
> Jeff
>
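
(One more side note, on the shared_buffers suggestion above: the usual way to
test a bigger value is a one-line config change plus a restart; the size here
is only a placeholder, not a recommendation.)

# postgresql.conf (sketch)
shared_buffers = 16GB    # pick a value for your own RAM/workload; needs a restart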


-- 
T: @Thaumion
IG: Thaumion
scot...@gmail.com
