On 09/13/2013 12:55 PM, Merlin Moncure wrote: > what are the specific symptoms of your problem? anything interesting > in pg_locks? is $client willing to experiment with custom patches?
3 servers: 1 master, two replicas. 32-core Xeon, hyperthreaded to 64 cores 512GB RAM each s_b set to 8GB Load-balanced between all 3 ~~ 11 different databases combined database size around 600GB using pgbouncer Irregularly, during periods of high activity (although not necessarily peak activity) one or another of the systems will go into paralysis, with all backends apparently waiting on LWLocks (we need more tracing information to completely confirm this). Activity at this time is usually somewhere between 50 and 100 concurrent queries (and 80 to 150 connections). pg_locks doesn't think anything is waiting on a lock. What's notable is that sometimes it's just *one* of the replicas which goes into paralysis. If the master gets this issue though, the replicas experience it soon afterwards. Increasing wal_buffers from 16GB to 64GB seems to make this issue happen less frequently, but it doesn't go away entirely. Only a restart of the server, or killing all backend, ends the lockup. The workload is OLTP, essentially, around 20/80 write/read. They use PostGIS. The other notable thing about their workload is that due to an ORM defect, they get idle-in-transactions which last from 5 to 15 seconds several times a minute. They are willing to use experimental patches, but only if those patches can be applied only to a replica. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers