Our set up:

*         Db version: postgres 9.3.3

*         OS: CentOS 6.5

*         kernel Version - Linux 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 
21:39:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

*         cpu - 24 proc

*         memory - 768 GB

*         The disks are SAN fiber.

*         We have streaming replication set up from this server to the 
secondary server.

*         Some of the DB parameters:
max_connections  - 1500
shared_buffers - 4GB
work_mem = 130MB
 maintenance_work_mem = 1GB

Issue this morning:
At 9:04 - FATAL:  remaining connection slots are reserved for non-replication 
superuser connections
These were the things we noticed and have question on each one of them:
pg_stat_activity showed a total of 850 connections but a 'ps -ef|grep postgers' 
showed 1500 connections and DB wasn't accepting new connections. How is it 
possible to have such discrepancy in count between pg_stat_activity and ps?
200 were in waiting status, 50 were in authentication status, 748 were in IDLE 
status and the rest were all doing some DML operations. Why will 50 connections 
be in authentication status in the database and stay there?
 Some existing sessions were completing the sqls successfully (though not as 
fast). So, the database was not completely hung or unresponsive.
Trying to kill the session at the DB level (pg_terminate_backend) were just 
hanging. Why would this be the case?
During this time, server had a very high load average of 130 (normally we have 
< 5).
We had to restart the database to clear it all up.
 Note:
We are having this issue of max_connections being reached for a couple of 
months now. At one point, we tried increasing the max_connections to 2500 to 
see if we will still run out of connections, and we did.
We noticed that generic-segmentation-offload was set to ON on this server. We 
changed it to OFF.

Recent Changes in the environment:

*         We turned off THP on the server a couple of months ago to see if that 
will fix the issue.

*         We recently (2 weeks ago) moved from RedHat ext 2 to xfs because we 
were having the same problem.

Please let us know if you need more information on our environment or on the 
issue this morning. Any help will be greatly appreciated.

Thank you,
________________
Ramya Vasudevan
Database Administrator
CLASSMATES

Reply via email to