On Fri, Jan 10, 2014 at 12:19 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Preston Hagar <prest...@gmail.com> writes:
>> tl;dr: Moved from 8.3 to 9.3 and are now getting out of memory errors
>> despite the server now having 32 GB instead of 4 GB of RAM and the
>> workload and number of clients remaining the same.
>
>> Here are a couple of examples from the incident we had this morning:
>> 2014-01-10 06:14:40 CST 30176 LOG: could not fork new process for
>> connection: Cannot allocate memory
>> 2014-01-10 06:14:40 CST 30176 LOG: could not fork new process for
>> connection: Cannot allocate memory
>
> That's odd ... ENOMEM from fork() suggests that you're under system-wide
> memory pressure.
>
>> [ memory map dump showing no remarkable use of memory at all ]
>> 2014-01-10 06:18:46 CST 10.1.1.6 16669 [unknown] production
>> 10.1.1.6(36680)ERROR: out of memory
>> 2014-01-10 06:18:46 CST 10.1.1.6 16669 [unknown] production
>> 10.1.1.6(36680)DETAIL: Failed on request of size 500.
>
> I think that what you've got here isn't really a Postgres issue, but
> a system-level configuration issue: the kernel is being unreasonably
> stingy about giving out memory, and it's not clear why.
>
> It might be worth double-checking that the postmaster is not being
> started under restrictive ulimit settings; though offhand I don't
> see how that theory could account for fork-time failures, since
> the ulimit memory limits are per-process.
>
> Other than that, you need to burrow around in the kernel settings
> and see if you can find something there that's limiting how much
> memory it will give to Postgres. It might also be worth watching
> the kernel log when one of these problems starts. Plain old "top"
> might also be informative as to how much memory is being used.

Thanks for the response. I think it might have been the lack of a swapfile
(I replied as such in another response).
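
In case it helps anyone else reading along, here is roughly what we are
checking on our end based on your suggestions. This is only a sketch; the
PID below is just an example postmaster PID, substitute your own (it is the
first line of $PGDATA/postmaster.pid):

# Example postmaster PID -- substitute the real one.
PGPID=30176

# The limits the postmaster was actually started with, to rule out a
# restrictive ulimit inherited from an init script or wrapper shell.
cat /proc/$PGPID/limits

# Watch the kernel log and overall memory while one of the incidents happens.
dmesg | tail -n 50
free -m
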
That said, we have been using this site as a guide to try to figure things
out about postgres and memory:

http://www.depesz.com/2012/06/09/how-much-ram-is-postgresql-using/

We came up with the following for all our current processes (we aren't out
of memory and new connections are being accepted right now, but memory
seems low):

1. List of RSS usage for all postgres processes:

http://pastebin.com/J7vy846k

2. List of all memory segments for the postgres checkpoint process (pid 30178):

grep -B1 -E '^Size: *[0-9]{6}' /proc/30178/smaps
7f208acec000-7f2277328000 rw-s 00000000 00:04 31371473   /dev/zero (deleted)
Size:            8067312 kB

3. Info on the largest memory allocation for the postgres checkpoint
process. It is using 5 GB of RAM privately.

cat /proc/30178/smaps | grep 7f208acec000 -B 0 -A 20

Total RSS: 11481148
7f208acec000-7f2277328000 rw-s 00000000 00:04 31371473   /dev/zero (deleted)
Size:            8067312 kB
Rss:             5565828 kB
Pss:             5284432 kB
Shared_Clean:          0 kB
Shared_Dirty:     428840 kB
Private_Clean:         0 kB
Private_Dirty:   5136988 kB
Referenced:      5559624 kB
Anonymous:             0 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
7f2277328000-7f22775f1000 r--p 00000000 09:00 2889301    /usr/lib/locale/locale-archive
Size:               2852 kB
Rss:                   8 kB
Pss:                   0 kB
Shared_Clean:          8 kB
Shared_Dirty:          0 kB

If I am understanding all this correctly, the postgres checkpoint process
has around 5 GB of RAM "Private_Dirty" allocated (not shared buffers). Is
this normal? Any thoughts as to why this would get so high?

I'm still trying to dig in further to figure out exactly what is going on.
We are running on Ubuntu 12.04.3 (kernel 3.5.0-44). We set
vm.overcommit_memory = 2 but didn't have a swap partition; we have since
added one and are seeing if that helps.

>> We had originally copied our shared_buffers, work_mem, wal_buffers and
>> other similar settings from our old config, but after getting the memory
>> errors have tweaked them to the following:
>>
>> shared_buffers = 7680MB
>> temp_buffers = 12MB
>> max_prepared_transactions = 0
>> work_mem = 80MB
>> maintenance_work_mem = 1GB
>> wal_buffers = 8MB
>> max_connections = 350
>
> That seems like a dangerously large work_mem for so many connections;
> but unless all the connections were executing complex queries, which
> doesn't sound to be the case, that isn't the immediate problem.

Thanks for the heads up. We had originally arrived at that value using
pgtune with, I think, 250 connections, and I forgot to lower work_mem when
I upped the connections. I now have it set to 45 MB; does that seem more
reasonable?

>> The weird thing is that our old server had 1/8th the RAM, was set to
>> max_connections = 600 and had the same clients connecting in the same
>> way to the same databases and we never saw any errors like this in the
>> several years we have been using it.
>
> This reinforces the impression that something's misconfigured at the
> kernel level on the new server.
>
>                        regards, tom lane

Forgot to copy the list on the reply, so I am sending it here.
Thanks for your help and time.

Preston
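
P.S. For anyone else following the thread, this is roughly how we are now
watching the overcommit accounting while we test the new swap partition. It
is only a sketch, and the ~16 GB figure below assumes the default
vm.overcommit_ratio of 50, which we have not changed:

# With vm.overcommit_memory = 2, the kernel stops handing out memory once
# total commitments reach roughly:
#   CommitLimit = SwapTotal + MemTotal * vm.overcommit_ratio / 100
# On a 32 GB box with no swap and the default ratio of 50, that is only
# about 16 GB, which shared_buffers plus a few hundred backends can
# plausibly hit -- consistent with fork() failing with "Cannot allocate
# memory".
sysctl vm.overcommit_memory vm.overcommit_ratio
grep -E 'MemTotal|SwapTotal|CommitLimit|Committed_AS' /proc/meminfo
# If Committed_AS is close to CommitLimit when the errors show up, that
# would seem to be the smoking gun.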