On Fri, Jan 10, 2014 at 12:19 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
>> Preston Hagar <prest...@gmail.com> writes:
>> >>> tl;dr: Moved from 8.3 to 9.3 and are now getting out of memory errors
>> >>> despite the server now having 32 GB instead of 4 GB of RAM and the
>> >>> workload and number of clients remaining the same.
>>
>> > Here are a couple of examples from the incident we had this morning:
>> > 2014-01-10 06:14:40 CST  30176    LOG:  could not fork new process for
>> > connection: Cannot allocate memory
>> > 2014-01-10 06:14:40 CST  30176    LOG:  could not fork new process for
>> > connection: Cannot allocate memory
>>
>> That's odd ... ENOMEM from fork() suggests that you're under system-wide
>> memory pressure.
>>
>> > [ memory map dump showing no remarkable use of memory at all ]
>> > 2014-01-10 06:18:46 CST 10.1.1.6 16669 [unknown] production
>> >  10.1.1.6(36680)ERROR:  out of memory
>> > 2014-01-10 06:18:46 CST 10.1.1.6 16669 [unknown] production
>> >  10.1.1.6(36680)DETAIL:  Failed on request of size 500.
>>
>> I think that what you've got here isn't really a Postgres issue, but
>> a system-level configuration issue: the kernel is being unreasonably
>> stingy about giving out memory, and it's not clear why.
>>
>> It might be worth double-checking that the postmaster is not being
>> started under restrictive ulimit settings; though offhand I don't
>> see how that theory could account for fork-time failures, since
>> the ulimit memory limits are per-process.
>>
>> Other than that, you need to burrow around in the kernel settings
>> and see if you can find something there that's limiting how much
>> memory it will give to Postgres.  It might also be worth watching
>> the kernel log when one of these problems starts.  Plain old "top"
>> might also be informative as to how much memory is being used.
>>
>
Thanks for the response.  I think it might have been the lack of a
swapfile (I replied as such in another response).
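
In case it helps anyone else hitting this, here is roughly what we are
checking now, following Tom's suggestions plus the overcommit angle (we are
running with vm.overcommit_memory = 2, mentioned further down).  With
overcommit_memory = 2 and no swap, the kernel's CommitLimit works out to
roughly RAM * vm.overcommit_ratio / 100 (50% by default) plus swap, so fork()
can start failing with ENOMEM well before physical memory is exhausted.  This
is only a sketch; <postmaster_pid> is a placeholder for the pid from
postmaster.pid:

  # effective ulimits of the running postmaster
  cat /proc/<postmaster_pid>/limits

  # overcommit settings and the kernel's commit accounting
  sysctl vm.overcommit_memory vm.overcommit_ratio
  grep -E 'CommitLimit|Committed_AS' /proc/meminfo

  # confirm the new swap is actually active
  swapon -s
  free -m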


> That said, we have been using this site as a guide to try to figure things
> out about postgres and memory:
>
> http://www.depesz.com/2012/06/09/how-much-ram-is-postgresql-using/
>
> We came up with the following for all our current processes (we aren't out
> of memory and new connections are being accepted right now, but free
> memory looks low):
>
> 1. List of RSS usage for all postgres processes:
>
> http://pastebin.com/J7vy846k
>
> 2. Memory segments of at least 100000 kB for the postgres checkpoint
> process (pid 30178):
>
> grep -B1 -E '^Size: *[0-9]{6}' /proc/30178/smaps
> 7f208acec000-7f2277328000 rw-s 00000000 00:04 31371473
> /dev/zero (deleted)
> Size:            8067312 kB
>
> 3. Info on largest memory allocation for postgres checkpoint process. It
> is using 5GB of RAM privately.
>
> grep -A 20 7f208acec000 /proc/30178/smaps
>
> Total RSS: 11481148
> 7f208acec000-7f2277328000 rw-s 00000000 00:04 31371473
> /dev/zero (deleted)
> Size:            8067312 kB
> Rss:             5565828 kB
> Pss:             5284432 kB
> Shared_Clean:          0 kB
> Shared_Dirty:     428840 kB
> Private_Clean:         0 kB
> Private_Dirty:   5136988 kB
> Referenced:      5559624 kB
> Anonymous:             0 kB
> AnonHugePages:         0 kB
> Swap:                  0 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Locked:                0 kB
> 7f2277328000-7f22775f1000 r--p 00000000 09:00 2889301
>  /usr/lib/locale/locale-archive
> Size:               2852 kB
> Rss:                   8 kB
> Pss:                   0 kB
> Shared_Clean:          8 kB
> Shared_Dirty:          0 kB
>
> If I am understanding all this correctly, the postgres checkpoint process
> has around 5GB of RAM "Private_Dirty" allocated (not shared buffers).  Is
> this normal?  Any thoughts as to why this would get so high?
>
> I'm still trying to dig in further to figure out exactly what is going on.
> We are running on Ubuntu 12.04.3 (kernel 3.5.0-44).  We had set
> vm.overcommit_memory = 2 but didn't have a swap partition; we have since
> added one and are seeing if that helps.
>
>
>
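
A side note on the listing in item 1 above: summing plain RSS double-counts
shared_buffers once per backend, so the numbers are easier to interpret if
you sum Pss from smaps instead (Pss divides each shared page by the number of
processes mapping it).  A rough sketch of what we are running, assuming the
cluster runs as the "postgres" OS user:

  for pid in $(pgrep -u postgres postgres); do
      awk -v pid="$pid" '/^Pss:/ {sum += $2}
                         END {printf "%s %d kB\n", pid, sum}' /proc/"$pid"/smaps
  done | sort -k2 -n

The last lines of that output are the processes with the largest "fair share"
of memory, which tracks what the box is actually using better than raw RSS.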
>>
>> >> We had originally copied our shared_buffers, work_mem, wal_buffers and
>> >> other similar settings from our old config, but after getting the
>> >> memory errors have tweaked them to the following:
>> >
>> > shared_buffers            = 7680MB
>> > temp_buffers              = 12MB
>> > max_prepared_transactions = 0
>> > work_mem                  = 80MB
>> > maintenance_work_mem      = 1GB
>> > wal_buffers               = 8MB
>> > max_connections           = 350
>>
>> That seems like a dangerously large work_mem for so many connections;
>> but unless all the connections were executing complex queries, which
>> doesn't sound to be the case, that isn't the immediate problem.
>>
>>
> Thanks for the heads up.  We originally arrived at that value using pgtune
> with (I think) 250 connections, and I forgot to lower work_mem when I
> raised max_connections.  I now have it set to 45 MB; does that seem more
> reasonable?
>
>
>
>
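
For the archives, the back-of-envelope math behind that change: work_mem is a
per-sort/per-hash limit, so one complex query can use several multiples of
it, and the theoretical worst case across the cluster is at least
max_connections * work_mem on top of shared_buffers:

  350 connections * 80 MB  = ~27 GB   (old setting)
  350 connections * 45 MB  = ~15 GB   (current setting)
  plus shared_buffers      =  7680 MB

In practice only a fraction of the connections are sorting at any one time,
so these are upper bounds rather than predictions, but the old number clearly
left little headroom on a 32 GB box.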
>> >> The weird thing is that our old server had 1/8th the RAM, was set to
>> >> max_connections = 600 and had the same clients connecting in the same
>> >> way to the same databases and we never saw any errors like this in the
>> >> several years we have been using it.
>>
>> This reinforces the impression that something's misconfigured at the
>> kernel level on the new server.
>>
>>                         regards, tom lane
>>
>
>
Forgot to copy the list on my reply, so I am resending it here.



> Thanks for your help and time.
>
> Preston
>
