Hi Steve,

Did you check whether your OS allows you to limit memory usage on a per-user or per-process basis?
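For example, on Linux one way to get such a cap is to set an address-space rlimit before the postmaster starts, since backends inherit it. Here's a minimal sketch, not tested, and the binary path, data directory, and the 4 GB figure are assumptions rather than anything from your setup:

    #!/usr/bin/env python
    # Minimal sketch: cap each postgres process's address space by setting
    # RLIMIT_AS before exec'ing the postmaster. Forked backends inherit the
    # limit, so a runaway allocation fails with ENOMEM instead of exhausting
    # the box. Path and size below are illustrative assumptions.
    import os
    import resource
    import sys

    LIMIT_BYTES = 4 * 1024 * 1024 * 1024   # illustrative 4 GB per-process cap

    resource.setrlimit(resource.RLIMIT_AS, (LIMIT_BYTES, LIMIT_BYTES))

    # rlimits survive exec() and are inherited across fork()
    os.execv("/usr/pgsql-9.3/bin/postmaster",
             ["postmaster", "-D", "/var/lib/pgsql/9.3/data"] + sys.argv[1:])

A "ulimit -v" call in the init script does the same thing; the point is that the limit is per process, so one bad query hits an out-of-memory error in postgres rather than dragging the whole VM down.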
Did you profile or debug it to see what is going on? When memory consumption grows, does it ever swap out?

Thanks,
A.A.

On Tue, Apr 8, 2014 at 10:48 AM, Steve Kehlet <steve.keh...@gmail.com> wrote:
> I'm running into some scenarios where errant postgres queries are
> consuming all memory and bringing down production boxes. I'm running
> Postgres 9.3.4 on CentOS 5.x VMs under XenServer. My custom config is here:
> https://gist.github.com/skehlet/9984666
>
> The incidents may be related to reports of an array_agg memory leak, and
> another recent incident seems like it was due to the xpath() function
> provided by --with-xml. There's some more information on the former in my
> post at:
> http://dba.stackexchange.com/questions/62587/why-does-this-query-cause-the-postgres-worker-process-to-consume-so-much-memory,
> if it's interesting. I'd like to dig into that further, maybe on another
> thread.
>
> But for now, the devs are asking me for some way to put a cap on a
> postgres query's total memory consumption. I'm familiar with the available
> settings on the "Resource Consumption" docs (and you can see my settings
> in my gist above, including work_mem turned way down to 1MB for testing),
> but it seems like there are things like Materialize that remain
> uncappable, since they're not constrained by the shared_buffers and
> work_mem limits. For example, in my post to dba.stackexchange above, I
> found that by doing a "set enable_material=false;" I could stop a
> particularly nasty query from consuming all memory, narrowing the problem
> somewhat, but that query is one we have to live with in the near term.
>
> With all that laid out, can anyone help with the following questions:
>
> - Is there any way to set a total memory cap on a worker process's memory
> consumption?
> - Are there other (even undocumented) settings to cap memory usage?
> - Other ideas?
>
> In the meantime, to stop the bleeding I'm looking into using the postgres
> feature to coerce the OOM killer to kill rogue child processes first
> (-DLINUX_OOM_ADJ). Unfortunately, another setback: even the latest kernel
> on CentOS 5.x doesn't allow non-root processes to write to
> /proc/self/oom_adj, so it isn't working. So I'll either need to patch the
> CentOS kernel and rebuild, move to a newer kernel, or maybe do something
> hacky like run a background process to reassign the oom_adj value for all
> postgres workers found.
>
> Thanks for any help/ideas!
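For the "background process" workaround mentioned at the end of your message, a hypothetical root-run sketch might look like the one below. Everything in it is an assumption rather than tested code: the data directory path, the matched process names, the oom_adj value written, and the polling interval would all need adjusting to your setup.

    #!/usr/bin/env python
    # Hypothetical sketch: a root-run loop that rewrites /proc/<pid>/oom_adj
    # for postgres backends, since non-root backends can't do it themselves
    # on CentOS 5. Paths, names, value, and interval are assumptions.
    import os
    import time

    PGDATA = "/var/lib/pgsql/9.3/data"       # assumed data directory
    OOM_ADJ = "0\n"                           # make backends killable again
    NAMES = ("postgres", "postmaster")        # command names to match

    def postmaster_pid():
        try:
            f = open(os.path.join(PGDATA, "postmaster.pid"))
            pid = f.readline().strip()
            f.close()
            return pid
        except IOError:
            return None

    def candidate_pids():
        # scan /proc for processes whose command name looks like postgres
        for pid in os.listdir("/proc"):
            if not pid.isdigit():
                continue
            try:
                stat = open("/proc/%s/stat" % pid).read()
                comm = stat.split("(", 1)[1].rsplit(")", 1)[0]
            except (IOError, IndexError):
                continue                      # process exited mid-scan
            if comm in NAMES:
                yield pid

    while True:
        protect = postmaster_pid()
        for pid in candidate_pids():
            if pid == protect:
                continue                      # leave the postmaster protected
            try:
                f = open("/proc/%s/oom_adj" % pid, "w")
                f.write(OOM_ADJ)
                f.close()
            except IOError:
                pass                          # backend exited or not writable
        time.sleep(10)

Note this only steers the OOM killer toward the backends; it doesn't cap memory, so combining it with a per-process limit is probably still worthwhile.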