On 27.10.2016 21:53, Merlin Moncure wrote:
As noted earlier, I was not able to reproduce the issue with
crashme.sh, which was:
NUM_FORKS=16
do_parallel psql -p 5432 -c"select PushMarketSample('1740')" castaging_test
do_parallel psql -p 5432 -c"select PushMarketSample('4400')" castaging_test
do_parallel psql -p 5432 -c"select PushMarketSample('2160')" castaging_test
do_parallel psql -p 5432 -c"select PushMarketSample('6680')" castaging_test
<snip>
(do_parallel is a simple wrapper that executes the given command in
parallel, up to NUM_FORKS processes at a time). This is on the same
server and cluster as above.
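For reference, a minimal sketch of what such a wrapper could look like
(hypothetical; the real do_parallel in crashme.sh may differ, e.g. it may
cap total concurrency instead of launching NUM_FORKS copies per call):

  # hypothetical do_parallel: start NUM_FORKS copies of the given command
  # in the background and wait for all of them to finish
  do_parallel () {
      for i in $(seq 1 "$NUM_FORKS"); do
          "$@" &
      done
      wait
  }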
This kind of suggests that either
A) there is some concurrent activity from another process that is
tripping the issue
or
B) there is something particular to the session invoking the function
that is participating in the problem. As the application is
structured, a single-threaded node.js app is issuing the query; that
session is high-traffic and long-lived. It's still running, in fact,
and I'm kind of tempted to find some downtime to see if I can still
reproduce via the UI.
Your production system's postgres backends probably have a lot more open
files associated with them than the simple test case does. Since
Postgres likes to keep files open as long as possible and only closes
them when it needs to free up fds to open new files, it's possible that
your production backends have almost all of their allowed fds in use
when you execute your pl/sh function.
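One way to sanity-check that theory on Linux would be to compare a busy
backend's open fd count against its limits (12345 here stands in for an
actual backend pid):

  ls /proc/12345/fd | wc -l                # fds currently held by the backend
  grep 'open files' /proc/12345/limits     # kernel per-process limit
  psql -c "show max_files_per_process"     # PostgreSQL's own cap (default 1000)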
If that's the case, the sqsh process that gets executed may not have
enough free fds to do what it wants to do, and because of busted error
handling it could end up writing to fds that were opened by Postgres and
still point to $PGDATA files.
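If those descriptors are not marked close-on-exec, they are inherited by
the shell that pl/sh spawns and by sqsh itself, so a stray write to the
wrong descriptor number lands directly in a data file. A contrived
illustration of the inheritance (file name and fd number are made up):

  exec 3>>/tmp/pretend_pgdata_file   # parent keeps fd 3 open, like a backend holding a relation file
  bash -c 'echo scribble >&3'        # a child inherits fd 3 and can write through it
  cat /tmp/pretend_pgdata_file       # the child's write is now in the "data file"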
/ Oskari