The function in question is a complicated plpgsql function that makes numerous database queries, all of them read-only. (Other parts of the program may make changes to the database.)
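(For what to log next time, one option might be to turn on function-level stats so the per-call timing of that function gets tracked on the server side; a minimal sketch only, and it assumes postgresql.conf can be edited and reloaded by a superuser:)

    -- Sketch: in postgresql.conf set  track_functions = 'pl'  and reload:
    SELECT pg_reload_conf();

    -- During or after the run, cumulative calls and time (in ms) per plpgsql function:
    SELECT funcname, calls, total_time, self_time
      FROM pg_stat_user_functions
     ORDER BY total_time DESC;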
The first database shutdown and the shutdown/reboot later on were both 'clean' shutdowns, so there shouldn't have been any kind of transaction rollback. I have sar running on that server, if that provides any useful data.

Mostly I'm just trying to put together a list of what to look for and what to log in case it happens again. (It runs again on Tuesday, and I already know it will be going back to review 2004 data, so it'll be an even longer run than this week's was.)
--
Mike Nolan

On Sat, Jan 10, 2015 at 12:55 PM, Andy Colson <a...@squeakycode.net> wrote:

> On 01/09/2015 07:52 PM, Tomas Vondra wrote:
>
>> On 9.1.2015 23:14, Michael Nolan wrote:
>>
>>> I'm running 9.3.5 on a virtual machine with 5 cores and 24 GB of
>>> memory. Disk is on a SAN.
>>>
>>> I have a task that runs weekly that processes possibly as many as
>>> 120 months' worth of data, one month at a time. Since moving to 9.3.5
>>> (from 8.2!!) the average time for a month has been 3 minutes or less.
>>
>> Congrats on migrating to a supported version!
>>
>> Please, a comparison of the configuration used on 8.2 and 9.3.5 would be
>> helpful (i.e. how did you update the config on the new version?).
>>
>>> However, when this job ran this Tuesday, it ran fine for a number of
>>> months, but then started slowing down dramatically: 300 minutes for
>>> one month and then 167 minutes for the next. I stopped and restarted
>>> postgresql, and the next block also ran really slowly (157 minutes). I
>>> then rebooted the server and the remaining blocks ran at the usual
>>> fast speed again, so restarting postgresql didn't fix the problem
>>> but rebooting the server did.
>>
>> What amounts of data are we talking about? Gigabytes? Tens of gigabytes?
>>
>>> Looking at the logs, I see queries with a function call that would
>>> normally take no more than 100-200 milliseconds, usually far less,
>>> that were taking 100 seconds or longer. This function gets called
>>> thousands of times for each month, so that appears to be one source
>>> of the slowdown.
>>
>> But why are the functions taking so much longer? Are they eating CPU or
>> I/O, or are they generally waiting for something (e.g. locks)?
>>
>>> I don't suspect a memory leak in the calling program (in php),
>>> because since moving to this server in December this weekly task has
>>> run several times over the same range of months, making pretty much
>>> the same function calls each time. I also ran the entire range
>>> several times during testing.
>>>
>>> One change made to the server since the previous week's run was that
>>> I moved up to the latest CentOS kernel (Linux version
>>> 3.10.0-123.13.2.el7.x86_64).
>>
>> And what was the previous kernel version?
>>
>> However, if it worked fine after rebooting the server, it may not be a
>> kernel issue (unless it somehow depends on uptime). Is there something
>> in /var/log/messages?
>
> At first I was thinking that lots of activity within one transaction was
> messing up the stats and the planner started getting it wrong. But a
> reboot wouldn't fix that. Would it? What if the reboot rolled back the
> db? Would the stats make sense again?
>
> I have a process that makes a big temp table (with indexes). After it's
> built, if I don't run a quick analyze on it, the planner never uses the
> indexes right.
>
> Another thing I can think of is never committing. If it started collecting
> lots and lots of row versions, it could get slower and slower. But then
> you'd see the same thing on 8.2, so that's probably not it.
> Do you have any "Idle in transaction" connections?
>
> -Andy
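(For Andy's question above, a query along these lines against pg_stat_activity during the next run should show any idle-in-transaction sessions as well as anything that has been running a long time; a minimal sketch, and the 5-minute cutoff is an arbitrary choice:)

    -- Sessions sitting idle inside a transaction, plus long-running statements
    SELECT pid, state,
           now() - xact_start  AS xact_age,
           now() - query_start AS query_age,
           left(query, 60)     AS query
      FROM pg_stat_activity
     WHERE state = 'idle in transaction'
        OR (state = 'active' AND now() - query_start > interval '5 minutes')
     ORDER BY xact_start NULLS LAST;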