The function in question is a complicated plpgsql function that makes numerous database queries, all of them read-only. (Other parts of the program may make changes to the database.)
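(For what to log next time, one option might be to turn on function-level stats so the per-call timing of that function gets tracked on the server side; a minimal sketch only, and it assumes postgresql.conf can be edited and reloaded by a superuser:)

    -- Sketch: in postgresql.conf set  track_functions = 'pl'  and reload:
    SELECT pg_reload_conf();

    -- During or after the run, cumulative calls and time (in ms) per plpgsql function:
    SELECT funcname, calls, total_time, self_time
      FROM pg_stat_user_functions
     ORDER BY total_time DESC;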
The first database shutdown and the shutdown/reboot later on were both 'clean' shutdowns, so there shouldn't have been any kind of transaction rollback. I have sar running on that server, if that provides any useful data.

Mostly I'm just trying to put together a list of what to look for and what to log in case it happens again. (It runs again on Tuesday, and I already know it will be going back to review 2004 data, so it'll be an even longer run than this week's was.)
--
Mike Nolan

On Sat, Jan 10, 2015 at 12:55 PM, Andy Colson <a...@squeakycode.net> wrote:

> On 01/09/2015 07:52 PM, Tomas Vondra wrote:
>
>> On 9.1.2015 23:14, Michael Nolan wrote:
>>
>>> I'm running 9.3.5 on a virtual machine with 5 cores and 24 GB of
>>> memory. Disk is on a SAN.
>>>
>>> I have a task that runs weekly that processes possibly as many as
>>> 120 months' worth of data, one month at a time. Since moving to 9.3.5
>>> (from 8.2!!) the average time for a month has been 3 minutes or less.
>>
>> Congrats on migrating to a supported version!
>>
>> Please, a comparison of the configuration used on 8.2 and 9.3.5 would be
>> helpful (i.e. how did you update the config on the new version?).
>>
>>> However, when this job ran this Tuesday, it ran fine for a number of
>>> months, but then started slowing down dramatically: 300 minutes for
>>> one month and then 167 minutes for the next. I stopped and restarted
>>> postgresql, and the next block also ran really slowly (157 minutes). I
>>> then rebooted the server and the remaining blocks ran at the usual
>>> fast speed again, so restarting postgresql didn't fix the problem
>>> but rebooting the server did.
>>
>> What amounts of data are we talking about? Gigabytes? Tens of gigabytes?
>>
>>> Looking at the logs, I see queries with a function call that would
>>> normally take no more than 100-200 milliseconds, usually far less,
>>> that were taking 100 seconds or longer. This function gets called
>>> thousands of times for each month, so that appears to be one source
>>> of the slowdown.
>>
>> But why are the functions taking so much longer? Are they eating CPU or
>> I/O, or are they generally waiting for something (e.g. locks)?
>>
>>> I don't suspect a memory leak in the calling program (in php),
>>> because since moving to this server in December this weekly task has
>>> run several times over the same range of months, making pretty much
>>> the same function calls each time. I also ran the entire range
>>> several times during testing.
>>>
>>> One change made to the server since the previous week's run was that
>>> I moved up to the latest CentOS kernel (Linux version
>>> 3.10.0-123.13.2.el7.x86_64).
>>
>> And what was the previous kernel version?
>>
>> However, if it worked fine after rebooting the server, it may not be a
>> kernel issue (unless it somehow depends on uptime). Is there something
>> in /var/log/messages?
>
> At first I was thinking that lots of activity within one transaction was
> messing up the stats and the planner started getting it wrong. But a
> reboot wouldn't fix that. Would it? What if the reboot rolled back the
> db? Would the stats make sense again?
>
> I have a process that makes a big temp table (with indexes). After it's
> built, if I don't run a quick analyze on it, the planner never uses the
> indexes right.
>
> Another thing I can think of is never committing. If it started collecting
> lots and lots of row versions, it could get slower and slower. But then
> you'd see the same thing on 8.2, so that's probably not it.
> Do you have any "Idle in transaction" connections?
>
> -Andy
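(For Andy's question above, a query along these lines against pg_stat_activity during the next run should show any idle-in-transaction sessions as well as anything that has been running a long time; a minimal sketch, and the 5-minute cutoff is an arbitrary choice:)

    -- Sessions sitting idle inside a transaction, plus long-running statements
    SELECT pid, state,
           now() - xact_start  AS xact_age,
           now() - query_start AS query_age,
           left(query, 60)     AS query
      FROM pg_stat_activity
     WHERE state = 'idle in transaction'
        OR (state = 'active' AND now() - query_start > interval '5 minutes')
     ORDER BY xact_start NULLS LAST;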