[PERFORM] Process 11812 still waiting for ExclusiveLock on extension of relation

2012-07-17 Thread David Kerr
Howdy,

I've got a couple of tables that are taking a little longer than normal to 
extend, resulting 
in some slow inserts.

They're fairly large tables, ~200GB pg_total_relation_size (90GB for just the 
table)

I suspect that this is related to a sustained heavy load that would stop 
autovacuum from
getting at this table... Does that sound plausible? 

I'm wondering what options I have to smooth over these episodes / speed up the 
extensions.
I'm thinking of something like, CLUSTER or VACUUM FULL (those take quite a run 
so I'd like 
some direction on it before i TiaS =) )

I suspect that Partitioning would help.  Any other ideas?


Jul 17 08:11:52 perf: [3-1] user=test,db=perf LOG:  process 11812 still waiting 
for ExclusiveLock 
on extension of relation 60777 of database 16387 after 1000.270 ms

System resouces were fine:

PGDATA
--
07/17/12 08:11:48
Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s avgrq-sz 
avgqu-sz   await  svctm  %util
dm-2  1.20  3085.20   77.20 3994.20 15363.20 56680.0017.69
15.573.82   0.06  26.22

07/17/12 08:11:53
Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s avgrq-sz 
avgqu-sz   await  svctm  %util
dm-2  0.40  2097.20   51.80 2610.20 10344.00 37659.2018.03 
5.231.96   0.05  14.28


PGXLOG
--
07/17/12 08:11:48
Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s avgrq-sz 
avgqu-sz   await  svctm  %util
dm-4  0.00  3958.200.00  600.40 0.00 36449.6060.71 
0.440.74   0.73  43.54

07/17/12 08:11:53
Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s avgrq-sz 
avgqu-sz   await  svctm  %util
dm-4  0.00  2905.200.00  403.40 0.00 26492.8065.67 
0.320.80   0.79  31.96

CPU
--
CPU %user %nice   %system   %iowait%steal %idle
08:11:48all 24.49  0.00  3.19  1.17  0.00 71.15
08:11:53all 17.53  0.00  3.13  0.68  0.00 78.65


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] very very slow inserts into very large table

2012-07-17 Thread Ants Aasma
On Tue, Jul 17, 2012 at 6:30 AM, Craig Ringer  wrote:
> On 07/17/2012 01:56 AM, Jon Nelson wrote:
> To perform reasonably well, Pg would need to be able to defer index updates
> when bulk-loading data in a single statement (or even transaction), then
> apply them when the statement finished or transaction committed. Doing this
> at a transaction level would mean you'd need a way to mark indexes as
> 'lazily updated' and have Pg avoid using them once they'd been dirtied
> within a transaction. No such support currently exists, and it'd be
> non-trivial to implement, especially since people loading huge amounts of
> data often want to do it with multiple concurrent sessions. You'd need some
> kind of 'DISABLE INDEX' and 'ENABLE INDEX' commands plus a transactional
> backing table of pending index updates.

It seems to me that if the insertion is done as a single statement it
wouldn't be a problem to collect up all btree insertions and apply
them before completing the statement. I'm not sure how much that would
help though. If the new rows have uniform distribution you end up
reading in the whole index anyway. Because indexes are not stored in
logical order you don't get to benefit from sequential I/O.

The lazy merging approach (the paper that Claudio linked) on the other
hand seems promising but a lot trickier to implement.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] very very slow inserts into very large table

2012-07-17 Thread Jeff Janes
On Tue, Jul 17, 2012 at 8:59 AM, Ants Aasma  wrote:
> On Tue, Jul 17, 2012 at 6:30 AM, Craig Ringer  wrote:
>> On 07/17/2012 01:56 AM, Jon Nelson wrote:
>> To perform reasonably well, Pg would need to be able to defer index updates
>> when bulk-loading data in a single statement (or even transaction), then
>> apply them when the statement finished or transaction committed. Doing this
>> at a transaction level would mean you'd need a way to mark indexes as
>> 'lazily updated' and have Pg avoid using them once they'd been dirtied
>> within a transaction. No such support currently exists, and it'd be
>> non-trivial to implement, especially since people loading huge amounts of
>> data often want to do it with multiple concurrent sessions. You'd need some
>> kind of 'DISABLE INDEX' and 'ENABLE INDEX' commands plus a transactional
>> backing table of pending index updates.
>
> It seems to me that if the insertion is done as a single statement it
> wouldn't be a problem to collect up all btree insertions and apply
> them before completing the statement. I'm not sure how much that would
> help though. If the new rows have uniform distribution you end up
> reading in the whole index anyway. Because indexes are not stored in
> logical order you don't get to benefit from sequential I/O.

In this case, he is loading new data that is 5% of the current data
size.  A leaf page probably has much more than 20 entries, so by
sorting them you could turn many scattered accesses to the same page
to one access (or many accesses that immediately follow each other,
and so are satisfied by the cache).

Also, while indexes are not formally kept in logical order, but they
do tend to be biased in that direction in most cases.  I've found that
even if you are only inserting one row for every 4 or 5 leaf pages,
you still get substantial improvement by doing so in sorted order.

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Slow application response on lightly loaded server?

2012-07-17 Thread Mike Blackwell
We're seeing slow application performance on a PostgreSQL 9.1 server which
appears to be relatively lightly loaded.  Some graphs from pgstatview are
at  http://www2.uptimeforce.com/pgstatview/e35ba4e7db0842a1b9cf2e10a4c03d91/
 These cover approximately 40 minutes, during which there was some activity
from a web application and two bulk loads in process.

The machine running the bulk loads (perl scripts) is also running at about
70% idle with very little iowait.  That seems to suggest network latency to
me.

Am I missing something in the server stats that would indicate a problem?
 If not, where should I look next?

__
*Mike Blackwell | Technical Analyst, Distribution Services/Rollout
Management | RR Donnelley*
1750 Wallace Ave | St Charles, IL 60174-3401
Office: 630.313.7818
mike.blackw...@rrd.com
http://www.rrdonnelley.com



* *


Re: [PERFORM] Slow application response on lightly loaded server?

2012-07-17 Thread Scott Marlowe
On Tue, Jul 17, 2012 at 10:27 AM, Mike Blackwell  wrote:
> We're seeing slow application performance on a PostgreSQL 9.1 server which
> appears to be relatively lightly loaded.  Some graphs from pgstatview are at
> http://www2.uptimeforce.com/pgstatview/e35ba4e7db0842a1b9cf2e10a4c03d91/
> These cover approximately 40 minutes, during which there was some activity
> from a web application and two bulk loads in process.
>
> The machine running the bulk loads (perl scripts) is also running at about
> 70% idle with very little iowait.  That seems to suggest network latency to
> me.
>
> Am I missing something in the server stats that would indicate a problem?
> If not, where should I look next?

I'd run vmstat and look for high cs or int numbers (100k and above) to
see if you're maybe seeing an issue with that.  A lot of times a
"slow" server is just too much process switching.  But yeah, the
graphs you've posted don't seem overly bad.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] very very slow inserts into very large table

2012-07-17 Thread Claudio Freire
On Tue, Jul 17, 2012 at 1:24 PM, Jeff Janes  wrote:
> Also, while indexes are not formally kept in logical order, but they
> do tend to be biased in that direction in most cases.  I've found that
> even if you are only inserting one row for every 4 or 5 leaf pages,
> you still get substantial improvement by doing so in sorted order.

Yep, I do the same. Whenever I have to perform massive updates, I sort them.

Although "massive" for me is nowhere near what "massive" for the OP is.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow application response on lightly loaded server?

2012-07-17 Thread Mike Blackwell
On Tue, Jul 17, 2012 at 11:35 AM, Scott Marlowe 
 wrote:

I'd run vmstat and look for high cs or int numbers (100k and above) to
> see if you're maybe seeing an issue with that.  A lot of times a
> "slow" server is just too much process switching.  But yeah, the
> graphs you've posted don't seem overly bad.
>


Thanks for the tip.  Here's a quick look at those numbers under that same
load.  Watching it for a while longer didn't show any spikes.  That doesn't
seem to be it, either.

$ vmstat 5
procs ---memory-- ---swap-- -io --system--
cpu
 r  b   swpd   free   buff  cache   si   sobibo   incs us sy id
wa
 3  0  11868  34500  16048 393143600 4 20 0  6  2
91  1
 2  0  11868  21964  16088 393139600 0   212 8667  8408 15  3
80  2
 0  0  11868  37772  16112 393215200 2   249 9109  8811 34  2
62  1
 2  0  11868  34068  16124 393240000 1   168 9142  9165 12  3
84  1
 1  0  11868  38036  16124 393292000 8   155 9995 10904 16  4
80  1
 1  0  11868  40212  16124 393344000 0   146 9586  9825 13  3
83  1

__
*Mike Blackwell | Technical Analyst, Distribution Services/Rollout
Management | RR Donnelley*
1750 Wallace Ave | St Charles, IL 60174-3401
Office: 630.313.7818
mike.blackw...@rrd.com
http://www.rrdonnelley.com





Re: [PERFORM] Slow application response on lightly loaded server?

2012-07-17 Thread Scott Marlowe
On Tue, Jul 17, 2012 at 11:37 AM, Mike Blackwell  wrote:
>
> On Tue, Jul 17, 2012 at 11:35 AM, Scott Marlowe 
> wrote:
>
>> I'd run vmstat and look for high cs or int numbers (100k and above) to
>> see if you're maybe seeing an issue with that.  A lot of times a
>> "slow" server is just too much process switching.  But yeah, the
>> graphs you've posted don't seem overly bad.
>
>
>
> Thanks for the tip.  Here's a quick look at those numbers under that same
> load.  Watching it for a while longer didn't show any spikes.  That doesn't
> seem to be it, either.

Yep it all looks good to me.  Are you sure you're not getting network
lag or something like that?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow application response on lightly loaded server?

2012-07-17 Thread Mike Blackwell
I'm wondering about that.  However, the database server and the server
doing the bulk loads are on the same subnet.  Traceroute shows only a
single hop.  Traceroute and ping both show reply times in the area of .25 -
.50 ms or so.  Is that reasonable?

__
*Mike Blackwell | Technical Analyst, Distribution Services/Rollout
Management | RR Donnelley*
1750 Wallace Ave | St Charles, IL 60174-3401
Office: 630.313.7818
mike.blackw...@rrd.com
http://www.rrdonnelley.com



* *


On Tue, Jul 17, 2012 at 12:49 PM, Scott Marlowe wrote:

> On Tue, Jul 17, 2012 at 11:37 AM, Mike Blackwell 
> wrote:
> >
> > On Tue, Jul 17, 2012 at 11:35 AM, Scott Marlowe  >
> > wrote:
> >
> >> I'd run vmstat and look for high cs or int numbers (100k and above) to
> >> see if you're maybe seeing an issue with that.  A lot of times a
> >> "slow" server is just too much process switching.  But yeah, the
> >> graphs you've posted don't seem overly bad.
> >
> >
> >
> > Thanks for the tip.  Here's a quick look at those numbers under that same
> > load.  Watching it for a while longer didn't show any spikes.  That
> doesn't
> > seem to be it, either.
>
> Yep it all looks good to me.  Are you sure you're not getting network
> lag or something like that?
>


Re: [PERFORM] Slow application response on lightly loaded server?

2012-07-17 Thread Scott Marlowe
Yeah seems reasonable.  The last thing I'd look at is something like
improperly configured dns service.  Are you connecting by IP or by
host name?

On Tue, Jul 17, 2012 at 12:10 PM, Mike Blackwell  wrote:
> I'm wondering about that.  However, the database server and the server doing
> the bulk loads are on the same subnet.  Traceroute shows only a single hop.
> Traceroute and ping both show reply times in the area of .25 - .50 ms or so.
> Is that reasonable?
>
> __
> Mike Blackwell | Technical Analyst, Distribution Services/Rollout Management
> | RR Donnelley
> 1750 Wallace Ave | St Charles, IL 60174-3401
> Office: 630.313.7818
> mike.blackw...@rrd.com
> http://www.rrdonnelley.com
>
>
>
>
>
> On Tue, Jul 17, 2012 at 12:49 PM, Scott Marlowe 
> wrote:
>>
>> On Tue, Jul 17, 2012 at 11:37 AM, Mike Blackwell 
>> wrote:
>> >
>> > On Tue, Jul 17, 2012 at 11:35 AM, Scott Marlowe
>> > 
>> > wrote:
>> >
>> >> I'd run vmstat and look for high cs or int numbers (100k and above) to
>> >> see if you're maybe seeing an issue with that.  A lot of times a
>> >> "slow" server is just too much process switching.  But yeah, the
>> >> graphs you've posted don't seem overly bad.
>> >
>> >
>> >
>> > Thanks for the tip.  Here's a quick look at those numbers under that
>> > same
>> > load.  Watching it for a while longer didn't show any spikes.  That
>> > doesn't
>> > seem to be it, either.
>>
>> Yep it all looks good to me.  Are you sure you're not getting network
>> lag or something like that?
>
>



-- 
To understand recursion, one must first understand recursion.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow application response on lightly loaded server?

2012-07-17 Thread Mike Blackwell
On Tue, Jul 17, 2012 at 2:36 PM, Scott Marlowe 
 wrote:


> Yeah seems reasonable.  The last thing I'd look at is something like
> improperly configured dns service.  Are you connecting by IP or by
> host name?
>
>
Interesting possibility.  We're currently connecting by host name.  I could
try temporarily using the IP from one of the servers to see if that helps.
 I'm not familiar enough with DNS services to do any diagnostics other than
using dig to see where something points.

Thanks for your help, BTW!

__
*Mike Blackwell | Technical Analyst, Distribution Services/Rollout
Management | RR Donnelley*
1750 Wallace Ave | St Charles, IL 60174-3401
Office: 630.313.7818
mike.blackw...@rrd.com
http://www.rrdonnelley.com


Re: [PERFORM] Slow application response on lightly loaded server?

2012-07-17 Thread Scott Marlowe
Well if it suddenly gets faster when connecting by IP, you'll know
where your problem lies.  DNS issues are more common in windows
installs, due to Windows having more interesting ways to misconfigure
dns etc.

On Tue, Jul 17, 2012 at 1:48 PM, Mike Blackwell  wrote:
> On Tue, Jul 17, 2012 at 2:36 PM, Scott Marlowe 
> wrote:
>
>
>>
>> Yeah seems reasonable.  The last thing I'd look at is something like
>> improperly configured dns service.  Are you connecting by IP or by
>> host name?
>>
>
> Interesting possibility.  We're currently connecting by host name.  I could
> try temporarily using the IP from one of the servers to see if that helps.
> I'm not familiar enough with DNS services to do any diagnostics other than
> using dig to see where something points.
>
> Thanks for your help, BTW!
>
> __
> Mike Blackwell | Technical Analyst, Distribution Services/Rollout Management
> | RR Donnelley
> 1750 Wallace Ave | St Charles, IL 60174-3401
> Office: 630.313.7818
> mike.blackw...@rrd.com
> http://www.rrdonnelley.com
>
>



-- 
To understand recursion, one must first understand recursion.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Linux memory zone reclaim

2012-07-17 Thread Greg Smith
Newer Linux systems with lots of cores have a problem I've been running 
into a lot more lately I wanted to share initial notes on.  My "newer" 
means running the 2.6.32 kernel or later, since I mostly track 
"enterprise" Linux distributions like RHEL6 and Debian Squeeze.  The 
issue is around Linux's zone_reclaim feature.  When it pops up, turning 
that feature off help a lot.  Details on what I understand of the 
problem are below, and as always things may have changed already in even 
newer kernels.


zone_reclaim tries to optimize memory speed on NUMA systems with more 
than one CPU socket.  There some banks of memory that can be "closer" to 
a particular socket, as measured by transfer rate, because of how the 
memory is routed to the various cores on each socket.  There is no true 
default for this setting.  Linux checks the hardware and turns this 
on/off based on what transfer rate it sees between NUMA nodes, where 
there are more than one and its test shows some distance between them.  
You can tell if this is turned on like this:


echo /proc/sys/vm/zone_reclaim_mode

Where 1 means it's enabled.  Install the numactl utility and you can see 
why it's made that decision:


# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
node 0 size: 73718 MB
node 0 free: 419 MB
node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
node 1 size: 73728 MB
node 1 free: 30 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

Note how the "distance" for a transfer from node 0->0 or 1->1 is 10 
units, while 0->1 or 1->0 is 21.  That what's tested at boot time, where 
the benchmarked speed is turned into this abstract distance number.  And 
if there is a large difference in cross-zone timing, then zone reclaim 
is enabled.


Scott Marlowe has been griping about this on the mailing lists here for 
a while now, and it's increasingly trouble for systems I've been seeing 
lately too.  This is a well known problem with MySQL:  
http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/ 
and NUMA issues have impacted Oracle too.  On PostgreSQL shared_buffers 
isn't normally set as high as MySQL's buffer cache, making it a bit less 
vulnerable to this class of problem.  But it's surely still a big 
problem for PostgreSQL on some systems.


I've taken to disabling /proc/sys/vm/zone_reclaim_mode on any Linux 
system where it's turned on now.  I'm still working through whether it 
also makes sense in all cases to use the more complicated memory 
interleaving suggestions that MySQL users have implemented, something 
most people would need to push into their PostgreSQL server started up 
scripts in /etc/init.d  (That will be a fun rpm/deb packaging issue to 
deal with if this becomes more wide-spread)  Suggestions on whether that 
is necessary, or if just disabling zone_reclaim is enough, are welcome 
from anyone who wants to try and benchmark it.


Note that this is all tricky to test because some of the bad behavior 
only happens when the server runs this zone reclaim method, which isn't 
a trivial situation to create at will.  Servers that have this problem 
tend to have it pop up intermittently, you'll see one incredibly slow 
query periodically while most are fast.  All depends on exactly what 
core is executing, where the memory it needs is at, and whether the 
server wants to reclaim memory (and just what that means its own 
complicated topic) as part of that.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Linux memory zone reclaim

2012-07-17 Thread Scott Marlowe
On Tue, Jul 17, 2012 at 7:52 PM, Greg Smith  wrote:
> Newer Linux systems with lots of cores have a problem I've been running into
> a lot more lately I wanted to share initial notes on.  My "newer" means
> running the 2.6.32 kernel or later, since I mostly track "enterprise" Linux
> distributions like RHEL6 and Debian Squeeze.  The issue is around Linux's
> zone_reclaim feature.  When it pops up, turning that feature off help a lot.
> Details on what I understand of the problem are below, and as always things
> may have changed already in even newer kernels.

SNIP

> Scott Marlowe has been griping about this on the mailing lists here for a
> while now, and it's increasingly trouble for systems I've been seeing lately
> too.  This is a well known problem with MySQL:
> http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/

Thanks for the link, I'll read up on it.  I do have access to large
(24 to 40 core) NUMA machines so I might try some benchmarking on them
to see how they work.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Linux memory zone reclaim

2012-07-17 Thread Dave Crooke
On the larger, cellular Itanium systems with multiple motherboards (rx6600
to Superdome) Oracle has done a lot of tuning with the HP-UX kernel calls
to optimize for NUMA issues. Will be interesting to see what they bring to
Linux.
On Jul 17, 2012 9:01 PM, "Scott Marlowe"  wrote:

> On Tue, Jul 17, 2012 at 7:52 PM, Greg Smith  wrote:
> > Newer Linux systems with lots of cores have a problem I've been running
> into
> > a lot more lately I wanted to share initial notes on.  My "newer" means
> > running the 2.6.32 kernel or later, since I mostly track "enterprise"
> Linux
> > distributions like RHEL6 and Debian Squeeze.  The issue is around Linux's
> > zone_reclaim feature.  When it pops up, turning that feature off help a
> lot.
> > Details on what I understand of the problem are below, and as always
> things
> > may have changed already in even newer kernels.
>
> SNIP
>
> > Scott Marlowe has been griping about this on the mailing lists here for a
> > while now, and it's increasingly trouble for systems I've been seeing
> lately
> > too.  This is a well known problem with MySQL:
> >
> http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/
>
> Thanks for the link, I'll read up on it.  I do have access to large
> (24 to 40 core) NUMA machines so I might try some benchmarking on them
> to see how they work.
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] Linux memory zone reclaim

2012-07-17 Thread Claudio Freire
On Tue, Jul 17, 2012 at 11:00 PM, Scott Marlowe  wrote:
>
> Thanks for the link, I'll read up on it.  I do have access to large
> (24 to 40 core) NUMA machines so I might try some benchmarking on them
> to see how they work.

It must have been said already, but I'll repeat it just in case:

I think postgres has an easy solution. Spawn the postmaster with
"interleave", to allocate shared memory, and then switch to "local" on
the backends.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance