[PERFORM] Re: xfs performs a lot better than ext4 [WAS: Re: Two identical systems, radically different performance]

2012-12-06 Thread John Lister



on this box:


in brief: the box is a Dell PowerEdge R720 with 16GB of RAM,
the CPU is a Xeon 5620 with 6 cores, the OS is installed on a RAID
array (SATA disks, 7.2k rpm), PGDATA is on a separate RAID 1 array
(SAS disks, 15k rpm), and the controller is a PERC H710 (BBWC with
512MB of cache). (Ubuntu 12.04)


on the same machine with the same configuration,
having PGDATA on an xfs-formatted partition gives me
much better TPS.

e.g. pgbench -c 20 -t 5000 gives me 6305 TPS
(3 runs with "echo 3 > /proc/sys/vm/drop_caches && /etc/init.d/postgresql-9.2 restart"
in between).
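Spelled out, the benchmark procedure quoted above amounts to something like the following (service name and pgbench settings as given in the message; the sync before dropping caches is the usual recommendation and is an addition here):

```shell
#!/bin/sh
# Three cold-cache pgbench runs: flush dirty pages, drop the OS page
# cache, and restart PostgreSQL before each run.
for run in 1 2 3; do
    sync                                    # flush dirty pages first
    echo 3 > /proc/sys/vm/drop_caches       # drop page cache, dentries, inodes
    /etc/init.d/postgresql-9.2 restart
    pgbench -c 20 -t 5000 pgbench           # 20 clients, 5000 transactions each
done
```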
Hi, I found this interesting as I'm trying to do some benchmarks on my
box, which is very similar to the above, but I don't believe the TPS is
anywhere near what it should be. Is the 6305 figure from xfs? I'm
assuming that your main data array is just two 15k SAS drives; are you
putting the WAL on the data array, or is it stored somewhere else? Can
I ask what scaling parameters, etc. you used to build the pgbench tables,
and could I look at your postgresql.conf file to see if I missed something
(offline if you wish)?


I'm running 8x SSDs in RAID 10 for the data and pull just under 10k TPS on an
xfs system, which is much lower than I'd expect for that setup and isn't
significantly greater than your reported results, so something must be
very wrong.


Thanks

John


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Ubuntu 12.04 / 3.2 Kernel Bad for PostgreSQL Performance

2012-12-06 Thread John Lister

On 05/12/2012 18:28, Shaun Thomas wrote:

Hey guys,

This isn't a question, but a kind of summary over a ton of investigation
I've been doing since a recent "upgrade". Anyone else out there with
"big iron" might want to confirm this, but it seems pretty reproducible.
This seems to affect the latest 3.2 mainline and by extension, any
platform using it. My tests are restricted to Ubuntu 12.04, but it may
apply elsewhere.

Very interesting results. I've been trying to benchmark my box but
getting what I would call poor performance given the setup, and I am
running 3.2 on Ubuntu 12.04. Could I ask what hardware (offline if you
wish) was used for the results below?

Comparing the latest official 3.2 kernel to the latest official 3.4
kernel (both Ubuntu), there are some rather striking differences. I'll
start with some pgbench tests.

* This test is 800 read-only clients, with 2 controlling threads on a
55GB database (scaling factor of 3600) for 3 minutes.
  * With 3.4:
* Max TPS was 68933.
* CPU was between 50 and 55% idle.
* Load average was between 10 and 15.
  * With 3.2:
* Max TPS was 17583. A total loss of 75% performance.
* CPU was between 12 and 25% idle.
* Load average was between 10 and 60---effectively random.
  * Next, we checked minimal write tests. This time, with only two
clients. All other metrics are the same.
* With 3.4:
  * Max TPS was 4548.
  * CPU was between 88 and 92% idle.
  * Load average was between 1.7 and 2.5.
* With 3.2:
  * Max TPS was 4639.
  * CPU was between 88 and 92% idle.
  * Load average was between 3 and 4.
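For reference, the read-only runs above correspond to pgbench's select-only mode; assuming a database named pgbench initialized at scale factor 3600, the two tests would look roughly like:

```shell
# Read-only: 800 clients, 2 worker threads, 3 minutes, SELECT-only (-S)
pgbench -S -c 800 -j 2 -T 180 pgbench

# Minimal write test: 2 clients, default TPC-B-like transaction mix
pgbench -c 2 -j 2 -T 180 pgbench
```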


Time to see what a 3.4 kernel does to my setup, I think.

Thanks

John





[PERFORM] Re: xfs performs a lot better than ext4 [WAS: Re: Two identical systems, radically different performance]

2012-12-06 Thread Andrea Suisani

Hi John,

On 12/06/2012 09:29 AM, John Lister wrote:



on this box:


in brief: the box is a Dell PowerEdge R720 with 16GB of RAM,
the CPU is a Xeon 5620 with 6 cores, the OS is installed on a RAID
array (SATA disks, 7.2k rpm), PGDATA is on a separate RAID 1 array
(SAS disks, 15k rpm), and the controller is a PERC H710 (BBWC with
512MB of cache). (Ubuntu 12.04)


on the same machine with the same configuration,
having PGDATA on an xfs-formatted partition gives me
much better TPS.

e.g. pgbench -c 20 -t 5000 gives me 6305 TPS
(3 runs with "echo 3 > /proc/sys/vm/drop_caches && /etc/init.d/postgresql-9.2 restart"
in between).




> Hi, I found this interesting as I'm trying to do some benchmarks on my box
> which is very similar to the above but I don't believe the tps is anywhere
> near what it should be.
> Is the 6305 figure from xfs?

yes, it is.


I'm assuming that your main data array is just 2 15k sas drives,


correct


are you putting the WAL on the data array or is that stored somewhere else?


pg_xlog is placed on the data array.


Can I ask what scaling params,


sure, I've initialized the pgbench db by issuing:

pgbench -i -s 10 pgbench
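For context, each pgbench scale-factor unit is 100,000 rows in pgbench_accounts (roughly 15MB on disk), so a scale-10 database is tiny next to 16GB of RAM and the test is largely cache-bound:

```shell
# Rough size of a pgbench database at scale factor 10
scale=10
rows=$((scale * 100000))     # pgbench_accounts rows per scale unit
approx_mb=$((scale * 15))    # ~15MB per scale unit (approximate)
echo "$rows rows, ~${approx_mb}MB"
```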


etc you used to build the pgbench tables and look at your postgresql.conf file 
to see if I missed something (offline if you wish)


these are the non-default values in postgresql.conf:

listen_addresses = '*'
max_connections = 100
shared_buffers = 3200MB
work_mem = 30MB
maintenance_work_mem = 800MB
synchronous_commit = off
full_page_writes = off
checkpoint_segments = 40
checkpoint_completion_target = 0.9
random_page_cost = 3.5
effective_cache_size = 10GB
log_timezone = 'localtime'
stats_temp_directory = 'pg_stat_tmp_ram'
autovacuum_naptime = 5min
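As a sanity check, the memory settings above line up with common sizing rules of thumb for a 16GB box (shared_buffers around 20% of RAM, effective_cache_size around 50-75%):

```shell
# Express the memory settings as a fraction of the box's 16GB of RAM
ram_mb=16384
sb_pct=$(awk -v r="$ram_mb" 'BEGIN { printf "%.1f", 3200 / r * 100 }')
ecs_pct=$(awk -v r="$ram_mb" 'BEGIN { printf "%.1f", 10240 / r * 100 }')
echo "shared_buffers = ${sb_pct}% of RAM, effective_cache_size = ${ecs_pct}% of RAM"
```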

and then OS tweaks:

HT bios disabled
/sbin/blockdev --setra 8192 /dev/sdb
echo deadline > /sys/block/sdb/queue/scheduler
vm.swappiness=0
vm.overcommit_memory=2
vm.dirty_ratio=2
vm.dirty_background_ratio=1
kernel.shmmax=3454820352
kernel.shmall=2048341
$PGDATA is on xfs (rw,noatime)
tmpfs on /db/9.2/pg_stat_tmp_ram type tmpfs (rw,size=50M,uid=1001,gid=1001)
kernel 3.2.0-32-generic
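Collected into a script, those tweaks can be applied for the current boot like this (device names and the tmpfs mount exactly as listed above; put the sysctl values in /etc/sysctl.conf to persist them across reboots):

```shell
#!/bin/sh
# Apply the OS-level tuning for the data array and VM subsystem (as root).
/sbin/blockdev --setra 8192 /dev/sdb             # larger read-ahead on the data array
echo deadline > /sys/block/sdb/queue/scheduler   # deadline I/O scheduler
sysctl -w vm.swappiness=0
sysctl -w vm.overcommit_memory=2
sysctl -w vm.dirty_ratio=2
sysctl -w vm.dirty_background_ratio=1
sysctl -w kernel.shmmax=3454820352
sysctl -w kernel.shmall=2048341
# RAM-backed directory for stats_temp_directory
mount -t tmpfs -o size=50M,uid=1001,gid=1001 tmpfs /db/9.2/pg_stat_tmp_ram
```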


Andrea






[PERFORM] Re: xfs performs a lot better than ext4 [WAS: Re: Two identical systems, radically different performance]

2012-12-06 Thread Andrea Suisani

[added performance list back]

On 12/06/2012 10:04 AM, John Lister wrote:

Thanks for the info, I'll have a play and see what values I get with similar 
settings, etc


you're welcome


Still think something is wrong with my config, but we'll see.


which kind of SSD disks do you have?
maybe they are the same type Shaun Thomas is having problems with here:
http://archives.postgresql.org/pgsql-performance/2012-12/msg00030.php

Andrea



john

On 06/12/2012 08:44, Andrea Suisani wrote:
[full quote of the previous message trimmed]







[PERFORM] Re: xfs performs a lot better than ext4 [WAS: Re: Two identical systems, radically different performance]

2012-12-06 Thread John Lister

On 06/12/2012 09:33, Andrea Suisani wrote:


which kind of SSD disks do you have?
maybe they are the same type Shaun Thomas is having problems with here:
http://archives.postgresql.org/pgsql-performance/2012-12/msg00030.php
Yeah, I saw that post. I'm running the same version of Ubuntu with the
3.2 kernel, so when I get a chance to take the box down I will try the
newer kernels, although Ubuntu is on 3.5 now... Shaun didn't post what
hardware he was running on, so it would be interesting to see how it
compares. Mine are Intel 320s, which, while not the newest, should offer
some protection against power failure, etc.



John




Re: [PERFORM] Slow query: bitmap scan troubles

2012-12-06 Thread postgresql
That is very interesting indeed, these indexes are quite large!

I will apply that patch and try it out this evening and let you know.

Thank you very much everyone for your time, the support has been amazing.

PS: Just looked at this thread on the archives page and realised I don't
have my name in the FROM: field, which is a misconfiguration of my email
client, but I figured I would leave it as-is to prevent confusion; sorry
about that.

All the best,

Philip Scott

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: 05 December 2012 18:05
To: Jeff Janes
Cc: postgre...@foo.me.uk; postgres performance list
Subject: Re: [PERFORM] Slow query: bitmap scan troubles

Jeff Janes  writes:
> I now see where the cost is coming from.  In commit 21a39de5809 (first 
> appearing in 9.2) the "fudge factor" cost estimate for large indexes 
> was increased by about 10 fold, which really hits this index hard.

> This was fixed in commit bf01e34b556 "Tweak genericcostestimate's 
> fudge factor for index size", by changing it to use the log of the 
> index size.  But that commit probably won't be shipped until 9.3.

Hm.  To tell you the truth, in October I'd completely forgotten about the
January patch, and was thinking that the 1/1 cost had a lot of history
behind it.  But if we never shipped it before 9.2 then of course that idea
is false.  Perhaps we should backpatch the log curve into 9.2 --- that would
reduce the amount of differential between what
9.2 does and what previous branches do for large indexes.

It would definitely be interesting to know if applying bf01e34b556 helps the
OP's example.

regards, tom lane






Re: [PERFORM] Re: xfs performs a lot better than ext4 [WAS: Re: Two identical systems, radically different performance]

2012-12-06 Thread Andrea Suisani

On 12/06/2012 12:37 PM, John Lister wrote:

On 06/12/2012 09:33, Andrea Suisani wrote:


which kind of SSD disks do you have?
maybe they are the same type Shaun Thomas is having problems with here:
http://archives.postgresql.org/pgsql-performance/2012-12/msg00030.php

Yeah, I saw that post. I'm running the same version of Ubuntu with the 3.2
kernel, so when I get a chance to take the box down I will try the newer
kernels, although Ubuntu is on 3.5 now... Shaun didn't post what hardware
he was running on, so it would be interesting to see how it compares. Mine
are Intel 320s, which, while not the newest, should offer some protection
against power failure, etc.


reading the thread again I realized Shaun is using the
fusionIO driver, and he said that the regression is due
to "some recent 3.2 kernel patch borks the driver in
some horrible way".

so maybe you're not in the same boat (since you're
using Intel 320s), or maybe the kernel regression
he's referring to is in the kernel subsystem that
deals with SSDs independently of brand.
In the latter case, testing a different kernel would be worthwhile.

Andrea





Re: [PERFORM] Slow query: bitmap scan troubles

2012-12-06 Thread postgresql
> I also wonder if increasing (say x10) default_statistics_target or just
> doing ALTER TABLE ... SET STATISTICS for particular tables will help.
> It will make the planner produce more precise estimations. Do not forget
> to ANALYZE after changing it.

Thanks Sergey, I will try this too.

I think the bother here is that the statistics are pretty good (we do
analyse regularly and default_statistics_target is already 1000), but once I
start filtering the two tables the correlations alter quite a bit. I don't
think there is much that can be done about that :)

- Phil






Re: [PERFORM] Slow query: bitmap scan troubles

2012-12-06 Thread postgresql
Hi Jeff

> It kind of does.  The expected speed is predicated on the number of rows
being 200 fold higher.  If the number of rows actually was that much higher,
the two speeds might be closer together.  That is why it would be
interesting to see a more typical case where the actual number of rows is
closer to the 2000 estimate.

Ah, I see, of course. It makes a lot of sense when you think about it. This
has been quite an enlightening adventure into the guts of postgres for me :)

> But I am curious about how the cost estimate for the primary key look up
is arrived at:
(Dealt with in your next reply, thanks for figuring that out! I will
certainly try the patch.)


> I've heard good things about Greg Smith's book, but I don't know if it
covers this particular thing.

A copy is on its way, thank you.

>> We are in the rather pleasant situation here in that we are willing to 
>> spend money on the box (up to a point, but quite a large point) to get 
>> it up to the spec so that it should hardly ever need to touch the 
>> disk, the trick is figuring out how to let our favourite database server
know that.
> Well, that part is fairly easy.  Make random_page_cost and seq_page_cost
much smaller than their defaults.  Like, 0.04 and 0.03, for example.

Yes, I have been playing a lot with that and it makes a lot of difference.
When I tweak them down I end up getting a lot of nested loops instead of
hash or merge joins, and they are much faster (presumably we might have
gotten a nested loop out of the planner if it could correctly estimate the
low number of rows returned).
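That experiment can be run per-session without touching postgresql.conf; a sketch (database name and the WHERE clause are made up for illustration; only the trade table appears elsewhere in the thread):

```shell
psql -d mydb <<'SQL'
-- Assume most data is cached: make random and sequential page
-- fetches almost free relative to CPU costs, for this session only.
SET random_page_cost = 0.04;
SET seq_page_cost = 0.03;
EXPLAIN ANALYZE
SELECT count(*) FROM trade WHERE book_id = 42;  -- hypothetical query
SQL
```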

I've got plenty of ammunition now to dig deeper, you guys have been
invaluable.

Cheers,

Phil






Re: [PERFORM] Slow query: bitmap scan troubles

2012-12-06 Thread Jeff Janes
On Wed, Dec 5, 2012 at 9:43 AM, Claudio Freire  wrote:
> On Wed, Dec 5, 2012 at 2:39 PM, Jeff Janes  wrote:
>> I'm not sure that this change would fix your problem, because it might
>> also change the costs of the alternative plans in a way that
>> neutralizes things.  But I suspect it would fix it.  Of course, a
>> correct estimate of the join size would also fix it--you have kind of
>> a perfect storm here.
>
> As far as I can see on the explain, the misestimation is 3x~4x not 200x.

It is 3x (14085 vs 4588) for selectivity on one of the tables, "Index
Only Scan using idx_trade_id_book on trade".

But for the join of both tables it is estimate 2120 vs actual 11.

Cheers,

Jeff




Re: [PERFORM] Slow query: bitmap scan troubles

2012-12-06 Thread Claudio Freire
On Thu, Dec 6, 2012 at 2:27 PM, Jeff Janes  wrote:
> On Wed, Dec 5, 2012 at 9:43 AM, Claudio Freire  wrote:
>> On Wed, Dec 5, 2012 at 2:39 PM, Jeff Janes  wrote:
>>> I'm not sure that this change would fix your problem, because it might
>>> also change the costs of the alternative plans in a way that
>>> neutralizes things.  But I suspect it would fix it.  Of course, a
>>> correct estimate of the join size would also fix it--you have kind of
>>> a perfect storm here.
>>
>> As far as I can see on the explain, the misestimation is 3x~4x not 200x.
>
> It is 3x (14085 vs 4588) for selectivity on one of the tables, "Index
> Only Scan using idx_trade_id_book on trade".
>
> But for the join of both tables it is estimate 2120 vs actual 11.

But the final result set isn't further worked on (except for the
aggregate), which means it doesn't affect the cost by much.




Re: [PERFORM] Slow query: bitmap scan troubles

2012-12-06 Thread Jeff Janes
On Thu, Dec 6, 2012 at 12:05 PM, Claudio Freire  wrote:
> On Thu, Dec 6, 2012 at 2:27 PM, Jeff Janes  wrote:
>> On Wed, Dec 5, 2012 at 9:43 AM, Claudio Freire  
>> wrote:
>>> As far as I can see on the explain, the misestimation is 3x~4x not 200x.
>>
>> It is 3x (14085 vs 4588) for selectivity on one of the tables, "Index
>> Only Scan using idx_trade_id_book on trade".
>>
>> But for the join of both tables it is estimate 2120 vs actual 11.
>
> But the final result set isn't further worked on (except for the
> aggregate), which means it doesn't affect the cost by much.

Good point.  Both the NL and hash join do about the same amount of
work probing for success whether the success is actually there or not.

So scratch what I said about the correlation being important, in this
case it is not.

The 3x error is enough to push it over the edge, but the fudge factor
is what gets it so close to that edge in the first place.

And I'm now pretty sure the fudge factor change would fix this.  The
truly-fast NL plan is getting overcharged by the fudge-factor once per
each 14,085 of the loopings, while the truly-slow bitmap scan is
overcharged only once for the entire scan.  So the change is by no
means neutralized between the two plans.

I don't know if my other theory that the bitmap scan is overflowing
work_mem (but not costed for doing so) is also contributing.

Cheers,

Jeff

