[PERFORM] Re: xfs perform a lot better than ext4 [WAS: Re: Two identical systems, radically different performance]
> on this box, in brief: the box is a Dell PowerEdge R720 with 16GB of RAM, the CPU is a Xeon 5620 with 6 cores, the OS is installed on a RAID array (SATA disks, 7.2k rpm), the PGDATA is on a separate RAID 1 array (SAS, 15k rpm), and the controller is a PERC H710 (BBWC with a 512MB cache). (Ubuntu 12.04.) On the same machine with the same configuration, having PGDATA on an xfs-formatted partition gives me a much better TPS. E.g. pgbench -c 20 -t 5000 gives me 6305 TPS (3 runs with "echo 3 > /proc/sys/vm/drop_caches && /etc/init.d/postgresql-9.2 restart" in between).

Hi, I found this interesting as I'm trying to do some benchmarks on my box, which is very similar to the above, but I don't believe the TPS is anywhere near what it should be. Is the 6305 figure from xfs? I'm assuming that your main data array is just 2 15k SAS drives; are you putting the WAL on the data array or is it stored somewhere else? Can I ask what scaling params etc. you used to build the pgbench tables, and can I look at your postgresql.conf file to see if I missed something (offline if you wish)?

I'm running 8x SSDs in RAID 10 for the data and pull just under 10k TPS on an xfs system, which is much lower than I'd expect for that setup and isn't significantly greater than your reported results, so something must be very wrong.

Thanks

John

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
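The quoted cold-cache benchmark procedure can be sketched as a small script. The init-script path and pgbench arguments are taken directly from the post (Ubuntu 12.04, PostgreSQL 9.2); the loop itself is an assumption about how the three runs were driven, and dropping caches requires root.

```shell
#!/bin/sh
# Cold-cache pgbench loop as described in the post. Run as root on the
# benchmarked box; adjust the init-script path for your install.
CLIENTS=20
TXNS=5000
PGBENCH_CMD="pgbench -c $CLIENTS -t $TXNS pgbench"

bench_once() {
    sync
    echo 3 > /proc/sys/vm/drop_caches      # drop the OS page cache
    /etc/init.d/postgresql-9.2 restart     # cold-start the 9.2 cluster
    $PGBENCH_CMD                           # reports TPS for this run
}

# Three runs with cache drops in between, as in the quoted figures:
# for i in 1 2 3; do bench_once; done
```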
Re: [PERFORM] Ubuntu 12.04 / 3.2 Kernel Bad for PostgreSQL Performance
On 05/12/2012 18:28, Shaun Thomas wrote:
> Hey guys,
>
> This isn't a question, but a kind of summary of a ton of investigation I've been doing since a recent "upgrade". Anyone else out there with "big iron" might want to confirm this, but it seems pretty reproducible. This seems to affect the latest 3.2 mainline and, by extension, any platform using it. My tests are restricted to Ubuntu 12.04, but it may apply elsewhere.

Very interesting results. I've been trying to benchmark my box but getting what I would call poor performance given the setup, and I am running 3.2 on Ubuntu 12.04. Could I ask what hardware (offline if you wish) was used for the results below?

> Comparing the latest official 3.2 kernel to the latest official 3.4 kernel (both Ubuntu), there are some rather striking differences. I'll start with some pgbench tests.
>
> * This test is 800 read-only clients, with 2 controlling threads, on a 55GB database (scaling factor of 3600) for 3 minutes.
>   * With 3.4:
>     * Max TPS was 68933.
>     * CPU was between 50 and 55% idle.
>     * Load average was between 10 and 15.
>   * With 3.2:
>     * Max TPS was 17583. A total loss of 75% performance.
>     * CPU was between 12 and 25% idle.
>     * Load average was between 10 and 60, effectively random.
> * Next, we checked minimal write tests. This time, with only two clients. All other metrics are the same.
>   * With 3.4:
>     * Max TPS was 4548.
>     * CPU was between 88 and 92% idle.
>     * Load average was between 1.7 and 2.5.
>   * With 3.2:
>     * Max TPS was 4639.
>     * CPU was between 88 and 92% idle.
>     * Load average was between 3 and 4.

Time to see what a 3.4 kernel does to my setup, I think.

Thanks

John
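The quoted tests can be reconstructed as pgbench invocations. The post gives the parameters (clients, threads, scale, duration) but not the exact commands, so the flags below are assumptions (-S selects pgbench's built-in read-only script):

```shell
#!/bin/sh
# Hedged reconstruction of the two quoted test runs; flags are assumptions.
SCALE=3600                                          # ~55GB database at init time
READ_TEST="pgbench -S -c 800 -j 2 -T 180 pgbench"   # 800 RO clients, 2 threads, 3 min
WRITE_TEST="pgbench -c 2 -j 2 -T 180 pgbench"       # minimal write test, 2 clients
echo "$READ_TEST"
echo "$WRITE_TEST"
```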
[PERFORM] Re: xfs perform a lot better than ext4 [WAS: Re: Two identical systems, radically different performance]
Hi John,

On 12/06/2012 09:29 AM, John Lister wrote:
> Hi, I found this interesting as I'm trying to do some benchmarks on my box which is very similar to the above, but I don't believe the TPS is anywhere near what it should be.
>
> Is the 6305 figure from xfs?

yes, it is.

> I'm assuming that your main data array is just 2 15k sas drives,

correct

> are you putting the WAL on the data array or is that stored somewhere else?

pg_xlog is placed in the data array.
> Can I ask what scaling params,

sure, I've initialized the pgbench db issuing: pgbench -i -s 10 pgbench

> etc you used to build the pgbench tables and look at your postgresql.conf file to see if I missed something (offline if you wish)

those are the non-default values in postgresql.conf:

listen_addresses = '*'
max_connections = 100
shared_buffers = 3200MB
work_mem = 30MB
maintenance_work_mem = 800MB
synchronous_commit = off
full_page_writes = off
checkpoint_segments = 40
checkpoint_completion_target = 0.9
random_page_cost = 3.5
effective_cache_size = 10GB
log_timezone = 'localtime'
stats_temp_directory = 'pg_stat_tmp_ram'
autovacuum_naptime = 5min

and then OS tweaks:

HT disabled in BIOS
/sbin/blockdev --setra 8192 /dev/sdb
echo deadline > /sys/block/sdb/queue/scheduler
vm.swappiness=0
vm.overcommit_memory=2
vm.dirty_ratio=2
vm.dirty_background_ratio=1
kernel.shmmax=3454820352
kernel.shmall=2048341

$PGDATA is on xfs (rw,noatime)
tmpfs on /db/9.2/pg_stat_tmp_ram type tmpfs (rw,size=50M,uid=1001,gid=1001)
kernel 3.2.0-32-generic

Andrea
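The OS tweaks above can be collected into one place. The sysctl values and the /dev/sdb device name come straight from the post; only the file name and the sysctl.conf packaging are additions, and the commands must be run as root on the actual box:

```shell
#!/bin/sh
# Andrea's OS tweaks, gathered as a sysctl fragment plus two device commands.
cat > pg-tuning-sysctl.conf <<'EOF'
vm.swappiness = 0
vm.overcommit_memory = 2
vm.dirty_ratio = 2
vm.dirty_background_ratio = 1
kernel.shmmax = 3454820352
kernel.shmall = 2048341
EOF
# As root:
#   sysctl -p ./pg-tuning-sysctl.conf
#   /sbin/blockdev --setra 8192 /dev/sdb              # 4MB read-ahead
#   echo deadline > /sys/block/sdb/queue/scheduler    # deadline I/O scheduler
```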
[PERFORM] Re: xfs perform a lot better than ext4 [WAS: Re: Two identical systems, radically different performance]
[added performance list back]

On 12/06/2012 10:04 AM, John Lister wrote:
> Thanks for the info, I'll have a play and see what values I get with similar settings, etc

you're welcome

> Still think something is wrong with my config, but we'll see.

which kind of SSD disks do you have? maybe they are of the same type Shaun Thomas is having problems with here: http://archives.postgresql.org/pgsql-performance/2012-12/msg00030.php

Andrea
[PERFORM] Re: xfs perform a lot better than ext4 [WAS: Re: Two identical systems, radically different performance]
On 06/12/2012 09:33, Andrea Suisani wrote:
> which kind of SSD disks do you have? maybe they are of the same type Shaun Thomas is having problems with here: http://archives.postgresql.org/pgsql-performance/2012-12/msg00030.php

Yeah, I saw that post. I'm running the same version of Ubuntu with the 3.2 kernel, so when I get a chance to take it down I will try the new kernels, although Ubuntu are on 3.5 now... Shaun didn't post what hardware he was running on, so it would be interesting to see how it compares. They are Intel 320s, which while not the newest should offer some protection against power failure, etc.

John
Re: [PERFORM] Slow query: bitmap scan troubles
That is very interesting indeed; these indexes are quite large! I will apply that patch and try it out this evening and let you know.

Thank you very much everyone for your time, the support has been amazing.

PS: Just looked at this thread on the archives page and realised I don't have my name in the FROM: field, which is a misconfiguration of my email client, but figured I would leave it to prevent confusion. Sorry about that.

All the best,

Philip Scott

-----Original Message-----
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
Sent: 05 December 2012 18:05
To: Jeff Janes
Cc: postgre...@foo.me.uk; postgres performance list
Subject: Re: [PERFORM] Slow query: bitmap scan troubles

Jeff Janes writes:
> I now see where the cost is coming from. In commit 21a39de5809 (first appearing in 9.2) the "fudge factor" cost estimate for large indexes was increased by about 10 fold, which really hits this index hard.
> This was fixed in commit bf01e34b556 "Tweak genericcostestimate's fudge factor for index size", by changing it to use the log of the index size. But that commit probably won't be shipped until 9.3.

Hm. To tell you the truth, in October I'd completely forgotten about the January patch, and was thinking that the 1/1 cost had a lot of history behind it. But if we never shipped it before 9.2 then of course that idea is false. Perhaps we should backpatch the log curve into 9.2 --- that would reduce the amount of differential between what 9.2 does and what previous branches do for large indexes.

It would definitely be interesting to know if applying bf01e34b556 helps the OP's example.

regards, tom lane
Re: [PERFORM] Re: xfs perform a lot better than ext4 [WAS: Re: Two identical systems, radically different performance]
On 12/06/2012 12:37 PM, John Lister wrote:
> Yeah, I saw that post. I'm running the same version of ubuntu with the 3.2 kernel, so when I get a chance to take it down I will try the new kernels, although ubuntu are on 3.5 now... Shaun didn't post what hardware he was running on, so it would be interesting to see how it compares. They are intel 320s, which while not the newest should offer some protection against power failure, etc.

reading the thread again, I realized Shaun is using the fusionIO driver, and he said that the regression is due to "some recent 3.2 kernel patch borks the driver in some horrible way".

so maybe you're not in the same boat (since you're using Intel 320s), or maybe the kernel regression he's referring to is related to the kernel subsystem that deals with SSD disks, independently of brand. In the latter case, testing a different kernel would be worthwhile.

Andrea
Re: [PERFORM] Slow query: bitmap scan troubles
> I also wonder if increasing (say x10) default_statistics_target or just doing ALTER TABLE ... SET STATISTICS for particular tables will help. It will make the planner produce more precise estimations. Do not forget to ANALYZE after changing it.

Thanks Sergey, I will try this too. I think the bother here is that these statistics are pretty good (we do analyse regularly and default_statistics_target is already 1000), but once I start filtering the two tables the correlations alter quite a bit. I don't think there is that much that can be done about that :)

- Phil
Re: [PERFORM] Slow query: bitmap scan troubles
Hi Jeff

> It kind of does. The expected speed is predicated on the number of rows being 200 fold higher. If the number of rows actually was that much higher, the two speeds might be closer together. That is why it would be interesting to see a more typical case where the actual number of rows is closer to the 2000 estimate.

Ah, I see, of course. Makes a lot of sense when you think about it. This has been quite an enlightening adventure into the guts of postgres for me :)

> But I am curious about how the cost estimate for the primary key look up is arrived at:

(Dealt with in your next reply, thanks for figuring that out! I will certainly try the patch.)

> I've heard good things about Greg Smith's book, but I don't know if it covers this particular thing.

A copy is on its way, thank you.

>> We are in the rather pleasant situation here in that we are willing to spend money on the box (up to a point, but quite a large point) to get it up to the spec so that it should hardly ever need to touch the disk; the trick is figuring out how to let our favourite database server know that.

> Well, that part is fairly easy. Make random_page_cost and seq_page_cost much smaller than their defaults. Like, 0.04 and 0.03, for example.

Yes, I have been playing a lot with that and it makes a lot of difference. When I tweak them down I end up getting a lot of nested loops instead of hash or merge joins, and they are much faster (presumably we might have gotten a nested loop out of the planner if it could correctly estimate the low number of rows returned).

I've got plenty of ammunition now to dig deeper; you guys have been invaluable.

Cheers,

Phil
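One way to experiment with Jeff's suggested values before touching postgresql.conf is a per-session override (9.2 has no ALTER SYSTEM). The cost values are from his example; the file name, database name, and placeholder query are illustrative only:

```shell
#!/bin/sh
# Write a session-level cost override to try against the slow query.
cat > lower_costs.sql <<'EOF'
SET random_page_cost = 0.04;
SET seq_page_cost = 0.03;
-- EXPLAIN ANALYZE <the slow query>;  -- re-test under the new costs
EOF
# Then: psql -d yourdb -f lower_costs.sql
```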
Re: [PERFORM] Slow query: bitmap scan troubles
On Wed, Dec 5, 2012 at 9:43 AM, Claudio Freire wrote:
> On Wed, Dec 5, 2012 at 2:39 PM, Jeff Janes wrote:
>> I'm not sure that this change would fix your problem, because it might also change the costs of the alternative plans in a way that neutralizes things. But I suspect it would fix it. Of course, a correct estimate of the join size would also fix it--you have kind of a perfect storm here.
>
> As far as I can see on the explain, the misestimation is 3x~4x not 200x.

It is 3x (14085 vs 4588) for selectivity on one of the tables, "Index Only Scan using idx_trade_id_book on trade".

But for the join of both tables it is estimate 2120 vs actual 11.

Cheers,

Jeff
Re: [PERFORM] Slow query: bitmap scan troubles
On Thu, Dec 6, 2012 at 2:27 PM, Jeff Janes wrote:
> On Wed, Dec 5, 2012 at 9:43 AM, Claudio Freire wrote:
>> On Wed, Dec 5, 2012 at 2:39 PM, Jeff Janes wrote:
>>> I'm not sure that this change would fix your problem, because it might also change the costs of the alternative plans in a way that neutralizes things. But I suspect it would fix it. Of course, a correct estimate of the join size would also fix it--you have kind of a perfect storm here.
>>
>> As far as I can see on the explain, the misestimation is 3x~4x not 200x.
>
> It is 3x (14085 vs 4588) for selectivity on one of the tables, "Index Only Scan using idx_trade_id_book on trade".
>
> But for the join of both tables it is estimate 2120 vs actual 11.

But the final result set isn't further worked on (except for the aggregate), which means it doesn't affect the cost by much.
Re: [PERFORM] Slow query: bitmap scan troubles
On Thu, Dec 6, 2012 at 12:05 PM, Claudio Freire wrote:
> On Thu, Dec 6, 2012 at 2:27 PM, Jeff Janes wrote:
>> It is 3x (14085 vs 4588) for selectivity on one of the tables, "Index Only Scan using idx_trade_id_book on trade".
>>
>> But for the join of both tables it is estimate 2120 vs actual 11.
>
> But the final result set isn't further worked on (except for the aggregate), which means it doesn't affect the cost by much.

Good point. Both the NL and hash join do about the same amount of work probing for success whether the success is actually there or not. So scratch what I said about the correlation being important; in this case it is not.

The 3x error is enough to push it over the edge, but the fudge factor is what gets it so close to that edge in the first place. And I'm now pretty sure the fudge factor change would fix this. The truly-fast NL plan is getting overcharged by the fudge factor once per each of the 14,085 loopings, while the truly-slow bitmap scan is overcharged only once for the entire scan. So the change is by no means neutralized between the two plans.

I don't know if my other theory, that the bitmap scan is overflowing work_mem (but not costed for doing so), is also contributing.

Cheers,

Jeff
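The per-loop overcharge Jeff describes can be illustrated with toy arithmetic. The index size and the two scaling constants below are invented for illustration and are not PostgreSQL's actual cost formula; only the 14,085 loop count comes from the plan in this thread:

```shell
#!/bin/sh
# Toy comparison: a per-probe fudge proportional to index size is paid on
# every one of the 14,085 nested-loop probes, while a log-scaled fudge
# stays small. Constants are made up; this is not the planner's formula.
PAGES=100000    # hypothetical large-index size, in pages
LOOPS=14085     # inner-index probes from the plan in this thread
LINEAR=$(awk -v p="$PAGES" 'BEGIN { printf "%.0f", p * 1e-4 }')       # size-proportional
LOGGED=$(awk -v p="$PAGES" 'BEGIN { printf "%.2f", log(p) * 1e-2 }')  # log-scaled
LINEAR_TOTAL=$(awk -v f="$LINEAR" -v n="$LOOPS" 'BEGIN { printf "%.0f", f * n }')
LOGGED_TOTAL=$(awk -v f="$LOGGED" -v n="$LOOPS" 'BEGIN { printf "%.1f", f * n }')
echo "per probe:       linear=$LINEAR  log=$LOGGED"
echo "x $LOOPS probes: linear=$LINEAR_TOTAL  log=$LOGGED_TOTAL"
```

The size-proportional charge is multiplied thousands of times for the nested loop but paid only once by the bitmap scan, which is how the fudge factor skews the comparison between the two plans.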