Re: [PERFORM] x206-x225
> Jim C. Nasby wrote:
>> I think you mean this...
>> http://www.postgresql.org/docs/8.1/static/runtime-config-wal.html
>> commit_delay (integer)
>
> No, that's not what I mean at all. On a system doing a large number of WAL-generating transactions per second, it's certainly possible for multiple transactions to commit in the period of time it takes for the platter to rotate back into position to allow for writing of the WAL data. What I don't know is whether those multiple transactions would actually make it to the platter on that rotation, or whether they'd serialize, resulting in one commit per revolution. I do know that there's no theoretical reason they couldn't; it's just a matter of putting enough intelligence in the drive. Perhaps this is something that SCSI supports and (S)ATA doesn't, since SCSI allows multiple transactions to be 'in flight' on the bus at once.
>
> SCSI command queueing:
> http://www.storagereview.com/guide2000/ref/hdd/if/scsi/protCQR.html
> SATA "native command queuing":
> http://www.tomshardware.com/2004/11/16/can_command_queuing_turbo_charge_sata/
>
> But since you mention commit_delay, this does lead to an interesting possible use: set it equal to the effective rotational period of the drive. If you know your transaction load well enough, you could possibly gain some benefit here. But of course a RAID controller with a BBU would be a better bet...

I suppose that as long as you always have several transactions trying to commit, and a separate spindle (or spindles) for the WAL, you could improve throughput at the cost of the shortest transaction times. Of course, the increase in lock duration etc. might outweigh any benefits. I'd suspect the cost/gain would be highly variable with changes in workload, and as you say a write cache plus BBU seems more sensible.

--
  Richard Huxton
  Archonet Ltd
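As a rough illustration of the idea above (not a recommendation), the rotational period of a 10,000 RPM disk is 6 ms, so the corresponding postgresql.conf settings might look like the sketch below. The figures are assumptions for the example; commit_delay is in microseconds and only applies when at least commit_siblings other transactions are open:

    # 10,000 RPM => one revolution every 60 s / 10000 = 6 ms = 6000 microseconds
    commit_delay    = 6000   # wait up to one rotation for other commits to pile up
    commit_siblings = 5      # only delay if at least 5 other transactions are active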
[PERFORM] VACUUM FULL hangs
Hello list.

I recently tried to make a Slony replica of my database, and the first attempt failed. I retried, and then it succeeded (why it failed is another story). As a result the replica contains a lot of dead tuples (if I understand correctly, a failure while creating the replica means a HUGE aborted transaction -- and Slony should arguably TRUNCATE the table to get rid of the dead tuples, but that is a subject for another list). So I did VACUUM FULL VERBOSE ANALYZE (does that make sense?). It hung on a (quite large) table:

INFO:  vacuuming "public.calls"
INFO:  "calls": found 7980456 removable, 3989705 nonremovable row versions in 296943 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 136 to 224 bytes long.
There were 891 unused item pointers.
Total free space (including removable row versions) is 1594703944 bytes.
197958 pages are or will become empty, including 0 at the end of the table.
212719 pages containing 1588415680 free bytes are potential move destinations.
CPU 7.25s/3.28u sec elapsed 144.95 sec.
INFO:  index "calls_pkey" now contains 3989705 row versions in 8975 pages
DETAIL:  108927 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.35s/0.59u sec elapsed 39.03 sec.
INFO:  index "calls_cli" now contains 3989705 row versions in 13504 pages
DETAIL:  108927 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.51s/0.60u sec elapsed 58.60 sec.
INFO:  index "calls_dnis" now contains 3989705 row versions in 13600 pages
DETAIL:  108927 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.60s/0.90u sec elapsed 27.05 sec.
INFO:  index "calls_u" now contains 3989705 row versions in 23820 pages
DETAIL:  108927 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.92s/0.78u sec elapsed 80.51 sec.
INFO:  index "calls_z" now contains 3989705 row versions in 13607 pages
DETAIL:  108927 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.60s/0.85u sec elapsed 39.77 sec.

It stayed in this state for more than 3 hours, and I had to kill the vacuum process. From iostat I saw continuous write activity, steadily about 1.3 MB/s (the disk system can do about 40 MB/s), and there were processes in iowait. There was no read activity. There were no other clients for that database (but there were clients in other databases in the instance).

The version is 8.1.0. Autovacuum is off. I upped maintenance_work_mem to 512 MB.

Any hints? If nothing comes up today, I am scratching that replica.
telefony=# \d calls
                                    Table "public.calls"
    Column     |            Type             |                         Modifiers
---------------+-----------------------------+------------------------------------------------------------
 dt            | timestamp without time zone |
 machine_ip    | integer                     |
 port          | integer                     |
 filename      | character varying(15)      |
 account       | character(11)               |
 duration      | integer                     |
 ani           | character(32)               |
 application   | character(32)               |
 dnis          | integer                     |
 z             | integer                     |
 client        | integer                     |
 taryfa        | integer                     |
 operator      | character varying(20)      |
 id            | integer                     | not null default nextval(('seq_calls_id'::text)::regclass)
 outgoing      | character(12)               |
 release_cause | text                        |
 waiting       | integer                     |
 oper_pin      | integer                     |
Indexes:
    "calls_pkey" PRIMARY KEY, btree (id)
    "calls_u" UNIQUE, btree (dt, dnis, port, machine_ip, account)
    "calls_cli" btree (client, dt)
    "calls_dnis" btree (dnis, dt)
    "calls_z" btree (z, dt)
Triggers:
    _ctele_denyaccess_5 BEFORE INSERT OR DELETE OR UPDATE ON calls FOR EACH ROW EXECUTE PROCEDURE _ctele.denyaccess('_ctele')

Regards,
Marcin Mańk
Re: [PERFORM] VACUUM FULL hangs
Marcin Mańk wrote:
> Hello list.
>
> I recently tried to make a Slony replica of my database, and the first attempt failed. I retried, and then it succeeded (why it failed is another story). As a result the replica contains a lot of dead tuples (if I understand correctly, a failure while creating the replica means a HUGE aborted transaction -- and Slony should arguably TRUNCATE the table to get rid of the dead tuples, but that is a subject for another list). So I did VACUUM FULL VERBOSE ANALYZE (does that make sense?).

Fair enough. If you want empty tables, TRUNCATE is probably a better bet though.

> It hung on a (quite large) table:
>
> INFO:  vacuuming "public.calls"
> INFO:  "calls": found 7980456 removable, 3989705 nonremovable row versions in 296943 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> Nonremovable row versions range from 136 to 224 bytes long.
> There were 891 unused item pointers.
> Total free space (including removable row versions) is 1594703944 bytes.
> 197958 pages are or will become empty, including 0 at the end of the table.
> 212719 pages containing 1588415680 free bytes are potential move destinations.

OK, so there are 7.9 million removable rows and 3.9 million nonremovable, so TRUNCATE isn't an option since you have data you presumably want to keep. It estimates about 200,000 pages will become empty, but none of them are at the end of the table. This represents 1.5 GB of unused disk space. I'm a bit puzzled as to how you managed to get so much free space at the start of the table. Did the replication work on the second try?

> CPU 7.25s/3.28u sec elapsed 144.95 sec.
> INFO:  index "calls_pkey" now contains 3989705 row versions in 8975 pages
> DETAIL:  108927 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.35s/0.59u sec elapsed 39.03 sec.
> INFO:  index "calls_cli" now contains 3989705 row versions in 13504 pages
> DETAIL:  108927 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.51s/0.60u sec elapsed 58.60 sec.
> INFO:  index "calls_dnis" now contains 3989705 row versions in 13600 pages
> DETAIL:  108927 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.60s/0.90u sec elapsed 27.05 sec.
> INFO:  index "calls_u" now contains 3989705 row versions in 23820 pages
> DETAIL:  108927 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.92s/0.78u sec elapsed 80.51 sec.
> INFO:  index "calls_z" now contains 3989705 row versions in 13607 pages
> DETAIL:  108927 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.60s/0.85u sec elapsed 39.77 sec.

It's done all the indexes (and seems to have done them quite quickly), and is presumably working on the table data now.

> It stayed in this state for more than 3 hours, and I had to kill the vacuum process. From iostat I saw continuous write activity, steadily about 1.3 MB/s (the disk system can do about 40 MB/s), and there were processes in iowait. There was no read activity. There were no other clients for that database (but there were clients in other databases in the instance).

OK, so you might well be getting the vacuum writing one page, then WAL, then vacuum, and so on. That will mean the disk spends most of its time seeking back and forth. How many disks do you have, and is the WAL on a separate set of disks? I think it's just taking a long time because you have so many pages to move and not enough disk bandwidth.
Of course the root of the problem is that you had so many dead rows after a failed replication in the first place, but you're right, that's another email.

--
  Richard Huxton
  Archonet Ltd
Re: [PERFORM] BETWEEN optimizer problems with single-value range
Kevin Grittner <[EMAIL PROTECTED]> wrote:
> Attached is a simplified example of a performance problem we have seen,

Odd. Can you tell us your PG version?

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
[PERFORM] BETWEEN optimizer problems with single-value range
Attached is a simplified example of a performance problem we have seen, with a workaround and a suggestion for enhancement (hence both the performance and hackers lists).

Our software allows users to specify the start and end dates for a query. When they enter the same date for both, the optimizer makes a very bad choice. We can work around it in application code by using an equality test if both dates match, but I think the planner should be able to make a better choice here. (One obvious way to fix it would be to rewrite "BETWEEN a AND b" as "= a" when a is equal to b, but it seems like there is some underlying problem which should be fixed instead of, or in addition to, this.)

The first query uses BETWEEN with the same date for both min and max values. The second query uses an equality test for the same date. The third query uses BETWEEN with a two-day range. In all queries, there are fewer than 4,600 rows for the specified ctofcNo value out of over 18 million rows in the table. We tried boosting the statistics samples for the columns in the selection, which made the row estimates more accurate, but didn't change the choice of plans.

-Kevin

between-optimization-problem.txt
Description: Binary data
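The attachment isn't reproduced here, but from the plans and queries quoted later in the thread the three forms look roughly like the sketch below (the exact select list is my assumption; the pattern -- an equality on "ctofcNo" plus a "calDate" range against the "Cal" table -- is what the follow-ups show):

    -- Form 1: BETWEEN with identical endpoints (gets the bad plan)
    SELECT "calDate", "startTime"
      FROM "Cal"
     WHERE "ctofcNo" = '2192'
       AND "calDate" BETWEEN '2006-03-15' AND '2006-03-15'
     ORDER BY "calDate", "startTime";

    -- Form 2: the application-side workaround using equality (gets the good plan)
    SELECT "calDate", "startTime"
      FROM "Cal"
     WHERE "ctofcNo" = '2192'
       AND "calDate" = '2006-03-15'
     ORDER BY "calDate", "startTime";

    -- Form 3: BETWEEN with a genuine two-day range (also gets the good plan)
    SELECT "calDate", "startTime"
      FROM "Cal"
     WHERE "ctofcNo" = '2192'
       AND "calDate" BETWEEN '2006-03-15' AND '2006-03-16'
     ORDER BY "calDate", "startTime";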
Re: [PERFORM] VACUUM FULL hangs
> I'm a bit puzzled as to how you managed to get so much free space at the start of the table. Did the replication work on the second try?

It actually worked on the third try, I guess.

> OK, so you might well be getting the vacuum writing one page, then WAL, then vacuum, and so on. That will mean the disk spends most of its time seeking back and forth. How many disks do you have, and is the WAL on a separate set of disks?

It is 2 spindles in software RAID1. Until now there were no performance problems with this machine that would justify trying anything fancier; this machine is low traffic.

Greetings,
Marcin Mańk
Re: [PERFORM] BETWEEN optimizer problems with single-value
>>> On Wed, Mar 15, 2006 at 12:17 pm, in message <[EMAIL PROTECTED]>, Andreas Kretschmer <[EMAIL PROTECTED]> wrote:
> Kevin Grittner <[EMAIL PROTECTED]> wrote:
>
>> Attached is a simplified example of a performance problem we have seen,
>
> Odd. Can you tell us your PG version?

I know we really should move to 8.1.3, but I haven't gotten to it yet. We're on a build from the 8.1 stable branch as of February 10th, with a patch to allow ANSI-standard interpretation of string literals. (So this is 8.1.2 with some 8.1.3 changes plus the string literal patch.) If there are any changes in that time frame which might affect this issue, I could deploy a standard release and make sure that I see the same behavior. Let me know.

-Kevin
Re: [PERFORM] BETWEEN optimizer problems with single-value
"Kevin Grittner" <[EMAIL PROTECTED]> writes: >> Odd. Can you tell us your PG- Version? > this is 8.1.2 with some 8.1.3 changes plus the string literal patch.) 8.1 is certainly capable of devising the plan you want, for example in the regression database: regression=# explain select * from tenk1 where thousand = 10 and tenthous between 42 and 144; QUERY PLAN Index Scan using tenk1_thous_tenthous on tenk1 (cost=0.00..6.01 rows=1 width=244) Index Cond: ((thousand = 10) AND (tenthous >= 42) AND (tenthous <= 144)) (2 rows) It looks to me like this is a matter of bad cost estimation, ie, it's thinking the other index is cheaper to use. Why that is is not clear. Can we see the pg_stats rows for ctofcNo and calDate? Also, try to force it to generate the plan you want, so we can see what it thinks the cost is for that. If you temporarily drop the wrong index you should be able to get there: begin; drop index "Cal_CalDate"; explain analyze select ... ; -- repeat as needed if it chooses some other wrong index rollback; I hope you have a play copy of the database to do this in --- although it would be safe to do the above in a live DB, the DROP would exclusive-lock the table until you finish the experiment and rollback, which probably is not good for response time ... regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PERFORM] Slow SELECTS after large update cycle
Hello,

After fixing the hanging problems I reported here earlier (by uninstalling W2K3 SP1), I'm running into another weird one.

After doing a +/- 8hr cycle of updates and inserts (what we call a 'batch'), the first 'reporting' type query on tables involved in that write cycle is very slow. As an example, I have a query which according to EXPLAIN ANALYZE should take about 1.1s actually taking 46s. After this one hit, everything is back to normal, and subsequent executions of the same query are in fact subsecond. Restarting the appserver and pgsql does not make the slowness re-appear; only running another batch will.

During the 'write'/batch cycle, a large number of rows in various tables are inserted and subsequently (repeatedly) updated. The reporting-type queries after that are basically searches on those tables.

Anybody any ideas?

Thanks,

jan

--
--
Jan de Visser                     [EMAIL PROTECTED]

        Baruk Khazad! Khazad ai-menu!
--
[PERFORM] Background writer configuration
We were seeing clusters of query timeouts with our web site, which were corrected by adjusting the configuration of the background writer. I'm posting just to provide information which others might find useful -- I don't have any problem I'm trying to solve in this regard.

The web site gets 1 to 2 million hits per day, with about the same number of select queries run to provide data for the web pages. The load is distributed across multiple databases. (We have four, but the load is easily handled by any two of them, and we often take one or two out of web use for maintenance or special statistical runs.) Each database gets the same stream of modification requests -- about 2.7 million database transactions per day. Each transaction can contain multiple inserts, updates, or deletes. The peak times for both the web requests and the data modifications are in the afternoon on business days.

Most web queries run under a timeout limit of 20 seconds. During peak times, we would see clusters of timeouts (where queries exceeded the 20 second limit) on very simple queries which normally run in a few milliseconds. The pattern suggested that checkpoints were at fault. I boosted the settings for the background writer from the defaults to the values below, and we saw a dramatic reduction in these timeouts. We also happened to have one machine which had been out of the replication mix and was in "catch up" mode, processing the transaction stream as fast as the database could handle it, without any web load. We saw the transaction application rate go up by a factor of four when I applied these changes:

bgwriter_lru_percent = 2.0
bgwriter_lru_maxpages = 250
bgwriter_all_percent = 1.0
bgwriter_all_maxpages = 250

This was with shared_buffers = 20000, so that last value was effectively limited to 200 by the percentage. I then did some calculations, based on the sustained write speed of our drive array (as measured by copying big files to it), and we tried this:

bgwriter_lru_percent = 20.0
bgwriter_lru_maxpages = 200
bgwriter_all_percent = 10.0
bgwriter_all_maxpages = 600

This almost totally eliminated the clusters of timeouts, and caused the transaction application rate to increase by a factor of eight over the already-improved speed. (That is, we were running 30 to 35 times as many transactions per minute into the database, compared to the default background writer configuration.) I'm going to let these settings settle in for a week or two before we try adjusting them further (to see if we can eliminate those last few timeouts of this type).

I guess my point is that people shouldn't be shy about boosting these numbers by a couple orders of magnitude from the default values. It may also be worth considering whether the defaults should be something more aggressive.

-Kevin
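To put rough numbers on the kind of calculation described above (my arithmetic, not Kevin's, assuming the default bgwriter_delay of 200 ms, i.e. five bgwriter rounds per second, and the default 8 kB block size): the second set of limits allows at most (200 + 600) pages * 8 kB * 5 rounds/s, roughly 31 MB/s of background-writer output, which stays under the ~40 MB/s the drive array was measured to sustain. A quick way to check such a figure from psql:

    SELECT (200 + 600)        -- bgwriter_lru_maxpages + bgwriter_all_maxpages
           * 8                -- kB per page (default block size)
           * (1000.0 / 200)   -- rounds per second at bgwriter_delay = 200 ms
           / 1024             -- kB -> MB
           AS approx_max_bgwriter_mb_per_sec;   -- about 31.25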
Re: [PERFORM] Background writer configuration
> I then did some calculations, based on the sustained write speed of our drive array (as measured by copying big files to it), and we tried this:
>
> bgwriter_lru_percent = 20.0
> bgwriter_lru_maxpages = 200
> bgwriter_all_percent = 10.0
> bgwriter_all_maxpages = 600
>
> This almost totally eliminated the clusters of timeouts, and caused the transaction application rate to increase by a factor of eight over the already-improved speed. (That is, we were running 30 to 35 times as many transactions per minute into the database, compared to the default background writer configuration.) I'm going to let these settings settle in for a week or two before we try adjusting them further (to see if we can eliminate those last few timeouts of this type).

Can you tell us what type of array you have?

Joshua D. Drake

> I guess my point is that people shouldn't be shy about boosting these numbers by a couple orders of magnitude from the default values. It may also be worth considering whether the defaults should be something more aggressive.
>
> -Kevin

--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: PLphp, PLperl - http://www.commandprompt.com/
Re: [PERFORM] BETWEEN optimizer problems with single-value range
On 3/15/06, Kevin Grittner <[EMAIL PROTECTED]> wrote:
> Attached is a simplified example of a performance problem we have seen, with a workaround and a suggestion for enhancement (hence both the performance and hackers lists).

Hi Kevin. In postgres 8.2 you will be able to use row-wise comparison for your query, which should guarantee good worst-case performance without having to maintain two separate query forms. It is also a more elegant syntax, as you will see:

    SELECT "CA"."calDate", "CA"."startTime"
      FROM "Cal" "CA"
     WHERE ("CA"."ctofcNo", "CA"."calDate")
           BETWEEN (2192, '2006-03-15') AND (2192, '2006-03-15')
     ORDER BY "ctofcNo", "calDate", "startTime";

Be warned, this will not work properly in pg < 8.2. IMO, row-wise comparison is the best way to write this type of query. Please note the row constructor and the addition of ctofcNo into the ORDER BY clause to force use of the index.

Merlin
Re: [PERFORM] BETWEEN optimizer problems with single-value
>>> On Wed, Mar 15, 2006 at 1:17 pm, in message <[EMAIL PROTECTED]>, Tom Lane <[EMAIL PROTECTED]> wrote:
>
> 8.1 is certainly capable of devising the plan you want, for example in the regression database:
>
> regression=# explain select * from tenk1 where thousand = 10 and tenthous between 42 and 144;
>                                      QUERY PLAN
> -------------------------------------------------------------------------------------
>  Index Scan using tenk1_thous_tenthous on tenk1  (cost=0.00..6.01 rows=1 width=244)
>    Index Cond: ((thousand = 10) AND (tenthous >= 42) AND (tenthous <= 144))
> (2 rows)

That matches one of the examples where it optimized well. I only saw the bad plan when low and high ends of the BETWEEN range were equal.

> It looks to me like this is a matter of bad cost estimation, ie, it's thinking the other index is cheaper to use. Why that is is not clear. Can we see the pg_stats rows for ctofcNo and calDate?

 schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
------------+-----------+---------+-----------+-----------+------------+------------------+-------------------+------------------+-------------
 public | Cal | calDate | 0 | 4 | 2114 | {2003-06-02,2000-06-20,2001-04-16,2003-06-17,2003-12-01,2004-10-12,2001-04-23,2001-10-15,2002-03-06,2002-05-03} | {0.0033,0.0023,0.0023,0.0023,0.0023,0.0023,0.002,0.002,0.002,0.002} | {1986-03-14,1999-06-11,2000-07-14,2001-05-18,2002-03-21,2002-12-04,2003-08-12,2004-05-13,2005-02-01,2005-09-28,2080-12-31} | 0.0545768
 public | Cal | ctofcNo | 0 | 8 | 669 | {0793,1252,1571,0964,0894,1310,"DA ",0944,1668,0400} | {0.024,0.019,0.015,0.012,0.012,0.011,0.0106667,0.01,0.0097,0.0087} | {,0507,0733,0878,1203,1336,14AG,1633,1971,3705,YVJO} | -0.0179665
(2 rows)

> Also, try to force it to generate the plan you want, so we can see what it thinks the cost is for that. If you temporarily drop the wrong index you should be able to get there:
>
>     begin;
>     drop index "Cal_CalDate";
>     explain analyze select ... ;
>     -- repeat as needed if it chooses some other wrong index
>     rollback;

 Sort  (cost=4.03..4.03 rows=1 width=12) (actual time=48.484..48.486 rows=4 loops=1)
   Sort Key: "calDate", "startTime"
   ->  Index Scan using "Cal_CtofcNo" on "Cal" "CA"  (cost=0.00..4.02 rows=1 width=12) (actual time=36.750..48.228 rows=4 loops=1)
         Index Cond: ((("ctofcNo")::bpchar = '2192'::bpchar) AND (("calDate")::date >= '2006-03-15'::date) AND (("calDate")::date <= '2006-03-15'::date))
 Total runtime: 56.616 ms
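For reference, the pg_stats rows above can be pulled with a query along these lines (a sketch; the exact statement Kevin used isn't shown in the thread):

    SELECT schemaname, tablename, attname, null_frac, avg_width, n_distinct,
           most_common_vals, most_common_freqs, histogram_bounds, correlation
      FROM pg_stats
     WHERE tablename = 'Cal'
       AND attname IN ('ctofcNo', 'calDate');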
Re: [PERFORM] Background writer configuration
>>> On Wed, Mar 15, 2006 at 1:54 pm, in message <[EMAIL PROTECTED]>, "Joshua D. Drake" <[EMAIL PROTECTED]> wrote:
>> I then did some calculations, based on the sustained write speed of our drive array (as measured by copying big files to it), and we tried this:
>>
>> bgwriter_lru_percent = 20.0
>> bgwriter_lru_maxpages = 200
>> bgwriter_all_percent = 10.0
>> bgwriter_all_maxpages = 600
>>
>> This almost totally eliminated the clusters of timeouts, and caused the transaction application rate to increase by a factor of eight over the already-improved speed. (That is, we were running 30 to 35 times as many transactions per minute into the database, compared to the default background writer configuration.) I'm going to let these settings settle in for a week or two before we try adjusting them further (to see if we can eliminate those last few timeouts of this type).
>
> Can you tell us what type of array you have?

Each machine has a RAID5 array of 13 (plus one hot spare) 15,000 RPM Ultra 320 SCSI drives.

2 machines use IBM ServeRAID-6M battery-backed caching controllers.
2 machines use IBM ServeRAID-4Mx battery-backed caching controllers.
Re: [PERFORM] BETWEEN optimizer problems with single-value range
Merlin Moncure <[EMAIL PROTECTED]> wrote:

> On 3/15/06, Kevin Grittner <[EMAIL PROTECTED]> wrote:
>> Attached is a simplified example of a performance problem we have seen, with a workaround and a suggestion for enhancement (hence both the performance and hackers lists).
>
> Hi Kevin. In postgres 8.2 you will be able to use the row-wise

8.2? AFAIK, feature freeze is in June/July this year, with release in August/September.

> comparison for your query which should guarantee good worst case performance without having to maintain two separate query forms. it

Perhaps a bitmap index scan (available since 8.1) would be useful for such queries. That's why I asked which version.

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Re: [PERFORM] BETWEEN optimizer problems with single-value range
On 3/15/06, Andreas Kretschmer <[EMAIL PROTECTED]> wrote:
> Merlin Moncure <[EMAIL PROTECTED]> wrote:
>> On 3/15/06, Kevin Grittner <[EMAIL PROTECTED]> wrote:
>>> Attached is a simplified example of a performance problem we have seen, with a workaround and a suggestion for enhancement (hence both the performance and hackers lists).
>>
>> Hi Kevin. In postgres 8.2 you will be able to use the row-wise
>
> 8.2? AFAIK, feature freeze is in June/July this year, with release in August/September.

Yes, but I was addressing Kevin's point about enhancing the server...

>> comparison for your query which should guarantee good worst case performance without having to maintain two separate query forms. it
>
> Perhaps a bitmap index scan (available since 8.1) would be useful for such queries. That's why I asked which version.

I think you will find that reading a range of records from a table ordered by an index, using the 8.2 row comparison feature, is much faster than a bitmap index scan.

Merlin
Re: [PERFORM] [HACKERS] BETWEEN optimizer problems with single-value range
On Wed, 2006-03-15 at 11:56 -0600, Kevin Grittner wrote:
> Attached is a simplified example of a performance problem we have seen, with a workaround and a suggestion for enhancement (hence both the performance and hackers lists).
>
> Our software allows users to specify the start and end dates for a query. When they enter the same date for both, the optimizer makes a very bad choice. We can work around it in application code by using an equality test if both dates match, but I think the planner should be able to make a better choice here. (One obvious way to fix it would be to rewrite "BETWEEN a AND b" as "= a" when a is equal to b, but it seems like there is some underlying problem which should be fixed instead of, or in addition to, this.)

That might work, but I'm not sure that is in itself the problem, and it would be mostly wasted overhead in 99% of cases.

The main issue appears to be that the planner chooses the "Cal_CalDate" index rather than the "Cal_CtofcNo" index when the BETWEEN values match. It seems that the cost of the first and third EXPLAINs is equal, yet for some reason it chooses different indexes in each case. My understanding was that it would pick the first index created if plan costs were equal. Is that behaviour repeatable with each query?

ISTM that if we have equal plan costs then we should be choosing the index for which we have more leading columns, since that is more likely to lead to a more selective answer. But the plan selection is a simple "pick the best, or if they're equal pick the best sort order".

> The first query uses BETWEEN with the same date for both min and max values. The second query uses an equality test for the same date. The third query uses BETWEEN with a two-day range. In all queries, there are fewer than 4,600 rows for the specified ctofcNo value out of over 18 million rows in the table. We tried boosting the statistics samples for the columns in the selection, which made the row estimates more accurate, but didn't change the choice of plans.

The selectivity seems the same in both -- clamped to a minimum of 1 row, so changing that doesn't look like it would help.

Best Regards, Simon Riggs
Re: [PERFORM] Slow SELECTS after large update cycle
Jan de Visser wrote:
> Hello,
>
> After fixing the hanging problems I reported here earlier (by uninstalling W2K3 SP1), I'm running into another weird one.
>
> After doing a +/- 8hr cycle of updates and inserts (what we call a 'batch'), the first 'reporting' type query on tables involved in that write cycle is very slow. As an example, I have a query which according to EXPLAIN ANALYZE should take about 1.1s actually taking 46s. After this one hit, everything is back to normal, and subsequent executions of the same query are in fact subsecond. Restarting the appserver and pgsql does not make the slowness re-appear; only running another batch will.
>
> During the 'write'/batch cycle, a large number of rows in various tables are inserted and subsequently (repeatedly) updated. The reporting-type queries after that are basically searches on those tables.

After a large batch you need to run ANALYZE over the tables involved, to get PostgreSQL to update its statistics so it can work out which indexes etc. it should use.

--
Postgresql & php tutorials
http://www.designmagick.com/
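For example (a sketch only; "calls" is borrowed from the VACUUM thread above as an illustrative table name):

    ANALYZE calls;   -- re-gather planner statistics for one table
    ANALYZE;         -- or re-analyze every table in the current database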
Re: [PERFORM] Slow SELECTS after large update cycle
On Wed, 2006-03-15 at 14:39 -0500, Jan de Visser wrote:
> After fixing the hanging problems I reported here earlier (by uninstalling W2K3 SP1), I'm running into another weird one.
>
> After doing a +/- 8hr cycle of updates and inserts (what we call a 'batch'), the first 'reporting' type query on tables involved in that write cycle is very slow. As an example, I have a query which according to EXPLAIN ANALYZE should take about 1.1s actually taking 46s. After this one hit, everything is back to normal, and subsequent executions of the same query are in fact subsecond. Restarting the appserver and pgsql does not make the slowness re-appear; only running another batch will.
>
> During the 'write'/batch cycle, a large number of rows in various tables are inserted and subsequently (repeatedly) updated. The reporting-type queries after that are basically searches on those tables.
>
> Anybody any ideas?

This is caused by updating the commit-status hint bits on each row touched by the SELECTs. This turns the first SELECT into a write operation.

Try running a scan of the whole table to take the hit before you give it back to the users.

Best Regards, Simon Riggs
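A minimal way to take that hit up front (the table name is illustrative; Jan's actual tables aren't named in the thread) is a query that forces a scan of every row, so the hint-bit updates are written during the batch window rather than on the first user query:

    SELECT count(*) FROM calls;   -- visits every row, setting the commit-status hint bits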
Re: [PERFORM] [HACKERS] BETWEEN optimizer problems with single-value
>>> On Wed, Mar 15, 2006 at 5:05 pm, in message <[EMAIL PROTECTED]>, Simon Riggs <[EMAIL PROTECTED]> wrote:
> On Wed, 2006-03-15 at 11:56 -0600, Kevin Grittner wrote:
>
>> (One obvious way to fix it would be to rewrite "BETWEEN a AND b" as "= a" when a is equal to b, but it seems like there is some underlying problem which should be fixed instead of, or in addition to, this.)
>
> That might work, but I'm not sure that is in itself the problem, and it would be mostly wasted overhead in 99% of cases.

It sounds like we agree.

> The main issue appears to be that the planner chooses the "Cal_CalDate" index rather than the "Cal_CtofcNo" index when the BETWEEN values match.

Agreed.

> It seems that the cost of the first and third EXPLAINs is equal, yet for some reason it chooses different indexes in each case. My understanding was that it would pick the first index created if plan costs were equal. Is that behaviour repeatable with each query?

It seems to be a consistent pattern, although strictly speaking our evidence is anecdotal. We've got hundreds of known failures with the BETWEEN variant on equal dates and no known successes. We have a few dozen tests of the equality variant, with 100% success in those tests.

> ISTM that if we have equal plan costs then we should be choosing the index for which we have more leading columns, since that is more likely to lead to a more selective answer. But the plan selection is a simple "pick the best, or if they're equal pick the best sort order".
>
> The selectivity seems the same in both -- clamped to a minimum of 1 row, so changing that doesn't look like it would help.

The fact that it costs these as equivalent is surprising in itself, and might be worth examining. This might be an example of something I suggested a while ago -- that rounding a row estimate to an integer on the basis that "you can't read half a row" is not necessarily wise, because you can have a 50% chance of reading a row versus a higher or lower percentage.

-Kevin
Re: [PERFORM] BETWEEN optimizer problems with single-value
On Wed, 2006-03-15 at 14:17 -0500, Tom Lane wrote:

> It looks to me like this is a matter of bad cost estimation, ie, it's thinking the other index is cheaper to use. Why that is is not clear. Can we see the pg_stats rows for ctofcNo and calDate?

ISTM that when the BETWEEN constants match we end up in this part of clauselist_selectivity()...
Re: [PERFORM] BETWEEN optimizer problems with single-value
On Thu, 2006-03-16 at 00:07, Simon Riggs wrote:
> On Wed, 2006-03-15 at 14:17 -0500, Tom Lane wrote:
>> It looks to me like this is a matter of bad cost estimation, ie, it's thinking the other index is cheaper to use. Why that is is not clear. Can we see the pg_stats rows for ctofcNo and calDate?
>
> ISTM that when the BETWEEN constants match we end up in this part of clauselist_selectivity()...

(and now for the whole email...)

    /*
     * It's just roundoff error; use a small positive
     * value
     */
    s2 = 1.0e-10;

... so the planner underestimates the cost of using "Cal_CalDate", it ends up the same as "Cal_CtofcNo", and then we pick "Cal_CalDate" because it was created first.

Using 1.0e-10 isn't very useful... the selectivity for a range should never be less than the selectivity for an equality, so we should simply put in a test against one of the pseudo-constants and use that as the minimal value. That should lead to raising the apparent cost of "Cal_CalDate" so that "Cal_CtofcNo" can take precedence.

Best Regards, Simon Riggs
Re: [PERFORM] Slow SELECTS after large update cycle
On Wednesday 15 March 2006 18:21, Simon Riggs wrote:
> On Wed, 2006-03-15 at 14:39 -0500, Jan de Visser wrote:
>> After fixing the hanging problems I reported here earlier (by uninstalling W2K3 SP1), I'm running into another weird one.
>>
>> After doing a +/- 8hr cycle of updates and inserts (what we call a 'batch'), the first 'reporting' type query on tables involved in that write cycle is very slow. As an example, I have a query which according to EXPLAIN ANALYZE should take about 1.1s actually taking 46s. After this one hit, everything is back to normal, and subsequent executions of the same query are in fact subsecond. Restarting the appserver and pgsql does not make the slowness re-appear; only running another batch will.
>>
>> During the 'write'/batch cycle, a large number of rows in various tables are inserted and subsequently (repeatedly) updated. The reporting-type queries after that are basically searches on those tables.
>>
>> Anybody any ideas?
>
> This is caused by updating the commit-status hint bits on each row touched by the SELECTs. This turns the first SELECT into a write operation.
>
> Try running a scan of the whole table to take the hit before you give it back to the users.

Thanks Simon. I didn't know the cause, but I expected the answer to be 'deal with it', as it is. At least I can explain it now...

> Best Regards, Simon Riggs

jan

--
--
Jan de Visser                     [EMAIL PROTECTED]

        Baruk Khazad! Khazad ai-menu!
--
Re: [PERFORM] BETWEEN optimizer problems with single-value
Simon Riggs <[EMAIL PROTECTED]> writes:
> ISTM that when the BETWEEN constants match we end up in this part of clauselist_selectivity()...

Yeah, I think you are right.

> so the planner underestimates the cost of using "Cal_CalDate", it ends up the same as "Cal_CtofcNo", and then we pick "Cal_CalDate" because it was created first.

No, it doesn't end up the same --- but the difference is small enough to be in the roundoff-error regime. The real issue here is that we're effectively assuming that one row will be fetched from the index in both cases, and this is clearly not the case for the Cal_CalDate index. So we need a more accurate estimate for the boundary case.

> Using 1.0e-10 isn't very useful... the selectivity for a range should never be less than the selectivity for an equality, so we should simply put in a test against one of the pseudo-constants and use that as the minimal value.

That's easier said than done, because you'd first have to find the appropriate equality operator to use (ie, one having semantics that agree with the inequality operators). Another point is that the above statement is simply wrong; consider

    calDate BETWEEN '2006-03-15' AND '2006-03-14'

for which an estimate of zero really is correct.

Possibly we could drop this code's reliance on seeing SCALARLTSEL/SCALARGTSEL as the estimators, and instead try to locate a common btree opclass for the operators --- which would then let us identify the right equality operator to use, and also let us distinguish > from >= etc. If we're trying to get the boundary cases right, I suspect we have to account for that. I could see such an approach being tremendously slow though :-(, because we'd go looking for btree opclasses even for operators that have nothing to do with < or >.

            regards, tom lane