Re: [PERFORM] Simple join optimized badly?
> Brian Herlihy <[EMAIL PROTECTED]> writes:
> > What would it take for hints to be added to postgres?
>
> A *whole lot* more thought and effort than has been expended on the
> subject to date.
>
> Personally I have no use for the idea of "force the planner to do
> exactly X given a query of exactly Y". You don't have exactly Y
> today, tomorrow, and the day after (if you do, you don't need a
> hint mechanism at all, you need a mysql-style query cache).
> IMHO most of the planner mistakes we see that could be fixed via
> hinting are really statistical estimation errors, and so the right
> level to be fixing them at is hints about how to estimate the number
> of rows produced for given conditions. Mind you that's still a plenty
> hard problem, but you could at least hope that a hint of that form
> would be useful for more than one query.

Do I understand correctly that you're suggesting it might not be a bad
idea to allow users to provide statistics?

Is this along the lines of "I'm loading a big table and touching every
row of data, so I may as well collect some stats along the way" and "I
know my data contains these statistical properties, but the analyzer
wasn't able to figure that out (or maybe can't figure it out
efficiently enough)"?

While this would require more knowledge from the user (about their
data, how the planner works, and how it uses statistics), it would be
genuinely helpful, even necessary, for those who really care about
performance. I guess it's the difference between a tool advanced users
can get long-term benefit from and a quick fix that will probably come
back to bite you. I've been pleased with Postgres' thoughtful design;
recently I've been doing some work with MySQL, and I can't say I feel
the same way.

Also, I'm guessing this has already come up at some point, but what
about allowing PG to do some stat collection during queries? If you're
touching a lot of data (such as during an import process), wouldn't it
be more efficient (and perhaps more accurate) to collect stats then,
rather than having to re-scan? It would be nice to be able to turn this
on/off on a per-query basis, since it could have pretty negative
impacts on OLTP performance...

- Bucky
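For reference, the statistics knob that does exist today is per-column
detail rather than user-supplied numbers. A minimal sketch (table and
column names invented for illustration) that tells ANALYZE to keep a
larger sample and more histogram buckets for a column the planner
habitually misestimates:

    -- Raise the per-column statistics target (the default target is 10),
    -- then re-analyze so the planner sees the richer histogram.
    ALTER TABLE email_record ALTER COLUMN email_list_id SET STATISTICS 100;
    ANALYZE email_record;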
Re: [PERFORM] Simple join optimized badly?
Bucky Jordan wrote:
> Is this along the lines of "I'm loading a big table and touching every
> row of data, so I may as well collect some stats along the way" and "I
> know my data contains these statistical properties, but the analyzer
> wasn't able to figure that out (or maybe can't figure it out
> efficiently enough)"?
>
> While it seems like this would require more knowledge from the user
> (e.g. more about their data, how the planner works, and how it uses
> statistics) this would actually be helpful/required for those who
> really care about performance.
> ...

The user would have to know his data, but he wouldn't need to know how
the planner works. With hints like "use index X", he *does* need to
know how the planner works.

Being able to give hints about statistical properties of relations and
their relationships seems like a good idea to me. We can later figure
out ways to calculate them automatically.

BTW, in DB2 you can declare a table as volatile, which means that the
cardinality of the table varies greatly. The planner favors index scans
on volatile tables.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
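For the curious, the DB2 declaration looks roughly like this (quoted
from memory, so treat the exact syntax as approximate; the table name
is made up):

    -- Tell the optimizer this table's row count swings widely, so it
    -- should favor index access over trusting possibly-stale stats.
    ALTER TABLE job_queue VOLATILE CARDINALITY;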
Re: [PERFORM] Simple join optimized badly?
Heikki Linnakangas wrote:
> BTW, in DB2 you can declare a table as volatile, which means that the
> cardinality of the table varies greatly. The planner favors index
> scans on volatile tables.

Now that seems like a valuable idea.

--
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Re: [PERFORM] Simple join optimized badly?
Tom,

I'm interested in the problem of cross-column statistics from a
theoretical perspective. It would be interesting to sit down and try to
reason out a useful solution, or at the very least to understand the
problem better so I can anticipate when it might come and eat me.

From my understanding, the main problem is that if PG knows the
selectivity of n conditions C1,C2,...,Cn, it doesn't know whether the
combined selectivity will be C1*C2*...*Cn (conditions are independent)
or min(C1,C2,...,Cn) (conditions are strictly dependent), or somewhere
in the middle. Row estimates can therefore be orders of magnitude off.

I suppose a common example would be a table with a serial primary key
column and a timestamp value which is always inserted as
CURRENT_TIMESTAMP, so the two columns are strongly correlated. If the
planner guesses that 1% of the rows of the table will match pk > 100,
and 1% of the rows will match timestamp > X, then it would be nice for
it to know that if you specify both "pk > 100 AND timestamp > X" the
combined selectivity is still only 1%, not 1% * 1% = 0.01%.

As long as I'm sitting down and reasoning about the problem anyway, are
there any other types of cases you're aware of where some form of
cross-column statistics would be useful? In the unlikely event that I
actually come up with a brilliant and simple solution, I'd at least
like to make sure that I'm solving the right problem :)

Thanks,
Mark Lewis

On Tue, 2006-10-10 at 22:38 -0400, Tom Lane wrote:
> Brian Herlihy <[EMAIL PROTECTED]> writes:
> > What would it take for hints to be added to postgres?
>
> A *whole lot* more thought and effort than has been expended on the
> subject to date.
>
> Personally I have no use for the idea of "force the planner to do
> exactly X given a query of exactly Y". You don't have exactly Y
> today, tomorrow, and the day after (if you do, you don't need a
> hint mechanism at all, you need a mysql-style query cache).
> IMHO most of the planner mistakes we see that could be fixed via
> hinting are really statistical estimation errors, and so the right
> level to be fixing them at is hints about how to estimate the number
> of rows produced for given conditions. Mind you that's still a plenty
> hard problem, but you could at least hope that a hint of that form
> would be useful for more than one query.
>
> regards, tom lane
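A toy version of that pk/timestamp scenario (hypothetical table; the
numbers in the comments are what the independence assumption would
predict, not output from an actual run):

    CREATE TABLE events (
        pk         serial PRIMARY KEY,
        created_at timestamptz NOT NULL DEFAULT current_timestamp
    );

    -- Both columns grow together, so the same 1% of rows satisfies
    -- both predicates:
    --   independence assumption:   0.01 * 0.01 = 0.0001 (100x too low)
    --   true combined selectivity: min(0.01, 0.01) = 0.01
    EXPLAIN SELECT * FROM events
    WHERE pk > 100 AND created_at > '2006-10-01';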
Re: [PERFORM] Scrub one large table against another
Tom Lane wrote:
> Brendan Curran <[EMAIL PROTECTED]> writes:
> > So much time is being spent in the Unique and Sort leaves... I
> > would think that it wouldn't need to do the unique portion, since
> > there is no DISTINCT clause...
>
> There's nothing in that query suggesting that suppress.email is
> unique. If you know that it is, try using a plain join instead of
> an IN.
>
> regards, tom lane

Interestingly, and thank you to Tom and Jim, the explicit JOIN improved
performance tremendously (results below). I converted the entire query
to use explicit joins instead of IN and EXISTS and discovered
acceptable performance. I think the next place to go from here is
RAID 1/RAID 10 and possibly partitioning my large table (welcome to DDL
insanity, right?).

I have to add that I'm a little surprised the documentation is so
generous to IN and EXISTS. Is there something amiss in my configuration
that prevents them from performing correctly? If not, I can't imagine a
case where IN or EXISTS would outperform an explicit JOIN...

Additionally, I manually scrub for duplicates at the group level in the
email_record table to keep my records unique. I would like to use a
unique constraint, but have found that batching in JDBC is impossible
due to irrecoverable errors, even when using BEFORE INSERT triggers to
just return NULL if a record already exists. Has anyone got an elegant
solution for the 'add only if not exists already' problem, similar to
MSSQL's MERGE command?

Just one more thing... I have found that maintaining a btree index on a
varchar(255) value is extremely expensive on insert/update/delete. It
is unfortunately necessary for me to maintain this index for queries
and reports, so I am transitioning to using an unindexed staging table
to import data into before merging it with the larger table. All the
docs and posts recommend dropping the index, importing your data, and
then creating the index again. This is untenable on a daily / bi-weekly
basis. Is there a more elegant solution to this indexing problem?

Thank you for all of your help!

EXPLAIN ANALYZE result comparison...

1. EXPLAIN ANALYZE SELECT email_record_id FROM ONLY email_record er
   WHERE email_list_id = 13 AND email IN (SELECT email FROM suppress);

 Hash Join  (cost=8359220.68..9129843.00 rows=800912 width=8) (actual time=2121601.603..2121601.603 rows=0 loops=1)
   Hash Cond: (("outer".email)::text = ("inner".email)::text)
   ->  Unique  (cost=4414093.19..4522324.49 rows=21646260 width=25) (actual time=1165955.907..1434439.731 rows=21646261 loops=1)
         ->  Sort  (cost=4414093.19..4468208.84 rows=21646260 width=25) (actual time=1165955.903..1384667.715 rows=21646261 loops=1)
               Sort Key: suppress.email
               ->  Seq Scan on suppress  (cost=0.00..393024.60 rows=21646260 width=25) (actual time=37.784..609848.551 rows=21646261 loops=1)
   ->  Hash  (cost=3899868.47..3899868.47 rows=4606808 width=32) (actual time=554522.983..554522.983 rows=3245336 loops=1)
         ->  Bitmap Heap Scan on email_record er  (cost=38464.83..3899868.47 rows=4606808 width=32) (actual time=275640.435..541342.727 rows=3245336 loops=1)
               Recheck Cond: (email_list_id = 13)
               ->  Bitmap Index Scan on list  (cost=0.00..38464.83 rows=4606808 width=0) (actual time=275102.037..275102.037 rows=5172979 loops=1)
                     Index Cond: (email_list_id = 13)
 Total runtime: 2,122,693.864 ms

2. EXPLAIN ANALYZE SELECT email_record_id FROM ONLY email_record er
   JOIN suppress s USING (email) WHERE er.email_list_id = 13;

 Hash Join  (cost=3945127.49..5000543.11 rows=800912 width=8) (actual time=808874.088..808874.088 rows=0 loops=1)
   Hash Cond: (("outer".email)::text = ("inner".email)::text)
   ->  Seq Scan on suppress s  (cost=0.00..393024.60 rows=21646260 width=25) (actual time=661.518..216933.399 rows=21646261 loops=1)
   ->  Hash  (cost=3899868.47..3899868.47 rows=4606808 width=32) (actual time=494294.932..494294.932 rows=3245336 loops=1)
         ->  Bitmap Heap Scan on email_record er  (cost=38464.83..3899868.47 rows=4606808 width=32) (actual time=242198.226..485942.542 rows=3245336 loops=1)
               Recheck Cond: (email_list_id = 13)
               ->  Bitmap Index Scan on list  (cost=0.00..38464.83 rows=4606808 width=0) (actual time=241769.786..241769.786 rows=5172979 loops=1)
                     Index Cond: (email_list_id = 13)
 Total runtime: 808,884.387 ms
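Lacking MERGE, one common workaround for 'add only if not exists
already' is a guarded insert from a staging table. A sketch (table and
column names invented; note it is not safe against concurrent loaders
without locking the target table first):

    -- SELECT DISTINCT removes duplicates within the batch itself;
    -- NOT EXISTS skips rows already present in the target.
    INSERT INTO email_record (email, email_list_id)
    SELECT DISTINCT s.email, s.email_list_id
    FROM import_batch s
    WHERE NOT EXISTS (
        SELECT 1 FROM email_record e
        WHERE e.email = s.email
          AND e.email_list_id = s.email_list_id
    );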
Re: [PERFORM] Scrub one large table against another (vmstat output)
> What prevents you from using an aggregate function?

I guess I could actually obtain the results in an aggregate function
and use those to maintain a summary table. There is a web view that
requires 'as accurate as possible' numbers to be queried per group (all
40 groups are displayed on the same page), so constant aggregates over
the entire table would be a nightmare.

> Probably not 2x, but better performance than now. You probably don't
> want RAID 1; depending on your setup, many list members swear by RAID
> 10. Of course, your setup will depend on how much money you have to
> burn. That said, RAID 1 testing will allow you to determine the upper
> bounds of your hardware. Some folks say they get better performance
> with WAL off the main RAID, some keep it on. Only testing will allow
> you to determine what is optimal.

I will have to try moving WAL off those raid spindles; I have seen the
posts regarding this.

> In the meantime, you need to identify the bottleneck of your
> operation. You should collect vmstat and iostat statistics for your
> present setup. Good luck!

I have to confess that I am a bit of a novice with vmstat. Below is a
sample of my vmstat output while running two scrubbing queries
simultaneously:

machine:/dir# vmstat -S M 2
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0 1 4117 15 296200 10025 96 107 2 0 86 11
 0 3 4117 15 296200 4884 1860 415 841 18 1 52 29
 1 1 4115 15 296400 2246 1222 462 394 8 0 51 41
 0 2 4114 14 296700 3932 2238 485 613 12 0 62 25
 1 1 4115 13 296600 3004 1684 507 609 8 0 60 31
 0 3 4116 13 296500 4688 4000 531 613 15 1 52 33
 1 1 4117 13 296400 2890 268 433 441 9 1 58 32
 0 1 4114 13 296800 2802 4708 650 501 8 1 64 28
 0 2 4114 13 296800 4850 1696 490 574 15 1 57 27
 0 2 4116 13 296600 4300 3062 540 520 13 1 61 26
 0 2 4115 13 296600 3292 3608 549 455 10 1 65 24
 0 3 4115 13 296600 4856 2098 505 564 15 1 59 26
 0 3 4115 13 296600 1608 2314 447 413 4 0 63 33
 0 3 4116 13 296600 6206 1664 442 649 18 1 52 29
 1 1 4115 13 296600 1886 1262 464 412 5 0 60 35
 0 3 4118 13 296400 2510 4138 571 493 7 1 64 28
 1 1 4117 13 296400 163256 325 373 5 0 53 42
 0 3 4116 13 296500 5358 3510 504 649 14 1 59 26
 1 1 4118 13 296400 2814 920 447 403 8 0 63 29

I know that wa is the time spent waiting on IO, but I lack a benchmark
to determine just what I should expect from my hardware (three 146GB
U320 SCSI 10k drives in RAID 5 on a Dell PERC4e/i controller in a
PE2850). Those drives are dedicated completely to a /data mount that
contains only /data/postgresql/8.1/main. I have another two drives in
RAID 1 for everything else (OS, apps, etc.). Can you give me any
pointers based on that vmstat output?

Regards and Thanks,
Brendan
Collect stats during seqscan (was: [PERFORM] Simple join optimized badly?)
On Wed, Oct 11, 2006 at 10:27:26AM -0400, Bucky Jordan wrote:
> Also, I'm guessing this has already come up at some point, but what
> about allowing PG to do some stat collection during queries? If
> you're touching a lot of data (such as an import process) wouldn't it
> be more efficient (and perhaps more accurate) to collect stats then,
> rather than having to re-scan? It would be nice to be able to turn
> this on/off on a per query basis, seeing as it could have pretty
> negative impacts on OLTP performance...

I suspect that could be highly useful in data warehouse environments,
where you're more likely to have to sequential scan a table. It would
be interesting to have it so that a sequential scan that will run to
completion also collects stats along the way.

--
Jim Nasby                 [EMAIL PROTECTED]
EnterpriseDB              http://enterprisedb.com
512.569.9461 (cell)
Re: [PERFORM] Scrub one large table against another
On Wed, Oct 11, 2006 at 10:53:41AM -0600, Brendan Curran wrote:
> Interestingly, and thank you to Tom and Jim, the explicit JOIN
> improved performance tremendously (RESULTS BELOW). I converted the
> entire query to use explicit joins instead of IN and EXISTS and
> discovered acceptable performance. I think the next place to go from
> here is RAID1/RAID10 and possibly partitioning my large table
> (Welcome to DDL insanity, right?).

Remember that partitioning is not a magic bullet: it only helps in
cases where you need to keep a lot of data, but normally only access a
small portion of it.

WAL on RAID 5 without a really good controller will probably kill you.
Data being there isn't much better. You'll probably be better off with
either one RAID 10 or two RAID 1s.

> I have to add that I'm a little surprised the documentation is so
> generous to IN and EXISTS. Is there something amiss in my
> configuration that prevents them from performing correctly? If not,
> I can't imagine a time when IN or EXISTS would be more performant
> than an explicit JOIN...

Well, IN != EXISTS != JOIN. EXISTS just stops as soon as it finds a
record; for some cases it's equivalent to IN, but not all. IN has to
de-duplicate its list in some fashion. For small IN lists you can do
this with an OR, but at some point you need to switch to an actual
unique (actually, I suspect the difference in PostgreSQL just depends
on whether you passed values into IN or a subquery). A join, on the
other hand, doesn't worry about duplicates at all. There may be some
brains in the planner that realize a subquery will return a unique set
(i.e., you're querying on a primary key).

> Additionally, I manually scrub for duplicates at the group level in
> the email_record table to keep my records unique. I would like to
> use a unique constraint, but have found that batching in JDBC is
> impossible due to irrecoverable errors even when using BEFORE INSERT
> triggers to just return NULL if a record exists already. Has anyone
> got an elegant solution for the 'add only if not exists already'
> problem similar to MSSQL's MERGE command?

Your best bet (until we have something akin to MERGE, hopefully in 8.3)
is to load the data into a TEMP table and de-dupe it from there.
Depending on what you're doing you might want to delete rows from the
temp table, or update an ID column in it. Note that this assumes only
one process is loading data at any time; if that's not the case you
have to get trickier.

> Just one more thing... I have found that maintaining a btree index
> on a varchar(255) value is extremely expensive on
> insert/update/delete. It is unfortunately necessary for me to
> maintain this index for queries and reports so I am transitioning to
> using an unindexed staging table to import data into before merging
> it with the larger table. All the docs and posts recommend is to
> drop the index, import your data, and then create the index again.
> This is untenable on a daily / bi-weekly basis. Is there a more
> elegant solution to this indexing problem?

You might be happier with tsearch than a regular index.

--
Jim Nasby                 [EMAIL PROTECTED]
EnterpriseDB              http://enterprisedb.com
512.569.9461 (cell)
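A fuller sketch of that TEMP-table approach, under the same
single-loader assumption (the file path and column names are
placeholders):

    BEGIN;

    CREATE TEMP TABLE email_stage (
        email         varchar(255),
        email_list_id integer
    );

    COPY email_stage FROM '/path/to/batch.csv' CSV;

    -- Remove duplicates within the batch itself (keep one copy)...
    DELETE FROM email_stage a
    USING email_stage b
    WHERE a.ctid > b.ctid
      AND a.email = b.email
      AND a.email_list_id = b.email_list_id;

    -- ...drop rows that already exist in the main table...
    DELETE FROM email_stage s
    USING email_record e
    WHERE e.email = s.email
      AND e.email_list_id = s.email_list_id;

    -- ...then move the remainder over in one pass.
    INSERT INTO email_record (email, email_list_id)
    SELECT email, email_list_id FROM email_stage;

    COMMIT;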
Re: [PERFORM] Scrub one large table against another (vmstat output)
Hi, Brendan,

Brendan Curran wrote:
>> What prevents you from using an aggregate function?
>
> I guess I could actually obtain the results in an aggregate function
> and use those to maintain a summary table. There is a web view that
> requires 'as accurate as possible' numbers to be queried per group
> (all 40 groups are displayed on the same page) and so constant
> aggregates over the entire table would be a nightmare.

That sounds just like a case for GROUP BY and a materialized view.

Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org
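A hand-rolled sketch of that suggestion; PostgreSQL has no built-in
materialized views at this point, so the summary table is refreshed
manually (table and column names invented):

    CREATE TABLE email_group_counts (
        email_list_id integer PRIMARY KEY,
        record_count  bigint NOT NULL
    );

    -- Refresh inside one transaction: DELETE rather than TRUNCATE, so
    -- concurrent readers keep seeing the old rows until COMMIT.
    BEGIN;
    DELETE FROM email_group_counts;
    INSERT INTO email_group_counts (email_list_id, record_count)
    SELECT email_list_id, count(*)
    FROM email_record
    GROUP BY email_list_id;
    COMMIT;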