[PERFORM] Postgre 8.0 Installation - Issues
All,

We are facing a few issues while installing Postgres 8.0 on the Windows 2000
Japanese OS. Installer kit name: postgresql-8.0-ja

Scenario 1: While installing PostGRE 8.0, we got a logon failure at the end
of installing the component, telling us that it failed to create the process
for initdb and that the user name was not recognized or the password was
wrong. After the OK button was clicked, the whole process rolled back
automatically and PostGRE was uninstalled.

Scenario 2: On one of the computers we managed to install PostGRE 8.0, but
the database initialization could not be performed. While creating the
database using the Credb patch, we got an error telling us that the tables
were missing and that the connection to the local host failed.

Scenario 3: On one of the machines the database was also created, but once
the system is restarted, PostGRE does not work and we get the same error as
in Scenario 2.

Please shed some light on this. If this question is not relevant to this
group, please redirect us...

Thanks and regards,
Ravi
Re: [PERFORM] Simple join optimized badly?
On Mon, Oct 09, 2006 at 06:45:16PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > One of the big problems with doing set enable_...=off is that there's no
> > way to embed that into something like a view, so you're almost forced
> > into putting it into the application code itself, which makes matters even
> > worse. If you could hint this within a query (maybe even on a per-table
> > level), you could at least encapsulate that into a view.
>
> You've almost reinvented one of the points that was made in the last
> go-round on the subject of hints, which is that keeping them out of the
> application code is an important factor in making them manageable by a
> DBA. Hints stored in a system catalog (and probably having the form of
> "make this statistical assumption" rather than specifically "use that
> plan") would avoid many of the negatives.

Sure, but IIRC no one's figured out what that would actually look like,
while it's not hard to come up with a syntax that allows you to tell the
optimizer "scan index XYZ to access this table". (And if there's real
interest in adding that, I'll come up with a proposal.) I'd rather have the
ugly solution sooner rather than the elegant one later (if ever).
--
Jim Nasby                                        [EMAIL PROTECTED]
EnterpriseDB          http://enterprisedb.com          512.569.9461 (cell)
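P.S. To be concrete about why the enable_ knobs are so awkward: the only
place to put them is around the query in application code, something like
this (a rough sketch; table and column names invented for illustration):

BEGIN;
SET LOCAL enable_seqscan = off;  -- session knob: affects every table in the query
SELECT o.id, o.total FROM orders o WHERE o.customer_id = 42;
COMMIT;  -- SET LOCAL reverts automatically at transaction end

There's no way to bake that SET into a CREATE VIEW, which is exactly the
encapsulation problem described above.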
Re: [PERFORM] Simple join optimized badly?
On Mon, Oct 09, 2006 at 03:41:09PM -0700, Joshua D. Drake wrote:
> > One of the big problems with doing set enable_...=off is that there's no
> > way to embed that into something like a view, so you're almost forced
> > into putting it into the application code itself, which makes matters even
> > worse. If you could hint this within a query (maybe even on a per-table
> > level), you could at least encapsulate that into a view.
>
> You can easily pass multiple statements within a single exec() or push
> it into an SRF.

Unless I'm missing something, putting multiple statements in a single exec
means you're messing with the application code. And you can't update an SRF
(which also means messing with the application code). Though, I suppose you
could update a view that pulled from an SRF...
--
Jim Nasby                                        [EMAIL PROTECTED]
EnterpriseDB          http://enterprisedb.com          512.569.9461 (cell)
Re: [PERFORM] Simple join optimized badly?
On Mon, Oct 09, 2006 at 08:22:39PM -0700, Joshua D. Drake wrote:
> > Imagine I got run over by a train, and someone was reading my code.
> > Which would be easier for them to maintain: Code with weird SQL, or code
> > with sensible, well-written SQL and explicit hints?
>
> You forgot the most important option:
>
> Code with appropriate documentation about your weird SQL.
>
> If you document your code, your argument is moot.

You apparently didn't read the whole email. He said he did document his
code. But his point is still valid: obscure code is bad even with
documentation. Would you put something from the obfuscated C contest into
production with comments describing what it does, or would you just write
the code cleanly to begin with?
--
Jim Nasby                                        [EMAIL PROTECTED]
EnterpriseDB          http://enterprisedb.com          512.569.9461 (cell)
Re: [PERFORM] Simple join optimized badly?
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > I'd rather have the ugly solution sooner rather than the elegant one > later (if ever). The trouble with that is that we couldn't ever get rid of it, and we'd be stuck with backward-compatibility concerns with the first (over simplified) design. It's important to get it right the first time, at least for stuff that you know perfectly well is going to end up embedded in application code. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Simple join optimized badly?
On Tue, Oct 10, 2006 at 09:07:03AM -0500, Jim C. Nasby wrote:
> Would you put something from the obfuscated C contest
> into production with comments describing what it does,

If nothing else, it would be a nice practical joke =)

/* Steinar */
--
Homepage: http://www.sesse.net/
Re: [PERFORM] Simple join optimized badly?
On Tue, Oct 10, 2006 at 10:14:48AM -0400, Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > I'd rather have the ugly solution sooner rather than the elegant one
> > later (if ever).
>
> The trouble with that is that we couldn't ever get rid of it, and we'd
> be stuck with backward-compatibility concerns with the first
> (oversimplified) design. It's important to get it right the first time,
> at least for stuff that you know perfectly well is going to end up
> embedded in application code.

We've deprecated things before, I'm sure we'll do it again. Yes, it's a
pain, but it's better than not having anything release after release. And
having a formal hint language would at least allow us to eventually clean
up some of these oddball cases, like the OFFSET 0 hack.

I'm also not convinced that even supplemental statistics will be enough to
ensure the planner always does the right thing, so query-level hints may
have to stay (though it'd be great if that wasn't the case).
--
Jim Nasby                                        [EMAIL PROTECTED]
EnterpriseDB          http://enterprisedb.com          512.569.9461 (cell)
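P.S. For anyone who hasn't run into it: the OFFSET 0 hack mentioned above
relies on the planner refusing to flatten a subquery that contains a LIMIT
or OFFSET into the outer query, so it acts as an optimization fence. A
rough sketch (table and column names invented):

SELECT *
FROM (SELECT * FROM orders WHERE status = 'open' OFFSET 0) AS sub
WHERE sub.customer_id = 42;

Without the OFFSET 0, the planner would pull the subquery up and plan the
whole thing as a single query.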
Re: [PERFORM] Simple join optimized badly?
Jim C. Nasby wrote:
> On Mon, Oct 09, 2006 at 03:41:09PM -0700, Joshua D. Drake wrote:
>>> One of the big problems with doing set enable_...=off is that there's no
>>> way to embed that into something like a view, so you're almost forced
>>> into putting it into the application code itself, which makes matters even
>>> worse. If you could hint this within a query (maybe even on a per-table
>>> level), you could at least encapsulate that into a view.
>> You can easily pass multiple statements within a single exec() or push
>> it into an SRF.
>
> Unless I'm missing something, putting multiple statements in a single
> exec means you're messing with the application code. And you can't
> update an SRF (which also means messing with the application code).
> Though, I suppose you could update a view that pulled from an SRF...

I always think of application code as outside the db. I was thinking more
in layers.

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Re: [PERFORM] Simple join optimized badly?
Jim C. Nasby wrote:
> On Mon, Oct 09, 2006 at 08:22:39PM -0700, Joshua D. Drake wrote:
>>> Imagine I got run over by a train, and someone was reading my code.
>>> Which would be easier for them to maintain: Code with weird SQL, or code
>>> with sensible, well-written SQL and explicit hints?
>> You forgot the most important option:
>>
>> Code with appropriate documentation about your weird SQL.
>>
>> If you document your code, your argument is moot.
>
> You apparently didn't read the whole email. He said he did document his
> code. But his point is still valid: obscure code is bad even with
> documentation. Would you put something from the obfuscated C contest
> into production with comments describing what it does, or would you just
> write the code cleanly to begin with?

You are comparing apples to oranges. We aren't talking about an obfuscated
piece of code. We are talking about an SQL statement that solves a
particular problem. That can easily be documented, and documented with
enough verbosity that it is never a question, except to test and see if
the problem exists in current versions.

Sincerely,

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Re: [PERFORM] Simple join optimized badly?
Steinar H. Gunderson wrote:
> On Tue, Oct 10, 2006 at 09:07:03AM -0500, Jim C. Nasby wrote:
>> Would you put something from the obfuscated C contest
>> into production with comments describing what it does,
>
> If nothing else, it would be a nice practical joke =)
>
> /* Steinar */

nice isn't the word I would use ;)

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
[PERFORM] long running transactions
While doing a verbose vacuum, I'm constantly hitting things like:

DETAIL: 3606 dead row versions cannot be removed yet.

I believe this is a problem, because I still have some empty tables
requiring up to 300-400 ms just to check whether the table is empty (see
the thread "slow queue-like empty table").

If pg_stat_activity.query_start actually is the start time of the
transaction, then we've gotten rid of all the real long-running
transactions. Then again, if pg_stat_activity.query_start actually were
the start time of the transaction, the attribute would have been called
pg_stat_activity.transaction_start, right?

Is there any way to find the longest-running transaction?
Re: [PERFORM] long running transactions
Tobias Brox <[EMAIL PROTECTED]> writes:
> If pg_stat_activity.query_start actually is the start time of the
> transaction, ...

but it isn't.

> Is there any way to find the longest-running transaction?

Look in pg_locks to see the lowest-numbered transaction ID --- each
transaction will be holding exclusive lock on its own XID. You can
correlate that back to pg_stat_activity via the PID.

regards, tom lane
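P.S. A rough sketch of that lookup, from memory against the 8.1 catalogs
(untested; note you have to eyeball the lowest XID yourself, since xid
deliberately has no ordering operators):

SELECT l.transaction, a.procpid, a.query_start, a.current_query
FROM pg_locks l
JOIN pg_stat_activity a ON a.procpid = l.pid
WHERE l.locktype = 'transactionid' AND l.granted;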
Re: [PERFORM] Postgre 8.0 Installation - Issues
Moving to -general.

On Tue, Oct 10, 2006 at 04:17:06PM +0530, Ravindran G - TLS, Chennai. wrote:
> All,
>
> We are facing a few issues while installing Postgres 8.0 on the Windows
> 2000 Japanese OS. Installer kit name: postgresql-8.0-ja

Is there a reason you're not using 8.1.4? 8.0 was the first Windows
release, and as such there are a number of issues that were improved in
8.1. You should at least be using the latest 8.0 version (8.0.8).

> Scenario 1: While installing PostGRE 8.0, we got a logon failure at the end

BTW, it's PostgreSQL or Postgres. PostGRE doesn't exist...

> of installing the component, telling us that it failed to create the
> process for initdb and that the user name was not recognized or the
> password was wrong. After the OK button was clicked, the whole process
> rolled back automatically and PostGRE was uninstalled.

Make sure that you have the right password for the account that PostgreSQL
will be running under. I often find it's easiest to just delete that
account and let the installer create it for me.

> Scenario 2: On one of the computers we managed to install PostGRE 8.0,
> but the database initialization could not be performed. While creating
> the database using the Credb patch, we got an error telling us that the
> tables were missing and that the connection to the local host failed.
>
> Scenario 3: On one of the machines the database was also created, but
> once the system is restarted, PostGRE does not work and we get the same
> error as in Scenario 2.

These could be issues surrounding administrator rights. PostgreSQL will
refuse to start if the account it's running under has Administrator
rights.

> Please shed some light on this. If this question is not relevant to this
> group, please redirect us...
>
> Thanks and regards,
> Ravi

--
Jim Nasby                                        [EMAIL PROTECTED]
EnterpriseDB          http://enterprisedb.com          512.569.9461 (cell)
Re: [PERFORM] Simple join optimized badly?
On Mon, Oct 09, 2006 at 23:33:03 +0200, Tobias Brox <[EMAIL PROTECTED]> wrote:
> Just a comment from the side line; can't the rough "set
> enable_seqscan=off" be considered as sort of a hint anyway? There have
> been situations where we've actually had to resort to such crud.

That only works for simple queries. To be generally useful, you want to be
able to hint how to handle each join being done in the query. The current
controls affect all joins.
Re: [PERFORM] long running transactions
[Tom Lane - Tue at 12:23:40PM -0400]
> Look in pg_locks to see the lowest-numbered transaction ID --- each
> transaction will be holding exclusive lock on its own XID. You can
> correlate that back to pg_stat_activity via the PID.

Thanks a lot for the quick reply - I've already identified one
long-running transaction. (I'm not allowed to order by xid, and not
allowed to cast it to anything; how come?)
Re: [PERFORM] long running transactions
[Tobias Brox - Tue at 06:39:13PM +0200]
> Thanks a lot for the quick reply - I've already identified one
> long-running transaction.

belonging to autovacuum ... how come?
Re: [PERFORM] long running transactions
Tobias Brox <[EMAIL PROTECTED]> writes:
> [Tobias Brox - Tue at 06:39:13PM +0200]
>> Thanks a lot for the quick reply - I've already identified one
>> long-running transaction.

> belonging to autovacuum ... how come?

Blocked on someone else's lock, maybe?

regards, tom lane
Re: [PERFORM] long running transactions
[Tom Lane - Tue at 12:42:52PM -0400]
>> belonging to autovacuum ... how come?
>
> Blocked on someone else's lock, maybe?

Hardly, the autovacuum is the only one having such a low transaction id,
and also the only one hanging around when waiting a bit and rechecking
the pg_locks table.
Re: [PERFORM] long running transactions
Tobias Brox <[EMAIL PROTECTED]> writes:
>> Blocked on someone else's lock, maybe?

> Hardly, the autovacuum is the only one having such a low transaction id,
> and also the only one hanging around when waiting a bit and rechecking
> the pg_locks table.

Hmph. Is the autovac process actually doing anything (strace would be
revealing)? If not, can you attach to the autovac process with gdb and
get a stack trace to see where it's blocked?

regards, tom lane
Re: [PERFORM] long running transactions
[Tom Lane - Tue at 01:09:52PM -0400]
> Hmph. Is the autovac process actually doing anything (strace would be
> revealing)? If not, can you attach to the autovac process with gdb and
> get a stack trace to see where it's blocked?

Sorry ... I SIGINT'ed it, and now it's gone :-( I thought reloading the
config would restart autovacuum. Well, whatever, we still have the nightly
vacuum crontab.
Re: [PERFORM] long running transactions
Tobias Brox <[EMAIL PROTECTED]> writes:
> [Tom Lane - Tue at 01:09:52PM -0400]
>> Hmph. Is the autovac process actually doing anything (strace would be
>> revealing)? If not, can you attach to the autovac process with gdb and
>> get a stack trace to see where it's blocked?

> Sorry ... I SIGINT'ed it, and now it's gone :-( I thought reloading the
> config would restart autovacuum.

It'll come back after the autovacuum naptime. If it gets stuck again,
please investigate.

regards, tom lane
Re: [PERFORM] Simple join optimized badly?
Jim,

> We've deprecated things before, I'm sure we'll do it again. Yes, it's a
> pain, but it's better than not having anything release after release.
> And having a formal hint language would at least allow us to eventually
> clean up some of these oddball cases, like the OFFSET 0 hack.
>
> I'm also not convinced that even supplemental statistics will be enough
> to ensure the planner always does the right thing, so query-level hints
> may have to stay (though it'd be great if that wasn't the case).

"stay"? I don't think that the general developers of PostgreSQL are going
to *accept* anything that stands a significant chance of breaking in one
release. You have your challenge for the EDB development team: come up
with a hinting language which is flexible enough not to do more harm than
good (hint: it's not Oracle's hints).

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco
Re: [PERFORM] long running transactions
[Tom Lane - Tue at 01:18:27PM -0400]
>> Hmph. Is the autovac process actually doing anything (strace would be
>> revealing)?

It's definitely doing something; mostly reading, but also some few writes,
semops and opens.

>> If not, can you attach to the autovac process with gdb and
>> get a stack trace to see where it's blocked?

(gdb) bt
#0  0xb7c599f8 in select () from /lib/tls/libc.so.6
#1  0x08253c53 in pg_usleep ()
#2  0x0812ee93 in vacuum_delay_point ()
#3  0x0812f2a5 in lazy_vacuum_rel ()
#4  0x0812ef7b in lazy_vacuum_rel ()
#5  0x0812b4b6 in vac_update_relstats ()
#6  0x0812a995 in vacuum ()
#7  0x0818d2ca in autovac_stopped ()
#8  0x0818ceae in autovac_stopped ()
#9  0x0818c848 in autovac_stopped ()
#10 0x0818c4e2 in autovac_start ()
#11 0x08192c11 in PostmasterMain ()
#12 0x08191dcf in PostmasterMain ()
#13 0x081541b1 in main ()

> It'll come back after the autovacuum naptime. If it gets stuck again,
> please investigate.

It seems stuck: it has had the same transaction ID for a long while, and
the number of undeletable dead rows in our tables is increasing.
Re: [PERFORM] long running transactions
Tobias Brox <[EMAIL PROTECTED]> writes:
> (gdb) bt
> #0  0xb7c599f8 in select () from /lib/tls/libc.so.6
> #1  0x08253c53 in pg_usleep ()
> #2  0x0812ee93 in vacuum_delay_point ()
> #3  0x0812f2a5 in lazy_vacuum_rel ()
> #4  0x0812ef7b in lazy_vacuum_rel ()
> #5  0x0812b4b6 in vac_update_relstats ()

That doesn't look particularly blocked, and if you are seeing reads/writes
too, then it's doing something.

> It seems stuck: it has had the same transaction ID for a long while, and
> the number of undeletable dead rows in our tables is increasing.

Perhaps you have overly aggressive vacuum cost delay settings?

regards, tom lane
Re: [PERFORM] long running transactions
[Tom Lane - Tue at 02:04:55PM -0400]
>> It seems stuck: it has had the same transaction ID for a long while, and
>> the number of undeletable dead rows in our tables is increasing.
>
> Perhaps you have overly aggressive vacuum cost delay settings?

Perhaps, though I wouldn't expect it to sleep in the middle of a
transaction - and also, it really did seem to me that it's doing work
rather than only sleeping. The transaction id for the vacuum process is
the same now as when I wrote the previous email, and the number of dead
unremovable rows has increased steadily.

The settings in effect are:

autovacuum_vacuum_cost_delay = 500
autovacuum_vacuum_cost_limit = 200
Re: [PERFORM] long running transactions
Tobias Brox <[EMAIL PROTECTED]> writes:
>> Perhaps you have overly aggressive vacuum cost delay settings?

> autovacuum_vacuum_cost_delay = 500
> autovacuum_vacuum_cost_limit = 200

Well, that's going to cause it to sleep half a second after every dozen or
so page I/Os. I think you'd be well advised to reduce the delay.

regards, tom lane
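P.S. Something in this neighborhood is a more conventional starting point
(illustrative values only; tune against your own I/O capacity):

autovacuum_vacuum_cost_delay = 20    # ms slept each time the cost limit is hit
autovacuum_vacuum_cost_limit = 200   # cost accumulated before sleeping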
Re: [PERFORM] long running transactions
[Tom Lane - Tue at 02:26:53PM -0400]
>> autovacuum_vacuum_cost_delay = 500
>> autovacuum_vacuum_cost_limit = 200
>
> Well, that's going to cause it to sleep half a second after every dozen
> or so page I/Os. I think you'd be well advised to reduce the delay.

Modified it to 20/250, and it definitely helped. Sorry for the list
verbosity; I should have been able to resolve this myself some 2-3 emails
ago :-) I wanted a "soft" introduction of autovac in production, and
assumed that it was better to begin with too much sleep than too little!
Well, well.
Re: [PERFORM] Simple join optimized badly?
On Tue, Oct 10, 2006 at 10:28:29AM -0700, Josh Berkus wrote:
> Jim,
>
> > We've deprecated things before, I'm sure we'll do it again. Yes, it's a
> > pain, but it's better than not having anything release after release.
> > And having a formal hint language would at least allow us to eventually
> > clean up some of these oddball cases, like the OFFSET 0 hack.
> >
> > I'm also not convinced that even supplemental statistics will be enough
> > to ensure the planner always does the right thing, so query-level hints
> > may have to stay (though it'd be great if that wasn't the case).
>
> "stay"? I don't think that the general developers of PostgreSQL are going
> to *accept* anything that stands a significant chance of breaking in one
> release. You have your challenge for the EDB development team: come up
> with a hinting language which is flexible enough not to do more harm than
> good (hint: it's not Oracle's hints).

My point was that I think we'll always have a need for fine-grained (ie:
table and join level) hints, even if we do get the ability for users to
override the statistics system. It's just not possible to come up with
automation that will handle every possible query that can be thrown at a
system.

I don't see how that means breaking anything in a given release.
Worst-case, the optimizer might be able to do a better job of something
than hints written for an older version of the database, but that's going
to be true of any planner override we come up with.

BTW, I'm not speaking for EnterpriseDB or its developers here... query
hints are something I feel we've needed for a long time.
--
Jim Nasby                                        [EMAIL PROTECTED]
EnterpriseDB          http://enterprisedb.com          512.569.9461 (cell)
[PERFORM] Scrub one large table against another
I currently have a db supporting what is for the most part an OLAP data
warehousing application.

One table (good data) has roughly 120 million rows, divided into roughly
40 different relational groups (logically by foreign key). Every time I
add data to this table, I need to afterwards scrub that group against
known "bad data" which is housed in a second table that has roughly 21
million rows.

The 120 million row good data table is called "email_record".
The 21 million row bad data table is called "suppress".

There are separate btree indexes on 'email_record_id', 'email_list_id'
and 'email' on both tables. Each time I scrub data I pull out anywhere
from 1 to 5 million rows from the good table (depending on the size of
the group I'm scrubbing) and compare them against the 21 million rows in
the 'suppress' table.

So far I've done this using a temporary staging table that stores only
the email_record_id for each row from the relevant group of the good
table. I use a plpgsql function that does roughly the following (I've
included only SQL syntax and inserted the constant '9' where I would
normally use a variable). The relevant columns are: email_record_id int8,
email varchar(255), email_list_id int8.

CREATE TEMP TABLE temp_list_suppress(email_record_id int8);

INSERT INTO temp_list_suppress
  SELECT email_record_id FROM ONLY email_record er
  WHERE email_list_id = 9 AND email IN
    (SELECT email FROM suppress);

CREATE INDEX unique_id_index ON temp_list_suppress ( email_record_id );

INSERT INTO er_banned
  SELECT * FROM ONLY email_record er WHERE EXISTS
    (SELECT 1 FROM temp_list_suppress ts
     WHERE er.email_record_id = ts.email_record_id);

DELETE FROM ONLY email_record
  WHERE email_list_id = 9 AND email_record_id IN
    (SELECT email_record_id FROM temp_list_suppress);

TRUNCATE TABLE temp_list_suppress;
DROP TABLE temp_list_suppress;

The performance is dreadful; is there a more efficient way to do this?
Would I be better off just grabbing * initially from the good table
instead of just the id, to avoid more sequential searches later?

Here are my configs:

Debian, PostgreSQL 8.1.4
dual Xeon
RAM: 4 GB
RAID 5

# - Memory -
shared_buffers = 3000
work_mem = 92768
maintenance_work_mem = 128384

autovacuum is turned off, and the db is analyzed and vacuumed regularly.

Regards,
Brendan
Re: [PERFORM] Scrub one large table against another
Brendan Curran <[EMAIL PROTECTED]> writes:
> CREATE TEMP TABLE temp_list_suppress(email_record_id int8);
> INSERT INTO temp_list_suppress
>   SELECT email_record_id FROM ONLY email_record er
>   WHERE email_list_id = 9 AND email IN
>     (SELECT email FROM suppress);
> CREATE INDEX unique_id_index ON temp_list_suppress ( email_record_id );
> INSERT INTO er_banned
>   SELECT * FROM ONLY email_record er WHERE EXISTS
>     (SELECT 1 FROM temp_list_suppress ts
>      WHERE er.email_record_id = ts.email_record_id);
> DELETE FROM ONLY email_record
>   WHERE email_list_id = 9 AND email_record_id IN
>     (SELECT email_record_id FROM temp_list_suppress);
> TRUNCATE TABLE temp_list_suppress;
> DROP TABLE temp_list_suppress;

> The performance is dreadful; is there a more efficient way to do this?

Have you tried doing EXPLAIN ANALYZE of each of the INSERT/DELETE steps?
If you don't even know which part is slow, it's hard to improve.

It would probably help to do an "ANALYZE temp_list_suppress" right after
populating the temp table. As you have it, the second insert and delete
are being planned with nothing more than a row count (obtained during
CREATE INDEX) and no stats about distribution of the table contents.

Also, I'd be inclined to try replacing the EXISTS with an IN test; in
recent PG versions the planner is generally smarter about IN. (Is there
a reason why you are doing the INSERT one way and the DELETE the other?)

BTW, that TRUNCATE right before the DROP seems quite useless, although
it's not the main source of your problem.

regards, tom lane
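P.S. Putting those suggestions together, the revised sequence would look
roughly like this (a sketch, untested):

CREATE TEMP TABLE temp_list_suppress(email_record_id int8);

INSERT INTO temp_list_suppress
  SELECT email_record_id FROM ONLY email_record
  WHERE email_list_id = 9 AND email IN (SELECT email FROM suppress);

CREATE INDEX unique_id_index ON temp_list_suppress (email_record_id);
ANALYZE temp_list_suppress;  -- give the planner real stats, not just a row count

INSERT INTO er_banned
  SELECT er.* FROM ONLY email_record er
  WHERE er.email_record_id IN
    (SELECT email_record_id FROM temp_list_suppress);

DELETE FROM ONLY email_record
  WHERE email_list_id = 9 AND email_record_id IN
    (SELECT email_record_id FROM temp_list_suppress);

DROP TABLE temp_list_suppress;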
Re: [PERFORM] Scrub one large table against another
Tom Lane wrote:
> Brendan Curran <[EMAIL PROTECTED]> writes:
>> CREATE TEMP TABLE temp_list_suppress(email_record_id int8);
>> INSERT INTO temp_list_suppress
>>   SELECT email_record_id FROM ONLY email_record er
>>   WHERE email_list_id = 9 AND email IN
>>     (SELECT email FROM suppress);
>> CREATE INDEX unique_id_index ON temp_list_suppress ( email_record_id );
>> INSERT INTO er_banned
>>   SELECT * FROM ONLY email_record er WHERE EXISTS
>>     (SELECT 1 FROM temp_list_suppress ts
>>      WHERE er.email_record_id = ts.email_record_id);
>> DELETE FROM ONLY email_record
>>   WHERE email_list_id = 9 AND email_record_id IN
>>     (SELECT email_record_id FROM temp_list_suppress);
>> TRUNCATE TABLE temp_list_suppress;
>> DROP TABLE temp_list_suppress;
>> The performance is dreadful; is there a more efficient way to do this?
>
> Have you tried doing EXPLAIN ANALYZE of each of the INSERT/DELETE steps?
> If you don't even know which part is slow, it's hard to improve.

FIRST INSERT (just the SELECT is explained):

 Hash Join  (cost=8359220.68..9129843.00 rows=800912 width=32)
   Hash Cond: (("outer".email)::text = ("inner".email)::text)
   ->  Unique  (cost=4414093.19..4522324.49 rows=21646260 width=25)
         ->  Sort  (cost=4414093.19..4468208.84 rows=21646260 width=25)
               Sort Key: suppress.email
               ->  Seq Scan on suppress  (cost=0.00..393024.60 rows=21646260 width=25)
   ->  Hash  (cost=3899868.47..3899868.47 rows=4606808 width=32)
         ->  Bitmap Heap Scan on email_record er  (cost=38464.83..3899868.47 rows=4606808 width=32)
               Recheck Cond: (email_list_id = 13)
               ->  Bitmap Index Scan on list  (cost=0.00..38464.83 rows=4606808 width=0)
                     Index Cond: (email_list_id = 13)

SECOND INSERT (using EXISTS):

 Seq Scan on email_record er  (cost=0.00..381554175.29 rows=62254164 width=1863)
   Filter: (subplan)
   SubPlan
     ->  Index Scan using er_primeq_pk on er_primeq eq  (cost=0.00..3.03 rows=1 width=0)
           Index Cond: ($0 = email_record_id)

SECOND INSERT (using IN):

 Nested Loop  (cost=26545.94..2627497.28 rows=27134 width=1863)
   ->  HashAggregate  (cost=26545.94..33879.49 rows=733355 width=8)
         ->  Seq Scan on er_primeq  (cost=0.00..24712.55 rows=733355 width=8)
   ->  Index Scan using email_record_pkey on email_record er  (cost=0.00..3.52 rows=1 width=1863)
         Index Cond: (er.email_record_id = "outer".email_record_id)
         Filter: (email_list_id = 13)

DELETE:

 Nested Loop  (cost=26545.94..2627497.28 rows=50846 width=6)
   ->  HashAggregate  (cost=26545.94..33879.49 rows=733355 width=8)
         ->  Seq Scan on er_primeq  (cost=0.00..24712.55 rows=733355 width=8)
   ->  Index Scan using email_record_pkey on email_record  (cost=0.00..3.52 rows=1 width=14)
         Index Cond: (email_record.email_record_id = "outer".email_record_id)
         Filter: (email_list_id = 9)

To get this explain data I used a sample "temp_suppress" table that
contained about 700k rows and was indexed but not analyzed...

> It would probably help to do an "ANALYZE temp_list_suppress" right after
> populating the temp table. As you have it, the second insert and delete
> are being planned with nothing more than a row count (obtained during
> CREATE INDEX) and no stats about distribution of the table contents.
>
> Also, I'd be inclined to try replacing the EXISTS with an IN test; in
> recent PG versions the planner is generally smarter about IN. (Is there
> a reason why you are doing the INSERT one way and the DELETE the other?)
>
> BTW, that TRUNCATE right before the DROP seems quite useless, although
> it's not the main source of your problem.
>
> regards, tom lane
Re: [PERFORM] Scrub one large table against another
Brendan Curran <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Have you tried doing EXPLAIN ANALYZE of each of the INSERT/DELETE steps?

> FIRST INSERT (just the SELECT is explained):

EXPLAIN ANALYZE, please, not just EXPLAIN.

regards, tom lane
Re: [PERFORM] Scrub one large table against another
Tom Lane wrote:
> Brendan Curran <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> Have you tried doing EXPLAIN ANALYZE of each of the INSERT/DELETE steps?
>
>> FIRST INSERT (just the SELECT is explained):
>
> EXPLAIN ANALYZE, please, not just EXPLAIN.
>
> regards, tom lane

Sorry, here is the EXPLAIN ANALYZE output of that first SELECT:

EXPLAIN ANALYZE SELECT email_record_id FROM ONLY email_record er
WHERE email_list_id = 13 AND email IN
  (SELECT email FROM suppress);

 Hash Join  (cost=8359220.68..9129843.00 rows=800912 width=8) (actual time=2121601.603..2121601.603 rows=0 loops=1)
   Hash Cond: (("outer".email)::text = ("inner".email)::text)
   ->  Unique  (cost=4414093.19..4522324.49 rows=21646260 width=25) (actual time=1165955.907..1434439.731 rows=21646261 loops=1)
         ->  Sort  (cost=4414093.19..4468208.84 rows=21646260 width=25) (actual time=1165955.903..1384667.715 rows=21646261 loops=1)
               Sort Key: suppress.email
               ->  Seq Scan on suppress  (cost=0.00..393024.60 rows=21646260 width=25) (actual time=37.784..609848.551 rows=21646261 loops=1)
   ->  Hash  (cost=3899868.47..3899868.47 rows=4606808 width=32) (actual time=554522.983..554522.983 rows=3245336 loops=1)
         ->  Bitmap Heap Scan on email_record er  (cost=38464.83..3899868.47 rows=4606808 width=32) (actual time=275640.435..541342.727 rows=3245336 loops=1)
               Recheck Cond: (email_list_id = 13)
               ->  Bitmap Index Scan on list  (cost=0.00..38464.83 rows=4606808 width=0) (actual time=275102.037..275102.037 rows=5172979 loops=1)
                     Index Cond: (email_list_id = 13)
 Total runtime: 2122693.864 ms

So much time is being spent in the Unique and Sort leaves... I would think
that it wouldn't need to do the unique portion, since there is no DISTINCT
clause...
Re: [PERFORM] Simple join optimized badly?
Jim C. Nasby wrote:
> (snippage)... but we'll never get any progress so long as every time
> hints are brought up the response is that they're evil and should never
> be in the database. I'll also say that a very simple hinting language
> (ie: allowing you to specify access method for a table, and join
> methods) would go a huge way towards enabling app developers to get
> stuff done now while waiting for all these magical optimizer
> improvements that have been talked about for years.

It is possibly because some of us feel they are evil :-) (can't speak for
the *real* Pg developers, just my 2c here)

As for optimizer improvements - well, yeah, we all want those - but the
basic problem (as I think Tom stated) is the developer resources to do
them. As an aside, this applies to hints as well - even if we have a
patch to start off with - look at how much time bitmap indexes have been
worked on to get them ready for release.

Personally I don't agree with the oft-stated comment along the lines of
"we will never get the optimizer to the point where it does not need some
form of hinting", as:

1/ we don't know that to be a true statement, and

2/ it is kind of admitting defeat on a very interesting problem, when in
fact a great deal of progress has been made to date, obviously by people
who believe it is possible to build a "start enough" optimizer.

best wishes

Mark
Re: [PERFORM] Simple join optimized badly?
Mark Kirkwood wrote:
> who believe it is possible to build a "start enough" optimizer.

That's meant to read "smart enough" optimizer ... sorry.
Re: [PERFORM] Scrub one large table against another
On Tue, Oct 10, 2006 at 05:46:18PM -0600, Brendan Curran wrote:
> Tom Lane wrote:
> > Brendan Curran <[EMAIL PROTECTED]> writes:
> > > Tom Lane wrote:
> > > > Have you tried doing EXPLAIN ANALYZE of each of the INSERT/DELETE steps?
> >
> > > FIRST INSERT (just the SELECT is explained):
> >
> > EXPLAIN ANALYZE, please, not just EXPLAIN.
> >
> > regards, tom lane
>
> Sorry, here is the EXPLAIN ANALYZE output of that first SELECT:
>
> EXPLAIN ANALYZE SELECT email_record_id FROM ONLY email_record er
> WHERE email_list_id = 13 AND email IN
>   (SELECT email FROM suppress);
>
>  Hash Join  (cost=8359220.68..9129843.00 rows=800912 width=8) (actual time=2121601.603..2121601.603 rows=0 loops=1)
>    Hash Cond: (("outer".email)::text = ("inner".email)::text)
>    ->  Unique  (cost=4414093.19..4522324.49 rows=21646260 width=25) (actual time=1165955.907..1434439.731 rows=21646261 loops=1)
>          ->  Sort  (cost=4414093.19..4468208.84 rows=21646260 width=25) (actual time=1165955.903..1384667.715 rows=21646261 loops=1)
>                Sort Key: suppress.email
>                ->  Seq Scan on suppress  (cost=0.00..393024.60 rows=21646260 width=25) (actual time=37.784..609848.551 rows=21646261 loops=1)
>    ->  Hash  (cost=3899868.47..3899868.47 rows=4606808 width=32) (actual time=554522.983..554522.983 rows=3245336 loops=1)
>          ->  Bitmap Heap Scan on email_record er  (cost=38464.83..3899868.47 rows=4606808 width=32) (actual time=275640.435..541342.727 rows=3245336 loops=1)
>                Recheck Cond: (email_list_id = 13)
>                ->  Bitmap Index Scan on list  (cost=0.00..38464.83 rows=4606808 width=0) (actual time=275102.037..275102.037 rows=5172979 loops=1)
>                      Index Cond: (email_list_id = 13)
>  Total runtime: 2122693.864 ms
>
> So much time is being spent in the Unique and Sort leaves... I would
> think that it wouldn't need to do the unique portion, since there is no
> DISTINCT clause...

I think that's coming about because of the IN. Try a simple join
instead...

SELECT email_record_id
  FROM ONLY email_record er
  JOIN suppress s USING (email)
  WHERE er.email_list_id = 13;

--
Jim Nasby                                        [EMAIL PROTECTED]
EnterpriseDB          http://enterprisedb.com          512.569.9461 (cell)
Re: [PERFORM] Simple join optimized badly?
--- tom lane wrote ---
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> I'd rather have the ugly solution sooner rather than the elegant one
> later (if ever).

The trouble with that is that we couldn't ever get rid of it, and we'd be
stuck with backward-compatibility concerns with the first (oversimplified)
design. It's important to get it right the first time, at least for stuff
that you know perfectly well is going to end up embedded in application
code.

regards, tom lane
---

I agree that it's important to get it right the first time. It's also
important that my queries use the right index NOW. It's no use to me if my
queries run efficiently in the next release when I am running those
queries right now. Hints would allow me to do that.

What would it take for hints to be added to postgres? If someone designed
a hint system that was powerful and flexible, and offered to implement it
themselves, would this be sufficient? This would address the concerns of
having a "bad" hint system, and also the concern of time being better
spent on other things.

I want to know if the other objections to hints, such as hints being left
behind after an improvement to the optimizer, would also be an issue. I
don't see this objection as significant, as people are already using ad
hoc hacks where they would otherwise use hints. The other reason I don't
accept this objection is that people who care about performance will
review their code after every DBMS upgrade, and they will read the
release notes :)
Re: [PERFORM] Simple join optimized badly?
Brian Herlihy <[EMAIL PROTECTED]> writes:
> What would it take for hints to be added to postgres?

A *whole lot* more thought and effort than has been expended on the
subject to date.

Personally I have no use for the idea of "force the planner to do exactly
X given a query of exactly Y". You don't have exactly Y today, tomorrow,
and the day after (if you do, you don't need a hint mechanism at all, you
need a mysql-style query cache). IMHO most of the planner mistakes we see
that could be fixed via hinting are really statistical estimation errors,
and so the right level to be fixing them at is hints about how to
estimate the number of rows produced for given conditions. Mind you
that's still a plenty hard problem, but you could at least hope that a
hint of that form would be useful for more than one query.

regards, tom lane
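P.S. The nearest existing knob of that flavor is the per-column statistics
target, e.g.

ALTER TABLE t ALTER COLUMN c SET STATISTICS 1000;
ANALYZE t;

but that only buys finer histograms on a single column; it can't express
cross-column correlation, which is the hard part.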
Re: [PERFORM] Scrub one large table against another
Brendan Curran <[EMAIL PROTECTED]> writes:
> So much time is being spent in the Unique and Sort leaves... I would
> think that it wouldn't need to do the unique portion, since there is no
> DISTINCT clause...

There's nothing in that query suggesting that suppress.email is unique.
If you know that it is, try using a plain join instead of an IN.

regards, tom lane
Re: [PERFORM] Simple join optimized badly?
--- Tom Lane <[EMAIL PROTECTED]> wrote:
> Personally I have no use for the idea of "force the planner to do
> exactly X given a query of exactly Y". You don't have exactly Y
> today, tomorrow, and the day after (if you do, you don't need a
> hint mechanism at all, you need a mysql-style query cache).

I don't agree here. I have "exactly Y" running millions of times daily.
There's enough data that the statistics on specific values don't help all
that much, even at the maximum statistics collection level. By "exactly Y"
I mean the form of the query is identical, and the query plan is
identical, since only the general statistics are being used for most
executions of the query. The specific values vary, so caching is no help.
In summary, I have a need to run "exactly Y" with query plan "exactly X".
(detail in postscript)

> IMHO most of the planner mistakes we see that could be fixed via
> hinting are really statistical estimation errors, and so the right
> level to be fixing them at is hints about how to estimate the number
> of rows produced for given conditions.

Do you mean something like "The selectivity of these two columns together
is really X"? That would solve my specific problem. And the academic part
of me likes the elegance of that solution. On the negative side, it means
people must learn how the optimizer uses statistics (which I would never
have done if I could have said "Use index X").

> Mind you that's still a plenty hard problem, but you could at least
> hope that a hint of that form would be useful for more than one query.

Yes, it would be useful for more than one query. I agree that it's the
"right" level to hint at, in that it is at a higher level. Maybe the right
level is not the best level though? In a business environment, you just
want things to work; you don't want to analyze a problem all the way
through and find the best, most general solution. As a former academic I
understand the two points of view, and I don't think either is correct or
wrong. Each view has its place.

Since I work for a business now, my focus is on making quick fixes that
keep the system running smoothly. Solving problems in the "right" way is
not important. If the query slows down again later, we will examine the
query plan and do whatever we have to do to fix it. It's not elegant, but
it gives fast response times to the customers, and that's what matters.

PS The case in question is a table with a 3-column primary key on
(A, B, C). It also has an index on (B, C). Re-ordering the primary key
doesn't help, as I do lookups on A only as well. When I specify A, B and C
(the primary key), the optimizer chooses the (B, C) index, on the
assumption that specifying these two values will return only 1 row. But
high correlation between B and C leads to hundreds of rows being returned,
and the query gets very slow. The quick fix is to say "Use index
(A, B, C)". The statistics-level fix would be to say "B and C really have
high correlation".
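For concreteness, the shape of the schema being described (table, column,
and index names invented):

CREATE TABLE t (
    a   int8,
    b   int8,
    c   int8,
    PRIMARY KEY (a, b, c)
);
CREATE INDEX t_b_c ON t (b, c);

-- The planner assumes b and c together are nearly unique and picks t_b_c,
-- then has to filter hundreds of heap rows on a:
SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3;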