[PERFORM] SubQuery Performance
Hi All,

I have a poorly performing SQL statement, shown below. The table has about 200M records, and each employee has on average 100 records. The query runs for about 3 hours. All I want is to update the flag on the highest-version record of each employee. Any suggestion is welcome!

Thanks,
Mike

SQL:

  update empTbl A
  set flag = 1
  where rec_ver = (select max(rec_ver)
                   from empTbl
                   where empNo = A.empNo);

Table:

  empTbl {
      int empNo;
      int flag;
      char[256] empDesc;
      int rec_ver;
  }
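A common rewrite for this pattern (not from the thread, just an illustrative sketch) is to compute the per-employee maximum once and join against it, instead of re-running the correlated MAX() subquery for every row; an index on (empNo, rec_ver) would help either form:

  update empTbl A
  set flag = 1
  from (select empNo, max(rec_ver) as max_ver
        from empTbl
        group by empNo) m
  where A.empNo = m.empNo
    and A.rec_ver = m.max_ver;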
[PERFORM] Vacuum and Memory Loss
Hello friends,

I am responsible for maintaining a high-volume website using PostgreSQL 8.1.4. Given the amount of reads and writes, I VACUUM FULL the server a few times a week around 1-2 AM, shutting down the site for a few minutes. The next morning, around 10-11 AM, the server slows to a crawl. It used to be that the error 'Too many clients' would be recorded, until I increased the number of clients it can handle; now it simply slows to a crawl, with lots and lots of postmaster processes running:

Tasks: 665 total, 10 running, 655 sleeping, 0 stopped, 0 zombie
Cpu(s): 14.9% us, 16.7% sy, 0.0% ni, 0.0% id, 68.4% wa, 0.0% hi, 0.0% si
Mem: 2074932k total, 2051572k used, 23360k free, 2736k buffers
Swap: 2096440k total, 188k used, 251992k free, 102968k cached

  PID USER     PR NI  VIRT RES SHR S %CPU %MEM  TIME+  COMMAND
 6420 postgres 15  0 26912 11m 10m R  3.6  0.6 0:00.11 postmaster
 6565 postgres 16  0 26912 11m 10m S  3.6  0.6 0:00.12 postmaster
 6707 postgres 15  0 26912 11m 10m S  3.3  0.6 0:00.10 postmaster
 6715 postgres 15  0 26912 11m 10m S  3.3  0.6 0:00.11 postmaster
 6765 postgres 15  0 26912 11m 10m S  3.3  0.6 0:00.11 postmaster
 6147 postgres 15  0 26912 11m 10m R  3.0  0.6 0:00.15 postmaster
 6311 postgres 15  0 26904 11m 10m R  3.0  0.6 0:00.10 postmaster
 6551 postgres 15  0 26912 11m 10m R  3.0  0.6 0:00.09 postmaster
 6803 postgres 16  0 26912 11m 10m R  3.0  0.6 0:00.09 postmaster
 6255 postgres 15  0 26904 11m 10m R  2.6  0.6 0:00.14 postmaster
 6357 postgres 15  0 26912 11m 10m R  2.6  0.6 0:00.11 postmaster
 6455 postgres 15  0 26912 11m 10m S  2.6  0.6 0:00.10 postmaster
 6457 postgres 15  0 26912 11m 10m S  2.6  0.6 0:00.11 postmaster
 6276 postgres 15  0 26912 11m 10m S  2.3  0.6 0:00.10 postmaster
 6475 postgres 15  0 26912 11m 10m R  2.3  0.6 0:00.11 postmaster
 6868 postgres 15  0 26912 11m 10m S  2.3  0.6 0:00.07 postmaster
 6891 postgres 15  0 26912 11m 10m S  1.3  0.6 0:00.19 postmaster

Thanks for your help in advance,
Mike
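For reference (general advice, not a diagnosis of this particular server): plain, lazy VACUUM does not take the exclusive lock that VACUUM FULL does, so it can usually run alongside normal traffic, provided the free space map is large enough to track the dead rows. A minimal check, assuming the slowdown is bloat-related:

  -- runs without an exclusive lock, so the site can stay up
  VACUUM ANALYZE;

  -- a database-wide verbose run reports, near the end, whether
  -- max_fsm_pages is large enough to remember all the free space found
  VACUUM VERBOSE;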
Re: [PERFORM] How would you store read/unread topic status?
So your update doesn't take long to run during off-peak times; basically your options are:

1. Optimize your postgresql.conf settings or upgrade to the latest version of PostgreSQL.
2. Redesign your forum code so it can scale better.
3. Upgrade your server's hardware, as it may be overloaded.

I would probably attack those in the order I described.

As far as redesigning your forum code, keep in mind that in PostgreSQL an update is basically a select, delete, and insert in a single statement. First it needs to find the rows to update, then it marks the old rows for deletion (which vacuum later cleans up) and inserts new rows. So updates can be quite expensive. In SOME situations, it can be faster to do inserts only and modify your select query to get just the data you need. For example, rather than an update like this:

  update <table> set LastReadAnswerID = <value> where UserID = <value> AND TopicID = <value>

you could do this instead:

  insert into <table> VALUES (<UserID>, <TopicID>, <LastReadAnswerID>)

Then just modify your select statement slightly to get the last inserted row:

  select * from <table> where user_id = <value> AND topic_id = <value> order by LastReadAnswerID DESC LIMIT 1

This makes your select statement slightly more expensive, but your insert statement pretty much as cheap as possible. Since it's much easier to cache select results, you could easily wrap some caching mechanism around your select query to reduce the load there too. Then, using a task scheduler like cron, simply clear out old rows from the table you insert into every minute, 5 minutes, hour, or day, whatever makes most sense to keep the select queries fast.

A memcached solution would probably be much better, but it's also likely much more involved to do.

On Tue, 23 Jun 2009 17:50:50 +0200 Mathieu Nebra wrote:

> Robert Haas a écrit :
> Which pg version are you using?
> >> I should have mentioned that before, sorry: PostgreSQL 8.2
>
> > I think there is an awful lot of speculation on this thread about
> > what your problem is without anywhere near enough investigation. A
> > couple of seconds for an update is a really long time, unless your
> > server is absolutely slammed, in which case probably everything is
> > taking a long time. We need to get some more information on what
> > is happening here.
>
> You're right, I'll give you the information you need.
>
> > Approximately how many requests per second are you servicing? Also,
>
> How can I extract this information from the database? I know how to
> use pg_stat_user_tables. My table has:
>
> seq_tup_read    133793491714
> idx_scan        12408612540
> idx_tup_fetch   41041660903
> n_tup_ins       14700038
> n_tup_upd       6698236
> n_tup_del       15990670
>
> > can you:
> >
> > 1. Run EXPLAIN ANALYZE on a representative UPDATE statement and post
> >    the exact query and the output.
>
> "Index Scan using prj_frm_flg_pkey on prj_frm_flg (cost=0.00..8.58 rows=1 width=18)"
> "  Index Cond: ((flg_mid = 3) AND (flg_sid = 123764))"
>
> This time it only took 54 ms, but maybe it's already a lot.
>
> > 2. Run VACUUM VERBOSE on your database and send the last 10 lines or
> >    so of the output.
> It's not very long, I can give you the whole log:
>
> INFO: vacuuming "public.prj_frm_flg"
> INFO: scanned index "prj_frm_flg_pkey" to remove 74091 row versions
> DETAIL: CPU 0.15s/0.47u sec elapsed 53.10 sec.
> INFO: scanned index "flg_fav" to remove 74091 row versions
> DETAIL: CPU 0.28s/0.31u sec elapsed 91.82 sec.
> INFO: scanned index "flg_notif" to remove 74091 row versions
> DETAIL: CPU 0.36s/0.37u sec elapsed 80.75 sec.
> INFO: scanned index "flg_post" to remove 74091 row versions
> DETAIL: CPU 0.31s/0.37u sec elapsed 115.86 sec.
> INFO: scanned index "flg_no_inter" to remove 74091 row versions
> DETAIL: CPU 0.34s/0.33u sec elapsed 68.96 sec.
> INFO: "prj_frm_flg": removed 74091 row versions in 5979 pages
> DETAIL: CPU 0.29s/0.34u sec elapsed 100.37 sec.
> INFO: index "prj_frm_flg_pkey" now contains 1315895 row versions in 7716 pages
> DETAIL: 63153 index row versions were removed.
> 672 index pages have been deleted, 639 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: index "flg_fav" now contains 1315895 row versions in 18228 pages
> DETAIL: 73628 index row versions were removed.
> 21 index pages have been deleted, 16 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: index "flg_notif" now contains 1315895 row versions in 18179 pages
> DETAIL: 73468 index row versions were removed.
> 22 index pages have been deleted, 13 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: index "flg_post" now contains 1315895 row versions in 18194 pages
> DETAIL: 73628 index row versions were removed.
> 30 index pages have been deleted, 23 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: index "flg_no_inter" now contains 1315895 row versions in 8596 pages
> DETAIL: 73628 index row versions were removed.
> 13 index pages have been deleted, 8 are currently reusable.
> C
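To make the insert-only idea above concrete, here is a sketch using hypothetical table and column names (topic_reads, user_id, topic_id, last_read_answer_id); the real schema from the thread is not shown, so these names are illustrative only:

  -- instead of an UPDATE, append a new row each time
  insert into topic_reads (user_id, topic_id, last_read_answer_id)
  values (42, 1001, 987654);

  -- read the most recent row for that user/topic
  select *
  from topic_reads
  where user_id = 42 and topic_id = 1001
  order by last_read_answer_id desc
  limit 1;

  -- periodically purge superseded rows (e.g. from cron)
  delete from topic_reads t
  where exists (select 1 from topic_reads n
                where n.user_id = t.user_id
                  and n.topic_id = t.topic_id
                  and n.last_read_answer_id > t.last_read_answer_id);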
Re: [PERFORM] How to speed up word count in tsearch2?
On Apr 1, 2005 4:03 AM, Yudie Pg <[EMAIL PROTECTED]> wrote:
> > You need to look at what else is using RAM on that machine. And maybe buy
> > more.
>
> Ouch.. I had that feeling also. Then how can I know how much memory is
> needed for a certain number of words? And why is counting uncommon words
> faster than common ones?

Because the index is a tree. You fall off the end of a branch faster with uncommon words. Plus the executor goes back to the table for fewer real rows with uncommon words.

It sounds like you may just need a faster disk subsystem. That would shrink the time for the first query on any particular set of words, and it would make everything else faster as a nice side effect. What does your disk layout look like now?

-- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org
Re: [PERFORM] Sustained inserts per sec ... ?
> Just curious, but does anyone have an idea of what we are capable of? I > realize that size of record would affect things, as well as hardware, but > if anyone has some ideas on max, with 'record size', that would be > appreciated ... Well, I just did an insert of 27,500 records with 9 fields, averaging around 118 bytes per record, each insert statement coming from a separate SQL statement fed to psql, and it took a bit over 4 minutes, or about 106 inserts per second. That seems consistent with what I get when I do a restore of a dump file that has insert statement instead of COPY. The hardware is a Dell dual Xeon system, the disks are mirrored SATA drives with write buffering turned off. -- Mike Nolan ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Sustained inserts per sec ... ?
> > Well, I just did an insert of 27,500 records with 9 fields, averaging
> > around 118 bytes per record, each insert statement coming from a separate
> > SQL statement fed to psql, and it took a bit over 4 minutes, or about
> > 106 inserts per second.
>
> Is that with a separate transaction for each insert command? I can get
> significantly higher rates on my devel machine if the inserts are
> bundled into transactions of reasonable length.

That's with autocommit on. If I do it as a single transaction block, it takes about 6.5 seconds, which is about 4,200 inserts per second.

-- Mike Nolan
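The gap between the two numbers above is largely per-transaction commit (fsync) overhead: with autocommit on, every INSERT is its own transaction. A generic sketch of the batched form (test_table and its columns are hypothetical, not from the thread):

  begin;
  insert into test_table (col1, col2) values ('a', 1);
  insert into test_table (col1, col2) values ('b', 2);
  -- ... thousands more ...
  commit;

  -- bulk loads are typically faster still with COPY:
  copy test_table from '/path/to/data.dat';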
Re: [PERFORM] Sustained inserts per sec ... ?
If I'm getting the point of this thread correctly, having a huge amount of data in one table degrades INSERT/COPY performance even with just a PKEY index. If that's about the size of it, read on. If not, ignore me because I missed something.

On Apr 4, 2005 10:44 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> Before I start to tunnel-vision on a particular coincidence...

Don't worry too much about tunnel vision. I see the same thing every day with multi-million row tables. The bigger the table gets (with only a pkey index), the slower the inserts go. If I start over (truncate, drop/create table), or if I point the initial load at a new table, everything gets speedy. I've always figured it was a function of table size and learned to live with it...

> How much memory have you got on the system?

On mine, 16G.

> How much of that have you allocated to various tasks?

shared_buffers: 15000

> What else is happening on your system?

Nothing on mine.

> Tell us more about disk set-up and other hardware related things.

6-disk RAID10 on a Compaq SmartArray 6404 with 256M BB cache, WAL on a 2-disk mirror on the built-in SmartArray5 controller.

-- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org
Re: [PERFORM] Opteron vs Xeon (Was: What to do with 6 disks?)
On 4/20/05, Anjan Dave <[EMAIL PROTECTED]> wrote: > In terms of vendor specific models - > > Does anyone have any good/bad experiences/recommendations for a 4-way > Opteron from Sun (v40z, 6 internal drives) or HP (DL585 5 internal > drives) models? We are going with the 90nm HPs for production. They "feel" like beefier boxes than the Suns, but the Suns cost a LOT less, IIRC. We're only using the internal drives for the OS. PG gets access to a fibre-channel array, HP StorageWorks 3000. I _can't wait_ to get this in. Our dev box is a 130nm DL585 with 16G of RAM and an HP SCSI array, and I have absolutely zero complaints. :) -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Table Partitioning: Will it be supported in Future?
On 4/26/05, Mohan, Ross <[EMAIL PROTECTED]> wrote: > Maybe he needs to spend $7K on performance improvements? > > ;-) > AAARRRGGG! I will forever hate the number 7,000 from this day forth! Seriously, though, I've never seen a thread on any list wander on so aimlessly for so long. Please, mommy, make it stop! -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM]
On 5/3/05, Tom Lane <[EMAIL PROTECTED]> wrote:
> Steven Rosenstein <[EMAIL PROTECTED]> writes:
> > My question is, are there any advantages, drawbacks, or outright
> > restrictions to using multiple simultaneous COPY commands to load data into
> > the same table?
>
> It will work; not sure about whether there is any performance benefit.
> I vaguely recall someone having posted about doing this, so you might
> check the archives.

I may be one of Tom's vague "voices". ;) The only issue would be that you need to remove all your UNIQUE constraints before sending multiple COPYs to the server. This includes the PRIMARY KEY constraint. To the backend, COPY is just like INSERT and all constraints need to be checked, and this will block the commit of one of the COPY streams.

However, multiple COPYs may not be needed. I regularly load several tables totaling around 50M rows with a single COPY per table. I drop (actually, this is during DB reload, so I don't yet create...) all fkeys, constraints and indexes, and the data loads in a matter of 5 minutes or so.

Hope that helps!

-- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org
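A sketch of the load pattern described above, with hypothetical table, column, and file names; drop the constraints and indexes, COPY, then rebuild:

  alter table orders drop constraint orders_pkey;
  drop index orders_customer_idx;

  copy orders from '/tmp/orders.dat';

  alter table orders add primary key (order_id);
  create index orders_customer_idx on orders (customer_id);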
[PERFORM] Table stats
Hello, I have a table collecting stats that shows 5 Index Tuples Fetched but no Index Scans. Should there not be at least one Index Scan showing in the stats? Mike ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
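For anyone wanting to reproduce the check, the counters in question come from pg_stat_user_tables; something like the following, with a hypothetical table name:

  select relname, seq_scan, idx_scan, idx_tup_fetch
  from pg_stat_user_tables
  where relname = 'my_table';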
Re: [PERFORM] ok you all win what is best opteron (I dont want a hosed system again)
> This can often be called for. I'm working on a 400GB data warehouse right > now, and almost *all* of our queries run from materialized aggregate tables. I thought that was pretty much the definition of data warehousing! :-) -- Mike Nolan ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Postgresql on an AMD64 machine
On 06 Jun 2005 12:53:40 -0500, Mark Rinaudo <[EMAIL PROTECTED]> wrote: > I'm not sure if this is the appropriate list to post this question to > but i'm starting with this one because it is related to the performance > of Postgresql server. I have a Penguin Computing dual AMD 64 bit > opteron machine with 8 Gigs of memory. In my attempt to increase the > number of shared_buffers from the default to 65000 i was running into a > semget error when trying to start Postgresql. After reading the > documentation I adjusted the semaphore settings in the kernel to allow > Postgresql to start successfully. With this configuration running if I > do a ipcs -u i get the following. On my HP-585, 4xOpteron, 16G RAM, Gentoo Linux (2.6.9): $ ipcs -u i -- Shared Memory Status segments allocated 1 pages allocated 34866 pages resident 31642 pages swapped 128 Swap performance: 0 attempts 0 successes -- Semaphore Status used arrays = 7 allocated semaphores = 119 -- Messages: Status allocated queues = 0 used headers = 0 used space = 0 bytes Did you perhaps disable spinlocks when compiling PG? -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Advice on RAID card
On 9/25/05, Dave Cramer <[EMAIL PROTECTED]> wrote: > I would think software raid would be quite inappropriate considering > postgres when it is working is taking a fair amount of CPU as would > software RAID. Does anyone know if this is really the case ? > I attempted to get some extra speed out of my Compaq/HP SA6404 card by using software RAID1 across to hardware RAID10 sets. It didn't help, but there was no noticeable load or drop in performance because of it. Granted, this was on a 4-way Opteron, but, anecdotally speaking, the linux software RAID has surprisingly low overhead. My $0.02, hope it helps. -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 6: explain analyze is your friend
[PERFORM] Table Partitions / Partial Indexes
Hello,

I've got a table with ~60 million rows and am having performance problems querying it. Disks are set up as 4x10K SCSI 76GB, RAID 1+0. The table is being inserted into multiple times every second of the day, with no updates, and every 2nd day we delete 1/60th of the data (as it becomes old). Vacuum analyze is scheduled to run 3 times a day.

Query:

select sum(TOTAL_FROM) as TOTAL_IN, sum(TOTAL_TO) as TOTAL_OUT, SOURCE_MAC
from PC_TRAFFIC
where FK_DEVICE = 996
and TRAFFIC_DATE >= '2005-10-14 00:00:00'
and TRAFFIC_DATE <= '2005-11-13 23:59:59'
group by SOURCE_MAC
order by 1 desc

Table:

CREATE TABLE PC_TRAFFIC (
    PK_PC_TRAFFIC INTEGER NOT NULL,
    TRAFFIC_DATE  TIMESTAMP NOT NULL,
    SOURCE_MAC    CHAR(20) NOT NULL,
    DEST_IP       CHAR(15),
    DEST_PORT     INTEGER,
    TOTAL_TO      DOUBLE PRECISION,
    TOTAL_FROM    DOUBLE PRECISION,
    FK_DEVICE     SMALLINT,
    PROTOCOL_TYPE SMALLINT
);

CREATE INDEX pc_traffic_pkidx ON pc_traffic (pk_pc_traffic);
CREATE INDEX pc_traffic_idx3 ON pc_traffic (fk_device, traffic_date);

Plan:

Sort  (cost=76650.58..76650.58 rows=2 width=40)
  Sort Key: sum(total_from)
  ->  HashAggregate  (cost=76650.54..76650.57 rows=2 width=40)
        ->  Bitmap Heap Scan on pc_traffic  (cost=534.64..76327.03 rows=43134 width=40)
              Recheck Cond: ((fk_device = 996) AND (traffic_date >= '2005-10-01 00:00:00'::timestamp without time zone) AND (traffic_date <= '2005-10-31 23:59:59'::timestamp without time zone))
              ->  Bitmap Index Scan on pc_traffic_idx3  (cost=0.00..534.64 rows=43134 width=0)
                    Index Cond: ((fk_device = 996) AND (traffic_date >= '2005-10-01 00:00:00'::timestamp without time zone) AND (traffic_date <= '2005-10-31 23:59:59'::timestamp without time zone))
(7 rows)

CLUSTER on PC_TRAFFIC_IDX3 gives me significantly improved performance:

Sort  (cost=39886.65..39886.66 rows=2 width=40)
  Sort Key: sum(total_from)
  ->  HashAggregate  (cost=39886.61..39886.64 rows=2 width=40)
        ->  Index Scan using pc_traffic_idx3 on pc_traffic  (cost=0.00..39551.26 rows=44714 width=40)
              Index Cond: ((fk_device = 996) AND (traffic_date >= '2005-10-01 00:00:00'::timestamp without time zone) AND (traffic_date <= '2005-10-31 23:59:59'::timestamp without time zone))
(5 rows)

However the clustering is only effective on the first shot. Because of the constant usage of the table we can't perform a VACUUM FULL nor take any other exclusive lock.

Would table partitioning/partial indexes help much? Partitioning on date range doesn't make much sense for this setup, where a typical one-month query spans both tables (as the billing month for the customer might start midway through a calendar month).

Noting that the index scan was quicker than the bitmap scan, I'm trying to make the indexes smaller/more likely to be index scanned. I have tried partitioning against fk_device, with 10 child tables. I'm using fk_device % 10 = 1, fk_device % 10 = 2, fk_device % 10 = 3, etc. as the check constraint:

CREATE TABLE pc_traffic_0 (CHECK(FK_DEVICE % 10 = 0)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_1 (CHECK(FK_DEVICE % 10 = 1)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_2 (CHECK(FK_DEVICE % 10 = 2)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_3 (CHECK(FK_DEVICE % 10 = 3)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_4 (CHECK(FK_DEVICE % 10 = 4)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_5 (CHECK(FK_DEVICE % 10 = 5)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_6 (CHECK(FK_DEVICE % 10 = 6)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_7 (CHECK(FK_DEVICE % 10 = 7)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_8 (CHECK(FK_DEVICE % 10 = 8)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_9 (CHECK(FK_DEVICE % 10 = 9)) INHERITS (pc_traffic);

The indexes now look like:

CREATE INDEX pc_traffic_6_idx3 ON pc_traffic_6 (fk_device, traffic_date);

To take advantage of the partitioning, my SQL now has to include the mod operation (so the query planner picks up the correct child tables):

select sum(TOTAL_FROM) as TOTAL_IN, sum(TOTAL_TO) as TOTAL_OUT, SOURCE_MAC
from PC_TRAFFIC
where FK_DEVICE = 996
and FK_DEVICE % 10 = 6
and TRAFFIC_DATE >= '2005-10-14 00:00:00'
and TRAFFIC_DATE <= '2005-11-13 23:59:59'
group by SOURCE_MAC
order by 1 desc

Sorry, I would show the plan but I'm rebuilding the dev database at the moment. It was faster though, and did pick up the correct child table. It was also a bitmap scan on the index, IIRC.

Would I be better off creating many partial indexes instead of multiple tables AND multiple indexes?

Am I using a horrid method for partitioning the data? (% 10)

Should there be that big of an improvement for multiple tables given that all the data is still stored on the same filesystem?

Any advice on table splitting much appreciated.

Cheers,
Mike C.
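One detail worth noting for this kind of inheritance-based partitioning: for the planner to skip non-matching child tables based on those CHECK constraints, constraint_exclusion has to be enabled (it is not on by default in releases of that era), and, as described above, the literal modulo expression has to appear in the query so it can be compared against the constraints. A sketch:

  set constraint_exclusion = on;

  -- then verify that only the matching child table is scanned:
  explain
  select sum(total_from), sum(total_to), source_mac
  from pc_traffic
  where fk_device = 996
    and fk_device % 10 = 6
    and traffic_date >= '2005-10-14 00:00:00'
    and traffic_date <= '2005-11-13 23:59:59'
  group by source_mac;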
Re: [PERFORM] Table Partitions / Partial Indexes
On 12/12/05, Tom Lane <[EMAIL PROTECTED]> wrote: Mike C <[EMAIL PROTECTED]> writes:> CLUSTER on PC_TRAFFIC_IDX3 gives me significantly improved performance:How can you tell? Neither of these are EXPLAIN ANALYZE output. regards, tom lane Sorry that's a result of my bad record keeping. I've been keeping records of the explain but not the analyze. IIRC the times dropped from ~25 seconds down to ~8 seconds (using analyze). Regards, Mike
Re: [PERFORM] Table Partitions / Partial Indexes
On 12/14/05, Simon Riggs <[EMAIL PROTECTED]> wrote:
> Maybe not for queries, but if you use a date range then you never need
> to run a DELETE and never need to VACUUM. You could split the data into
> two-day chunks.

That's an interesting idea, thanks.

> > Am I using a horrid method for partitioning the data? (% 10)
>
> No, but what benefit do you think it provides. I'm not sure I see...

I was trying to get both the indexes to be smaller without losing selectivity, and make any table scans/index scans faster from having to read less data.

> > Should there be that big of an improvement for multiple tables given
> > that all the data is still stored on the same filesystem?
>
> You could store partitions in separate tablespaces/filesystems.

Ideally that's what I would do, but to make the most of that I would have to have a dedicated RAID setup for each partition, right? (Which is a bit pricey for the budget.)

Cheers,
Mike
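A sketch of what Simon is describing, using illustrative two-day chunks (names and dates are made up): expiring old data becomes dropping a child table instead of a DELETE followed by VACUUM:

  create table pc_traffic_20051114
    (check (traffic_date >= '2005-11-14' and traffic_date < '2005-11-16'))
    inherits (pc_traffic);

  create index pc_traffic_20051114_idx3
    on pc_traffic_20051114 (fk_device, traffic_date);

  -- when the chunk ages out:
  drop table pc_traffic_20051114;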
[PERFORM] strange issue for certain queries
geqo                   | on        | Enables genetic query optimization.
geqo_effort            | 5         | GEQO: effort is used to set the default for other GEQO parameters.
geqo_generations       | 0         | GEQO: number of iterations of the algorithm.
geqo_pool_size         | 0         | GEQO: number of individuals in the population.
geqo_selection_bias    | 2         | GEQO: selective pressure within the population.
geqo_threshold         | 12        |
join_collapse_limit    | 8         |
server_encoding        | SQL_ASCII | Sets the server (database) character set encoding.
server_version         | 8.1.4     | Shows the server version.
shared_buffers         | 400       |
vacuum_cost_delay      | 0         | Vacuum cost delay in milliseconds.
vacuum_cost_limit      | 200       | Vacuum cost amount available before napping.
vacuum_cost_page_dirty | 20        | Vacuum cost for a page dirtied by vacuum.
vacuum_cost_page_hit   | 1         | Vacuum cost for a page found in the buffer cache.
vacuum_cost_page_miss  | 10        | Vacuum cost for a page not found in the buffer cache.
wal_buffers            | 8         | Sets the number of disk-page buffers in shared memory for WAL.
wal_sync_method        | fsync     | Selects the method used for forcing WAL updates out to disk.
work_mem               | 1024      | Sets the maximum memory to be used for query workspaces.
zero_damaged_pages     | off       |

Thanks in advance for any help you can offer on this problem.

-Mike
Fwd: [PERFORM] Not Picking Index
This is very similar to the problem I posted to this list yesterday. Apparently, if you append an empty string to the column data in your WHERE clause, it will force the planner to treat it as a filter and not an index cond. It's extremely ugly, but this method doesn't seem to be any more elegant.

-Mike

On Feb 16, 2007, at 9:46 AM, Alvaro Herrera wrote:

Gauri Kanekar escribió:
> I want the planner to ignore a specific index. I am testing some query
> output. For that purpose I don't want the index. Is it possible for the
> planner to ignore an index?

Sure:

BEGIN
DROP INDEX foo
SELECT
ROLLBACK

-- Alvaro Herrera   http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
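The trick Mike mentions looks roughly like this (illustrative table and column names); wrapping the indexed column in an expression keeps the planner from using a plain index condition on it:

  -- the planner can use an index on col:
  select * from t where col = 'x';

  -- appending an empty string turns it into a filter instead:
  select * from t where col || '' = 'x';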
Re: [PERFORM] Beginner Question
Yeah, I have a lot of similar problems where an index that I have to speed up one query is used in another query where it actually slows it down. Is there any way to ignore indexes for certain queries? We've been appending empty strings and adding zero's to the column data to force it into a filter, but it's a messy hack. I've tried ordering the joins in the the most efficent way with a join_collapse_limit of 1, but it still does uses this index in parallel with searching an index on another table (i guess the planner figures it's saving some time up front). -Mike On Apr 9, 2007, at 8:45 PM, s d wrote: Hi Jan, Adding this Index slowed down things by a factor of 4. Also, the performance is so horrible (example bellow) that i am certain i am doing something wrong. Does the following explain gives any ideas ? Thanks =# EXPLAIN ANALYZE select * from word_association where (word1 ='the' or word2='the') and count > 10; QUERY PLAN -- -- Bitmap Heap Scan on word_association (cost=250.86..7256.59 rows=4624 width=22) (actual time=13.461..211.568 rows=6601 loops=1) Recheck Cond: (((word1)::text = 'the'::text) OR ((word2)::text = 'the'::text)) Filter: (count > 10) -> BitmapOr (cost=250.86..250.86 rows=12243 width=0) (actual time=9.052..9.052 rows=0 loops=1) -> Bitmap Index Scan on word_association_index1_1 (cost=0.00..153.20 rows=7579 width=0) (actual time=5.786..5.786 rows=7232 loops=1) Index Cond: ((word1)::text = 'the'::text) -> Bitmap Index Scan on word_association_index2_1 (cost=0.00..95.34 rows=4664 width=0) (actual time=3.253..3.253 rows=4073 loops=1) Index Cond: ((word2)::text = 'the'::text) Total runtime: 219.987 ms (9 rows) On 4/9/07, Jan de Visser <[EMAIL PROTECTED]> wrote: On Monday 09 April 2007 05:09:53 s d wrote: > Hi, > I am trying to figure out how to debug a performance problem / use psql > explain. The table in question is: > # \d word_association; >Table "public.word_association" > Column | Type | Modifiers > ++ > word1 | character varying(128) | not null > word2 | character varying(128) | not null > count | integer| not null default 0 > Indexes: > "word1_word2_comb_unique" unique, btree (word1, word2) > "word1_hash_index" hash (word1) > "word2_hash_index" hash (word2) > "word_association_count_index" btree (count) > "word_association_index1_1" btree (word1) > "word_association_index2_1" btree (word2) > > It has multiple indices since i wanted to see which one the planner choses. > > > # explain select * FROM word_association WHERE (word1 = 'bdss' OR > word2 = 'bdss') AND count >= 10; >QUERY PLAN > - -- >- Bitmap Heap Scan on word_association > (cost=11.53..1192.09 rows=155 width=22) Recheck Cond: (((word1)::text = > 'bdss'::text) OR ((word2)::text = 'bdss'::text)) >Filter: (count >= 10) >-> BitmapOr (cost=11.53..11.53 rows=364 width=0) > -> Bitmap Index Scan on word_association_index1_1 > (cost=0.00..5.79 rows=190 width=0) >Index Cond: ((word1)::text = 'bdss'::text) > -> Bitmap Index Scan on word_association_index2_1 > (cost=0.00..5.67 rows=174 width=0) >Index Cond: ((word2)::text = 'bdss'::text) > (8 rows) > > The questions: > 1. i can undestand where the cost=11.53 came from but where did the > 1192.09 come form? The values are in milli right ? > 2. the query takes in reality much longer than 1 second. > > In short, it feels like something is very wrong here (i tried vacuum > analyze and it didn't do much diff). > any ideas ? You need an index on (word1, word2, count). In your current setup it will have to scan all rows that satisfy word1 and word2 to see if count >= 10. 
jan -- -- Jan de Visser [EMAIL PROTECTED] Baruk Khazad! Khazad ai-menu! -- ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
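Jan's suggestion, spelled out for the word_association table shown above; the second pair of indexes is an additional option worth testing (not from the thread) since the query uses an OR across word1 and word2:

  create index word_assoc_w1_w2_count_idx
    on word_association (word1, word2, count);

  -- alternative: two narrower indexes that each cover one arm of the OR
  create index word_assoc_w1_count_idx on word_association (word1, count);
  create index word_assoc_w2_count_idx on word_association (word2, count);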
Re: [PERFORM] Volunteer to build a configuration tool
It would be cool if someone started a generic configuration+benchmark utility that could be used with virtually any software. Something like this:

1. Create a configuration file parser for your specific application, be it PostgreSQL, MySQL, Apache, whatever.

2. Create a min/max or X,Y,Z configuration option file that determines which options to try. ie:
   shared_buffers = 1000-2[1000] //1000 is the increment by
   wal_buffers = 8,16,32
   ...

3. Create start/stop scripts for the specific application.

4. Create a benchmark script for the application that returns relevant metrics. In PGSQL's case, it would be tied in to pgbench probably. In Apache's case, AB. This utility would of course need to know how to read the metrics to determine what is "best".

5. Run the utility. Ideally it would use some sort of genetic algorithm to benchmark the application initially to get base numbers, then one-by-one apply the different configuration options and re-run the benchmark. It would output the metrics for each run and once it is done, pick the best run and let you know what those settings are.

I don't think something like this would be very difficult at all to write, and it would be modular enough to work for virtually any application. For a database it would take a while to run depending on the benchmark script, but even then you could have a "fast" and a "slow" benchmark script that could be easily run when you first install PostgreSQL. This way, too, you're not worrying about how much memory the system has, or how many disks it has, etc... The system will figure out the best possible settings for a specific benchmark. Not to mention people could easily take an SQL log of their own application running and use that as the benchmark to get "real world" numbers.

Any other sort of configuration "suggestion" utility will always have the question of: what do you recommend? How much data do you try to get, and what can be determined from that data to get the best settings? Is it really going to be that much better than the default, at least enough better to warrant the work and effort put into it?

On Mon, 2007-06-18 at 10:04 -0500, Campbell, Lance wrote:
> I am a Java Software architect, DBA, and project manager for the
> University of Illinois, Department of Web Services. We use PostgreSQL
> to serve about 2 million pages of dynamic content a month; everything
> from calendars, surveys, forms, discussion boards, RSS feeds, etc. I
> am really impressed with this tool.
>
> The only major problem area I have found where PostgreSQL is really
> lacking is in "what should my initial configuration settings be?" I
> realize that there are many elements that can impact a DBA's specific
> database settings but it would be nice to have a "configuration tool"
> that would get someone up and running better in the beginning.
>
> This is my idea:
>
> A JavaScript HTML page that would have some basic questions at the top:
> 1) How much memory do you have?
> 2) How many connections will be made to the database?
> 3) What operating system do you use?
> 4) Etc...
>
> Next the person would press a button, "generate", found below the
> questions. The JavaScript HTML page would then generate content for
> two Iframes at the bottom of the page. One Iframe would contain the
> contents of the postgresql.conf file. The postgresql.conf settings
> would be tailored more to the individual's needs than the standard
> default file. The second Iframe would contain the default settings
> one should consider using with their operating system.
>
> My web team would be very happy to develop this for the PostgreSQL
> project. It would have saved us a lot of time by having a
> configuration tool in the beginning. I am willing to make this a very
> high priority for my team.
>
> Thanks,
>
> Lance Campbell
> Project Manager/Software Architect
> Web Services at Public Affairs
> University of Illinois
> 217.333.0382
> http://webservices.uiuc.edu

-- Mike Benoit <[EMAIL PROTECTED]>
Re: [PERFORM] Help optimize view
Oops. Realized I posted the wrong SQL and EXPLAIN ANALYZE results. Also forgot to mention that my "server" has 1.5 GB memory. SELECT "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", avg("ParameterValues"."ParameterValue") AS "Mottle_NMF" FROM "AnalysisModules" JOIN ("tblColors" JOIN ("tblTPNamesAndColors" JOIN "PrintSamples" ON "tblTPNamesAndColors"."TestPatternName"::text = "PrintSamples"."TestPatternName"::text JOIN ("DigitalImages" JOIN "PrintSampleAnalyses" ON "DigitalImages"."ImageID" = "PrintSampleAnalyses"."ImageID" JOIN ("ParameterNames" JOIN ("Measurements" JOIN "ParameterValues" ON "Measurements"."MeasurementID" = "ParameterValues"."MeasurementID") ON "ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") ON "PrintSampleAnalyses"."psaID" = "Measurements"."psaID") ON "PrintSamples"."PrintSampleID" = "DigitalImages"."PrintSampleID") ON "tblColors"."ColorID" = "tblTPNamesAndColors"."ColorID") ON "AnalysisModules"."MetricID" = "Measurements"."MetricID" GROUP BY "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", "AnalysisModules"."AnalysisModuleName", "ParameterNames"."ParameterName", "PrintSamples"."TestPatternName" HAVING "AnalysisModules"."AnalysisModuleName"::text = 'NMF'::text AND "ParameterNames"."ParameterName"::text = 'NMF'::text AND "tblColors"."ColorID" <> 3 AND "PrintSamples"."TestPatternName"::text ~~ 'IQAF-TP8%'::text; QUERY PLAN HashAggregate (cost=519801.96..519898.00 rows=7683 width=70) (actual time=106219.710..106249.456 rows=14853 loops=1) -> Hash Join (cost=286101.76..519667.51 rows=7683 width=70) (actual time=50466.513..106111.635 rows=15123 loops=1) Hash Cond: ("Measurements"."MetricID" = "AnalysisModules"."MetricID") -> Hash Join (cost=286099.98..519260.45 rows=87588 width=61) (actual time=50466.417..106055.182 rows=15123 loops=1) Hash Cond: ("ParameterValues"."MeasurementID" = "Measurements"."MeasurementID") -> Nested Loop (cost=8054.81..238636.75 rows=454040 width=21) (actual time=143.017..55178.583 rows=289724 loops=1) -> Seq Scan on "ParameterNames" (cost=0.00..1.94 rows=1 width=17) (actual time=0.012..0.027 rows=1 loops=1) Filter: (("ParameterName")::text = 'NMF'::text) -> Bitmap Heap Scan on "ParameterValues" (cost=8054.81..231033.70 rows=608089 width=12) (actual time=142.986..54432.650 rows=289724 loops=1) Recheck Cond: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") -> Bitmap Index Scan on "PVParameterID_idx" (cost=0.00..7902.79 rows=608089 width=0) (actual time=109.178..109.178 rows=289724 loops=1) Index Cond: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") -> Hash (cost=259861.12..259861.12 rows=1454724 width=48) (actual time=50306.950..50306.950 rows=961097 loops=1) -> Hash Join (cost=8139.75..259861.12 rows=1454724 width=48) (actual time=971.910..48649.190 rows=961097 loops=1) Hash Cond: ("Measurements"."psaID" = "PrintSampleAnalyses"."psaID") -> Seq Scan on "Measurements" (cost=0.00..199469.09 rows=7541009 width=12) (actual time=0.047..35628.599 rows=7539838 loops=1) -> Hash (cost=7949.67..7949.67 rows=15206 width=44) (actual time=971.734..971.734 rows=18901 loops=1) -> Hash Join (cost=5069.24..7949.67 rows=15206 width=44) (actual time=590.003..938.744 rows=18901 loops=1) Hash Cond: ("PrintSampleAnalyses"."ImageID" = "DigitalImages"."ImageID") -> Seq Scan on "PrintSampleAnalyses" (cost=0.00..2334.25 rows=78825 width=8) (actual time=0.021..130.335 rows=78859 loops=1) -> Hash (cost=4879.10..4879.10 rows=15211 width=44) (actual 
time=589.940..589.940 rows=18901 loops=1) -> Hash Join (cost=2220.11..4879.10 rows=15211 width=44) (actual time=168.307..557.675 rows=18901 loops=1) Hash Cond: ("DigitalImages"."PrintSampleID" = "PrintSamples"."PrintSampleID") -> Seq Scan on "DigitalImages" (cost=0.00..1915.50 rows=78850 width=8) (actual time=16.126..194.911 rows=78859 loops=1) -> Hash (cost=2029.98..2029.98 rows=15211 width=44) (actual time=152.128..152.128 rows=18645 loops=1) -> Hash Join (cost=564.39..2029.98 rows=15211 width=44) (actual time=13.951..121.903 rows=18645 loops=1) Hash Cond: (("PrintSamples"."TestPatternName")::text = ("tblTPNamesAndColors"."TestPatternName")::text) -> Bitmap Heap Scan on "Print
Re: [PERFORM] Help optimize view
> -Original Message- > From: Tom Lane [mailto:[EMAIL PROTECTED] > Sent: Friday, August 10, 2007 5:44 PM > To: Relyea, Mike > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Help optimize view > > Try increasing join_collapse_limit --- you have just enough > tables here that the planner isn't going to consider all > possible join orders. > And it sorta looks like it's picking a bad one. > > regards, tom lane > I tried increasing join_collapse_limit with no significant change in run time although a different plan was chosen. I've included a re-send of my original post, it looks like it didn't go through - it's not in the archives. I've also included an explain analyze before and after the join_collapse_limit change. I'm have the following view as part of a larger, aggregate query that is running slower than I'd like. There are 4 views total, each very similar to this one. Each of the views is then left joined with data from some other tables to give me the final result that I'm looking for. I'm hoping that if I can get some insight in to how to make this view execute faster, I can apply that learning to the other 3 views and thereby decrease the run time for my aggregate query. I'm running 8.2.4 on Windows XP with a single 10K rpm disk dedicated to the data directory and 1.5 GB memory. shared_buffers = 12288 work_mem = 262144 maintenance_work_mem = 131072 max_fsm_pages = 204800 random_page_cost = 2.0 effective_cache_size = 1 autovacuum = on EXPLAIN ANALYZE SELECT "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", avg("ParameterValues"."ParameterValue") AS "Mottle_NMF" FROM "AnalysisModules" JOIN ("tblColors" JOIN ("tblTPNamesAndColors" JOIN "PrintSamples" ON "tblTPNamesAndColors"."TestPatternName"::text = "PrintSamples"."TestPatternName"::text JOIN ("DigitalImages" JOIN "PrintSampleAnalyses" ON "DigitalImages"."ImageID" = "PrintSampleAnalyses"."ImageID" JOIN ("ParameterNames" JOIN ("Measurements" JOIN "ParameterValues" ON "Measurements"."MeasurementID" = "ParameterValues"."MeasurementID") ON "ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") ON "PrintSampleAnalyses"."psaID" = "Measurements"."psaID") ON "PrintSamples"."PrintSampleID" = "DigitalImages"."PrintSampleID") ON "tblColors"."ColorID" = "tblTPNamesAndColors"."ColorID") ON "AnalysisModules"."MetricID" = "Measurements"."MetricID" GROUP BY "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", "AnalysisModules"."AnalysisModuleName", "ParameterNames"."ParameterName", "PrintSamples"."TestPatternName" HAVING "AnalysisModules"."AnalysisModuleName"::text = 'NMF'::text AND "ParameterNames"."ParameterName"::text = 'NMF'::text AND "tblColors"."ColorID" <> 3 AND "PrintSamples"."TestPatternName"::text ~~ 'IQAF-TP8%'::text; HashAggregate (cost=519801.96..519898.00 rows=7683 width=70) (actual time=121101.027..121146.385 rows=14853 loops=1) -> Hash Join (cost=286101.76..519667.51 rows=7683 width=70) (actual time=52752.600..120989.713 rows=15123 loops=1) Hash Cond: ("Measurements"."MetricID" = "AnalysisModules"."MetricID") -> Hash Join (cost=286099.98..519260.45 rows=87588 width=61) (actual time=52752.502..120933.784 rows=15123 loops=1) Hash Cond: ("ParameterValues"."MeasurementID" = "Measurements"."MeasurementID") -> Nested Loop (cost=8054.81..238636.75 rows=454040 width=21) (actual time=165.510..67811.086 rows=289724 loops=1) -> Seq Scan on "ParameterNames" (cost=0.00..1.94 rows=1 width=17) (actual time=0.012..0.026 rows=1 loops=1) Filter: (("ParameterName")::text = 
'NMF'::text) -> Bitmap Heap Scan on "ParameterValues" (cost=8054.81..231033.70 rows=608089 width=12) (actual time=165.481..67094.656 rows=289724 loops=1) Recheck Cond: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID")
Re: [PERFORM] Help optimize view
> >>> On Mon, Aug 13, 2007 at 10:35 AM, in message > <[EMAIL PROTECTED] > .net>, "Relyea, Mike" <[EMAIL PROTECTED]> wrote: > > I'm running 8.2.4 on Windows XP with 1.5 GB memory. > > shared_buffers = 12288 > > effective_cache_size = 1 > > For starters, you might want to adjust one or both of these. > It looks to me like you're telling it that it only has 78.125 > MB cache space. That will make it tend to want to scan > entire tables, on the assumption that the cache hit ratio > will be poor for random reads. > > Since you're on 8.2.4, you can use units of measure to help > make this easier to read. You could, for example, say: > > shared_buffers = 96MB > effective_cache_size = 1200MB > > -Kevin I've increased shared_buffers to 128MB, and restarted the server. My total run time didn't really change. SELECT set_config('effective_cache_size', '1000MB', false); I have another app that uses about 500MB. SELECT set_config('join_collapse_limit', '20', false); explain analyze SELECT "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", avg("ParameterValues"."ParameterValue") AS "Mottle_NMF" FROM "AnalysisModules" JOIN ("tblColors" JOIN ("tblTPNamesAndColors" JOIN "PrintSamples" ON "tblTPNamesAndColors"."TestPatternName"::text = "PrintSamples"."TestPatternName"::text JOIN ("DigitalImages" JOIN "PrintSampleAnalyses" ON "DigitalImages"."ImageID" = "PrintSampleAnalyses"."ImageID" JOIN ("ParameterNames" JOIN ("Measurements" JOIN "ParameterValues" ON "Measurements"."MeasurementID" = "ParameterValues"."MeasurementID") ON "ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") ON "PrintSampleAnalyses"."psaID" = "Measurements"."psaID") ON "PrintSamples"."PrintSampleID" = "DigitalImages"."PrintSampleID") ON "tblColors"."ColorID" = "tblTPNamesAndColors"."ColorID") ON "AnalysisModules"."MetricID" = "Measurements"."MetricID" GROUP BY "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", "AnalysisModules"."AnalysisModuleName", "ParameterNames"."ParameterName", "PrintSamples"."TestPatternName" HAVING "AnalysisModules"."AnalysisModuleName"::text = 'NMF'::text AND "ParameterNames"."ParameterName"::text = 'NMF'::text AND "tblColors"."ColorID" <> 3 AND "PrintSamples"."TestPatternName"::text ~~ 'IQAF-TP8%'::text; HashAggregate (cost=489274.71..489372.94 rows=7858 width=70) (actual time=117632.844..117663.228 rows=14853 loops=1) -> Hash Join (cost=256774.03..489137.20 rows=7858 width=70) (actual time=50297.022..117530.665 rows=15123 loops=1) Hash Cond: ("ParameterValues"."MeasurementID" = "Measurements"."MeasurementID") -> Nested Loop (cost=8054.81..238636.75 rows=454040 width=21) (actual time=172.341..66959.288 rows=289724 loops=1) -> Seq Scan on "ParameterNames" (cost=0.00..1.94 rows=1 width=17) (actual time=0.020..0.034 rows=1 loops=1) Filter: (("ParameterName")::text = 'NMF'::text) -> Bitmap Heap Scan on "ParameterValues" (cost=8054.81..231033.70 rows=608089 width=12) (actual time=172.297..66241.380 rows=289724 loops=1) Recheck Cond: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") -> Bitmap Index Scan on "PVParameterID_idx" (cost=0.00..7902.79 rows=608089 width=0) (actual time=147.690..147.690 rows=289724 loops=1) Index Cond: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") -> Hash (cost=247087.84..247087.84 rows=130510 width=57) (actual time=50109.022..50109.022 rows=15123 loops=1) -> Hash Join (cost=8141.52..247087.84 rows=130510 width=57) (actual time=11095.022..50057.777 rows=15123 loops=1) Hash Cond: 
("Measurements"."psaID" = "PrintSampleAnalyses"."psaID") -> Hash Join (cost=1.77..234364.57 rows=661492 width=21) (actual time=31.457..48123.380 rows=289724 loop
Re: [PERFORM] Help optimize view
> >>> On Mon, Aug 13, 2007 at 1:48 PM, in message > <[EMAIL PROTECTED] > .net>, "Relyea, Mike" <[EMAIL PROTECTED]> wrote: > > I've increased shared_buffers to 128MB, and restarted the > server. My > > total run time didn't really change. > > Please forgive me if this guess doesn't help either, but > could you try eliminating the GROUP BY options which don't > echo values in the select value list, and move the HAVING > conditions to a WHERE clause? Something like: > > explain analyze > SELECT > "PrintSamples"."MachineID", > "PrintSamples"."PrintCopyID", > "tblColors"."ColorID", > avg("ParameterValues"."ParameterValue") AS "Mottle_NMF" > FROM "AnalysisModules" > JOIN > ( > "tblColors" > JOIN > ( > "tblTPNamesAndColors" > JOIN "PrintSamples" > ON ("tblTPNamesAndColors"."TestPatternName"::text = > "PrintSamples"."TestPatternName"::text) > JOIN > ( > "DigitalImages" > JOIN "PrintSampleAnalyses" > ON ("DigitalImages"."ImageID" = > "PrintSampleAnalyses"."ImageID") > JOIN > ( > "ParameterNames" > JOIN > ( > "Measurements" > JOIN "ParameterValues" > ON "Measurements"."MeasurementID" = > "ParameterValues"."MeasurementID" > ) ON "ParameterNames"."ParameterID" = > "ParameterValues"."ParameterID" > ) ON "PrintSampleAnalyses"."psaID" = "Measurements"."psaID" > ) ON "PrintSamples"."PrintSampleID" = > "DigitalImages"."PrintSampleID" > ) ON "tblColors"."ColorID" = "tblTPNamesAndColors"."ColorID" > ) ON "AnalysisModules"."MetricID" = "Measurements"."MetricID" > WHERE "AnalysisModules"."AnalysisModuleName"::text = 'NMF'::text > AND "ParameterNames"."ParameterName"::text = 'NMF'::text > AND "PrintSamples"."TestPatternName"::text ~~ 'IQAF-TP8%'::text > AND "tblColors"."ColorID" <> 3 > GROUP BY > "PrintSamples"."MachineID", > "PrintSamples"."PrintCopyID", > "tblColors"."ColorID" > ; > > I'd also be inclined to simplify the FROM clause by > eliminating the parentheses and putting the ON conditions > closer to where they are used, but that would be more for > readability than any expectation that it would affect the plan. > > -Kevin Thanks for your help. Re-writing the view like this maybe bought me something. I've pasted the explain analyze results below. Tough to tell because I also increased some of the statistics. From what Tom says, it sounds like if I want the data returned faster I'm likely to have to get beefier hardware. ALTER TABLE "ParameterValues" ALTER "MeasurementID" SET STATISTICS 500; ALTER TABLE "ParameterValues" ALTER "ParameterID" SET STATISTICS 500; ANALYZE "ParameterValues"; ALTER TABLE "Measurements" ALTER COLUMN "MetricID" SET STATISTICS 500; ALTER TABLE "Measurements" ALTER COLUMN "psaID" SET STATISTICS 500; ANALYZE "Measurements"; Running the above SQL: HashAggregate (cost=461541.53..461634.88 rows=7468 width=16) (actual time=110002.041..110024.777 rows=14853 loops=1) -> Hash Join (cost=230789.57..461464.70 rows=7683 width=16) (actual time=56847.814..109936.722 rows=15123 loops=1) Hash Cond: ("Measurements"."MetricID" = "AnalysisModules"."MetricID") -> Hash Join (cost=230787.80..461057.64 rows=87588 width=20) (actual time=56847.697..109884.122 rows=15123 loops=1) Hash Cond: ("ParameterValues"."MeasurementID" = "Measurements"."MeasurementID") -> Nested Loop (cost=6353.15..234044.47 rows=454038 width=8) (actual time=179.154..52780.680 rows=289724 loops=1) -> Seq Scan on "ParameterNames" (cost=0.00..1.94 rows=1 width=4) (actual time=0.012..0.027 rows=1 loops=1) Filter: (("ParameterName")::text = 'NMF'::text) -> Bitmap Heap Scan on
[PERFORM] Help optimize view
I have the following view as part of a larger, aggregate query that is running slower than I'd like. There are 4 views total, each very similar to this one. Each of the views is then left joined with data from some other tables to give me the final result that I'm looking for. I'm hoping that if I can get some insight into how to make this view execute faster, I can apply that learning to the other 3 views and thereby decrease the run time for my aggregate query. I'm running 8.2.4 on Windows XP with a single 10K rpm disk dedicated to the data directory.

shared_buffers = 12288
work_mem = 262144
maintenance_work_mem = 131072
max_fsm_pages = 204800
random_page_cost = 2.0
effective_cache_size = 1
autovacuum = on

SELECT "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID",
       avg("ParameterValues"."ParameterValue") AS "Mottle_NMF"
  FROM "PrintSamples", "DigitalImages", "PrintSampleAnalyses", "Measurements",
       "ParameterValues", "tblTPNamesAndColors", "tblColors", "AnalysisModules", "ParameterNames"
 WHERE "DigitalImages"."ImageID" = "PrintSampleAnalyses"."ImageID"
   AND "PrintSamples"."PrintSampleID" = "DigitalImages"."PrintSampleID"
   AND "PrintSampleAnalyses"."psaID" = "Measurements"."psaID"
   AND "Measurements"."MeasurementID" = "ParameterValues"."MeasurementID"
   AND "AnalysisModules"."MetricID" = "Measurements"."MetricID"
   AND "ParameterNames"."ParameterID" = "ParameterValues"."ParameterID"
   AND "tblTPNamesAndColors"."TestPatternName" = "PrintSamples"."TestPatternName"
   AND "tblColors"."ColorID" = "tblTPNamesAndColors"."ColorID"
 GROUP BY "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID",
          "AnalysisModules"."AnalysisModuleName", "ParameterNames"."ParameterName",
          "PrintSamples"."TestPatternName"
HAVING "PrintSamples"."MachineID" = 4741
    OR "PrintSamples"."MachineID" = 4745
   AND "AnalysisModules"."AnalysisModuleName" = 'NMF'
   AND "ParameterNames"."ParameterName" = 'NMF'
   AND "tblColors"."ColorID" <> 3
   AND "PrintSamples"."TestPatternName" LIKE 'IQAF-TP8%';

EXPLAIN ANALYZE

HashAggregate  (cost=6069.71..6069.82 rows=9 width=70) (actual time=3230.868..3230.923 rows=31 loops=1)
  ->  Nested Loop  (cost=1.77..6069.55 rows=9 width=70) (actual time=367.959..3230.476 rows=31 loops=1)
        Join Filter: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID")
        ->  Seq Scan on "ParameterNames"  (cost=0.00..1.94 rows=1 width=17) (actual time=0.020..0.032 rows=1 loops=1)
              Filter: (("ParameterName")::text = 'NMF'::text)
        ->  Nested Loop  (cost=1.77..6059.09 rows=682 width=61) (actual time=367.905..3230.154 rows=124 loops=1)
              ->  Hash Join  (cost=1.77..2889.96 rows=151 width=57) (actual time=119.748..1447.130 rows=31 loops=1)
                    Hash Cond: ("Measurements"."MetricID" = "AnalysisModules"."MetricID")
                    ->  Nested Loop  (cost=0.00..2880.22 rows=1722 width=48) (actual time=55.278..1444.801 rows=1656 loops=1)
                          ->  Nested Loop  (cost=0.00..226.25 rows=18 width=44) (actual time=10.080..13.951 rows=31 loops=1)
                                ->  Nested Loop  (cost=0.00..151.33 rows=18 width=44) (actual time=5.030..8.266 rows=31 loops=1)
                                      ->  Nested Loop  (cost=0.00..74.21 rows=18 width=44) (actual time=2.253..4.822 rows=31 loops=1)
                                            Join Filter: ("tblColors"."ColorID" = "tblTPNamesAndColors"."ColorID")
                                            ->  Nested Loop  (cost=0.00..48.11 rows=24 width=44) (actual time=2.232..3.619 rows=43 loops=1)
                                                  ->  Index Scan using "PSMachineID_idx" on "PrintSamples"  (cost=0.00..7.99 rows=29 width=40) (actual time=2.204..2.515 rows=43 loops=1)
                                                        Index Cond: ("MachineID" = 4741)
                                                        Filter: (("TestPatternName")::text ~~ 'IQAF-TP8%'::text)
                                                  ->  Index Scan using "TPNTestPatternName" on "tblTPNamesAndColors"  (cost=0.00..1.37 rows=1 width=30) (actual time=0.011..0.015 rows=1 loops=43)
                                                        Index Cond: (("tblTPNamesAndColors"."TestPatternName")::text = ("PrintSamples"."TestPatternName")::text)
                                            ->  Seq Scan on "tblColors"  (cost=0.00..1.05 rows=3 width=4) (actual time=0.004..0.010 rows=3 loops=43)
                                                  Filter: ("ColorID" <> 3)
                                      ->  Index Scan using "DIPrintSampleID_idx" on "DigitalImages"  (cost=0.00..4.27 rows=1 width=8) (actual time=0.100..0.102 rows=1 loops=31)
                                            Index Cond: ("PrintSamples"."PrintSampleID" = "DigitalImages"."PrintSampleID")
                                ->  Index Scan using "PSAImageID_idx" on "PrintSampleAnalyses"  (cost=0.00..4.15 rows=1 width=8) (actual time=0.171..0.174 rows=1 loops=31)
                                      Index Cond: ("DigitalImages"."ImageID" = "PrintSampleAnalyses"."ImageID")
                          ->  Index Scan using "MpsaID_idx" on "Measurements"  (cost=0.00..120.33 rows=2169 width=12) (actual time=19.381..46.0
Re: [PERFORM] Help optimize view
> From: Kevin Grittner [mailto:[EMAIL PROTECTED]
>
> First off, let's make sure we're optimizing the query you
> really want to run.
> AND binds tighter than OR, so as you have it written, it is
> the same as:
>
> HAVING "PrintSamples"."MachineID" = 4741
> OR ( "PrintSamples"."MachineID" = 4745
> AND "AnalysisModules"."AnalysisModuleName" = 'NMF'
> AND "ParameterNames"."ParameterName" = 'NMF'
> AND "tblColors"."ColorID" <> 3
> AND "PrintSamples"."TestPatternName" LIKE 'IQAF-TP8%'
> )
>
> I fear you may really want it to evaluate to:
>
> HAVING ("PrintSamples"."MachineID" = 4741 OR
> "PrintSamples"."MachineID" = 4745)
> AND "AnalysisModules"."AnalysisModuleName" = 'NMF'
> AND "ParameterNames"."ParameterName" = 'NMF'
> AND "tblColors"."ColorID" <> 3
> AND "PrintSamples"."TestPatternName" LIKE 'IQAF-TP8%';

The query I really want to run is several times larger than this. I didn't think people would want to wade through pages and pages worth of SQL and then explain analyze results - especially when I'm fairly certain that optimizing this smaller part of the overall aggregate query would provide me the help I was looking for. You're right about what I really want the query to evaluate to. I'll give your suggestion a try.

Thanks.
Mike

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
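For reference, a sketch of the posted query with Kevin's grouping applied and the non-aggregate filters moved from HAVING into WHERE; PostgreSQL generally treats non-aggregate HAVING conditions as WHERE conditions anyway, but writing them there makes the intent explicit. The table and column names come from the original post, and whether this actually improves the plan is something to verify with EXPLAIN ANALYZE:

SELECT "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID",
       avg("ParameterValues"."ParameterValue") AS "Mottle_NMF"
  FROM "PrintSamples", "DigitalImages", "PrintSampleAnalyses", "Measurements",
       "ParameterValues", "tblTPNamesAndColors", "tblColors", "AnalysisModules", "ParameterNames"
 WHERE "DigitalImages"."ImageID" = "PrintSampleAnalyses"."ImageID"
   AND "PrintSamples"."PrintSampleID" = "DigitalImages"."PrintSampleID"
   AND "PrintSampleAnalyses"."psaID" = "Measurements"."psaID"
   AND "Measurements"."MeasurementID" = "ParameterValues"."MeasurementID"
   AND "AnalysisModules"."MetricID" = "Measurements"."MetricID"
   AND "ParameterNames"."ParameterID" = "ParameterValues"."ParameterID"
   AND "tblTPNamesAndColors"."TestPatternName" = "PrintSamples"."TestPatternName"
   AND "tblColors"."ColorID" = "tblTPNamesAndColors"."ColorID"
   -- filters moved out of HAVING; note the explicit parentheses around the OR
   AND ("PrintSamples"."MachineID" = 4741 OR "PrintSamples"."MachineID" = 4745)
   AND "AnalysisModules"."AnalysisModuleName" = 'NMF'
   AND "ParameterNames"."ParameterName" = 'NMF'
   AND "tblColors"."ColorID" <> 3
   AND "PrintSamples"."TestPatternName" LIKE 'IQAF-TP8%'
 GROUP BY "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID",
          "AnalysisModules"."AnalysisModuleName", "ParameterNames"."ParameterName",
          "PrintSamples"."TestPatternName";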
[PERFORM] plan question - query with order by and limit not choosing index depends on size of limit, table
Hello performance, I need help explaining the performance of a particular query:

select * from messages where ((messages.topic = E'/x') AND (messages.processed = 'f')) ORDER BY messages.created_at ASC limit 10;

Table Structure:

   Column   |            Type             |                             Modifiers
------------+-----------------------------+---------------------------------------------------------------------
 id         | integer                     | not null default nextval('landing_page.messages_id_seq'::regclass)
 processed  | boolean                     |
 topic      | character varying(255)      |
 body       | text                        |
 created_at | timestamp without time zone |
 updated_at | timestamp without time zone |
Indexes:
    "messages_pkey" PRIMARY KEY, btree (id)
    "idx_landing_page_messages_created_at" btree (created_at)
    "idx_messages_topic_processed" btree (topic, processed)

Table row count ~ 1 million.

When I run the query with limit 10 it skips the idx_messages_topic_processed index. When I run the query with no limit, or with a limit above 20, it uses the desired index. On a different system with a much smaller data set (~200,000 rows) I have to use a limit of about 35 to get the desired index.

This is the good plan with no limit or the 'sweet spot' limit:

Limit  (cost=2050.29..2050.38 rows=35 width=1266)
  ->  Sort  (cost=2050.29..2052.13 rows=737 width=1266)
        Sort Key: created_at
        ->  Bitmap Heap Scan on messages  (cost=25.86..2027.70 rows=737 width=1266)
              Recheck Cond: ((topic)::text = 'x'::text)
              Filter: (NOT processed)
              ->  Bitmap Index Scan on idx_messages_topic_processed  (cost=0.00..25.68 rows=737 width=0)
                    Index Cond: (((topic)::text = '/x'::text) AND (processed = false))

This is the bad plan with limit 10:

Limit  (cost=0.00..1844.07 rows=30 width=1266)
  ->  Index Scan using idx_landing_page_messages_created_at on messages  (cost=0.00..45302.70 rows=737 width=1266)
        Filter: ((NOT processed) AND ((topic)::text = 'x'::text))

Not sure if cost has anything to do with it, but this is set in postgresql.conf. I am hesitant to change this as I have inherited the database from a previous dba and don't want to adversely affect things that caused this to be set in a non-default manner if possible.

#seq_page_cost = 1.0      # measured on an arbitrary scale
random_page_cost = 3.0    # same scale as above

Why does the smaller limit cause it to skip the index? Is there a way to help the planner choose the better plan?

Much appreciated,
Mike
Re: [PERFORM] plan question - query with order by and limit not choosing index depends on size of limit, table
Thanks for the assistance. Here is an explain analyze of the query with the problem limit:

production=# explain analyze select * from landing_page.messages where ((messages.topic = E'x') AND (messages.processed = 'f')) ORDER BY messages.created_at ASC limit 10;

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2891.06 rows=10 width=1340) (actual time=207922.586..207922.586 rows=0 loops=1)
   ->  Index Scan using idx_landing_page_messages_created_at on messages  (cost=0.00..449560.48 rows=1555 width=1340) (actual time=207922.581..207922.581 rows=0 loops=1)
         Filter: ((NOT processed) AND ((topic)::text = 'x'::text))
 Total runtime: 207949.413 ms
(4 rows)

and an explain analyze with a higher limit that hits the index:

production=# explain analyze select * from landing_page.messages where ((messages.topic = E'x') AND (messages.processed = 'f')) ORDER BY messages.created_at ASC limit 25;

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5885.47..5885.54 rows=25 width=1340) (actual time=80.931..80.931 rows=0 loops=1)
   ->  Sort  (cost=5885.47..5889.36 rows=1555 width=1340) (actual time=80.926..80.926 rows=0 loops=1)
         Sort Key: created_at
         Sort Method:  quicksort  Memory: 17kB
         ->  Bitmap Heap Scan on messages  (cost=60.45..5841.59 rows=1555 width=1340) (actual time=64.404..64.404 rows=0 loops=1)
               Recheck Cond: ((topic)::text = 'x'::text)
               Filter: (NOT processed)
               ->  Bitmap Index Scan on idx_messages_topic_processed  (cost=0.00..60.06 rows=1550 width=0) (actual time=56.207..56.207 rows=0 loops=1)
                     Index Cond: (((topic)::text = 'x'::text) AND (processed = false))
 Total runtime: 88.051 ms
(10 rows)

overrides in postgresql.conf:

shared_buffers = 256MB
work_mem = 8MB
max_fsm_pages = 200
max_fsm_relations = 2000
checkpoint_segments = 10
archive_mode = on
random_page_cost = 3.0
effective_cache_size = 6GB
default_statistics_target = 250
logging_collector = on

Forgot to mention this is Postgres 8.3.8 with 6GB memory on the server. When you ask how big is the active portion of the database I am not sure how to answer. The whole database server is about 140GB, but there are other applications that use this database; this particular table is about 1.6GB and growing. Currently there are jobs that query from this table every minute.

Thanks again
Mike

On Wed, Jan 5, 2011 at 5:10 PM, Kevin Grittner wrote:
> Mike Broers wrote:
>
> > Hello performance, I need help explaining the performance of a
> > particular query
>
> You provided some of the information needed, but you should review
> this page and post a bit more:
>
> http://wiki.postgresql.org/wiki/SlowQueryQuestions
>
> In particular, post the result of EXPLAIN ANALYZE, not just EXPLAIN.
> Also, showing all overrides in your postgresql.conf file is
> important, and some information about your hardware. How big is the
> active portion of your database (the frequently read portion)?
>
> > Why does the smaller limit cause it to skip the index?
>
> Because the optimizer thinks the query will return rows sooner that
> way.
>
> > Is there a way to help the planner choose the better plan?
>
> You might get there by adjusting your memory settings and/or costing
> settings, but we need to see more information to know that.
>
> -Kevin
Re: [PERFORM] plan question - query with order by and limit not choosing index depends on size of limit, table
Thanks for the suggestion, created_at is a timestamp without time zone type column. When I add +0 to created at I get a cast error. I am able to get the query to use the desired index when increasing or removing the limit, and I am still looking for the reason why that is happening. Any advice or more information I can supply please let me know. ERROR: operator does not exist: timestamp without time zone + integer LINE 1: ...es.processed = 'f')) ORDER BY messages.created_at+0 ASC lim... ^ HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts. From: "pasman pasmański" To: pgsql-performance@postgresql.org Date: Fri, 7 Jan 2011 15:00:22 +0100 Subject: Re: plan question - query with order by and limit not choosing index depends on size of limit, table Try order by created_at+0 On Thu, Jan 6, 2011 at 3:36 PM, Mike Broers wrote: > Thanks for the assistance. > > Here is an explain analyze of the query with the problem limit: > > production=# explain analyze select * from landing_page.messages where > ((messages.topic = E'x') AND (messages.processed = 'f')) ORDER BY > messages.created_at ASC limit 10; > > > QUERY PLAN > > > -- > > Limit (cost=0.00..2891.06 rows=10 width=1340) (actual > time=207922.586..207922.586 rows=0 loops=1) >-> Index Scan using idx_landing_page_messages_created_at on messages > (cost=0.00..449560.48 rows=1555 widt > h=1340) (actual time=207922.581..207922.581 rows=0 loops=1) > Filter: ((NOT processed) AND ((topic)::text = 'x'::text)) > Total runtime: 207949.413 ms > (4 rows) > > > and an explain analyze with a higher limit that hits the index: > > > production=# explain analyze select * from landing_page.messages where > ((messages.topic = E'x') AND (messages.processed = 'f')) ORDER BY > messages.created_at ASC limit 25; > QUERY > PLAN > > > -- > - > Limit (cost=5885.47..5885.54 rows=25 width=1340) (actual > time=80.931..80.931 rows=0 loops=1) >-> Sort (cost=5885.47..5889.36 rows=1555 width=1340) (actual > time=80.926..80.926 rows=0 loops=1) > Sort Key: created_at > Sort Method: quicksort Memory: 17kB > -> Bitmap Heap Scan on messages (cost=60.45..5841.59 rows=1555 > width=1340) (actual time=64.404..64. > 404 rows=0 loops=1) >Recheck Cond: ((topic)::text = 'x'::text) >Filter: (NOT processed) >-> Bitmap Index Scan on idx_messages_topic_processed > (cost=0.00..60.06 rows=1550 width=0) (ac > tual time=56.207..56.207 rows=0 loops=1) > Index Cond: (((topic)::text = 'x'::text) AND (p > rocessed = false)) > Total runtime: 88.051 ms > (10 rows) > > > overrides in postgresql.conf > > shared_buffers = 256MB > work_mem = 8MB > max_fsm_pages = 200 > max_fsm_relations = 2000 > checkpoint_segments = 10 > archive_mode = on > random_page_cost = 3.0 > effective_cache_size = 6GB > default_statistics_target = 250 > logging_collector = on > > > Forgot to mention this is Postgres 8.3.8 with 6GB memory on the server. > > When you ask how big is the active portion of the database I am not sure > how to answer. The whole database server is about 140GB, but there are > other applications that use this database, this particular table is about > 1.6GB and growing. Currently there are jobs that query from this table > every minute. 
> > Thanks again > Mike > > > > > > > On Wed, Jan 5, 2011 at 5:10 PM, Kevin Grittner < > kevin.gritt...@wicourts.gov> wrote: > >> Mike Broers wrote: >> >> > Hello performance, I need help explaining the performance of a >> > particular query >> >> You provided some of the information needed, but you should review >> this page and post a bit more: >> >> http://wiki.postgresql.org/wiki/SlowQueryQuestions >> >> In particular, post the result of EXPLAIN ANALYZE, not just EXPLAIN. >> Also, showing all overrides in your postgresql.conf file is >> important, and some information about your hardware. How big is the >> active portion of your database (the frequently read portion)? >> >> > Why does the smaller limit cause it to skip the index? >> >> Because the optimizer thinks the query will return rows sooner that >> way. >> >> > Is there a way to help the planner choose the better plan? >> >> You might get there by adjusting your memory settings and/or costing >> settings, but we need to see more information to know that. >> >> -Kevin >> > >
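The cast fails because there is no timestamp-plus-integer operator. If the goal is pasman's trick of hiding the ORDER BY column from the index on created_at, a variant that does type-check is to add a zero-length interval, which keeps the sort order but no longer matches idx_landing_page_messages_created_at, so the planner cannot walk that index to satisfy the ORDER BY. A hedged sketch against the posted query, worth confirming with EXPLAIN ANALYZE:

select * from landing_page.messages
 where ((messages.topic = E'x') AND (messages.processed = 'f'))
 ORDER BY messages.created_at + interval '0 seconds' ASC
 limit 10;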
Re: [PERFORM] plan question - query with order by and limit not choosing index depends on size of limit, table
Thanks Robert, this is what I was looking for. I will try these suggestions and follow up if any of them are the silver bullet. On Fri, Jan 14, 2011 at 7:11 AM, Robert Haas wrote: > On Thu, Jan 6, 2011 at 4:36 PM, Mike Broers wrote: > > Thanks for the assistance. > > Here is an explain analyze of the query with the problem limit: > > production=# explain analyze select * from landing_page.messages where > > ((messages.topic = E'x') AND (messages.processed = 'f')) ORDER BY > > messages.created_at ASC limit 10; > > > >QUERY PLAN > > > > > -- > > > > Limit (cost=0.00..2891.06 rows=10 width=1340) (actual > > time=207922.586..207922.586 rows=0 loops=1) > >-> Index Scan using idx_landing_page_messages_created_at on messages > > (cost=0.00..449560.48 rows=1555 widt > > h=1340) (actual time=207922.581..207922.581 rows=0 loops=1) > > Filter: ((NOT processed) AND ((topic)::text = 'x'::text)) > > Total runtime: 207949.413 ms > > (4 rows) > > You're not the first person to have been bitten by this. The > optimizer thinks that rows WHERE NOT processed and topic = 'x' are > reasonably common, so it figures that it can just index scan until it > finds 10 of them. But when it turns out that there are none at all, > it ends up having to scan the entire index, which stinks big-time. > > The alternative plan is to use a different index to find ALL the > relevant rows, sort them, and then take the top 10. That would suck > if there actually were tons of rows like this, but there aren't. > > So the root of the problem, in some sense, is that the planner's > estimate of the selectivity of "NOT processed and topic = 'x'" is not > very good. Some things to try: > > - increase the statistics target for the "processed" and "topic" > columns even higher > - put the processed rows in one table and the not processed rows in > another table > - do something like SELECT * FROM (SELECT .. LIMIT 200 OFFSET 0) LIMIT > 10 to try to fool the planner into planning based on the higher, inner > limit > - create a partial index on messages (topic) WHERE NOT processed and > see if the planner will use it > > ...Robert >
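A sketch of what three of Robert's suggestions could look like for this table; the statistics target, the inner limit, and the new index name are placeholder values rather than tested settings (the "separate tables for processed and unprocessed rows" option is a schema change and is not shown):

-- increase the statistics target for the filtered columns, then re-analyze
ALTER TABLE landing_page.messages ALTER COLUMN processed SET STATISTICS 1000;
ALTER TABLE landing_page.messages ALTER COLUMN topic     SET STATISTICS 1000;
ANALYZE landing_page.messages;

-- fool the planner with a higher inner limit, keeping the real limit outside
SELECT *
  FROM (SELECT *
          FROM landing_page.messages
         WHERE topic = E'x' AND NOT processed
         ORDER BY created_at ASC
         LIMIT 200 OFFSET 0) AS unprocessed
 ORDER BY created_at ASC
 LIMIT 10;

-- partial index covering only the rows this query can ever return
CREATE INDEX idx_messages_topic_unprocessed
    ON landing_page.messages (topic)
 WHERE NOT processed;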
[PERFORM] Huge Data sets, simple queries
Does anyone have any experience with extremely large data sets? I mean hundreds of millions of rows. The queries I need to run on my 200 million transactions are relatively simple:

select month, count(distinct(cardnum)), count(*), sum(amount) from transactions group by month;

This query took 18 hours on PG 8.1 on a Dual Xeon, RHEL3, (2.4 Kernel) with RAID-10 (15K drives) and 12 GB Ram. I was expecting it to take about 4 hours - based on some experience with a similar dataset on a different machine (RH9, PG7.3 Dual Xeon, 4GB RAM, Raid-5 10K drives). This machine is COMPLETELY devoted to running these relatively simple queries one at a time. (No multi-user support needed!) I've been tooling with the various performance settings: effective_cache at 5GB, shared_buffers at 2 GB, workmem, sortmem at 1 GB each. (Shared buffers puzzles me a bit - my instinct says to set it as high as possible, but everything I read says that "too high" can hurt performance.) Any ideas for performance tweaking in this kind of application would be greatly appreciated. We've got indexes on the fields being grouped, and always vacuum analyze after building them. It's difficult to just "try" various ideas because each attempt takes a full day to test. Real experience is needed here! Thanks much, Mike ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
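One way to shorten the test cycle is to change the per-query memory settings for a single session instead of editing postgresql.conf between runs; a sketch, assuming the table and column names from the post (on 8.1 the per-sort knob is work_mem, in kilobytes, and effective_cache_size is counted in 8 kB pages):

SET work_mem = 1048576;             -- about 1 GB per sort/hash step, this session only
SET effective_cache_size = 655360;  -- about 5 GB expressed as 8 kB pages

EXPLAIN
SELECT month, count(distinct cardnum), count(*), sum(amount)
  FROM transactions
 GROUP BY month;

Plain EXPLAIN shows whether the plan changes without paying for the full run; EXPLAIN ANALYZE or the real query still costs the full execution time.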
Re: [PERFORM] Huge Data sets, simple queries
On 2/1/06, Luke Lonergan <[EMAIL PROTECTED]> wrote: [snip] > This is actually interesting overall - I think what this might be showing is > that the Linux SW RAID1 is alternating I/Os to the mirror disks from > different processes (LWP or HWP both maybe?), but not within one process. I can confirm this behavior after looking at my multipathed fibre channel SAN. To the best of my knowledge, the multipathing code uses the same underlying I/O code as the Linux SW RAID logic. -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 6: explain analyze is your friend
[PERFORM] The order of fields around the "=" in the WHERE conditions
version PostgreSQL 8.1.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.6 (1 row) -- The order of fields around the "=" in the WHERE conditions -- affects the query plan. I would rather not have to worry about -- that. It seems that it puts me back in the place of having to -- figure what join order is best. Here are two sql statements and -- the query plan that is generated for each. The worst of the two -- is first and the best one is second. -- Mike Quinn -- the worst way -- EXPLAIN ANALYZE SELECT Locts.id, Commtypes.name FROM Growers , Locts , Crops , Commtypes WHERE Growers.id = '0401606' AND -- Commtypes.number = Crops.Commtype Crops.Commtype = Commtypes.number AND Locts.number = Crops.Loct -- Crops.Loct = Locts.number AND Growers.number = Locts.Grower -- Locts.Grower = Growers.number ; QUERY PLAN - Nested Loop (cost=18934.81..647002.69 rows=1045 width=20) (actual time=525.267..4079.051 rows=69 loops=1) Join Filter: ("outer".commtype = "inner".number) -> Nested Loop (cost=18923.21..631988.31 rows=1310 width=18) (actual time=523.867..4036.005 rows=69 loops=1) Join Filter: ("inner".number = "outer".loct) -> Seq Scan on crops (cost=0.00..7599.46 rows=258746 width=24) (actual time=0.006..278.656 rows=258746 loops=1) -> Materialize (cost=18923.21..18924.25 rows=104 width=18) (actual time=0.001..0.007 rows=9 loops=258746) -> Nested Loop (cost=5503.02..18923.11 rows=104 width=18) (actual time=0.061..523.703 rows=9 loops=1) Join Filter: ("outer".number = "inner".grower) -> Index Scan using growers_id on growers (cost=0.00..3.05 rows=4 width=12) (actual time=0.016..0.024 rows=1 loops=1) Index Cond: ((id)::text = '0401606'::text) -> Materialize (cost=5503.02..7451.58 rows=112456 width=30) (actual time=0.007..433.970 rows=112456 loops=1) -> Seq Scan on locts (cost=0.00..4566.56 rows=112456 width=30) (actual time=0.003..176.771 rows=112456 loops=1) -> Materialize (cost=11.60..16.69 rows=509 width=26) (actual time=0.001..0.287 rows=509 loops=69) -> Seq Scan on commtypes (cost=0.00..11.09 rows=509 width=26) (actual time=0.021..0.672 rows=509 loops=1) Total runtime: 4081.766 ms (15 rows) -- the best way -- EXPLAIN ANALYZE SELECT Locts.id, Commtypes.name FROM Growers , Locts , Crops , Commtypes WHERE Growers.id = '0401606' AND Commtypes.number = Crops.Commtype -- Crops.Commtype = Commtypes.number AND -- Locts.number = Crops.Loct Crops.Loct = Locts.number AND -- Growers.number = Locts.Grower Locts.Grower = Growers.number ; QUERY PLAN Nested Loop (cost=0.00..11224.18 rows=1045 width=20) (actual time=0.259..1.172 rows=69 loops=1) -> Nested Loop (cost=0.00..5717.09 rows=1310 width=18) (actual time=0.205..0.466 rows=69 loops=1) -> Nested Loop (cost=0.00..31.90 rows=104 width=18) (actual time=0.141..0.171 rows=9 loops=1) -> Index Scan using growers_id on growers (cost=0.00..3.05 rows=4 width=12) (actual time=0.078..0.080 rows=1 loops=1) Index Cond: ((id)::text = '0401606'::text) -> Index Scan using locts_grower on locts (cost=0.00..6.15 rows=85 width=30) (actual time=0.058..0.070 rows=9 loops=1) Index Cond: (locts.grower = "outer".number) -> Index Scan using crops_loct on crops (cost=0.00..54.13 rows=43 width=24) (actual time=0.012..0.022 rows=8 loops=9) Index Cond: (crops.loct = "outer".number) -> Index Scan using commtypes_number_key on commtypes (cost=0.00..4.19 rows=1 width=26) (actual time=0.006..0.007 rows=1 loops=69) Index Cond: (commtypes.number = "outer".commtype) Total runtime: 1.308 ms (12 rows) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? 
http://www.postgresql.org/docs/faq
Re: [PERFORM] The order of fields around the "=" in the WHERE
The datatype of the join columns is a user defined type and there are no commutators defined. I will fix that and retest. Thanks for the insight. Mike Quinn ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
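For anyone hitting the same thing, the fix being described is to declare a COMMUTATOR when the equality operator for the user-defined type is created, so the planner knows it may rewrite "a = b" as "b = a" and consider indexes on either side of the join. A minimal sketch, where mytype and mytype_eq are hypothetical names standing in for the real type and its equality function:

CREATE OPERATOR = (
    LEFTARG    = mytype,
    RIGHTARG   = mytype,
    PROCEDURE  = mytype_eq,
    COMMUTATOR = =,          -- "x = y" may be flipped to "y = x"
    RESTRICT   = eqsel,
    JOIN       = eqjoinsel
);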
Re: [PERFORM] The order of fields around the "=" in the WHERE
version PostgreSQL 8.1.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.6 (1 row) -- After commutator added to operators of user defined type, -- the order of fields around the "=" in WHERE conditions -- no longer affect the query plan. -- previously the worst way -- EXPLAIN ANALYZE SELECT Locts.id, Commtypes.name FROM Growers , Locts , Crops , Commtypes WHERE Growers.id = '0401606' AND -- Commtypes.number = Crops.Commtype Crops.Commtype = Commtypes.number AND Locts.number = Crops.Loct -- Crops.Loct = Locts.number AND Growers.number = Locts.Grower -- Locts.Grower = Growers.number ; QUERY PLAN Nested Loop (cost=0.00..17467.00 rows=954 width=20) (actual time=0.253..1.155 rows=69 loops=1) -> Nested Loop (cost=0.00..12413.93 rows=1202 width=18) (actual time=0.191..0.472 rows=69 loops=1) -> Nested Loop (cost=0.00..32.51 rows=104 width=18) (actual time=0.142..0.171 rows=9 loops=1) -> Index Scan using growers_id on growers (cost=0.00..3.05 rows=4 width=12) (actual time=0.065..0.067 rows=1 loops=1) Index Cond: ((id)::text = '0401606'::text) -> Index Scan using locts_grower on locts (cost=0.00..6.23 rows=91 width=30) (actual time=0.070..0.085 rows=9 loops=1) Index Cond: ("outer".number = locts.grower) -> Index Scan using crops_loct on crops (cost=0.00..118.53 rows=42 width=24) (actual time=0.011..0.021 rows=8 loops=9) Index Cond: ("outer".number = crops.loct) -> Index Scan using commtypes_number_key on commtypes (cost=0.00..4.19 rows=1 width=26) (actual time=0.006..0.007 rows=1 loops=69) Index Cond: ("outer".commtype = commtypes.number) Total runtime: 1.299 ms (12 rows) -- previously the best way -- EXPLAIN ANALYZE SELECT Locts.id, Commtypes.name FROM Growers , Locts , Crops , Commtypes WHERE Growers.id = 0401606 AND Commtypes.number = Crops.Commtype -- Crops.Commtype = Commtypes.number AND -- Locts.number = Crops.Loct Crops.Loct = Locts.number AND -- Growers.number = Locts.Grower Locts.Grower = Growers.number ; QUERY PLAN Nested Loop (cost=0.00..17467.00 rows=954 width=20) (actual time=0.063..0.947 rows=69 loops=1) -> Nested Loop (cost=0.00..12413.93 rows=1202 width=18) (actual time=0.050..0.318 rows=69 loops=1) -> Nested Loop (cost=0.00..32.51 rows=104 width=18) (actual time=0.036..0.064 rows=9 loops=1) -> Index Scan using growers_id on growers (cost=0.00..3.05 rows=4 width=12) (actual time=0.018..0.020 rows=1 loops=1) Index Cond: ((id)::text = '0401606'::text) -> Index Scan using locts_grower on locts (cost=0.00..6.23 rows=91 width=30) (actual time=0.012..0.023 rows=9 loops=1) Index Cond: (locts.grower = "outer".number) -> Index Scan using crops_loct on crops (cost=0.00..118.53 rows=42 width=24) (actual time=0.007..0.018 rows=8 loops=9) Index Cond: (crops.loct = "outer".number) -> Index Scan using commtypes_number_key on commtypes (cost=0.00..4.19 rows=1 width=26) (actual time=0.005..0.006 rows=1 loops=69) Index Cond: (commtypes.number = "outer".commtype) Total runtime: 1.091 ms (12 rows) >>> "Mike Quinn" <[EMAIL PROTECTED]> 4/4/06 10:18:30 AM >>> The datatype of the join columns is a user defined type and there are no commutators defined. I will fix that and retest. Thanks for the insight. Mike Quinn ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PERFORM] High CPU with 7.4.1 after running for about 2 weeks
I've been running Postgresql 7.4.1 for a couple weeks after upgrading from 7.2. I noticed today that the postmaster had been using 99% of the dual CPUs (on a PowerEdge 2650) non-stop for the last couple days. I stopped all the clients, and it didn't abate---even with no connections---so I restarted the postmaster. Now everything is running smoothly again. Is there anything that might accumulate after two weeks that might cause postgresql to thrash? I'm running pg_autovacuum, so the database itself should be nice and clean. It isn't connections, because I restarted the clients a few times without success. I've been running a long time on 7.2 with essentially the same configuration (minus pg_autovacuum) without any problems Thanks for any help, -Mike ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] High CPU with 7.4.1 after running for about 2 weeks
>Since the postmaster is a single unthreaded process, it's quite >impossible for it to take up 100% of two CPUs. Could you be more >precise about which processes were eating CPU, and what they were >doing according to the available state data? (ps auxww and >pg_stat_activity can be helpful tools.) > > regards, tom lane I shut down all our clients (all java except one in perl), and pg_stat_activity showed that there was still one query active. That's a good table to know about! Anyway, it didn't end until I sent it a TERM signal. I assume this means there's a runaway query somewhere, which I'll have to hunt down. But if the client dies, doesn't postgresql normally terminate the query that that client initiated? Or do I need to set statement_timeout? (As for the 100% CPU, I was confused by the fact that I was getting two lines in "top" (on Linux) with 99% utilization---I assume with two runaway queries.) Thanks for your help! -Mike ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
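Two small follow-ups on the knobs mentioned above, with placeholder values: statement_timeout (in milliseconds) will abort any statement that runs too long, and pg_stat_activity shows what each backend is doing as long as stats_command_string is enabled.

-- per session; the same setting can go in postgresql.conf to apply everywhere
SET statement_timeout = 1800000;   -- 30 minutes

-- spot backends that are still busy
SELECT procpid, usename, current_query FROM pg_stat_activity;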
Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.
> Almost any cross dbms migration shows a drop in performance. The engine > effectively trains developers and administrators in what works and what > doesn't. The initial migration thus compares a tuned to an untuned version. I think it is also possible that Microsoft has more programmers working on tuning issues for SQL Server than PostgreSQL has working on the whole project. -- Mike Nolan ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Long running queries degrade performance
> We have a web app with a postgres backend. Most queries have subsecond > response times through the web even with high usage. Every once in awhile > someone will run either an ad-hoc query or some other long running db > process. Are you sure it is postgres where the delay is occurring? I ask this because I also have a web-based front end to postgres, and while most of the time the queries respond in about a second every now and then I see one that takes much longer, sometimes 10-15 seconds. I've seen this behavior on both my development system and on the production server. The same query a while later might respond quickly again. I'm not sure where to look for the delay, either, and it is intermittent enough that I'm not even sure what monitoring techniques to use. -- Mike Nolan ---(end of broadcast)--- TIP 8: explain analyze is your friend
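One low-overhead way to catch those intermittent slow statements, rather than trying to catch them in top, is the slow-query log available from 7.4 on; a sketch of the postgresql.conf setting, with the threshold as a placeholder:

log_min_duration_statement = 5000   # log any statement that runs longer than 5000 ms; -1 disables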
Re: [PERFORM] Long running queries degrade performance
> Fairly sure, when it is happening, postgres usually is taking up the top slots > for cpu usage as reported by top. Perhaps there is a better way to monitor > this? Given the intermittent nature of the problem and its relative brevity (5-10 seconds), I don't know whether top offers the granularity needed to locate the bottleneck. > likely you have a situation where something else is happening which blocks > the current thread. It happens on my development system, and I'm the only one on it. I know I've seen it on the production server, but I think it is a bit more common on the development server, though that may be a case of which system I spend the most time on. (Also, the production server is 1300 miles away with a DSL connection, so I may just be seeing network delays some of the time there.) > Both of these were triggered by users double clicking links in our > web app and were fixed by a better implementation. Perhaps something like > that is causing what you are seeing. My web app traps double-clicks in javascript and ignores all but the first one. That's because some of the users have mice that give double-clicks even when they only want one click. -- Mike Nolan ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Trigger & Function
> My problem is I defined the "before" and "after"
> fields in the audit table as TEXT and when I try to move NEW or OLD into
> these fields I get the error "NEW used in query that is not in a rule".

You're trying to insert record data into a text field, and that doesn't work. OLD and NEW can be used as either record identifiers (as in RETURN OLD) or column qualifiers (as in OLD.colname), but you can't intermingle them.

I don't think postgres (pl/pgsql) has row-to-variable and variable-to-row functions like serialize and unserialize; that's probably what you'd need. It would probably be necessary to write something like that in C, since at this point pl/perl cannot be used for trigger functions. I've not tried using pl/php yet; the announcement for it says it can be used for trigger functions.

My first thought is that even if there was a serialize/unserialize capability, you might be able to write something using it that creates the log entry, but not anything that allows you to query the log for specific column or row entries. It would probably require a MAJOR extension of SQL to add it to pg, as there would need to be qualifiers that can be mapped to specific tables and columns. Even if we had that, storing values coming from multiple tables into a single audit table would present huge challenges.

I've found only two ways to implement audit logs:

1. Have separate log tables that match the structure of the tables they are logging.

2. Write a trigger function that converts columns to something you can store in a common log table. (I've not found a way to do this without inserting one row for each column being logged, though.)
--
Mike Nolan

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
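To make option (1) concrete, here is a minimal sketch of a per-table audit setup: a log table mirroring the audited columns plus a trigger function that copies OLD or NEW into it. The accounts table and its columns are purely hypothetical, and the function body uses old-style quoting so it also works on pre-8.0 servers:

CREATE TABLE accounts (
    id      integer PRIMARY KEY,
    balance numeric NOT NULL
);

CREATE TABLE accounts_log (
    changed_at timestamp NOT NULL DEFAULT now(),
    operation  text      NOT NULL,
    id         integer,
    balance    numeric
);

CREATE OR REPLACE FUNCTION accounts_audit() RETURNS trigger AS '
BEGIN
    IF TG_OP = ''DELETE'' THEN
        -- deletes only have an OLD row to record
        INSERT INTO accounts_log (operation, id, balance)
            VALUES (TG_OP, OLD.id, OLD.balance);
        RETURN OLD;
    ELSE
        -- inserts and updates record the new state of the row
        INSERT INTO accounts_log (operation, id, balance)
            VALUES (TG_OP, NEW.id, NEW.balance);
        RETURN NEW;
    END IF;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER accounts_audit_trg
    AFTER INSERT OR UPDATE OR DELETE ON accounts
    FOR EACH ROW EXECUTE PROCEDURE accounts_audit();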
Re: [PERFORM] vacuum full 100 mins plus?
Tom Lane wrote: > Christopher Browne <[EMAIL PROTECTED]> writes: >> A long time ago, in a galaxy far, farpliers [EMAIL PROTECTED] (Patrick >> Hatcher) wrote: >>> Answered my own question. I gave up the vacuum full after 150 mins. I >>> was able to export to a file, vacuum full the empty table, and reimport >>> in less >>> than 10 mins. I suspect the empty item pointers and the sheer number of >>> removable rows was causing an issue. > >> In that case, you'd be a little further better off if the steps were: >> - drop indices; >> - copy table to file (perhaps via pg_dump -t my_table); >> - truncate the table, or drop-and-recreate, both of which make >>it unnecessary to do _any_ vacuum of the result; >> - recreate indices, probably with SORT_MEM set high, to minimize >>paging to disk >> - analyze the table (no need to vacuum if you haven't created any >>dead tuples) >> - cut SORT_MEM back down to "normal" sizes > > Rather than doing all this manually, you can just CLUSTER on any handy > index. In 7.5, another possibility is to issue one of the forms of > ALTER TABLE that force a table rewrite. > > The range of usefulness of VACUUM FULL is really looking narrower and > narrower to me. I can foresee a day when we'll abandon it completely. I would love to see this 10lb sledge hammer go away when we have enough tiny screwdrivers and needlenose pliers to make it obsolete! > > regards, tom lane > > ---(end of broadcast)--- > TIP 5: Have you checked our extensive FAQ? > >http://www.postgresql.org/docs/faqs/FAQ.html ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
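For reference, the CLUSTER route Tom mentions looks like this in the 7.x/8.0 syntax; my_table and my_table_pkey are placeholder names:

-- rewrites the table in index order, discarding dead tuples, under an exclusive lock
CLUSTER my_table_pkey ON my_table;
ANALYZE my_table;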
Re: [PERFORM] Performance Bottleneck
On Wed, 2004-08-04 at 17:25 +0200, Gaetano Mendola wrote: > > The queries themselves are simple, normally drawing information from one > > table with few conditions or in the most complex cases using joins on > > two table or sub queries. These behave very well and always have, the > > problem is that these queries take place in rather large amounts due to > > the dumb nature of the scripts themselves. > > Show us the explain analyze on that queries, how many rows the tables are > containing, the table schema could be also usefull. > If the queries themselves are optimized as much as they can be, and as you say, its just the sheer amount of similar queries hitting the database, you could try using prepared queries for ones that are most often executed to eliminate some of the overhead. I've had relatively good success with this in the past, and it doesn't take very much code modification. -- Mike Benoit <[EMAIL PROTECTED]> ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Performance Bottleneck
On Fri, 2004-08-06 at 23:18 +, Martin Foster wrote: > Mike Benoit wrote: > > > On Wed, 2004-08-04 at 17:25 +0200, Gaetano Mendola wrote: > > > > > >>>The queries themselves are simple, normally drawing information from one > >>>table with few conditions or in the most complex cases using joins on > >>>two table or sub queries. These behave very well and always have, the > >>>problem is that these queries take place in rather large amounts due to > >>>the dumb nature of the scripts themselves. > >> > >>Show us the explain analyze on that queries, how many rows the tables are > >>containing, the table schema could be also usefull. > >> > > > > > > If the queries themselves are optimized as much as they can be, and as > > you say, its just the sheer amount of similar queries hitting the > > database, you could try using prepared queries for ones that are most > > often executed to eliminate some of the overhead. > > > > I've had relatively good success with this in the past, and it doesn't > > take very much code modification. > > > > One of the biggest problems is most probably related to the indexes. > Since the performance penalty of logging the information needed to see > which queries are used and which are not is a slight problem, then I > cannot really make use of it for now. > > However, I am curious how one would go about preparing query? Is this > similar to the DBI::Prepare statement with placeholders and simply > changing the values passed on execute? Or is this something database > level such as a view et cetera? > Yes, always optimize your queries and GUC settings first and foremost. Thats where you are likely to gain the most performance. After that if you still want to push things even further I would try prepared queries. I'm not familiar with DBI::Prepare at all, but I don't think its what your looking for. This is what you want: http://www.postgresql.org/docs/current/static/sql-prepare.html -- Mike Benoit <[EMAIL PROTECTED]> ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
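A minimal sketch of the server-side feature that page describes; the posts table and its columns are hypothetical stand-ins for whatever the scripts actually query:

-- planned once per connection...
PREPARE recent_posts (integer) AS
    SELECT *
      FROM posts
     WHERE author_id = $1
     ORDER BY posted_at DESC
     LIMIT 20;

-- ...then executed as often as needed on that connection
EXECUTE recent_posts(42);
EXECUTE recent_posts(17);

Because a prepared statement only lives for the life of its connection, this pays off most with persistent connections or a connection pool.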
Re: [PERFORM] Partitioning
On 15 Sep 2004 23:55:24 -0400, Greg Stark <[EMAIL PROTECTED]> wrote: > > "J. Andrew Rogers" <[EMAIL PROTECTED]> writes: > > > We do something very similar, also using table inheritance > > I have a suspicion postgres's table inheritance will end up serving as a good > base for a partitioned table feature. Is it currently possible to query which > subtable a record came from though? >From the docs on http://www.postgresql.org/docs/7.4/static/ddl-inherit.html : ... In some cases you may wish to know which table a particular row originated from. There is a system column called TABLEOID in each table which can tell you the originating table: SELECT c.tableoid, c.name, c.altitude FROM cities c WHERE c.altitude > 500; which returns: tableoid | name| altitude --+---+-- 139793 | Las Vegas | 2174 139793 | Mariposa | 1953 139798 | Madison | 845 (If you try to reproduce this example, you will probably get different numeric OIDs.) By doing a join with pg_class you can see the actual table names: SELECT p.relname, c.name, c.altitude FROM cities c, pg_class p WHERE c.altitude > 500 and c.tableoid = p.oid; which returns: relname | name| altitude --+---+-- cities | Las Vegas | 2174 cities | Mariposa | 1953 capitals | Madison | 845 --miker ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] COPY slows down?
I just ran a COPY of a million records several times, and each time I ran it, it ran apparently exponentially slower. If I do an insert of 10 million records, even with 2 indexes (same table) it doesn't appear to slow down at all. Any ideas?

- Mike H.

(I apologize for the ^Hs)

Script started on Wed Oct 6 08:37:32 2004
bash-3.00$ psql
Welcome to psql 7.4.5, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

mvh=# \timing
Timing is on.
mvh=# reindex table bgtest;
mvh=# delete from bgtest;
mvh=# copy bgtest from '/home/mvh/databasestuff/dbdmp/bgdump';
COPY
Time: 69796.130 ms
mvh=# vacuum analyze;
VACUUM
Time: 19148.621 ms
mvh=# copy bgtest from '/home/mvh/databasestuff/dbdmp/bgdump';
COPY
Time: 89189.939 ms
mvh=# vacuum analyze;
VACUUM
Time: 26814.670 ms
mvh=# copy bgtest from '/home/mvh/databasestuff/dbdmp/bgdump';
COPY
Time: 131131.982 ms
mvh=# vacuum analyze;
VACUUM
Time: 64997.264 ms
mvh=# copy bgtest from '/home/mvh/databasestuff/dbdmp/bgdump';
COPY
Time: 299977.697 ms
mvh=# vacuum analyze;
VACUUM
Time: 103541.716 ms
mvh=# copy bgtest from '/home/mvh/databasestuff/dbdmp/bgdump';
COPY
Time: 455292.600 ms
mvh=# vacuum analyze;
VACUUM
Time: 138910.015 ms
mvh=# copy bgtest from '/home/mvh/databasestuff/dbdmp/bgdump';
COPY
Time: 612119.661 ms
mvh=# vacuum analyze;
VACUUM
Time: 151331.243 ms
mvh=# \q
bash-3.00$ exit
Script done on Wed Oct 6 10:43:04 2004

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Does PostgreSQL run with Oracle?
On Fri, 15 Oct 2004 11:54:44 -0500, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > My basic question to the community is "is PostgreSQL approximately as fast > as Oracle?" My personal experience comparing PG to Oracle is across platforms, Oracle on Sun/Solaris (2.7, quad-proc R440) and PG on Intel/Linux (2.6 kernel, dual P3/1GHz). When both were tuned for the specific app I saw a 45% speedup after switching to PG. This was with a customized CRM and System Monitoring application serving ~40,000 trouble tickets and monitoring 5,000 metric datapoints every 5-30 minutes. The hardware was definitely not comparable (the Intel boxes have more horsepower and faster disks), but dollar for dollar, including support costs, PG is the winner by a BIG margin. YMMV, of course, and my results are apparently above average. Another big plus I found was that PG is much easier to admin as long as you turn on pg_autovacuum. --miker ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] preloading indexes
If you're running Linux, and kernel 2.6.x, you can try playing with the /proc/sys/vm/swappiness setting. My understanding is that:

echo "0" > /proc/sys/vm/swappiness

will try to keep all in-use application memory from being swapped out when other processes query the disk a lot. Although, since PostgreSQL utilizes the disk cache quite a bit, this may not help you.

On Wed, 2004-11-03 at 15:53 -0500, Tom Lane wrote:
> <[EMAIL PROTECTED]> writes:
> > The caching appears to disappear overnight.
>
> You've probably got cron jobs that run late at night and blow out your
> kernel disk cache by accessing a whole lot of non-Postgres stuff.
> (A nightly disk backup is one obvious candidate.) The most likely
> solution is to run some cron job a little later to exercise your
> database and thereby repopulate the cache with Postgres files before
> you get to work ;-)
>
> regards, tom lane
>
> ---(end of broadcast)---
> TIP 9: the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match
--
Mike Benoit <[EMAIL PROTECTED]>
[PERFORM] int4 in a GiST index
Hello all, I am using tsearch2 to (imagine this... :) index a text field. There is also a, for lack of a better name, "classification" field called 'field' that will be used to group certain rows together. CREATE TABLE biblio.metarecord_field_entry ( record BIGINT REFERENCES biblio.metarecord (id) ON UPDATE CASCADE ON DELETE SET NULL DEFERRABLE INITIALLY DEFERRED, field INT NOT NULL REFERENCES biblio.metarecord_field_map (id) ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED, value TEXT, value_fti tsvector, source BIGINT NOT NULL REFERENCES biblio.record (id) ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED ) WITHOUT OIDS; Because there will be "or" queries against the 'value_fti' I want to create a multi-column index across the tsvector and classification columns as that should help with selectivity. But because there is no GiST opclass for INT4 the index creation complains thusly: oils=# CREATE INDEX metarecord_field_entry_value_and_field_idx ON biblio.metarecord_field_entry USING GIST (field, value_fti); ERROR: data type integer has no default operator class for access method "gist" HINT: You must specify an operator class for the index or define a default operator class for the data type. I attempted to give it the 'int4_ops' class, but that also complains: oils=# CREATE INDEX metarecord_field_entry_value_and_field_idx ON biblio.metarecord_field_entry USING GIST (value_fti, field int4_ops); ERROR: operator class "int4_ops" does not exist for access method "gist" I couldn't find any info in the docs (7.4 and 8.0.0b4) for getting GiST to index standard integers. I'm sure this has been done before, but I've note found the magic spell. Of course, I may just be barking up the wrong tree altogether... Thanks in advance! -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] int4 in a GiST index
On Wed, 10 Nov 2004 18:50:28 -0800 (PST), George Essig <[EMAIL PROTECTED]> wrote: > Mike Rylander wrote: > > > I want to create a multi-column index across the tsvector and classification > > columns as that should help with selectivity. But because there is no > > GiST opclass for INT4 the index creation complains thusly: > > Install contrib/btree_gist along with contrib/tsearch2 to create a > multicolumn index on the in4 > and the tsvector columns. See the following for an example: > > http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/oscon_tsearch2/multi_column_index.html > > George Essig > Thanks a million. I had actually just found the answer after some more googling, but I hadn't seen that page and it happens to be exactly what I wanted. As a side note I'd like to thank everyone here (and especially George, in this case). I've been on these lists for quite a while and I'm always amazed at the speed, accuracy and precision of the answers on the PG mailing lists. -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
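For the archives: once the btree_gist (and tsearch2) contrib scripts are loaded into the database, the original CREATE INDEX should go through; the int4 opclass can also be named explicitly. The opclass name below is the one shipped with contrib/btree_gist:

CREATE INDEX metarecord_field_entry_value_and_field_idx
    ON biblio.metarecord_field_entry
 USING GIST (field gist_int4_ops, value_fti);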
Re: [PERFORM] memcached and PostgreSQL
On 17 Nov 2004 03:08:20 -0500, Greg Stark <[EMAIL PROTECTED]> wrote: > Josh Berkus <[EMAIL PROTECTED]> writes: > > > So memcached becomes a very good place to stick data that's read often but > > not > > updated often, or alternately data that changes often but is disposable. > > An > > example of the former is a user+ACL list; and example of the latter is web > > session information ... or simple materialized views. > > I would like very much to use something like memcached for a materialized view > I have. The problem is that I have to join it against other tables. > > I've thought about providing a SRF in postgres to read records out of > memcached but I'm unclear it would it really help at all. > > Has anyone tried anything like this? I haven't tried it yet, but I plan too. An intersting case might be to use plperlu to interface with memcached and store hashes in the cache via some external process, like a CGI script. Then just define a TYPE for the perl SRF to return, and store the data as an array of hashes with keys matching the TYPE. A (perhaps useless) example could then be something like: CREATE TYPE user_info AS ( sessionid TEXT, userid INT, lastaccess TIMESTAMP, lastrequest TEXT); CREATE FUNCTION get_user_info_by_session ( TEXT) RETURNS SETOF user_info AS $$ use Cache::Memcached; my $session = shift; my $c = $_SHARED{memcached} || Cache::Memcached->new( {servers => '127.0.0.1:'} ); my $user_info = $m->get('web_access_list'); # $user_info looks like # [ {userid => 5, lastrequest => 'http://...', lastaccess => localtime(), #sessionid => '123456789'}, { ...} ] # and is stored by a CGI. @info = grep {$$_{sessionid} eq $session} @$user_info; return [EMAIL PROTECTED]; $$ LANGUAGE 'plperlu'; SELECT u.username, f.lastrequest FROM users u, get_user_info_by_session('123456789') WHERE f.userid = u.userid; Any thoughts? > > -- > greg > > ---(end of broadcast)--- > TIP 5: Have you checked our extensive FAQ? > >http://www.postgresql.org/docs/faqs/FAQ.html > -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[PERFORM] Slow execution time when querying view with WHERE clause
loops=1) Filter: (webuser = getwebuser()) -> Subquery Scan "*SELECT* 2" (cost=1.08..3.33 rows=3 width=8) (actual time=0.295..0.381 rows=1 loops=1) -> Hash Join (cost=1.08..3.30 rows=3 width=8) (actual time=0.286..0.368 rows=1 loops=1) Hash Cond: ("outer".company = "inner".company) -> Seq Scan on _stores s (cost=0.00..2.13 rows=13 width=16) (actual time=0.008..0.080 rows=13 loops=1) -> Hash (cost=1.07..1.07 rows=1 width=8) (actual time=0.116..0.116 rows=0 loops=1) -> Seq Scan on _companies c (cost=0.00..1.07 rows=1 width=8) (actual time=0.062..0.083 rows=1 loops=1) Filter: ((companyid)::text = 'DEFAULT'::text) -> Index Scan using i_offers4 on _offers o (cost=0.00..1007.93 rows=15524 width=16) (actual time=0.023..67.183 rows=10049 loops=48) -> Index Scan using i_inventories1 on _inventories i (cost=0.00..5.07 rows=1 width=16) (actual time=0.025..0.029 rows=1 loops=163561) Index Cond: (i.inventory = "outer".inventory) -> Index Scan using i_bins1 on _bins b (cost=0.00..3.40 rows=1 width=16) (actual time=0.021..0.026 rows=1 loops=163561) Index Cond: (b.bin = "outer".bin) Total runtime: 20027.414 ms (36 rows) --- That's a slow-down on execution time by a factor of 50, even though the row count was the same: 34. In fact, it's MUCH faster to do: create temporary table foo as select * from p_areas; select * from foo where deactive is null; The database has been analyzed. Any tips would be greatly appreciated. Mike Mascari P.S.: I turned off word-wrap in my mail client for this post. Is that the right thing to do for analyze output? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Slow execution time when querying view with WHERE clause
Mike Mascari wrote: I have the following view: create or replace view market.p_areas as select a.* from _areas a where a.area in ( select b.area from _bins b, _inventories i, _offers o, _pricemembers p where b.bin = i.bin and i.inventory = o.inventory and o.pricegroup = p.pricegroup and p.buyer in ( select s.store from _stores s, _webusers w where w.webuser = getWebuser() and w.company = s.company union select s.store from _stores s, _companies c where s.company = c.company and c.companyid = 'DEFAULT' ) ); ... I failed to report the version: select version(); PostgreSQL 7.4.5 on i686-pc-linux-gnu, compiled by GCC i686-pc-linux-gnu-gcc (GCC) 3.4.0 20040204 (prerelease) Sorry. Mike Mascari ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Slow execution time when querying view with WHERE clause
Tom Lane wrote: Mike Mascari <[EMAIL PROTECTED]> writes: When I query the view with a simple filter, I get: explain analyze select * from p_areas where deactive is null; The problem seems to be here: -> Seq Scan on _areas a (cost=0.00..2.48 rows=1 width=163) (actual time=0.037..0.804 rows=48 loops=1) Filter: (deactive IS NULL) Why is it so completely off about the selectivity of the IS NULL clause? Are you sure you ANALYZEd this table recently? Yes. I just did: [EMAIL PROTECTED] vacuum full analyze; VACUUM [EMAIL PROTECTED] explain analyze select * from p_areas where deactive is null; QUERY PLAN -- Nested Loop IN Join (cost=8.62..512.47 rows=1 width=162) (actual time=1143.969..21811.417 rows=37 loops=1) Join Filter: ("outer".area = "inner".area) -> Seq Scan on _areas a (cost=0.00..2.49 rows=1 width=162) (actual time=0.037..1.673 rows=49 loops=1) Filter: (deactive IS NULL) -> Nested Loop (cost=8.62..25740.20 rows=2681 width=8) (actual time=1.172..429.501 rows=3566 loops=49) -> Nested Loop (cost=8.62..16674.93 rows=2680 width=8) (actual time=1.125..281.570 rows=3566 loops=49) -> Merge Join (cost=8.62..3012.72 rows=2778 width=8) (actual time=0.876..128.908 rows=3566 loops=49) Merge Cond: ("outer".pricegroup = "inner".pricegroup) -> Nested Loop IN Join (cost=8.62..1929.41 rows=9 width=8) (actual time=0.613..5.504 rows=9 loops=49) Join Filter: ("outer".buyer = "inner".store) -> Index Scan using i_pricemembers3 on _pricemembers p (cost=0.00..11.13 rows=217 width=16) (actual time=0.403..1.476 rows=142 loops=49) -> Subquery Scan "IN_subquery" (cost=8.62..8.74 rows=8 width=8) (actual time=0.013..0.019 rows=1 loops=6950) -> Unique (cost=8.62..8.66 rows=8 width=8) (actual time=0.007..0.010 rows=1 loops=6950) -> Sort (cost=8.62..8.64 rows=8 width=8) (actual time=0.003..0.004 rows=1 loops=6950) Sort Key: store -> Append (cost=2.87..8.50 rows=8 width=8) (actual time=8.394..8.446 rows=1 loops=1) -> Subquery Scan "*SELECT* 1" (cost=2.87..5.17 rows=5 width=8) (actual time=8.112..8.112 rows=0 loops=1) -> Hash Join (cost=2.87..5.12 rows=5 width=8) (actual time=8.106..8.106 rows=0 loops=1) Hash Cond: ("outer".company = "inner".company) -> Seq Scan on _stores s (cost=0.00..2.13 rows=13 width=16) (actual time=0.014..0.052 rows=13 loops=1) -> Hash (cost=2.87..2.87 rows=1 width=8) (actual time=7.878..7.878 rows=0 loops=1) -> Seq Scan on _webusers w (cost=0.00..2.87 rows=1 width=8) (actual time=7.868..7.868 rows=0 loops=1) Filter: (webuser = getwebuser()) -> Subquery Scan "*SELECT* 2" (cost=1.08..3.33 rows=3 width=8) (actual time=0.273..0.322 rows=1 loops=1) -> Hash Join (cost=1.08..3.30 rows=3 width=8) (actual time=0.263..0.308 rows=1 loops=1) Hash Cond: ("outer".company = "inner".company) -> Seq Scan on _stores s (cost=0.00..2.13 rows=13 width=16) (actual time=0.008..0.042 rows=13 loops=1) -> Hash (cost=1.07..1.07 rows=1 width=8) (actual time=0.093..0.093 rows=0 loops=1) -> Seq Scan on _companies c (cost=0.00..1.07 rows=1 width=8) (actual time=0.061..0.081 rows=1 loops=1) Filter: ((companyid)::text = 'DEFAULT'::text) -> Index Scan using i_offers4 on _offers o (cost=0.00..1014.76 rows=16298 width=16) (actual time=0.244..72.742 rows=10433 loops=49)
Re: [PERFORM] Slow execution time when querying view with WHERE clause
Tom Lane wrote: Um ... doh ... analyze.c about line 1550: /* We can only compute valid stats if we found some non-null values. */ if (nonnull_cnt > 0) ... There's a bit of an epistemological issue here: if we didn't actually find any nonnull values in our sample, is it legitimate to assume that the column is entirely null? On the other hand, if we find only "3" in our sample we will happily assume the column contains only "3", so I dunno why we are discriminating against null. This seems like a case that just hasn't come up before. Will this discriminatory policy toward null end for 8.0? Mike Mascari ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] VACUUM ANALYZE downgrades performance
On 30 Nov 2004 14:30:37 +0100, Dmitry Karasik <[EMAIL PROTECTED]> wrote: > > Hi all, > > On v7.4.5 I noticed downgrade in the planner, namely favoring > sequential scan over index scan. The proof: > >create table a ( a integer); >create index aidx on a(a); >explain analyze select * from a where a = 0; >-- Index Scan using aidx on a (cost=0.00..17.07 rows=5 width=4) (actual >-- time=0.029..0.029 rows=0 loops=1) >-- Index Cond: (a = 0) >vacuum analyze; >explain analyze select * from a where a = 0; >-- Seq Scan on a (cost=0.00..0.00 rows=1 width=4) (actual time=0.009..0.009 >-- rows=0 loops=1) >-- Filter: (a = 0) Looks to me like the seq scan is a better plan. The "actual time" went down. > > I do realize that there might be reasons why this happens over an empty > table, but what is way worse that when the table starts actually to fill, > the seq scan is still there, and the index is simply not used. How > that could be so ...mmm... shortsighted, and what is more important, > how to avoid this? I hope the answer is not 'run vacuum analyze each 5 > seconds'. > See this thread (http://archives.postgresql.org/pgsql-hackers/2004-11/msg00985.php and http://archives.postgresql.org/pgsql-hackers/2004-11/msg01080.php) for an ongoing discussion of the issue. -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Alternatives to Dell?
On Wed, 1 Dec 2004 14:24:12 -0800, Josh Berkus <[EMAIL PROTECTED]> wrote: > Folks, > > A lot of people have been having a devilish time with Dell hardware lately. > It seems like the quality control just isn't there on the Dell servers. Which is a shame, because I *still* drool over a rack full of those front bevels with the bright blue LEDs. :) > > Thing is, some companies are required to use 1st-tier or at least 2nd-tier > vendors for hardware; they won't home-build. For those people, what vendors > do others on this list recommend? What have been your good/bad experiences? I'm using an HP DL585 quad Opteron with 16G RAM as a development box. It's working great. ;) Seriously though, I never really liked HP (or worse, Compaq) hardware before, but this box seems really well built, and I've yet to see a 'D' in the S column in top with the SA-6404/256 RAID card. If all goes as well as it has so far on this testbed I'll be deploying on a Slony-1 clustered set of 3 of these bad boys with 32G RAM each. Dollar-for-dollar, we're saving 90% (that's right, an order of magnitude) going this route, PG with linux-amd64 on HP/Opterons, as opposed to the E20K monster that was originally spec'd out. Mail me direct if you want the full spec list on this beast. And if there is a ready-made benchmark anyone would like me to run, just drop me a note. -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Alternatives to Dell?
On Fri, 03 Dec 2004 06:38:50 -0800, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
> That's true :) One of the reasons the compaq's are expensive
> is they supposedly use a quad board, even for the dual machine.
> Which means a different opteron chip as well.

I can confirm that. You have a choice of CPUs, but all the DL585s are expandable to 4 procs if you get the 800 series Opterons. Each CPU sits on its own daughter board that links up the HyperTransport busses between all the others. Each CPU card has (I think...) 8 slots for DIMMs, for a max of 64G.

> I don't know this for a fact, it is just what one of their
> "ahem" sales guys told me.

At least in that case they were being accurate. ;)
--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org

---(end of broadcast)---
TIP 8: explain analyze is your friend
[PERFORM] LIMIT causes SEQSCAN in subselect
ndex Cond: (source = 10) Total runtime: 0.101 ms (6 rows) -- query 3: if we were to drop the limit, since we're using a unique index oils4=# EXPLAIN ANALYZE select a.record, b.control from (select * from biblio.record where id = 10) b, (select * from biblio.metarecord_field_entry limit 1000) a where a.source = b.id; QUERY PLAN Nested Loop (cost=0.00..41.97 rows=5 width=22) (actual time=1.169..1.169 rows=0 loops=1) -> Index Scan using biblio_record_pkey on record (cost=0.00..3.67 rows=1 width=22) (actual time=0.036..0.038 rows=1 loops=1) Index Cond: (id = 10) -> Subquery Scan a (cost=0.00..38.25 rows=5 width=16) (actual time=1.126..1.126 rows=0 loops=1) Filter: (source = 10) -> Limit (cost=0.00..25.75 rows=1000 width=87) (actual time=0.005..0.673 rows=1000 loops=1) -> Seq Scan on metarecord_field_entry (cost=0.00..43379.75 rows=1684575 width=87) (actual time=0.004..0.424 rows=1000 loops=1) Total runtime: 1.243 ms (8 rows) -- query 4: what I would like the seqscan in query 3 to become... oils4=# EXPLAIN ANALYZE select * from biblio.metarecord_field_entry where source = 10 limit 1000; QUERY PLAN -- Limit (cost=0.00..16.19 rows=9 width=87) (actual time=0.026..0.035 rows=7 loops=1) -> Index Scan using metarecord_field_entry_source_idx on metarecord_field_entry (cost=0.00..16.19 rows=9 width=87) (actual time=0.025..0.032 rows=7 loops=1) Index Cond: (source = 10) Total runtime: 0.069 ms (4 rows) -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[PERFORM] Which is more efficient?
Hi, I have data that I am taking from 2 tables, pulling out specific columns and inserting into one table. Is it more efficient to do: a) insert into x select z from y; insert into x select z from a; b) insert into x select z from y union all select z from a; I have run both through explain. a) 650ms b) 741.57ms According to the planner option a, select z from y takes 545.93 ms Under option b select z from y takes 553.34 ms Shouldn't the time predicted for the select z from y be the same? I would believe b would be more efficient as the inserts could be done in a batch rather than individual transactions but the planner doesn't recognize that. When I run option a through the planner I have to highlight each insert separately since the planner stops executing after the first ; it comes across. Mike ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
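For anyone wanting to reproduce the comparison, a sketch of both approaches using the table names from the post; note that EXPLAIN ANALYZE actually executes the statements, hence the ROLLBACKs:

    BEGIN;
    -- option a: two separate INSERT ... SELECT statements
    EXPLAIN ANALYZE INSERT INTO x SELECT z FROM y;
    EXPLAIN ANALYZE INSERT INTO x SELECT z FROM a;
    ROLLBACK;

    BEGIN;
    -- option b: a single INSERT fed by UNION ALL
    EXPLAIN ANALYZE INSERT INTO x SELECT z FROM y UNION ALL SELECT z FROM a;
    ROLLBACK;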
Re: [PERFORM] LIMIT causes SEQSCAN in subselect
On Fri, 10 Dec 2004 21:40:18 -0800, Josh Berkus <[EMAIL PROTECTED]> wrote: > Mike, > The fact that the estimator knows that the LIMIT is pointless because there > are less rows in the subselect than the LIMIT will return is not something we > want to count on; sometimes the estimator has inaccurate information. The > UNIQUE index makes this more certain, except that I'm not sure that the > planner distinguishes between actual UNIQUE indexes and columns which are > estimated unique (per the pg_stats). And I think you can see in your case > that there's quite a difference between a column we're CERTAIN is unique, > versus a column we THINK is unique. Absolutely. At first I was going to ask if perhaps using the stats to discard the LIMIT would be possible, but since the stats are only guidelines I dropped that. The stats are just so tempting! > > > I realize this is a rather specialized case and not really great form. > > Exactly. You've grasped the main issue: that this has not been optimized > because it's bizarre and not very sensible query writing. Someday we'll get > around to optimizing the really weird queries, but there's still a lot of > work to be done on the common ones (like count(*) ...). Absolutely. And if I can help out with the common cases to gain some Karmic currency I will. ;) After thinking about it some more, I don't think those queries were really all that wacky though. The problem with the example is that the generated query is very simple, and real-world queries that would be used in the subselect would be much more complex, and row estimation would be untrustworthy without a UNIQUE index. > > Keep in mind that the only reason we support LIMIT inside subqueries in the > first place is a workaround to slow aggregates, and a way to do RANK. It's > certainly not SQL-standard. > No it's not, but then nobody ever accused the authors of the SQL spec of being omniscient... I can't think of another way to get, say, a 'top 10' list from a subselect, or use a paging iterator (LIMIT .. OFFSET ..) as the seed for an outer query. Well, other than an SRF of course. > > Just a matter of > > defining result sets independently, and creating a simple wrapper to > > join them. > > Well, if you think so, you know where to submit patches ... > Well, I do, but I was talking about it being 'easy' in the middleware. Just let PG handle optimizing the subselects. For example, you have a pile of predefined SELECTS that don't know they are related and are used for simple lookups. You tell the SQL generator thingy that it should use two of those, queries A and B, that they are related on x, and that you want to see the 'id' from A and the 'value' from B. Instead of having to preplan every possible combination of JOINS the SQL generator will toss the preplanned ones into subselects and join them in the outer query instead of having to rip them apart and calculate the join syntax. And yes, I know that a view will take care of most of that for me... :) Thanks for all your comments. Pretty much what I expected, but I thought I'd raise a use case. I'll just have to give the query builder more smarts. > -- > Josh Berkus > Aglio Database Solutions > San Francisco > -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
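To make the pattern under discussion concrete, here is a minimal sketch of a LIMIT/OFFSET subselect used as the seed for an outer query; the table and column names are invented for illustration and do not come from the thread:

    -- hypothetical "top 10" / paging seed, joined out for detail
    SELECT o.id, o.placed_on, c.name
      FROM (SELECT id, customer_id, placed_on
              FROM orders
             ORDER BY placed_on DESC
             LIMIT 10 OFFSET 0) AS o
      JOIN customers c ON c.id = o.customer_id;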
Re: [PERFORM] Low Performance for big hospital server ..
William Yu wrote: [EMAIL PROTECTED] wrote: Yes , vacuumdb daily. Do you vacuum table by table or the entire DB? I find over time, the system tables can get very bloated and cause a lot of slowdowns just due to schema queries/updates. You might want to try a VACUUM FULL ANALYZE just on the system tables. A REINDEX of the system tables in stand-alone mode might also be in order, even for a 7.4.x database: http://www.postgresql.org/docs/7.4/interactive/sql-reindex.html If a dump-reload-analyze cycle yields significant performance improvements then we know it's due to dead-tuple bloat - either heap tuples or index tuples. Mike Mascari ---(end of broadcast)--- TIP 8: explain analyze is your friend
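A hedged sketch of the suggested maintenance, assuming a 7.4-era server as in the thread; the catalog names below are just the usual suspects, not a list taken from the original mail:

    -- as a superuser, inside the affected database
    VACUUM FULL ANALYZE pg_class;
    VACUUM FULL ANALYZE pg_attribute;
    VACUUM FULL ANALYZE pg_index;

    -- and, from a standalone backend as described on the REINDEX page linked above
    REINDEX TABLE pg_class;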
Re: [PERFORM] Postgres Optimizer is not smart enough?
Litao Wu wrote: Hi All, Here is my test comparison between Postgres (7.3.2) optimizer vs Oracle (10g) optimizer. It seems to me that Postgres optimizer is not smart enough. Did I miss anything? Yeah, 7.4. 7.3.2 is *ancient*. Here's output from 7.4: [EMAIL PROTECTED] explain analyze test-# SELECT module, sum(action_deny) test-# FROM test test-# WHERE created >= ('now'::timestamptz - '1 test'# day'::interval) AND customer_id='100' test-# AND domain='100' test-# GROUP BY module; QUERY PLAN --- HashAggregate (cost=5.69..5.69 rows=1 width=13) (actual time=715.058..715.060 rows=1 loops=1) -> Index Scan using test_id1 on test (cost=0.00..5.68 rows=1 width=13) (actual time=0.688..690.459 rows=1 loops=1) Index Cond: ((customer_id = 100) AND (created >= '2005-01-11 17:52:22.364145-05'::timestamp with time zone) AND (("domain")::text = '100'::text)) Total runtime: 717.546 ms (4 rows) [EMAIL PROTECTED] create index test_id2 on test(domain); CREATE INDEX [EMAIL PROTECTED] analyze test; ANALYZE [EMAIL PROTECTED] [EMAIL PROTECTED] explain analyze test-# SELECT module, sum(action_deny) test-# FROM test test-# WHERE created >= ('now'::timestamptz - '1 test'# day'::interval) AND customer_id='100' test-# AND domain='100' test-# GROUP BY module; QUERY PLAN HashAggregate (cost=5.68..5.69 rows=1 width=13) (actual time=10.778..10.780 rows=1 loops=1) -> Index Scan using test_id2 on test (cost=0.00..5.68 rows=1 width=13) (actual time=10.702..10.721 rows=1 loops=1) Index Cond: (("domain")::text = '100'::text) Filter: ((created >= '2005-01-11 17:53:16.720749-05'::timestamp with time zone) AND (customer_id = 100)) Total runtime: 11.039 ms (5 rows) [EMAIL PROTECTED] select version(); PostgreSQL 7.4.5 on i686-pc-linux-gnu, compiled by GCC i686-pc-linux-gnu-gcc (GCC) 3.4.0 20040204 (prerelease) (1 row) Hope that helps, Mike Mascari ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Performance Tuning
On Wed, 9 Feb 2005 17:30:41 -0500, Chris Kratz <[EMAIL PROTECTED]> wrote: > The solutions appear to primarily be: > 1. Going to faster hardware of which probably Opterons would be about the only > choice. And even that probably won't be a huge difference. I'd beg to differ on that last part. The difference between a 3.6GHz Xeon and a 2.8GHz Opteron is a ~150% speed increase on the Opteron on my CPU bound app. This is because the memory bandwidth on the Opteron is ENORMOUS compared to the Xeon. Add to that the fact that you actually get to use more than about 2G of RAM directly and you've got the perfect platform for a high speed database on a budget. > 2. Moving to more materialized views and prepared statements where we can. Definitely worth investigating. I wish I could, but I can't get my customers to even consider slightly out of date stats :( > 3. Continue to tweak the sql behind our app. Short of an Opteron based system, this is by far your best bet. -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Benchmark
I have never used Oracle myself, nor have I read its license agreement, but what if you didn't name Oracle directly? ie:

TPS  Database
---
112  MySQL
120  PgSQL
90   Sybase
95   "Other database that *may* start with a letter after N"
50   "Other database that *may* start with a letter after L"

As far as I know there are only a couple of databases that don't allow you to post benchmarks, but if they remain "unnamed" can legal action be taken? Just like all those commercials on TV where they advertise: "Cleans 10x better than the other leading brand". On Fri, 2005-02-11 at 00:22 -0500, Mitch Pirtle wrote: > On Thu, 10 Feb 2005 08:21:09 -0500, Jeff <[EMAIL PROTECTED]> wrote: > > > > If you plan on making your results public be very careful with the > > license agreements on the other db's. I know Oracle forbids the > > release of benchmark numbers without their approval. > > ...as all of the other commercial databases do. This may be off-topic, > but has anyone actually suffered any consequences of a published > benchmark without permission? > > For example, I am a developer of Mambo, a PHP-based CMS application, > and am porting the mysql functions to ADOdb so I can use grown-up > databases ;-) > > What is keeping me from running a copy of Mambo on a donated server > for testing and performance measures (including the commercial > databases) and then publishing the results based on Mambo's > performance on each? > > It would be really useful to know if anyone has ever been punished for > doing this, as IANAL but that restriction is going to be very, VERY > difficult to back up in court without precedent. Is this just a > deterrent, or is it real? > > -- Mitch > > ---(end of broadcast)--- > TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] -- Mike Benoit <[EMAIL PROTECTED]> signature.asc Description: This is a digitally signed message part
Re: [PERFORM] queries on huge tables
The most recent version of this thread starts here: http://archives.postgresql.org/pgsql-general/2005-03/msg00321.php . Search the archives for "table partition", "union view" and "partition inherits" and you should find the most relevant discussions. Hope that helps! On Thu, 17 Mar 2005 15:01:43 +0100, Lending, Rune <[EMAIL PROTECTED]> wrote: > > Hello all. > > I have a couple of tables with a couple of hundred million records in > them. The tables contain a timestamp column. > I am almost always interested in getting data from a specific day or month. > Each day contains approx. 400.000 entries. > > When I do such queries as " select ... from archive where m_date between > '2005-01-01' and '2005-02-01' group by ... " and so on, > it takes very long. I have indexes that kick in, but still it takes > some time. > > I have split the archive table into smaller monthly tables; it then goes a > lot faster, but not fast enough. > > I know similar systems that use Oracle and gain a lot of performance > because of the partitioning. That kind of annoys me a bit :) > > Do any of you have some good ideas on how to speed up such queries on > huge tables? > > regards > rune > > > > -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
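For readers who don't want to dig through the archives, a minimal sketch of the inheritance-style monthly partitioning those threads describe, reusing the table and column names from the question (the payload column is a stand-in for the real ones; on servers of that era the planner did not prune children automatically, so the monthly tables were typically queried directly or through a UNION ALL view):

    -- parent table holds no rows itself
    CREATE TABLE archive (m_date timestamp NOT NULL, payload text);

    -- one child per month, with a CHECK constraint describing its slice
    CREATE TABLE archive_2005_01 (
        CHECK (m_date >= '2005-01-01' AND m_date < '2005-02-01')
    ) INHERITS (archive);
    CREATE INDEX archive_2005_01_m_date_idx ON archive_2005_01 (m_date);

    -- querying the parent sees every child; naming archive_2005_01 directly
    -- touches only that month's rows
    SELECT count(*) FROM archive_2005_01
     WHERE m_date >= '2005-01-10' AND m_date < '2005-01-11';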
Re: [PERFORM] Tsearch2 performance on big database
On Tue, 22 Mar 2005 15:36:11 +0300 (MSK), Oleg Bartunov wrote: > On Tue, 22 Mar 2005, Rick Jansen wrote: > > > Hi, > > > > I'm looking for a *fast* solution to search thru ~ 4 million records of book > > descriptions. I've installed PostgreSQL 8.0.1 on a dual opteron server with > > 8G of memory, running Linux 2.6. I haven't done a lot of tuning on > > PostgreSQL > > itself, but here's the settings I have changed so far: > > > > shared_buffers = 2000 (anything much bigger says the kernel doesnt allow > > it, > > still have to look into that) > > use something like > echo "15000" > /proc/sys/kernel/shmmax > to increase shared memory. In your case you could dedicate much more > memory. > > Regards, > Oleg And Oleg should know. Unless I'm mistaken, he (co)wrote tsearch2. Other than shared buffers, I can't imagine what could be causing that kind of slowness. EXPLAIN ANALYZE, please? As an example of what I think you *should* be seeing, I have a similar box (4 procs, but that doesn't matter for one query) and I can search a column with tens of millions of rows in around a second. -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Tsearch2 performance on big database
On Tue, 22 Mar 2005 14:25:19 +0100, Rick Jansen <[EMAIL PROTECTED]> wrote: > > ilab=# explain analyze select count(titel) from books where idxfti @@ > to_tsquery('default', 'buckingham | palace'); > QUERY PLAN > > Aggregate (cost=35547.99..35547.99 rows=1 width=56) (actual > time=125968.119..125968.120 rows=1 loops=1) > -> Index Scan using idxfti_idx on books (cost=0.00..35525.81 > rows=8869 width=56) (actual time=0.394..125958.245 rows=3080 loops=1) > Index Cond: (idxfti @@ '\'buckingham\' | \'palac\''::tsquery) > Total runtime: 125968.212 ms > (4 rows) > > Time: 125969.264 ms > ilab=# Ahh... I should have qualified my claim. I am creating a google-esque search interface and almost every query uses '&' as the term joiner. 'AND' queries and one-term queries are orders of magnitude faster than 'OR' queries, and fortunately are the expected default for most users. (Think, "I typed in these words, therefore I want to match these words"...) An interesting test may be to time multiple queries independently, one for each search term, and see if the combined cost is less than a single 'OR' search. If so, you could use UNION to join the results. However, the example you originally gave ('terminology') should be very fast. On a comparable query ("select count(value) from metabib.full_rec where index_vector @@ to_tsquery('default','jane');") I get 12ms. Oleg, do you see anything else on the surface here? Try: EXPLAIN ANALYZE SELECT titel FROM books WHERE idxfti @@ to_tsquery('default', 'buckingham') UNION SELECT titel FROM books WHERE idxfti @@ to_tsquery('default', 'palace'); and see if using '&' instead of '|' where you can helps out. I imagine you'd be surprised by the speed of: SELECT titel FROM books WHERE idxfti @@ to_tsquery('default', 'buckingham&palace'); > > > As an example of what I think you *should* be seeing, I have a similar > > box (4 procs, but that doesn't matter for one query) and I can search > > a column with tens of millions of rows in around a second. > > > That sounds very promising, I'd love to get those results.. could you > tell me what your settings are, how much memory you have and such? 16G of RAM on a dedicated machine.
shared_buffers = 15000 # min 16, at least max_connections*2, 8KB each
work_mem = 10240 # min 64, size in KB
maintenance_work_mem = 100 # min 1024, size in KB # big m_w_m for loading data...
random_page_cost = 2.5 # units are one sequential page fetch cost # fast drives, and tons of RAM
-- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] 8x2.5" or 6x3.5" disks
You don't mention the capacity of the disks you are looking at. Here is something you might want to consider. I've seen a few performance posts on using different hardware technologies to gain improvements. Most of those comments are on RAID, interface and rotation speed. One area that doesn't seem to have been mentioned is to run your disks empty. One of the key roadblocks in disk performance is the time for the disk heads to seek, settle and find the start of the data. Another is the time to transfer from disk to interface. Everyone may instinctively know this but it's often ignored. Hard disks are CRV (constant rotational velocity) = they spin at the same speed all the time. Hard disk drives use a technology called ZBR = Zone Bit Recording = a lot more data on the outside tracks than the inner ones. Hard disks fill up from outside track to inside track generally, unless you've done some weird partitioning. On the outside of the disk you get a lot more data per seek than on the inside. Double whammy: you get it faster. Performance can vary more than 100% between the outer and inner tracks of the disk. So running a slower disk twice as big may give you more benefit than running a small capacity 15K disk full. The slower disks are also generally more reliable and mostly much cheaper. The other issue for full disks, especially with lots of random small transactions, is that the heads are seeking and settling across the whole disk, but typically with most of those seeks being on the latest transactions, which are placed nicely towards the middle of the disk. I know of a major bank that has a rule of thumb of 25% of the disk partitioned as a target maximum for high performance disk systems in a key application. They also only pay for used capacity from their disk vendor. This is not very green as you need to buy more disks for the same amount of data, and it's liable to upset your purchasing department, who won't understand why you don't want to fill your disks up. Mike
Re: [PERFORM] 8x2.5" or 6x3.5" disks
[presumably the empty-disk effect could also be achieved by partitioning, say 25% of the drive for the database, and 75% empty partition. But in fact, you could use that "low performance 75%" for rarely-used or static data, such as the output from pg_dump, that is written during non-peak times] Larry Ellison financed a company called Pillar Data Systems which was founded on the principle that you can tier the disk according to the value and performance requirements of the data. They planned to put the most valuable data, in performance terms, on the outside of SATA disks and use the empty space in the middle for slower stuff. (This is not an advert. I like the idea but I don't know if it works well, and I don't have anything to do with Pillar other than EnterpriseDB competes against Larry's other little company). Probably the way to go is flash drives for primary performance data. EMC and others have announced Enterprise Flash Drives (they claim 30 times the performance of 15K disks, although at 30 times the cost of standard disk today). Flash should also have pretty much consistent high performance across the whole capacity. Within a couple of years EFD should be affordable for mainstream use.
[PERFORM] Implications of having large number of users
Hi there, Please help me to make a decision on how to manage users. For some reason it is easier in the project I'm working on to split data by schemas and assign them to Postgres users (I mean those created with CREATE USER) rather than support 'owner' fields referring to a global users table. The question is what could be the consequences of having a large number of them (tens of thousands)? Context: - it is a web app - thousands of concurrent requests from different users - amount of user's data in the db is relatively small Concerns: - how big is the performance/memory penalty on switching users in the same connection (connections are reused of course)? - will it hurt the cache? - are prepared statements kept per user or per connection? - is the query planner global or somehow tied to users? I'd be glad to hear any opinions/suggestions. Best regards, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
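To picture the layout being described, a minimal sketch; the user and schema names are invented, and SET SESSION AUTHORIZATION needs a superuser connection:

    CREATE USER app_user_1 PASSWORD 'secret';
    CREATE SCHEMA AUTHORIZATION app_user_1;   -- schema named after, and owned by, the user
    -- with the default search_path of "$user", public, each user's unqualified
    -- table names resolve to their own schema
    SET SESSION AUTHORIZATION app_user_1;     -- what a reused connection would do per request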
Re: [PERFORM] Implications of having large number of users
I'd be glad to hear any opinions/suggestions. Many thanks to everyone who responded! Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] random slow query
Hi Sean, Sean Ma wrote: One month ago, this type of slow query happened about a few times per day. But recently, the slow query happens more frequently, at the rate of once every 10 minutes or less. There seems no relation to th What is your hardware (memory, CPU type and such)? This seems like a cache issue to me, but I can't tell for sure without some additional information on your system: 1) What is the amount of a) available memory b) free memory and c) memory available to i/o buffers? 2) What is the swap usage if any? 3) What is the CPU load? Any noticeable patterns in CPU load? You can use /usr/bin/top to obtain most of this information. Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] random slow query
Hi Sean, Well, the overall impression is your machine is badly overloaded. Look: top - 10:18:58 up 224 days, 15:10, 2 users, load average: 6.27, 7.33, 6 The load average of 6.5 means there are six and a half processes competing for the same CPU (and this system apparently has only one). That is approximately a 500% overload. Recommendation: either add more CPUs or eliminate process competition by moving them to other boxes. Tasks: 239 total, 1 running, 238 sleeping, 0 stopped, 0 zombie This supports what I said above. There are only 92 processes running on my laptop and I think that is too much. Do you have Apache running on the same machine? Cpu(s): 5.0%us, 0.7%sy, 0.0%ni, 61.5%id, 32.7%wa, 0.0%hi, 0.1%si, 0 Waiting time (wa) is rather high, which means processes wait on locks or for IO, another clue for concurrency issues on this machine. Mem: 32962804k total, 32802612k used, 160192k free, 325360k buffers Buffers are about 10% of all the memory which is OK, but I tend to give buffers some more room. Recommendation: eliminate unneeded processes, decrease (yes, decrease) the Postgres cache buffers if they are set too high. Swap: 8193140k total, 224916k used, 7968224k free, 30829456k cached 200M paged out. It should be zero except in an emergency. 3G of cached swap is a sign of some crazy paging activity in the past. Those unexplainable slowdowns are very likely caused by that. Didn't really see the pattern, typically the cpu load is only about 40% 40% is too much, really. I start worrying when it is above 10%. Conclusion: - the system bears more load than it can handle - the machine needs an upgrade - Postgres is competing with something (presumably Apache) - separate them. That should help. Cheers, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] random slow query
Sean, Yes, besides another mysql server running on the same server, Which is a really bad idea :-) The postgres shared cache is at 4G, is that too big? OK, I misread the total memory amount, which was 32G, and I thought it was 3G. Thanks to Scott Marlowe who pointed that out. In this case 4G for shared buffers is good. Actually, I take back my words on swap, too. 200M swapped is less important when you have plenty of memory. Regards, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] random slow query
Scott Marlowe wrote: The postgres shared cache is at 4G, is that too big? Not for a machine with 32Gig of ram. He could even add some more. Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] random slow query
Hi Scott, Well, we can't be sure OP's only got one core. In fact, we can, Sean posted what top -b -n 1 says. There was only one CPU line. the number of cores, it's the IO subsystem is too slow for the load. More cores wouldn't fix that. While I agree on the IO, more cores would definitely help to improve ~6.5 load average. My production PG server that runs ONLY pg has 222 processes on it. It's no big deal. Unless they're all trying to get cpu time, which generally isn't the case. 222 / 8 cores = ridiculous 27 processes per core, while the OP has 239. More likely just a slow IO subsystem. Like a single drive or something. adding drives in a RAID-1 or RAID-10 etc usually helps. Absolutely. This is kernel buffers, not pg buffers. It's set by the OS semi-automagically. In this case it's 325M out of 32 Gig, so it's well under 10%, which is typical. You can control the FS buffers indirectly by not allowing running processes to take too much memory. If you have like 40% free, there are good chances the system will use that memory for buffers. If you let them eat up 90% and swap out some more, there is no room for buffers and the system will have to swap out something when it really needs it. Not true. Linux will happily swap out seldom used processes to make room in memory for more kernel cache etc. You can adjust this tendency by setting swappiness. This is fine until one of those processes wakes up. Then your FS cache is dumped. It's 30G btw, Yeah, I couldn't believe my eyes :-) > 3G of cached swap and it's not swap that's cached, it's the kernel using extra memory to cache data to / from the hard drives. Oh please.. it *is*: http://www.linux-tutorial.info/modules.php?name=MContent&pageid=314 It's normal, and shouldn't worry anybody. In fact it's a good sign that you're not using way too much memory for any one process. It says exactly the opposite. Really? I have eight cores on my production servers and many batch jobs I run put all 8 cores at 90% for extended periods. Since that machine is normally doing a lot of smaller cached queries, it hardly even notices. The OP's machine is doing a lot of write ops, which is different. Yes, more hard drives / better caching RAID controller. +1 BTW, nearly full file system can be another source of problems. Cheers, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] random slow query
Scott Marlowe wrote: Also think about it, the OP has 8G of swap and 30Gig cached. How / why would you be caching 30Gigs worth of data when there's only 8G to cache anyway? You're right, I have misread it again :-) Cheers, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] random slow query
Scott Marlowe wrote: Close, but it'll use that memory for cache. Large buffers are not typical in linux, large kernel caches are. OK, we're talking about different things. You're right. If that tutorial says that, then that tutorial is wrong. I'm guessing what that tutorial is talking about, and what top is saying are two very different things though. Then it is an amazingly common misconception. I guess it first appeared in some book and was then reproduced by a zillion blogs. Essentially this is what Google brings you for a 'swap cache' query. Thanks for clearing that up. It's normal, and shouldn't worry anybody. In fact it's a good sign that you're not using way too much memory for any one process It says exactly the opposite. This time I agree :-) Cheers, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] random slow query
Scott Carey wrote: 222 / 8 cores = ridiculous 27 processes per core, while the OP has 239 That's not ridiculous at all. Modern OS's handle thousands of idle processes just fine. I meant that 27 was a ridiculously small number. Or you can control the behavior with the following kernel params: vm.swappiness vm.dirty_ratio vm.dirty_background_ratio Thanks for pointing that out! Actually, no. When a process wakes up only the pages that are needed are accessed. For most idle processes that wake up from time to time, a small bit of work is done, then they go back to sleep. This initial allocation does NOT come from the page cache, but from the "buffers" line in top. The os tries to keep some amount of free buffers not allocated to processes or pages available, so that allocation demands can be met without having to synchronously decide which buffers from page cache to eject. Wait a second, I'm trying to understand that :-) Did you mean that FS cache pages are first allocated from the buffer pages or that process memory being paged out to swap is first written to buffers? Could you clarify please? If queries are intermittently causing problems, it might be due to checkpoints. Make sure that the kernel parameters for dirty_background_ratio is 5 or less, and dirty_ratio is 10 or less. Scott, isn't dirty_ratio supposed to be less than dirty_background_ratio? I've heard that the system would automatically set dirty_ratio = dirty_background_ratio / 2 if that's not the case. Also, how could dirty_ratio be less than 5 if 5 is the minimal value? Regards, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] random slow query
Scott Carey wrote: the OS can either quickly allocate to the process or the page cache from the free buffers, or more slowly take from the page cache, or even more slowly page out a process page. Aha, now it all makes sense. I like to use the '5 second rule'. dirty_background_ratio should be sized so that it takes about 5 seconds to flush to disk in optimal conditions. dirty_ratio should be 2x to 5x this depending on your application's needs -- for a system with well tuned postgres checkpoints, smaller tends to be better to limit stalls while waiting for the checkpoint fsync to finish. Thanks a lot, this is invaluable information. Regards, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] - Slow Query
Hi Rui, I have this query (I think it is a simple one) Could you EXPLAIN ANALYZE the query and show the results please? Thanks, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] - Slow Query
> Merge Join (cost=111885.70..319492.88 rows=13016048 width=620) The outermost merge join has to go through 13 million rows. If you remove "distinct on (bien.uid)", you'll see that. > LEFT outer JOIN ville ON ville.uid = bien.ref_ville > LEFT outer JOIN freguesia_ville ON freguesia_ville.ref_ville =ville.uid This is not enough. You have to add this condition as well: AND bien.ref_ville = freguesia_ville.ref_ville In other words, when you link three tables by a common field, all three relationships should be explicitly expressed, otherwise you'll have this type of explosive row multiplication. Although I don't quite understand the purpose of the query, I don't think you need all those OUTER joins. Regards, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
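A cut-down sketch of the fix being described, using only the three tables involved (the real query selects many more columns); the final AND line is the condition being added:

    SELECT bien.uid
      FROM bien
      LEFT OUTER JOIN ville
             ON ville.uid = bien.ref_ville
      LEFT OUTER JOIN freguesia_ville
             ON freguesia_ville.ref_ville = ville.uid
            AND freguesia_ville.ref_ville = bien.ref_ville;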
Re: [PERFORM] - Slow Query
Rui Carvalho wrote: hum, thanks a lot for the quick answer. If it is not an abuse of your patience, what is the best alternative to the LEFT OUTER JOINs? I meant I wasn't sure whether you really meant *outer* joins. Too many of them looked kinda suspicious :-) If you *do* need them, then there is no alternative, as Scott said. Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Repeated Query is much slower in PostgreSQL8.2.4 than DB2 9.1
ning wrote: The log is really long, Which usually signals a problem with the query. but I compared the result of "explain analyze" for first and later executions, except for 3 "time=XXX" numbers, they are identical. They are supposed to be identical unless something is really badly broken. I agree with you that PostgreSQL is doing a different level of caching, I just wonder if there is any way to speed up PostgreSQL in this scenario, This is what EXPLAIN ANALYZE is for. Could you post the results please? Cheers, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Repeated Query is much slower in PostgreSQL8.2.4 than DB2 9.1
Interesting. It's quite a hairy plan even though all the branches are cut off by conditions ("never executed") so the query yields 0 rows. 0.018 is not a bad timing for that. However, if you run this query with different parameters, the result could be quite sad. There are some deeply nested loops with joins filtered by inner seq scans; this can be extremely expensive. Also, note that Left Merge Join with 16243 rows being reduced into just 1. With a database like DB2, the results you had are quite predictable: slow first time execution (because of the ineffective query) and then fast consequent responses because the tiny resultset produced by the query can be stored in the memory. Now, with Postgres the picture is different: all this complex stuff has to be executed each time the query is sent. I would rather rewrite the query without inner selects, using straight joins instead. Also, try to filter things before joining, not after. Correct me if I'm wrong, but in this particular case this seems pretty much possible. Cheers, Mike ning wrote: Hi Mike, Thank you for your explanation. The "explain analyze" command used is as follows, several integers are bound to '?'. - SELECT oid_,void,nameId,tag,intval,lowerbound,upperbound,crossfeeddir,feeddir,units,opqval,bigval,strval FROM (SELECT attributeOf,void,nameId,tag,intval,lowerbound,upperbound,crossfeeddir,feeddir,units,opqval,bigval,strval FROM DenormAttributePerf WHERE attributeof IN (SELECT oid_ FROM JobView WHERE JobView.JobId=? and JobView.assignedTo_=?) AND nameId in (?)) x RIGHT OUTER JOIN (SELECT oid_ FROM JobView WHERE JobView.JobId=? and JobView.assignedTo_=?) y ON attributeof = oid_ FOR READ ONLY - The result of the command is - QUERY PLAN --- Nested Loop Left Join (cost=575.60..1273.15 rows=81 width=568) (actual time=0.018..0.018 rows=0 loops=1) Join Filter: (x.attributeof = j1.oid_) -> Index Scan using job_tc1 on job j1 (cost=0.00..8.27 rows=1 width=4) (actual time=0.016..0.016 rows=0 loops=1) Index Cond: ((assignedto_ = 888) AND (jobid = 0)) -> Merge Left Join (cost=575.60..899.41 rows=16243 width=564) (never executed) Merge Cond: (v.void = b.void) -> Merge Left Join (cost=470.77..504.87 rows=2152 width=556) (never executed) Merge Cond: (v.void = res.void) -> Sort (cost=373.61..374.39 rows=310 width=544) (never executed) Sort Key: v.void -> Hash Left Join (cost=112.07..360.78 rows=310 width=544) (never executed) Hash Cond: (v.void = i.void) -> Hash Left Join (cost=65.40..303.17 rows=38 width=540) (never executed) Hash Cond: (v.void = r.void) -> Hash Left Join (cost=21.42..257.86 rows=5 width=532) (never executed) Hash Cond: (v.void = s.void) -> Nested Loop Left Join (cost=8.27..244.65 rows=5 width=16) (never executed) Join Filter: (v.containedin = a.id) -> Nested Loop (cost=8.27..16.57 rows=1 width=12) (never executed) -> HashAggregate (cost=8.27..8.28 rows=1 width=4) (never executed) -> Index Scan using job_tc1 on job j1 (cost=0.00..8.27 rows=1 width=4) (never executed) Index Cond: ((assignedto_ = 888) AND (jobid = 0)) -> Index Scan using attribute_tc1 on attribute a (cost=0.00..8.27 rows=1 width=12) (never executed) Index Cond: ((a.attributeof = j1.oid_) AND (a.nameid = 6)) -> Append (cost=0.00..137.60 rows=7239 width=12) (never executed) -> Index Scan using attribute_value_i on attribute_value v (cost=0.00..5.30 rows=9 width=12) (never executed) Index Cond: (v.containedin = a.id) -> Seq Scan on string_value v (cost=0.00..11.40 rows=140 width=12) (never executed) -> Seq Scan on integer_value v (cost=0.00..26.30 rows=1630 
width=12) (never executed) -> Seq Scan on bigint_value v (cost=0.00..25.10 rows=1510 width=12) (never executed)
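To illustrate the suggestion, a sketch of the same query written with straight joins instead of inner selects; the column list is trimmed and the bound parameters are kept as in the original, so treat this as a starting point rather than a drop-in replacement:

    SELECT j.oid_, a.void, a.nameid, a.tag, a.intval
      FROM JobView j
      LEFT OUTER JOIN DenormAttributePerf a
             ON a.attributeof = j.oid_
            AND a.nameid IN (?)
     WHERE j.jobid = ? AND j.assignedto_ = ?;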
Re: [PERFORM] select query performance question
Hi Thomas, How is 'messungen_v_dat_2009_04_13_gtyp_minute_tag_idx' defined? What is the row count for the table? Mike Hi, subject is the following type of query needed in a function to select data: SELECT ' 13.04.2009 12:00:00 ' AS zeit, 'M' AS ganglinientyp, m.zs_nr AS zs, j_ges, de_mw_abh_j_lkw(mw_abh) AS j_lkw, de_mw_abh_v_pkw(mw_abh) AS v_pkw, de_mw_abh_v_lkw(mw_abh) AS v_lkw, de_mw_abh_p_bel(mw_abh) AS p_bel FROM messungen_v_dat_2009_04_13 m INNER JOIN de_mw w ON w.nr = m.mw_nr WHERE m.ganglinientyp = 'M' AND ' 890 ' = m.minute_tag; explain analyse brings up Nested Loop (cost=0.00..66344.47 rows=4750 width=10) (actual time=134.160..19574.228 rows=4148 loops=1) -> Index Scan using messungen_v_dat_2009_04_13_gtyp_minute_tag_idx on messungen_v_dat_2009_04_13 m (cost=0.00..10749.14 rows=4750 width=8) (actual time=64.681..284.732 rows=4148 loops=1) Index Cond: ((ganglinientyp = 'M'::bpchar) AND (891::smallint = minute_tag)) -> Index Scan using de_nw_nr_idx on de_mw w (cost=0.00..10.69 rows=1 width=10) (actual time=4.545..4.549 rows=1 loops=4148) Index Cond: (w.nr = m.mw_nr) Total runtime: 19590.078 ms Seems quite slow to me. Is this query plan near to optimal or are there any serious flaws? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
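In case it helps gather the requested details, two quick catalog queries (a sketch; the names are taken from the plan above):

    -- index definition
    SELECT indexdef FROM pg_indexes
     WHERE indexname = 'messungen_v_dat_2009_04_13_gtyp_minute_tag_idx';

    -- approximate row count without scanning the table
    SELECT reltuples::bigint FROM pg_class
     WHERE relname = 'messungen_v_dat_2009_04_13';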
Re: [PERFORM] Number of tables
Fabio La Farcioli wrote: Hi to all, I am developing a web app for thousands of users (1,000/2,000). Each user has 2 tables of work... so I finally have 2,000 (users) x 2 tables = 4,000 tables! As someone with a ~50K-table database, I can tell you it's definitely possible to survive with such a layout :-) However, expect very slow (hours) pg_dump, \dt and everything else that requires reading schema information for the whole db. Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Number of tables
Greg Stark wrote: You may not expect to need to run queries which combine multiple users' data now but you will eventually. We store cross-user data in a separate schema, which solves all *our* problems. This doesn't work so great when each user is going to be specifying their own custom schema on the fly This works fine, at least we didn't encounter any issues with that. but that's not really what relational databases were designed for. Sometimes you have to do... you know, unusual things to meet some specific requirements, like independent user schemas. It's not a conventional web app we run :-) I'm not denying it's a bit of an extreme approach, but if one is forced to go down this path, it's quite walkable ;-) Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] What's faster?
> Because Postgres requires VACUUM ANALYZE more frequently on updated tables, > should I break this single field out into its own table, and if so what kind > of a speed up can I expect to achieve. I would be appreciative of any > guidance offered. Unless that field is part of the key, I wouldn't think that a vacuum analyze would be needed, as the key distribution isn't changing. I don't know if that is still true if that field is indexed. Tom? Even then, as I understand things vacuum analyze doesn't rebuild indexes, so I could see a need to drop and rebuild indexes on a regular basis, even if you move that field into a separate table. -- Mike Nolan ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
[PERFORM] Very slow update + not using clustered index
)::text -> Seq Scan on inventory i (cost=0.00..16173.92 rows=882192 width=14) (actual time=0.088..1942.173 rows=882192 loops=1) Total runtime: 174926.115 ms So, my first question is: why is the planner still sorting on price when isbn seems (considerably) quicker, and how can I force it to sort by isbn (if I even should)? The second question is: why, oh why does the update take such an obscenely long time to complete? The 175s (and even 216s) for the select seems reasonable given the size of the tables, but not 3000s to update the same rows. The processor (AMD 1.3GHz) is at 90%+ utilization for most of the execution time. I can post more information if it would be helpful, but this post is long enough already. TIA, and happy new year. -mike -- Mike Glover Key ID BFD19F2C <[EMAIL PROTECTED]> pgp0.pgp Description: PGP signature
Re: [PERFORM] Very slow update + not using clustered index
Tom- Thanks for the quick response. More details are inline. -mike On Thu, 01 Jan 2004 23:06:11 -0500 Tom Lane <[EMAIL PROTECTED]> wrote: > Mike Glover <[EMAIL PROTECTED]> writes: > AFAICS these plans are identical, and therefore the difference in > runtime must be ascribed to the time spent actually doing the updates. > It seems unlikely that the raw row inserts and updating the single > index could be quite that slow --- perhaps you have a foreign key > or trigger performance problem? There are no foreign keys or triggers for either of the tables. > Is this PG 7.4? Yes, PG 7.4 > > A quick experiment shows that if the planner does not have any reason > to prefer one ordering over another, the current coding will put the > last WHERE clause first: [snip]> > and so you could probably improve matters just by switching the order > of your WHERE clauses. Of course this answer will break as soon as > anyone touches any part of the related code, so I'd like to try to fix > it so that there is actually a principled choice made. Could you send > along the pg_stats rows for these columns? > It looks like the planner is already making a principled choice: bookshelf=> explain select s.* from summary s, inventory i where s.isbn = i.isbn and s.price_min = i.price; QUERY PLAN --- Merge Join (cost=491180.66..512965.72 rows=9237 width=58) Merge Cond: (("outer".price_min = "inner".price) AND ("outer"."?column8?" = "inner"."?column3?")) -> Sort (cost=361887.05..367000.05 rows=2045201 width=58) Sort Key: s.price_min, (s.isbn)::text -> Seq Scan on summary s (cost=0.00..44651.01 rows=2045201 width=58) -> Sort (cost=129293.61..131499.09 rows=882192 width=25) Sort Key: i.price, (i.isbn)::text -> Seq Scan on inventory i (cost=0.00..16173.92 rows=882192 width=25) (8 rows) bookshelf=> explain select s.* from summary s, inventory i where s.price_min = i.price and s.isbn = i.isbn; QUERY PLAN --- Merge Join (cost=491180.66..512965.72 rows=9237 width=58) Merge Cond: (("outer".price_min = "inner".price) AND ("outer"."?column8?" ="inner"."?column3?")) -> Sort (cost=361887.05..367000.05 rows=2045201 width=58) Sort Key: s.price_min, (s.isbn)::text -> Seq Scan on summary s (cost=0.00..44651.01 rows=2045201 width=58) -> Sort(cost=129293.61..131499.09 rows=882192 width=25) Sort Key: i.price, (i.isbn)::text -> Seq Scan on inventory i (cost=0.00..16173.92 rows=882192 width=25) (8 rows) Here are the pg_stats rows: bookshelf=> select * from pg_stats where schemaname='de' and tablename='inventory' and attname='isbn'; schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation +---+-+---+---+ +--+---+--- --- --+- de | inventory | isbn| 0 |14 | -1 | | | {0002551543,0198268211,0375507299,0486231305,0673395197,0767901576,0810 304430,0865738890,0931595029,1574160052,9971504014} | 1(1 row) bookshelf=> select * from pg_stats where schemaname='de' and tablename='inventory' and attname='price'; schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation +---+-+---+---+ +--+--- --- ---+--- --+- de | inventory | price | 0 |11 | 1628 | {59.95,0.00,54.88,53.30,60.50,64.25,73.63,49.39,50.02,53.37} | {0.259667,0.0063,0.0053,0.0047,0.0047,0.0047,0.0046 6667,0.0043,0.004,0.004
Re: [PERFORM] Slow query problem
On Thu, 08 Jan 2004 16:52:05 +1100 Bradley Tate <[EMAIL PROTECTED]> wrote: > Am I correct in interpreting that most time was spent doing the > sorting? Looks so. Your table is about 70MB total size, and it's getting loaded completely into memory (you have 12000 * 8k = 96M available). 26s to load 70MB from disk seems reasonable. The rest of the time is used for sorting. > Explain confuses the heck out of me and any help on how I could make > this run faster would be gratefully received. > You should bump sort_mem as high as you can stand. With only 8MB of sort memory available, you're swapping intermediate sort pages to disk -- a lot. Try the query with sort_mem set to 75MB (to do the entire sort in memory). -mike > Cheers, > > Bradley. > > > ---(end of > broadcast)--- TIP 8: explain analyze is your > friend -- Mike Glover Key ID BFD19F2C <[EMAIL PROTECTED]> pgp0.pgp Description: PGP signature
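For anyone trying this from the archive, the setting can be raised for a single session instead of editing postgresql.conf; a sketch, assuming a 7.x/8.0-era server where the parameter is still called sort_mem (value in kilobytes):

    SET sort_mem = 75000;   -- roughly 75MB, for this session only
    -- ... re-run the query, or its EXPLAIN ANALYZE, here ...
    RESET sort_mem;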
Re: [PERFORM] [pgsql-advocacy] MySQL+InnoDB vs. PostgreSQL test?
> Seriously, I am tired of this kind of question. You gotta get bold > enough to stand up in a "meeting" like that, say "guys, you can ask me > how this compares to Oracle ... but if you're seriously asking me how > this compares to MySQL, call me again when you've done your homework". Can they call you at the unemployment office? -- Mike Nolan ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] RAID or manual split?
It seems that if I know the type and frequency of the queries a database will be seeing, I could split the database by hand over multiple disks and get better performance than I would with a RAID array with similar hardware. Most of the data is volatile and easily replaceable (and the rest is backed up independently), so redundancy isn't important, and I'm willing to do some ongoing maintenance if I can get a decent speed boost. Am I misguided, or might this work? Details of my setup are below: Six large (3-7 Mrow) 'summary' tables, each being updated continuously by 5-20 processes with about 0.5 transactions/second/process. Periodically (currently every two weeks), join queries are performed between one of the 'summary' tables (same one each time) and each of the other five. Each join touches most rows of both tables; indexes aren't used. Results are written into a separate group of 'inventory' tables (about 500 Krow each), one for each join. There are frequent (100-1000/day) queries of both the inventory and summary tables using the primary key -- always using the index and returning < 10 rows. We're currently getting (barely) acceptable performance from a single 15k U160 SCSI disk, but db size and activity are growing quickly. I've got more disks and a battery-backed LSI card on order. -mike -- Mike Glover GPG Key ID BFD19F2C <[EMAIL PROTECTED]> pgp0.pgp Description: PGP signature
Re: [PERFORM] Fixed width rows faster?
> Frankly, the only reason to use anything other than TEXT is compatibility with > other databases and applications. You don't consider a requirement that a field be no longer than a certain length a reason not to use TEXT? -- Mike Nolan ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Fixed width rows faster?
> >>You don't consider a requirement that a field be no longer than a > >>certain length a reason not to use TEXT? > > Can't you just create a TEXT(255) field same as you can just create > VARCHAR (with no length) field? I think they're basically synonyms for > each other these days. I'll defer to the SQL standard gurus on this, as well as to the internals guys, but I suspect there is a difference between the standard itself and implementor details, such as how char, varchar, varchar2 and text are implemented. As long as things work as specified, I don't think the standard cares much about what's happening behind the curtain. -- Mike Nolan ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Fixed width rows faster?
> Actually, I don't. Good reason to have a check constraint on it though > (hint, check constraints can be changed while column types cannot be, at > this moment). Is there a way to copy a table INCLUDING the check constraints? If not, then that information is lost, unlike varchar(n). -- Mike Nolan ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
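A small sketch of the kind of length check being discussed (table and column names are hypothetical); the point is that the limit lives in a constraint that can be swapped out, rather than in the column's type:

    ALTER TABLE customers
      ADD CONSTRAINT customers_name_len CHECK (char_length(name) <= 255);

    -- raising the limit later means replacing the constraint, not altering the column type
    ALTER TABLE customers DROP CONSTRAINT customers_name_len;
    ALTER TABLE customers
      ADD CONSTRAINT customers_name_len CHECK (char_length(name) <= 500);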