Re: [PERFORM] Help with tuning this query (with explain analyze finally)
> 2) Force PG to drop the merge join via SET ENABLE_MERGEJOIN = FALSE;

Actually, it was 312 milliseconds, so it got worse.
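For anyone following along, a minimal sketch of that kind of planner experiment (the SELECT below is only a hypothetical stand-in for the query under test, with table names borrowed from later in this thread):

  SET enable_mergejoin = off;   -- session-local; steers the planner away from merge joins
  EXPLAIN ANALYZE
  SELECT s.id
  FROM shipment s
  JOIN shipment_status ss ON ss.id = s.current_status_id;
  SET enable_mergejoin = on;    -- restore the default when done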
[PERFORM] name search query speed
I have about 5M names stored on my DB. Currently the searches are very quick unless they are on a very common last name, i.e. SMITH. The index is always used, but I still hit 10-20 seconds on a SMITH or Jones search, and I average about 6 searches a second and max out at about 30/s. Any suggestions on how I could arrange things to make this search quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I can increase this speed w/o a HW upgrade.

thanx,
-jj-

--
You probably wouldn't worry about what people think of you if you could know how seldom they do.
    -- Olin Miller.
Re: [PERFORM] name search query speed
I'm not sure what the answer is, but maybe I can help? Would clustering the name index make this faster? I thought that would bunch up the pages so the names were more or less in order, which would improve search time. Just a guess though.

Ken

----- Original Message -----
From: "Jeremiah Jahn" <[EMAIL PROTECTED]>
To: "postgres performance"
Sent: Thursday, March 03, 2005 11:38 AM
Subject: [PERFORM] name search query speed

> I have about 5M names stored on my DB. Currently the searches are very quick unless, they are on a very common last name ie. SMITH. The Index is always used, but I still hit 10-20 seconds on a SMITH or Jones search, and I average about 6 searches a second and max out at about 30/s. Any suggestions on how I could arrange things to make this search quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I can increase this speed w/o a HW upgrade.
>
> thanx,
> -jj-
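A sketch of what that would look like, assuming a hypothetical "names" table with a last-name index (the index name echoes the "name_field" index mentioned later in the thread; note CLUSTER takes an exclusive lock and rewrites the table, so run it off-peak):

  -- 8.0-era syntax: CLUSTER indexname ON tablename
  CLUSTER name_field_idx ON names;
  ANALYZE names;   -- refresh planner statistics after the rewrite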
Re: [PERFORM] name search query speed
Yes, it does. I forgot to mention that I also have clustering on that table by my name_field index. My bad.

On Thu, 2005-03-03 at 12:00 -0500, Ken Egervari wrote:
> I'm not sure what the answer is but maybe I can help? Would clustering the name index make this faster? I thought that would bunch up the pages so the names were more or less in order, which would improve search time. Just a guess though.
>
> Ken

--
You probably wouldn't worry about what people think of you if you could know how seldom they do.
    -- Olin Miller.
Re: [PERFORM] Help with tuning this query (with explain analyze finally)
Ken,

Well, I'm a bit stumped on troubleshooting the actual query since Windows' poor time resolution makes it impossible to trust the actual execution times. Obviously this is something we need to look into for the Win32 port for 8.1.

> shared_buffers = 1000

This may be slowing up that merge join. Try resetting it to 6000. I'm not sure what system settings you might have to do on Windows to get it to support higher shared buffers; see the docs.

> #work_mem = 1024

Up this to 4096 for testing purposes; your production value will vary depending on several factors; see link below.

> #effective_cache_size = 1000

Increase this to the actual amount of RAM you have available, about 750MB (you do the math).

> #random_page_cost = 4

Leave this for now.

See www.powerpostgresql.com/PerfList for more information.

--
Josh Berkus
Aglio Database Solutions
San Francisco
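Putting those suggestions together, a hedged sketch of the postgresql.conf changes (8.0-era units: shared_buffers and effective_cache_size are counted in 8KB pages, work_mem in KB; the exact numbers depend on your machine):

  shared_buffers = 6000           # ~48MB of shared buffer cache
  work_mem = 4096                 # ~4MB per sort/hash, for testing
  effective_cache_size = 96000    # ~750MB; set to the RAM the OS can spare for caching
  #random_page_cost = 4           # leave at the default for now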
Re: [PERFORM] name search query speed
Jeremiah Jahn wrote:
> I have about 5M names stored on my DB. Currently the searches are very quick unless, they are on a very common last name ie. SMITH. The Index is always used, but I still hit 10-20 seconds on a SMITH or Jones search, and I average about 6 searches a second and max out at about 30/s. Any suggestions on how I could arrange things to make this search quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I can increase this speed w/o a HW upgrade.
>
> thanx,
> -jj-

It sounds like the problem is just that you have a lot of rows that need to be returned. Can you just put a limit on the query? And then change the client app to recognize when the limit is reached, and either give a link to more results, or refine the query, or something like that.

John =:->
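A minimal sketch of the limit idea (table and column names are hypothetical):

  -- Fetch one page of 40, plus one extra row so the app can tell
  -- whether a "more results" link is needed:
  SELECT id, first_name, last_name
  FROM names
  WHERE last_name = 'SMITH'
  ORDER BY first_name
  LIMIT 41 OFFSET 0;   -- next page: OFFSET 40, and so on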
Re: [PERFORM] name search query speed
Jeremiah,

> I have about 5M names stored on my DB. Currently the searches are very quick unless, they are on a very common last name ie. SMITH. The Index is always used, but I still hit 10-20 seconds on a SMITH or Jones search, and I average about 6 searches a second and max out at about 30/s. Any suggestions on how I could arrange things to make this search quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I can increase this speed w/o a HW upgrade.

First off, see http://www.powerpostgresql.com/PerfList about your configuration settings.

The problem you're running into with SMITH is that, if your query is going to return a substantial number of rows (variable, but generally anything over 5% of the table and 1000 rows), PostgreSQL is not able to make effective use of an index. This makes it fall back on a sequential scan, and based on your execution time, I'd guess that the table is a bit too large to fit in memory.

AFTER you've made the configuration changes above, AND run VACUUM ANALYZE on your database, if you're still having problems, post an EXPLAIN ANALYZE of the query to this list.

--
Josh Berkus
Aglio Database Solutions
San Francisco
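In concrete terms, the diagnostic sequence being suggested is roughly this (the table name is a made-up stand-in):

  VACUUM ANALYZE names;       -- reclaim dead tuples and refresh statistics
  EXPLAIN ANALYZE
  SELECT * FROM names
  WHERE last_name = 'SMITH';  -- post the resulting plan to the list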
Re: [PERFORM] name search query speed
Hi, Jeremiah,

Jeremiah Jahn schrieb:
> yes, it does. I forgot to mention, that I also have clustering on that table by my name_field index. My Bad.

Fine. Did you run ANALYZE and CLUSTER on the table after every large bunch of insertions / updates?

Markus

--
Markus Schaber | Dipl. Informatiker | Software Development GIS

Fight against software patents in EU! http://ffii.org/ http://nosoftwarepatents.org/
Re: [PERFORM] name search query speed
On Thu, 2005-03-03 at 11:46 -0600, John A Meinel wrote:
> It sounds like the problem is just that you have a lot of rows that need to be returned. Can you just put a limit on the query? And then change the client app to recognize when the limit is reached, and either give a link to more results, or refine query, or something like that.

Not really. About 2% of the returned rows are thrown away for security reasons, based on the current user, the security groups they belong to, and different flags in the data itself. So the count is generated on the fly, as it's needed for pagination in the app, which reports the total number of finds but only displays 40 of them. If anyone knows a way to determine the total number of matches without needing to iterate through them using JDBC, I'm all ears, as this would save me huge amounts of time and limit/offset would become an option.

--
"A power so great, it can only be used for Good or Evil!"
    -- Firesign Theatre, "The Giant Rat of Summatra"
Re: [PERFORM] name search query speed
On Thu, 2005-03-03 at 09:44 -0800, Josh Berkus wrote:
> First off, see http://www.powerpostgresql.com/PerfList about your configuration settings.
>
> The problem you're running into with SMITH is that, if your query is going to return a substantial number of rows (variable, but generally anything over 5% of the table and 1000 rows), PostgreSQL is not able to make effective use of an index. This makes it fall back on a sequential scan, and based on your execution time, I'd guess that the table is a bit too large to fit in memory.
>
> AFTER you've made the configuration changes above, AND run VACUUM ANALYZE on your database, if you're still having problems, post an EXPLAIN ANALYZE of the query to this list.

i.e. throw more hardware at it. All of the other things on the list, except for effective_cache_size, have already been done. I bumped it up from the default to 260. Will see if that makes a difference.

thanx,
-jj-

--
"A power so great, it can only be used for Good or Evil!"
    -- Firesign Theatre, "The Giant Rat of Summatra"
Re: [PERFORM] name search query speed
> -----Original Message-----
> From: Jeremiah Jahn [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, March 03, 2005 2:15 PM
> To: John A Meinel
> Cc: postgres performance
> Subject: Re: [PERFORM] name search query speed
>
> [...]
> So the count for this is generated on the fly needed for pagination in the app which expresses the total number of finds, but only displays 40 of them. If any one knows a way to determine the total number of matches without needing to iterate through them using jdbc, I'm all ears as this would save me huge amounts of time and limit/offset would become an option.

Is there a reason you can't do a count(field) query first? If so, you can get the number of records returned by setting absolute(-1) and getting the row number.

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East, Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129
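For reference, a sketch of the count-first approach in SQL (table and column names hypothetical; the JDBC alternative David mentions is to scroll the ResultSet to the end with absolute(-1) and read getRow()):

  -- 1) total for the pagination header:
  SELECT count(*) FROM names WHERE last_name = 'SMITH';

  -- 2) the page actually displayed:
  SELECT * FROM names WHERE last_name = 'SMITH'
  ORDER BY first_name LIMIT 40 OFFSET 0;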
Re: [PERFORM] name search query speed
Doesn't that cause two queries? I used to do it that way and cut my time substantially by counting in-line. Even though the results were cached, it still took more time. Also, since the tables are constantly being updated, the returned total would not always match the number of results of the second query.

On Thu, 2005-03-03 at 14:26 -0600, Dave Held wrote:
> Is there a reason you can't do a count(field) query first? If so, you can get the number of records returned by setting absolute(-1) and getting the row number.

--
"A power so great, it can only be used for Good or Evil!"
    -- Firesign Theatre, "The Giant Rat of Summatra"
Re: [PERFORM] name search query speed
Jeremiah Jahn wrote:
> On Thu, 2005-03-03 at 11:46 -0600, John A Meinel wrote:
> ...
> Not really, about 2% of the returned rows are thrown away for security reasons based on the current user, security groups they belong to and different flags in the data itself. So the count for this is generated on the fly needed for pagination in the app which expresses the total number of finds, but only displays 40 of them. If any one knows a way to determine the total number of matches without needing to iterate through them using jdbc, I'm all ears as this would save me huge amounts of time and limit/offset would become an option.

Well, what is wrong with "select count(*) from "? Are you saying 2% are thrown away, or only 2% are kept? Is this being done at the client side? Is there a way to incorporate the security info into the database, so that the query actually only returns the rows you care about? That seems like it would be a decent way to speed it up, if you can restrict the number of rows that it needs to look at.

There are other alternatives, such as materialized views, or temp tables, where you select into the temp table the rows that the user would request, and then you generate limit/offset from that. The first query would be a little slow, since it would get all the rows, but all subsequent accesses for that user could be really fast. (A sketch of the temp-table idea follows below.)

The other possibility is to do "limit 200", and then in your list of pages, you could have: 1, 2, 3, 4, 5, ... This means that you don't have to worry about getting 10,000 entries, which probably isn't really useful for the user anyway, and you can still break things into 40 entry pages, just 200 entries at a time.

John =:->
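A hedged sketch of that temp-table pagination idea (all table, column, and filter names here are invented for illustration):

  -- One slow-ish query per search, materializing only the rows this user may see:
  CREATE TEMP TABLE user_results AS
    SELECT n.id, n.first_name, n.last_name
    FROM names n
    WHERE n.last_name = 'SMITH'
      AND n.restricted = false;   -- stand-in for the real per-user security filter

  SELECT count(*) FROM user_results;       -- exact total for pagination
  SELECT * FROM user_results
  ORDER BY first_name LIMIT 40 OFFSET 0;   -- subsequent page fetches are fast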
Re: [PERFORM] name search query speed
Hi, Jeremiah,

Jeremiah Jahn schrieb:
> doesn't that cause two queries? I used to do it that way and cut my time substantially by counting in-line. Even though the results were cached it still took more time.

This sounds rather strange.

> Also since the tables is constantly be updated the returned total would not always match the number of results on the second query.

Did you run both queries in the same transaction, with transaction isolation level set to serializable? If yes, you found a serious bug in the PostgreSQL transaction engine.

Markus

--
Markus Schaber | Dipl. Informatiker | Software Development GIS

Fight against software patents in EU! http://ffii.org/ http://nosoftwarepatents.org/
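To make the suggestion concrete: in a serializable transaction both statements run against the same snapshot, so the count and the page cannot disagree. A minimal sketch (query shapes hypothetical):

  BEGIN;
  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;   -- must come right after BEGIN
  SELECT count(*) FROM names WHERE last_name = 'SMITH';
  SELECT * FROM names WHERE last_name = 'SMITH'
  ORDER BY first_name LIMIT 40;
  COMMIT;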
Re: [PERFORM] Help with tuning this query (with explain analyze finally)
Josh,

I did everything you said and my query does perform a bit better. I've been getting speeds from 203 to 219 to 234 milliseconds now. I tried increasing the work mem and the effective cache size from the values you provided, but I didn't see any more improvement. I've tried looking into setting the shared buffers for Windows XP, but I'm not sure how to do it. I'm looking in the manual at:
http://www.postgresql.org/docs/8.0/interactive/kernel-resources.html#SYSVIPC-PARAMETERS

It doesn't mention Windows at all. Does anyone have any ideas on how to fix this?

Here is the new explain analyze.

Sort (cost=1996.21..1996.26 rows=17 width=165) (actual time=297.000..297.000 rows=39 loops=1)
  Sort Key: ss.date
  ->  Merge Right Join (cost=1951.26..1995.87 rows=17 width=165) (actual time=297.000..297.000 rows=39 loops=1)
        Merge Cond: ("outer".id = "inner".driver_id)
        ->  Index Scan using driver_pkey on driver d (cost=0.00..42.16 rows=922 width=43) (actual time=0.000..0.000 rows=922 loops=1)
        ->  Sort (cost=1951.26..1951.30 rows=17 width=122) (actual time=297.000..297.000 rows=39 loops=1)
              Sort Key: s.driver_id
              ->  Hash Join (cost=586.48..1950.91 rows=17 width=122) (actual time=219.000..297.000 rows=39 loops=1)
                    Hash Cond: ("outer".carrier_code_id = "inner".id)
                    ->  Merge Join (cost=571.97..1931.95 rows=830 width=87) (actual time=219.000..297.000 rows=310 loops=1)
                          Merge Cond: ("outer".current_status_id = "inner".id)
                          ->  Index Scan using shipment_current_status_id_idx on shipment s (cost=0.00..2701.26 rows=60307 width=66) (actual time=0.000..62.000 rows=27711 loops=1)
                                Filter: (is_purged = false)
                          ->  Sort (cost=571.97..576.38 rows=1766 width=21) (actual time=125.000..156.000 rows=6902 loops=1)
                                Sort Key: ss.id
                                ->  Hash Join (cost=1.11..476.72 rows=1766 width=21) (actual time=0.000..93.000 rows=6902 loops=1)
                                      Hash Cond: ("outer".release_code_id = "inner".id)
                                      ->  Index Scan Backward using current_status_date_idx on shipment_status ss (cost=0.00..387.35 rows=14122 width=21) (actual time=0.000..16.000 rows=14925 loops=1)
                                            Index Cond: (date >= (('now'::text)::date - 31))
                                      ->  Hash (cost=1.10..1.10 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=1)
                                            ->  Seq Scan on release_code rc (cost=0.00..1.10 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
                                                  Filter: ((number)::text = '9'::text)
                    ->  Hash (cost=14.51..14.51 rows=2 width=35) (actual time=0.000..0.000 rows=0 loops=1)
                          ->  Nested Loop (cost=4.92..14.51 rows=2 width=35) (actual time=0.000..0.000 rows=2 loops=1)
                                ->  Index Scan using person_pkey on person p (cost=0.00..5.73 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
                                      Index Cond: (id = 355)
                                ->  Hash Join (cost=4.92..8.75 rows=2 width=39) (actual time=0.000..0.000 rows=2 loops=1)
                                      Hash Cond: ("outer".id = "inner".carrier_id)
                                      ->  Seq Scan on carrier c (cost=0.00..3.54 rows=54 width=4) (actual time=0.000..0.000 rows=54 loops=1)
                                      ->  Hash (cost=4.92..4.92 rows=2 width=43) (actual time=0.000..0.000 rows=0 loops=1)
                                            ->  Hash Join (cost=3.04..4.92 rows=2 width=43) (actual time=0.000..0.000 rows=2 loops=1)
                                                  Hash Cond: ("outer".carrier_id = "inner".carrier_id)
                                                  ->  Seq Scan on carrier_code cc (cost=0.00..1.57 rows=57 width=35) (actual time=0.000..0.000 rows=57 loops=1)
                                                  ->  Hash (cost=3.04..3.04 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=1)
                                                        ->  Index Scan using carrier_to_person_person_id_idx on carrier_to_person ctp (cost=0.00..3.04 rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=1)
                                                              Index Cond: (355 = person_id)
Total runtime: 297.000 ms
Re: [PERFORM] Help with tuning this query (with explain analyze finally)
Ken Egervari wrote:
> Josh, I did everything you said and my query does perform a bit better. I've been getting speeds from 203 to 219 to 234 milliseconds now. I tried increasing the work mem and the effective cache size from the values you provided, but I didn't see any more improvement. I've tried looking into setting the shared buffers for Windows XP, but I'm not sure how to do it. I'm looking in the manual at: http://www.postgresql.org/docs/8.0/interactive/kernel-resources.html#SYSVIPC-PARAMETERS

You probably don't need to change anything for Windows. If you set shared_buffers too high, then postgres won't start. If it is starting, then you don't need to modify the OS to get more shared buffers. (For instance, on my Mac, I can't get shared_buffers > 500 without changing things, but on Windows I run with 3000 and no modification.)

> It doesn't mention windows at all. Does anyone have any ideas on how to fix this?

Do you need the interior sort? It's taking ~93ms to get 7k rows from shipment_status, and then another 30ms to sort them. This isn't a lot, so it might be fine.

Also, did you ever try CLUSTER current_status_date_idx ON shipment_status? This groups the rows in shipment_status by their status date, which helps put items with the same date next to each other. This may affect other portions of the query, or other queries. Also, if you are inserting sequentially, it would seem that the items would already be naturally near each other based on date.

The next big cost is having to merge the 28k rows with the fast hash plan, which takes about 80ms. I guess the biggest issue is that you are doing a lot of work, and it takes time to do it. Also, I've noticed that this query is being run with exactly the same data. Which is good to compare two methods. But remember to test on multiple potential values. You might be better off one way with this query, but much worse for a different dataset. I noticed that this seems to have fewer rows than what postgres thinks the *average* number would be. (It predicts 60k and you only get 28k rows.)

If this query is performed a lot, and you can be okay with a slight delay in updating, you could always switch to some sort of lazy materialized view.

You could also always throw more hardware at it. :) If the shipment_status is one of the bottlenecks, create a 4-disk raid10 and move the table over. I don't remember what your hardware is, but I don't remember it being a quad opteron with 16GB ram, and 20 15k SCSI disks, with the transaction log on a solid state disk. :)

Why do you need the query to be 30ms? ~250ms is still pretty fast. If you are needing updates faster than that, you might look more into *why* and then handle it from a higher level.

And naturally, the most important thing is to test it under load. 250ms is pretty good, but if under load it goes back to 6s, then we probably should look for different alternatives. Also, what is the load that is causing the problem? Is it that you have some other big seqscans which are causing all of your tables to go out of cache?

Also, I believe I remember you saying that your production server is a P4, is that a single P4? Because I know postgres prefers Opterons to Pentium Xeons when in a multiprocessor machine. Look through the archives about spinlocks and the context switch bug (context storm, etc.). Plus, since opterons are 64-bit, you can throw a lot more RAM at them. I believe opterons outperform xeons for the same cost, *and* you can scale them up with extra ram.

But remember, the biggest bottleneck is almost *always* the I/O. So put more & faster disks into the system first.

John =:->
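A hedged sketch of the lazy-materialized-view idea John mentions (column choices are guesses based on the plan in Ken's message; the refresh interval and table name are arbitrary):

  -- Summary table rebuilt periodically (e.g. every minute from cron or the app),
  -- trading a little staleness for much cheaper reads:
  CREATE TABLE shipment_summary AS
    SELECT s.id, s.driver_id, s.carrier_code_id, ss.date
    FROM shipment s
    JOIN shipment_status ss ON s.current_status_id = ss.id
    WHERE s.is_purged = false
      AND ss.date >= current_date - 31;

  -- Refresh step, run on a schedule:
  BEGIN;
  DELETE FROM shipment_summary;
  INSERT INTO shipment_summary
    SELECT s.id, s.driver_id, s.carrier_code_id, ss.date
    FROM shipment s
    JOIN shipment_status ss ON s.current_status_id = ss.id
    WHERE s.is_purged = false
      AND ss.date >= current_date - 31;
  COMMIT;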
Re: [PERFORM] name search query speed
Jeremiah Jahn wrote:
> I have about 5M names stored on my DB. Currently the searches are very quick unless, they are on a very common last name ie. SMITH. The Index is always used, but I still hit 10-20 seconds on a SMITH or Jones search, and I average about 6 searches a second and max out at about 30/s. Any suggestions on how I could arrange things to make this search quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I can increase this speed w/o a HW upgrade.

If it's just "SMITH", the only fix is to throw more hardware at the problem. I've got my own database of medical providers & facilities in the millions, and anytime somebody tries to search for MEDICAL FACILITY, it takes forever. I've tried every optimization possible, but when you have 500K records with the word "MEDICAL" in them, what can you do? You've got to check all 500K records to see if they match your criteria.

For multi-word searches, what I've found does work is to periodically generate stats on word frequencies and use those stats to search the least common words first. For example, if somebody enters "ALTABATES MEDICAL HOSPITAL", I can get the ~50 providers with ALTABATES in the name and then do a 2nd and 3rd pass to filter against MEDICAL and HOSPITAL.
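A hedged sketch of that least-common-word-first strategy (the schema is invented for illustration; it assumes a provider_words table with one row per provider/word pair):

  -- Periodically regenerated frequency stats:
  CREATE TABLE word_freq AS
    SELECT word, count(*) AS freq
    FROM provider_words
    GROUP BY word;

  -- Per word_freq, ALTABATES is the rarest of the three terms,
  -- so start there and filter the small result set with the rest:
  SELECT p.*
  FROM providers p
  JOIN provider_words w ON w.provider_id = p.id
  WHERE w.word = 'ALTABATES'
    AND p.name LIKE '%MEDICAL%'
    AND p.name LIKE '%HOSPITAL%';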
Re: [PERFORM] Help with tuning this query (with explain analyze finally)
Josh,

Thanks so much for your comments. They are incredibly insightful and you clearly know your stuff. It's so great that I'm able to learn so much from you. I really appreciate it.

> Do you need the interior sort? It's taking ~93ms to get 7k rows from shipment_status, and then another 30ms to sort them. This isn't a lot, so it might be fine.

Running the query without the sort doesn't actually improve performance unfortunately, which I find strange. I think the analyze is giving bad feedback, because taking all sorts out completely makes no difference in performance. Dan Tow's book actually said the same thing... how sorting rarely takes up the bulk of the work. I know you didn't like his book much, but I had observed that in my experience too.

> Also, did you ever try CLUSTER current_status_date_idx ON shipment_status? This groups the rows in shipment_status by their status date, which helps put items with the same date next to each other. This may affect other portions of the query, or other queries. Also, if you are inserting sequentially, it would seem that the items would already be naturally near each other based on date.

Yes, this was one of the first things I tried actually, and it is currently clustered. Since shipment status comes into our system in real time, the dates are more or less in order as well.

> The next big cost is having to merge the 28k rows with the fast hash plan, which takes about 80ms. I guess the biggest issue is that you are doing a lot of work, and it takes time to do it. Also, I've noticed that this query is being run with exactly the same data. Which is good to compare two methods. But remember to test on multiple potential values. You might be better off one way with this query, but much worse for a different dataset. I noticed that this seems to have fewer rows than what postgres thinks the *average* number would be. (It predicts 60k and you only get 28k rows.)

Well, the example where p.id = 355 is an above normal case where performance is typically bad. If a user's company has very few shipments and shipment_status rows, performance isn't going to matter much and those queries usually perform much faster. I really needed to tune this for the larger customers, who do have thousands of rows for their entire company and will probably reach 6 digits by the end of next year. For the person 355, they've only been on the system for 3 months and they already have 27700 rows. Even if this makes the smaller customers a bit slower, I think it's worth it if I can speed up cases like this, who all have very similar data distribution.

> If this query is performed a lot, and you can be okay with a slight delay in updating, you could always switch to some sort of lazy materialized view.

I thought about this, but real-time accuracy is very important since shipment and shipment_status are both updated in real time 24/7/365. I think I might be able to cache it within the application for 60 seconds at most, but it would make little difference since people tend to refresh within that time anyway. It's very important that real-time information exists though.

> You could also always throw more hardware at it. :) If the shipment_status is one of the bottlenecks, create a 4-disk raid10 and move the table over. I don't remember what your hardware is, but I don't remember it being a quad opteron with 16GB ram, and 20 15k SCSI disks, with the transaction log on a solid state disk. :)

That sounds like an awesome system. I'd love to have something like that. Unfortunately, the production server is just a single processor machine with 1 GB ram. I think throwing more disks at it is probably the best bet, moving the shipment and shipment_status tables over as you suggested. That's great advice.

> Why do you need the query to be 30ms? ~250ms is still pretty fast. If you are needing updates faster than that, you might look more into *why* and then handle it from a higher level.

30ms is a good target, although I guess I was naive for setting that goal perhaps. I've just taken queries that ran at 600ms and, with 1 or 2 indexes, they went down to 15ms.

Let's say we have 200 users signed into the application at the same time. The application refreshes their shipment information automatically to make sure it's up to date on the user's screen. The application will execute the query we are trying to tune every 60 seconds for most of these users. Users can set the refresh time to be higher, but 60 is the lowest amount, so I'm just assuming everyone has it at 60. Anyway, if you have 200 users logged in, that's 200 queries in the 60 second period, which is about 3-4 queries every second. As you can see, it's getting maxed out, and because of bad luck, the queries are bunched together and are being called at the same time, making 8-9 queries in the same second and that's where the performance is starting to degrade. I just know that if I could get this down to 30 ms, or even 100