Re: [PERFORM] Nested loop question
On Tuesday 16 December 2003 17:06, Nick Fankhauser - Doxpop wrote:
> Hi-
>
> I'm trying to optimize a query that I *think* should run very fast.
> Essentially, I'm joining two tables that have very selective indexes and
> constraining the query on an indexed field. (There's a third small lookup
> table in the mix, but it doesn't really affect the bottom line.)
>
> I'm unsure what is happening next. I notice that an index scan is occurring
> on actor_summary_pk, with an "actual time" of 9.15, but then it looks like
> a nested loop occurs at the next level to join these tables. Does this mean
> that each probe of the actor_summary index will take 9.15 msec, but the
> nested loop is going to do this once for each actor_id?

That's right - you need to multiply the actual time by the number of loops.
In your case this would seem to be about 33 seconds.

> ->  Index Scan using actor_summary_pk on actor_summary
>       (cost=0.00..8.11 rows=1 width=72)
>       (actual time=9.14..9.15 rows=1 loops=3639)
>         Index Cond: ("outer".actor_id = actor_summary.actor_id)

> The nested loop appears to be where most of my time is going, so I'm
> focusing on this area, but don't know if there is a better approach to this
> join.
>
> Is there a more efficient means than a nested loop to handle such a join?
> Would a different method be chosen if there was exactly one row in
> actor_summary for every row in actor?

Hmm - tricky to say in your case. PG has decided to filter on actor and then
look up the corresponding values in actor_summary. Given that you have 3
million rows in both tables, that seems a reasonable approach. You could
always try forcing different plans by switching the various ENABLE_HASHJOIN
etc. options (see the runtime configuration section of the manuals). I'm not
sure that will help you here though.

The fact that it's taking you 9ms to do each index lookup suggests to me that
it's going to disk each time. Does that sound plausible, or do you think you
have enough RAM to cache your large indexes?

--
Richard Huxton
Archonet Ltd

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
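The loop arithmetic described above can be sketched directly (my own illustration, not part of the original exchange): EXPLAIN ANALYZE reports the inner index scan's "actual time" per loop, so its total contribution to the query is roughly that per-loop figure times the loop count.

```python
# Estimate the total time spent in a nested loop's inner index scan.
# EXPLAIN ANALYZE's "actual time" is per loop, so multiply by "loops".
def nested_loop_total_ms(actual_time_ms: float, loops: int) -> float:
    """Approximate total time (ms) = per-loop actual time * loop count."""
    return actual_time_ms * loops

# Figures from the plan quoted above: 9.15 ms per probe, 3639 loops.
total_ms = nested_loop_total_ms(9.15, 3639)
print(round(total_ms / 1000, 1))  # about 33 seconds, as Richard notes
```

This is why a per-probe cost that looks tiny in isolation can dominate the plan once the loop count is large.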
Re: [PERFORM] Nested loop question
> The fact that it's taking you 9ms to do each index lookup suggests to me
> that it's going to disk each time. Does that sound plausible, or do you
> think you have enough RAM to cache your large indexes?

I'm sure we don't have enough RAM to cache all of our large indexes, so your
supposition makes sense. We have 1GB on this machine.

In responding to the performance problems we're having, one of the questions
has been adding memory vs. crafting "helper" tables to speed things up. The
issue is that this database needs to be able to scale easily to about 10
times its current size, so although we could easily triple the memory at
reasonable expense, we'd eventually hit a wall.

Is there any solid method to ensure that a particular index always resides
in memory? A hybrid approach that might scale reliably would be to bump up
our memory and then make sure key indexes are cached. However, I'm concerned
that if we didn't have a way to ensure that the indexes we choose remain
cached, we would have very inconsistent response times.

Thanks for your ideas!

-Nick

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
Re: [PERFORM] Nested loop performance
> It seems that your basic problem is that you're fetching lots of rows
> from two big ol' tables.
>
> It doesn't seem to me that there would be a substantially better plan
> for this query with your tables as they stand.

That's more or less the conclusion I had come to. I was just hoping someone
else could point out an approach I've been missing. (sigh!)

> If your data were more normalised, then your big scans might be quicker
> (because their rows would be smaller so they would hit fewer disk pages),

This started off as a 5-table join on well-normalized data. Unfortunately,
the actor table doesn't get any smaller, and the work involved in calculating
the "case_count" information on the fly was clearly becoming a problem -
particularly with actors that had a heavy caseload. (Busy attorneys and
judges.) The actor_summary approach makes these previous problem cases go
away, but the payback is that (as you correctly pointed out) queries on
average citizens who only have one case suffer from the de-normalized
approach.

We're currently considering the approach of just returning all of the rows
to our application and doing the aggregation and limit work in the app. The
inconsistency of the data makes it very tough for the query planner to come
up with a strategy that is always a winner.

Thanks for your thoughts!

-Nick

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
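The precomputed-summary idea discussed above can be sketched in SQL. This is a hedged illustration, not the poster's actual schema: the actor_summary and case_count names follow the thread, but the exact definition and the case_actor_link source table are assumptions of mine.

```sql
-- Hypothetical sketch of a denormalized summary table like the one
-- described in this thread: one row per actor, with the per-actor
-- case count precomputed so it need not be aggregated at query time.
CREATE TABLE actor_summary AS
SELECT
    actor_id,
    COUNT(*)     AS case_count,       -- expensive to compute on the fly
    MIN(case_id) AS single_case_id    -- meaningful when case_count = 1
FROM case_actor_link                  -- assumed link table, not shown in thread
GROUP BY actor_id;

-- The unique index mentioned in the earlier EXPLAIN output:
CREATE UNIQUE INDEX actor_summary_pk ON actor_summary (actor_id);
```

The trade-off is exactly the one Nick describes: heavy-caseload actors become cheap, but every lookup now joins against an extra wide table even for actors with a single case.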
Re: [PERFORM] Nested loop performance
> As a question, what does explain analyze give you if you
> set enable_nestloop=false; before trying the query?

Here are the results - it looks quite a bit more painful than the other plan,
although the wall time is in the same ballpark.

alpha=# explain analyze
alpha-# select
alpha-#   min(actor.actor_id) as actor_id,
alpha-#   min(actor.actor_entity_type) as actor_entity_type,
alpha-#   min(actor.role_class_code) as role_class_code,
alpha-#   min(actor.actor_full_name) as actor_full_name,
alpha-#   min(actor.actor_person_date_of_birth) as actor_person_date_of_birth,
alpha-#   min(actor.actor_entity_acronym) as actor_entity_acronym,
alpha-#   min(actor.actor_person_last_name) as actor_person_last_name,
alpha-#   min(actor.actor_person_first_name) as actor_person_first_name,
alpha-#   min(actor.actor_person_middle_name) as actor_person_middle_name,
alpha-#   min(actor.actor_person_name_suffix) as actor_person_name_suffix,
alpha-#   min(actor.actor_person_place_of_birth) as actor_person_place_of_birth,
alpha-#   min(actor.actor_person_height) as actor_person_height,
alpha-#   min(actor.actor_person_height_unit) as actor_person_height_unit,
alpha-#   min(actor.actor_person_weight) as actor_person_weight,
alpha-#   min(actor.actor_person_weight_unit) as actor_person_weight_unit,
alpha-#   min(actor.actor_person_ethnicity) as actor_person_ethnicity,
alpha-#   min(actor.actor_person_citizenship_count) as actor_person_citizenship_count,
alpha-#   min(actor.actor_person_hair_color) as actor_person_hair_color,
alpha-#   min(actor.actor_person_scars_marks_tatto) as actor_person_scars_marks_tatto,
alpha-#   min(actor.actor_person_marital_status) as actor_person_marital_status,
alpha-#   min(actor.actor_alias_for_actor_id) as actor_alias_for_actor_id,
alpha-#   min(to_char(data_source.source_last_update, 'MM/DD/ HH12:MI AM TZ')) as last_update,
alpha-#   min(actor_summary.single_case_public_id) as case_public_id,
alpha-#   min(actor_summary.single_case_id) as case_id,
alpha-#   sum(actor_summary.case_count) as case_count
alpha-# from
alpha-#   actor,
alpha-#   actor_summary,
alpha-#   data_source
alpha-# where
alpha-#   actor.actor_id = actor_summary.actor_id
alpha-#   and data_source.source_id = actor.source_id
alpha-#   and actor.actor_full_name_uppercase like upper('sanders%')
alpha-# group by
alpha-#   actor.actor_id
alpha-# order by
alpha-#   min(actor.actor_full_name_uppercase),
alpha-#   case_count desc,
alpha-#   min(actor_summary.case_disp_global_code)
alpha-# limit
alpha-#   1000;

QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=168919.98..168920.03 rows=20 width=548) (actual time=91247.95..91249.05 rows=1000 loops=1)
   ->  Sort  (cost=168919.98..168920.03 rows=20 width=548) (actual time=91247.95..91248.35 rows=1001 loops=1)
         Sort Key: min((actor.actor_full_name_uppercase)::text), sum(actor_summary.case_count), min((actor_summary.case_disp_global_code)::text)
         ->  Aggregate  (cost=168904.95..168919.54 rows=20 width=548) (actual time=91015.00..91164.68 rows=3590 loops=1)
               ->  Group  (cost=168904.95..168905.95 rows=201 width=548) (actual time=90999.87..91043.25 rows=3594 loops=1)
                     ->  Sort  (cost=168904.95..168905.45 rows=201 width=548) (actual time=90999.83..91001.57 rows=3594 loops=1)
                           Sort Key: actor.actor_id
                           ->  Hash Join  (cost=903.08..168897.24 rows=201 width=548) (actual time=25470.63..90983.45 rows=3594 loops=1)
                                 Hash Cond: ("outer".actor_id = "inner".actor_id)
                                 ->  Seq Scan on actor_summary  (cost=0.00..150715.43 rows=3455243 width=73) (actual time=8.03..52902.24 rows=3455243 loops=1)
                                 ->  Hash  (cost=902.57..902.57 rows=204 width=475) (actual time=25459.92..25459.92 rows=0 loops=1)
                                       ->  Hash Join  (cost=1.14..902.57 rows=204 width=475) (actual time=155.92..25451.25 rows=3639 loops=1)
                                             Hash Cond: ("outer".source_id = "inner".source_id)
                                             ->  Index Scan using actor_full_name_uppercase on actor  (cost=0.00..897.20 rows=223 width=463) (actual time=144.93..25404.10 rows=3639 loops=1)
                                                   Index Cond: ((actor_full_name_uppercase >= 'SANDERS'::character varying) AND (actor_full_name_uppercase < 'SANDERT'::character varying))
                                                   Filter: (actor_full_name_uppercase ~~ 'SANDERS%'::text)
                                             ->  Hash  (cost=1.11..1.11 rows=11 width=12) (actual time=10.66..10.66 rows=0 loops=1)
Re: [PERFORM] Excessive rows/tuples seriously degrading query
Hannu Krosing <[EMAIL PROTECTED]> writes:
> Chadwick, Russell kirjutas L, 13.12.2003 kell 00:40:
>> Can anyone explain why this table which has never had more than a
>> couple rows in it shows > 500k in the query planner even after running
>> vacuum full.

> It can be that there is an idle transaction somewhere that has locked a
> lot of rows (i.e. all your updates have been running inside the same
> transaction for hours or days)

In fact an old open transaction is surely the issue, given that the VACUUM
report shows a huge number of "kept" tuples:

>> INFO:  Pages 4773: Changed 1, reaped 767, Empty 0, New 0;
>>        Tup 613737: Vac 57620, Keep/VTL 613735/613713, UnUsed 20652,
>>        MinLen 52, MaxLen 52;
>>        Re-using: Free/Avail. Space 4322596/4322596;
>>        EndEmpty/Avail. Pages 0/4773.
>>        CPU 9.11s/13.68u sec elapsed 22.94 sec.

"Keep" is the number of tuples that are committed dead but can't be removed
yet, because there is some other open transaction that is old enough that it
should be able to see them if it looks.

Apparently the access pattern on this table is constant updates of the two
logical rows, leaving lots and lots of dead versions. You need to vacuum it
more often to keep down the amount of deadwood, and you need to avoid having
very-long-running transactions open when you vacuum.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
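A hedged sketch of how one might hunt for the offending open transaction: pg_stat_activity shows backends stuck "idle in transaction" (on 7.x the query column is only populated when stats_command_string is enabled; the column names below are the pre-9.2 ones contemporary with this thread).

```sql
-- Find backends holding a transaction open - these prevent VACUUM from
-- reclaiming the "Keep" tuples reported above:
SELECT procpid, usename, current_query
FROM pg_stat_activity
WHERE current_query LIKE '<IDLE>%';

-- Once the stale transaction is closed, vacuum the hot table frequently
-- to keep dead-version bloat down ("my_table" is a placeholder name):
VACUUM ANALYZE my_table;
```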
[PERFORM] Adding RAM: seeking advice & warnings of hidden "gotchas"
Hi-

After having done my best to squeeze better performance out of our
application by tuning within our existing resources, I'm falling back on
adding memory as a short-term solution while we get creative for a long-term
fix. I'm curious about what experiences others have had with the process of
adding big chunks of RAM. In particular, if I'm trying to encourage the OS to
cache more of my index information in RAM, what sort of configuration should
I do at both the PostgreSQL and OS level?

In a slightly off-topic vein, I'd also like to hear about it if anyone knows
about any gotchas at the OS level that might become a problem.

The server is a dual processor Athlon 1.2GHz box with hardware SCSI RAID. It
currently has 1 GB RAM, and we're planning to add one GB more for a total of
2GB. The OS is Debian Linux Kernel 2.4.x, and we're on PostgreSQL v7.3.2.

My current memory-related settings are:

  SHMMAX and SHMALL set to 128MB (OS setting)
  shared_buffers 8192 (64MB)
  sort_mem 16384 (16MB)
  effective_cache_size 65536 (512MB)

We support up to 70 active users, sharing a connection pool of 16
connections. Most of the queries center around 3 tables that are about 1.5 GB
each.

Thanks.
-Nick

---------------------------------------------------------------
Nick Fankhauser
[EMAIL PROTECTED]  Phone 1.765.965.7363  Fax 1.765.962.9788
doxpop - Court records at your fingertips - http://www.doxpop.com/

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Adding RAM: seeking advice & warnings of hidden "gotchas"
On Dec 17, 2003, at 11:57 AM, Nick Fankhauser wrote:

> Hi-
>
> After having done my best to squeeze better performance out of our
> application by tuning within our existing resources, I'm falling back on
> adding memory as a short-term solution while we get creative for a
> long-term fix. I'm curious about what experiences others have had with
> the process of adding big chunks of RAM. In particular, if I'm trying to
> encourage the OS to cache more of my index information in RAM, what sort
> of configuration should I do at both the PostgreSQL and OS level?

You need bigmem compiled in the kernel, which you should already have at the
1 gig level iirc. You should bump up your effective_cache_size, probably to
around 1.75 gig. I wouldn't bump up the shared buffers beyond where you have
them now. If you're swapping out sorts to disk, you may gain by boosting
sort_mem some, since you have the additional memory to use.

> The server is a dual processor Athlon 1.2GHz box with hardware SCSI RAID.
> It currently has 1 GB RAM, and we're planning to add one GB more for a
> total of 2GB. The OS is Debian Linux Kernel 2.4.x, and we're on
> PostgreSQL v7.3.2

I've got a machine running Debian Stable w/2.4.x, 1.3 ghz p3, 1.5 gig ram,
pg 7.2.4 and it's rock solid.

eric

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
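As a back-of-the-envelope check (my own sketch, not from the thread): in PostgreSQL of this vintage, effective_cache_size is expressed in 8 kB disk pages, so "around 1.75 gig" converts to a setting value as follows.

```python
# effective_cache_size in PostgreSQL 7.x is given in 8 kB disk pages.
# Convert a target cache size in GB to the postgresql.conf value.
def effective_cache_size_pages(gigabytes: float, page_kb: int = 8) -> int:
    """Number of page_kb-sized units covering `gigabytes` of cache."""
    return int(gigabytes * 1024 * 1024 // page_kb)

print(effective_cache_size_pages(0.5))   # 65536  - matches Nick's current 512MB setting
print(effective_cache_size_pages(1.75))  # 229376 - the suggested ~1.75 gig target
```

That the current setting of 65536 falls out of the same formula for 512 MB is a useful sanity check on the units.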
Re: [PERFORM] Adding RAM: seeking advice & warnings of hidden "gotchas"
If you have 3 1.5GB tables then you might as well go for 4GB while you're at
it. Make sure you've got a bigmem kernel either running or available, and
boost effective_cache_size by whatever amount you increase the RAM by. We run
a Quad Xeon/4GB server on Redhat 7.3 and it's solid as a rock.

There is no way I know of to get indexes preferentially cached over data
though.

Matt

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of Nick Fankhauser
> Sent: 17 December 2003 19:57
> To: [EMAIL PROTECTED] Org
> Subject: [PERFORM] Adding RAM: seeking advice & warnings of hidden
> "gotchas"
>
> Hi-
>
> After having done my best to squeeze better performance out of our
> application by tuning within our existing resources, I'm falling back on
> adding memory as a short-term solution while we get creative for a
> long-term fix. I'm curious about what experiences others have had with
> the process of adding big chunks of RAM. In particular, if I'm trying to
> encourage the OS to cache more of my index information in RAM, what sort
> of configuration should I do at both the PostgreSQL and OS level?
>
> In a slightly off-topic vein, I'd also like to hear about it if anyone
> knows about any gotchas at the OS level that might become a problem.
>
> The server is a dual processor Athlon 1.2GHz box with hardware SCSI RAID.
> It currently has 1 GB RAM, and we're planning to add one GB more for a
> total of 2GB. The OS is Debian Linux Kernel 2.4.x, and we're on
> PostgreSQL v7.3.2
>
> My current memory related settings are:
>
>   SHMMAX and SHMALL set to 128MB (OS setting)
>   shared buffers 8192 (64MB)
>   sort_mem 16384 (16MB)
>   effective_cache_size 65536 (512MB)
>
> We support up to 70 active users, sharing a connection pool of 16
> connections. Most of the queries center around 3 tables that are about
> 1.5 GB each.
>
> Thanks.
> -Nick
>
> ---------------------------------------------------------------
> Nick Fankhauser
> [EMAIL PROTECTED]  Phone 1.765.965.7363  Fax 1.765.962.9788
> doxpop - Court records at your fingertips - http://www.doxpop.com/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Why is restored database faster?
Dennis, Shridhar, and Neil,

Thanks for your input. Here are my responses:

I ran VACUUM FULL on the table in question. Although that did reduce "Pages"
and "UnUsed", the "SELECT *" query is still much slower on this installation
than in the new, restored one.

Old server:
# VACUUM FULL abc;
VACUUM
# VACUUM VERBOSE abc;
NOTICE:  --Relation abc--
NOTICE:  Pages 1526: Changed 0, Empty 0; Tup 91528; Vac 0, Keep 0, UnUsed 32.
         Total CPU 0.07s/0.52u sec elapsed 0.60 sec.
VACUUM

New server:
# VACUUM VERBOSE abc;
NOTICE:  --Relation abc--
NOTICE:  Pages 1526: Changed 0, Empty 0; Tup 91528; Vac 0, Keep 0, UnUsed 0.
         Total CPU 0.02s/0.00u sec elapsed 0.02 sec.
VACUUM

max_fsm_pages is at its default value, 1.

People don't have the practice of dumping and restoring just for the purpose
of improving performance, do they?

Neil asked how much disk space the database directory takes on each machine.
What directory is of interest? The whole thing takes up about 875 MB on each
machine.

-David

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Why is restored database faster?
On Thursday 18 December 2003 09:24, David Shadovitz wrote:
> Old server:
> # VACUUM FULL abc;
> VACUUM
> # VACUUM VERBOSE abc;
> NOTICE:  --Relation abc--
> NOTICE:  Pages 1526: Changed 0, Empty 0; Tup 91528; Vac 0, Keep 0, UnUsed
> 32. Total CPU 0.07s/0.52u sec elapsed 0.60 sec.
> VACUUM
>
> New server:
> # VACUUM VERBOSE abc;
> NOTICE:  --Relation abc--
> NOTICE:  Pages 1526: Changed 0, Empty 0; Tup 91528; Vac 0, Keep 0, UnUsed
> 0. Total CPU 0.02s/0.00u sec elapsed 0.02 sec.
> VACUUM
>
> max_fsm_pages is at its default value, 1.

Well, then the only issue left is file system defragmentation. Which file
system is this, anyway?

> People don't have the practice of dumping and restoring just for the
> purpose of improving performance, do they?

Well, at times it is required. Especially in an update-intensive
environment. And no database is immune to that.

> Neil asked how much disk space the database directory takes on each
> machine. What directory is of interest? The whole thing takes up about
> 875 MB on each machine.

That is fairly small, so it should not take much time. At a guess, the time
it takes to vacuum is more than the time to dump and reload.

Another quick way to defragment a file system is to copy the entire data
directory to another partition (shut down the postmaster first), delete it
from the original partition, and move it back. Contiguous writing to a
partition effectively defragments it. Try it and see if it helps. It could
be much less trouble than dump/restore.

HTH

Shridhar

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can get
through to the mailing list cleanly