[PERFORM] Query kills machine.
Hi all,

I've attached the query in query.sql. I'm using postgres 7.3.4 on Linux kernel
2.4.26-custom (/proc/sys/vm/overcommit_memory = 0 this time).

free:
                 total       used       free     shared    buffers     cached
    Mem:       1810212    1767384      42828          0       5604    1663908
    -/+ buffers/cache:      97872    1712340
    Swap:       505912     131304     374608

After I rebuilt the database, the query was fast (28255.12 msec). After one
night's insertion into the tables the query selects from, it suddenly uses up
all resources, the kernel starts swapping, and I haven't seen the query
actually finish when this happens. I did VACUUM ANALYZE and REINDEX, but that
doesn't help.

I attached the EXPLAIN ANALYZE output from before this happens, and the
EXPLAIN plan from when the query no longer finishes. The one noticeable
difference is that before, it used merge joins, and after, it used hash joins.

When the query was slow, I tried SET enable_hashjoin TO off for this query,
and the query finished relatively fast again (316245.16 msec). I attached the
output of that EXPLAIN ANALYZE as well, along with the postgres settings.

Can anyone shed some light on what's happening here? I can't figure it out.

Kind Regards
Stefan

[Attachment: query.sql]

Aggregate  (cost=87597.84..89421.82 rows=2702 width=484) (actual time=22727.88..28164.74 rows=12040 loops=1)
  Filter: ((((sum(qty_on_hand) + sum(qty_in_goods)) - sum(qty_dp)) - sum(qty_out_goods)) < 0::numeric)
  ->  Group  (cost=87597.84..88003.17 rows=27022 width=484) (actual time=22727.45..23242.01 rows=42705 loops=1)
        ->  Sort  (cost=87597.84..87665.40 rows=27022 width=484) (actual time=22727.43..22756.74 rows=42705 loops=1)
              Sort Key: s.sku, s.stktype_code, i.sku_descr, br.cluster_code, br.cluster_descr
              ->  Merge Join  (cost=84388.96..85608.78 rows=27022 width=484) (actual time=20303.41..21814.25 rows=42705 loops=1)
                    Merge Cond: (("outer".group_code = "inner".group_code) AND ("outer".sku = "inner".sku))
                    ->  Sort  (cost=64472.34..64489.67 rows=6930 width=388) (actual time=16503.56..16530.23 rows=42705 loops=1)
                          Sort Key: s.group_code, os.sku
                          ->  Merge Join  (cost=63006.13..64030.25 rows=6930 width=388) (actual time=14394.48..15794.71 rows=42705 loops=1)
                                Merge Cond: (("outer".cluster_brn = "inner".cluster_code) AND ("outer".sku = "inner".sku))
                                ->  Index Scan using old_sku_uidx1 on old_sku os  (cost=0.00..797.79 rows=17799 width=64) (actual time=0.02..47.66 rows=17799 loops=1)
                                ->  Sort  (cost=63006.13..63045.07 rows=15574 width=324) (actual time=14393.77..14556.50 rows=132703 loops=1)
                                      Sort Key: br.cluster_code, s.sku
                                      ->  Merge Join  (cost=61645.75..61921.64 rows=15574 width=324) (actual time=4862.56..6078.94 rows=132703 loops=1)
                                            Merge Cond: ("outer".brn_code = "inner".brn_code)
                                            ->  Sort  (cost=61587.79..61626.73 rows=15574 width=228) (actual time=4859.23..5043.43 rows=132703 loops=1)
                                                  Sort Key: s.brn_code
                                                  ->  Index Scan using stmst_sku_idx4 on stmst_sku s  (cost=0.00..60503.30 rows=15574 width=228) (actual time=0.07..1078.30 rows=132703 loops=1)
                                                        Index Cond: (fpp_code = '200408'::text)
                                            ->  Sort  (cost=57.96..59.62 rows=667 width=96) (actual time=3.26..91.93 rows=133005 loops=1)
                                                  Sort Key: br.brn_code
                                                  ->  Seq Scan on master_branch_descr br  (cost=0.00..26.67 rows=667 width=96) (actual time=0.02..1.13 rows=667 loops=1)
                    ->  Sort  (cost=19916.61..20306.53 rows=155968 width=96) (actual time=3797.71..3914.26 rows=184223 loops=1)
                          Sort Key: i.group_code, i.sku
                          ->  Seq Scan on master_sku_descr i  (cost=0.00..6463.68 rows=155968 width=96) (actual time=0.01..293.74 rows=155968 loops=1)
  SubPlan
    ->  Aggregate  (cost=6.02..6.02 rows=1 width=8) (actual time=0.04..0.04 rows=1 loops=14456)
          ->  Index Scan using gir_oustanding_idx1 on gir_outstanding  (cost=0.00..6.01 rows=1 width=8) (actual time=0.03..0.03 rows=0 loops=14456)
                Index Cond: ((cluster_brn = $0) AND (sku = $1) AND (stktype_code = $2))
    ->  Aggregate  (cost=6.02..6.02 rows=1 width=8) (actual time=0.04..0.04 rows=1 loops=14456)
          ->  Index Scan using gir_oustanding_idx1 on gir_outstanding
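For readers following along, the workaround Stef describes is scoped to a
single session, so the global configuration stays untouched. A minimal sketch
(the SELECT is a simplified stand-in for the attached query.sql, borrowing a
table from the plan above):

    -- Disable hash joins for the current session only.
    SET enable_hashjoin TO off;

    -- Re-run the problem query under the new setting.
    EXPLAIN ANALYZE
    SELECT count(*) FROM stmst_sku WHERE fpp_code = '200408';

    -- Put the planner back to its default behaviour afterwards.
    RESET enable_hashjoin;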
Re: [PERFORM] Query kills machine.
For starters,

    shared_buffers = 110592
    wal_buffers = 400
    sort_mem = 30720
    vacuum_mem = 10240
    checkpoint_segments = 30
    commit_delay = 5000
    commit_siblings = 100
    effective_cache_size = 201413

Try more like this:

    shared_buffers = 3
    wal_buffers =
    sort_mem = 4096
    vacuum_mem = 10240
    checkpoint_segments = 30
    commit_delay =
    commit_siblings =
    effective_cache_size = 10

Chris
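Some quick arithmetic shows why the original settings are aggressive for this
machine (unit conversions from the numbers upthread, not new measurements):
shared_buffers is counted in 8 kB pages, and sort_mem in kilobytes per sort
step, per backend:

    shared_buffers: 110592 pages x 8 kB = 884736 kB  (~864 MB of shared memory)
    sort_mem:       30720 kB  (~30 MB per sort step, per backend)
    total RAM:      1810212 kB  (~1.7 GB, from the free output above)

A single backend running a plan with five sorts can already claim ~150 MB on
top of the ~864 MB of shared buffers, which on a 1.7 GB box leaves little
headroom before the kernel starts swapping.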
Re: [PERFORM] postgresql performance with multimedia
On 8/24/2004 1:08 AM, my ho wrote:
> --- Jan Wieck <[EMAIL PROTECTED]> wrote:
>> On 8/17/2004 8:44 PM, my thi ho wrote:
>>> Hi,
>>> I am working on a project which explores postgresql to store
>>> multimedia data. In detail, I am trying to work with the buffer
>>> management part of the postgres source code, and try to improve the
>>> performance. I searched the web but could not find much useful
>>> information.
>>
>> What version of PostgreSQL are you looking at? Note that the buffer
>> cache replacement strategy was completely changed for version 8.0,
>> which is currently in BETA test. A description of the algorithm can
>> be found in the README file in src/backend/storage/bufmgr.
>
> Oki, thanks for the information. I had a look at 8.0 beta, but cannot
> start the statistics collector. (I had posted this error message
> before for help, but haven't really got any clue to fix it.)
>
>     LOG: could not create IPv6 socket: Address family not supported by protocol
>     LOG: could not bind socket for statistics collector: Cannot assign requested address
>     LOG: disabling statistics collector for lack of working socket

Tom Lane answered that question. The code in question resolves
"localhost" with getaddrinfo() and then tries to create and bind a UDP
socket to all returned addresses. For some reason "localhost" on your
system resolves to an address that is not available for bind(2).

> btw, what I want to ask here is: does postgreSQL have any kind of
> read-ahead buffer implemented? 'cos it would be useful in the
> multimedia case when we always scan the large table for continuous
> data.

Since there is no mechanism to control that data is stored contiguously
in the tables, what would that be good for?

Jan

--
#=======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#==================================================== [EMAIL PROTECTED] #
Re: [PERFORM] Query kills machine.
Stef wrote:
> Christopher Kings-Lynne mentioned:
> => sort_mem = 4096
>
> Reducing sort_mem to 4096 seems to make it run in a reasonable time
> again. Any idea why? The database does a whole lot of huge sorts
> every day, so I thought upping this parameter would help.
>
> A couple of queries do seem to run slower now that I reduced the
> sort_mem. The shared buffers still make a significant difference when
> I increase them.

Well, you have to take into account that sort_mem is not the total
memory allocated for sorting, but is allocated per connection, and in
complex expressions several times over. So if you sort a lot, it can
push your operating system off the cliff: it might start reaping things
that shouldn't be reaped and start swapping. If that happens,
_everything_ on that box will get slow...

Shared buffers, on the other hand, are only allocated once.

Regards,
Magnus
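A common way to keep the nightly sorts fast without risking the whole box is
to raise sort_mem only in the session that runs them, and leave the global
default low. A sketch (the query is a placeholder built from table and column
names upthread, not Stef's actual batch job):

    -- postgresql.conf keeps the safe default: sort_mem = 4096
    -- The batch job raises it for its own connection only:
    SET sort_mem = 30720;

    -- ...run the big sorting queries here, e.g.:
    SELECT sku, sum(qty_on_hand) FROM stmst_sku GROUP BY sku ORDER BY sku;

    RESET sort_mem;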
Re: [PERFORM] Query kills machine.
Stef <[EMAIL PROTECTED]> writes:
> Reducing sort_mem to 4096 seems to make it run in a reasonable time
> again. Any idea why? The database does a whole lot of huge sorts
> every day, so I thought upping this parameter would help.

Not if you haven't got the RAM to support it :-(

Another thing you might look at is ANALYZEing the tables again after
you've loaded all the new data. The row-count estimates seem way off in
these plans. You might need to increase the statistics target, too, to
get better plans.

regards, tom lane
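Both suggestions map to ordinary commands. A sketch against the tables from
the plan upthread (the target of 100 is an illustrative value, not a
recommendation from the thread; the 7.3 default is 10):

    -- Raise the per-column statistics target before re-analyzing.
    ALTER TABLE stmst_sku ALTER COLUMN fpp_code SET STATISTICS 100;

    -- Refresh the planner's row-count estimates after the nightly load.
    ANALYZE stmst_sku;
    ANALYZE gir_outstanding;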
[PERFORM] [FUN] Performance increase?
Do you think that adopting the "chip tuning" product could increase
PostgreSQL's performance as well?

Regards
Gaetano Mendola
[PERFORM] What is the best way to do attribute/values?
Hi list,

I have a database with 1M "people" in it. Each person has about 20
attributes, such as height, weight, eye color, etc. I need to be able to
search for people based on these attributes. A search can be conducted
on one attribute, all attributes, or any number in between. How would
_you_ do this?

I have already attempted to answer this. My attempts are detailed here:

http://sh.nu/email.txt

This is the email I was originally going to send to this list. Since
it's so large, I decided to link to it instead. If you feel that it
belongs in a post to the list, let me know, and I'll post again.

I've discussed these attempts with people in #postgresql on
irc.freenode.net. Agliodbs (I presume you know who this is) was very
helpful, but in the end was at a loss. I find myself in the same
position at this time. He suggested I contact this list.

My ultimate goal is performance. This _must_ be fast. And by fast, I
mean < 1 second, for every permutation of the number of attributes
searched for. Flexibility would be a bonus, but at this point I'll
settle for something that's harder to maintain if I can get the speed
gain I need.

Thanks,

Daniel Ceregatti
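For readers without the linked write-up, the shape of the problem is roughly
this (a hypothetical schema sketched for illustration; Daniel's actual tables
and column names are in the linked email):

    -- One row per person, one column per attribute.
    CREATE TABLE people (
        person_id  integer PRIMARY KEY,
        height     integer,
        weight     integer,
        eye_color  text
        -- ...about 16 more attribute columns...
    );

    CREATE INDEX people_height_idx    ON people (height);
    CREATE INDEX people_weight_idx    ON people (weight);
    CREATE INDEX people_eye_color_idx ON people (eye_color);

    -- A search may combine any subset of the ~20 attributes:
    SELECT person_id
    FROM people
    WHERE height BETWEEN 170 AND 190
      AND weight < 100
      AND eye_color = 'blue';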
Re: [PERFORM] What is the best way to do attribute/values?
Folks,

> I've discussed these attempts with people in #postgresql on
> irc.freenode.net. Agliodbs (I presume you know who this is) was very
> helpful, but in the end was at a loss. I find myself in the same
> position at this time. He suggested I contact this list.

There are a couple of issues here to attack:

1) PostgreSQL is not using the most optimal plan. First, it's ignoring
the fact that all referenced columns are indexed and only using the
first column, then filtering based on the other criteria. Second,
testing has shown that a hash join would actually be faster. We've
tried upping the statistics, but it doesn't seem to have an effect on
the planner's erroneous estimates.

2) Even were it using the most optimal plan, it's still too slow. As
you can see from the plan, each merge join takes about 1.5 to 2 seconds
(hash joins are only about 0.5 seconds slower). Mysteriously, a big
chunk of this time is spent *in between* planner steps, as if there
were some hold-up in retrieving the index or table pages. There may be,
but Daniel and I have not been able to diagnose the cause. It's
particularly mysterious since a filter-and-sort on a *single* criteria
set, without a join, takes < 400ms.

Things we've already tried, to avoid going over old ground:

1) increasing statistics;
2) increasing sort_mem (to 256MB, which is overkill);
3) testing on 8.0 beta, which does not affect the issue.

At this point I'm looking for ideas. Suggestions, anyone?

--
Josh Berkus
Aglio Database Solutions
San Francisco
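On point 1, the "only using the first column" behaviour is worth spelling
out: PostgreSQL of this era uses at most one index per table scan, so with
separate single-column indexes every other predicate is applied as a filter.
A composite index covering the most selective pair of attributes is a common
workaround, sketched here against the hypothetical people schema above (not
something the thread confirms was tried):

    -- A composite index lets one index scan satisfy two predicates at once.
    CREATE INDEX people_height_weight_idx ON people (height, weight);

    EXPLAIN ANALYZE
    SELECT person_id
    FROM people
    WHERE height BETWEEN 170 AND 190
      AND weight < 100;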
Re: [PERFORM] What is the best way to do attribute/values?
Daniel Ceregatti wrote:
> Hi list,
>
> I have a database with 1M "people" in it. Each person has about 20
> attributes, such as height, weight, eye color, etc. I need to be able
> to search for people based on these attributes. A search can be
> conducted on one attribute, all attributes, or any number in between.
> How would _you_ do this?
>
> I have already attempted to answer this. My attempts are detailed here:
>
> http://sh.nu/email.txt

Hmm... interesting. Shot in the dark - try a tsearch2 full-text index.
Your problem could be translated into searching strings of the form
"hair=black eyes=blue age=117".

Not pretty, but might give you the speed you want.

--
Richard Huxton
Archonet Ltd
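A rough sketch of Richard's idea, assuming the contrib/tsearch2 module is
installed (function and operator names per the tsearch2 documentation; the
token encoding is invented for illustration, and uses underscores rather
than '=' because the default parser would likely split on the equals sign):

    -- Encode each person's attributes as one searchable text vector.
    ALTER TABLE people ADD COLUMN attrs tsvector;
    UPDATE people SET attrs = to_tsvector('hair_black eyes_blue age_117');

    -- tsearch2 indexes tsvector columns with GiST.
    CREATE INDEX people_attrs_idx ON people USING gist (attrs);

    -- "all listed attributes must match" becomes an AND-ed tsquery:
    SELECT person_id FROM people
    WHERE attrs @@ to_tsquery('eyes_blue & hair_black');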
Re: [PERFORM] postgresql performance with multimedia
> Tom Lane answered that question. The code in question resolves
> "localhost" with getaddrinfo() and then tries to create and bind a UDP
> socket to all returned addresses. For some reason "localhost" on your
> system resolves to an address that is not available for bind(2).

I tried to put my_ip instead of "localhost" in bufmng.c and it seems to
work (no more complaining). However, I checked pg_statio_all_tables and
don't see any recorded statistics at all (all the columns are '0').
Sometimes postmaster shuts down with this error message:

    LOG: statistics collector process () exited with exit code 1

I start postmaster with this command:

    postmaster -i -p $PORT -D $PGDATA -k $PGDATA -N 32 -B 64 -o -s

>> btw, what I want to ask here is: does postgreSQL have any kind of
>> read-ahead buffer implemented? 'cos it would be useful in the
>> multimedia case when we always scan the large table for continuous
>> data.
>
> Since there is no mechanism to control that data is stored
> contiguously in the tables, what would that be good for?

I thought that rows in the table would be stored contiguously? In that
case, if the user requests one row, we make sure the following rows are
ready in the buffer pool, so that when they are next requested they
won't have to be read from disk. For multimedia data this is important,
'cos data needs to be presented continuously without any waiting.

Thanks again for your help,

MT Ho
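On the all-zero pg_statio_all_tables columns: in this era of PostgreSQL the
block-level counters are off by default, so zeros do not necessarily mean the
collector is broken. The relevant postgresql.conf switches (names per the 8.0
documentation) are:

    # The collector must be running, and block-level statistics
    # explicitly enabled, before pg_statio_* views show real numbers.
    stats_start_collector = true
    stats_block_level = true
    stats_row_level = true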