Re: [PERFORM] hash join vs nested loop join
Huan Ruan wrote:
> Kevin Grittner wrote:
>> With a low cache hit rate, that would generally be when the number
>> of lookups into the table exceeds about 10% of the table's rows.
>
> So far, my main performance issue comes down to this pattern where
> Postgres chooses a hash join that's slower than a nested loop indexed
> join. By changing those cost parameters, this query works as expected
> now, but there are others that fall into the same category and appear
> to be harder to convince the optimiser.
>
> I'm still a bit worried about this query, as Postgres gets the record
> count right and knows the index is a primary key index, so it knows
> it's 0.05m out of 170m records (0.03%), but still chooses the
> sequential scan. Hopefully this is just related to that big index
> penalty bug introduced in 9.2.

Quite possibly, but it could be any of a number of other things, like
a type mismatch. It might be best to rule out other causes. If you
post the new query and EXPLAIN ANALYZE output, along with the settings
you have now adopted, someone may be able to spot something. It
wouldn't hurt to repeat the OS and hardware info with it so people
have it handy for reference.

-Kevin
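For illustration, a minimal sketch of the kind of report being
requested; the table and column names are invented stand-ins for the
original poster's schema, and the settings list is just a plausible
subset of what might be non-default:

-- Hypothetical names: "big_table" is the 170m-row table, "small_batch"
-- the 0.05m-row side of the join.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM small_batch s
JOIN big_table b ON b.id = s.big_table_id;  -- same data type on both sides?

-- Plus the planner settings now in effect, e.g.:
SELECT name, setting, source
FROM pg_settings
WHERE name IN ('random_page_cost', 'seq_page_cost',
               'effective_cache_size', 'cpu_tuple_cost');

Posting the EXPLAIN output together with any non-default settings lets
readers check the row estimates against the costs the planner actually
used.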
[PERFORM] backend suddenly becomes slow, then remains slow
One of my clients has an odd problem. Every so often a backend will
suddenly become very slow. The odd thing is that once this has
happened it remains slowed down, for all subsequent queries. Zone
reclaim is off. There is no IO or CPU spike, no checkpoint issues or
stats timeouts, no other symptom that we can see. The problem was a
lot worse than it is now, but two steps have mostly, though not
completely, alleviated it: much less aggressive autovacuuming and
reducing the maximum lifetime of backends in the connection pooler to
30 minutes.

It's got us rather puzzled. Has anyone seen anything like this?

cheers

andrew
Re: [PERFORM] Why does the number of rows are different in actual and estimated.
Does a FK constraint help to improve performance? Or is it only for
maintaining data integrity?

On Thu, Dec 13, 2012 at 7:38 PM, Claudio Freire wrote:
> On Thu, Dec 13, 2012 at 8:50 PM, Evgeny Shishkin wrote:
>>>> OP joins 8 tables, and i suppose join collapse limit is set to the
>>>> default 8. I thought postgresql's optimiser is not mysql's.
>>>
>>> It's not obvious to me that there's anything very wrong with the
>>> plan. An 8-way join that produces 150K rows is unlikely to run in
>>> milliseconds no matter what the plan. The planner would possibly
>>> have done the last join step differently if it had had a better
>>> rowcount estimate, but even if that were free the query would
>>> still have taken 7 seconds (vs 8.5).
>>
>> Maybe in this case it is. I once wrote to this list regarding a
>> similar problem - joining 4 tables, with the result set estimate off
>> by 2257 times - 750ms vs less than 1ms. Unfortunately the question
>> was not accepted to the list.
>>
>> I spoke to Bruce Momjian about that problem at one local conference;
>> he said shit happens :)
>
> I think it's more likely a missing FK constraint.
Re: [PERFORM] Why does the number of rows are different in actual and estimated.
On Fri, Dec 14, 2012 at 4:01 PM, AI Rumman wrote:
> Does a FK constraint help to improve performance? Or is it only for
> maintaining data integrity?

I'm not entirely sure it's taken into account (I think it is), but a
FK would tell the planner that every non-null value will produce a
row. It seems to think a large portion of the non-null values don't.
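As a hypothetical illustration of the fix being suggested (table and
column names are invented; note also Tom Lane's point below that the
planner of this era did not yet use FK constraints for estimation, so
this is mainly an integrity guarantee):

-- First check for orphaned keys, which would make the ALTER fail
-- ("detail" and "entity" are placeholder names):
SELECT d.entity_id
FROM detail d
LEFT JOIN entity e ON e.id = d.entity_id
WHERE d.entity_id IS NOT NULL
  AND e.id IS NULL;

-- If that returns no rows, the constraint can be added:
ALTER TABLE detail
  ADD CONSTRAINT detail_entity_fk
  FOREIGN KEY (entity_id) REFERENCES entity (id);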
Re: [PERFORM] Why does the number of rows are different in actual and estimated.
AI Rumman wrote:
> Claudio Freire wrote:
>> I think it's more likely a missing FK constraint.
>
> Does a FK constraint help to improve performance? Or is it only
> for maintaining data integrity?

I'm not aware of any situation where adding a foreign key constraint
would improve performance.

-Kevin
Re: [PERFORM] Why does the number of rows are different in actual and estimated.
"Kevin Grittner" writes: > AI Rumman wrote: >> Does FK Constraint help to improve performance? Or it is only >> for maintaining data integrity? > I'm not aware of any situation where adding a foreign key > constraint would improve performance. There's been talk of teaching the planner to use the existence of FK constraints to improve plans, but I don't believe any such thing is in the code today. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why does the number of rows are different in actual and estimated.
So I am going to change join_collapse_limit and from_collapse_limit to
20. Do I need to set geqo_threshold to greater than 20? It is now 12
(the default). And could you let me know why the GEQO optimizer is not
working well in this case?

On Fri, Dec 14, 2012 at 2:22 PM, Tom Lane wrote:
> "Kevin Grittner" writes:
>> AI Rumman wrote:
>>> Does a FK constraint help to improve performance? Or is it only
>>> for maintaining data integrity?
>
>> I'm not aware of any situation where adding a foreign key
>> constraint would improve performance.
>
> There's been talk of teaching the planner to use the existence of FK
> constraints to improve plans, but I don't believe any such thing is
> in the code today.
>
> 			regards, tom lane
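A sketch of trying those settings per-session before committing them
to postgresql.conf (20 is the poster's proposed value, not a general
recommendation):

SET join_collapse_limit = 20;
SET from_collapse_limit = 20;
-- Keep geqo_threshold above the collapse limits so the exhaustive
-- planner, rather than GEQO's randomized search, handles the larger
-- join lists; GEQO kicks in once a FROM list reaches geqo_threshold:
SET geqo_threshold = 21;

That last point also answers the GEQO question in part: with the
defaults above 12 items the planner falls back to a randomized search,
which can produce noticeably worse plans than exhaustive planning.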
Re: [PERFORM] backend suddenly becomes slow, then remains slow
Andrew Dunstan writes:
> One of my clients has an odd problem. Every so often a backend will
> suddenly become very slow. The odd thing is that once this has
> happened it remains slowed down, for all subsequent queries. Zone
> reclaim is off. There is no IO or CPU spike, no checkpoint issues or
> stats timeouts, no other symptom that we can see. The problem was a
> lot worse than it is now, but two steps have mostly, though not
> completely, alleviated it: much less aggressive autovacuuming and
> reducing the maximum lifetime of backends in the connection pooler
> to 30 minutes.
>
> It's got us rather puzzled. Has anyone seen anything like this?

Maybe the kernel is auto-nice'ing the process once it's accumulated X
amount of CPU time?

			regards, tom lane
Re: [PERFORM] Why does the number of rows are different in actual and estimated.
On Fri, Dec 14, 2012 at 4:22 PM, Tom Lane wrote:
> "Kevin Grittner" writes:
>> AI Rumman wrote:
>>> Does a FK constraint help to improve performance? Or is it only
>>> for maintaining data integrity?
>
>> I'm not aware of any situation where adding a foreign key
>> constraint would improve performance.
>
> There's been talk of teaching the planner to use the existence of FK
> constraints to improve plans, but I don't believe any such thing is
> in the code today.

That made me look at the code. So, eqjoinsel_inner in selfuncs.c would
need those smarts. Cool.

Anyway, reading the code, I think I can now spot the possible issue
behind all of this.

Selectivity is decided based on the number of distinct values on both
sides, and the table's name "entity" makes me think it's a table that
is reused for several things. That could be a problem, since that
inflates the number of distinct values, feeding misinformation to the
planner.

Rather than a generic "entity" table, perhaps it would be best to
separate the different entities into different tables. Failing that,
if you have an "entity type" kind of column, you could try refining
the join condition to filter by that kind; hopefully there's an index
over entity kind and the planner can use more accurate MCV data.
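A hedged sketch of that last suggestion; every name here is invented
(the real setype column is confirmed two messages below). An index
leading with the entity-kind column, plus a join condition that pins
the kind, gives the planner a chance to use per-kind statistics:

-- Composite index over entity kind and the join key (names invented):
CREATE INDEX entity_setype_id_idx ON entity (setype, id);

-- Join refined by kind, so MCV data for 'Contacts' applies:
SELECT e.id, c.email
FROM entity e
JOIN contactdetails c
  ON c.contactid = e.id
 AND e.setype = 'Contacts';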
Re: [PERFORM] backend suddenly becomes slow, then remains slow
On 12/14/2012 02:56 PM, Tom Lane wrote:
> Andrew Dunstan writes:
>> One of my clients has an odd problem. Every so often a backend will
>> suddenly become very slow. The odd thing is that once this has
>> happened it remains slowed down, for all subsequent queries. Zone
>> reclaim is off. There is no IO or CPU spike, no checkpoint issues
>> or stats timeouts, no other symptom that we can see. The problem
>> was a lot worse than it is now, but two steps have mostly, though
>> not completely, alleviated it: much less aggressive autovacuuming
>> and reducing the maximum lifetime of backends in the connection
>> pooler to 30 minutes.
>>
>> It's got us rather puzzled. Has anyone seen anything like this?
>
> Maybe the kernel is auto-nice'ing the process once it's accumulated
> X amount of CPU time?

That was my initial thought, but the client said not. We'll check
again.

cheers

andrew
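One hedged way to re-check the auto-nice theory: list the long-lived
backends and their PIDs from SQL (column names per 9.2-era
pg_stat_activity), then inspect nice levels from a shell on the
server:

SELECT pid, usename, state, now() - backend_start AS backend_age
FROM pg_stat_activity
ORDER BY backend_start;
-- Then, outside the database, for a suspect PID:
--   ps -o pid,ni,etime,cmd -p <pid>
-- A non-zero NI value on long-lived backends would support the theory.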
Re: [PERFORM] Why does the number of rows are different in actual and estimated.
Yes, I do have a column in the entity table, setype, whose values are
'Contacts', 'Candidate', etc. I have an index on it also. Are you
suggesting making different tables for Contacts, Candidates, etc.?

On Fri, Dec 14, 2012 at 3:10 PM, Claudio Freire wrote:
> On Fri, Dec 14, 2012 at 4:22 PM, Tom Lane wrote:
>> "Kevin Grittner" writes:
>>> AI Rumman wrote:
>>>> Does a FK constraint help to improve performance? Or is it only
>>>> for maintaining data integrity?
>>
>>> I'm not aware of any situation where adding a foreign key
>>> constraint would improve performance.
>>
>> There's been talk of teaching the planner to use the existence of
>> FK constraints to improve plans, but I don't believe any such thing
>> is in the code today.
>
> That made me look at the code. So, eqjoinsel_inner in selfuncs.c
> would need those smarts. Cool.
>
> Anyway, reading the code, I think I can now spot the possible issue
> behind all of this.
>
> Selectivity is decided based on the number of distinct values on
> both sides, and the table's name "entity" makes me think it's a
> table that is reused for several things. That could be a problem,
> since that inflates the number of distinct values, feeding
> misinformation to the planner.
>
> Rather than a generic "entity" table, perhaps it would be best to
> separate the different entities into different tables. Failing that,
> if you have an "entity type" kind of column, you could try refining
> the join condition to filter by that kind; hopefully there's an
> index over entity kind and the planner can use more accurate MCV
> data.
Re: [PERFORM] Why does the number of rows are different in actual and estimated.
On Fri, Dec 14, 2012 at 5:25 PM, AI Rumman wrote:
> Are you suggesting making different tables for Contacts, Candidates,
> etc.?

Yes
Re: [PERFORM] Why does the number of rows are different in actual and estimated.
Claudio Freire wrote:
> Selectivity is decided based on the number of distinct values on
> both sides, and the table's name "entity" makes me think it's a
> table that is reused for several things. That could be a problem,
> since that inflates the number of distinct values, feeding
> misinformation to the planner.
>
> Rather than a generic "entity" table, perhaps it would be best to
> separate the different entities into different tables.

I missed that; good catch. Good advice.

Don't try to build a "database within a database" by having one table
for different types of data, with a code to sort them out. EAV is a
seriously bad approach for every situation where I've seen someone try
to use it. I was about to say it's like trying to drive a nail with a
pipe wrench, then realized it's more like putting a bunch of hammers
in a bag and swinging the bag at the nail.

-Kevin
Re: [PERFORM] Why does the number of rows are different in actual and estimated.
On Fri, Dec 14, 2012 at 3:34 PM, Kevin Grittner wrote:
> Claudio Freire wrote:
>> Selectivity is decided based on the number of distinct values on
>> both sides, and the table's name "entity" makes me think it's a
>> table that is reused for several things. That could be a problem,
>> since that inflates the number of distinct values, feeding
>> misinformation to the planner.
>>
>> Rather than a generic "entity" table, perhaps it would be best to
>> separate the different entities into different tables.
>
> I missed that; good catch. Good advice.
>
> Don't try to build a "database within a database" by having one
> table for different types of data, with a code to sort them out.
> EAV is a seriously bad approach for every situation where I've seen
> someone try to use it. I was about to say it's like trying to drive
> a nail with a pipe wrench, then realized it's more like putting a
> bunch of hammers in a bag and swinging the bag at the nail.
>
> -Kevin

The ENTITY table has 2164493 rows, with data as follows:

       type       | count
------------------+--------
 Contacts         | 327352
 Candidate        |  34668
 Emailst          |  33604
 Calendar         | 493956
 Contacts Image   |      7
 PriceBooks       |      2
 Notes Attachment |     17
 SalesOrder       |      6
 Acc              | 306832
 ...
(29 rows)

Do you think partitioning will improve the overall performance of the
application, where all the queries join with this table?
Re: [PERFORM] Why does the number of rows are different in actual and estimated.
AI Rumman wrote:
> The ENTITY table has 2164493 rows, with data as follows:
>
>        type       | count
> ------------------+--------
>  Contacts         | 327352
>  Candidate        |  34668
>  Emailst          |  33604
>  Calendar         | 493956
>  Contacts Image   |      7
>  PriceBooks       |      2
>  Notes Attachment |     17
>  SalesOrder       |      6
>  Acc              | 306832
>  ...
> (29 rows)
>
> Do you think partitioning will improve the overall performance of
> the application, where all the queries join with this table?

I would not consider putting contacts, calendars, and sales orders in
separate tables as "partitioning". It is normalizing. That will be
useful if you happen to discover, for instance, that the data elements
needed or relationships to other types of data for a calendar don't
exactly match those for a contact image or a sales order.

And yes, I would expect that using separate tables for fundamentally
different types of data would improve performance. If some of these
objects (like contacts and candidates) have common elements, you might
want to either have both inherit from a common Person table, or
(usually better, IMO) have both reference rows in a Person table. The
latter is especially important if a contact can be a candidate and you
want to be able to associate them.

-Kevin
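A minimal sketch of the "reference rows in a Person table" design
Kevin describes; all table and column names are invented for
illustration. A contact and a candidate can point at the same person
row, which is what keeps them associated:

CREATE TABLE person (
    person_id serial PRIMARY KEY,
    name      text NOT NULL
);

CREATE TABLE contact (
    contact_id serial PRIMARY KEY,
    person_id  integer NOT NULL REFERENCES person,
    company    text
);

CREATE TABLE candidate (
    candidate_id serial PRIMARY KEY,
    person_id    integer NOT NULL REFERENCES person,
    resume       text
);

Each type-specific table then carries only the columns that type
actually needs, instead of one wide "entity" table with a discriminator
column.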
Re: [PERFORM] Do I have a hardware or a software problem?
On Tue, Dec 11, 2012 at 6:03 PM, Craig Ringer wrote:
> On 12/12/2012 06:44 AM, Evgeny Shishkin wrote:
>> On Dec 12, 2012, at 2:41 AM, Niels Kristian Schjødt wrote:
>>> Are you using a hardware based raid controller with them?
>>
>> Yes, of course. Hardware raid with cache and bbu is a must. You
>> can't get fast fsync without it.
>
> Most SSDs should offer fairly fast fsync without a hardware RAID
> controller, as they do write-back caching. The trick is to find ones
> that do write-back caching safely, so you don't get severe data
> corruption on power-loss.
>
> A HW RAID controller is an absolute must for rotating magnetic
> media, though.
>
>> Also mdadm is a pain in the ass and is suitable only on amazon and
>> other cloud shit.
>
> I've personally been pretty happy with mdadm. I find the array
> portability it offers very useful, so I don't need to buy a second
> RAID controller just in case my main controller dies and I need a
> compatible one to get the array running again. If you don't need a
> BBU for safe write-back caching then mdadm has advantages over
> hardware RAID.
>
> I'll certainly use mdadm over onboard fakeraid solutions or low-end
> hardware RAID controllers. I suspect a mid- to high-end HW RAID unit
> will generally win.

Also, for sequential throughput, md RAID is usually faster than most
RAID controllers, even the high-end Areca and LSI ones.