Re: [PERFORM] hash join vs nested loop join

2012-12-14 Thread Kevin Grittner
Huan Ruan wrote:
> Kevin Grittner wrote:

>> With a low cache hit rate, that would generally be when the number
>> of lookups into the table exceeds about 10% of the table's rows.
>
> So far, my main performance issue comes down to this pattern where
> Postgres chooses a hash join that's slower than a nested loop indexed join.
> By changing those cost parameters, this query works as expected now, but
> there are others that fall into the same category and appear to be harder
> to convince the optimiser.
> 
> I'm still a bit worried about this query, as Postgres gets the record count
> right and knows the index is a primary key index; it therefore knows it's
> 0.05m out of 170m records (0.03%), but it still chooses the sequential scan.
> Hopefully this is just related to that big index penalty bug introduced in
> 9.2.

Quite possibly, but it could be any of a number of other things,
like a type mismatch. It might be best to rule out other causes. If
you post the new query and EXPLAIN ANALYZE output, along with the
settings you have now adopted, someone may be able to spot
something. It wouldn't hurt to repeat OS and hardware info with it
so people have it handy for reference.
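Something along these lines is what's most useful (the table and query
names below are placeholders, of course, not your schema):

    -- the problem query, with full instrumentation
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM bigtable b
    JOIN smalltable s ON s.id = b.small_id;

    -- the cost settings now in effect
    SELECT name, setting
    FROM pg_settings
    WHERE name IN ('seq_page_cost', 'random_page_cost',
                   'cpu_tuple_cost', 'effective_cache_size', 'work_mem');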

-Kevin




[PERFORM] backend suddenly becomes slow, then remains slow

2012-12-14 Thread Andrew Dunstan
One of my clients has an odd problem. Every so often a backend will 
suddenly become very slow. The odd thing is that once this has happened 
it remains slowed down, for all subsequent queries. Zone reclaim is off. 
There is no IO or CPU spike, no checkpoint issues or stats timeouts, no 
other symptom that we can see. The problem was a lot worse than it is 
now, but two steps have alleviated it mostly, but not completely: much 
less aggressive autovacuuming and reducing the maximum lifetime of 
backends in the connection pooler to 30 minutes.
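The autovacuum change was roughly of this form (a sketch only; the table
name and values here are illustrative, not the actual settings):

    -- throttle autovacuum on the busiest tables (per-table storage
    -- parameters; a higher cost delay means gentler background vacuuming)
    ALTER TABLE busy_table SET (
        autovacuum_vacuum_cost_delay   = 50,   -- ms to sleep per cost batch
        autovacuum_vacuum_scale_factor = 0.4   -- vacuum less frequently
    );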


It's got us rather puzzled. Has anyone seen anything like this?

cheers

andrew




Re: [PERFORM] Why does the number of rows are different in actual and estimated.

2012-12-14 Thread AI Rumman
Does an FK constraint help to improve performance? Or is it only
for maintaining data integrity?

On Thu, Dec 13, 2012 at 7:38 PM, Claudio Freire wrote:

> On Thu, Dec 13, 2012 at 8:50 PM, Evgeny Shishkin wrote:
> >>> OP joins 8 tables, and i suppose join collapse limit is set to default
> 8. I thought postgresql's optimiser is not mysql's.
> >>
> >> It's not obvious to me that there's anything very wrong with the plan.
> >> An 8-way join that produces 150K rows is unlikely to run in milliseconds
> >> no matter what the plan.  The planner would possibly have done the last
> >> join step differently if it had had a better rowcount estimate, but even
> >> if that were free the query would still have been 7 seconds (vs 8.5).
> >>
> >
> > Maybe in this case it is. I once wrote to this list regarding a similar
> problem - joining 4 tables, the row estimate off by 2257 times - 750ms vs
> less than 1ms. Unfortunately the question was not accepted to the list.
> >
> > I spoke to Bruce Momjian about that problem at a local conference, he
> said shit happens :)
>
> I think it's more likely a missing FK constraint.
>


Re: [PERFORM] Why does the number of rows are different in actual and estimated.

2012-12-14 Thread Claudio Freire
On Fri, Dec 14, 2012 at 4:01 PM, AI Rumman  wrote:
> Does an FK constraint help to improve performance? Or is it only for
> maintaining data integrity?

I'm not entirely sure it's taken into account (I think it is), but an FK
would tell the planner that every non-null value will produce a row. It
currently seems to think that a large portion of the non-null values
don't match anything.
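That is, with something like this in place (names invented for
illustration):

    -- with this constraint, every non-null child.parent_id is guaranteed
    -- to match exactly one parent row: information the planner could,
    -- in principle, use when estimating join selectivity
    ALTER TABLE child
        ADD CONSTRAINT child_parent_fk
        FOREIGN KEY (parent_id) REFERENCES parent (id);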




Re: [PERFORM] Why does the number of rows are different in actual and estimated.

2012-12-14 Thread Kevin Grittner
AI Rumman wrote:
> Claudio Freire wrote:
>> I think it's more likely a missing FK constraint.

> Does an FK constraint help to improve performance? Or is it only
> for maintaining data integrity?

I'm not aware of any situation where adding a foreign key
constraint would improve performance.

-Kevin




Re: [PERFORM] Why does the number of rows are different in actual and estimated.

2012-12-14 Thread Tom Lane
"Kevin Grittner"  writes:
> AI Rumman wrote:
>> Does an FK constraint help to improve performance? Or is it only
>> for maintaining data integrity?

> I'm not aware of any situation where adding a foreign key
> constraint would improve performance.

There's been talk of teaching the planner to use the existence of FK
constraints to improve plans, but I don't believe any such thing is
in the code today.

regards, tom lane




Re: [PERFORM] Why does the number of rows are different in actual and estimated.

2012-12-14 Thread AI Rumman
So I am going to change join_collapse_limit and from_collapse_limit to 20.

Do I need to set geqo_threshold to greater than 20? It is now 12 (the
default).

And could you let me know why GEQO is not producing good plans in this
case?
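
That is, I plan to run something like:

    SET join_collapse_limit = 20;
    SET from_collapse_limit = 20;
    SET geqo_threshold = 21;  -- only if GEQO would otherwise kick in first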



On Fri, Dec 14, 2012 at 2:22 PM, Tom Lane  wrote:

> There's been talk of teaching the planner to use the existence of FK
> constraints to improve plans, but I don't believe any such thing is
> in the code today.


Re: [PERFORM] backend suddenly becomes slow, then remains slow

2012-12-14 Thread Tom Lane
Andrew Dunstan  writes:
> One of my clients has an odd problem. Every so often a backend will
> suddenly become very slow. The odd thing is that once this has happened
> it remains slowed down, for all subsequent queries.

> It's got us rather puzzled. Has anyone seen anything like this?

Maybe the kernel is auto-nice'ing the process once it's accumulated X
amount of CPU time?

regards, tom lane




Re: [PERFORM] Why does the number of rows are different in actual and estimated.

2012-12-14 Thread Claudio Freire
On Fri, Dec 14, 2012 at 4:22 PM, Tom Lane  wrote:

> There's been talk of teaching the planner to use the existence of FK
> constraints to improve plans, but I don't believe any such thing is
> in the code today.

That made me look at the code.

So, eqjoinsel_inner in selfuncs.c would need those smarts. Cool.

Anyway, reading the code, I think I can now spot the possible issue
behind all of this.

Selectivity is decided based on the number of distinct values on both
sides, and the table's name "entity" makes me think it's a table that
is reused for several things. That could be a problem, since that
inflates distinct values, feeding misinformation to the planner.

Rather than a generic "entity" table, perhaps it would be best to
separate the different entities into different tables. Failing that,
if you have an "entity type" kind of column, you could try refining
the join condition to filter by that kind; hopefully there's an index
over entity kind and the planner can use more accurate MCV data.
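
For example (the table and column names are just guesses, to show the
shape of it):

    -- hypothetical minimal schema
    CREATE TABLE entity   (crmid int PRIMARY KEY, setype text, label text);
    CREATE TABLE activity (activityid int PRIMARY KEY);

    -- an index that leads with the kind column
    CREATE INDEX entity_kind_idx ON entity (setype, crmid);

    -- the join refined by kind, so per-kind statistics (and the index)
    -- can be used
    SELECT e.label
    FROM activity a
    JOIN entity e ON e.crmid = a.activityid
                 AND e.setype = 'Calendar';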




Re: [PERFORM] backend suddenly becomes slow, then remains slow

2012-12-14 Thread Andrew Dunstan

On 12/14/2012 02:56 PM, Tom Lane wrote:

> Maybe the kernel is auto-nice'ing the process once it's accumulated X
> amount of CPU time?

That was my initial thought, but the client said not. We'll check again.

cheers

andrew





Re: [PERFORM] Why does the number of rows are different in actual and estimated.

2012-12-14 Thread AI Rumman
Yes, I do have a column in the entity table, setype, where the values are
'Contacts', 'Candidate', etc. I have an index on it as well.
Are you suggesting making different tables for Contacts, Candidate, etc.?

On Fri, Dec 14, 2012 at 3:10 PM, Claudio Freire wrote:

> Rather than a generic "entity" table, perhaps it would be best to
> separate the different entities into different tables. Failing that,
> if you have an "entity type" kind of column, you could try refining
> the join condition to filter by that kind.


Re: [PERFORM] Why does the number of rows are different in actual and estimated.

2012-12-14 Thread Claudio Freire
On Fri, Dec 14, 2012 at 5:25 PM, AI Rumman  wrote:
> Are you suggesting making different tables for Contacts, Candidate, etc.?

Yes




Re: [PERFORM] Why does the number of rows are different in actual and estimated.

2012-12-14 Thread Kevin Grittner
Claudio Freire wrote:

> Selectivity is decided based on the number of distinct values on
> both sides, and the table's name "entity" makes me think it's a
> table that is reused for several things. That could be a problem,
> since that inflates distinct values, feeding misinformation to
> the planner.
> 
> Rather than a generic "entity" table, perhaps it would be best to
> separate the different entities into different tables.

I missed that; good catch. Good advice.

Don't try to build a "database within a database" by having one
table for different types of data, with a code to sort them out.
EAV is a seriously bad approach for every situation where I've seen
someone try to use it. I was about to say it's like trying to drive
a nail with a pipe wrench, then realized it's more like putting a
bunch of hammers in a bag and swinging the bag at the nail.

-Kevin




Re: [PERFORM] Why does the number of rows are different in actual and estimated.

2012-12-14 Thread AI Rumman
On Fri, Dec 14, 2012 at 3:34 PM, Kevin Grittner  wrote:

> Don't try to build a "database within a database" by having one
> table for different types of data, with a code to sort them out.
> EAV is a seriously bad approach for every situation where I've seen
> someone try to use it.

The ENTITY table has 2164493 rows with data as follows:

       type        | count
-------------------+--------
 Contacts          | 327352
 Candidate         |  34668
 Emailst           |  33604
 Calendar          | 493956
 Contacts Image    |      7
 PriceBooks        |      2
 Notes Attachment  |     17
 SalesOrder        |      6
 Acc               | 306832
 ...
(29 rows)

Do you think partitioning will improve the overall performance of the
application, where all the queries join with this table?


Re: [PERFORM] Why does the number of rows are different in actual and estimated.

2012-12-14 Thread Kevin Grittner
AI Rumman wrote:

> The ENTITY table has 2164493 rows spread across 29 types (Contacts,
> Candidate, Calendar, ...).
> 
> Do you think partitioning will improve the overall performance of the
> application, where all the queries join with this table?

I would not consider putting contacts, calendars, and sales orders
in separate tables as "partitioning". It is normalizing. That will
be useful if you happen to discover, for instance, that the data
elements needed or relationships to other types of data for a
calendar don't exactly match those for a contact image or a sales
order.

And yes, I would expect that using separate tables for
fundamentally different types of data would improve performance. If
some of these objects (like contacts and candidates) have common
elements, you might want to either have both inherit from a common
Person table, or (usually better, IMO) have both reference rows in
a Person table. The latter is especially important if a contact can
be a candidate and you want to be able to associate them.
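A sketch of that second option (names invented):

    CREATE TABLE person (
        person_id serial PRIMARY KEY,
        name      text NOT NULL
        -- ...elements common to contacts and candidates...
    );

    CREATE TABLE contact (
        person_id int PRIMARY KEY REFERENCES person,
        phone     text
        -- ...contact-specific elements...
    );

    CREATE TABLE candidate (
        person_id int PRIMARY KEY REFERENCES person,
        resume    text
        -- ...candidate-specific elements...
    );

    -- the same person_id can appear in both contact and candidate,
    -- so a contact and a candidate can be associated as one person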

-Kevin




Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-14 Thread Scott Marlowe
On Tue, Dec 11, 2012 at 6:03 PM, Craig Ringer  wrote:
> On 12/12/2012 06:44 AM, Evgeny Shishkin wrote:
>
>> On Dec 12, 2012, at 2:41 AM, Niels Kristian Schjødt wrote:
>>
>>> Are you using a hardware based raid controller with them?
>>
>> Yes, of course. Hardware raid with cache and bbu is a must. You can't get
>> fast fsync without it.
>
> Most SSDs should offer fairly fast fsync without a hardware RAID controller,
> as they do write-back caching. The trick is to find ones that do write-back
> caching safely, so you don't get severe data corruption on power-loss.
>
> A HW RAID controller is an absolute must for rotating magnetic media,
> though.
>
>> Also mdadm is a pain in the ass and is suitable only on amazon and other
>> cloud shit.
>
> I've personally been pretty happy with mdadm. I find the array portability
> it offers very useful, so I don't need to buy a second RAID controller just
> in case my main controller dies and I need a compatible one to get the array
> running again. If you don't need a BBU for safe write-back caching then
> mdadm has advantages over hardware RAID.
>
> I'll certainly use mdadm over onboard fakeraid solutions or low-end hardware
> RAID controllers. I suspect a mid- to high-end HW RAID unit will generally
> win.

Also, for sequential throughput, md RAID is usually faster than most
RAID controllers, even the high-end Areca and LSI ones.

