Re: [PERFORM] Optimization idea

2010-05-01 Thread Cédric Villemain
2010/4/28 Robert Haas :
> On Mon, Apr 26, 2010 at 5:33 AM, Cédric Villemain
>  wrote:
>> In the first query, the planner doesn't use the information about the 2,3,4.
>> It just does a: "I'll bet I'll have 2 rows in t1" (I think it should
>> say 3, but it doesn't). So it divides the estimated number of rows in
>> the t2 table by 5 (distinct values) and multiplies by 2 (rows): 40040.
>
> I think it's doing something more complicated.  See scalararraysel().
>
>> In the second query the planner uses a different behavior: it
>> expands the value of t1.t to t2.t for each join relation and finds a
>> cheaper plan (than the one using a seqscan on t2).
>
> I think the problem here is one we've discussed before: if the query
> planner knows that something is true of x (like, say, x =
> ANY('{2,3,4}')) and it also knows that x = y, it doesn't infer that
> the same thing holds of y (i.e. y = ANY('{2,3,4}')) unless the thing
> that is known to be true of x is that x is equal to some constant.
> Tom doesn't think it would be worth the additional CPU time that it
> would take to make these sorts of deductions.  I'm not sure I believe
> that, but I haven't tried to write the code, either.
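
A minimal sketch of the two shapes being contrasted, using the t1/t2
naming from the quoted text (exact schema assumed, not taken from the
original test case):

  -- First shape: the planner knows t1.t = ANY('{2,3,4}') and
  -- t1.t = t2.t, but it does not derive t2.t = ANY('{2,3,4}')
  -- when costing the scan on t2.
  SELECT * FROM t1 JOIN t2 ON t1.t = t2.t
   WHERE t1.t = ANY('{2,3,4}');

  -- Second shape: a plain constant is propagated through the
  -- equivalence class, so the planner does know t2.t = 2.
  SELECT * FROM t1 JOIN t2 ON t1.t = t2.t
   WHERE t1.t = 2;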

Related to this too:
http://archives.postgresql.org/pgsql-general/2010-05/msg9.php ?

>
> ...Robert
>



-- 
Cédric Villemain



Re: [PERFORM] autovacuum strategy / parameters

2010-05-01 Thread Robert Haas
On Fri, Apr 30, 2010 at 6:50 PM, Josh Berkus  wrote:
> Which is the opposite of my experience; currently we have several
> clients who have issues which required more-frequent analyzes on
> specific tables.

That's all fine, but probably not too relevant to the original
complaint - the OP backed off the default settings by several orders
of magnitude, which might very well cause a problem with both VACUUM
and ANALYZE.

I don't have a stake in the ground on what the right settings are, but
I think it's fair to say that if you vacuum OR analyze much less
frequently than what we recommend by default, it might break.
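
For reference, the stock 8.4-era autovacuum settings at issue here
(documented defaults, not the OP's values):

  autovacuum_vacuum_threshold = 50
  autovacuum_vacuum_scale_factor = 0.2
  autovacuum_analyze_threshold = 50
  autovacuum_analyze_scale_factor = 0.1
  autovacuum_naptime = 1min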

...Robert



Re: [PERFORM] Optimization idea

2010-05-01 Thread Cédric Villemain
2010/5/1 Cédric Villemain :
> 2010/4/28 Robert Haas :
>> On Mon, Apr 26, 2010 at 5:33 AM, Cédric Villemain
>>  wrote:
>>> In the first query, the planner doesn't use the information about the 2,3,4.
>>> It just does a: "I'll bet I'll have 2 rows in t1" (I think it should
>>> say 3, but it doesn't). So it divides the estimated number of rows in
>>> the t2 table by 5 (distinct values) and multiplies by 2 (rows): 40040.
>>
>> I think it's doing something more complicated.  See scalararraysel().
>>
>>> In the second query the planner uses a different behavior: it
>>> expands the value of t1.t to t2.t for each join relation and finds a
>>> cheaper plan (than the one using a seqscan on t2).
>>
>> I think the problem here is one we've discussed before: if the query
>> planner knows that something is true of x (like, say, x =
>> ANY('{2,3,4}')) and it also knows that x = y, it doesn't infer that
>> the same thing holds of y (i.e. y = ANY('{2,3,4}')) unless the thing
>> that is known to be true of x is that x is equal to some constant.
>> Tom doesn't think it would be worth the additional CPU time that it
>> would take to make these sorts of deductions.  I'm not sure I believe
>> that, but I haven't tried to write the code, either.
>
> Related to this too:
> http://archives.postgresql.org/pgsql-general/2010-05/msg9.php ?

No, sorry, I misread; that was about prepared statements in the other thread ...

>
>>
>> ...Robert
>>
>
>
>
> --
> Cédric Villemain
>



-- 
Cédric Villemain



Re: [PERFORM] autovacuum strategy / parameters

2010-05-01 Thread Scott Marlowe
On Wed, Apr 28, 2010 at 8:20 AM, Thomas Kellerer  wrote:
> Rick, 22.04.2010 22:42:
>>
>> So, in a large table, the scale_factor is the dominant term. In a
>> small table, the threshold is the dominant term. But both are taken into
>> account.
>>
>> The default values are set for small tables; with them, autovacuum
>> is effectively not being run for large tables.
>
> With 8.4 you can adjust the autovacuum settings per table...

You can with 8.3 as well, but there it's done via entries in the
pg_autovacuum catalog rather than ALTER TABLE.
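
The trigger math behind "dominant term", per the documentation:

  vacuum threshold = autovacuum_vacuum_threshold
                     + autovacuum_vacuum_scale_factor * reltuples

With the defaults (50 + 0.2 * reltuples), a 100-row table is vacuumed
after ~70 dead rows, while a 10M-row table waits for ~2,000,050.  A
sketch of both per-table mechanisms (table name and values assumed):

  -- 8.4+: per-table storage parameters
  ALTER TABLE big_table
    SET (autovacuum_vacuum_scale_factor = 0.05,
         autovacuum_vacuum_threshold = 1000);

  -- 8.3: a superuser row in the pg_autovacuum catalog instead,
  -- where -1 means "fall back to the global setting":
  INSERT INTO pg_autovacuum
    VALUES ('big_table'::regclass, true,
            1000, 0.05,   -- vacuum base threshold / scale factor
            -1, -1,       -- analyze base threshold / scale factor
            -1, -1,       -- vacuum cost delay / limit
            -1, -1);      -- freeze min / max age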



Re: [PERFORM] autovacuum strategy / parameters

2010-05-01 Thread Scott Marlowe
On Fri, Apr 30, 2010 at 4:50 PM, Josh Berkus  wrote:
> Which is the opposite of my experience; currently we have several
> clients who have issues which required more-frequent analyzes on
> specific tables.   Before 8.4, vacuuming more frequently, especially on
> large tables, was very costly; vacuum takes a lot of I/O and CPU.  Even
> with 8.4 it's not something you want to increase without thinking about
> the tradeoff.

Actually I would think that statement would be that before 8.3
vacuum was much more expensive.  The changes to vacuum for 8.4 mostly
had to do with moving FSM to disk, making seldom vacuumed tables
easier to keep track of, and making autovac work better in the
presence of long running transactions.  The ability to tune IO load
etc was basically unchanged in 8.4.



Re: [PERFORM] autovacuum strategy / parameters

2010-05-01 Thread Greg Smith

Robert Haas wrote:

I don't have a stake in the ground on what the right settings are, but
I think it's fair to say that if you vacuum OR analyze much less
frequently than what we recommend by default, it might break.
  


I think the default settings are essentially minimum recommended 
frequencies.  They aren't too terrible for the giant data warehouse case 
Josh was suggesting they came from--waiting until there's 20% worth of 
dead stuff before kicking off an intensive vacuum is OK when vacuum is 
expensive and you're mostly running big queries anyway.  And for smaller 
tables, the threshold helps it kick in a little earlier.  It's unlikely 
anyone wants to *increase* those, so that autovacuum runs even less; out 
of the box it's not tuned to run very often at all.


If anything, I'd expect people to want to increase how often it runs, 
for tables where much less than 20% dead is a problem.  The most common 
situation I've seen where that's the case is when you have a hotspot of 
heavily updated rows in a large table, and this may match some of the 
situations that Robert was alluding to seeing.  Let's say you have a big 
table where 0.5% of the users each update their respective records 
heavily, averaging 30 times each.  That's only going to result in 15% 
dead rows, so no autovacuum.  But latency for those users will suffer 
greatly, because they might have to do lots of seeking around to get 
their little slice of the data.
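
One way to spot that situation from the statistics views (a sketch,
not from the message above; pg_stat_user_tables is available in 8.3+):

  -- Tables carrying many dead tuples that still sit under the
  -- default 20% scale factor, so autovacuum never kicks in:
  SELECT relname, n_live_tup, n_dead_tup,
         round(n_dead_tup * 100.0 / nullif(n_live_tup, 0), 1)
           AS pct_dead
    FROM pg_stat_user_tables
   WHERE n_dead_tup > 10000
   ORDER BY pct_dead DESC NULLS LAST;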


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] autovacuum strategy / parameters

2010-05-01 Thread Tom Lane
Greg Smith  writes:
> If anything, I'd expect people to want to increase how often it runs, 
> for tables where much less than 20% dead is a problem.  The most common 
> situation I've seen where that's the case is when you have a hotspot of 
> heavily updated rows in a large table, and this may match some of the 
> situations that Robert was alluding to seeing.  Let's say you have a big 
> table where 0.5% of the users each update their respective records 
> heavily, averaging 30 times each.  That's only going to result in 15% 
> dead rows, so no autovacuum.  But latency for those users will suffer 
> greatly, because they might have to do lots of seeking around to get 
> their little slice of the data.

With a little luck, HOT will alleviate that case, since HOT updates can
be reclaimed without running vacuum per se.  I agree there's a risk
there though.
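
An illustration of the HOT preconditions (table name and fillfactor
value assumed): an update can be HOT only if it changes no indexed
column and the new row version fits on the same page, so leaving
headroom per page helps:

  -- Reserve ~20% free space per page so updated row versions can
  -- stay on their page and be reclaimed by HOT pruning, no VACUUM:
  ALTER TABLE hot_table SET (fillfactor = 80);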

Now that partial vacuum is available, it'd be a real good thing to
revisit these numbers.

regards, tom lane



Re: [PERFORM] autovacuum strategy / parameters

2010-05-01 Thread Robert Haas
On Sat, May 1, 2010 at 12:13 PM, Scott Marlowe  wrote:
> On Fri, Apr 30, 2010 at 4:50 PM, Josh Berkus  wrote:
>> Which is the opposite of my experience; currently we have several
>> clients who have issues which required more-frequent analyzes on
>> specific tables.   Before 8.4, vacuuming more frequently, especially on
>> large tables, was very costly; vacuum takes a lot of I/O and CPU.  Even
>> with 8.4 it's not something you want to increase without thinking about
>> the tradeoff.
>
> Actually I would think that statement would be that before 8.3
> vacuum was much more expensive.  The changes to vacuum for 8.4 mostly
> had to do with moving FSM to disk, making seldom vacuumed tables
> easier to keep track of, and making autovac work better in the
> presence of long running transactions.  The ability to tune IO load
> etc was basically unchanged in 8.4.

What about http://www.postgresql.org/docs/8.4/static/storage-vm.html ?

...Robert



Re: [PERFORM] autovacuum strategy / parameters

2010-05-01 Thread Scott Marlowe
On Sat, May 1, 2010 at 1:08 PM, Robert Haas  wrote:
> On Sat, May 1, 2010 at 12:13 PM, Scott Marlowe  
> wrote:
>> On Fri, Apr 30, 2010 at 4:50 PM, Josh Berkus  wrote:
>>> Which is the opposite of my experience; currently we have several
>>> clients who have issues which required more-frequent analyzes on
>>> specific tables.   Before 8.4, vacuuming more frequently, especially on
>>> large tables, was very costly; vacuum takes a lot of I/O and CPU.  Even
>>> with 8.4 it's not something you want to increase without thinking about
>>> the tradeoff.
>>
>> Actually I would think that statement would be that before 8.3
>> vacuum was much more expensive.  The changes to vacuum for 8.4 mostly
>> had to do with moving FSM to disk, making seldom vacuumed tables
>> easier to keep track of, and making autovac work better in the
>> presence of long running transactions.  The ability to tune IO load
>> etc was basically unchanged in 8.4.
>
> What about http://www.postgresql.org/docs/8.4/static/storage-vm.html ?

That really only has an effect on tables that aren't updated very
often.  Unless you've got a whole bunch of those, it's not that big of
a deal.



Re: [PERFORM] autovacuum strategy / parameters

2010-05-01 Thread Scott Marlowe
On Sat, May 1, 2010 at 1:17 PM, Scott Marlowe  wrote:
> On Sat, May 1, 2010 at 1:08 PM, Robert Haas  wrote:
>> On Sat, May 1, 2010 at 12:13 PM, Scott Marlowe  
>> wrote:
>>> On Fri, Apr 30, 2010 at 4:50 PM, Josh Berkus  wrote:
>>>> Which is the opposite of my experience; currently we have several
>>>> clients who have issues which required more-frequent analyzes on
>>>> specific tables.   Before 8.4, vacuuming more frequently, especially on
>>>> large tables, was very costly; vacuum takes a lot of I/O and CPU.  Even
>>>> with 8.4 it's not something you want to increase without thinking about
>>>> the tradeoff.
>>>
>>> Actually I would think that statement would be that before 8.3
>>> vacuum was much more expensive.  The changes to vacuum for 8.4 mostly
>>> had to do with moving FSM to disk, making seldom vacuumed tables
>>> easier to keep track of, and making autovac work better in the
>>> presence of long running transactions.  The ability to tune IO load
>>> etc was basically unchanged in 8.4.
>>
>> What about http://www.postgresql.org/docs/8.4/static/storage-vm.html ?
>
> That really only has an effect on tables that aren't updated very
> often.  Unless you've got a whole bunch of those, it's not that big of
> a deal.


Anyway, my real point was that the big improvements that made vacuum
so much better came in 8.3, with HOT updates and multi-threaded vacuum
(which might have shown up in 8.2, even).  8.3 was a huge improvement
and a compelling upgrade from 8.1 for me.



Re: [PERFORM] autovacuum strategy / parameters

2010-05-01 Thread Robert Haas
On Sat, May 1, 2010 at 1:11 PM, Greg Smith  wrote:
> Robert Haas wrote:
>>
>> I don't have a stake in the ground on what the right settings are, but
>> I think it's fair to say that if you vacuum OR analyze much less
>> frequently than what we recommend by default, it might break.
>>
>
> I think the default settings are essentially minimum recommended
> frequencies.  They aren't too terrible for the giant data warehouse case
> Josh was suggesting they came from--waiting until there's 20% worth of dead
> stuff before kicking off an intensive vacuum is OK when vacuum is expensive
> and you're mostly running big queries anyway.  And for smaller tables, the
> threshold helps it kick in a little earlier.  It's unlikely anyone wants to
> *increase* those, so that autovacuum runs even less; out of the box it's not
> tuned to run very often at all.
>
> If anything, I'd expect people to want to increase how often it runs, for
> tables where much less than 20% dead is a problem.  The most common
> situation I've seen where that's the case is when you have a hotspot of
> heavily updated rows in a large table, and this may match some of the
> situations that Robert was alluding to seeing.  Let's say you have a big
> table where 0.5% of the users each update their respective records heavily,
> averaging 30 times each.  That's only going to result in 15% dead rows, so
> no autovacuum.  But latency for those users will suffer greatly, because
> they might have to do lots of seeking around to get their little slice of
> the data.

For me it's more that my applications are typically really fast, and
when they run at half-speed people think "oh, it's slow today" but
they can still work and attribute the problem to their computer, or
the network, or something.  When they slow down by like 10x then they
file a bug.  I'm typically dealing with a situation where the whole
database can be easily cached in RAM and the CPU is typically 90%
idle, which cushions the blow quite a bit.

A few months ago someone reported that "the portal was slow" and the
problem turned out to be that the database was bloated by in excess of
a factor of 10 due to having blown out the free space map.  I wasn't a
regular user of that system at that time, so I hadn't had the
opportunity to notice it myself.
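
For anyone on a pre-8.4 release hitting the same thing: once dead
space exceeds what the fixed-size free space map can track, freed
space stops being reused and tables bloat exactly like this.  The
relevant postgresql.conf knobs (values are site-specific):

  max_fsm_pages = ...       # pages with free space the shared FSM
                            # can remember
  max_fsm_relations = ...   # relations it can track

8.4 removed both settings by moving the FSM into per-relation forks
on disk, the change Scott mentioned upthread.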

...Robert
