[PERFORM] join vs exists

2010-06-20 Thread AI Rumman
Which one is better - a join between tables or using EXISTS in the WHERE condition?

Query 1:

Select a.*
from a
where exists
(
select 1 from b inner join c on b.id1 = c.id where a.id = b.id)

Query 2:
select a.*
from a
inner join
(select b.id from b inner join c on b.id1 = c.id) as q
on a.id = q.id

Any suggestions, please.


[PERFORM] Obtaining the exact size of the database.

2010-06-20 Thread venu madhav
Hi All,
  I am using Postgres 8.1.9 for my application. My application also has
a clean-up module which cleans up a specified percentage of the total database
size at regular intervals. Now the problem is that I use *pg_database_size* to
obtain the size of the database. After deleting the records, we run *Vacuum
Analyze* to reorder the indexes. The problem here is that even though some
records are cleared, it still shows the original DB size. Is there any way
to find out the actual DB size? It would be even more useful if I could get
the size of each table.
 I can't run *Vacuum Full* because the application should run 24*7
without downtime.

Can someone please help me solve this?

Please let me know if you need any clarifications.

Thank you,
Venu.


[PERFORM] HashAggregate slower than sort?

2010-06-20 Thread Jatinder Sangha
Hi,

I've noticed something that I find strange with the hash-aggregate
feature of Postgres. I'm currently running Postgres v8.4.1 on Debian
Linux 64-bit.

I have a simple query that when planned either uses hash-aggregates or a
sort depending on the amount of working memory available. The problem is
that when it uses the hash-aggregates, the query runs 25% slower than
when using the sort method.

The table in question contains about 60 columns, many of which are
booleans or 32-bit integers, and some of which are 64-bit integers. Many
fields are text - and some of these can be quite long (e.g. 32KB).



The SQL is as follows:

explain analyse
select distinct T1.*
  from role T1
 where T1.endDate is null and T1.latest=true and T1.active=true and
   T1.deceased=false and T1.desk in (BIG LIST OF INTEGERS);


select version() --> "PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu,
compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-10), 64-bit"
show enable_hashagg --> "on"
set work_mem='8MB'
show work_mem --> "8MB"

Explain analyse of the SQL above:
Unique  (cost=47033.71..48410.27 rows=8881 width=1057) (actual
time=18.803..38.969 rows=6449 loops=1)
  ->  Sort  (cost=47033.71..47055.91 rows=8881 width=1057) (actual
time=18.801..20.560 rows=6449 loops=1)
Sort Key: id, version, latest, active, deceased, person,
formalnotes, informalnotes, description, desk, rolelevel, roletype,
promotiondate, primaryrole, headofplace, careergrading, startdate,
enddate, percentsalary, deskf, rolelevelf, roletypef, promotiondatef,
primaryrolef, headofplacef, careergradingf, startdatef, enddatef,
percentsalaryf, descriptionf, deskmv, rolelevelmv, roletypemv,
promotiondatemv, primaryrolemv, headofplacemv, careergradingmv,
startdatemv, enddatemv, percentsalarymv, descriptionmv, hasattachments,
hasrelationships, hasprojects, audwho, audwhen, audcreated, costcentre,
reportsto, manages, startdateest, enddateest, hasstarperformers,
projectnames, sourcefrom, sourceto, checkedwho, checkedwhen,
checkednotes, hasqueries, querytitles
Sort Method:  quicksort  Memory: 2001kB
->  Bitmap Heap Scan on role t1  (cost=4888.59..42321.27
rows=8881 width=1057) (actual time=7.041..12.504 rows=6449 loops=1)
  Recheck Cond: (desk = ANY ('BIG LIST OF
INTEGERS'::bigint[]))
  Filter: ((enddate IS NULL) AND latest AND active AND (NOT
deceased))
  ->  Bitmap Index Scan on role_ix2  (cost=0.00..4886.37
rows=10984 width=0) (actual time=6.948..6.948 rows=9296 loops=1)
Index Cond: ((latest = true) AND (active = true) AND
(deceased = false) AND (desk = ANY ('BIG LIST OF INTEGERS'::bigint[])))
Total runtime: 40.777 ms



This execution of the query used a sort to perform the "distinct".



Now for the second run:

select version() --> "PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu,
compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-10), 64-bit"
show enable_hashagg --> "on"
set work_mem='64MB'
show work_mem --> "64MB"

Explain analyse of the SQL above:
HashAggregate  (cost=43675.63..43764.44 rows=8881 width=1057) (actual
time=46.556..55.694 rows=6449 loops=1)
  ->  Bitmap Heap Scan on role t1  (cost=4888.59..42321.27 rows=8881
width=1057) (actual time=7.179..13.023 rows=6449 loops=1)
Recheck Cond: (desk = ANY ('BIG LIST OF INTEGERS'::bigint[]))
Filter: ((enddate IS NULL) AND latest AND active AND (NOT
deceased))
->  Bitmap Index Scan on role_ix2  (cost=0.00..4886.37
rows=10984 width=0) (actual time=7.086..7.086 rows=9296 loops=1)
  Index Cond: ((latest = true) AND (active = true) AND
(deceased = false) AND (desk = ANY ('BIG LIST OF INTEGERS'::bigint[])))
Total runtime: 57.536 ms




I've tested this with v8.4.4 as well, with the same results. I also
tested the same query with our previous production version of Postgres
(v8.3.8), and that version only appears to use sorting, not
hash-aggregates.



Obviously, I can re-write the query to use a "distinct on (...)" clause
to improve performance - which is what I've done - but my question is:
why is the hash-aggregate slower than the sort?


Is it something to do with the number of columns? I.e. when sorting, the
first few columns defined on the table (id, version) make the row unique
- but when using the hash-aggregate feature, presumably every column
needs to be hashed, which takes longer, especially for long text fields?
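
For reference, the "distinct on (...)" rewrite mentioned above might look
something like this (a sketch only - it assumes, as suggested, that
(id, version) uniquely identifies a row; the column list and predicates
follow the simplified query earlier in the post):

```sql
-- With DISTINCT ON, only (id, version) need to be compared/hashed,
-- instead of all ~60 columns including the long text fields.
SELECT DISTINCT ON (T1.id, T1.version) T1.*
  FROM role T1
 WHERE T1.endDate IS NULL AND T1.latest = true AND T1.active = true
   AND T1.deceased = false AND T1.desk IN (BIG LIST OF INTEGERS);
```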

Thanks,
--Jatinder



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Slow function in queries SELECT clause.

2010-06-20 Thread Davor J.
I think I have read what there is to read about queries being prepared in 
plpgsql functions, but I still cannot explain the following, so I thought 
I'd post it here:

Suppose 2 functions: factor(int,int) and offset(int, int).
Suppose a third function: convert(float,int,int) which simply returns 
$1*factor($2,$3)+offset($2,$3)
All three functions are IMMUTABLE.

Very simple, right? Now I have very fast AND very slow executing queries on 
some 150k records:

VERY FAST (half a second):

SELECT data*factor(1,2)+offset(1,2) FROM tbl_data;

VERY SLOW (a minute):

SELECT convert(data, 1, 2) FROM tbl_data;

The slowness cannot be due to calling a function 150k times. If I define 
convert2(float,int,int) to return a constant value, then it executes in 
about a second (still twice as slow as the VERY FAST query).

I assume that factor and offset are cached in the VERY FAST query, but not 
in the slow one? If so, why not, and how can I "force" it? Currently I need 
only one function for conversions.

Regards,
Davor 





Re: [PERFORM] Slow function in queries SELECT clause.

2010-06-20 Thread Szymon Guz
2010/6/19 Davor J. 

> I think I have read what is to be read about queries being prepared in
> plpgsql functions, but I still can not explain the following, so I thought
> to post it here:
>
> Suppose 2 functions: factor(int,int) and offset(int, int).
> Suppose a third function: convert(float,int,int) which simply returns
> $1*factor($2,$3)+offset($2,$3)
> All three functions are IMMUTABLE.
>
> Very simple, right? Now I have very fast AND very slow executing queries on
> some 150k records:
>
> VERY FAST (half a second):
> 
> SELECT data*factor(1,2)+offset(1,2) FROM tbl_data;
>
> VERY SLOW (a minute):
> 
> SELECT convert(data, 1, 2) FROM tbl_data;
>
> The slowness cannot be due to calling a function 150k times. If I define
> convert2(float,int,int) to return a constant value, then it executes in
> about a second. (still half as slow as the VERY FAST query).
>
> I assume that factor and offset are cached in the VERY FAST query, and not
> in the slow one? If so, why not and how can I "force" it? Currently I need
> only one function for conversions.
>
> Regards,
> Davor
>
>
>
>
Hi,
show us the code of those two functions and the EXPLAIN ANALYZE output of
those queries.

regards
Szymon Guz


Re: [PERFORM] Slow function in queries SELECT clause.

2010-06-20 Thread Davor J.
I didn't consider them to be important, as they showed the same plan; only 
the execution time was different. Also, they are a bit more complex than the 
ones in the previous post. But here they are:

Definitions:
---
CREATE OR REPLACE FUNCTION appfunctions.fnc_unit_conversion_factor(_tree_id 
integer, _unit_to_id integer)
  RETURNS real AS
$BODY$ 
DECLARE
BEGIN 
RETURN (SELECT unit_conv_factor AS factor
  FROM vew_unit_conversions AS c
  INNER JOIN tbl_sensors AS s ON (s.unit_id = c.unit_id_from)
  INNER JOIN tbl_trees USING (sens_id)
  WHERE tree_id = _tree_id AND unit_id_to = _unit_to_id)::real;
END; 
$BODY$
  LANGUAGE 'plpgsql' IMMUTABLE
--
CREATE OR REPLACE FUNCTION appfunctions.fnc_unit_conversion_offset(_tree_id 
integer, _unit_to_id integer)
  RETURNS real AS
$BODY$ 
DECLARE
BEGIN 
RETURN (SELECT unit_conv_offset AS offset
  FROM vew_unit_conversions AS c
  INNER JOIN tbl_sensors AS s ON (s.unit_id = c.unit_id_from)
  INNER JOIN tbl_trees USING (sens_id)
  WHERE tree_id = _tree_id AND unit_id_to = _unit_to_id)::real;
END; 
$BODY$
  LANGUAGE 'plpgsql' IMMUTABLE
--
CREATE OR REPLACE FUNCTION appfunctions.fnc_unit_convert(_rawdata real, 
_tree_id integer, _unit_to_id integer)
  RETURNS real AS
$BODY$ 
DECLARE 
BEGIN 
RETURN _rawdata
 * fnc_unit_conversion_factor(_tree_id, _unit_to_id) 
 + fnc_unit_conversion_offset(_tree_id, _unit_to_id);
END; 
$BODY$
  LANGUAGE 'plpgsql' IMMUTABLE



Executions:
---
EXPLAIN ANALYSE SELECT timestamp,

data_from_tree_id_70 AS "flow_11" 

 FROM 
(SELECT sens_chan_data_timestamp AS timestamp, sens_chan_data_data AS 
data_from_tree_id_70 FROM tbl_sensor_channel_data WHERE tree_id = 70 AND 
sens_chan_data_timestamp >= '2008-06-11T00:00:00' AND sens_chan_data_timestamp 
<= '2008-06-18T00:00:00' ) AS "70" 

 ORDER BY timestamp;

"Sort  (cost=175531.00..175794.64 rows=105456 width=12) (actual 
time=598.454..638.400 rows=150678 loops=1)"
"  Sort Key: tbl_sensor_channel_data.sens_chan_data_timestamp"
"  Sort Method:  external sort  Disk: 3240kB"
"  ->  Bitmap Heap Scan on tbl_sensor_channel_data  (cost=3005.29..166732.66 
rows=105456 width=12) (actual time=34.810..371.099 rows=150678 loops=1)"
"Recheck Cond: ((tree_id = 70) AND (sens_chan_data_timestamp >= 
'2008-06-11 00:00:00'::timestamp without time zone) AND 
(sens_chan_data_timestamp <= '2008-06-18 00:00:00'::timestamp without time 
zone))"
"->  Bitmap Index Scan on tbl_sensor_channel_data_pkey  
(cost=0.00..2978.92 rows=105456 width=0) (actual time=28.008..28.008 
rows=150678 loops=1)"
"  Index Cond: ((tree_id = 70) AND (sens_chan_data_timestamp >= 
'2008-06-11 00:00:00'::timestamp without time zone) AND 
(sens_chan_data_timestamp <= '2008-06-18 00:00:00'::timestamp without time 
zone))"
"Total runtime: 663.478 ms"
---
EXPLAIN ANALYSE SELECT timestamp,

fnc_unit_convert(data_from_tree_id_70, 70, 7) AS "flow_11" 

 FROM 
(SELECT sens_chan_data_timestamp AS timestamp, sens_chan_data_data AS 
data_from_tree_id_70 FROM tbl_sensor_channel_data WHERE tree_id = 70 AND 
sens_chan_data_timestamp >= '2008-06-11T00:00:00' AND sens_chan_data_timestamp 
<= '2008-06-18T00:00:00' ) AS "70" 

 ORDER BY timestamp;

"Sort  (cost=201895.00..202158.64 rows=105456 width=12) (actual 
time=35334.017..35372.977 rows=150678 loops=1)"
"  Sort Key: tbl_sensor_channel_data.sens_chan_data_timestamp"
"  Sort Method:  external sort  Disk: 3240kB"
"  ->  Bitmap Heap Scan on tbl_sensor_channel_data  (cost=3005.29..193096.66 
rows=105456 width=12) (actual time=60.012..35037.129 rows=150678 loops=1)"
"Recheck Cond: ((tree_id = 70) AND (sens_chan_data_timestamp >= 
'2008-06-11 00:00:00'::timestamp without time zone) AND 
(sens_chan_data_timestamp <= '2008-06-18 00:00:00'::timestamp without time 
zone))"
"->  Bitmap Index Scan on tbl_sensor_channel_data_pkey  
(cost=0.00..2978.92 rows=105456 width=0) (actual time=21.884..21.884 
rows=150678 loops=1)"
"  Index Cond: ((tree_id = 70) AND (sens_chan_data_timestamp >= 
'2008-06-11 00:00:00'::timestamp without time zone) AND 
(sens_chan_data_timestamp <= '2008-06-18 00:00:00'::timestamp without time 
zone))"
"Total runtime: 35397.841 ms"
---
EXPLAIN ANALYSE SELECT timestamp,

data_from_tree_id_70*fnc_unit_conversion_factor(70, 7)+ 
fnc_unit_conversion_offset(70, 7) AS "flow_11" 

 FROM 
(SELECT sens_chan_data_timestamp AS timestamp, sens_chan_data_data AS 
data_from_tree_id_70 FROM tbl_sensor_channel_data WHERE tree_id = 70 AND 
sens_chan_data_timestamp >= '2008-06-11T00:00:00' AND sens_chan_data_timestamp 
<= '2008-06-18T00:00:00' ) AS "70" 

 ORDER BY timestamp;

"Sort  (cost=176058.28..176321.92 rows=105456 width=12) (actual 
time=630.350..669.843 rows=1506

Re: [PERFORM] Obtaining the exact size of the database.

2010-06-20 Thread Dave Crooke
Hi there

1. PG 8.1.9 is ancient ... you should upgrade.

2. The database gross size on disk is not affected by VACUUM ANALYZE ... all
this does is return space used by deleted row-versions to PG for re-use. The
only way to reduce it and thus return disk space to the OS is to do a VACUUM
FULL, or to delete the entire table.

3. If you can suspend writes for a while, you can pull off an "online"
VACUUM FULL, or copy and delete the table in order to repack it. Check out
the CLUSTER command.

4. If you're trying to figure out the net size of the table, i.e. how much
free space is inside the table files for reuse by PG, then you need the
pgstattuple function ... this is built in to PG 8.4, and a plug-in
activated by a script for PG 8.3; I don't know if it exists in 8.1 or not.
Like SELECT COUNT(*) this requires a full table scan.

Cheers
Dave

sent from my Android phone

On Jun 20, 2010 6:18 AM, "venu madhav"  wrote:



Re: [PERFORM] join vs exists

2010-06-20 Thread Kevin Grittner
AI Rumman  wrote:
 
> Which one is good - join between table or using exists in where
> condition?
 
Your example wouldn't return the same results unless there was at
most one matching row in b and one matching row in c, at least
without resorting to DISTINCT (which you don't show).  So, be careful
not to get wrong results in an attempt to optimize.
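 
To illustrate the difference with a hypothetical case (not from the
original post): suppose a has one row with id = 1, and b/c together
yield two matching rows for that id.

```sql
-- Query 1 (EXISTS) is a semi-join: the row from a is returned exactly
-- once, no matter how many (b, c) pairs match.
SELECT a.*
  FROM a
 WHERE EXISTS (SELECT 1 FROM b JOIN c ON b.id1 = c.id
                WHERE b.id = a.id);

-- Query 2 (plain JOIN to the subquery) returns the row from a once
-- per matching (b, c) pair - here, twice - unless DISTINCT is added.
SELECT a.*
  FROM a
  JOIN (SELECT b.id FROM b JOIN c ON b.id1 = c.id) AS q
    ON a.id = q.id;
```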
 
You don't say which version of PostgreSQL you're using, but if it's a
fairly recent major version, I would expect nearly identical
performance if the queries returned the same results without
DISTINCT, and would usually expect better results for the EXISTS than
the JOIN with DISTINCT.
 
-Kevin




Re: [PERFORM] Slow function in queries SELECT clause.

2010-06-20 Thread Tom Lane
"Davor J."  writes:
> Suppose 2 functions: factor(int,int) and offset(int, int).
> Suppose a third function: convert(float,int,int) which simply returns 
> $1*factor($2,$3)+offset($2,$3)
> All three functions are IMMUTABLE.

You should write the third function as a SQL function, which'd allow it
to be inlined.
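
A sketch of that rewrite, using the names from the simplified example
(note that OFFSET is a reserved word, so a real function with that name
would need quoting or a different name):

```sql
-- A LANGUAGE sql body (rather than plpgsql) lets the planner inline the
-- function into the calling query, so factor()/offset() can be
-- constant-folded when their arguments are constants.
CREATE OR REPLACE FUNCTION convert(float, int, int) RETURNS float AS
$$ SELECT $1 * factor($2, $3) + "offset"($2, $3) $$
LANGUAGE sql IMMUTABLE;
```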

> VERY FAST (half a second):
> 
> SELECT data*factor(1,2)+offset(1,2) FROM tbl_data;

In this case both factor() calls are folded to constants, hence executed
only once.

> VERY SLOW (a minute):
> 
> SELECT convert(data, 1, 2) FROM tbl_data;

Without inlining, there's no hope of any constant-folding here.
The optimizer just sees the plpgsql function as a black box and
can't do anything with it.

BTW, your later mail shows that the factor() functions are not really
IMMUTABLE, since they select from tables that presumably are subject to
change.  The "correct" declaration would be STABLE.  If you're relying
on constant-folding to get reasonable application performance, you're
going to have to continue to mislabel them as IMMUTABLE; but be aware
that you're likely to have issues any time you do change the table
contents.  The changes won't get reflected into existing query plans.

regards, tom lane



Re: [PERFORM] Obtaining the exact size of the database.

2010-06-20 Thread Tom Lane
Dave Crooke  writes:
> 4. If you're trying to figure out the net size of the table, i.e. how much
> free space is inside the table files for reuse by PG, then you need the
> pgstattuple function ... this is built in to PG 8.4, and a plug-in
> activated by a script for PG 8.3, don't know if it exists in 8.1 or not.
> Like SELECT COUNT(*) this requires a full table scan.

I think what the OP actually wants is the number of live rows, so plain
old SELECT COUNT(*) would do it.  If that's too slow, a good alternative
is to ANALYZE the table and then look at its pg_class.reltuples entry
--- of course that will only be an approximate count.
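
That check might look like this (hypothetical table name):

```sql
-- Refresh the statistics, then read the estimated live-row count that
-- ANALYZE stored in pg_class.  Approximate, but very cheap to query.
ANALYZE my_table;
SELECT reltuples::bigint AS approx_live_rows
  FROM pg_class
 WHERE relname = 'my_table';
```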

regards, tom lane



[PERFORM] Aggressive autovacuuming ?

2010-06-20 Thread Jesper Krogh

Hi.

I have been wondering if anyone has been experimenting with "really
aggressive" autovacuuming. The database I'm administering rarely has
"long running" transactions (over several minutes). It has a fair amount
of buffer cache, and an OS cache of (at best) 64GB. A lot of the OS cache
is being used for read-caching.

My thought was that if I tuned autovacuum to be "really aggressive" then
I could get autovacuum to actually vacuum the tuples before they
get evicted from the OS cache thus effectively "saving" the IO-overhead
of vacuuming.

The largest consequence I can see at the moment is that when I get a
full vacuum (for preventing transaction-id wraparound) it would be
run with the same aggressive settings, thus giving a real performance
hit in that situation.

Has anyone tried to do something similar? What is your experience?
Is the idea totally bogus?

Jesper

--
Jesper Krogh



Re: [PERFORM] B-Heaps

2010-06-20 Thread Greg Smith

Robert Haas wrote:

This is drifting a bit off-topic for this thread, but it's not so easy
to figure out from looking at the TODO which things are actually
important.  Performance-related improvements are mixed in with
non-performance related improvements, which are mixed in with things
that are probably not improvements at all.  And even to the extent
that you can identify the stuff that's performance-related, it's far
from obvious which things are most important.  Any thoughts on that


I don't think it's off topic at all actually, and as usual I'll be 
happy to argue why.  Reorganizing the TODO so that it's easier for 
newcomers to consume is certainly a worthwhile but hard-to-"fund" (find 
time to do relative to more important things) effort in itself.  My point 
was more that, statistically, *anything* on that list is likely a better 
candidate for something to work on usefully than one of the random 
theoretical performance improvements from research that pop up on the lists 
from time to time.  People get excited about these papers and blog posts 
sometimes, but the odds that one of them is actually in the critical path, 
where it represents a solution to a current PostgreSQL bottleneck, are 
dramatically lower than the odds that you'll find one reading the list of 
*known* issues.  Want to improve PostgreSQL performance?  Spend more time 
reading the TODO, and less looking around elsewhere for problems the 
database may or may not have.


I have a major time sink I'm due to free myself from this week, and the 
idea of providing some guidance for a "low-hanging performance fruit" 
section of the TODO is a good one I should take a look at.  I have a 
personal list of that sort already that I should probably just make public, 
since the ideas for improving things are not the valuable part I should 
worry about keeping private anyway.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] Obtaining the exact size of the database.

2010-06-20 Thread Greg Smith

venu madhav wrote:
The problem here is even though some records are cleared, it still 
shows the original DB Size. Is there any way to find out the actual DB 
Size or it would be more useful, if I can get the size of each table.


One of the queries at http://wiki.postgresql.org/wiki/Disk_Usage should 
give you the breakdown per table.  Regular VACUUM doesn't ever shrink 
the database from the operating system's perspective unless you hit a very 
unusual situation (all of the free space is at the end).  There is no 
way to do that without system downtime of sorts, in the form of a 
potentially long database lock, such as VACUUM FULL (the main option on 
8.1; the alternative of using CLUSTER isn't a good idea until 8.3).  The 
best you can do is make sure you VACUUM often enough that space is 
regularly reused.
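
One such per-table query, along the lines of what the wiki page shows
(pg_total_relation_size() and pg_size_pretty() should exist back to 8.1,
but verify on your version):

```sql
-- Largest tables first, counting indexes and TOAST data as well.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
  FROM pg_class
 WHERE relkind = 'r'
 ORDER BY pg_total_relation_size(oid) DESC
 LIMIT 20;
```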


It's hard to run a 24x7 environment on 8.1.  Much easier on 8.4, where 
the major things that regularly left people with quite bad VACUUM 
cleanup situations are all less likely to occur than on any previous 
version.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] Aggressive autovacuuming ?

2010-06-20 Thread Scott Marlowe
On Sun, Jun 20, 2010 at 11:44 AM, Jesper Krogh  wrote:
> Hi.
>
> I have been wondering if anyone has been experimenting with "really
> agressive"
> autovacuuming.

I have been using moderately aggressive autovac, with 6 or more
threads running with 1ms sleep, then keeping track of them to see if
they're being too aggressive.  Basically, as long as IO utilization
doesn't hit 100% it doesn't seem to have any negative or even
noticeable effect.

I lean more in the direction of running a few more threads than I
absolutely need to keep up with bloat.  If I'm set for 5 threads and I
always have five threads running, I go to 6, 7, 8 or wherever they're
never all active.

But you need the IO capability to use aggressive vacuuming.
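
For reference, "aggressive" in this sense usually means postgresql.conf
settings along these lines (illustrative values, not from the post;
availability varies by version - autovacuum_max_workers, for instance,
only appears in 8.3+):

```
autovacuum = on
autovacuum_naptime = 10s                # check for work frequently
autovacuum_max_workers = 6              # 8.3+: several concurrent workers
autovacuum_vacuum_cost_delay = 1ms      # near-zero sleep between work chunks
autovacuum_vacuum_scale_factor = 0.05   # vacuum after ~5% dead tuples
```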

> The database I'm adminstrating rarely have "long running"
> transactions
> (over several minutes). And a fair amount of buffercache and an OS cache of
> (at best 64GB). A lot of the OS cache is being used for read-caching.
>
> My thought was that if I tuned autovacuum to be "really aggressive" then
> I could get autovacuum to actually vacuum the tuples before they
> get evicted from the OS cache thus effectively "saving" the IO-overhead
> of vacuuming.

But vacuuming by design has to write out and that's the real resource
you're likely to use up first.

> The largest consequence I can see at the moment is that when I get a
> full vacuum (for preventing transaction-id wraparound) it would be

I assume you mean the automatic database-wide vacuum.  I don't think
8.4 and above need that anymore.  I think 8.3 does that too, but I'm
not 100% sure.

> run with the same aggressive settings, thus giving a real performance
> hit in that situation.
>
> Has anyone tried to do similar? What is your experience?
> Is the idea totally bogus?

Cranking up autovacuum is not a bogus idea, but it directly impacts
your IO subsystem, and if you get too aggressive (zero naptime is way
aggressive) you have to back off on the number of threads to keep
things sane.  If your IO subsystem is one 7200RPM SATA drive with
write cache disabled / fsync properly working, you're not gonna be
able to get very aggressive before you make your IO subsystem bog down.



Re: [PERFORM] Obtaining the exact size of the database.

2010-06-20 Thread Scott Marlowe
On Sun, Jun 20, 2010 at 2:04 PM, Greg Smith  wrote:

> It's hard to run a 24x7 environment on 8.1.  Much easier on 8.4, where the
> major things that regularly left people with quite bad VACUUM cleanup
> situations are all less likely to occur than on any previous version.

Hear, hear.  Keeping anything before 8.2 fed and happy is pretty
difficult in 24/7 environments.  8.2 and 8.3 are OK if you keep a
close eye on them.  And it just gets better from there.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance