[PERFORM] Queries related to checkpoints

2011-07-29 Thread Rohan Malhotra
Hi,

I am a noob at db tuning and am trying to analyze pg_stat_bgwriter data:

 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
             35241 |              58 |             699136 |        581839 |             1597 |         1663650 |    2205969940


Almost all checkpoints (99.8%) happened because checkpoint_timeout passed. Is
this good, or should I increase my checkpoint_segments?
During checkpoints, 699136 8K buffers were written out, which is pretty low
(less than 1MB per checkpoint, on average).

2205969940 8K buffers were allocated, and 1663650 times a database backend
(probably the client itself) had to write a page in order to make space for a
new allocation. buffers_alloc seems far higher than buffers_backend.

How to read more into the data?

Regards
Rohan


Re: [PERFORM] Queries related to checkpoints

2011-07-29 Thread Kevin Grittner
Rohan Malhotra  wrote:
 
First off, for a one-row result with too many values to fit on one
line, you might want to use this in psql:
 
\x on
 
More importantly, you seem to be misinterpreting the numbers.
 
You've allocated 2,205,969,940 buffers.  Of those allocations, the
allocating backend had to first write a dirty buffer to free up a
buffer to use 1,663,650 times.  That's pretty small as a percentage
of allocations, but since it's larger than the other causes of dirty
buffer writes (699,136 during checkpoints and 581,839 by the
background writer), I would be tempted to make the background writer
a little more aggressive.  Assuming you're currently at the defaults
for these, perhaps:
 
bgwriter_lru_maxpages = 200
bgwriter_lru_multiplier = 4
 
This may (or may not) increase the physical writes on your system,
so you want to closely monitor the impact of the change in terms of
whatever metrics matter most to you.  For example, in our shop, we
tend to tune our big databases which back a website such that we get
zero "write storms" which cause delays of 20 seconds or more on
queries which normally run in less than a millisecond.  Your
concerns may be different.
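 
If it helps, a rough query for watching those ratios from pg_stat_bgwriter
(a sketch only; the counters are cumulative since the last statistics reset,
and the column names are those of 8.4/9.0):
 
SELECT round(100.0 * buffers_checkpoint /
             nullif(buffers_checkpoint + buffers_clean + buffers_backend, 0), 1)
           AS pct_checkpoint,
       round(100.0 * buffers_clean /
             nullif(buffers_checkpoint + buffers_clean + buffers_backend, 0), 1)
           AS pct_bgwriter,
       round(100.0 * buffers_backend /
             nullif(buffers_checkpoint + buffers_clean + buffers_backend, 0), 1)
           AS pct_backend,
       round(100.0 * checkpoints_req /
             nullif(checkpoints_timed + checkpoints_req, 0), 1)
           AS pct_checkpoints_req
FROM pg_stat_bgwriter;
 
Comparing snapshots taken over time shows whether the backend share is moving
in the right direction.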
 
For more detailed treatment of the issue look for posts by Greg
Smith; or better yet, buy his book:
 
http://www.postgresql.org/docs/books/
 
-Kevin



Re: [PERFORM] Trigger or Function

2011-07-29 Thread alan
> My first approach would be to remove WeekAvg and MonthAvg from the
> table and create a view which calculates appropriate values.

Thanks Robert, I had to upgrade to 9.0.4 to use the extended windowing
features.
Here is how I set it up. If anyone sees an issue, please let me know.
I'm new to postgres.

Basically, my "daily_vals" table contains HOST, DATE, & VALUE columns.
What I wanted was a way to automatically populate a 4th column
called "rolling_average", which would be the running sum of the value column
over the current and preceding rows (per host).

testdb=# select * from daily_vals;
 rid |    date    |  host  |  value
-----+------------+--------+----------
   1 | 2011-07-01 |  hosta | 100.0000
   2 | 2011-07-02 |  hosta | 200.0000
   3 | 2011-07-03 |  hosta | 400.0000
   4 | 2011-07-04 |  hosta | 500.0000
   5 | 2011-07-05 |  hosta | 100.0000
   6 | 2011-07-06 |  hosta | 700.0000
   7 | 2011-07-07 |  hosta | 200.0000
   8 | 2011-07-08 |  hosta | 100.0000
   9 | 2011-07-09 |  hosta | 100.0000
  10 | 2011-07-10 |  hosta | 100.0000
  11 | 2011-07-01 |  hostb |   5.7143
  12 | 2011-07-02 |  hostb |   8.5714
  13 | 2011-07-03 |  hostb |  11.4286
  14 | 2011-07-04 |  hostb |   8.5714
  15 | 2011-07-05 |  hostb |   2.8571
  16 | 2011-07-06 |  hostb |   1.4286
  17 | 2011-07-07 |  hostb |   1.4286


I created a view called weekly_average using this VIEW statement.

CREATE OR REPLACE VIEW weekly_average AS
  SELECT *, sum(value) OVER (PARTITION BY host
                             ORDER BY rid
                             ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
            ) as rolling_average
  FROM daily_vals;


Then I query the view just like a regular table.
The rolling average is calculated from the current and previous 6 rows (for
each host).

testdb=# select * from weekly_average;
 rid |    date    |  host  |  value   | rolling_average
-----+------------+--------+----------+-----------------
   1 | 2011-07-01 |  hosta | 100.0000 |        100.0000
   2 | 2011-07-02 |  hosta | 200.0000 |        300.0000
   3 | 2011-07-03 |  hosta | 400.0000 |        700.0000
   4 | 2011-07-04 |  hosta | 500.0000 |       1200.0000
   5 | 2011-07-05 |  hosta | 100.0000 |       1300.0000
   6 | 2011-07-06 |  hosta | 700.0000 |       2000.0000
   7 | 2011-07-07 |  hosta | 200.0000 |       1400.0000
   8 | 2011-07-08 |  hosta | 100.0000 |       1400.0000
   9 | 2011-07-09 |  hosta | 100.0000 |       1200.0000
  10 | 2011-07-10 |  hosta | 100.0000 |        600.0000
  11 | 2011-07-01 |  hostb |   5.7143 |          5.7143
  12 | 2011-07-02 |  hostb |   8.5714 |         14.2857
  13 | 2011-07-03 |  hostb |  11.4286 |         25.7143
  14 | 2011-07-04 |  hostb |   8.5714 |         34.2857
  15 | 2011-07-05 |  hostb |   2.8571 |         37.1428
  16 | 2011-07-06 |  hostb |   1.4286 |         38.5714
  17 | 2011-07-07 |  hostb |   1.4286 |         40.0000
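 
As a side note (an untested sketch, not something I have run): avg() accepts
the same frame clause, which would give a true 7-row rolling mean instead of
a running sum.
 
CREATE OR REPLACE VIEW weekly_mean AS
  SELECT *, avg(value) OVER (PARTITION BY host
                             ORDER BY rid
                             ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
            ) as rolling_mean
  FROM daily_vals;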

Alan





Re: [PERFORM] Bad query plan

2011-07-29 Thread Gavin Flower

On 25/07/11 02:06, Дмитрий Васильев wrote:

I have a problem with poor query plan.

My PostgreSQL is "PostgreSQL 8.4.8, compiled by Visual C++ build 1400,
32-bit" installed by EnterpriseDB installer on Windows 7 32 bit.

Steps to reproduce:

Start with fresh installation and execute the following:

drop table if exists small;
drop table if exists large;

CREATE TABLE small
(
   id bigint,
   primary key(id)
);

CREATE TABLE large
(
   id bigint,
   primary key(id)
);

--Insert 100000 rows into large
CREATE or replace FUNCTION populate_large() RETURNS bigint AS $$
DECLARE
 id1 bigint := 0;
BEGIN
 LOOP
 insert into large(id) values(id1);
 id1 := id1 +1;
 if id1>100000 then
 exit;
 end if;
 END LOOP;
 return id1;
END
$$ LANGUAGE plpgsql;

--Insert 1000 rows into small
CREATE or replace FUNCTION populate_small() RETURNS bigint AS $$
DECLARE
 id1 bigint := 0;
BEGIN
 LOOP
 insert into small(id) values(id1);
 id1 := id1 +1;
 if id1>1000 then
 exit;
 end if;
 END LOOP;
 return id1;
END
$$ LANGUAGE plpgsql;

select populate_large(),populate_small();
analyze;

Then execute

explain analyze insert into large(id) select id from small where id
not in(select id from large);

It gives

"Seq Scan on small  (cost=1934.01..823278.28 rows=500 width=8) (actual
time=6263.588..6263.588 rows=0 loops=1)"
"  Filter: (NOT (SubPlan 1))"
"  SubPlan 1"
"->   Materialize  (cost=1934.01..3325.02 rows=11 width=8)
(actual time=0.007..3.012 rows=501 loops=1001)"
"  ->   Seq Scan on large  (cost=0.00..1443.01 rows=11
width=8) (actual time=0.010..5.810 rows=1001 loops=1)"
"Total runtime: 6263.703 ms"

But

explain analyze insert into large(id) select id from small where not
exists (select id from large l where small.id=l.id);

executes much faster:

"Merge Anti Join  (cost=0.00..85.58 rows=1 width=8) (actual
time=15.793..15.793 rows=0 loops=1)"
"  Merge Cond: (small.id = l.id)"
"  ->   Index Scan using small_pkey on small  (cost=0.00..43.27
rows=1001 width=8) (actual time=0.025..3.515 rows=1001 loops=1)"
"  ->   Index Scan using large_pkey on large l  (cost=0.00..3050.28
rows=100001 width=8) (actual time=0.017..2.932 rows=1001 loops=1)"
"Total runtime: 15.863 ms"

Both queries are semantically the same.


Out of interest, I ran your code on my existing 9.1beta3 installation.

Notes
(1) the second SELECT ran faster than the first.
(2) both plans are different to the ones you got.

$ psql
psql (9.1beta3)
[...]
gavin=> explain analyze insert into large(id) select id from small where id
gavin-> not in(select id from large);
QUERY PLAN
---
 Insert on large  (cost=1543.01..1559.02 rows=500 width=8) (actual 
time=51.090..51.090 rows=0 loops=1)
   ->  Seq Scan on small  (cost=1543.01..1559.02 rows=500 width=8) 
(actual time=51.087..51.087 rows=0 loops=1)

 Filter: (NOT (hashed SubPlan 1))
 SubPlan 1
   ->  Seq Scan on large  (cost=0.00..1443.01 rows=100001 
width=8) (actual time=0.008..13.867 rows=100001 loops=1)

 Total runtime: 51.582 ms
(6 rows)

gavin=> explain analyze insert into large(id) select id from small where not
gavin-> exists (select id from large l where small.id=l.id);
QUERY PLAN
---
 Insert on large  (cost=0.00..80.94 rows=1 width=8) (actual 
time=0.907..0.907 rows=0 loops=1)
   ->  Merge Anti Join  (cost=0.00..80.94 rows=1 width=8) (actual 
time=0.906..0.906 rows=0 loops=1)

 Merge Cond: (small.id = l.id)
 ->  Index Scan using small_pkey on small  (cost=0.00..40.61 
rows=1001 width=8) (actual time=0.010..0.225 rows=1001 loops=1)
 ->  Index Scan using large_pkey on large l  
(cost=0.00..2800.12 rows=100001 width=8) (actual time=0.006..0.235 
rows=1001 loops=1)

 Total runtime: 1.000 ms
(6 rows)

postgresql.conf parameters changed:
shared_buffers = 2GB
temp_buffers = 64MB
work_mem = 16MB
maintenance_work_mem = 512MB
max_stack_depth = 6MB
checkpoint_segments = 8
cpu_index_tuple_cost = 0.0025
cpu_operator_cost = 0.001
effective_cache_size = 2GB






Re: [PERFORM] insert

2011-07-29 Thread alan
I think I figured it out myself.
If anyone sees issues with this (simple) approach, please let me know.

I changed my table definitions to this:

CREATE SEQUENCE public.product_id_seq;
CREATE TABLE products (
product_id INTEGER DEFAULT nextval('product_id_seq'::regclass) NOT
NULL,
name VARCHAR(60) NOT NULL,
category SMALLINT  NOT NULL,
CONSTRAINT product_id PRIMARY KEY (product_id)
);
CREATE SEQUENCE public.category_id_seq;
CREATE TABLE category (
   category_id INTEGER DEFAULT nextval('category_id_seq'::regclass)
NOT NULL,
   name VARCHAR(20) NOT NULL,
   CONSTRAINT category_id PRIMARY KEY (category_id)
);
ALTER TABLE products ADD CONSTRAINT category_products_fk
FOREIGN KEY (category)
REFERENCES category (category_id)
ON DELETE NO ACTION ON UPDATE CASCADE
;

Then created this function:

CREATE OR REPLACE FUNCTION getid(_table text,_pk text,_name text)
RETURNS integer AS $$
DECLARE _id integer;
BEGIN
  EXECUTE 'SELECT '
|| _pk
|| ' FROM '
|| _table::regclass
|| ' WHERE name'
|| ' = '
|| quote_literal(_name)
   INTO _id;

  IF _id > 0 THEN
return _id;
  ELSE
EXECUTE 'INSERT INTO '
 || _table
 || ' VALUES (DEFAULT,' || quote_literal(_name) || ')'
 || ' RETURNING ' || _pk
INTO _id;
return _id;
  END IF;
END;
$$
 LANGUAGE 'plpgsql' VOLATILE;

Now I can just insert into the products table via:

INSERT INTO products VALUES(DEFAULT,'Postgresql for
Dummies',getid('category','category_id','books'));

For example:

testdb=# select * from products;
 product_id | name | category
------------+------+----------
(0 rows)

iims_test=# select * from category;
 category_id | name
-------------+------
(0 rows)

testdb=# insert into products values(DEFAULT,'Postgresql for
Dummies',getid('category','category_id','books'));
INSERT 0 1

testdb=# select * from category;
 category_id | name
-------------+-------
           1 | books

testdb=# select * from products;
 product_id |          name          | category
------------+------------------------+----------
          1 | Postgresql for Dummies |        1

Updates to the category_id in the category table are also cascaded to the
products table.

testdb=# UPDATE category SET category_id = 2 WHERE category_id = 1;
UPDATE 1

testdb=# SELECT * FROM products;
 product_id |          name          | category
------------+------------------------+----------
          1 | Postgresql for Dummies |        2


Alan



[PERFORM] heavy load-high cpu itilization

2011-07-29 Thread Filippos
Dear all

first of all congratulations on your great work here since from time to time
i've found many answers to my problems. unfortunately for this specific
problem i didn't find much relevant information, so i would ask for your
guidance dealing with the following situation:

we have a dedicated server (8.4.4, redhat) with 24 cpus and 36 GB of RAM. i
would say that the traffic in the server is huge and the cpu utilization is
pretty high too (avg ~ 75% except during the nights when it is much lower).
i am trying to tune the server a little bit to handle this problem. the
incoming data in the database are about 30-40 GB/day.

at first the checkpoint_segments were set to 50, the checkpoint_timeout at
15 min and the checkpoint_completion_target was 0.5 sec.

i noticed that the utilization of the server was higher when it was close to
making a checkpoint and since the parameter of full_page_writes is ON , i
changed the parameters mentioned above to (i did that after reading a lot of
stuff online):
checkpoint_segments->250
checkpoint_timeout->40min
checkpoint_completion_target -> 0.8

but the cpu utilization is not significantly lower. another parameter i will
certainly change is the wal_buffers which is now set at 64KB and i plan to
make it 16MB. can this parameter cause a significant percentage of the
problem?

are there any suggestions on what i can do to tune the server better? i can
provide any information you find relevant about the configuration of the
server, the OS, the storage etc.

thank you in advance

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/heavy-load-high-cpu-itilization-tp4631760p4631760.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.



Re: [PERFORM] [ADMIN] Restore database after drop command

2011-07-29 Thread Vibhor Kumar

On Jul 25, 2011, at 12:08 PM, Adarsh Sharma wrote:

> I restored globedatabase from a .sql file yesterday morning. I inserted some 
> new data into that database.
> In the evening, by mistake, I issued a drop database globedatabase command.
> This morning, I restored the same database again from the backup (.sql) file.
> My .sql file has data till yesterday morning, but I want the newly inserted 
> data now. Is that possible?
> Is it possible to get the data back to the state before the drop database 
> command?

No, you won't be able to recover. 

If you have Online Backup, then PITR would help you.
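 
For future reference, an online base backup (which PITR needs) is taken
roughly like this (a sketch; it requires archive_mode = on and a working
archive_command):
 
SELECT pg_start_backup('nightly');
-- copy the data directory at the filesystem level (tar/rsync) while this is active
SELECT pg_stop_backup();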

Thanks & Regards,
Vibhor Kumar
Blogs: http://vibhork.blogspot.com
http://vibhorkumar.wordpress.com




[PERFORM] Performance penalty when using WITH

2011-07-29 Thread Li Jin
Hi guys,

I ran into a problem: when using a WITH clause to reuse a subquery, I got a
huge performance penalty because of the query planner.

Here are the details; the original query is:

EXPLAIN ANALYZE WITH latest_identities AS
(
SELECT DISTINCT ON (memberid) memberid, username, changedate
FROM t_username_history
WHERE memberid IN (SELECT memberid FROM t_member WHERE firstname || ' ' || 
substring(lastname,1,1) = 'Eddie T')
ORDER BY memberid, changedate DESC
)
SELECT t_member.email as email, t_member.username as username,
       t_member.location as location, t_member.locale as locale,
       t_member.status as status, t_member.creationdate as creationdate,
       t_forum_member.posts as posts,
       t_forum_member.expertlocations as expertlocations,
       t_nexus_member.pages_created as pages_created,
       t_nexus_member.pages_edited as pages_edited,
       t_member_contributions.hotel_reviews as hotel_reviews,
       t_member_contributions.restaurant_reviews as restaurant_reviews,
       t_member_contributions.attraction_reviews as attraction_reviews,
       t_member_contributions.geo_reviews as geo_reviews,
       t_member_contributions.photos as photos,
       t_member_contributions.videos as videos,
       t_recent_contribution.recent_contribution_date as recent_contribution_date,
       t_recent_contribution.recent_contribution_type as recent_contribution_type,
       t_owner_member.memberid as owner_memberid,
       t_member_interaction.flags as interaction_flags,
       t_media.path as ta_avatar_path,
       t_external_member.externalid as facebookid,
       latest_identities.username as latest_identity
FROM t_member
     left join t_forum_member on (t_member.memberid = t_forum_member.memberid)
     left join t_nexus_member on (t_member.memberid = t_nexus_member.memberid)
     left join t_member_contributions on (t_member.memberid = t_member_contributions.memberid)
     left join t_recent_contribution on (t_member.memberid = t_recent_contribution.memberid)
     left join t_owner_member on (t_member.memberid = t_owner_member.memberid)
     left join t_member_interaction on (t_member.memberid = t_member_interaction.memberid)
     left join t_media on (t_member.avatar = t_media.id)
     left join t_external_member on (t_member.memberid = t_external_member.memberid
                                     AND t_external_member.idtype = 'FB')
     left join latest_identities on (t_member.memberid = latest_identities.memberid)
WHERE t_member.firstname || ' ' || substring(t_member.lastname,1,1) = 'Eddie T';

The query may seem scary, but what it really does is search for members with a
certain name and join with a bunch of other tables on memberid. The
t_username_history table has multiple rows per memberid, so I just get the
most recent record for each memberid that I am interested in before the
join.

Here is the link to explain:
http://explain.depesz.com/s/ZKb

Since the red part looks suboptimal to me, I changed it to use a WITH subquery:

EXPLAIN WITH memberids AS
(
   SELECT memberid FROM t_member WHERE firstname || ' ' || 
substring(lastname,1,1) = 'Eddie T'
),
latest_identities AS
(
SELECT DISTINCT ON (memberid) memberid, username, changedate
FROM t_username_history
WHERE memberid IN (SELECT memberid FROM memberids)
ORDER BY memberid, changedate DESC
)
SELECT t_member.email as email, t_member.username as username, 
t_member.location as location, t_member.locale as locale, t_member.status as 
status, t_member.creationdate as creationdate, t_forum_member.pos\
ts as posts, t_forum_member.expertlocations as expertlocations, 
t_nexus_member.pages_created as pages_created, t_nexus_member.pages_edited as 
pages_edited, t_member_contributions.hotel_reviews as hotel_rev\
iews, t_member_contributions.restaurant_reviews as restaurant_reviews, 
t_member_contributions.attraction_reviews as attraction_reviews, 
t_member_contributions.geo_reviews as geo_reviews, t_member_contribut\
ions.photos as photos, t_member_contributions.videos as videos, 
t_recent_contribution.recent_contribution_date as recent_contribution_date, 
t_recent_contribution.recent_contribution_type as recent_contribu\
tion_type, t_owner_member.memberid as owner_memberid, 
t_member_interaction.flags as interaction_flags, t_media.path as 
ta_avatar_path, t_external_member.externalid as facebookid, 
latest_identities.username\
 as latest_identity
FROM t_member left join t_forum_member on (t_member.memberid = 
t_forum_member.memberid) left join t_nexus_member on (t_member.memberid = 
t_nexus_member.memberid) left join t_member_contributions on (t_memb\
er.memberid = t_member_contributions.memberid) left join t_recent_contribution 
on (t_member.memberid = t_recent_contribution.memberid) left join 
t_owner_member on (t_member.memberid = t_owner_member.member\
id) left join t_member_interaction on (t_member.memberid = 
t_member_interaction.memberid) left join t_media on (t_member.avatar = 
t_media.id) left join t_external_member on (t_member.memberid = t_external_\
member.memberid AND t_external_member.idtype = 'FB') left join 
latest_identities on (t_member.memberid = latest_ide

[PERFORM] insert

2011-07-29 Thread alan
next question.

I have a product table with a 'category" column that I want to
maintain in a separate table.

CREATE TABLE products (
product_id INTEGER DEFAULT
nextval('product_id_seq'::regclass) NOT NULL,
name VARCHAR(60) NOT NULL,
category SMALLINT  NOT NULL,
CONSTRAINT product_id PRIMARY KEY (product_id)
);
CREATE TABLE categories (
category_id INTEGER DEFAULT
nextval('category_id_seq'::regclass) NOT NULL,
name VARCHAR(20) NOT NULL,
CONSTRAINT category_id PRIMARY KEY (category_id)
);

Every product must have a category. Since many (but not all) products have the
same category, I only want one table with unique categories.

To do the insert into the products table I need to retrieve or insert
the category_id in categories first, which means more code in my client app
(if ($cat_id = get_cat_id($cat)) { ... } else { $cat_id = insert_cat($cat) }).

Can I write a BEFORE ROW trigger for the products table that runs on
INSERT or UPDATE to
 1. insert a new category & return the new category_id, OR
 2. return the existing category_id for the (to be inserted) row?

Alan
I want products.category to be a foreign key that points to the unique
category_id in the categories table.



Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database

2011-07-29 Thread lars hofhansl
Thanks Pavan!


I think the most important points are still that:
1. The WAL write should happen asynchronously (if that is possible).
2. There should be an option to not perform these compactions if the page is 
only touched by reads.

(Assuming that when most of the database resides in the cache, these 
optimizations are less important.)


-- Lars


- Original Message -
From: Pavan Deolasee 
To: Merlin Moncure 
Cc: lars ; Kevin Grittner ; 
Ivan Voras ; pgsql-performance@postgresql.org
Sent: Wednesday, July 27, 2011 7:15 AM
Subject: Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database

On Wed, Jul 13, 2011 at 10:52 AM, Merlin Moncure  wrote:
...

There are a couple of other (very important) things that HOT does, but
probably it's not advertised a lot. Even for non-HOT updates (which
means either indexed columns were changed or the page ran out of free
space) or deletes, HOT prunes those tuples and instead marks the line
pointer as DEAD. The page is defragmented and dead space is recovered.
Each such dead tuple now only consumes two bytes in the page until
vacuum removes the dead line pointers. That's the reason why the OP is
seeing the behavior even when index columns are being updated.

We made a few adjustments to ensure that a page is not pruned too
early. So we track the oldest XID that did any updates/deletes to the
page and attempt pruning only when the RecentXmin is past the XID. We
also mark the page as "full" if some previous update did not find
enough free space to do in-block update and use that hint to decide if
we should attempt to prune the page. Finally, we prune only if we get
the cleanup lock without blocking.

What might be worth looking at is this condition in pruneheap.c:

/*
     * We prune when a previous UPDATE failed to find enough space on the page
     * for a new tuple version, or when free space falls below the relation's
     * fill-factor target (but not less than 10%).
     *
     * Checking free space here is questionable since we aren't holding any
     * lock on the buffer; in the worst case we could get a bogus answer. It's
     * unlikely to be *seriously* wrong, though, since reading either pd_lower
     * or pd_upper is probably atomic.  Avoiding taking a lock seems more
     * important than sometimes getting a wrong answer in what is after all
     * just a heuristic estimate.
     */
    minfree = RelationGetTargetPageFreeSpace(relation,
                                             HEAP_DEFAULT_FILLFACTOR);
    minfree = Max(minfree, BLCKSZ / 10);

    if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
    {


So if the free space in a page falls below the fill-factor or 10% of
the block size, we would try to prune the page. We probably need to
revisit this area and see if we need to tune HOT even better. One
option could be to see how much space we are going to free and carry
out the operation only if it's significant enough to justify the cost.

I know we did several benchmarking tests during HOT development,
but the tuning mechanism still may not be perfect for all kinds of
workloads, and it probably never will be.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com




[PERFORM] issue related to logging facility of postgres

2011-07-29 Thread shailesh singh
Hi,
I want to configure logging of postgres in such a way that messages of
different severity are logged in different log files, e.g. all ERROR
messages should be written to an error-msg.log file while all NOTICE messages
should be written to a notice-msg.log file.

In order to do that, what changes do I need to make in the configuration file?
Could you please give a solution.


-- 
  With Regards,
  Shailesh Singh


Re: [PERFORM] Performance penalty when using WITH

2011-07-29 Thread Tom Lane
Li Jin  writes:
> Anyone knows why the planner is doing this?

WITH is an optimization fence.  This is intentional and documented.
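 
In practice that means the CTE is planned on its own and the outer query's
conditions are not pushed down into it.  A minimal illustration with a
hypothetical table (not your schema):
 
-- fenced: big_table is read as written, the WHERE is applied afterwards
WITH t AS (SELECT * FROM big_table)
SELECT * FROM t WHERE id = 42;
 
-- not fenced: a plain subquery is pulled up, so an index on id can be used
SELECT * FROM (SELECT * FROM big_table) AS s WHERE id = 42;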

regards, tom lane



Re: [PERFORM] heavy load-high cpu itilization

2011-07-29 Thread Scott Marlowe
On Mon, Jul 25, 2011 at 12:00 PM, Filippos  wrote:
> Dear all
>
> first of all congratulations on your greak work here since from time to time
> i 've found many answers to my problems. unfortunately for this specific
> problem i didnt find much relevant information, so i would ask for your
> guidance dealing with the following situation:
>
> we have a dedicated server (8.4.4, redhat) with 24 cpus and 36 GB or RAM. i

There are known data-eating bugs in 8.4.4; you should upgrade to
8.4.latest as soon as possible.

> would say that the traffic in the server is huge and the cpu utilization is
> pretty high too (avg ~ 75% except during the nights when is it much lower).
> i am trying to tune the server a little bit to handle this problem. the
> incoming data in the database are about 30-40 GB /day.

So you're either CPU or IO bound.  We need to see which.

Look at these two pages:
http://wiki.postgresql.org/wiki/Guide_to_reporting_problems
http://wiki.postgresql.org/wiki/SlowQueryQuestions

to get started.

> at first the checkpoint_segments were set to 50, the checkpoint_timeout at
> 15 min and the checkpoint_completion_target was 0.5 sec.

checkpoint_completion_target is not in seconds, it's the fraction of the
checkpoint work to have completed by the time the next checkpoint arrives.  a checkpoint
completion target of 1.0 means that the bg writer should write out
data fast enough to flush everything out of WAL to the disks right as
you reach checkpoint timeout.  the more aggressive this is the more of
the data will already be flushed to disk when the timeout occurs.
However, this comes at the expense of more IO overall as multiple
updates to the same block result in multiple writes instead of just
one.
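 
A quick way to double-check what the server is actually running with (a
sketch; run it in psql):
 
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('checkpoint_segments', 'checkpoint_timeout',
               'checkpoint_completion_target', 'wal_buffers',
               'full_page_writes');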

> i noticed that the utilization of the server was higher when it was close to
> making a checkpoint and since the parameter of full_page_writes is ON , i
> changed the parameters mentioned above to (i did that after reading a lot of
> stuff online):
> checkpoint_segments->250
> checkpoint_timeout->40min
> checkpoint_completion_target -> 0.8
>
> but the cpu utilization is not significantly lower. another parameter i will
> certainly change is the wal_buffers which is now set at 64KB and i plan to
> make it 16MB. can this parameter cause a significant percentage of the
> problem?

Most of the work done by checkpointing / background writing is IO
intensive, not CPU intensive.

> are there any suggestions what i can do to tune better the server? i can
> provide any information you find relevant for the configuration of the
> server, the OS, the storage etc

First you need to more accurately identify the problem.  Tools like
iostat, vmstat, top, and so forth can help you figure out if the
problem is that you're IO bound or CPU bound.  It's also possible
you've got a thundering herd issue where there's too many processes
all trying to vie for the limited number of cores at the same time.
If you've got more than 30k to 50k context switches per second in
vmstat it's likely you're getting too many things trying to run at
once.



Re: [PERFORM] insert

2011-07-29 Thread Kevin Grittner
alan  wrote:
 
> Can I write a BEFORE ROW trigger for the products table  to runs
> on INSERT or UPDATE to
>  1. insert a new category & return the new category_id  OR
>  2.  return the existing category_id for the (to be inserted row)
 
What would you be using to match an existing category?  If this
accurately identifies a category, why not use it for the key to the
category table, rather than generating a synthetic key value?
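 
For illustration only, a minimal sketch of that idea, using the category
name itself as the key (column sizes borrowed from your earlier message):
 
CREATE TABLE category (
    name varchar(20) PRIMARY KEY
);
 
CREATE TABLE products (
    product_id serial PRIMARY KEY,
    name       varchar(60) NOT NULL,
    category   varchar(20) NOT NULL
               REFERENCES category (name) ON UPDATE CASCADE
);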
 
-Kevin
