Re: [PERFORM] stats collector process high CPU utilization

2007-02-09 Thread Alvaro Herrera
Tom Lane wrote:
> [EMAIL PROTECTED] writes:
> > While exploring this problem, I've noticed that one of the frequent insert
> > processes creates a few temporary tables to do post-processing.  Is it
> > possible that the stats collector is getting bloated with stats from these
> > short-lived temporary tables?  During periods of high activity it could be
> > creating temporary tables as often as two per second.
> 
> Hmmm ... that's an interesting point, but offhand I don't see why it'd
> cause more of a problem in 8.2 than 8.1.  Alvaro, any thoughts?

No idea.  I do have a very crude piece of code to read a pgstat.stat
file and output some info about what it finds (table OIDs basically
IIRC).  Maybe it can be helpful to examine what's in the bloated stat
file.

Regarding temp tables, I'd think that the pgstat entries should be
getting dropped at some point in both releases.  Maybe there's a bug
preventing that in 8.2?
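
Just to picture the churn being described: something along these lines run a
couple of times per second (names made up) gets a new OID each time, and any
activity on the temp table can leave behind a stats entry that then has to be
cleaned out later:

    CREATE TEMP TABLE post_proc_scratch (id int, payload text);
    -- ... post-processing inserts/updates against the temp table ...
    DROP TABLE post_proc_scratch;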

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] stats collector process high CPU utilization

2007-02-09 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Regarding temp tables, I'd think that the pgstat entries should be
> getting dropped at some point in both releases.  Maybe there's a bug
> preventing that in 8.2?

Hmmm ... I did rewrite the backend-side code for that just recently for
performance reasons ... could I have broken it?  Anyone want to take a
second look at
http://archives.postgresql.org/pgsql-committers/2007-01/msg00171.php

regards, tom lane



Re: [PERFORM] stats collector process high CPU utilization

2007-02-09 Thread Tom Lane
I wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
>> Regarding temp tables, I'd think that the pgstat entries should be
>> getting dropped at some point in both releases.  Maybe there's a bug
>> preventing that in 8.2?

> Hmmm ... I did rewrite the backend-side code for that just recently for
> performance reasons ... could I have broken it?

I did some testing with HEAD and verified that pgstat_vacuum_tabstat()
still seems to do what it's supposed to, so that theory falls down.

Alvaro, could you send Benjamin your stat-file-dumper tool so we can
get some more info?  Alternatively, if Benjamin wants to send me a copy
of his stats file (off-list), I'd be happy to take a look.

regards, tom lane



Re: [PERFORM] stats collector process high CPU utilization

2007-02-09 Thread Benjamin Minshall

Tom Lane wrote:

I wrote:

Alvaro Herrera <[EMAIL PROTECTED]> writes:

Regarding temp tables, I'd think that the pgstat entries should be
getting dropped at some point in both releases.  Maybe there's a bug
preventing that in 8.2?



Hmmm ... I did rewrite the backend-side code for that just recently for
performance reasons ... could I have broken it?


I did some testing with HEAD and verified that pgstat_vacuum_tabstat()
still seems to do what it's supposed to, so that theory falls down.

Alvaro, could you send Benjamin your stat-file-dumper tool so we can
get some more info?



Alternatively, if Benjamin wants to send me a copy
of his stats file (off-list), I'd be happy to take a look.

regards, tom lane


When I checked on the server this morning, the huge stats file has 
returned to a normal size.  I set up a script to track CPU usage and 
stats file size, and it appears to have decreased from 90MB down to 
about 2MB over roughly 6 hours last night.  The CPU usage of the stats 
collector also decreased accordingly.
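
For what it's worth, the file size can also be watched from inside the
database; something along these lines should work on 8.2 (superuser only, and
it assumes the stats file is in its default location under $PGDATA):

    -- size in bytes and last-modified time of the collector's stats file
    SELECT size, modification
    FROM pg_stat_file('global/pgstat.stat');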


The application logs indicate that there was no variation in the
workload over this time period; however, the file size started to
decrease soon after the nightly pg_dump backups completed.  Coincidence,
perhaps?


Nonetheless, I would appreciate a copy of Alvaro's stat file tool just 
to see if anything stands out in the collected stats.


Thanks for your help, Tom.

--
Benjamin Minshall <[EMAIL PROTECTED]>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz




Re: [PERFORM] stats collector process high CPU utilization

2007-02-09 Thread Tom Lane
Benjamin Minshall <[EMAIL PROTECTED]> writes:
> When I checked on the server this morning, the huge stats file has 
> returned to a normal size.  I set up a script to track CPU usage and 
> stats file size, and it appears to have decreased from 90MB down to 
> about 2MB over roughly 6 hours last night.  The CPU usage of the stats 
> collector also decreased accordingly.

> The application logs indicate that there was no variation in the
> workload over this time period; however, the file size started to
> decrease soon after the nightly pg_dump backups completed.  Coincidence,
> perhaps?

Well, that's pretty interesting.  What are your vacuuming arrangements
for this installation?  Could the drop in file size have coincided with
VACUUM operations?  Because the ultimate backstop against bloated stats
files is pgstat_vacuum_tabstat(), which is run by VACUUM and arranges to
clean out any entries that shouldn't be there anymore.
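
So if the file bloats up again, one simple experiment would be to run a plain
database-wide VACUUM in the affected database and see whether the file
shrinks -- nothing fancier than:

    -- connect to the database whose stats look bloated, then:
    VACUUM;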

It's sounding like what you had was just transient bloat, in which case
it might be useful to inquire whether anything out-of-the-ordinary had
been done to the database right before the excessive-CPU-usage problem
started.

regards, tom lane



[PERFORM] Recreate big table

2007-02-09 Thread Gábriel Ákos

Hi,

We have quite a big table, which is heavily used by our online clients.
The table has several indexes, but no other relation to other tables.
We have an import process, which can fill/update this table.
The full import takes 1 hour, and this is very long.
We are thinking of doing the full import in another table and then just 
"swapping" the two tables (see the sketch below).
What will happen to our indexes? What will happen to our current 
transactions (read-only)? What will the user see? :)

Should we recreate the indexes after the swap is done?

Btw, is there a good practice for doing this kind of work?
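
The swap we have in mind would be roughly this (table names simplified):

    -- after the import has filled big_table_new and its indexes are built:
    BEGIN;
    ALTER TABLE big_table RENAME TO big_table_old;
    ALTER TABLE big_table_new RENAME TO big_table;
    COMMIT;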

Thanks in advance,
Akos Gabriel



Re: [PERFORM] stats collector process high CPU utilization

2007-02-09 Thread Benjamin Minshall

Tom Lane wrote:

Well, that's pretty interesting.  What are your vacuuming arrangements
for this installation?  Could the drop in file size have coincided with
VACUUM operations?  Because the ultimate backstop against bloated stats
files is pgstat_vacuum_tabstat(), which is run by VACUUM and arranges to
clean out any entries that shouldn't be there anymore.


VACUUM and ANALYZE are done by autovacuum only, no cron jobs. 
autovacuum_naptime is 30 seconds so it should make it to each database 
every 10 minutes or so.  Do you think that more aggressive vacuuming 
would prevent future swelling of the stats file?



It's sounding like what you had was just transient bloat, in which case
it might be useful to inquire whether anything out-of-the-ordinary had
been done to the database right before the excessive-CPU-usage problem
started.


I don't believe that there was any unusual activity on the server, but I 
have set up some more detailed logging to hopefully identify a pattern 
if the problem resurfaces.


Thanks.

--
Benjamin Minshall <[EMAIL PROTECTED]>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz




Re: [PERFORM] Recreate big table

2007-02-09 Thread Daniel Cristian Cruz

Gábriel,

You could use table inheritance, as described in the table partitioning section of the manual:

http://www.postgresql.org/docs/8.2/interactive/ddl-partitioning.html
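
A minimal sketch of the approach from that page, with made-up table and
column names:

    CREATE TABLE import_data (batch_id int, payload text);

    -- each full import goes into a fresh child table
    CREATE TABLE import_data_batch42 (
        CHECK (batch_id = 42)
    ) INHERITS (import_data);

    -- queries against the parent see all children; with constraint
    -- exclusion on, the planner skips children that can't match
    SET constraint_exclusion = on;
    SELECT count(*) FROM import_data WHERE batch_id = 42;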

Kind regards,

Daniel Cristian

On 2/9/07, Gábriel Ákos <[EMAIL PROTECTED]> wrote:

Hi,

We have quite a big table, which is heavily used by our online clients.
The table has several indexes, but no other relation to other tables.
We have an import process, which can fill/update this table.
The full import takes 1 hour, and this is very long.
We are thinking of doing the full import in another table and then just
"swapping" the two tables.
What will happen to our indexes? What will happen to our current
transactions (read-only)? What will the user see? :)
Should we recreate the indexes after the swap is done?

Btw, is there a good practice for doing this kind of work?

Thanks in advance,
Akos Gabriel





--
Daniel Cristian Cruz
Analista de Sistemas
Especialista postgreSQL e Linux
Instrutor Certificado Mandriva



[PERFORM] cube operations slower than geo_distance() on production server

2007-02-09 Thread Mark Stosberg

With the help of some of this list, I was able to successfully set up
and benchmark a cube-based replacement for geo_distance() calculations.

On a development box, the cube-based variations consistently benchmarked
at about 1/3 of the time of the geo_distance() equivalents.

After setting up the same columns and indexes on a production
database, it's a different story. All the cube operations show
themselves to be about the same as, or noticeably slower than, the same
operations done with geo_distance().

I've stared at the EXPLAIN ANALYZE output as much as I can to figure out
what's gone wrong. Could you help?

Here's the plan on the production server, which seems too slow. Below is
the plan I get on the development server, which is much faster.

I tried "set enable_nestloop = off", which did change the plan, but not
the performance.

The production DB has much more data in it, but I still expected comparable 
results relative
to using geo_distance() calculations.

The production db gets a "VACUUM ANALYZE"  every couple of hours now.

Thanks!

  Mark



 Sort  (cost=6617.03..6617.10 rows=27 width=32) (actual time=2482.915..2487.008 rows=1375 loops=1)
   Sort Key: (cube_distance($0, zipcodes.earth_coords) / 1609.344::double precision)
   InitPlan
     ->  Index Scan using zipcodes_pkey on zipcodes  (cost=0.00..3.01 rows=1 width=32) (actual time=0.034..0.038 rows=1 loops=1)
           Index Cond: ((zipcode)::text = '90210'::text)
     ->  Index Scan using zipcodes_pkey on zipcodes  (cost=0.00..3.01 rows=1 width=32) (actual time=0.435..0.438 rows=1 loops=1)
           Index Cond: ((zipcode)::text = '90210'::text)
   ->  Nested Loop  (cost=538.82..6610.36 rows=27 width=32) (actual time=44.660..2476.919 rows=1375 loops=1)
         ->  Nested Loop  (cost=2.15..572.14 rows=9 width=36) (actual time=4.877..39.037 rows=136 loops=1)
               ->  Bitmap Heap Scan on zipcodes  (cost=2.15..150.05 rows=42 width=41) (actual time=3.749..4.951 rows=240 loops=1)
                     Recheck Cond: (cube_enlarge(($1)::cube, 16093.4357308298::double precision, 3) @ earth_coords)
                     ->  Bitmap Index Scan on zip_earth_coords_idx  (cost=0.00..2.15 rows=42 width=0) (actual time=3.658..3.658 rows=240 loops=1)
                           Index Cond: (cube_enlarge(($1)::cube, 16093.4357308298::double precision, 3) @ earth_coords)
               ->  Index Scan using shelters_postal_code_for_joining_idx on shelters  (cost=0.00..10.02 rows=2 width=12) (actual time=0.079..0.133 rows=1 loops=240)
                     Index Cond: ((shelters.postal_code_for_joining)::text = ("outer".zipcode)::text)
         ->  Bitmap Heap Scan on pets  (cost=536.67..670.47 rows=34 width=4) (actual time=16.844..17.830 rows=10 loops=136)
               Recheck Cond: ((pets.shelter_id = "outer".shelter_id) AND ((pets.pet_state)::text = 'available'::text))
               Filter: (species_id = 1)
               ->  BitmapAnd  (cost=536.67..536.67 rows=34 width=0) (actual time=16.621..16.621 rows=0 loops=136)
                     ->  Bitmap Index Scan on pets_shelter_id_idx  (cost=0.00..3.92 rows=263 width=0) (actual time=0.184..0.184 rows=132 loops=136)
                           Index Cond: (pets.shelter_id = "outer".shelter_id)
                     ->  Bitmap Index Scan on pets_pet_state_idx  (cost=0.00..532.50 rows=39571 width=0) (actual time=26.922..26.922 rows=40390 loops=82)
                           Index Cond: ((pet_state)::text = 'available'::text)
 Total runtime: 2492.852 ms


### Faster plan in development:

 Sort  (cost=7004.53..7004.62 rows=39 width=32) (actual time=54.635..55.450 rows=475 loops=1)
   Sort Key: (cube_distance($0, earth_distance.earth_coords) / 1609.344::double precision)
   InitPlan
     ->  Bitmap Heap Scan on earth_distance  (cost=4.74..624.60 rows=212 width=32) (actual time=0.113..0.115 rows=1 loops=1)
           Recheck Cond: ((zipcode)::text = '90210'::text)
           ->  Bitmap Index Scan on earth_distance_zipcode_idx  (cost=0.00..4.74 rows=212 width=0) (actual time=0.101..0.101 rows=2 loops=1)
                 Index Cond: ((zipcode)::text = '90210'::text)
     ->  Bitmap Heap Scan on earth_distance  (cost=4.74..624.60 rows=212 width=32) (actual time=0.205..0.208 rows=1 loops=1)
           Recheck Cond: ((zipcode)::text = '90210'::text)
           ->  Bitmap Index Scan on earth_distance_zipcode_idx  (cost=0.00..4.74 rows=212 width=0) (actual time=0.160..0.160 rows=2 loops=1)
                 Index Cond: ((zipcode)::text = '90210'::text)
   ->  Hash Join  (cost=618.67..5754.30 rows=39 width=32) (actual time=13.499..52.924 rows=475 loops=1)
         Hash Cond: ("outer".shelter_id = "inner".shelter_id)
         ->  Bitmap Heap Scan on pets  (cost=44.85..5158.42 rows=4298 width=4) (actual time=4.278..34.192 rows=3843 loops=1)
               Recheck Cond: ((pet_state)::tex

Re: [PERFORM] Can anyone make this code tighter? Too slow, Please help!

2007-02-09 Thread Mark Stosberg
[EMAIL PROTECTED] wrote:
> I have this function in my C#.NET app that goes out to find the
> business units for each event and returns a string (for my report).
> I'm finding that for larger reports it takes too long and times out.
> 
> Does anyone know how I can speed this process up?  Is this code very
> tight or could it be cleaner?  thanks in advance for your help, this
> is a big issue used on several reports.

Perhaps try "EXPLAIN ANALYZE" on this query, given a valid event ID:

SELECT DISTINCT Companies.Name
FROM Companies
     INNER JOIN ExpenseAllocations
             ON Companies.ID = ExpenseAllocations.BusinessUnitID
WHERE ExpenseAllocations.EventID = @EventID
ORDER BY Companies.Name DESC


###
Do the columns used in the join and WHERE clause have indexes?
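
If not, something like this would be the first thing I'd try (index names are
made up; Companies.ID is presumably already covered by its primary key):

    CREATE INDEX expenseallocations_eventid_idx
        ON ExpenseAllocations (EventID);
    CREATE INDEX expenseallocations_businessunitid_idx
        ON ExpenseAllocations (BusinessUnitID);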

It's also possible the optimization needs to happen at a different level. 
Perhaps you are frequently
looking up the same results in a large report, or throughout the day.

If this part doesn't need to be up-to-the-second fresh, perhaps your
application could
cache some of the results of this function, instead of repeatedly asking the 
database
to recompute it.
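
One way to do that inside the database would be a small summary table that
gets refreshed on whatever schedule is fresh enough for the reports -- just a
sketch, reusing the same tables:

    CREATE TABLE event_business_units AS
    SELECT DISTINCT ExpenseAllocations.EventID, Companies.Name
    FROM Companies
         INNER JOIN ExpenseAllocations
                 ON Companies.ID = ExpenseAllocations.BusinessUnitID;

    CREATE INDEX event_business_units_eventid_idx
        ON event_business_units (EventID);

    -- to refresh: TRUNCATE event_business_units, then re-populate it
    -- with INSERT INTO ... using the same SELECT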

   Mark



[PERFORM] Is there an equivalent for Oracle's user_tables.num_rows

2007-02-09 Thread Virag Saksena
All,
I am looking to automate analyze table in my application.

I have some insert-only tables in my application which I need to analyze as 
the data grows.
Since the inserts are application-controlled, I can choose to run analyze when 
I determine the
data has grown more than x% since the last analyze.

However, since users can truncate the tables too, I need to be able to tell the 
number of rows
in the table as perceived by the optimizer.

I could not decipher a good way of telling the number of table rows from 
pg_stats/pg_statistics.

Does someone know of a way of telling what the optimizer believes the number of 
rows are?

The tables in question have multi-column primary keys.

Regards,

Virag

Re: [PERFORM] Is there an equivalent for Oracle's user_tables.num_rows

2007-02-09 Thread Tom Lane
"Virag Saksena" <[EMAIL PROTECTED]> writes:
> Does someone know of a way of telling what the optimizer believes the
> number of rows are?

You're looking in the wrong place; see pg_class.relpages and reltuples.

But note that in recent releases neither one is taken as gospel.
Instead the planner uses the current physical table size in place of
relpages, and scales reltuples correspondingly.  So neither steady
growth nor truncation create a need for re-ANALYZE; at least not as long
as the other statistics don't change too much.
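
For instance (table name is just a placeholder):

    SELECT relname, relpages, reltuples
    FROM pg_class
    WHERE relname = 'your_table';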

regards, tom lane



Re: [PERFORM] Is there an equivalent for Oracle's user_tables.num_rows

2007-02-09 Thread Virag Saksena

Thanks, that is exactly what I was looking for

I know that the number of rows may not be the best indicator, but it is a 
heuristic that can be tracked easily: run analyze for the first x insert 
events, and after that only when an insert event causes the total rows to 
exceed y% of the optimizer-perceived rows.

Other, more accurate heuristics like the relative distribution of columns 
would be harder to track in the application, and I'd rather let the database 
do that by issuing the analyze.

Regards,

Virag

- Original Message - 
From: "Tom Lane" <[EMAIL PROTECTED]>

To: "Virag Saksena" <[EMAIL PROTECTED]>
Cc: 
Sent: Friday, February 09, 2007 4:45 PM
Subject: Re: [PERFORM] Is there an equivalent for Oracle's 
user_tables.num_rows




"Virag Saksena" <[EMAIL PROTECTED]> writes:

Does someone know of a way of telling what the optimizer believes the
number of rows are?


You're looking in the wrong place; see pg_class.relpages and reltuples.

But note that in recent releases neither one is taken as gospel.
Instead the planner uses the current physical table size in place of
relpages, and scales reltuples correspondingly.  So neither steady
growth nor truncation create a need for re-ANALYZE; at least not as long
as the other statistics don't change too much.

regards, tom lane






Re: [PERFORM] Is there an equivalent for Oracle's user_tables.num_rows

2007-02-09 Thread Simon Riggs
On Fri, 2007-02-09 at 19:45 -0500, Tom Lane wrote:
> "Virag Saksena" <[EMAIL PROTECTED]> writes:
> > Does someone know of a way of telling what the optimizer believes the
> > number of rows are?
> 
> You're looking in the wrong place; see pg_class.relpages and reltuples.
> 
> But note that in recent releases neither one is taken as gospel.
> Instead the planner uses the current physical table size in place of
> relpages, and scales reltuples correspondingly.  So neither steady
> growth nor truncation create a need for re-ANALYZE; at least not as long
> as the other statistics don't change too much.

That does work very well for Production systems, but not for
Development.

In 8.4, I'll be looking for a way to export Production system stats to a
Dev server that *acts* as if it really had 10^lots rows in it. That will
also help us support the optimiser when it is acting in extreme
conditions that are not sensibly reproducible in reality by hackers. It
will also provide us with what-if capability for system expansion.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [PERFORM] cube operations slower than geo_distance() on production server

2007-02-09 Thread Merlin Moncure

On 2/10/07, Mark Stosberg <[EMAIL PROTECTED]> wrote:


With the help of some of this list, I was able to successfully set up
and benchmark a cube-based replacement for geo_distance() calculations.

On a development box, the cube-based variations consistently benchmarked
at about 1/3 of the time of the geo_distance() equivalents.

After setting up the same columns and indexes on a production
database, it's a different story. All the cube operations show
themselves to be about the same as, or noticeably slower than, the same
operations done with geo_distance().

I've stared at the EXPLAIN ANALYZE output as much as I can to figure out
what's gone wrong. Could you help?

Here's the plan on the production server, which seems too slow. Below is
the plan I get on the development server, which is much faster.

I tried "set enable_nestloop = off", which did change the plan, but not
the performance.

The production DB has much more data in it, but I still expected comparable 
results relative
to using geo_distance() calculations.


any objection to posting the query (and maybe tables, keys, indexes, etc)?

merlin
