Re: [PERFORM] stats collector process high CPU utilization
Tom Lane wrote:
> [EMAIL PROTECTED] writes:
> > While exploring this problem, I've noticed that one of the frequent insert
> > processes creates a few temporary tables to do post-processing. Is it
> > possible that the stats collector is getting bloated with stats from these
> > short-lived temporary tables? During periods of high activity it could be
> > creating temporary tables as often as two per second.
>
> Hmmm ... that's an interesting point, but offhand I don't see why it'd
> cause more of a problem in 8.2 than 8.1. Alvaro, any thoughts?

No idea. I do have a very crude piece of code to read a pgstat.stat file and output some info about what it finds (table OIDs basically IIRC). Maybe it can be helpful to examine what's in the bloated stat file.

Regarding temp tables, I'd think that the pgstat entries should be getting dropped at some point in both releases. Maybe there's a bug preventing that in 8.2?

--
Alvaro Herrera                           http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
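As a rough, generic way to gauge how quickly a workload churns through temporary tables (this is only a sketch, not the stat-file-dumper tool mentioned above), one can count the relations currently living in the per-backend pg_temp schemas:

  -- Sketch: count live temporary tables across all backends
  -- (in 8.2, temp tables live in per-backend pg_temp_N schemas).
  SELECT n.nspname, count(*) AS temp_tables
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE n.nspname LIKE 'pg_temp%'
    AND c.relkind = 'r'
  GROUP BY n.nspname;

Sampling this alongside the stats file size shows whether temp-table churn and file growth move together.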
Re: [PERFORM] stats collector process high CPU utilization
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Regarding temp tables, I'd think that the pgstat entries should be
> getting dropped at some point in both releases. Maybe there's a bug
> preventing that in 8.2?

Hmmm ... I did rewrite the backend-side code for that just recently for performance reasons ... could I have broken it? Anyone want to take a second look at
http://archives.postgresql.org/pgsql-committers/2007-01/msg00171.php

regards, tom lane
Re: [PERFORM] stats collector process high CPU utilization
I wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
>> Regarding temp tables, I'd think that the pgstat entries should be
>> getting dropped at some point in both releases. Maybe there's a bug
>> preventing that in 8.2?

> Hmmm ... I did rewrite the backend-side code for that just recently for
> performance reasons ... could I have broken it?

I did some testing with HEAD and verified that pgstat_vacuum_tabstat() still seems to do what it's supposed to, so that theory falls down.

Alvaro, could you send Benjamin your stat-file-dumper tool so we can get some more info? Alternatively, if Benjamin wants to send me a copy of his stats file (off-list), I'd be happy to take a look.

regards, tom lane
Re: [PERFORM] stats collector process high CPU utilization
Tom Lane wrote:
> I wrote:
>> Alvaro Herrera <[EMAIL PROTECTED]> writes:
>>> Regarding temp tables, I'd think that the pgstat entries should be
>>> getting dropped at some point in both releases. Maybe there's a bug
>>> preventing that in 8.2?
>
>> Hmmm ... I did rewrite the backend-side code for that just recently for
>> performance reasons ... could I have broken it?
>
> I did some testing with HEAD and verified that pgstat_vacuum_tabstat()
> still seems to do what it's supposed to, so that theory falls down.
>
> Alvaro, could you send Benjamin your stat-file-dumper tool so we can get
> some more info? Alternatively, if Benjamin wants to send me a copy of
> his stats file (off-list), I'd be happy to take a look.
>
> regards, tom lane

When I checked on the server this morning, the huge stats file had returned to a normal size. I set up a script to track CPU usage and stats file size, and it appears to have decreased from 90MB down to about 2MB over roughly 6 hours last night. The CPU usage of the stats collector also decreased accordingly.

The application logs indicate that there was no variation in the workload over this time period; however, the file size started to decrease soon after the nightly pg_dump backups completed. Coincidence, perhaps? Nonetheless, I would appreciate a copy of Alvaro's stats file tool just to see if anything stands out in the collected stats.

Thanks for your help, Tom.

--
Benjamin Minshall <[EMAIL PROTECTED]>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz
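For anyone who would rather sample the stats file size from inside the database than with an external script, something along these lines should work, assuming a pre-8.4 layout where the file lives at $PGDATA/global/pgstat.stat and that pg_stat_file() (available since 8.1, superuser only) can be used:

  -- Sketch: sample the on-disk size of the stats collector's file.
  SELECT now() AS sampled_at, size AS pgstat_bytes
  FROM pg_stat_file('global/pgstat.stat');

Run periodically (for example from cron via psql), this gives the same size-over-time picture as an external script.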
Re: [PERFORM] stats collector process high CPU utilization
Benjamin Minshall <[EMAIL PROTECTED]> writes:
> When I checked on the server this morning, the huge stats file had
> returned to a normal size. I set up a script to track CPU usage and
> stats file size, and it appears to have decreased from 90MB down to
> about 2MB over roughly 6 hours last night. The CPU usage of the stats
> collector also decreased accordingly.

> The application logs indicate that there was no variation in the
> workload over this time period, however the file size started to
> decrease soon after the nightly pg_dump backups completed. Coincidence
> perhaps?

Well, that's pretty interesting. What are your vacuuming arrangements for this installation? Could the drop in file size have coincided with VACUUM operations? Because the ultimate backstop against bloated stats files is pgstat_vacuum_tabstat(), which is run by VACUUM and arranges to clean out any entries that shouldn't be there anymore.

It's sounding like what you had was just transient bloat, in which case it might be useful to inquire whether anything out-of-the-ordinary had been done to the database right before the excessive-CPU-usage problem started.

regards, tom lane
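Since pgstat_vacuum_tabstat() is, per the above, run by VACUUM, one quick way to test the transient-bloat theory (assuming the extra I/O is acceptable) would be to issue a manual VACUUM and watch the file:

  -- Sketch: a manual VACUUM exercises the stats-cleanup backstop
  -- described above; re-check the stats file size afterwards.
  VACUUM;

This is only a diagnostic aid; autovacuum should normally be doing the same thing on its own schedule.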
[PERFORM] Recreate big table
Hi,

We have quite a big table which is heavily used by our online clients. The table has several indexes, but no other relation to other tables. We have an import process which can fill/update this table. The full import takes 1 hour, and this is very long. We are thinking of doing the full import in another table and then just "swapping" the two tables.

What will happen to our indexes? What will happen to our current transactions (read only)? What will the user see? :) Should we recreate the indexes after the swap is done? Btw, is there a good practice for doing this kind of work?

Thanks in advance,
Akos Gabriel
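For reference, one common way to do such a swap -- sketched here with made-up names, not Akos' actual schema -- is to build and index the new copy under a different name, then rename both tables inside one transaction. The indexes belong to each physical table and follow it through the rename, so they do not need to be rebuilt afterwards:

  -- Sketch of the load-then-rename pattern (hypothetical names).
  CREATE TABLE big_table_new (LIKE big_table INCLUDING DEFAULTS);

  -- ... run the full import into big_table_new ...

  CREATE INDEX big_table_new_some_col_idx ON big_table_new (some_col);  -- one per index

  BEGIN;
  ALTER TABLE big_table RENAME TO big_table_old;
  ALTER TABLE big_table_new RENAME TO big_table;
  COMMIT;

  DROP TABLE big_table_old;

Because the renames are transactional, read-only clients see either the old table or the fully loaded new one, never a half-imported state; they just wait briefly while the exclusive lock is held.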
Re: [PERFORM] stats collector process high CPU utilization
Tom Lane wrote:
> Well, that's pretty interesting. What are your vacuuming arrangements
> for this installation? Could the drop in file size have coincided with
> VACUUM operations? Because the ultimate backstop against bloated stats
> files is pgstat_vacuum_tabstat(), which is run by VACUUM and arranges
> to clean out any entries that shouldn't be there anymore.

VACUUM and ANALYZE are done by autovacuum only, no cron jobs. autovacuum_naptime is 30 seconds, so it should make it to each database every 10 minutes or so. Do you think that more aggressive vacuuming would prevent future swelling of the stats file?

> It's sounding like what you had was just transient bloat, in which case
> it might be useful to inquire whether anything out-of-the-ordinary had
> been done to the database right before the excessive-CPU-usage problem
> started.

I don't believe that there was any unusual activity on the server, but I have set up some more detailed logging to hopefully identify a pattern if the problem resurfaces.

Thanks.

--
Benjamin Minshall <[EMAIL PROTECTED]>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz
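For completeness, the autovacuum configuration being described here can be checked from SQL; this is just a generic query, not Benjamin's actual settings:

  -- Sketch: show the current autovacuum-related settings.
  SELECT name, setting
  FROM pg_settings
  WHERE name LIKE 'autovacuum%'
  ORDER BY name;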
Re: [PERFORM] Recreate big table
Gábriel,

You could use table inheritance, along the lines of the table partitioning approach explained in the manual:
http://www.postgresql.org/docs/8.2/interactive/ddl-partitioning.html

Kind regards,
Daniel Cristian

On 2/9/07, Gábriel Ákos <[EMAIL PROTECTED]> wrote:
> Hi,
>
> We have quite a big table which is heavily used by our online clients.
> The table has several indexes, but no other relation to other tables.
> We have an import process which can fill/update this table. The full
> import takes 1 hour, and this is very long. We are thinking of doing
> the full import in another table and then just "swapping" the two
> tables.
>
> What will happen to our indexes? What will happen to our current
> transactions (read only)? What will the user see? :) Should we recreate
> the indexes after the swap is done? Btw, is there a good practice for
> doing this kind of work?
>
> Thanks in advance,
> Akos Gabriel

--
Daniel Cristian Cruz
Systems Analyst
PostgreSQL and Linux Specialist
Mandriva Certified Instructor
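A rough sketch of what that can look like, again with hypothetical names: the table the clients query is an (empty) parent, and each import lands in a fresh child table that is attached only once it is fully loaded and indexed. This relies on ALTER TABLE ... INHERIT / NO INHERIT, which should be available from 8.2 on:

  -- Sketch of an inheritance-based swap (hypothetical names).
  CREATE TABLE big_table_import (LIKE big_table INCLUDING DEFAULTS);

  -- ... load big_table_import and build its indexes ...

  BEGIN;
  ALTER TABLE big_table_current NO INHERIT big_table;  -- detach the old data set
  ALTER TABLE big_table_import INHERIT big_table;      -- attach the freshly loaded one
  COMMIT;

  DROP TABLE big_table_current;

Queries against the parent include its children by default, so attaching the new child makes the new data visible without renaming anything.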
[PERFORM] cube operations slower than geo_distance() on production server
With the help of some of this list, I was able to successfully set up and benchmark a cube-based replacement for geo_distance() calculations. On a development box, the cube-based variations benchmarked consistently running in about 1/3 of the time of the geo_distance() equivalents.

After setting up the same columns and indexes on a production database, it's a different story. All the cube operations show themselves to be about the same as, or noticeably slower than, the same operations done with geo_distance(). I've stared at the EXPLAIN ANALYZE output as much as I can to figure out what's gone wrong. Could you help?

Here's the plan on the production server, which seems too slow. Below is the plan I get on the development server, which is much faster. I tried "set enable_nestloop = off", which did change the plan, but not the performance.

The production DB has much more data in it, but I still expected comparable results relative to using geo_distance() calculations. The production db gets a "VACUUM ANALYZE" every couple of hours now.

Thanks!

Mark

 Sort  (cost=6617.03..6617.10 rows=27 width=32) (actual time=2482.915..2487.008 rows=1375 loops=1)
   Sort Key: (cube_distance($0, zipcodes.earth_coords) / 1609.344::double precision)
   InitPlan
     ->  Index Scan using zipcodes_pkey on zipcodes  (cost=0.00..3.01 rows=1 width=32) (actual time=0.034..0.038 rows=1 loops=1)
           Index Cond: ((zipcode)::text = '90210'::text)
     ->  Index Scan using zipcodes_pkey on zipcodes  (cost=0.00..3.01 rows=1 width=32) (actual time=0.435..0.438 rows=1 loops=1)
           Index Cond: ((zipcode)::text = '90210'::text)
   ->  Nested Loop  (cost=538.82..6610.36 rows=27 width=32) (actual time=44.660..2476.919 rows=1375 loops=1)
         ->  Nested Loop  (cost=2.15..572.14 rows=9 width=36) (actual time=4.877..39.037 rows=136 loops=1)
               ->  Bitmap Heap Scan on zipcodes  (cost=2.15..150.05 rows=42 width=41) (actual time=3.749..4.951 rows=240 loops=1)
                     Recheck Cond: (cube_enlarge(($1)::cube, 16093.4357308298::double precision, 3) @ earth_coords)
                     ->  Bitmap Index Scan on zip_earth_coords_idx  (cost=0.00..2.15 rows=42 width=0) (actual time=3.658..3.658 rows=240 loops=1)
                           Index Cond: (cube_enlarge(($1)::cube, 16093.4357308298::double precision, 3) @ earth_coords)
               ->  Index Scan using shelters_postal_code_for_joining_idx on shelters  (cost=0.00..10.02 rows=2 width=12) (actual time=0.079..0.133 rows=1 loops=240)
                     Index Cond: ((shelters.postal_code_for_joining)::text = ("outer".zipcode)::text)
         ->  Bitmap Heap Scan on pets  (cost=536.67..670.47 rows=34 width=4) (actual time=16.844..17.830 rows=10 loops=136)
               Recheck Cond: ((pets.shelter_id = "outer".shelter_id) AND ((pets.pet_state)::text = 'available'::text))
               Filter: (species_id = 1)
               ->  BitmapAnd  (cost=536.67..536.67 rows=34 width=0) (actual time=16.621..16.621 rows=0 loops=136)
                     ->  Bitmap Index Scan on pets_shelter_id_idx  (cost=0.00..3.92 rows=263 width=0) (actual time=0.184..0.184 rows=132 loops=136)
                           Index Cond: (pets.shelter_id = "outer".shelter_id)
                     ->  Bitmap Index Scan on pets_pet_state_idx  (cost=0.00..532.50 rows=39571 width=0) (actual time=26.922..26.922 rows=40390 loops=82)
                           Index Cond: ((pet_state)::text = 'available'::text)
 Total runtime: 2492.852 ms

### Faster plan in development:

 Sort  (cost=7004.53..7004.62 rows=39 width=32) (actual time=54.635..55.450 rows=475 loops=1)
   Sort Key: (cube_distance($0, earth_distance.earth_coords) / 1609.344::double precision)
   InitPlan
     ->  Bitmap Heap Scan on earth_distance  (cost=4.74..624.60 rows=212 width=32) (actual time=0.113..0.115 rows=1 loops=1)
           Recheck Cond: ((zipcode)::text = '90210'::text)
           ->  Bitmap Index Scan on earth_distance_zipcode_idx  (cost=0.00..4.74 rows=212 width=0) (actual time=0.101..0.101 rows=2 loops=1)
                 Index Cond: ((zipcode)::text = '90210'::text)
     ->  Bitmap Heap Scan on earth_distance  (cost=4.74..624.60 rows=212 width=32) (actual time=0.205..0.208 rows=1 loops=1)
           Recheck Cond: ((zipcode)::text = '90210'::text)
           ->  Bitmap Index Scan on earth_distance_zipcode_idx  (cost=0.00..4.74 rows=212 width=0) (actual time=0.160..0.160 rows=2 loops=1)
                 Index Cond: ((zipcode)::text = '90210'::text)
   ->  Hash Join  (cost=618.67..5754.30 rows=39 width=32) (actual time=13.499..52.924 rows=475 loops=1)
         Hash Cond: ("outer".shelter_id = "inner".shelter_id)
         ->  Bitmap Heap Scan on pets  (cost=44.85..5158.42 rows=4298 width=4) (actual time=4.278..34.192 rows=3843 loops=1)
               Recheck Cond: ((pet_state)::tex
Re: [PERFORM] Can anyone make this code tighter? Too slow, Please help!
[EMAIL PROTECTED] wrote:
> I have this function in my C#.NET app that goes out to find the
> business units for each event and returns a string (for my report).
> I'm finding that for larger reports it takes too long and times out.
>
> Does anyone know how I can speed this process up? Is this code very
> tight or could it be cleaner? Thanks in advance for your help; this
> is a big issue used on several reports.

Perhaps try "EXPLAIN ANALYZE" on this query, given a valid event ID:

SELECT DISTINCT Companies.Name
FROM Companies
INNER JOIN ExpenseAllocations ON Companies.ID = ExpenseAllocations.BusinessUnitID
WHERE (ExpenseAllocations.EventID = @EventID)
ORDER BY Companies.Name DESC

###

Do the columns used in the join and WHERE clause have indexes?

It's also possible the optimization needs to happen at a different level. Perhaps you are frequently looking up the same results in a large report, or throughout the day. If this part doesn't need to be up-to-the-second fresh, perhaps your application could cache some of the results of this function, instead of repeatedly asking the database to recompute it.

Mark
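If those supporting indexes are missing, they would look roughly like this; the table and column names are taken from the query above, and whether Companies.ID already has an index via a primary key depends on the schema:

  -- Sketch: indexes supporting the join and WHERE clause above.
  CREATE INDEX expenseallocations_eventid_idx
      ON ExpenseAllocations (EventID);
  CREATE INDEX expenseallocations_businessunitid_idx
      ON ExpenseAllocations (BusinessUnitID);
  -- Companies.ID is presumably the primary key and already indexed.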
[PERFORM] Is there an equivalent for Oracle's user_tables.num_rows
All,

I am looking to automate ANALYZE of tables in my application. I have some insert-only tables in my application which I need to analyze as the data grows. Since the inserts are application controlled, I can choose to run ANALYZE when I determine the data has grown more than x% since the last ANALYZE. However, since users can truncate the tables too, I need to be able to tell the number of rows in the table as perceived by the optimizer.

I could not decipher a good way of telling the number of table rows from pg_stats/pg_statistics. Does someone know of a way of telling what the optimizer believes the number of rows is? The tables in question have multi-column primary keys.

Regards,
Virag
Re: [PERFORM] Is there an equivalent for Oracle's user_tables.num_rows
"Virag Saksena" <[EMAIL PROTECTED]> writes: > Does someone know of a way of telling what the optimizer believes the = > number of rows are ? You're looking in the wrong place; see pg_class.relpages and reltuples. But note that in recent releases neither one is taken as gospel. Instead the planner uses the current physical table size in place of relpages, and scales reltuples correspondingly. So neither steady growth nor truncation create a need for re-ANALYZE; at least not as long as the other statistics don't change too much. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Is there an equivalent for Oracle's user_tables.num_rows
Thanks, that is exactly what I was looking for.

I know that number of rows may not be the best indicator, but it is a heuristic that can be tracked easily: run ANALYZE for the first x insert events, and then only when an insert event causes the total row count to exceed the optimizer-perceived row count by y%. Other, more accurate heuristics like the relative distribution of columns would be harder to track in the application, and I'd rather let the database do that by issuing the ANALYZE.

Regards,
Virag

----- Original Message -----
From: "Tom Lane" <[EMAIL PROTECTED]>
To: "Virag Saksena" <[EMAIL PROTECTED]>
Sent: Friday, February 09, 2007 4:45 PM
Subject: Re: [PERFORM] Is there an equivalent for Oracle's user_tables.num_rows

> "Virag Saksena" <[EMAIL PROTECTED]> writes:
>> Does someone know of a way of telling what the optimizer believes the
>> number of rows is?
>
> You're looking in the wrong place; see pg_class.relpages and reltuples.
>
> But note that in recent releases neither one is taken as gospel.
> Instead the planner uses the current physical table size in place of
> relpages, and scales reltuples correspondingly. So neither steady
> growth nor truncation creates a need for re-ANALYZE; at least not as
> long as the other statistics don't change too much.
>
> regards, tom lane
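A minimal sketch of that threshold check, assuming the application keeps its own running count of rows inserted since the last ANALYZE (all names and numbers here are hypothetical):

  -- Sketch: y = 20%; 123456 stands in for the application's own row count.
  SELECT 123456 > reltuples * 1.20 AS needs_analyze
  FROM pg_class
  WHERE relname = 'my_insert_only_table';  -- hypothetical table name

  -- When needs_analyze is true, the application issues:
  -- ANALYZE my_insert_only_table;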
Re: [PERFORM] Is there an equivalent for Oracle's user_tables.num_rows
On Fri, 2007-02-09 at 19:45 -0500, Tom Lane wrote:
> "Virag Saksena" <[EMAIL PROTECTED]> writes:
> > Does someone know of a way of telling what the optimizer believes the
> > number of rows is?
>
> You're looking in the wrong place; see pg_class.relpages and reltuples.
>
> But note that in recent releases neither one is taken as gospel.
> Instead the planner uses the current physical table size in place of
> relpages, and scales reltuples correspondingly. So neither steady
> growth nor truncation creates a need for re-ANALYZE; at least not as
> long as the other statistics don't change too much.

That does work very well for Production systems, but not for Development. In 8.4, I'll be looking for a way to export Production system stats to a Dev server that *acts* as if it really had 10^lots rows in it. That will also help us support the optimiser when it is acting in extreme conditions that are not sensibly reproducible in reality by hackers. It will also provide us with what-if capability for system expansion.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Re: [PERFORM] cube operations slower than geo_distance() on production server
On 2/10/07, Mark Stosberg <[EMAIL PROTECTED]> wrote:
> With the help of some of this list, I was able to successfully set up
> and benchmark a cube-based replacement for geo_distance() calculations.
> On a development box, the cube-based variations benchmarked
> consistently running in about 1/3 of the time of the geo_distance()
> equivalents.
>
> After setting up the same columns and indexes on a production database,
> it's a different story. All the cube operations show themselves to be
> about the same as, or noticeably slower than, the same operations done
> with geo_distance(). I've stared at the EXPLAIN ANALYZE output as much
> as I can to figure out what's gone wrong. Could you help?
>
> Here's the plan on the production server, which seems too slow. Below
> is the plan I get on the development server, which is much faster. I
> tried "set enable_nestloop = off", which did change the plan, but not
> the performance.
>
> The production DB has much more data in it, but I still expected
> comparable results relative to using geo_distance() calculations.

Any objection to posting the query (and maybe tables, keys, indexes, etc.)?

merlin