[PERFORM] Architecting a database
I am in the process of moving a system that has been built around FoxPro tables for the last 18 years into a PostgreSQL-based system. Over time I came up with decent strategies for making the FoxPro tables work well with the workload that was placed on them, but we are getting to the point that the locking mechanisms are causing problems when some of the more heavily used tables are being written to.

With the FoxPro tables I had one directory that contained the tables with global data common to all clients: things like documents that had been received and logged, checks that had been cut, etc. Then each client had its own directory which housed tables with information relating to that specific client. Setting things up like this kept me from having any tables that were terribly large, so record addition and index creation were not very time consuming.

I am wondering how I should architect this in PostgreSQL. Should I follow a similar strategy and have a separate database for each client and one database that contains the global data? With the dBase and ISAM tables I have a good idea of how to handle them, since I have been working with them since dBASE originally came out. With PostgreSQL tables I am not so certain how the data is arranged within the one file. Does having the data all in one database allow PostgreSQL to better utilize indexes and caches, or does having a number of smaller databases provide performance increases?

In case it is important, there are 2000 clients involved, so that would be 2000 databases if I followed my current FoxPro-related structure. Of course, I suppose it is always possible to combine a number of groups into a database if the number of databases is an issue. Tables within the client-specific databases are generally name and address information, plus tables for 10 different types of accounts which require different structures; those tables hold anywhere from 10,000 transactions apiece for some smaller groups to 1 million for larger groups. I believe we have a read-to-write ratio of about 1 to 15.

Thanks for any input.

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
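[A commonly suggested middle ground for the layout described above is a single database with one schema per client plus a shared schema, which keeps per-client objects separate without creating 2000 databases. A minimal sketch, with all names purely illustrative:

-- one schema per client, plus a shared schema for global data
CREATE SCHEMA client_0001;
CREATE SCHEMA client_0002;
CREATE SCHEMA global;

-- client-specific tables live in the client's schema
CREATE TABLE client_0001.accounts (
    account_id   bigint PRIMARY KEY,
    account_type smallint NOT NULL,
    posted_on    date NOT NULL,
    amount       numeric(12,2) NOT NULL
);

-- queries for a given client can rely on search_path instead of
-- qualifying every table name
SET search_path = client_0001, global;
SELECT count(*) FROM accounts;   -- resolves to client_0001.accounts

Because everything stays in one database, connections, shared_buffers, and the catalog caches are shared across all clients, which is usually the point of preferring schemas over thousands of databases.]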
Re: [PERFORM] Architecting a database
Thanks for all of the input, everyone. I believe I am going to put together a test case using schemas and partitioning, then double the amount of data currently in the system to give me an idea of how things will be performing a couple of years down the road.

I was looking at a server based on the new Opteron 6100 series; it would have 32 cores, but the clock speed is 2 GHz. I read a post earlier today that mentioned in passing that it is better to have a faster processor than more cores, so I am wondering whether this would be a good selection, since there are quad-core CPUs in the Intel line that run up to 3.3 GHz.

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
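[For the partitioning half of that test case, the inheritance-based approach available in PostgreSQL of that era looks roughly like this; names are illustrative, and on 8.4 and later the default constraint_exclusion = partition setting lets the planner skip children whose CHECK constraints rule them out:

-- parent table holds no rows; each child holds one client group
CREATE TABLE transactions (
    client_id integer NOT NULL,
    trans_id  bigint  NOT NULL,
    amount    numeric(12,2)
);

CREATE TABLE transactions_g01 (
    CHECK (client_id BETWEEN 1 AND 100)
) INHERITS (transactions);

CREATE TABLE transactions_g02 (
    CHECK (client_id BETWEEN 101 AND 200)
) INHERITS (transactions);

-- a query that filters on client_id only touches the matching children
SELECT sum(amount) FROM transactions WHERE client_id = 42;

A complete setup would also route inserts to the correct child, usually with a trigger on the parent; that plumbing is omitted here for brevity.]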
Re: [PERFORM] Monitoring Postgresql performance
On 9/28/05, Matthew Nuzum <[EMAIL PROTECTED]> wrote:
> On 9/28/05, Arnau <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > I have been "googling" a bit searching info about a way to monitor
> > postgresql (CPU & Memory, num processes, ...) and I haven't found
> > anything relevant. I'm using munin to monitor other parameters of my
> > servers and I'd like to include postgresql or have a similar tool. Any
> > of you is using anything like that? all kind of hints are welcome :-)

We are also using cricket + nagios.

On each DB server: set up snmpd and use snmpd.conf to set disk quotas and mark processes that need to be running (like postmaster, syslog, sshd).

On the monitoring server(s): use cricket for long-term trends & graphs. Use nagios for current status, alerting, and some trending. (Nagios has plugins over SNMP for load, cpu, memory, disk and processes.)

Here are the nagios plugins I have hacked up over the past few months and what they do. I'd imagine some could use better names. I can provide these or package them up if anyone is interested.

check_pgconn.pl - Shows percentage of connections available. It uses "SELECT COUNT(*) FROM pg_stat_activity" / "SHOW max_connections". It can also alert when less than a certain number of connections are available.

check_pgqueries.pl - If you have query logging enabled, this summarizes the types of queries running (SELECT, INSERT, DELETE, UPDATE, ALTER, CREATE, TRUNCATE, VACUUM, COPY) and warns if any queries have been running longer than 5 minutes (configurable).

check_pglocks.pl - Looks for locks that block, and for baselining lock activity.

check_pgtime.pl - Makes sure that postgresql's time is in sync with the monitoring server.

check_pgqueries.pl - Whines if any queries are in the "waiting" state. The script that runs on each DB server does "ps auxww | grep postgres | grep -i "[W]aiting"" and exposes that through SNMP using the exec functionality. Nagios then alerts if queries are being blocked.

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
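[For reference, the connection check described above reduces to a single ratio. A rough SQL equivalent of what such a plugin runs, not the actual script, would be:

SELECT (SELECT count(*) FROM pg_stat_activity)::float
       / current_setting('max_connections')::int * 100
       AS pct_connections_used;

The monitoring system then just compares pct_connections_used against its warning and critical thresholds.]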
[PERFORM] oracle to psql migration - slow query in postgres
We are in the process of testing migration of our oracle data warehouse over to postgres. A potential showstopper are full table scans on our members table. We can't function on postgres effectively unless index scans are employed. I'm thinking I don't have something set correctly in my postgresql.conf file, but I'm not sure what. This table has approximately 300million rows. Version: SELECT version(); version -- PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit We have 4 quad-core processors and 32GB of RAM. The below query uses the members_sorted_idx_001 index in oracle, but in postgres, the optimizer chooses a sequential scan. explain analyze create table tmp_srcmem_emws1 as select emailaddress, websiteid from members where emailok = 1 and emailbounced = 0; QUERY PLAN -- Seq Scan on members (cost=0.00..14137154.64 rows=238177981 width=29) (actual time=0.052..685834.785 rows=236660930 loops=1) Filter: ((emailok = 1::numeric) AND (emailbounced = 0::numeric)) Total runtime: 850306.220 ms (3 rows) show shared_buffers ; shared_buffers 7680MB (1 row) show effective_cache_size ; effective_cache_size -- 22GB (1 row) show work_mem ; work_mem -- 768MB (1 row) show enable_seqscan ; enable_seqscan on (1 row) Below are the data definitions for the table/indexes in question: \d members Table "members" Column|Type | Modifiers -+-+--- memberid| numeric | not null firstname | character varying(50) | lastname| character varying(50) | emailaddress| character varying(50) | password| character varying(50) | address1| character varying(50) | address2| character varying(50) | city| character varying(50) | statecode | character varying(50) | zipcode | character varying(50) | birthdate | date| emailok | numeric(2,0)| gender | character varying(1)| addeddate | timestamp without time zone | emailbounced| numeric(2,0)| changedate | timestamp without time zone | optoutsource| character varying(100) | websiteid | numeric | promotionid | numeric | sourceid| numeric | siteid | character varying(64) | srcwebsiteid| numeric | homephone | character varying(20) | homeareacode| character varying(10) | campaignid | numeric | srcmemberid | numeric | optoutdate | date| regcomplete | numeric(1,0)| regcompletesourceid | numeric | ipaddress | character varying(25) | pageid | numeric | streetaddressstatus | numeric(1,0)| middlename | character varying(50) | optinprechecked | numeric(1,0)| optinposition | numeric | homephonestatus | numeric | addeddate_id| numeric | changedate_id | numeric | rpmindex| numeric | optmode | numeric(1,0)| countryid | numeric | confirmoptin| numeric(2,0)| bouncedate | date| memberageid | numeric | sourceid2 | numeric | remoteuserid| character varying(50) | goal| numeric(1,0)| flowdepth | numeric | pagetype| numeric | savepassword| character varying(50) | customerprofileid | numeric | Indexes: "email_website_unq" UNIQUE, btree (emailaddress, websiteid), tablespace "members_idx" "member_addeddateid_idx" btree (addeddate_id), tablespace "members_idx" "member_changedateid_idx"
Re: [PERFORM] oracle to psql migration - slow query in postgres
The recommendations on the numeric columns are fantastic. Thank you very much. We will revisit our methods of assigning datatypes when we migrate our data over from Oracle. Regarding the full table scans; it appears inevitable that full table scans are necessary for the volume of data involved and the present design of our indexes. Over time, indexes were added/removed to satisfy particular functionality. Considering this is our most important table, I will research exactly how this table is queried to better optimize/reorganize our indexes. Thanks for your help. Tony On Thu, 2010-10-14 at 23:59 -0400, Mladen Gogala wrote: > On 10/14/2010 4:10 PM, Jon Nelson wrote: > > The first thing I'd do is think real hard about whether you really > > really want 'numeric' instead of boolean, smallint, or integer. The > > second thing is that none of your indices (which specify a whole bunch > > of fields, by the way) have only just emailok, emailbounced, or only > > the pair of them. Without knowing the needs of your app, I would > > reconsider your index choices and go with fewer columns per index. > > > Also, make sure that the statistics is good, that histograms are large > enough and that Geico (the genetic query optimizer) will really work > hard to save you 15% or more on the query execution time. You can also > make sure that any index existing index is used, by disabling the > sequential scan and then activating and de-activating indexes with the > dummy expressions, just as it was done with Oracle's rule based optimizer. > I agree that a good data model is even more crucial for Postgres than is > the case with Oracle. Oracle, because of its rich assortment of tweaking > & hacking tools and parameters, can be made to perform, even if the > model is designed by someone who didn't apply the rules of good design. > Postgres is much more susceptible to bad models and it is much harder to > work around a badly designed model in Postgres than in Oracle. What > people do not understand is that every application in the world will > become badly designed after years of maintenance, adding columns, > creating additional indexes, views, tables and triggers and than > deploying various tools to design applications. As noted by Murphy, > things develop from bad to worse. Keep Postgres models simple and > separated, because it's much easier to keep clearly defined models > simple and effective than to keep models with 700 tables and 350 views, > frequently with conflicting names, different columns named the same and > same columns named differently. And monitor, monitor, monitor. Use > strace, ltrace, pgstatspack, auto_explain, pgfouine, pgadmin, top, sar, > iostat and all tools you can get hold of. Without the event interface, > it's frequently a guessing game. It is, however, possible to manage > things. If working with partitioning, be very aware that PostgreSQL > optimizer has certain problems with partitions, especially with group > functions. If you want speed, everything must be prefixed with > partitioning column: indexes, expressions, joins. There is no explicit > star schema and creating hash indexes will not buy you much, as a matter > of fact, Postgres community is extremely suspicious of the hash indexes > and I don't see them widely used. > Having said that, I was able to solve the problems with my speed and > partitioning. > > -- > Mladen Gogala > Sr. 
Oracle DBA > 1500 Broadway > New York, NY 10036 > (212) 329-5251 > www.vmsinfo.com > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
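[To make the datatype and index advice above concrete, a hypothetical reworking of the two flag columns plus an index on just that pair might look like the following. This is illustrative DDL only; note that retyping columns on a roughly 300-million-row table rewrites the table, so it needs a maintenance window:

-- numeric(2,0) flags are wider and slower to compare than smallint
ALTER TABLE members ALTER COLUMN emailok      TYPE smallint;
ALTER TABLE members ALTER COLUMN emailbounced TYPE smallint;

-- an index on only the pair of filter columns, per the suggestion above
CREATE INDEX members_emailok_bounced_idx ON members (emailok, emailbounced);

On 8.4 the heap must still be visited even when an index matches, so a sequential scan may still win when most rows satisfy the filter; the narrower datatypes are the more reliable gain here.]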
Re: [PERFORM] oracle to psql migration - slow query in postgres
Thanks for all your responses. What's interesting is that an index is used when this query is executed in Oracle. It appears to do some parallel processing: SQL> set line 200 delete from plan_table; explain plan for select websiteid, emailaddress from members where emailok = 1 and emailbounced = 0; SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY()); SQL> 3 rows deleted. SQL> 2345 Explained. SQL> SQL> PLAN_TABLE_OUTPUT Plan hash value: 4247959398 --- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |TQ |IN-OUT| PQ Distrib | --- | 0 | SELECT STATEMENT|| 237M| 7248M| 469K (2)| 01:49:33 || || | 1 | PX COORDINATOR || | || || || | 2 | PX SEND QC (RANDOM) | :TQ1 | 237M| 7248M| 469K (2)| 01:49:33 | Q1,00 | P->S | QC (RAND) | | 3 |PX BLOCK ITERATOR|| 237M| 7248M| 469K (2)| 01:49:33 | Q1,00 | PCWC || |* 4 | INDEX FAST FULL SCAN| MEMBERS_SORTED_IDX_001 | 237M| 7248M| 469K (2)| 01:49:33 | Q1,00 | PCWP || --- PLAN_TABLE_OUTPUT Predicate Information (identified by operation id): --- 4 - filter("EMAILBOUNCED"=0 AND "EMAILOK"=1) 16 rows selected. On Fri, 2010-10-15 at 13:43 -0400, Igor Neyman wrote: > > > -Original Message- > > From: Tony Capobianco [mailto:tcapobia...@prospectiv.com] > > Sent: Thursday, October 14, 2010 3:43 PM > > To: pgsql-performance@postgresql.org > > Subject: oracle to psql migration - slow query in postgres > > > > We are in the process of testing migration of our oracle data > > warehouse over to postgres. A potential showstopper are full > > table scans on our members table. We can't function on > > postgres effectively unless index scans are employed. I'm > > thinking I don't have something set correctly in my > > postgresql.conf file, but I'm not sure what. > > > > This table has approximately 300million rows. > > > > Version: > > SELECT version(); > > > > version > > -- > > > > PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC > > gcc (GCC) > > 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit > > > > We have 4 quad-core processors and 32GB of RAM. The below > > query uses the members_sorted_idx_001 index in oracle, but in > > postgres, the optimizer chooses a sequential scan. > > > > explain analyze create table tmp_srcmem_emws1 as select > > emailaddress, websiteid > > from members > > where emailok = 1 > >and emailbounced = 0; > > QUERY > > PLAN > > -- > > > > Seq Scan on members (cost=0.00..14137154.64 rows=238177981 > > width=29) (actual time=0.052..685834.785 rows=236660930 loops=1) > >Filter: ((emailok = 1::numeric) AND (emailbounced = > > 0::numeric)) Total runtime: 850306.220 ms > > (3 rows) > > > > show shared_buffers ; > > shared_buffers > > > > 7680MB > > (1 row) > > > > show effective_cache_size ; > > effective_cache_size > > -- > > 22GB > > (1 row) > > > > show work_mem ; > > work_mem > > -- > > 768MB > > (1 row) > > > > show enable_seqscan ; > > enable_seqscan
Re: [PERFORM] oracle to psql migration - slow query in postgres
Very true Igor! Free is my favorite price. I'll figure a way around this issue. Thanks for your help. Tony On Fri, 2010-10-15 at 14:54 -0400, Igor Neyman wrote: > > -Original Message- > > From: Tony Capobianco [mailto:tcapobia...@prospectiv.com] > > Sent: Friday, October 15, 2010 2:14 PM > > To: pgsql-performance@postgresql.org > > Subject: Re: oracle to psql migration - slow query in postgres > > > > Thanks for all your responses. What's interesting is that an > > index is used when this query is executed in Oracle. It > > appears to do some parallel processing: > > > > SQL> set line 200 > > delete from plan_table; > > explain plan for > > select websiteid, emailaddress > > from members > > where emailok = 1 > >and emailbounced = 0; > > > > SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY()); > > SQL> > > 3 rows deleted. > > > > SQL> 2345 > > Explained. > > > > SQL> SQL> > > PLAN_TABLE_OUTPUT > > -- > > -- > > -- > > -- > > Plan hash value: 4247959398 > > > > -- > > - > > | Id | Operation | Name | > > Rows | Bytes > > | Cost (%CPU)| Time |TQ |IN-OUT| PQ Distrib | > > -- > > - > > | 0 | SELECT STATEMENT|| 237M| > > 7248M| 469K (2)| 01:49:33 || || > > | 1 | PX COORDINATOR || | > > || || || > > | 2 | PX SEND QC (RANDOM) | :TQ1 | 237M| > > 7248M| 469K (2)| 01:49:33 | Q1,00 | P->S | QC (RAND) | > > | 3 |PX BLOCK ITERATOR|| 237M| > > 7248M| 469K (2)| 01:49:33 | Q1,00 | PCWC || > > |* 4 | INDEX FAST FULL SCAN| MEMBERS_SORTED_IDX_001 | 237M| > > 7248M| 469K (2)| 01:49:33 | Q1,00 | PCWP || > > -- > > - > > > > PLAN_TABLE_OUTPUT > > -- > > -- > > -- > > -- > > > > Predicate Information (identified by operation id): > > --- > > > >4 - filter("EMAILBOUNCED"=0 AND "EMAILOK"=1) > > > > 16 rows selected. > > > > > > 1. Postgres doesn't have "FAST FULL SCAN" because even if all the info > is in the index, it need to visit the row in the table ("visibility" > issue). > > 2. Postgres doesn't have parallel executions. > > BUT, it's free anf has greate community support, as you already saw. > > Regards, > Igor Neyman > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] postgresql transaction id monitoring with nagios
Ever since I started working with PostgreSQL I've heard the need to watch transaction IDs. The phrase "transaction ID wraparound" still gives me a shiver.

Attached is a short script that works with the monitoring system Nagios to keep an eye on transaction IDs. It should be easy to adapt to any other monitoring system. It runs the textbook query below and reports how close you are to wraparound.

SELECT datname, age(datfrozenxid) FROM pg_database;

The script detects a wrap at 2 billion. It starts warning once one or more databases show an age over 1 billion transactions. It reports critical at 1.5B transactions. I hope everyone out there is vacuuming *all* databases often.

Hope some of you can use this script!

Tony Wasson

check_pg_transactionids.pl Description: Perl program ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
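[The thresholds described above can also be expressed directly in SQL; a rough approximation of the check, not the attached script itself:

SELECT datname,
       age(datfrozenxid) AS xid_age,
       round(age(datfrozenxid) / 2000000000.0 * 100, 1) AS pct_toward_wrap
  FROM pg_database
 ORDER BY xid_age DESC;

Per the thresholds stated above, warn when xid_age exceeds 1 billion and go critical at 1.5 billion.]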
Re: [PERFORM] postgresql transaction id monitoring with nagios
On 5/2/06, Vivek Khera <[EMAIL PROTECTED]> wrote: On May 2, 2006, at 2:26 PM, Tony Wasson wrote: > The script detects a wrap at 2 billion. It starts warning once one or > more databases show an age over 1 billion transactions. It reports > critical at 1.5B transactions. I hope everyone out there is vacuuming > *all* databases often. Something seems wrong... I just ran your script against my development database server which is vacuumed daily and it said I was 53% of the way to 2B. Seemed strange to me, so I re-ran "vacuum -a - z" to vacuum all databases (as superuser), reran the script and got the same answer. Ah thanks, it's a bug in my understanding of the thresholds. "With the standard freezing policy, the age column will start at one billion for a freshly-vacuumed database." So essentially, 1B is normal, 2B is the max. The logic is now.. The script detects a wrap at 2 billion. It starts warning once one or more databases show an age over 1.5 billion transactions. It reports critical at 1.75B transactions. If anyone else understands differently, hit me with a clue bat. check_pg_transactionids.pl Description: Perl program ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] postgresql transaction id monitoring with nagios
On 5/2/06, Bruno Wolff III <[EMAIL PROTECTED]> wrote:
On Tue, May 02, 2006 at 12:06:30 -0700, Tony Wasson <[EMAIL PROTECTED]> wrote:
> > Ah thanks, it's a bug in my understanding of the thresholds.
> >
> > "With the standard freezing policy, the age column will start at one
> > billion for a freshly-vacuumed database."
> >
> > So essentially, 1B is normal, 2B is the max. The logic is now..
> >
> > The script detects a wrap at 2 billion. It starts warning once one or
> > more databases show an age over 1.5 billion transactions. It reports
> > critical at 1.75B transactions.
> >
> > If anyone else understands differently, hit me with a clue bat.

Isn't this obsolete now anyway? I am pretty sure 8.1 has safeguards against wraparound.

My motivation was primarily to monitor some existing PostgreSQL 8.0 servers. I'm not convinced it is "safe" to stop worrying about transaction ids even on an 8.1 box. It is comforting that 8.1 does safeguard against wraparound in at least two ways. First, it emits warnings during the last 10 million transactions. If you manage to ignore all those, postgresql will shut down before a wraparound. I think PostgreSQL does everything correctly there, but I suspect someone will run into the shut-down daemon problem.

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Killing long-running queries
On 5/2/06, Dan Harris <[EMAIL PROTECTED]> wrote:
My database is used primarily in an OLAP-type environment. Sometimes my users get a little carried away and find some way to slip past the sanity filters in the applications and end up bogging down the server with queries that run for hours and hours. And, of course, what users tend to do is to keep queuing up more queries when they don't see the first one return instantly :) So, I have been searching for a way to kill an individual query. I read in the mailing list archives that you could 'kill' the pid. I've tried this a few times and more than once, it has caused the postmaster to die(!), terminating every query that was in process, even unrelated to that query. Is there some way I can just kill a query and not risk breaking everything else when I do it? Thanks

Hi Dan,

You can kill a specific pid under 8.1 using SELECT pg_cancel_backend(pid). You can kill a query from the command line by doing $ kill -TERM pid or $ kill -SIGINT pid.

There are several tips from this thread that may be useful about killing long-running SQL: http://archives.postgresql.org/pgsql-general/2006-02/msg00298.php

In short, the recommendations are:

1) Use statement_timeouts if at all possible. You can do this database-wide in postgresql.conf. You can also set this on a per-user or per-statement basis.

2) Make sure step #1 does not kill autovacuum, or necessary automated jobs. You can do this with "ALTER USER SET statement_timeout = 0".

I'm using a web page to show SELECT * FROM pg_stat_activity output from several servers. This makes it easy to see the pids of any long-running SQL. http://archives.postgresql.org/pgsql-general/2006-02/msg00427.php

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
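[To make those two recommendations concrete, this is roughly what they look like in SQL; the role names and the pid are placeholders, and on 8.1 statement_timeout is given in milliseconds:

-- 1) cap statements for a reporting/OLAP role
ALTER USER olap_user SET statement_timeout = 1800000;   -- 30 minutes, in ms

-- 2) exempt the roles that run vacuum or other batch jobs
ALTER USER batch_user SET statement_timeout = 0;        -- never time out

-- cancel a single runaway query from another session
SELECT pg_cancel_backend(12345);]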
Re: [PERFORM] New server to improve performance on our large and busy DB - advice? (v2)
On Thu, 14 Jan 2010 16:35:53 -0600 Dave Crooke wrote: > For any given database engine, regardless of the marketing and support > stance, there is only one true "primary" enterprise OS platform that > most big mission critical sites use, and is the best supported and > most stable platform for that RDBMS. For Oracle, that's HP-UX (but 10 > years ago, it was Solaris). For PostgreSQL, it's Linux. I am interested in this response and am wondering if this is just Dave's opinion or some sort of official PostgreSQL policy. I am learning PostgreSQL by running it on FreeBSD 8.0-STABLE. So far I have found no problems and have even read a few posts that are critical of Linux's handling of fsync. I really don't want to start a Linux vs FreeBSD flame war (I like Linux and use that too, though not for database use), I am just intrigued by the claim that Linux is somehow the natural OS for running PostgreSQL. I think if Dave had said "for PostgreSQL, it's a variant of Unix" I wouldn't have been puzzled. So I suppose the question is: what is it about Linux specifically (as contrasted with other Unix-like OSes, especially Open Source ones) that makes it particularly suitable for running PostgreSQL? Best, Tony -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Oracle v. Postgres 9.0 query performance
ix | %m-%u-%p
log_lock_waits | on
log_min_error_statement | panic
log_min_messages | notice
log_rotation_age | 0
log_rotation_size | 0
log_truncate_on_rotation | off
logging_collector | on
maintenance_work_mem | 1GB
max_connections | 400
max_stack_depth | 2MB
search_path | x
server_encoding | UTF8
shared_buffers | 7680MB
TimeZone | US/Eastern
wal_buffers | 32MB
wal_level | archive
work_mem | 768MB

Should this query be hashing the smaller table on Postgres rather than using nested loops?

Thanks.
Tony

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Oracle v. Postgres 9.0 query performance
pg_dw=# set enable_nestloop =0; SET Time: 0.165 ms pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100) pg_dw-# as pg_dw-# select o.emailcampaignid, count(memberid) opencnt pg_dw-# from openactivity o,ecr_sents s pg_dw-# where s.emailcampaignid = o.emailcampaignid pg_dw-# group by o.emailcampaignid; QUERY PLAN - HashAggregate (cost=4391163.81..4391288.05 rows=9939 width=12) -> Hash Join (cost=14.78..4344767.23 rows=9279316 width=12) Hash Cond: (o.emailcampaignid = s.emailcampaignid) -> Seq Scan on openactivity o (cost=0.00..3529930.67 rows=192540967 width=12) -> Hash (cost=8.79..8.79 rows=479 width=4) -> Seq Scan on ecr_sents s (cost=0.00..8.79 rows=479 width=4) Yikes. Two sequential scans. On Wed, 2011-06-08 at 11:33 -0400, Tom Lane wrote: > Tony Capobianco writes: > > pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100) > > pg_dw-# as > > pg_dw-# select o.emailcampaignid, count(memberid) opencnt > > pg_dw-# from openactivity o,ecr_sents s > > pg_dw-# where s.emailcampaignid = o.emailcampaignid > > pg_dw-# group by o.emailcampaignid; > > QUERY > > PLAN > > - > > GroupAggregate (cost=0.00..1788988.05 rows=9939 width=12) > >-> Nested Loop (cost=0.00..1742467.24 rows=9279316 width=12) > > -> Index Scan using ecr_sents_ecid_idx on ecr_sents s > > (cost=0.00..38.59 rows=479 width=4) > > -> Index Scan using openact_emcamp_idx on openactivity o > > (cost=0.00..3395.49 rows=19372 width=12) > >Index Cond: (o.emailcampaignid = s.emailcampaignid) > > (5 rows) > > > Should this query be hashing the smaller table on Postgres rather than > > using nested loops? > > Yeah, seems like it. Just for testing purposes, do "set enable_nestloop > = 0" and see what plan you get then. > > regards, tom lane > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Oracle v. Postgres 9.0 query performance
Here's the explain analyze: pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100) as select o.emailcampaignid, count(memberid) opencnt from openactivity o,ecr_sents s where s.emailcampaignid = o.emailcampaignid group by o.emailcampaignid; QUERY PLAN GroupAggregate (cost=0.00..1788988.05 rows=9939 width=12) (actual time=308630.967..2592279.526 rows=472 loops=1) -> Nested Loop (cost=0.00..1742467.24 rows=9279316 width=12) (actual time=31.489..2589363.047 rows=8586466 loops=1) -> Index Scan using ecr_sents_ecid_idx on ecr_sents s (cost=0.00..38.59 rows=479 width=4) (actual time=0.010..13.326 rows=479 loops=1) -> Index Scan using openact_emcamp_idx on openactivity o (cost=0.00..3395.49 rows=19372 width=12) (actual time=1.336..5397.139 rows=17926 loops=479) Index Cond: (o.emailcampaignid = s.emailcampaignid) Total runtime: 2592284.336 ms On Wed, 2011-06-08 at 17:31 +0200, t...@fuzzy.cz wrote: > > On Postgres, this same query takes about 58 minutes (could not run > > explain analyze because it is in progress): > > > > pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100) > > pg_dw-# as > > pg_dw-# select o.emailcampaignid, count(memberid) opencnt > > pg_dw-# from openactivity o,ecr_sents s > > pg_dw-# where s.emailcampaignid = o.emailcampaignid > > pg_dw-# group by o.emailcampaignid; > > QUERY > > PLAN > > - > > GroupAggregate (cost=0.00..1788988.05 rows=9939 width=12) > >-> Nested Loop (cost=0.00..1742467.24 rows=9279316 width=12) > > -> Index Scan using ecr_sents_ecid_idx on ecr_sents s > > (cost=0.00..38.59 rows=479 width=4) > > -> Index Scan using openact_emcamp_idx on openactivity o > > (cost=0.00..3395.49 rows=19372 width=12) > >Index Cond: (o.emailcampaignid = s.emailcampaignid) > > (5 rows) > > > > Please, post EXPLAIN ANALYZE, not just EXPLAIN. Preferably using > explain.depesz.com. > > regards > Tomas > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Oracle v. Postgres 9.0 query performance
Well, this ran much better. However, I'm not sure if it's because of set enable_nestloop = 0, or because I'm executing the query twice in a row, where previous results may be cached. I will try this setting in my code for when this process runs later today and see what the result is. Thanks! pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100) pg_dw-# as pg_dw-# select o.emailcampaignid, count(memberid) opencnt pg_dw-# from openactivity o,ecr_sents s pg_dw-# where s.emailcampaignid = o.emailcampaignid pg_dw-# group by o.emailcampaignid; QUERY PLAN HashAggregate (cost=4391163.81..4391288.05 rows=9939 width=12) (actual time=167254.751..167254.937 rows=472 loops=1) -> Hash Join (cost=14.78..4344767.23 rows=9279316 width=12) (actual time=0.300..164577.131 rows=8586466 loops=1) Hash Cond: (o.emailcampaignid = s.emailcampaignid) -> Seq Scan on openactivity o (cost=0.00..3529930.67 rows=192540967 width=12) (actual time=0.011..124351.878 rows=192542480 loops=1) -> Hash (cost=8.79..8.79 rows=479 width=4) (actual time=0.253..0.253 rows=479 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 17kB -> Seq Scan on ecr_sents s (cost=0.00..8.79 rows=479 width=4) (actual time=0.010..0.121 rows=479 loops=1) Total runtime: 167279.950 ms On Wed, 2011-06-08 at 11:51 -0400, Stephen Frost wrote: > * Tony Capobianco (tcapobia...@prospectiv.com) wrote: > > HashAggregate (cost=4391163.81..4391288.05 rows=9939 width=12) > >-> Hash Join (cost=14.78..4344767.23 rows=9279316 width=12) > > Hash Cond: (o.emailcampaignid = s.emailcampaignid) > > -> Seq Scan on openactivity o (cost=0.00..3529930.67 > > rows=192540967 width=12) > > -> Hash (cost=8.79..8.79 rows=479 width=4) > >-> Seq Scan on ecr_sents s (cost=0.00..8.79 rows=479 > > width=4) > > > > Yikes. Two sequential scans. > > Err, isn't that more-or-less exactly what you want here? The smaller > table is going to be hashed and then you'll traverse the bigger table > and bounce each row off the hash table. Have you tried actually running > this and seeing how long it takes? The bigger table doesn't look to be > *that* big, if your i/o subsystem is decent and you've got a lot of > memory available for kernel cacheing, should be quick. > > Thanks, > > Stephen -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
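[One way to tell how much of the improvement comes from the plan change versus already-cached data is to ask for buffer statistics, which 9.0 supports; for example:

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.emailcampaignid, count(memberid) AS opencnt
  FROM openactivity o, ecr_sents s
 WHERE s.emailcampaignid = o.emailcampaignid
 GROUP BY o.emailcampaignid;

Blocks reported as "shared hit" were found in shared_buffers, while "read" blocks came from the OS cache or disk, so a second run that is mostly hits points at caching rather than the new plan.]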
Re: [PERFORM] Oracle v. Postgres 9.0 query performance
pg_dw=# show random_page_cost ; random_page_cost -- 4 (1 row) Time: 0.299 ms pg_dw=# show seq_page_cost ; seq_page_cost --- 1 (1 row) Time: 0.250 ms pg_dw=# show work_mem ; work_mem -- 768MB (1 row) On Wed, 2011-06-08 at 18:27 +0200, Pavel Stehule wrote: > Hello > > what is your settings for > > random_page_cost, seq_page_cost and work_mem? > > Regards > > Pavel Stehule > > 2011/6/8 Tony Capobianco : > > Here's the explain analyze: > > > > pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100) > > as > > select o.emailcampaignid, count(memberid) opencnt > > from openactivity o,ecr_sents s > > where s.emailcampaignid = o.emailcampaignid > > group by o.emailcampaignid; > > > > QUERY > > PLAN > > > > GroupAggregate (cost=0.00..1788988.05 rows=9939 width=12) (actual > > time=308630.967..2592279.526 rows=472 loops=1) > > -> Nested Loop (cost=0.00..1742467.24 rows=9279316 width=12) > > (actual time=31.489..2589363.047 rows=8586466 loops=1) > > -> Index Scan using ecr_sents_ecid_idx on ecr_sents s > > (cost=0.00..38.59 rows=479 width=4) (actual time=0.010..13.326 rows=479 > > loops=1) > > -> Index Scan using openact_emcamp_idx on openactivity o > > (cost=0.00..3395.49 rows=19372 width=12) (actual time=1.336..5397.139 > > rows=17926 loops=479) > > Index Cond: (o.emailcampaignid = s.emailcampaignid) > > Total runtime: 2592284.336 ms > > > > > > On Wed, 2011-06-08 at 17:31 +0200, t...@fuzzy.cz wrote: > >> > On Postgres, this same query takes about 58 minutes (could not run > >> > explain analyze because it is in progress): > >> > > >> > pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100) > >> > pg_dw-# as > >> > pg_dw-# select o.emailcampaignid, count(memberid) opencnt > >> > pg_dw-# from openactivity o,ecr_sents s > >> > pg_dw-# where s.emailcampaignid = o.emailcampaignid > >> > pg_dw-# group by o.emailcampaignid; > >> > QUERY > >> > PLAN > >> > - > >> > GroupAggregate (cost=0.00..1788988.05 rows=9939 width=12) > >> >-> Nested Loop (cost=0.00..1742467.24 rows=9279316 width=12) > >> > -> Index Scan using ecr_sents_ecid_idx on ecr_sents s > >> > (cost=0.00..38.59 rows=479 width=4) > >> > -> Index Scan using openact_emcamp_idx on openactivity o > >> > (cost=0.00..3395.49 rows=19372 width=12) > >> >Index Cond: (o.emailcampaignid = s.emailcampaignid) > >> > (5 rows) > >> > > >> > >> Please, post EXPLAIN ANALYZE, not just EXPLAIN. Preferably using > >> explain.depesz.com. > >> > >> regards > >> Tomas > >> > >> > > > > > > > > -- > > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > > To make changes to your subscription: > > http://www.postgresql.org/mailpref/pgsql-performance > > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Oracle v. Postgres 9.0 query performance
My current setting is 22G. According to some documentation, I want to set effective_cache_size to my OS disk cache + shared_buffers. In this case, I have 4 quad-core processors with 512K cache (8G) and my shared_buffers is 7680M. Therefore my effective_cache_size should be approximately 16G? Most of our other etl processes are running fine, however I'm curious if I could see a significant performance boost by reducing the effective_cache_size. On Wed, 2011-06-08 at 13:03 -0400, Tom Lane wrote: > Tony Capobianco writes: > > Well, this ran much better. However, I'm not sure if it's because of > > set enable_nestloop = 0, or because I'm executing the query twice in a > > row, where previous results may be cached. I will try this setting in > > my code for when this process runs later today and see what the result > > is. > > If the performance differential holds up, you should look at adjusting > your cost parameters so that the planner isn't so wrong about which one > is faster. Hacking enable_nestloop is a band-aid, not something you > want to use in production. > > Looking at the values you gave earlier, I wonder whether the > effective_cache_size setting isn't unreasonably high. That's reducing > the estimated cost of accessing the large table via indexscans, and > I'm thinking it reduced it too much. > > regards, tom lane > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Oracle v. Postgres 9.0 query performance
Oooo...some bad math there. Thanks. On Wed, 2011-06-08 at 12:38 -0700, Samuel Gendler wrote: > > > On Wed, Jun 8, 2011 at 12:03 PM, Tony Capobianco > wrote: > My current setting is 22G. According to some documentation, I > want to > set effective_cache_size to my OS disk cache + > shared_buffers. In this > case, I have 4 quad-core processors with 512K cache (8G) and > my > shared_buffers is 7680M. Therefore my effective_cache_size > should be > approximately 16G? Most of our other etl processes are > running fine, > however I'm curious if I could see a significant performance > boost by > reducing the effective_cache_size. > > > > > > disk cache, not CPU memory cache. It will be some significant > fraction of total RAM on the host. Incidentally, 16 * 512K cache = > 8MB, not 8GB. > > > http://en.wikipedia.org/wiki/CPU_cache > > > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
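[Since effective_cache_size is only a planner hint and allocates nothing, it can also be tried per session before committing a new value to postgresql.conf; a quick sketch using the query from this thread:

SET effective_cache_size = '16GB';
EXPLAIN
SELECT o.emailcampaignid, count(memberid)
  FROM openactivity o, ecr_sents s
 WHERE s.emailcampaignid = o.emailcampaignid
 GROUP BY o.emailcampaignid;
RESET effective_cache_size;

If lowering the value flips the plan away from the slow nested loop on its own, that is a sign the production setting was doing the damage rather than enable_nestloop.]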
[PERFORM] Cursor fetch performance issue
We are migrating our Oracle warehouse to Postgres 9. This function responds well: pg=# select public.getMemberAdminPrevious_sp2(247815829, 1,'test.em...@hotmail.com', 'email', 'test'); getmemberadminprevious_sp2 (1 row) Time: 7.549 ms However, when testing, this fetch takes upwards of 38 minutes: BEGIN; select public.getMemberAdminPrevious_sp2(247815829, 1,'test.em...@hotmail.com', 'email', 'test'); FETCH ALL IN ""; How can I diagnose any performance issues with the fetch in the cursor? Thanks. Tony -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Cursor fetch performance issue
Running just the sql of the function returns only 10 rows: pg=# SELECT m.memberid, m.websiteid, m.emailaddress, pg-#m.firstname, m.lastname, m.regcomplete, m.emailok pg-# FROM members m pg-# WHERE m.emailaddress LIKE 'test.em...@hotmail.com' pg-# ANDm.changedate_id < 5868 ORDER BY m.emailaddress, m.websiteid; memberid | websiteid |emailaddress| firstname | lastname | regcomplete | emailok ---+---++---+--+-+- 247815829 | 1 | test.em...@hotmail.com | email | test | 1 | 1 300960335 |62 | test.em...@hotmail.com | | | 1 | 1 300959937 | 625 | test.em...@hotmail.com | | | 1 | 1 260152830 | 1453 | test.em...@hotmail.com | | | 1 | 1 300960163 | 1737 | test.em...@hotmail.com | email | test | 1 | 1 300960259 | 1824 | test.em...@hotmail.com | email | test | 1 | 1 300959742 | 1928 | test.em...@hotmail.com | email | test | 1 | 1 368122699 | 2457 | test.em...@hotmail.com | email | test | 1 | 1 403218613 | 2464 | test.em...@hotmail.com | email | test | 1 | 0 378951994 | 2656 | test.em...@hotmail.com | | | 1 | 1 (10 rows) Time: 132.626 ms So, it would seem that's a small enough number of rows. Unfortunately, issuing: set cursor_tuple_fraction to 1.0; Did not have an effect on performance. Is it common to modify this cursor_tuple_fraction parameter each time we execute the function? On Tue, 2012-01-24 at 21:47 +0100, Pavel Stehule wrote: > Hello > > 2012/1/24 Tony Capobianco : > > We are migrating our Oracle warehouse to Postgres 9. > > > > This function responds well: > > > > pg=# select public.getMemberAdminPrevious_sp2(247815829, > > 1,'test.em...@hotmail.com', 'email', 'test'); > > getmemberadminprevious_sp2 > > > > > > (1 row) > > > > Time: 7.549 ms > > > > However, when testing, this fetch takes upwards of 38 minutes: > > > > BEGIN; > > select public.getMemberAdminPrevious_sp2(247815829, > > 1,'test.em...@hotmail.com', 'email', 'test'); > > FETCH ALL IN ""; > > > > How can I diagnose any performance issues with the fetch in the cursor? > > > > Cursors are optimized to returns small subset of result - if you plan > to read complete result, then set > > set cursor_tuple_fraction to 1.0; > > this is session config value, you can set it before selected cursors queries > > Regards > > Pavel Stehule > > > Thanks. > > Tony > > > > > > -- > > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > > To make changes to your subscription: > > http://www.postgresql.org/mailpref/pgsql-performance > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Cursor fetch performance issue
Here's the explain:

pg=# explain select getMemberAdminPrevious_sp(247815829, 1,'test.em...@hotmail.com', 'Email', 'Test');
 QUERY PLAN
--------------------------------------------
 Result  (cost=0.00..0.26 rows=1 width=0)
(1 row)

Time: 1.167 ms

There was discussion of 'LIKE' v. '=', and wildcard characters are not being entered into the $1 parameter. This is not generating a SQL string; I feel it's something to do with the fetch of the refcursor. The cursor is part of a larger function:

CREATE OR REPLACE FUNCTION PUBLIC.GETMEMBERADMINPREVIOUS_SP2 (
    p_memberid     IN numeric,
    p_websiteid    IN numeric,
    p_emailaddress IN varchar,
    p_firstname    IN varchar,
    p_lastname     IN varchar)
RETURNS refcursor AS $$
DECLARE
    ref          refcursor;
    l_sysdateid  numeric;
BEGIN
    l_sysdateid := sysdateid();
    if (p_memberid != 0) then
        if (p_emailaddress IS NOT NULL) then
            OPEN ref FOR
                SELECT m.memberid, m.websiteid, m.emailaddress,
                       m.firstname, m.lastname, m.regcomplete, m.emailok
                  FROM members m
                 WHERE m.emailaddress LIKE p_emailaddress
                   AND m.changedate_id < l_sysdateid
                 ORDER BY m.emailaddress, m.websiteid;
        end if;
    end if;
    Return ref;
EXCEPTION
    WHEN NO_DATA_FOUND THEN
        Return null;
END;
$$ LANGUAGE 'plpgsql';

On Tue, 2012-01-24 at 22:17 +0100, Pavel Stehule wrote:
> Hello
>
> >
> > So, is getMemberAdminPrevious_sp2() preparing a statement with wildcards?
> >
> > SELECT m.memberid, m.websiteid, m.emailaddress,
> >        m.firstname, m.lastname, m.regcomplete, m.emailok
> > FROM members m
> > WHERE m.emailaddress LIKE $1
> > AND m.changedate_id < $2
> > ORDER BY m.emailaddress, m.websiteid;
> >
> > Or is it creating the string and executing it:
> >
> > sql = 'SELECT m.memberid, m.websiteid, m.emailaddress, '
> >    || ' m.firstname, m.lastname, m.regcomplete, m.emailok '
> >    || ' FROM members m '
> >    || ' WHERE m.emailaddress LIKE ' || arg1
> >    || ' AND m.changedate_id < ' || arg2
> >    || ' ORDER BY m.emailaddress, m.websiteid ';
> > execute(sql);
> >
> > Maybe its the planner doesnt plan so well with $1 arguments vs actual
> > arguments thing.
> >
>
> sure, it could be blind optimization problem in plpgsql. Maybe you
> have to use a dynamic SQL - OPEN FOR EXECUTE stmt probably
>
> http://www.postgresql.org/docs/9.1/interactive/plpgsql-cursors.html
>
> Regards
>
> Pavel Stehule
>
> > -Andy
>

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
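[Following up on the OPEN FOR EXECUTE suggestion: if the server is 9.0 or later, the cursor can be opened with dynamic SQL so the planner sees the actual parameter values rather than generic placeholders. A rough sketch of just that portion of the function, untested against the real schema:

OPEN ref FOR EXECUTE
    'SELECT m.memberid, m.websiteid, m.emailaddress,
            m.firstname, m.lastname, m.regcomplete, m.emailok
       FROM members m
      WHERE m.emailaddress LIKE $1
        AND m.changedate_id < $2
      ORDER BY m.emailaddress, m.websiteid'
    USING p_emailaddress, l_sysdateid;

Because the statement is planned at OPEN time with the supplied values, a selective LIKE pattern can get an index plan instead of the generic one chosen for an unknown parameter.]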
[PERFORM] CPU spikes and transactions
Hi, I'm running 9.1.6 w/22GB shared buffers, and 32GB overall RAM on a 16 Opteron 6276 CPU box. We limit connections to roughly 120, but our webapp is configured to allocate a thread-local connection, so those connections are rarely doing anything more than half the time. We have been running smoothly for over a year on this configuration, and recently started having huge CPU spikes that bring the system to its knees. Given that it is a multiuser system, it has been quite hard to pinpoint the exact cause, but I think we've narrowed it down to two data import jobs that were running in semi-long transactions (clusters of row inserts). The tables affected by these inserts are used in common queries. The imports will bring in a row count of perhaps 10k on average covering 4 tables. The insert transactions are at isolation level read committed (the default for the JDBC driver). When the import would run (again, theory...we have not been able to reproduce), we would end up maxed out on CPU, with a load average of 50 for 16 CPUs (our normal busy usage is a load average of 5 out of 16 CPUs). When looking at the active queries, most of them are against the tables that are affected by these imports. Our workaround (that is holding at present) was to drop the transactions on those imports (which is not optimal, but fortunately is acceptable for this particular data). This workaround has prevented any further incidents, but is of course inconclusive. Does this sound familiar to anyone, and if so, please advise. Thanks in advance, Tony Kay
Re: [PERFORM] CPU spikes and transactions
Hi Calvin,

Yes, I have sar data on all systems going back for years. Since others are probably going to want to be assured I am really "reading the data" right:

- This is 92% user CPU time, 5% sys, and 1% soft
- On some of the problems, I _do_ see a short spike of pgswpout's (memory pressure), but again, not enough to end up using much system time
- The database disks are idle (all data being used is in RAM) and are SSDs; average service times are barely measurable in ms.

If I had to guess, I'd say it was spinlock misbehavior. I cannot understand why else a transaction blocking other things would drive the CPUs so hard into the ground with user time.

Tony

Tony Kay TeamUnify, LLC TU Corporate Website <http://www.teamunify.com/> TU Facebook <http://www.facebook.com/teamunify> | Free OnDeck Mobile Apps <http://www.teamunify.com/__corp__/ondeck/>

On Mon, Oct 14, 2013 at 4:05 PM, Calvin Dodge wrote:
> Have you tried running "vmstat 1" during these times? If so, what is
> the percentage of WAIT time? Given that IIRC shared buffers should be
> no more than 25% of installed memory, I wonder if too little is
> available for system caching of disk reads. A high WAIT percentage
> would indicate excessive I/O (especially random seeks).
>
> Calvin Dodge
>
> On Mon, Oct 14, 2013 at 6:00 PM, Tony Kay wrote:
> > Hi,
> >
> > I'm running 9.1.6 w/22GB shared buffers, and 32GB overall RAM on a 16
> > Opteron 6276 CPU box. We limit connections to roughly 120, but our webapp is
> > configured to allocate a thread-local connection, so those connections are
> > rarely doing anything more than half the time.
> >
> > We have been running smoothly for over a year on this configuration, and
> > recently started having huge CPU spikes that bring the system to its knees.
> > Given that it is a multiuser system, it has been quite hard to pinpoint the
> > exact cause, but I think we've narrowed it down to two data import jobs that
> > were running in semi-long transactions (clusters of row inserts).
> >
> > The tables affected by these inserts are used in common queries.
> >
> > The imports will bring in a row count of perhaps 10k on average covering 4
> > tables.
> >
> > The insert transactions are at isolation level read committed (the default
> > for the JDBC driver).
> >
> > When the import would run (again, theory...we have not been able to
> > reproduce), we would end up maxed out on CPU, with a load average of 50 for
> > 16 CPUs (our normal busy usage is a load average of 5 out of 16 CPUs).
> >
> > When looking at the active queries, most of them are against the tables that
> > are affected by these imports.
> >
> > Our workaround (that is holding at present) was to drop the transactions on
> > those imports (which is not optimal, but fortunately is acceptable for this
> > particular data). This workaround has prevented any further incidents, but
> > is of course inconclusive.
> >
> > Does this sound familiar to anyone, and if so, please advise.
> >
> > Thanks in advance,
> >
> > Tony Kay
Re: [PERFORM] CPU spikes and transactions
On Mon, Oct 14, 2013 at 4:42 PM, Tomas Vondra wrote: > On 15.10.2013 01:00, Tony Kay wrote: > > Hi, > > > > I'm running 9.1.6 w/22GB shared buffers, and 32GB overall RAM on a > > 16 Opteron 6276 CPU box. We limit connections to roughly 120, but > > our webapp is configured to allocate a thread-local connection, so > > those connections are rarely doing anything more than half the time. > > Lower your shared buffers to about 20% of your RAM, unless you've tested > it's actually helping in your particular case. It's unlikely you'll get > better performance by using more than that, especially on older > versions, so it's wiser to leave the rest for page cache. > > It might even be one of the causes of the performance issue you're > seeing, as shared buffers are not exactly overhead-free. > > See this for more details on tuning: > >http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server I had followed the general directions from several sources years ago, which indicate up to 40% of RAM. We've been running very large shared buffers for 4 years now, but it is difficult to generate a good real load without testing against users, so we have not felt the need to move it around. In general, I don't tend to tinker with a setting that has been fine for this long without good reason. I've been wanting to upgrade to the newer mmap-based versions of pgsql, but was waiting to re-tune this when I did so. Why do you suspect that shared_buffers would cause the behavior I'm seeing? > > > You're on a rather old 9.1.x version, BTW. The last version in this > branch is 9.1.10 and there are some important security fixes (e.g. in > 9.1.9). Not sure if there are any fixes relevant to the performance > issue, though. > > An upgrade to 9.1.10 is planned. > A few initial questions: > > * What OS are we dealing with? > CentOS el6 > > * So how many active connections are there on average (see > pg_stat_activity for connections running queries)? > about 40-60 > > * How much data are we talking about? In total and in the imports? > 80GB database. The imports are maybe 1-3 MB...often much smaller. 10k rows would be a probably average. > > > We have been running smoothly for over a year on this configuration, > > and recently started having huge CPU spikes that bring the system to > > its knees. Given that it is a multiuser system, it has been quite > > hard to pinpoint the exact cause, but I think we've narrowed it down > > to two data import jobs that were running in semi-long transactions > > (clusters of row inserts). > > > > The tables affected by these inserts are used in common queries. > > > > The imports will bring in a row count of perhaps 10k on average > > covering 4 tables. > > > > The insert transactions are at isolation level read committed (the > > default for the JDBC driver). > > > > When the import would run (again, theory...we have not been able to > > reproduce), we would end up maxed out on CPU, with a load average of > > 50 for 16 CPUs (our normal busy usage is a load average of 5 out of > > 16 CPUs). > > > > When looking at the active queries, most of them are against the > > tables that are affected by these imports. > > Which processes consume most CPU time? Are those backends executing the > queries, or some background processes (checkpointer, autovacuum, ...)? > > The backends executing the queries...most of the queries that seem hung usually run in a few ms. > Can you post a "top -c" output collected at the time of the CPU peak? 
Don't have process accounting, so I cannot regenerate that; however, I can tell you what queries were active at one of them. There were 36 of the queries against table ind_event (which is one affected by the import). Those queries usually take 5-10ms, and we never see more than 2 active during normal operation. These had been active for _minutes_. A sample of the running queries:

   time_active   | datname | procpid | query
-----------------+---------+---------+---------------------------------------------
 00:08:10.891105 | tudb    |    9058 | select * from mr_uss_ind_event_x where (tu
 00:08:10.981845 | tudb    |    8977 | select * from mr_uss_ind_event_x where (tu
 00:07:08.883347 | tudb    |    8930 | select * from mr_uss_ind_event_x where org
 00:07:15.266393 | tudb    |    8927 | select * from mr_uss_ind_event_x where org
 00:07:27.587133 | tudb    |   11867 | update msg_result set dt_result=$1,msg_id=
 00:08:06.458885 | tudb    |    8912 | select * from mr_uss_ind_event_x where
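[For the record, a listing like the one above can be produced with something along these lines against 9.1's pg_stat_activity, where the columns are still procpid and current_query:

SELECT now() - query_start AS time_active,
       datname, procpid, current_query
  FROM pg_stat_activity
 WHERE current_query <> '<IDLE>'
   AND now() - query_start > interval '1 minute'
 ORDER BY time_active DESC;]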
Re: [PERFORM] CPU spikes and transactions
Thanks for the tip. I forgot there were kernel stats on spinlocks. I'm not sure we'll be able to get it to tip in a test environment, and we're unwilling to revert the code in production in order to have our users trigger it. We'll try triggering it on our test server, and if we manage, I'll get you the stats. Thanks! Tony Tony Kay TeamUnify, LLC TU Corporate Website <http://www.teamunify.com/> TU Facebook <http://www.facebook.com/teamunify> | Free OnDeck Mobile Apps<http://www.teamunify.com/__corp__/ondeck/> On Tue, Oct 15, 2013 at 6:00 AM, Merlin Moncure wrote: > On Mon, Oct 14, 2013 at 6:45 PM, Tomas Vondra wrote: > > On 15.10.2013 01:26, Tony Kay wrote: > >> Hi Calvin, > >> > >> Yes, I have sar data on all systems going back for years. > >> > >> Since others are going to probably want to be assured I am really > >> "reading the data" right: > >> > >> - This is 92% user CPU time, 5% sys, and 1% soft > >> - On some of the problems, I _do_ see a short spike of pgswpout's > >> (memory pressure), but again, not enough to end up using much system > time > >> - The database disks are idle (all data being used is in RAM)..and are > >> SSDsaverage service times are barely measurable in ms. > > > > OK. Can you share the data? Maybe we'll notice something suspicious. > > > >> If I had to guess, I'd say it was spinlock misbehaviorI cannot > >> understand why ekse a transaction blocking other things would drive > >> the CPUs so hard into the ground with user time. > > > > Have you tried running perf, to verify the time is actually spent on > > spinlocks? > > +1 this. It is almost certainly spinlocks, but we need to know which > one and why. plz install debug symbols and run a perf during normal > and high load conditions. > > merlin > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
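[If the test server does tip over, the perf capture being asked for is only a couple of commands; a sketch, run as root on the database host with the postgresql debuginfo packages installed:

# sample all CPUs with call graphs for ~30 seconds during the spike
perf record -a -g -- sleep 30
# summarize; heavy s_lock / LWLockAcquire entries near the top would
# point at spinlock or lightweight-lock contention
perf report

The same capture taken under normal load gives a baseline to compare the spike profile against.]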
Re: [PERFORM] CPU spikes and transactions
On Tue, Oct 15, 2013 at 10:26 AM, Julien Cigar wrote: > > for two reasons: > > - some of the overhead of bgwriter and checkpoints is more or less linear > in the size of shared_buffers, for example it could be possible that a > large quantity of data could be dirty when a checkpoint occurs). > > - the OS cache is also being used for reads and writes, the larger > shared_buffers is, the more you risk double buffering (same blocks > in the OS cache and in the database buffer cache). > Excellent. Thank you for the information. My suspicion has always been that the shared_buffers are "level 1 cache", so it seems to me that you'd want that to be large enough to hold your entire database if you could. However, I'm now realizing that I was _also_ assuming the IPC shared memory was also being locked via mlock to prevent swapping it out, and I'm now getting the feeling that this isn't true, which means the double buffering could lead to swap space use on buffer cache pressure...which I do occasionally see in ways I had not expected. We do streaming replication and also store them for snapshot PITR backups, so I am intimately familiar with our write load, and I can say it is pretty low (we ship a 16MB WAL file about every 10-15 minutes during our busiest times). That said, I can see how an import that is doing a bunch of writes could possibly spread those over a large area that would then consume a lot of CPU on the writer and checkpoint; however, I do not see how either of those would cause 40-60 different postgres backgroud processes (all running a normally "light query") to spin off into oblivion unless the write work load is somehow threaded into the background workers (which I'm sure it isn't). So, I think we're still dealing with a spinlock issue. We're going to upgrade to 9.1.10 (with debug symbols) Thursday night and add another 64GB of RAM. I'll tune shared_buffers down to 2GB at that time and bump effective_cache_size up at the same time. My large history of sar data will make it apparent pretty quickly if that is a win/lose/tie. If we have another spike in production, we'll be ready to measure it more accurately. Thanks, Tony
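[Checkpoint and background-writer pressure of the sort described above can be watched from the statistics collector; for example, using the 9.1 column names:

SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean, buffers_backend
  FROM pg_stat_bgwriter;

A buffers_backend count that climbs quickly relative to buffers_clean suggests ordinary backends are having to write out dirty buffers themselves, which is one symptom of a large, dirty buffer pool that the background writer cannot keep up with.]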