[PERFORM] Architecting a database

2010-06-25 Thread tony
I am in the process of moving a system that has been built around FoxPro
tables for the last 18 years into a PostgreSQL based system.

Over time I came up with decent strategies for making the FoxPro tables
work well under the workload placed on them, but we are getting to the
point where the locking mechanisms are causing problems when some of the
more heavily used tables are being written to.

With the FoxPro tables I had one directory that contained the tables that
had global data that was common to all clients. Things like documents that
had been received and logged, checks that had been cut, etc. Then each
client had his own directory which housed tables that had information
relating to that specific client. Setting things up like this kept me from
having any tables that were too terribly large so record addition and
index creation were not very time consuming.

I am wondering how I should architect this in PostgreSQL. Should I follow
a similar strategy and have a separate database for each client and one
database that contains the global data? With the dBase and ISAM tables I
have a good idea of how to handle them since I have been working with them
since dBASE originally came out. With the PostgreSQL type tables I am not
so certain how the data is arranged within the one file. Does having the
data all in one database allow PostgreSQL to better utilize indexes and
caches or does having a number of smaller databases provide performance
increases? In case it is important, there are 2000 clients involved, so
that would be 2000 databases if I followed my current FoxPro related
structure. Of course, I suppose it is always possible to combine a number
of groups into a database if the number of databases is an issue.

Tables within the client-specific databases are generally name and address
information, as well as tables for 10 different types of accounts which
require different structures; those tables hold anywhere from 10,000
transactions apiece for some smaller groups to 1 million for larger
groups. I believe we have a read-to-write ratio of about 1 to 15.
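
To make the question concrete, here is a rough sketch (schema and table
names are made up for illustration) of the kind of single-database layout I
have in mind, with schemas playing the role my per-client directories play
today:

  -- Hypothetical sketch only: one shared schema for the global data plus
  -- one schema per client.
  CREATE SCHEMA shared;
  CREATE TABLE shared.documents (
      doc_id      bigint PRIMARY KEY,
      client_code text NOT NULL,
      received_on date NOT NULL,
      body        text
  );

  CREATE SCHEMA client_0001;
  CREATE TABLE client_0001.name_address (
      person_id bigint PRIMARY KEY,
      name      text,
      address   text
  );

  -- A session working for that client only needs the right search_path:
  SET search_path = client_0001, shared;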

Thanks for any input.




Re: [PERFORM] Architecting a database

2010-06-30 Thread tony
Thanks for all of the input everyone.

I believe I am going to put together a test case using schemas and
partitioning and then doubling the amount of data currently in the system
to give me an idea of how things will be performing a couple of years down
the road.
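
For anyone curious, the partitioning part of the test case will be along
these lines (table and column names are invented; this is the classic
inheritance-based partitioning available in 8.4/9.0, and only a sketch, not
a final design):

  -- Parent table for one account type; children partition by month.
  CREATE TABLE transactions (
      txn_id    bigint,
      client_id integer NOT NULL,
      txn_date  date    NOT NULL,
      amount    numeric(12,2)
  );

  CREATE TABLE transactions_2010_06 (
      CHECK (txn_date >= DATE '2010-06-01' AND txn_date < DATE '2010-07-01')
  ) INHERITS (transactions);

  CREATE INDEX transactions_2010_06_date_idx
      ON transactions_2010_06 (txn_date);

  -- With constraint_exclusion enabled, the planner skips partitions whose
  -- CHECK constraint rules them out.
  SET constraint_exclusion = partition;
  EXPLAIN SELECT * FROM transactions WHERE txn_date = DATE '2010-06-15';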

I was looking at a server based on the new Opteron 6100 series; it would
have 32 cores, but the clock speed is 2 GHz. I read a post earlier today
that mentioned in passing that it was better to have a faster processor
than more cores. I was wondering whether this would be a good selection,
since there are Intel CPUs that are quad-core at up to 3.3 GHz.




Re: [PERFORM] Monitoring Postgresql performance

2005-09-29 Thread Tony Wasson
On 9/28/05, Matthew Nuzum <[EMAIL PROTECTED]> wrote:
> On 9/28/05, Arnau <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> >I have been "googling" a bit, searching for info about a way to monitor
> > postgresql (CPU & memory, number of processes, ...) and I haven't found
> > anything relevant. I'm using munin to monitor other parameters of my
> > servers and I'd like to include postgresql or have a similar tool. Are any
> > of you using anything like that? All kinds of hints are welcome :-)

We are also using cricket + nagios.

On each DB server: set up snmpd and use snmpd.conf to set disk quotas
and mark processes that need to be running (like
postmaster, syslog, sshd)

On the monitoring server(s): Use cricket for long term trends &
graphs. Use nagios for current status and alerting and some trending.
(Nagios has plugins over SNMP for load,cpu,memory,disk and processes)

Here are the nagios plugins I have hacked up over the past few months
and what they do. I'd imagine some could use better names. I can
provide these or package them up if anyone is interested.

check_pgconn.pl - Shows percentage of connections available. It uses
"SELECT COUNT(*) FROM pg_stat_activity" / "SHOW max_connections". It
can also alert when less than a certain number of connections are
available.
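
The underlying check boils down to something like this (a simplified
sketch of the query, not the plugin itself):

  -- Percentage of max_connections currently in use.
  SELECT count(*)                                            AS current_connections,
         current_setting('max_connections')::int             AS max_connections,
         round(100.0 * count(*)
               / current_setting('max_connections')::int, 1) AS pct_used
  FROM pg_stat_activity;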

check_pgqueries.pl - If you have query logging enabled, this summarizes
the types of queries running (SELECT, INSERT, DELETE, UPDATE, ALTER,
CREATE, TRUNCATE, VACUUM, COPY) and warns if any queries have been
running longer than 5 minutes (configurable).
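
The long-running-query part is roughly equivalent to this (a sketch using
the pre-9.2 column names these servers have; the real plugin does more):

  -- Queries that have been running longer than 5 minutes.
  SELECT procpid,
         now() - query_start AS runtime,
         current_query
  FROM   pg_stat_activity
  WHERE  current_query <> '<IDLE>'
    AND  now() - query_start > interval '5 minutes'
  ORDER  BY runtime DESC;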

check_pglocks.pl - Looks for locks that block and baselines lock activity.
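
The core of that check is roughly the following (simplified sketch; the
plugin also tracks counts over time for the baseline):

  -- Lock requests that have not been granted, with the blocked query.
  SELECT l.pid, l.locktype, l.relation::regclass AS relation, l.mode,
         a.current_query
  FROM   pg_locks l
  JOIN   pg_stat_activity a ON a.procpid = l.pid
  WHERE  NOT l.granted;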

check_pgtime.pl - Makes sure that postgresql's time is in sync with
the monitoring server.

check_pgqueries.pl - Whines if any queries are in the "waiting" state.
The script that runs on each DB server does "ps auxww | grep postgres
| grep -i "[W]aiting"" and exposes that through SNMP using the exec
functionality. Nagios then alerts if queries are being blocked.



[PERFORM] oracle to psql migration - slow query in postgres

2010-10-14 Thread Tony Capobianco
We are in the process of testing migration of our oracle data warehouse
over to postgres.  A potential showstopper is full table scans on our
members table.  We can't function on postgres effectively unless index
scans are employed.  I'm thinking I don't have something set correctly
in my postgresql.conf file, but I'm not sure what.

This table has approximately 300 million rows.

Version:
SELECT version();

version  
--
 PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit

We have 4 quad-core processors and 32GB of RAM.  The below query uses
the members_sorted_idx_001 index in oracle, but in postgres, the
optimizer chooses a sequential scan.

explain analyze create table tmp_srcmem_emws1
as
select emailaddress, websiteid
  from members
 where emailok = 1
   and emailbounced = 0;
  QUERY
PLAN  
--
 Seq Scan on members  (cost=0.00..14137154.64 rows=238177981 width=29)
(actual time=0.052..685834.785 rows=236660930 loops=1)
   Filter: ((emailok = 1::numeric) AND (emailbounced = 0::numeric))
 Total runtime: 850306.220 ms
(3 rows)

show shared_buffers ;
 shared_buffers 

 7680MB
(1 row)

show effective_cache_size ;
 effective_cache_size 
--
 22GB
(1 row)

show work_mem ;
 work_mem 
--
 768MB
(1 row)

show enable_seqscan ;
 enable_seqscan 

 on
(1 row)

Below are the data definitions for the table/indexes in question:

\d members
 Table "members"
   Column|Type | Modifiers 
-+-+---
 memberid| numeric | not null
 firstname   | character varying(50)   | 
 lastname| character varying(50)   | 
 emailaddress| character varying(50)   | 
 password| character varying(50)   | 
 address1| character varying(50)   | 
 address2| character varying(50)   | 
 city| character varying(50)   | 
 statecode   | character varying(50)   | 
 zipcode | character varying(50)   | 
 birthdate   | date| 
 emailok | numeric(2,0)| 
 gender  | character varying(1)| 
 addeddate   | timestamp without time zone | 
 emailbounced| numeric(2,0)| 
 changedate  | timestamp without time zone | 
 optoutsource| character varying(100)  | 
 websiteid   | numeric | 
 promotionid | numeric | 
 sourceid| numeric | 
 siteid  | character varying(64)   | 
 srcwebsiteid| numeric | 
 homephone   | character varying(20)   | 
 homeareacode| character varying(10)   | 
 campaignid  | numeric | 
 srcmemberid | numeric | 
 optoutdate  | date| 
 regcomplete | numeric(1,0)| 
 regcompletesourceid | numeric | 
 ipaddress   | character varying(25)   | 
 pageid  | numeric | 
 streetaddressstatus | numeric(1,0)| 
 middlename  | character varying(50)   | 
 optinprechecked | numeric(1,0)| 
 optinposition   | numeric | 
 homephonestatus | numeric | 
 addeddate_id| numeric | 
 changedate_id   | numeric | 
 rpmindex| numeric | 
 optmode | numeric(1,0)| 
 countryid   | numeric | 
 confirmoptin| numeric(2,0)| 
 bouncedate  | date| 
 memberageid | numeric | 
 sourceid2   | numeric | 
 remoteuserid| character varying(50)   | 
 goal| numeric(1,0)| 
 flowdepth   | numeric | 
 pagetype| numeric | 
 savepassword| character varying(50)   | 
 customerprofileid   | numeric | 
Indexes:
"email_website_unq" UNIQUE, btree (emailaddress, websiteid),
tablespace "members_idx"
"member_addeddateid_idx" btree (addeddate_id), tablespace
"members_idx"
"member_changedateid_idx" 

Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-15 Thread Tony Capobianco
The recommendations on the numeric columns are fantastic.  Thank you
very much.  We will revisit our methods of assigning datatypes when we
migrate our data over from Oracle.
Regarding the full table scans: it appears inevitable that they are
necessary given the volume of data involved and the present design of our
indexes.  Over time, indexes were added/removed to satisfy particular
functionality.  Considering this is our most important table, I will
research exactly how it is queried so I can better optimize/reorganize our
indexes.
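
As a first experiment, and following the advice below about fewer columns
per index, I will probably test a narrow (or partial) index that matches
the common filter, along these lines (index names are made up; this is only
a sketch, not a final design):

  -- Narrow index on just the two filter columns...
  CREATE INDEX members_emailok_bounced_idx ON members (emailok, emailbounced);

  -- ...or a partial index if we almost always want the mailable addresses.
  CREATE INDEX members_mailable_idx ON members (websiteid)
      WHERE emailok = 1 AND emailbounced = 0;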

Thanks for your help.
Tony


On Thu, 2010-10-14 at 23:59 -0400, Mladen Gogala wrote:
> On 10/14/2010 4:10 PM, Jon Nelson wrote:
> > The first thing I'd do is think real hard about whether you really
> > really want 'numeric' instead of boolean, smallint, or integer.  The
> > second thing is that none of your indices (which specify a whole bunch
> > of fields, by the way) have only just emailok, emailbounced, or only
> > the pair of them. Without knowing the needs of your app, I would
> > reconsider your index choices and go with fewer columns per index.
> >
> Also, make sure that the statistics are good, that histograms are large 
> enough and that Geico (the genetic query optimizer) will really work 
> hard to save you 15% or more on the query execution time. You can also 
> make sure that any existing index is used, by disabling the 
> sequential scan and then activating and de-activating indexes with 
> dummy expressions, just as it was done with Oracle's rule-based optimizer.
> I agree that a good data model is even more crucial for Postgres than is 
> the case with Oracle. Oracle, because of its rich assortment of tweaking 
> & hacking tools and parameters, can be made to perform, even if the 
> model is designed by someone who didn't apply the rules of good design. 
> Postgres is much more susceptible to bad models and it is much harder to 
> work around a badly designed model in Postgres than in Oracle. What 
> people do not understand is that every application in the world will 
> become badly designed after years of maintenance, adding columns, 
> creating additional indexes, views, tables and triggers and then 
> deploying various tools to design applications.  As noted by Murphy, 
> things develop from bad to worse. Keep Postgres models simple and 
> separated, because it's much easier to keep clearly defined models 
> simple and effective than to keep models with 700 tables and 350 views, 
> frequently with conflicting names, different columns named the same and 
> same columns named differently. And monitor, monitor, monitor. Use 
> strace, ltrace,  pgstatspack, auto_explain, pgfouine, pgadmin, top, sar, 
> iostat and all tools you can get hold of. Without the event interface, 
> it's frequently a guessing game.  It is, however, possible to manage 
> things.  If working with partitioning, be very aware that PostgreSQL 
> optimizer has certain problems with partitions, especially with group 
> functions. If you want speed, everything must be prefixed with 
> partitioning column: indexes, expressions, joins. There is no explicit 
> star schema and creating hash indexes will not buy you much, as a matter 
> of fact, Postgres community is extremely suspicious of the hash indexes 
> and I don't see them widely used.
> Having said that, I was able to solve the problems with my speed and 
> partitioning.
> 
> -- 
> Mladen Gogala
> Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> www.vmsinfo.com
> 
> 





Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-15 Thread Tony Capobianco
Thanks for all your responses. What's interesting is that an index is
used when this query is executed in Oracle.  It appears to do some
parallel processing:

SQL> set line 200
delete from plan_table;
explain plan for
select websiteid, emailaddress
  from members
 where emailok = 1
   and emailbounced = 0;

SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
SQL> 
3 rows deleted.

SQL>   2345  
Explained.

SQL> SQL> 
PLAN_TABLE_OUTPUT

Plan hash value: 4247959398

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name                   | Rows | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                        | 237M | 7248M |  469K  (2) | 01:49:33 |        |      |            |
|   1 |  PX COORDINATOR          |                        |      |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)    | :TQ1                   | 237M | 7248M |  469K  (2) | 01:49:33 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR     |                        | 237M | 7248M |  469K  (2) | 01:49:33 |  Q1,00 | PCWC |            |
|*  4 |     INDEX FAST FULL SCAN | MEMBERS_SORTED_IDX_001 | 237M | 7248M |  469K  (2) | 01:49:33 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT


Predicate Information (identified by operation id):
---

   4 - filter("EMAILBOUNCED"=0 AND "EMAILOK"=1)

16 rows selected.


On Fri, 2010-10-15 at 13:43 -0400, Igor Neyman wrote:
> 
> > -Original Message-
> > From: Tony Capobianco [mailto:tcapobia...@prospectiv.com] 
> > Sent: Thursday, October 14, 2010 3:43 PM
> > To: pgsql-performance@postgresql.org
> > Subject: oracle to psql migration - slow query in postgres
> > 
> > We are in the process of testing migration of our oracle data 
> > warehouse over to postgres.  A potential showstopper are full 
> > table scans on our members table.  We can't function on 
> > postgres effectively unless index scans are employed.  I'm 
> > thinking I don't have something set correctly in my 
> > postgresql.conf file, but I'm not sure what.
> > 
> > This table has approximately 300million rows.
> > 
> > Version:
> > SELECT version();
> > 
> > version  
> > --
> > 
> >  PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC 
> > gcc (GCC)
> > 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit
> > 
> > We have 4 quad-core processors and 32GB of RAM.  The below 
> > query uses the members_sorted_idx_001 index in oracle, but in 
> > postgres, the optimizer chooses a sequential scan.
> > 
> > explain analyze create table tmp_srcmem_emws1 as select 
> > emailaddress, websiteid
> >   from members
> >  where emailok = 1
> >and emailbounced = 0;
> >   QUERY
> > PLAN  
> > --
> > 
> >  Seq Scan on members  (cost=0.00..14137154.64 rows=238177981 
> > width=29) (actual time=0.052..685834.785 rows=236660930 loops=1)
> >Filter: ((emailok = 1::numeric) AND (emailbounced = 
> > 0::numeric))  Total runtime: 850306.220 ms
> > (3 rows)
> > 
> > show shared_buffers ;
> >  shared_buffers
> > 
> >  7680MB
> > (1 row)
> > 
> > show effective_cache_size ;
> >  effective_cache_size
> > --
> >  22GB
> > (1 row)
> > 
> > show work_mem ;
> >  work_mem
> > --
> >  768MB
> > (1 row)
> > 
> > show enable_seqscan ;
> >  enable_seqscan

Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-15 Thread Tony Capobianco
Very true Igor!  Free is my favorite price.  
I'll figure a way around this issue.

Thanks for your help.
Tony

On Fri, 2010-10-15 at 14:54 -0400, Igor Neyman wrote:
> > -Original Message-
> > From: Tony Capobianco [mailto:tcapobia...@prospectiv.com] 
> > Sent: Friday, October 15, 2010 2:14 PM
> > To: pgsql-performance@postgresql.org
> > Subject: Re: oracle to psql migration - slow query in postgres
> > 
> > Thanks for all your responses. What's interesting is that an 
> > index is used when this query is executed in Oracle.  It 
> > appears to do some parallel processing:
> > 
> > SQL> set line 200
> > delete from plan_table;
> > explain plan for
> > select websiteid, emailaddress
> >   from members
> >  where emailok = 1
> >and emailbounced = 0;
> > 
> > SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
> > SQL> 
> > 3 rows deleted.
> > 
> > SQL>   2345  
> > Explained.
> > 
> > SQL> SQL> 
> > PLAN_TABLE_OUTPUT
> > --
> > --
> > --
> > --
> > Plan hash value: 4247959398
> > 
> > --
> > -
> > | Id  | Operation   | Name   | 
> > Rows  | Bytes
> > | Cost (%CPU)| Time |TQ  |IN-OUT| PQ Distrib |
> > --
> > -
> > |   0 | SELECT STATEMENT||   237M|
> > 7248M|   469K  (2)| 01:49:33 ||  ||
> > |   1 |  PX COORDINATOR ||   |
> > ||  ||  ||
> > |   2 |   PX SEND QC (RANDOM)   | :TQ1   |   237M|
> > 7248M|   469K  (2)| 01:49:33 |  Q1,00 | P->S | QC (RAND)  |
> > |   3 |PX BLOCK ITERATOR||   237M|
> > 7248M|   469K  (2)| 01:49:33 |  Q1,00 | PCWC ||
> > |*  4 | INDEX FAST FULL SCAN| MEMBERS_SORTED_IDX_001 |   237M|
> > 7248M|   469K  (2)| 01:49:33 |  Q1,00 | PCWP ||
> > --
> > -
> > 
> > PLAN_TABLE_OUTPUT
> > --
> > --
> > --
> > --
> > 
> > Predicate Information (identified by operation id):
> > ---
> > 
> >4 - filter("EMAILBOUNCED"=0 AND "EMAILOK"=1)
> > 
> > 16 rows selected.
> > 
> > 
> 
> 1. Postgres doesn't have "FAST FULL SCAN" because even if all the info
> is in the index, it needs to visit the row in the table ("visibility"
> issue).
> 
> 2. Postgres doesn't have parallel execution.
> 
> BUT, it's free and has great community support, as you already saw.
> 
> Regards,
> Igor Neyman
> 





[PERFORM] postgresql transaction id monitoring with nagios

2006-05-02 Thread Tony Wasson

Ever since I started working with PostgreSQL I've heard the need to
watch transaction IDs. The phrase "transaction ID wraparound" still
gives me a shiver. Attached is a short script that works with the
monitoring system Nagios to keep an eye on transaction IDs. It should
be easy to adapt to any other monitoring system.

It runs the textbook query below and reports how close you are to wraparound.
 SELECT datname, age(datfrozenxid) FROM pg_database;

The script detects a wrap at 2 billion. It starts warning once one or
more databases show an age over 1 billion transactions. It reports
critical at 1.5B transactions. I hope everyone out there is vacuuming
*all* databases often.
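
The same textbook query can be extended into a rough "percent of the way
to wraparound" figure, which is essentially what the script reports (a
simplified sketch of what the attached script computes, using the same
2-billion ceiling):

  SELECT datname,
         age(datfrozenxid) AS xid_age,
         round(100.0 * age(datfrozenxid) / 2000000000, 1) AS pct_towards_wrap
  FROM   pg_database
  ORDER  BY age(datfrozenxid) DESC;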

Hope some of you can use this script!
Tony Wasson


check_pg_transactionids.pl
Description: Perl program



Re: [PERFORM] postgresql transaction id monitoring with nagios

2006-05-02 Thread Tony Wasson

On 5/2/06, Vivek Khera <[EMAIL PROTECTED]> wrote:


On May 2, 2006, at 2:26 PM, Tony Wasson wrote:

> The script detects a wrap at 2 billion. It starts warning once one or
> more databases show an age over 1 billion transactions. It reports
> critical at 1.5B transactions. I hope everyone out there is vacuuming
> *all* databases often.

Something seems wrong... I just ran your script against my
development database server which is vacuumed daily and it said I was
53% of the way to 2B.  Seemed strange to me, so I re-ran "vacuum -a -z"
to vacuum all databases (as superuser), reran the script and got
the same answer.


Ah thanks, it's a bug in my understanding of the thresholds.

"With the standard freezing policy, the age column will start at one
billion for a freshly-vacuumed database."

So essentially, 1B is normal, 2B is the max. The logic is now:

The script detects a wrap at 2 billion. It starts warning once one or
more databases show an age over 1.5 billion transactions. It reports
critical at 1.75B transactions.

If anyone else understands differently, hit me with a clue bat.


check_pg_transactionids.pl
Description: Perl program



Re: [PERFORM] postgresql transaction id monitoring with nagios

2006-05-02 Thread Tony Wasson

On 5/2/06, Bruno Wolff III <[EMAIL PROTECTED]> wrote:

On Tue, May 02, 2006 at 12:06:30 -0700,
  Tony Wasson <[EMAIL PROTECTED]> wrote:
>
> Ah thanks, it's a bug in my understanding of the thresholds.
>
> "With the standard freezing policy, the age column will start at one
> billion for a freshly-vacuumed database."
>
> So essentially, 1B is normal, 2B is the max. The logic is now..
>
> The script detects a wrap at 2 billion. It starts warning once one or
> more databases show an age over 1.5 billion transactions. It reports
> critical at 1.75B transactions.
>
> If anyone else understands differently, hit me with a clue bat.

Isn't this obsolete now anyway? I am pretty sure 8.1 has safeguards against
wrap around.


My motivation was primarily to monitor some existing PostgreSQL 8.0
servers. I'm not convinced it is "safe" to stop worrying about
transaction ids even on an 8.1 box.

It is comforting that 8.1 does safeguard against wraparound in at
least 2 ways. First, it emits warnings during the last 10 million
transactions. If you manage to ignore all of those, postgresql will shut
down before a wraparound. I think PostgreSQL does everything correctly
there, but I suspect someone will run into the shut-down daemon
problem.



Re: [PERFORM] Killing long-running queries

2006-05-02 Thread Tony Wasson

On 5/2/06, Dan Harris <[EMAIL PROTECTED]> wrote:

My database is used primarily in an OLAP-type environment.  Sometimes my
users get a little carried away and find some way to slip past the
sanity filters in the applications and end up bogging down the server
with queries that run for hours and hours.  And, of course, what users
tend to do is to keep queuing up more queries when they don't see the
first one return instantly :)

So, I have been searching for a way to kill an individual query.  I read
in the mailing list archives that you could 'kill' the pid.  I've tried
this a few times and more than once, it has caused the postmaster to
die(!), terminating every query that was in process, even unrelated to
that query.

Is there some way I can just kill a query and not risk breaking
everything else when I do it?

Thanks



Hi Dan,

You can kill a specific pid under 8.1 using SELECT
pg_cancel_backend(pid). You can kill a query from the command line by
doing $ kill -TERM pid or $ kill -SIGINT pid.

There are several tips from this thread that may be useful about
killing long running SQL:
 http://archives.postgresql.org/pgsql-general/2006-02/msg00298.php

In short, the recommendations are (a short sketch follows below):
  1) Use statement_timeouts if at all possible. You can do this
database-wide in postgresql.conf. You can also set this on a per-user
or per-SQL-statement basis.
  2) Make sure step #1 does not kill autovacuum, or necessary automated
jobs. You can do this with "ALTER USER SET statement_timeout = 0".
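
For example (the database name, role name and pid below are made up;
statement_timeout is given in milliseconds on these releases):

  -- 1) Database-wide default, with an exemption for a maintenance role:
  ALTER DATABASE mydb SET statement_timeout = 600000;   -- 10 minutes
  ALTER USER batch_user SET statement_timeout = 0;      -- no limit

  -- Or per session / per statement:
  SET statement_timeout = 600000;

  -- 2) Cancel one specific query once you have its pid from pg_stat_activity:
  SELECT pg_cancel_backend(12345);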

I'm using a web page to show SELECT * FROM pg_stat_activity output
from several servers.  This makes it easy to see the pids of any
long-running SQL.

http://archives.postgresql.org/pgsql-general/2006-02/msg00427.php



Re: [PERFORM] New server to improve performance on our large and busy DB - advice? (v2)

2010-01-15 Thread Tony McC
On Thu, 14 Jan 2010 16:35:53 -0600
Dave Crooke  wrote:

> For any given database engine, regardless of the marketing and support
> stance, there is only one true "primary" enterprise OS platform that
> most big mission critical sites use, and is the best supported and
> most stable platform for that RDBMS. For Oracle, that's HP-UX (but 10
> years ago, it was Solaris). For PostgreSQL, it's Linux.

I am interested in this response and am wondering if this is just
Dave's opinion or some sort of official PostgreSQL policy.  I am
learning PostgreSQL by running it on FreeBSD 8.0-STABLE.  So far I
have found no problems and have even read a few posts that are critical
of Linux's handling of fsync.  I really don't want to start a Linux vs
FreeBSD flame war (I like Linux and use that too, though not for
database use), I am just intrigued by the claim that Linux is somehow
the natural OS for running PostgreSQL.  I think if Dave had said "for
PostgreSQL, it's a variant of Unix" I wouldn't have been puzzled.  So I
suppose the question is: what is it about Linux specifically (as
contrasted with other Unix-like OSes, especially Open Source ones) that
makes it particularly suitable for running PostgreSQL?

Best,
Tony
  



[PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tony Capobianco
ix  | %m-%u-%p
 log_lock_waits   | on
 log_min_error_statement  | panic
 log_min_messages | notice
 log_rotation_age | 0
 log_rotation_size| 0
 log_truncate_on_rotation | off
 logging_collector| on
 maintenance_work_mem | 1GB
 max_connections  | 400
 max_stack_depth  | 2MB
 search_path  | x
 server_encoding  | UTF8
 shared_buffers   | 7680MB
 TimeZone | US/Eastern
 wal_buffers  | 32MB
 wal_level| archive
 work_mem | 768MB

Should this query be hashing the smaller table on Postgres rather than
using nested loops?

Thanks.
Tony




Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tony Capobianco
pg_dw=# set enable_nestloop =0;
SET
Time: 0.165 ms
pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
pg_dw-# as
pg_dw-# select o.emailcampaignid, count(memberid) opencnt
pg_dw-#   from openactivity o,ecr_sents s
pg_dw-#  where s.emailcampaignid = o.emailcampaignid
pg_dw-#  group by o.emailcampaignid;
   QUERY
PLAN
-
 HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
   ->  Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
 Hash Cond: (o.emailcampaignid = s.emailcampaignid)
 ->  Seq Scan on openactivity o  (cost=0.00..3529930.67
rows=192540967 width=12)
 ->  Hash  (cost=8.79..8.79 rows=479 width=4)
   ->  Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
width=4)

Yikes.  Two sequential scans.


On Wed, 2011-06-08 at 11:33 -0400, Tom Lane wrote:
> Tony Capobianco  writes:
> > pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
> > pg_dw-# as
> > pg_dw-# select o.emailcampaignid, count(memberid) opencnt
> > pg_dw-#   from openactivity o,ecr_sents s
> > pg_dw-#  where s.emailcampaignid = o.emailcampaignid
> > pg_dw-#  group by o.emailcampaignid;
> >  QUERY
> > PLAN  
> > -
> >  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
> >->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
> >  ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
> > (cost=0.00..38.59 rows=479 width=4)
> >  ->  Index Scan using openact_emcamp_idx on openactivity o
> > (cost=0.00..3395.49 rows=19372 width=12)
> >Index Cond: (o.emailcampaignid = s.emailcampaignid)
> > (5 rows)
> 
> > Should this query be hashing the smaller table on Postgres rather than
> > using nested loops?
> 
> Yeah, seems like it.  Just for testing purposes, do "set enable_nestloop
> = 0" and see what plan you get then.
> 
>   regards, tom lane
> 





Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tony Capobianco
Here's the explain analyze:

pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100)
as
select o.emailcampaignid, count(memberid) opencnt
  from openactivity o,ecr_sents s
 where s.emailcampaignid = o.emailcampaignid
 group by o.emailcampaignid;

QUERY
PLAN   

 GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12) (actual
time=308630.967..2592279.526 rows=472 loops=1)
   ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
(actual time=31.489..2589363.047 rows=8586466 loops=1)
 ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
(cost=0.00..38.59 rows=479 width=4) (actual time=0.010..13.326 rows=479
loops=1)
 ->  Index Scan using openact_emcamp_idx on openactivity o
(cost=0.00..3395.49 rows=19372 width=12) (actual time=1.336..5397.139
rows=17926 loops=479)
   Index Cond: (o.emailcampaignid = s.emailcampaignid)
 Total runtime: 2592284.336 ms


On Wed, 2011-06-08 at 17:31 +0200, t...@fuzzy.cz wrote:
> > On Postgres, this same query takes about 58 minutes (could not run
> > explain analyze because it is in progress):
> >
> > pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
> > pg_dw-# as
> > pg_dw-# select o.emailcampaignid, count(memberid) opencnt
> > pg_dw-#   from openactivity o,ecr_sents s
> > pg_dw-#  where s.emailcampaignid = o.emailcampaignid
> > pg_dw-#  group by o.emailcampaignid;
> >  QUERY
> > PLAN
> > -
> >  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
> >->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
> >  ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
> > (cost=0.00..38.59 rows=479 width=4)
> >  ->  Index Scan using openact_emcamp_idx on openactivity o
> > (cost=0.00..3395.49 rows=19372 width=12)
> >Index Cond: (o.emailcampaignid = s.emailcampaignid)
> > (5 rows)
> >
> 
> Please, post EXPLAIN ANALYZE, not just EXPLAIN. Preferably using
> explain.depesz.com.
> 
> regards
> Tomas
> 
> 





Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tony Capobianco
Well, this ran much better.  However, I'm not sure if it's because of
set enable_nestloop = 0, or because I'm executing the query twice in a
row, where previous results may be cached.  I will try this setting in
my code for when this process runs later today and see what the result
is.

Thanks!

pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100)
pg_dw-# as
pg_dw-# select o.emailcampaignid, count(memberid) opencnt
pg_dw-#   from openactivity o,ecr_sents s
pg_dw-#  where s.emailcampaignid = o.emailcampaignid
pg_dw-#  group by o.emailcampaignid;

   QUERY
PLAN   

 HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12) (actual
time=167254.751..167254.937 rows=472 loops=1)
   ->  Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12) (actual
time=0.300..164577.131 rows=8586466 loops=1)
 Hash Cond: (o.emailcampaignid = s.emailcampaignid)
 ->  Seq Scan on openactivity o  (cost=0.00..3529930.67
rows=192540967 width=12) (actual time=0.011..124351.878 rows=192542480
loops=1)
 ->  Hash  (cost=8.79..8.79 rows=479 width=4) (actual
time=0.253..0.253 rows=479 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 17kB
   ->  Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
width=4) (actual time=0.010..0.121 rows=479 loops=1)
 Total runtime: 167279.950 ms



On Wed, 2011-06-08 at 11:51 -0400, Stephen Frost wrote:
> * Tony Capobianco (tcapobia...@prospectiv.com) wrote:
> >  HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
> >->  Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
> >  Hash Cond: (o.emailcampaignid = s.emailcampaignid)
> >  ->  Seq Scan on openactivity o  (cost=0.00..3529930.67
> > rows=192540967 width=12)
> >  ->  Hash  (cost=8.79..8.79 rows=479 width=4)
> >->  Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
> > width=4)
> > 
> > Yikes.  Two sequential scans.
> 
> Err, isn't that more-or-less exactly what you want here?  The smaller
> table is going to be hashed and then you'll traverse the bigger table
> and bounce each row off the hash table.  Have you tried actually running
> this and seeing how long it takes?  The bigger table doesn't look to be
> *that* big, if your i/o subsystem is decent and you've got a lot of
> memory available for kernel cacheing, should be quick.
> 
>   Thanks,
> 
>   Stephen





Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tony Capobianco
pg_dw=# show random_page_cost ;
 random_page_cost 
--
 4
(1 row)

Time: 0.299 ms
pg_dw=# show seq_page_cost ;
 seq_page_cost 
---
 1
(1 row)

Time: 0.250 ms
pg_dw=# show work_mem ;
 work_mem 
--
 768MB
(1 row)




On Wed, 2011-06-08 at 18:27 +0200, Pavel Stehule wrote:
> Hello
> 
> what is your settings for
> 
> random_page_cost, seq_page_cost and work_mem?
> 
> Regards
> 
> Pavel Stehule
> 
> 2011/6/8 Tony Capobianco :
> > Here's the explain analyze:
> >
> > pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100)
> > as
> > select o.emailcampaignid, count(memberid) opencnt
> >  from openactivity o,ecr_sents s
> >  where s.emailcampaignid = o.emailcampaignid
> >  group by o.emailcampaignid;
> >
> > QUERY
> > PLAN
> > 
> >  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12) (actual
> > time=308630.967..2592279.526 rows=472 loops=1)
> >   ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
> > (actual time=31.489..2589363.047 rows=8586466 loops=1)
> > ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
> > (cost=0.00..38.59 rows=479 width=4) (actual time=0.010..13.326 rows=479
> > loops=1)
> > ->  Index Scan using openact_emcamp_idx on openactivity o
> > (cost=0.00..3395.49 rows=19372 width=12) (actual time=1.336..5397.139
> > rows=17926 loops=479)
> >   Index Cond: (o.emailcampaignid = s.emailcampaignid)
> >  Total runtime: 2592284.336 ms
> >
> >
> > On Wed, 2011-06-08 at 17:31 +0200, t...@fuzzy.cz wrote:
> >> > On Postgres, this same query takes about 58 minutes (could not run
> >> > explain analyze because it is in progress):
> >> >
> >> > pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
> >> > pg_dw-# as
> >> > pg_dw-# select o.emailcampaignid, count(memberid) opencnt
> >> > pg_dw-#   from openactivity o,ecr_sents s
> >> > pg_dw-#  where s.emailcampaignid = o.emailcampaignid
> >> > pg_dw-#  group by o.emailcampaignid;
> >> >  QUERY
> >> > PLAN
> >> > -
> >> >  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
> >> >->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
> >> >  ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
> >> > (cost=0.00..38.59 rows=479 width=4)
> >> >  ->  Index Scan using openact_emcamp_idx on openactivity o
> >> > (cost=0.00..3395.49 rows=19372 width=12)
> >> >Index Cond: (o.emailcampaignid = s.emailcampaignid)
> >> > (5 rows)
> >> >
> >>
> >> Please, post EXPLAIN ANALYZE, not just EXPLAIN. Preferably using
> >> explain.depesz.com.
> >>
> >> regards
> >> Tomas
> >>
> >>
> >
> >
> >
> >
> 





Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tony Capobianco
My current setting is 22G.  According to some documentation, I want to
set effective_cache_size to my OS disk cache + shared_buffers.  In this
case, I have 4 quad-core processors with 512K cache (8G) and my
shared_buffers is 7680M.  Therefore my effective_cache_size should be
approximately 16G?  Most of our other etl processes are running fine,
however I'm curious if I could see a significant performance boost by
reducing the effective_cache_size.
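
Either way, the setting is only a planner hint (not an allocation), so a
candidate value can be tried per session before touching postgresql.conf.
A quick sketch of how I could test it against yesterday's query (the 16GB
figure is just the candidate value being discussed):

  SHOW effective_cache_size;          -- currently 22GB
  SET effective_cache_size = '16GB';  -- session-only trial value
  EXPLAIN
  SELECT o.emailcampaignid, count(memberid) AS opencnt
    FROM openactivity o, ecr_sents s
   WHERE s.emailcampaignid = o.emailcampaignid
   GROUP BY o.emailcampaignid;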


On Wed, 2011-06-08 at 13:03 -0400, Tom Lane wrote:
> Tony Capobianco  writes:
> > Well, this ran much better.  However, I'm not sure if it's because of
> > set enable_nestloop = 0, or because I'm executing the query twice in a
> > row, where previous results may be cached.  I will try this setting in
> > my code for when this process runs later today and see what the result
> > is.
> 
> If the performance differential holds up, you should look at adjusting
> your cost parameters so that the planner isn't so wrong about which one
> is faster.  Hacking enable_nestloop is a band-aid, not something you
> want to use in production.
> 
> Looking at the values you gave earlier, I wonder whether the
> effective_cache_size setting isn't unreasonably high.  That's reducing
> the estimated cost of accessing the large table via indexscans, and
> I'm thinking it reduced it too much.
> 
>   regards, tom lane
> 





Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tony Capobianco
Oooo...some bad math there.  Thanks.

On Wed, 2011-06-08 at 12:38 -0700, Samuel Gendler wrote:
> 
> 
> On Wed, Jun 8, 2011 at 12:03 PM, Tony Capobianco
>  wrote:
> My current setting is 22G.  According to some documentation, I
> want to
> set effective_cache_size to my OS disk cache +
> shared_buffers.  In this
> case, I have 4 quad-core processors with 512K cache (8G) and
> my
> shared_buffers is 7680M.  Therefore my effective_cache_size
> should be
> approximately 16G?  Most of our other etl processes are
> running fine,
> however I'm curious if I could see a significant performance
> boost by
> reducing the effective_cache_size.
> 
> 
> 
> 
> 
> disk cache, not CPU memory cache.  It will be some significant
> fraction of total RAM on the host.  Incidentally, 16 * 512K cache =
> 8MB, not 8GB.
> 
> 
> http://en.wikipedia.org/wiki/CPU_cache
> 
> 
> 
> 





[PERFORM] Cursor fetch performance issue

2012-01-24 Thread Tony Capobianco
We are migrating our Oracle warehouse to Postgres 9.

This function responds well:

pg=# select public.getMemberAdminPrevious_sp2(247815829, 
1,'test.em...@hotmail.com', 'email', 'test');
 getmemberadminprevious_sp2 

 
(1 row)

Time: 7.549 ms

However, when testing, this fetch takes upwards of 38 minutes:

BEGIN;
select public.getMemberAdminPrevious_sp2(247815829, 1,'test.em...@hotmail.com', 
'email', 'test');
FETCH ALL IN "";

How can I diagnose any performance issues with the fetch in the cursor?

Thanks.
Tony




Re: [PERFORM] Cursor fetch performance issue

2012-01-24 Thread Tony Capobianco
Running just the sql of the function returns only 10 rows:

pg=# SELECT m.memberid, m.websiteid, m.emailaddress, 
pg-#m.firstname, m.lastname, m.regcomplete, m.emailok
pg-# FROM   members m
pg-# WHERE  m.emailaddress LIKE 'test.em...@hotmail.com'
pg-# AND    m.changedate_id < 5868 ORDER BY m.emailaddress, m.websiteid;
 memberid  | websiteid |emailaddress| firstname | lastname | 
regcomplete | emailok 
---+---++---+--+-+-
 247815829 | 1 | test.em...@hotmail.com | email | test |
   1 |   1
 300960335 |62 | test.em...@hotmail.com |   |  |
   1 |   1
 300959937 |   625 | test.em...@hotmail.com |   |  |
   1 |   1
 260152830 |  1453 | test.em...@hotmail.com |   |  |
   1 |   1
 300960163 |  1737 | test.em...@hotmail.com | email | test |
   1 |   1
 300960259 |  1824 | test.em...@hotmail.com | email | test |
   1 |   1
 300959742 |  1928 | test.em...@hotmail.com | email | test |
   1 |   1
 368122699 |  2457 | test.em...@hotmail.com | email | test |
   1 |   1
 403218613 |  2464 | test.em...@hotmail.com | email | test |
   1 |   0
 378951994 |  2656 | test.em...@hotmail.com |   |  |
   1 |   1
(10 rows)

Time: 132.626 ms

So, it would seem that's a small enough number of rows.  Unfortunately, 
issuing: 

set cursor_tuple_fraction to 1.0;

Did not have an effect on performance.  Is it common to modify this
cursor_tuple_fraction parameter each time we execute the function?
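
For reference, this is how I issued it; my understanding is that it has to
be in effect in the session before the function OPENs the cursor (sketch of
the test, with the cursor name elided as in the transcript above):

  BEGIN;
  SET cursor_tuple_fraction TO 1.0;  -- planner should assume we read it all
  select public.getMemberAdminPrevious_sp2(247815829, 1,
         'test.em...@hotmail.com', 'email', 'test');
  FETCH ALL IN "";                   -- cursor name elided
  COMMIT;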


On Tue, 2012-01-24 at 21:47 +0100, Pavel Stehule wrote:
> Hello
> 
> 2012/1/24 Tony Capobianco :
> > We are migrating our Oracle warehouse to Postgres 9.
> >
> > This function responds well:
> >
> > pg=# select public.getMemberAdminPrevious_sp2(247815829, 
> > 1,'test.em...@hotmail.com', 'email', 'test');
> >  getmemberadminprevious_sp2
> > 
> >  
> > (1 row)
> >
> > Time: 7.549 ms
> >
> > However, when testing, this fetch takes upwards of 38 minutes:
> >
> > BEGIN;
> > select public.getMemberAdminPrevious_sp2(247815829, 
> > 1,'test.em...@hotmail.com', 'email', 'test');
> > FETCH ALL IN "";
> >
> > How can I diagnose any performance issues with the fetch in the cursor?
> >
> 
> Cursors are optimized to return a small subset of the result - if you plan
> to read the complete result, then set
> 
> set cursor_tuple_fraction to 1.0;
> 
> this is a session config value; you can set it before the selected cursor queries
> 
> Regards
> 
> Pavel Stehule
> 
> > Thanks.
> > Tony
> >
> >
> 





Re: [PERFORM] Cursor fetch performance issue

2012-01-24 Thread Tony Capobianco
Here's the explain:

pg=# explain select getMemberAdminPrevious_sp(247815829, 
1,'test.em...@hotmail.com', 'Email', 'Test');
QUERY PLAN
--
 Result  (cost=0.00..0.26 rows=1 width=0)
(1 row)

Time: 1.167 ms

There was discussion of 'LIKE' v. '=' and wildcard characters are not
being entered into the $1 parameter.

This is not generating a sql string.  I feel it's something to do with
the fetch of the refcursor.  The cursor is a larger part of a function:

CREATE OR REPLACE FUNCTION PUBLIC.GETMEMBERADMINPREVIOUS_SP2 ( 
  p_memberid   IN numeric,
  p_websiteid  IN numeric,
  p_emailaddress   IN varchar,
  p_firstname  IN varchar,
  p_lastname   IN varchar)
RETURNS refcursor AS $$
DECLARE
  ref refcursor;
  l_sysdateid numeric;
BEGIN
  l_sysdateid := sysdateid();
  if (p_memberid != 0) then
if (p_emailaddress IS NOT NULL) then
  OPEN ref FOR
SELECT m.memberid, m.websiteid, m.emailaddress,
   m.firstname, m.lastname, m.regcomplete, m.emailok
FROM   members m
WHERE  m.emailaddress LIKE p_emailaddress
AND    m.changedate_id < l_sysdateid
ORDER BY m.emailaddress, m.websiteid;
end if;
  end if;
  Return ref;
EXCEPTION
WHEN NO_DATA_FOUND THEN
   Return null;
END;
$$ LANGUAGE 'plpgsql';
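
Per Pavel's suggestion quoted below, the cursor could instead be opened over
dynamic SQL so that the query is planned with the actual values rather than
plpgsql parameters. A minimal, untested sketch of that variant (the function
name is made up; quote_literal guards the string value):

  CREATE OR REPLACE FUNCTION public.getMemberAdminPrevious_dynsql(
    p_memberid     IN numeric,
    p_websiteid    IN numeric,
    p_emailaddress IN varchar,
    p_firstname    IN varchar,
    p_lastname     IN varchar)
  RETURNS refcursor AS $$
  DECLARE
    ref         refcursor;
    l_sysdateid numeric := sysdateid();
  BEGIN
    IF p_memberid != 0 AND p_emailaddress IS NOT NULL THEN
      OPEN ref FOR EXECUTE
        'SELECT m.memberid, m.websiteid, m.emailaddress,
                m.firstname, m.lastname, m.regcomplete, m.emailok
           FROM members m
          WHERE m.emailaddress LIKE ' || quote_literal(p_emailaddress) || '
            AND m.changedate_id < '   || l_sysdateid || '
          ORDER BY m.emailaddress, m.websiteid';
    END IF;
    RETURN ref;
  END;
  $$ LANGUAGE plpgsql;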


On Tue, 2012-01-24 at 22:17 +0100, Pavel Stehule wrote:
> Hello
> 
> >
> > So, is getMemberAdminPrevious_sp2() preparing a statement with wildcards?
> >
> > SELECT m.memberid, m.websiteid, m.emailaddress,
> >   m.firstname, m.lastname, m.regcomplete, m.emailok
> >   FROM   members m
> >   WHERE  m.emailaddress LIKE $1
> >   AND    m.changedate_id < $2
> >  ORDER BY m.emailaddress, m.websiteid;
> >
> > Or is it creating the string and executing it:
> >
> > sql = 'SELECT m.memberid, m.websiteid, m.emailaddress, '
> >||  ' m.firstname, m.lastname, m.regcomplete, m.emailok '
> >||  ' FROM   members m
> >||  ' WHERE  m.emailaddress LIKE ' || arg1
> >||  ' ANDm.changedate_id < ' || arg2
> >||  ' ORDER BY m.emailaddress, m.websiteid ';
> > execute(sql);
> >
> > Maybe its the planner doesnt plan so well with $1 arguments vs actual
> > arguments thing.
> >
> 
> sure, it could be blind optimization problem in plpgsql. Maybe you
> have to use a dynamic SQL - OPEN FOR EXECUTE stmt probably
> 
> http://www.postgresql.org/docs/9.1/interactive/plpgsql-cursors.html
> 
> Regards
> 
> Pavel Stehule
> 
> > -Andy
> >
> >
> 





[PERFORM] CPU spikes and transactions

2013-10-14 Thread Tony Kay
Hi,

I'm running 9.1.6 w/22GB shared buffers and 32GB overall RAM on a 16-core
Opteron 6276 box. We limit connections to roughly 120, but our webapp
is configured to allocate a thread-local connection, so those connections
are idle more than half the time.

We have been running smoothly for over a year on this configuration, and
recently started having huge CPU spikes that bring the system to its knees.
Given that it is a multiuser system, it has been quite hard to pinpoint the
exact cause, but I think we've narrowed it down to two data import jobs
that were running in semi-long transactions (clusters of row inserts).

The tables affected by these inserts are used in common queries.

The imports will bring in a row count of perhaps 10k on average covering 4
tables.

The insert transactions are at isolation level read committed (the default
for the JDBC driver).

When the import would run (again, theory...we have not been able to
reproduce), we would end up maxed out on CPU, with a load average of 50 for
16 CPUs (our normal busy usage is a load average of 5 out of 16 CPUs).

When looking at the active queries, most of them are against the tables
that are affected by these imports.

Our workaround (that is holding at present) was to drop the transactions on
those imports (which is not optimal, but fortunately is acceptable for this
particular data). This workaround has prevented any further incidents, but
is of course inconclusive.

Does this sound familiar to anyone, and if so, please advise.

Thanks in advance,

Tony Kay


Re: [PERFORM] CPU spikes and transactions

2013-10-14 Thread Tony Kay
Hi Calvin,

Yes, I have sar data on all systems going back for years.

Since others are going to probably want to be assured I am really "reading
the data" right:

- This is 92% user CPU time, 5% sys, and 1% soft
- On some of the problems, I _do_ see a short spike of pgswpout's (memory
pressure), but again, not enough to end up using much system time
- The database disks are idle (all data being used is in RAM)...and are
SSDs...average service times are barely measurable in ms.

If I had to guess, I'd say it was spinlock misbehavior...I cannot
understand why else a transaction blocking other things would drive the
CPUs so hard into the ground with user time.

Tony

Tony Kay

TeamUnify, LLC
TU Corporate Website <http://www.teamunify.com/>
TU Facebook <http://www.facebook.com/teamunify> | Free OnDeck Mobile
Apps<http://www.teamunify.com/__corp__/ondeck/>



On Mon, Oct 14, 2013 at 4:05 PM, Calvin Dodge  wrote:

> Have you tried running "vmstat 1" during these times? If so, what is
> the percentage of WAIT time?  Given that IIRC shared buffers should be
> no more than 25% of installed memory, I wonder if too little is
> available for system caching of disk reads.  A high WAIT percentage
> would indicate excessive I/O (especially random seeks).
>
> Calvin Dodge
>
> On Mon, Oct 14, 2013 at 6:00 PM, Tony Kay  wrote:
> > Hi,
> >
> > I'm running 9.1.6 w/22GB shared buffers, and 32GB overall RAM on a 16
> > Opteron 6276 CPU box. We limit connections to roughly 120, but our
> webapp is
> > configured to allocate a thread-local connection, so those connections
> are
> > rarely doing anything more than half the time.
> >
> > We have been running smoothly for over a year on this configuration, and
> > recently started having huge CPU spikes that bring the system to its
> knees.
> > Given that it is a multiuser system, it has been quite hard to pinpoint
> the
> > exact cause, but I think we've narrowed it down to two data import jobs
> that
> > were running in semi-long transactions (clusters of row inserts).
> >
> > The tables affected by these inserts are used in common queries.
> >
> > The imports will bring in a row count of perhaps 10k on average covering
> 4
> > tables.
> >
> > The insert transactions are at isolation level read committed (the
> default
> > for the JDBC driver).
> >
> > When the import would run (again, theory...we have not been able to
> > reproduce), we would end up maxed out on CPU, with a load average of 50
> for
> > 16 CPUs (our normal busy usage is a load average of 5 out of 16 CPUs).
> >
> > When looking at the active queries, most of them are against the tables
> that
> > are affected by these imports.
> >
> > Our workaround (that is holding at present) was to drop the transactions
> on
> > those imports (which is not optimal, but fortunately is acceptable for
> this
> > particular data). This workaround has prevented any further incidents,
> but
> > is of course inconclusive.
> >
> > Does this sound familiar to anyone, and if so, please advise.
> >
> > Thanks in advance,
> >
> > Tony Kay
> >
>


Re: [PERFORM] CPU spikes and transactions

2013-10-15 Thread Tony Kay
On Mon, Oct 14, 2013 at 4:42 PM, Tomas Vondra  wrote:

> On 15.10.2013 01:00, Tony Kay wrote:
> > Hi,
> >
> > I'm running 9.1.6 w/22GB shared buffers, and 32GB overall RAM on a
> > 16 Opteron 6276 CPU box. We limit connections to roughly 120, but
> > our webapp is configured to allocate a thread-local connection, so
> > those connections are rarely doing anything more than half the time.
>
> Lower your shared buffers to about 20% of your RAM, unless you've tested
> it's actually helping in your particular case. It's unlikely you'll get
> better performance by using more than that, especially on older
> versions, so it's wiser to leave the rest for page cache.
>
> It might even be one of the causes of the performance issue you're
> seeing, as shared buffers are not exactly overhead-free.
>
> See this for more details on tuning:
>
>http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server


I had followed the general directions from several sources years ago, which
indicate up to 40% of RAM. We've been running very large shared buffers for
4 years now, but it is difficult to generate a good real load without
testing against users, so we have not felt the need to move it around. In
general, I don't tend to tinker with a setting that has been fine for this
long without good reason. I've been wanting to upgrade to the newer
mmap-based versions of pgsql, but was waiting to re-tune this when I did so.

Why do you suspect that shared_buffers would cause the behavior I'm seeing?


>
>
> You're on a rather old 9.1.x version, BTW. The last version in this
> branch is 9.1.10 and there are some important security fixes (e.g. in
> 9.1.9). Not sure if there are any fixes relevant to the performance
> issue, though.
>
> An upgrade to 9.1.10 is planned.


> A few initial questions:
>
> * What OS are we dealing with?
>

CentOS el6


>
> * So how many active connections are there on average (see
>   pg_stat_activity for connections running queries)?
>

about 40-60


>
> * How much data are we talking about? In total and in the imports?
>

80GB database. The imports are maybe 1-3 MB...often much smaller. 10k rows
would probably be an average.


>
> > We have been running smoothly for over a year on this configuration,
> > and recently started having huge CPU spikes that bring the system to
> > its knees. Given that it is a multiuser system, it has been quite
> > hard to pinpoint the exact cause, but I think we've narrowed it down
> > to two data import jobs that were running in semi-long transactions
> > (clusters of row inserts).
> >
> > The tables affected by these inserts are used in common queries.
> >
> > The imports will bring in a row count of perhaps 10k on average
> > covering 4 tables.
> >
> > The insert transactions are at isolation level read committed (the
> > default for the JDBC driver).
> >
> > When the import would run (again, theory...we have not been able to
> > reproduce), we would end up maxed out on CPU, with a load average of
> > 50 for 16 CPUs (our normal busy usage is a load average of 5 out of
> > 16 CPUs).
> >
> > When looking at the active queries, most of them are against the
> > tables that are affected by these imports.
>
> Which processes consume most CPU time? Are those backends executing the
> queries, or some background processes (checkpointer, autovacuum, ...)?
>
>
The backends executing the queries...most of the queries that seem hung
usually run in a few ms.


> Can you post a "top -c" output collected at the time of the CPU peak?
>
>
Don't have process accounting, so I cannot regenerate that; however, I can
tell you what queries were active at one of them.

There were 36 of the queries against table ind_event (which is one affected
by the import). Those queries usually take 5-10ms, and we never see more
than 2 active during normal operation. These had been active for
_minutes_. A sample of the running queries:

time_active   |   datname   | procpid |   query

-+-+-+---
 00:08:10.891105 | tudb|9058 | select * from
mr_uss_ind_event_x where (tu
 00:08:10.981845 | tudb|8977 | select * from
mr_uss_ind_event_x where (tu
 00:07:08.883347 | tudb|8930 | select * from
mr_uss_ind_event_x where org
 00:07:15.266393 | tudb|8927 | select * from
mr_uss_ind_event_x where org
 00:07:27.587133 | tudb|   11867 | update msg_result set
dt_result=$1,msg_id=
 00:08:06.458885 | tudb|8912 | select * from
mr_uss_ind_event_x where 

Re: [PERFORM] CPU spikes and transactions

2013-10-15 Thread Tony Kay
Thanks for the tip. I forgot there were kernel stats on spinlocks.

I'm not sure we'll be able to get it to tip in a test environment, and
we're unwilling to revert the code in production in order to have our users
trigger it. We'll try triggering it on our test server, and if we manage,
I'll get you the stats.

Thanks!

Tony


Tony Kay

TeamUnify, LLC
TU Corporate Website <http://www.teamunify.com/>
TU Facebook <http://www.facebook.com/teamunify> | Free OnDeck Mobile
Apps<http://www.teamunify.com/__corp__/ondeck/>



On Tue, Oct 15, 2013 at 6:00 AM, Merlin Moncure  wrote:

> On Mon, Oct 14, 2013 at 6:45 PM, Tomas Vondra  wrote:
> > On 15.10.2013 01:26, Tony Kay wrote:
> >> Hi Calvin,
> >>
> >> Yes, I have sar data on all systems going back for years.
> >>
> >> Since others are going to probably want to be assured I am really
> >> "reading the data" right:
> >>
> >> - This is 92% user CPU time, 5% sys, and 1% soft
> >> - On some of the problems, I _do_ see a short spike of pgswpout's
> >> (memory pressure), but again, not enough to end up using much system
> time
> >> - The database disks are idle (all data being used is in RAM)...and are
> >> SSDs...average service times are barely measurable in ms.
> >
> > OK. Can you share the data? Maybe we'll notice something suspicious.
> >
> >> If I had to guess, I'd say it was spinlock misbehaviorI cannot
> >> understand why ekse a transaction blocking other things would drive
> >> the CPUs so hard into the ground with user time.
> >
> > Have you tried running perf, to verify the time is actually spent on
> > spinlocks?
>
> +1 this.  It is almost certainly spinlocks, but we need to know which
> one and why.  plz install debug symbols and run a perf during normal
> and high load conditions.
>
> merlin
>
>
>


Re: [PERFORM] CPU spikes and transactions

2013-10-15 Thread Tony Kay
On Tue, Oct 15, 2013 at 10:26 AM, Julien Cigar  wrote:

>
> for two reasons:
>
> - some of the overhead of bgwriter and checkpoints is more or less linear
> in the size of shared_buffers (for example, it is possible that a
> large quantity of data is dirty when a checkpoint occurs).
>
> - the OS cache is also being used for reads and writes; the larger
>   shared_buffers is, the more you risk double buffering (the same blocks
>   in the OS cache and in the database buffer cache).
>

Excellent. Thank you for the information. My suspicion has always been that
the shared_buffers are "level 1 cache", so it seems to me that you'd want
that to be large enough to hold your entire database if you could. However,
I'm now realizing that I was _also_ assuming the IPC shared memory was also
being locked via mlock to prevent swapping it out, and I'm now getting the
feeling that this isn't true, which means the double buffering could lead
to swap space use on buffer cache pressure...which I do occasionally see in
ways I had not expected.

We do streaming replication and also store them for snapshot PITR backups,
so I am intimately familiar with our write load, and I can say it is pretty
low (we ship a 16MB WAL file about every 10-15 minutes during our busiest
times).

That said, I can see how an import that is doing a bunch of writes could
possibly spread those over a large area that would then consume a lot of
CPU on the writer and checkpoint; however, I do not see how either of those
would cause 40-60 different postgres background processes (all running a
normally "light" query) to spin off into oblivion unless the write
workload is somehow threaded into the background workers (which I'm sure it
isn't). So, I think we're still dealing with a spinlock issue.

We're going to upgrade to 9.1.10 (with debug symbols) Thursday night and
add another 64GB of RAM. I'll tune shared_buffers down to 2GB at that time
and bump effective_cache_size up at the same time. My large history of sar
data will make it apparent pretty quickly if that is a win/lose/tie.

If we have another spike in production, we'll be ready to measure it more
accurately.

Thanks,

Tony