Re: [PERFORM] oracle to psql migration - slow query in postgres
On Thu, Oct 14, 2010 at 8:59 PM, Mladen Gogala wrote:
> If working with partitioning, be very aware that PostgreSQL optimizer has
> certain problems with partitions, especially with group functions. If you
> want speed, everything must be prefixed with partitioning column: indexes,
> expressions, joins. There is no explicit star schema and creating hash
> indexes will not buy you much, as a matter of fact, Postgres community is
> extremely suspicious of the hash indexes and I don't see them widely used.
> Having said that, I was able to solve the problems with my speed and
> partitioning.

Could you elaborate on this, please? What do you mean by 'everything must be prefixed with partitioning column'?

--sam
Re: [PERFORM] Slow count(*) again...
On Wed, 2010-10-13 at 09:02 -0400, Greg Smith wrote:
> XFS support is available as an optional module starting in RHEL 5.5.
> In CentOS, you just grab it, so that's what I've been doing. My
> understanding is that you may have to ask your sales rep to enable
> access to it under the official RedHat Network channels if you're
> using a subscription from them. I'm not sure exactly what the support
> situation is with it, but it's definitely available as an RPM from
> RedHat.

Right. It is called "Red Hat Scalable File System", and once paid, it is available via RHN.

-- Devrim GÜNDÜZ
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
PostgreSQL RPM Repository: http://yum.pgrpms.org
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org
Twitter: http://twitter.com/devrimgunduz
Re: [PERFORM] oracle to psql migration - slow query in postgres
Samuel Gendler wrote:
> On Thu, Oct 14, 2010 at 8:59 PM, Mladen Gogala wrote:
> If working with partitioning, be very aware that PostgreSQL optimizer has certain problems with partitions, especially with group functions. If you want speed, everything must be prefixed with partitioning column: indexes, expressions, joins. There is no explicit star schema and creating hash indexes will not buy you much, as a matter of fact, Postgres community is extremely suspicious of the hash indexes and I don't see them widely used. Having said that, I was able to solve the problems with my speed and partitioning.
>
> Could you elaborate on this, please? What do you mean by 'everything must be prefixed with partitioning column'?
> --sam

If you have a partitioned table part_tab, partitioned on the column item_date, and there is a global primary key in Oracle, let's call it item_id, then queries like

select * from part_tab where item_id = 12345

will perform worse than queries that also include item_date:

select * from part_tab where item_id = 12345 and item_date = '2010-10-15'

This also applies to inserts and updates. Strictly speaking, the item_date column in the query above is not necessary; after all, item_id is the primary key. However, with range scans you will get much better results if you include the item_date column than if you leave it out. The term "prefixed indexes" is borrowed from Oracle RDBMS and means that the leading column in the index is the column on which the table is partitioned. Oracle, as opposed to Postgres, has global indexes: indexes that span all partitions. PostgreSQL only maintains indexes on each of the partitions separately. Oracle calls such indexes "local indexes" and defines them on the partitioned table level.
Here is a brief and rather succinct explanation of the terminology: http://www.oracle-base.com/articles/8i/PartitionedTablesAndIndexes.php

Of course, there are other differences between Oracle partitioning and PostgreSQL partitioning. The main difference is $1/CPU. I am talking from experience:

news=> \d moreover_documents
            Table "moreover.moreover_documents"
        Column        |            Type             | Modifiers
----------------------+-----------------------------+-----------
 document_id          | bigint                      | not null
 dre_reference        | bigint                      | not null
 headline             | character varying(4000)     |
 author               | character varying(200)      |
 url                  | character varying(1000)     |
 rank                 | bigint                      |
 content              | text                        |
 stories_like_this    | character varying(1000)     |
 internet_web_site_id | bigint                      | not null
 harvest_time         | timestamp without time zone |
 valid_time           | timestamp without time zone |
 keyword              | character varying(200)      |
 article_id           | bigint                      | not null
 media_type           | character varying(20)       |
 source_type          | character varying(20)       |
 created_at           | timestamp without time zone |
 autonomy_fed_at      | timestamp without time zone |
 language             | character varying(150)      |
Indexes:
    "moreover_documents_pkey" PRIMARY KEY, btree (document_id)
Triggers:
    insert_moreover_trigger BEFORE INSERT ON moreover_documents FOR EACH ROW EXECUTE PROCEDURE moreover_insert_trgfn()
Number of child tables: 8 (Use \d+ to list them.)

The child tables are, of course, partitions.
Here is the original:

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> desc moreover_documents
 Name                Null?    Type
 ------------------- -------- --------------
 DOCUMENT#           NOT NULL NUMBER
 DRE_REFERENCE       NOT NULL NUMBER
 HEADLINE                     VARCHAR2(4000)
 AUTHOR                       VARCHAR2(200)
 URL                          VARCHAR2(1000)
 RANK                         NUMBER
 CONTENT                      CLOB
 STORIES_LIKE_THIS            VARCHAR2(1000)
 INTERNET_WEB_SITE#  NOT NULL NUMBER
 HARVEST_TIME                 DATE
 VALID_TIME                   DATE
 KEYWORD                      VARCHAR2(200)
 ARTICLE_ID          NOT NULL NUMBER
 MEDIA_TYPE                   VARCHAR2(20)
 CREATED_AT                   DATE
 SOURCE_TYPE                  VARCHAR2(50)
 PUBLISH_DATE                 DATE
 AUTONOMY_FED_AT              DATE
 LANGUAGE                     VARCHAR2(150)

SQL>

I must say that it took me some time to get things right.

-- Mladen Gogala Sr. Oracle DBA 1500 Broadway New
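Mladen's "prefix everything with the partitioning column" advice can be sketched against the hypothetical part_tab/item_date example above. This is a sketch of a typical 8.x-era inheritance-based setup; the child-table layout, the CHECK constraints, and the payload column are assumptions, not Tony's actual schema:

```sql
-- Parent table plus one range partition per month, 8.x inheritance style.
CREATE TABLE part_tab (
    item_id   bigint NOT NULL,
    item_date date   NOT NULL,
    payload   text
);

CREATE TABLE part_tab_2010_10 (
    CHECK (item_date >= '2010-10-01' AND item_date < '2010-11-01')
) INHERITS (part_tab);

-- "Prefixed" index: the partitioning column leads.
CREATE INDEX part_tab_2010_10_idx
    ON part_tab_2010_10 (item_date, item_id);

-- Constraint exclusion only prunes partitions when the query
-- mentions the partitioning column:
SET constraint_exclusion = on;

SELECT * FROM part_tab
WHERE item_id = 12345;                   -- scans every child table

SELECT * FROM part_tab
WHERE item_id = 12345
  AND item_date = '2010-10-15';          -- scans only part_tab_2010_10
```

The second query is what Mladen means by prefixing with the partitioning column: the planner can prove from the CHECK constraints which children are irrelevant, and within the chosen child the index with item_date leading is usable.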
Re: [PERFORM] oracle to psql migration - slow query in postgres
On Thu, Oct 14, 2010 at 3:43 PM, Tony Capobianco wrote:
> explain analyze create table tmp_srcmem_emws1
> as
> select emailaddress, websiteid
> from members
> where emailok = 1
>   and emailbounced = 0;

*) As others have noted, none of your indexes will back this expression. For an index to match properly, the index must have all the fields matched in the 'where' clause in left-to-right order. You could rearrange the indexes you already have and probably get things to work properly.

*) If you want things to go really fast, the combination of emailok, emailbounced is a small percentage (say, less than 5%) of the table, you are not interested in the schema-level changes your table is screaming for, and the (1,0) combination is what you want to frequently match, then you should consider:

create function email_interesting(ok numeric, bounced numeric) returns bool as
$$
  select $1 = 1 and $2 = 0;
$$ language sql immutable;

create index members_email_interesting_idx
  on members (email_interesting(emailok, emailbounced))
  where email_interesting(emailok, emailbounced);

This will build a partial index which you can query via:

select emailaddress, websiteid
from members
where email_interesting(emailok, emailbounced);

merlin

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] oracle to psql migration - slow query in postgres
The recommendations on the numeric columns are fantastic. Thank you very much. We will revisit our methods of assigning datatypes when we migrate our data over from Oracle. Regarding the full table scans: it appears inevitable that full table scans are necessary for the volume of data involved and the present design of our indexes. Over time, indexes were added/removed to satisfy particular functionality. Considering this is our most important table, I will research exactly how this table is queried to better optimize/reorganize our indexes.

Thanks for your help.
Tony

On Thu, 2010-10-14 at 23:59 -0400, Mladen Gogala wrote:
> On 10/14/2010 4:10 PM, Jon Nelson wrote:
> > The first thing I'd do is think real hard about whether you really
> > want 'numeric' instead of boolean, smallint, or integer. The second
> > thing is that none of your indices (which specify a whole bunch of
> > fields, by the way) have only just emailok, emailbounced, or only
> > the pair of them. Without knowing the needs of your app, I would
> > reconsider your index choices and go with fewer columns per index.
>
> Also, make sure that the statistics are good, that histograms are large
> enough, and that Geico (the genetic query optimizer) will really work
> hard to save you 15% or more on the query execution time. You can also
> make sure that any existing index is used, by disabling the sequential
> scan and then activating and de-activating indexes with dummy
> expressions, just as it was done with Oracle's rule-based optimizer.
> I agree that a good data model is even more crucial for Postgres than is
> the case with Oracle. Oracle, because of its rich assortment of tweaking
> & hacking tools and parameters, can be made to perform even if the
> model is designed by someone who didn't apply the rules of good design.
> Postgres is much more susceptible to bad models, and it is much harder to
> work around a badly designed model in Postgres than in Oracle.
> What people do not understand is that every application in the world
> will become badly designed after years of maintenance: adding columns,
> creating additional indexes, views, tables and triggers, and then
> deploying various tools to design applications. As noted by Murphy,
> things develop from bad to worse. Keep Postgres models simple and
> separated, because it's much easier to keep clearly defined models
> simple and effective than to keep models with 700 tables and 350 views,
> frequently with conflicting names, different columns named the same and
> same columns named differently. And monitor, monitor, monitor. Use
> strace, ltrace, pgstatspack, auto_explain, pgfouine, pgadmin, top, sar,
> iostat and all tools you can get hold of. Without the event interface,
> it's frequently a guessing game. It is, however, possible to manage
> things. If working with partitioning, be very aware that the PostgreSQL
> optimizer has certain problems with partitions, especially with group
> functions. If you want speed, everything must be prefixed with the
> partitioning column: indexes, expressions, joins. There is no explicit
> star schema, and creating hash indexes will not buy you much; as a matter
> of fact, the Postgres community is extremely suspicious of hash indexes
> and I don't see them widely used.
> Having said that, I was able to solve the problems with my speed and
> partitioning.
>
> --
> Mladen Gogala
> Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> www.vmsinfo.com
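As a concrete sketch of Jon's datatype suggestion: the emailok and emailbounced columns come from Tony's posted schema, but the USING casts, the choice of smallint, and the index name are assumptions, not something anyone in the thread ran:

```sql
-- numeric(2,0) is a variable-length type with relatively expensive
-- comparisons; a 2-byte smallint (or 1-byte boolean) is cheaper to
-- store and to compare, and matches plain integer literals directly.
ALTER TABLE members
    ALTER COLUMN emailok      TYPE smallint USING emailok::smallint,
    ALTER COLUMN emailbounced TYPE smallint USING emailbounced::smallint;

-- A narrow two-column index then covers the common (1, 0) lookup,
-- per Jon's "fewer columns per index" advice:
CREATE INDEX members_email_flags_idx ON members (emailok, emailbounced);
```

On a 300-million-row table the ALTER rewrites the whole table, so this is a migration-window operation, not a quick fix.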
Re: [PERFORM] oracle to psql migration - slow query in postgres
>> This table has approximately 300million rows.
>
> and your query grabs rows=236 660 930 of them. An index might be
> useless in this situation.

I want to point out that this is probably the most important comment here. A couple of people have noted that the index won't work for this query, but more importantly, an index is (probably) not desirable for this query. As an analogy (since everyone loves half-baked programming analogies): if you want to find a couple of bakeries to sponsor your MySQL Data Integrity Issues Awareness Walk by donating scones, you use the yellow pages. If you want to hit up every business in the area to donate whatever they can, you're better off canvassing the neighborhood.

---
Maciek Sakrejda | System Architect | Truviso
1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com
[PERFORM] Stored procedure declared as VOLATILE => no good optimization is done
Hello,

I have heard it said that if a stored procedure is declared as VOLATILE, then no good optimizations can be done on queries within the stored procedure or queries that use the stored procedure (say, as a column in a view). I have seen this in practice, recommended on the IRC channel, and in the archives (http://archives.postgresql.org/pgsql-performance/2008-01/msg00283.php). Can someone help me understand, or point me to some documentation explaining, why this is so? Any insights would be appreciated. I'm new to pgsql and would like to know a little more about what is going on under the hood.

Thanks,
Damon
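For context, volatility is a per-function declaration, and it is the planner's only contract about what a function call may do. A minimal sketch of the three classes (the members_log table and sequence are hypothetical):

```sql
-- IMMUTABLE: the result depends only on the arguments, so the planner
-- may pre-evaluate calls with constant arguments, inline the body, and
-- use the function in index expressions.
CREATE FUNCTION add_one(integer) RETURNS integer AS $$
    SELECT $1 + 1;
$$ LANGUAGE sql IMMUTABLE;

-- STABLE: the result may depend on database contents but cannot change
-- within a single statement; the planner may evaluate it once per query.
CREATE FUNCTION log_rows() RETURNS bigint AS $$
    SELECT count(*) FROM members_log;
$$ LANGUAGE sql STABLE;

-- VOLATILE (the default): the planner must assume every call can return
-- a different value or have side effects, so it re-evaluates the call
-- for every row and cannot inline, cache, or index it.
CREATE FUNCTION next_log_id() RETURNS bigint AS $$
    SELECT nextval('members_log_id_seq');
$$ LANGUAGE sql VOLATILE;
```

That per-row re-evaluation requirement is the heart of the answer: a VOLATILE call is an opaque black box the optimizer must execute exactly as written, which is why queries through such functions (or views built on them) lose most optimization opportunities.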
Re: [PERFORM] Slow count(*) again...
Jon Nelson wrote:
> Well, I didn't quite mean that - having no familiarity with Oracle I
> don't know what the alter system statement does, but I was talking
> specifically about the linux buffer and page cache.

Those are not utilized by Oracle. This is a RAC instance, running on top of ASM, which is an Oracle volume manager, using raw devices. There is no file system on those disks:

SQL> select file_name from dba_data_files
  2  where tablespace_name='ADBASE_DATA';

FILE_NAME
------------------------------------------------
+DGDATA/stag3/datafile/adbase_data.262.727278257
+DGDATA/stag3/datafile/adbase_data.263.727278741
+DGDATA/stag3/datafile/adbase_data.264.727280145
+DGDATA/stag3/datafile/adbase_data.265.727280683

[ora...@lpo-oracle-30 ~]$ $ORA_CRS_HOME/bin/crs_stat -l
NAME=ora.STAG3.STAG31.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-30

NAME=ora.STAG3.STAG32.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31

NAME=ora.STAG3.db
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31

NAME=ora.lpo-oracle-30.ASM1.asm
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-30

NAME=ora.lpo-oracle-30.LISTENER_LPO-ORACLE-30.lsnr
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-30

NAME=ora.lpo-oracle-30.gsd
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-30

NAME=ora.lpo-oracle-30.ons
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-30

NAME=ora.lpo-oracle-30.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-30

NAME=ora.lpo-oracle-31.ASM2.asm
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31

NAME=ora.lpo-oracle-31.LISTENER_LPO-ORACLE-31.lsnr
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31

NAME=ora.lpo-oracle-31.gsd
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31

NAME=ora.lpo-oracle-31.ons
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31

NAME=ora.lpo-oracle-31.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31

The only way to flush cache is the aforementioned "alter system" command.
AFAIK, Postgres doesn't have anything like that. Oracle uses raw devices precisely to avoid double buffering.

-- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
[PERFORM] help with understanding EXPLAIN and boosting performance
Hello

I have an application hosted on Heroku. They use postgres. It's more or less abstracted away, but I can get some performance data from New Relic. For the most part, performance is ok, but now and then some queries take a few seconds, and spike up to 15 or even 16 seconds! Ouch! This is the most detailed information I could get from New Relic. Do you have any suggestions how I could improve the performance?

Limit  (cost=26.49..1893.46 rows=1 width=4)
  ->  Unique  (cost=26.49..44833.82 rows=24 width=4)
        ->  Nested Loop  (cost=26.49..44833.81 rows=24 width=4)
              ->  Merge Join  (cost=26.49..44532.99 rows=4773 width=8)
                    Merge Cond: (songs.id = artists_songs.song_id)
                    ->  Index Scan using songs_pkey on songs  (cost=0.00..25219.30 rows=4773 width=4)
                          Filter: (lower((name)::text) = 'thirteen'::text)
                    ->  Index Scan using index_artists_songs_on_song_id on artists_songs  (cost=0.00..18822.04 rows=960465 width=8)
              ->  Index Scan using artists_pkey on artists  (cost=0.00..0.06 rows=1 width=4)
                    Index Cond: (artists.id = artists_songs.artist_id)
                    Filter: (lower((artists.name)::text) = 'red mountain church'::text)

Thanks!

--
=
Brandon Casci
Loudcaster
http://loudcaster.com
=
[PERFORM] Index scan / Index cond limitation or ?
Hello people,

I'm having trouble persuading an index scan to check all of the conditions I specify _inside_ the index cond. That is, _some_ condition always gets pushed out of the index cond and applied later (which will often result, for my real table contents, in too many unwanted rows initially hit by the index scan and hence randomly slow queries). An index with all relevant columns does exist, of course. Here goes an example.

create table foo (
  id serial primary key,
  rec_time timestamp with time zone DEFAULT now(),
  some_value integer,
  some_data text
);

CREATE INDEX foo_test ON foo (id, rec_time, some_value);

set enable_seqscan = false;
set enable_bitmapscan = true;

explain select id from foo
where true
  and rec_time > '2010-01-01 22:00:06'
  --and rec_time < '2010-10-14 23:59'
  and some_value in (1, 2)
  and id > 123

This one works perfectly as I want it (and note the "and rec_time < ..." condition is commented out):

Bitmap Heap Scan on foo  (cost=13.18..17.19 rows=1 width=4)
  Recheck Cond: ((id > 123) AND (rec_time > '2010-01-01 22:00:06+03'::timestamp with time zone) AND (some_value = ANY ('{1,2}'::integer[])))
  ->  Bitmap Index Scan on foo_test  (cost=0.00..13.18 rows=1 width=0)
        Index Cond: ((id > 123) AND (rec_time > '2010-01-01 22:00:06+03'::timestamp with time zone) AND (some_value = ANY ('{1,2}'::integer[])))

Now, as soon as I enable the "and rec_time < ..." condition, I get the following:

explain select id from foo
where true
  and rec_time > '2010-01-01 22:00:06'
  and rec_time < '2010-10-14 23:59'
  and some_value in (1, 2)
  and id > 123

Bitmap Heap Scan on foo  (cost=8.59..13.94 rows=1 width=4)
  Recheck Cond: ((id > 123) AND (rec_time > '2010-01-01 22:00:06+03'::timestamp with time zone) AND (rec_time < '2010-10-14 23:59:00+04'::timestamp with time zone))
  Filter: (some_value = ANY ('{1,2}'::integer[]))
  ->  Bitmap Index Scan on foo_test  (cost=0.00..8.59 rows=2 width=0)
        Index Cond: ((id > 123) AND (rec_time > '2010-01-01 22:00:06+03'::timestamp with time zone) AND (rec_time < '2010-10-14 23:59:00+04'::timestamp with time zone))

So, the "in (1, 2)" condition is not in the Index Cond anymore! Why is that? How can I push it back?

SELECT version();
PostgreSQL 8.3.1, compiled by Visual C++ build 1400

but the behaviour seems exactly the same in 9.0 (just checked it briefly).

Thank you! Please CC me, I'm not on the list.

Nikolai
Re: [PERFORM] Auto ANALYZE criteria
Thanks for fixing the docs, but if that's the case, I shouldn't be seeing the behavior that I'm seeing. Should I flesh out this test case a little better and file a bug?

Thanks,
Joe

On Tue, Sep 21, 2010 at 4:44 PM, Tom Lane wrote:
> Joe Miller writes:
>> I was looking at the autovacuum documentation:
>> http://www.postgresql.org/docs/9.0/interactive/routine-vacuuming.html#AUTOVACUUM
>
>> For analyze, a similar condition is used: the threshold, defined as:
>> analyze threshold = analyze base threshold + analyze scale factor * number of tuples
>> is compared to the total number of tuples inserted or updated since
>> the last ANALYZE.
>
>> I guess that should be updated to read "inserted, updated or deleted".
>
> Mph. We caught the other places where the docs explain what the analyze
> threshold is, but missed that one. Fixed, thanks for pointing it out.
>
> regards, tom lane
[PERFORM] help with understanding EXPLAIN
Hello

I have an application hosted on Heroku. They use postgres. It's more or less abstracted away, but I can get some performance data from New Relic. For the most part, performance is ok, but now and then some queries take a few seconds, and spike up to 15 or even 16 seconds! Ouch! This is the most detailed information I could get from New Relic. Do you have any suggestions how I could improve the performance?

Limit  (cost=26.49..1893.46 rows=1 width=4)
  ->  Unique  (cost=26.49..44833.82 rows=24 width=4)
        ->  Nested Loop  (cost=26.49..44833.81 rows=24 width=4)
              ->  Merge Join  (cost=26.49..44532.99 rows=4773 width=8)
                    Merge Cond: (songs.id = artists_songs.song_id)
                    ->  Index Scan using songs_pkey on songs  (cost=0.00..25219.30 rows=4773 width=4)
                          Filter: (lower((name)::text) = 'thirteen'::text)
                    ->  Index Scan using index_artists_songs_on_song_id on artists_songs  (cost=0.00..18822.04 rows=960465 width=8)
              ->  Index Scan using artists_pkey on artists  (cost=0.00..0.06 rows=1 width=4)
                    Index Cond: (artists.id = artists_songs.artist_id)
                    Filter: (lower((artists.name)::text) = 'red mountain church'::text)

--
=
Brandon Casci
Loudcaster
http://loudcaster.com
=
Re: [PERFORM] Slow count(*) again...
Neil Whelchel wrote:
> That is why I suggested an estimate(*) that works like (a faster)
> count(*) except that it may be off a bit.

I think that is what he was talking about when he wrote this. The main problem with "select count(*)" is that it gets seriously misused. Using "select count(*)" to establish existence is bad for performance and for code readability.

-- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
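The misuse Mladen describes can be sketched in two lines (the orders table and customer_id column are hypothetical, chosen only for illustration):

```sql
-- Bad: counts every matching row, only so the application
-- can then test whether the count is greater than zero.
SELECT count(*) FROM orders WHERE customer_id = 42;

-- Better: EXISTS can stop at the first matching row,
-- and the query says what it means.
SELECT EXISTS (SELECT 1 FROM orders WHERE customer_id = 42);
```

On a table with millions of matching rows the first form does proportionally more work; the second is effectively constant-time once any match is found.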
Re: [PERFORM] oracle to psql migration - slow query in postgres
> -----Original Message-----
> From: Tony Capobianco [mailto:tcapobia...@prospectiv.com]
> Sent: Thursday, October 14, 2010 3:43 PM
> To: pgsql-performance@postgresql.org
> Subject: oracle to psql migration - slow query in postgres
>
> We are in the process of testing migration of our oracle data
> warehouse over to postgres. A potential showstopper are full
> table scans on our members table. We can't function on
> postgres effectively unless index scans are employed. I'm
> thinking I don't have something set correctly in my
> postgresql.conf file, but I'm not sure what.
>
> This table has approximately 300million rows.
>
> Version:
> SELECT version();
> PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
> 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit
>
> We have 4 quad-core processors and 32GB of RAM. The below
> query uses the members_sorted_idx_001 index in oracle, but in
> postgres, the optimizer chooses a sequential scan.
>
> explain analyze create table tmp_srcmem_emws1 as select
> emailaddress, websiteid
> from members
> where emailok = 1
>   and emailbounced = 0;
>
> Seq Scan on members  (cost=0.00..14137154.64 rows=238177981
> width=29) (actual time=0.052..685834.785 rows=236660930 loops=1)
>   Filter: ((emailok = 1::numeric) AND (emailbounced = 0::numeric))
> Total runtime: 850306.220 ms
> (3 rows)
>
> show shared_buffers ;
>  shared_buffers
> ----------------
>  7680MB
> (1 row)
>
> show effective_cache_size ;
>  effective_cache_size
> ----------------------
>  22GB
> (1 row)
>
> show work_mem ;
>  work_mem
> ----------
>  768MB
> (1 row)
>
> show enable_seqscan ;
>  enable_seqscan
> ----------------
>  on
> (1 row)
>
> Below are the data definitions for the table/indexes in question:
>
> \d members
>              Table "members"
>        Column        |            Type             | Modifiers
> ---------------------+-----------------------------+-----------
>  memberid            | numeric                     | not null
>  firstname           | character varying(50)       |
>  lastname            | character varying(50)       |
>  emailaddress        | character varying(50)       |
>  password            | character varying(50)       |
>  address1            | character varying(50)       |
>  address2            | character varying(50)       |
>  city                | character varying(50)       |
>  statecode           | character varying(50)       |
>  zipcode             | character varying(50)       |
>  birthdate           | date                        |
>  emailok             | numeric(2,0)                |
>  gender              | character varying(1)        |
>  addeddate           | timestamp without time zone |
>  emailbounced        | numeric(2,0)                |
>  changedate          | timestamp without time zone |
>  optoutsource        | character varying(100)      |
>  websiteid           | numeric                     |
>  promotionid         | numeric                     |
>  sourceid            | numeric                     |
>  siteid              | character varying(64)       |
>  srcwebsiteid        | numeric                     |
>  homephone           | character varying(20)       |
>  homeareacode        | character varying(10)       |
>  campaignid          | numeric                     |
>  srcmemberid         | numeric                     |
>  optoutdate          | date                        |
>  regcomplete         | numeric(1,0)                |
>  regcompletesourceid | numeric                     |
>  ipaddress           | character varying(25)       |
>  pageid              | numeric                     |
>  streetaddressstatus | numeric(1,0)                |
>  middlename          | character varying(50)       |
>  optinprechecked     | numeric(1,0)                |
>  optinposition       | numeric                     |
>  homephonestatus     | numeric                     |
>  addeddate_id        | numeric                     |
>  changedate_id       | numeric                     |
>  rpmindex            | numeric                     |
>  optmode             | numeric(1,0)                |
>  countryid           | numeric                     |
>  confirmoptin        | numeric(2,0)                |
>  bouncedate          | date                        |
>  memberageid         | numeric                     |
>  sourceid2           | numeric                     |
>  remoteuserid        | character varying(50)       |
Re: [PERFORM] oracle to psql migration - slow query in postgres
Thanks for all your responses. What's interesting is that an index is used when this query is executed in Oracle. It appears to do some parallel processing:

SQL> set line 200
SQL> delete from plan_table;

3 rows deleted.

SQL> explain plan for
  2  select websiteid, emailaddress
  3  from members
  4  where emailok = 1
  5    and emailbounced = 0;

Explained.

SQL> SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());

PLAN_TABLE_OUTPUT
Plan hash value: 4247959398

| Id | Operation               | Name                   | Rows | Bytes | Cost (%CPU)| Time     |    TQ |IN-OUT| PQ Distrib |
|  0 | SELECT STATEMENT        |                        | 237M | 7248M |   469K (2) | 01:49:33 |       |      |            |
|  1 |  PX COORDINATOR         |                        |      |       |            |          |       |      |            |
|  2 |   PX SEND QC (RANDOM)   | :TQ1                   | 237M | 7248M |   469K (2) | 01:49:33 | Q1,00 | P->S | QC (RAND)  |
|  3 |    PX BLOCK ITERATOR    |                        | 237M | 7248M |   469K (2) | 01:49:33 | Q1,00 | PCWC |            |
|* 4 |     INDEX FAST FULL SCAN| MEMBERS_SORTED_IDX_001 | 237M | 7248M |   469K (2) | 01:49:33 | Q1,00 | PCWP |            |

Predicate Information (identified by operation id):

   4 - filter("EMAILBOUNCED"=0 AND "EMAILOK"=1)

16 rows selected.

On Fri, 2010-10-15 at 13:43 -0400, Igor Neyman wrote:
> > -----Original Message-----
> > From: Tony Capobianco [mailto:tcapobia...@prospectiv.com]
> > Sent: Thursday, October 14, 2010 3:43 PM
> > To: pgsql-performance@postgresql.org
> > Subject: oracle to psql migration - slow query in postgres
> >
> > We are in the process of testing migration of our oracle data
> > warehouse over to postgres. A potential showstopper are full
> > table scans on our members table. We can't function on
> > postgres effectively unless index scans are employed. I'm
> > thinking I don't have something set correctly in my
> > postgresql.conf file, but I'm not sure what.
> >
> > This table has approximately 300million rows.
Re: [PERFORM] Index scan / Index cond limitation or ?
Nikolai Zhubr writes:
> So, "in (1, 2)" condition is not in Index Cond anymore! Why is that? How
> can I push it back?

It thinks the indexscan condition is sufficiently selective already. An = ANY condition like that will force multiple index searches, one for each of the OR'd possibilities, so it's far from "free" to add it to the index condition. The planner doesn't think it's worth it. Perhaps on your real query it is, but there's not much point in debating about the behavior on this toy table; without realistic table sizes and up-to-date stats it's impossible to say whether that choice is correct or not.

> SELECT version();
> PostgreSQL 8.3.1, compiled by Visual C++ build 1400

You really, really, really ought to be running 8.3.something-newer. We didn't put out the last 11 8.3.x bugfix updates just because we didn't have anything better to do.

regards, tom lane
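Tom's point that = ANY in an index condition means one index descent per OR'd value can be pictured with a hand-expanded equivalent, written here against the foo table from the original post (this is an illustration of the cost model, not a recommended rewrite):

```sql
-- Pushing some_value = ANY ('{1,2}') into the index condition costs
-- roughly the same as two separate searches of the foo_test index:
SELECT id FROM foo
WHERE id > 123
  AND rec_time > '2010-01-01 22:00:06'
  AND some_value = 1
UNION ALL
SELECT id FROM foo
WHERE id > 123
  AND rec_time > '2010-01-01 22:00:06'
  AND some_value = 2;
```

The planner weighs that extra index traversal against simply filtering the rows the two-column condition already returns, and on a tiny table with default statistics the filter wins.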
Re: [PERFORM] help with understanding EXPLAIN and boosting performance
Brandon Casci writes:
> I have an application hosted on Heroku. They use postgres. It's more or less
> abstracted away, but I can get some performance data from New Relic. For the
> most part, performance is ok, but now and then some queries take a few
> seconds, and spike up to 15 or even 16 seconds! Ouch!

The particular query you're showing here doesn't look terribly expensive. Are you sure this is one that took that long? If you're seeing identical queries take significantly different times, I'd wonder about what else is happening on the server. The most obvious explanation for that type of behavior is that everything is swapped into RAM when it's fast, but has to be read from disk when it's slow. If that's what's happening you should consider buying more RAM or offloading some activities to other machines, so that there's not so much competition for memory space. If specific queries are consistently slow then EXPLAIN might give some useful info about those. It's unlikely to tell you much about non-reproducible slowdowns, though.

regards, tom lane
Re: [PERFORM] oracle to psql migration - slow query in postgres
> -----Original Message-----
> From: Tony Capobianco [mailto:tcapobia...@prospectiv.com]
> Sent: Friday, October 15, 2010 2:14 PM
> To: pgsql-performance@postgresql.org
> Subject: Re: oracle to psql migration - slow query in postgres
>
> Thanks for all your responses. What's interesting is that an
> index is used when this query is executed in Oracle. It
> appears to do some parallel processing:
>
> |* 4 |     INDEX FAST FULL SCAN| MEMBERS_SORTED_IDX_001 | 237M |
>
>    4 - filter("EMAILBOUNCED"=0 AND "EMAILOK"=1)

1. Postgres doesn't have "FAST FULL SCAN" because, even if all the info is in the index, it needs to visit the row in the table (the "visibility" issue).

2. Postgres doesn't have parallel executions.

BUT, it's free and has great community support, as you already saw.

Regards,
Igor Neyman
Re: [PERFORM] oracle to psql migration - slow query in postgres
Very true Igor! Free is my favorite price. I'll figure a way around this issue. Thanks for your help. Tony On Fri, 2010-10-15 at 14:54 -0400, Igor Neyman wrote: > > -Original Message- > > From: Tony Capobianco [mailto:tcapobia...@prospectiv.com] > > Sent: Friday, October 15, 2010 2:14 PM > > To: pgsql-performance@postgresql.org > > Subject: Re: oracle to psql migration - slow query in postgres > > > > Thanks for all your responses. What's interesting is that an > > index is used when this query is executed in Oracle. It > > appears to do some parallel processing: > > > > SQL> set line 200 > > delete from plan_table; > > explain plan for > > select websiteid, emailaddress > > from members > > where emailok = 1 > >and emailbounced = 0; > > > > SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY()); > > SQL> > > 3 rows deleted. > > > > SQL> 2345 > > Explained. > > > > SQL> SQL> > > PLAN_TABLE_OUTPUT > > -- > > -- > > -- > > -- > > Plan hash value: 4247959398 > > > > -- > > - > > | Id | Operation | Name | > > Rows | Bytes > > | Cost (%CPU)| Time |TQ |IN-OUT| PQ Distrib | > > -- > > - > > | 0 | SELECT STATEMENT|| 237M| > > 7248M| 469K (2)| 01:49:33 || || > > | 1 | PX COORDINATOR || | > > || || || > > | 2 | PX SEND QC (RANDOM) | :TQ1 | 237M| > > 7248M| 469K (2)| 01:49:33 | Q1,00 | P->S | QC (RAND) | > > | 3 |PX BLOCK ITERATOR|| 237M| > > 7248M| 469K (2)| 01:49:33 | Q1,00 | PCWC || > > |* 4 | INDEX FAST FULL SCAN| MEMBERS_SORTED_IDX_001 | 237M| > > 7248M| 469K (2)| 01:49:33 | Q1,00 | PCWP || > > -- > > - > > > > PLAN_TABLE_OUTPUT > > -- > > -- > > -- > > -- > > > > Predicate Information (identified by operation id): > > --- > > > >4 - filter("EMAILBOUNCED"=0 AND "EMAILOK"=1) > > > > 16 rows selected. > > > > > > 1. Postgres doesn't have "FAST FULL SCAN" because even if all the info > is in the index, it need to visit the row in the table ("visibility" > issue). > > 2. Postgres doesn't have parallel executions. 
> > BUT, it's free anf has greate community support, as you already saw. > > Regards, > Igor Neyman
Re: [PERFORM] Stored procedure declared as VOLATILE => no good optimization is done
Damon Snyder wrote: > I have heard it said that if a stored procedure is declared as > VOLATILE, then no good optimizations can be done on queries within > the stored procedure or queries that use the stored procedure (say > as the column in a view). I have seen this in practice, recommended > on the irc channel, and in the archives ( > http://archives.postgresql.org/pgsql-performance/2008-01/msg00283.php > ). Can > someone help me understand or point me to some documentation > explaining why this is so? Here's the documentation: http://www.postgresql.org/docs/current/interactive/sql-createfunction.html http://www.postgresql.org/docs/current/interactive/xfunc-volatility.html -Kevin
Re: [PERFORM] Stored procedure declared as VOLATILE => no good optimization is done
On Mon, Oct 11, 2010 at 7:10 PM, Damon Snyder wrote: > Hello, > I have heard it said that if a stored procedure is declared as VOLATILE, > then no good optimizations can be done on queries within the stored > procedure or queries that use the stored procedure (say as the column in a > view). I have seen this in practice, recommended on the irc channel, and in > the archives > (http://archives.postgresql.org/pgsql-performance/2008-01/msg00283.php). Can > someone help me understand or point me to some documentation explaining why > this is so? > Any insights would be appreciated. I'm new to pgsql and would like to know a > little more about what is going on under the hood. > Thanks, > Damon The theory behind 'volatile' is pretty simple -- each execution of the function, regardless of the inputs, can be expected to produce a completely independent result, or may modify the database. In the case of immutable, which is on the other end, a particular set of inputs will produce one and only one result, and doesn't modify anything. In the immutable case, the planner can shuffle the function call around in the query, calling it less, simplifying joins, etc. There are lots of theoretical optimizations that can be done since the inputs (principally table column values and literal values) can be assumed static for the duration of the query. 'stable' is almost like immutable, but is only guaranteed static for the duration of the query. Most functions that read from but don't write to the database will fit in this category. Most optimizations still apply here, but stable functions can't be used in indexes and can't be executed and saved off at plan time where it might be helpful (prepared statements and pl/pgsql plans). 
broadly speaking: *) function generates same output from inputs regardless of what's going on in the database, and has no side effects: IMMUTABLE *) function reads (only) from tables, or is an immutable function in most senses but is influenced by a GUC (or any other out-of-scope thing): STABLE *) all other cases: VOLATILE (which is btw the default) merlin
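Merlin's three categories correspond directly to the volatility keywords in CREATE FUNCTION; a sketch with hypothetical functions and tables:

```sql
-- Same output for the same inputs, no database access: IMMUTABLE.
CREATE FUNCTION add_one(integer) RETURNS integer AS
$$ SELECT $1 + 1 $$ LANGUAGE sql IMMUTABLE;

-- Reads from a (hypothetical) table: results may change between
-- queries but are fixed within one query: STABLE.
CREATE FUNCTION lookup_price(integer) RETURNS numeric AS
$$ SELECT price FROM products WHERE id = $1 $$ LANGUAGE sql STABLE;

-- Writes to the database (a side effect), so the planner must call
-- it exactly as written, once per row: VOLATILE (the default).
CREATE FUNCTION log_access(integer) RETURNS void AS
$$ INSERT INTO access_log (product_id, at) VALUES ($1, now()) $$
LANGUAGE sql VOLATILE;
```

Positional parameters (`$1`) are used here because named parameters in SQL-language function bodies are not available in the PostgreSQL releases current at the time of this thread.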
Re: [PERFORM] Stored procedure declared as VOLATILE => no good optimization is done
> broadly speaking: > *) function generates same output from inputs regardless of what's > going on in the database, and has no side effects: IMMUTABLE So can I say "if a function is marked IMMUTABLE, then it should never modify the database"? Is there any counter example? > *) function reads (only) from tables, or is an immutable function in > most senses but influenced from the GUC (or any other out of scope > thing): STABLE It seems that if the above is correct, I can say STABLE functions should never modify the database either. > *) all other cases: VOLATILE (which is btw the default) -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
[PERFORM] UUID performance as primary key
Hi Guys, I am interested in finding out the pros/cons of using UUID as a primary key field. My requirement states that UUID would be perfect in my case as I will be having many small databases which will link up to a global database using the UUID. Hence, the need for a unique key across all databases. It would be extremely helpful if someone could help me figure this out, as it is critical for my project. Thanks in advance, Nav
Re: [PERFORM] Stored procedure declared as VOLATILE => no good optimization is done
Tatsuo Ishii writes: > So can I say "if a function is marked IMMUTABLE, then it should never > modify database"? Is there any counter example? > It seems if above is correct, I can say STABLE functions should never > modify databases as well. Both of those things are explicitly stated here: http://developer.postgresql.org/pgdocs/postgres/xfunc-volatility.html regards, tom lane
Re: [PERFORM] UUID performance as primary key
On 16/10/2010 9:58 AM, Navkirat Singh wrote: Hi Guys, I am interested in finding out the pros/cons of using UUID as a primary key field. My requirement states that UUID would be perfect in my case as I will be having many small databases which will link up to a global database using the UUID. Hence, the need for a unique key across all databases. It would be extremely helpful if someone could help me figure this out, as it is critical for my project. Pro: No need for (serverid,serverseq) pair primary keys or hacks with modulus based key generation. Doesn't set any pre-determined limit on how many servers/databases may be in a cluster. Con: Slower than modulo key generation approach, uses more storage. Foreign key relationships may be slower too. Overall, UUIDs seem to be a favoured approach. The other way people seem to do this is by assigning a unique instance id to each server/database out of a maximum "n" instances decided at setup time. Every key generation sequence increments by "n" whenever it generates a key, with an offset of the server/database id. That way, if n=100, server 1 will generate primary keys 001, 101, 201, 301, ..., server 2 will generate primary keys 002, 102, 202, 302, ... and so on. That works great until you need more than 100 instances, at which point you're really, REALLY boned. In really busy systems it also limits the total amount of primary key space - but with BIGINT primary keys, that's unlikely to be something you need to worry about. The composite primary key (serverid,sequenceid) approach avoids the need for a pre-defined maximum number of servers, but can be slow to index and can require more storage, especially because of tuple headers. I have no firsthand experience with any of these approaches so I can't offer you a considered opinion. I know that the MS-SQL crowd at least strongly prefer UUIDs, but they have very strong in-database UUID support. MySQL folks seem to mostly favour the modulo primary key generation approach. 
I don't see much discussion of the issue here - I get the impression Pg doesn't see heavy use in sharded environments. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
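The increment-by-n scheme Craig describes maps onto ordinary sequences; a sketch assuming a hypothetical maximum of 100 instances and a hypothetical orders table:

```sql
-- On server 1: start at the instance id, step by the instance count.
CREATE SEQUENCE orders_id_seq START WITH 1 INCREMENT BY 100;
-- On server 2 the same table would instead use:
--   CREATE SEQUENCE orders_id_seq START WITH 2 INCREMENT BY 100;

CREATE TABLE orders (
    id        bigint PRIMARY KEY DEFAULT nextval('orders_id_seq'),
    placed_at timestamptz NOT NULL DEFAULT now()
);
-- Server 1 generates 1, 101, 201, ...; server 2 generates 2, 102, ...
```

As noted above, the instance count is baked in at setup time, which is exactly the limitation that makes UUIDs attractive.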
Re: [PERFORM] Slow count(*) again...
Jesper Krogh wrote: To be honest, if it is EDB, Redpill, Command Prompt, 2nd Quadrant or whoever end up doing the job is, seen from this perspective not important, just it ends in the hands of someone "capable" of doing it. ... although Heikki has done some work on this task already. Now you're closing in on why this is a touchy subject. Heikki has already done work here funded by EDB. As such, the idea of anyone else being put in charge of fund raising and allocation for this particular feature would be a political mess. While it would be nice if there was a completely fair sponsorship model for developing community PostgreSQL features, overseen by a benevolent, free, and completely unaffiliated overlord, we're not quite there yet. In cases like these, where there's evidence a company with a track record of delivering features is already involved, you're probably better off contacting someone from there directly--rather than trying to fit that into the public bounty model some PostgreSQL work is getting done via lately. The visibility map is a particularly troublesome one, because the list of "capable" people who could work on that, but who aren't already working at a company having some relations with EDB, is rather slim. I know that's kind of frustrating to hear, for people who would like to get a feature done but can't finance the whole thing themselves. But look on the bright side--the base price is free, and when you give most PostgreSQL companies money to work on something it's at least possible to get what you want done. You'd have to pay a whole lot more than the $15K number you threw out there before any of the commercial database vendors would pay any attention to your particular feature request. 
-- Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us Author, "PostgreSQL 9.0 High Performance" Pre-ordering at: https://www.packtpub.com/postgresql-9-0-high-performance/book
Re: [PERFORM] UUID performance as primary key
Wouldn't UUID PK cause a significant drop in insert performance because every insert is now out of order, which leads to a constant re-arranging of the B+ tree? The amount of random IO's that's going to generate would just kill the performance. --- On Fri, 10/15/10, Craig Ringer wrote: From: Craig Ringer Subject: Re: [PERFORM] UUID performance as primary key To: "Navkirat Singh" Cc: pgsql-performance@postgresql.org Date: Friday, October 15, 2010, 10:59 PM On 16/10/2010 9:58 AM, Navkirat Singh wrote: > Hi Guys, > > I am interested in finding out the pros/cons of using UUID as a primary key > field. My requirement states that UUID would be perfect in my case as I will > be having many small databases which will link up to a global database using > the UUID. Hence, the need for a unique key across all databases. It would be > extremely helpful if someone could help me figure this out, as it is critical > for my project. Pro: No need for (serverid,serverseq) pair primary keys or hacks with modulus based key generation. Doesn't set any pre-determined limit on how many servers/databases may be in a cluster. Con: Slower than modulo key generation approach, uses more storage. Foreign key relationships may be slower too. Overall, UUIDs seem to be a favoured approach. The other way people seem to do this is by assigning a unique instance id to each server/database out of a maximum "n" instances decided at setup time. Every key generation sequence increments by "n" whenever it generates a key, with an offset of the server/database id. That way, if n=100, server 1 will generate primary keys 001, 101, 201, 301, ..., server 2 will generate primary keys 002, 102, 202, 302, ... and so on. That works great until you need more than 100 instances, at which point you're really, REALLY boned. In really busy systems it also limits the total amount of primary key space - but with BIGINT primary keys, that's unlikely to be something you need to worry about. 
The composite primary key (serverid,sequenceid) approach avoids the need for a pre-defined maximum number of servers, but can be slow to index and can require more storage, especially because of tuple headers. I have no firsthand experience with any of these approaches so I can't offer you a considered opinion. I know that the MS-SQL crowd at least strongly prefer UUIDs, but they have very strong in-database UUID support. MySQL folks seem to mostly favour the modulo primary key generation approach. I don't see much discussion of the issue here - I get the impression Pg doesn't see heavy use in sharded environments. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
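For comparison, the UUID route needs no per-server configuration at all. A sketch using the uuid-ossp contrib module (whose uuid_generate_v4() function must be installed first; the loading mechanism varies by PostgreSQL version):

```sql
-- Hypothetical table keyed by a random (version 4) UUID; values are
-- unique across all databases without any coordination.  Note the
-- flip side raised in this thread: random keys land all over the
-- B-tree, so insert-heavy workloads pay extra random I/O compared
-- to monotonically increasing keys.
CREATE TABLE global_items (
    id      uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    payload text
);
```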
Re: [PERFORM] No hash join across partitioned tables?
Excerpts from Robert Haas's message of mié jun 09 15:47:55 -0400 2010: > In going back through emails I had marked as possibly needing another > look before 9.0 is released, I came across this issue again. As I > understand it, analyze (or analyse) now collects statistics for both > the parent individually, and for the parent and its children together. > However, as I further understand it, autovacuum won't actually fire > off an analyze unless there's enough activity on the parent table > considered individually to warrant it. So if you have an empty parent > and a bunch of children with data in it, your stats will still stink, > unless you analyze by hand. So, is there something we could now do about this, while there's still time before 9.1? I haven't followed this issue very closely, but it seems to me that what we want is for an ANALYZE of a child table to be mutated into an analyze of its parent table, if the conditions are right, and for an ANALYZE of a parent to remove the child tables from being analyzed on the same run. If we analyze the parent, do we also update the children stats, or is it just that we keep two stats for the parent, one with children and one without, both being updated when the parent is analyzed? If the latter's the case, maybe we should modify ANALYZE a bit more, so that we can analyze the whole hierarchy in one go, and store the lot of stats with a single pass (each child alone, the parent alone, the parent plus children). However, it's not really clear how this would work with multiple inheritance levels. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [PERFORM] No hash join across partitioned tables?
Alvaro Herrera writes: > If we analyze the parent, do we also update the children stats, or is it > just that we keep two stats for the parent, one with children and one > without, both being updated when the parent is analyzed? The latter. The trick here is that we need to fire an analyze on the parent even though only its children may have had any updates. > If the latter's the case, maybe we should modify ANALYZE a bit more, so > that we can analyze the whole hierarchy in one go, and store the lot of > stats with a single pass (each child alone, the parent alone, the parent > plus children). However it's not real clear how would this work with > multiple inheritance levels. It's also not clear how it works without blowing out memory... regards, tom lane
Re: [PERFORM] No hash join across partitioned tables?
On Fri, Oct 15, 2010 at 10:22 PM, Tom Lane wrote: > Alvaro Herrera writes: > > If we analyze the parent, do we also update the children stats, or is it > > just that we keep two stats for the parent, one with children and one > > without, both being updated when the parent is analyzed? > > The latter. > > The trick here is that we need to fire an analyze on the parent even > though only its children may have had any updates. > > > If the latter's the case, maybe we should modify ANALYZE a bit more, so > > that we can analyze the whole hierarchy in one go, and store the lot of > > stats with a single pass (each child alone, the parent alone, the parent > > plus children). However it's not real clear how would this work with > > multiple inheritance levels. > An issue with automatically analyzing the entire hierarchy is 'abstract' table definitions. I've got a set of tables for storing the same data at different granularities of aggregation. Within each granularity, I've got partitions, but because the set of columns is identical for each granularity, I've got an abstract table definition that is inherited by everything. I don't need or want statistics kept on that table because I never query across the abstract table, only the parent table of each aggregation granularity:

create table abstract_fact_table (
    time timestamp,
    measure1 bigint,
    measure2 bigint,
    measure3 bigint,
    fk1 bigint,
    fk2 bigint
);

create table minute_scale_fact_table (
) inherits (abstract_fact_table);
-- then there are several partitions for minute scale data

create table hour_scale_fact_table (
) inherits (abstract_fact_table);
-- then several partitions for hour scale data

etc.

I do run queries on the minute_scale_fact_table and hour_scale_fact_table but never do so on abstract_fact_table. 
I could certainly modify my schema such that the abstract table goes away entirely easily enough, but I find this easier for new developers to come in and comprehend, since the similarity between the table definitions is explicit. I'm glad this topic came up, as I was unaware that I need to run analyze on the parent partitions separately - and no data is ever inserted directly into the top level of each granularity hierarchy, so it will never fire by itself. If I am using an ORM and I've got functionality in a common baseclass in the source code, I'll often implement its mapping in the database via a parent table that the table for any subclass mapping can inherit from. Again, I have no interest in maintaining statistics on the parent table, since I never query against it directly.
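Given the autovacuum behavior discussed above, the straightforward workaround is to analyze the inheritance parents explicitly (by hand or from cron), since no activity on the parents themselves will ever trigger it:

```sql
-- Using the table names from this message: refresh the
-- parent-plus-children statistics after bulk loads.
ANALYZE minute_scale_fact_table;
ANALYZE hour_scale_fact_table;
```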