Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space

2008-05-16 Thread kevin kempter
Also, I'm running version 8.3 on a CentOS box with two dual-core CPUs
and 32GB of RAM.



On May 16, 2008, at 12:58 AM, kevin kempter wrote:


Sorry, I goofed on the query text. Here's the correct query:

select
f14.xpublisher_dim_id,
f14.xtime_dim_id,
f14.xlocation_dim_id,
f14.xreferrer_dim_id,
f14.xsite_dim_id,
f14.xsystem_cfg_dim_id,
f14.xaffiliate_dim_id,
f14.customer_id,
f14.pf_dts_id,
f14.episode_id,
f14.sessionid,
f14.bytes_received,
f14.bytes_transmitted,
f14.total_played_time_sec,
segdim.xsegment_dim_id as episode_level_segid
from
bigtab_stats_fact_tmp14 f14,
xsegment_dim segdim
where
f14.customer_id = segdim.customer_srcid
and f14.show_id = segdim.show_srcid
and f14.season_id = segdim.season_srcid
and f14.episode_id = segdim.episode_srcid
and segdim.segment_srcid is NULL;






On May 16, 2008, at 12:31 AM, kevin kempter wrote:


Hi List;

I have a table with 9,961,914 rows in it (see the describe of  
bigtab_stats_fact_tmp14 below)


I also have a table with 7,785 rows in it (see the describe of  
xsegment_dim below)


I'm running the join shown below and it takes > 10 hours and  
eventually runs out of disk space on a 1.4TB file system


I've included below a describe of both tables, the join and an  
explain plan, any help / suggestions would be much appreciated !


I need to get this beast to run as quickly as possible (without  
filling up my file system)



Thanks in advance...











select
f14.xpublisher_dim_id,
f14.xtime_dim_id,
f14.xlocation_dim_id,
f14.xreferrer_dim_id,
f14.xsite_dim_id,
f14.xsystem_cfg_dim_id,
f14.xaffiliate_dim_id,
f14.customer_id,
pf_dts_id,
episode_id,
sessionid,
bytes_received,
bytes_transmitted,
total_played_time_sec,
segdim.xsegment_dim_id as episode_level_segid
from
bigtab_stats_fact_tmp14 f14,
xsegment_dim segdim
where
f14.customer_id = segdim.customer_srcid
and f14.show_id = segdim.show_srcid
and f14.season_id = segdim.season_srcid
and f14.episode_id = segdim.episode_srcid
and segdim.segment_srcid is NULL;






QUERY PLAN
---
Merge Join  (cost=1757001.74..73569676.49 rows=3191677219 width=118)
Merge Cond: ((segdim.episode_srcid = f14.episode_id) AND  
(segdim.customer_srcid = f14.customer_id) AND (segdim.show_srcid =  
f14.show_id) AND (segdim.season_srcid = f14.season_id))

->  Sort  (cost=1570.35..1579.46 rows=3643 width=40)
Sort Key: segdim.episode_srcid, segdim.customer_srcid,  
segdim.show_srcid, segdim.season_srcid
->  Seq Scan on xsegment_dim segdim  (cost=0.00..1354.85 rows=3643  
width=40)

Filter: (segment_srcid IS NULL)
->  Sort  (cost=1755323.26..1780227.95 rows=9961874 width=126)
Sort Key: f14.episode_id, f14.customer_id, f14.show_id, f14.season_id
->  Seq Scan on bigtab_stats_fact_tmp14 f14  (cost=0.00..597355.74  
rows=9961874 width=126)

(9 rows)









# \d bigtab_stats_fact_tmp14
Table "public.bigtab_stats_fact_tmp14"
Column  |Type | Modifiers
--+-+---
pf_dts_id | bigint  |
pf_device_id   | bigint  |
segment_id   | bigint  |
cdn_id   | bigint  |
collector_id | bigint  |
digital_envoy_id | bigint  |
maxmind_id   | bigint  |
quova_id | bigint  |
website_id   | bigint  |
referrer_id  | bigint  |
affiliate_id | bigint  |
custom_info_id   | bigint  |
start_dt | timestamp without time zone |
total_played_time_sec| numeric(18,5)   |
bytes_received   | bigint  |
bytes_transmitted| bigint  |
stall_count  | integer |
stall_duration_sec   | numeric(18,5)   |
hiccup_count | integer |
hiccup_duration_sec  | numeric(18,5)   |
watched_duration_sec | numeric(18,5)   |
rewatched_duration_sec   | numeric(18,5)   |
requested_start_position | numeric(18,5)   |
requested_stop_position  | numeric(18,5)   |
post_position| numeric(18,5)   |
is_vod   | numeric(1,0)|
sessionid| bigint  |
create_dt| timestamp without time zone |
segment_type_id  | bigint  |
customer_id  | bigint  |
content_publisher_id | bigint  |
content_owner_id | bigint  |
episode_id   | bigint   

Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space

2008-05-16 Thread Claus Guttesen
> I have a table with 9,961,914 rows in it (see the describe of
> bigtab_stats_fact_tmp14 below)
>
> I also have a table with 7,785 rows in it (see the describe of xsegment_dim
> below)
>
> I'm running the join shown below and it takes > 10 hours and eventually runs
> out of disk space on a 1.4TB file system
>
> I've included below a describe of both tables, the join and an explain plan,
> any help / suggestions would be much appreciated !
>
> I need to get this beast to run as quickly as possible (without filling up
> my file system)
>
>
> Thanks in advance...

What version of PostgreSQL are you using? According to
http://www.postgresql.org/docs/8.2/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY
you may benefit from adjusting work_mem.

You also index segment_srcid (in table xsegment_dim), but if you search
for NULL and you have enough of those, it defaults to a seq. scan:

Seq Scan on xsegment_dim segdim  (cost=0.00..1354.85 rows=3643 width=40)
> Filter: (segment_srcid IS NULL)

Maybe you could insert some default value into segment_srcid (some
arbitrarily large number) instead of NULL and then search for values
greater than that?

You could also try lowering random_page_cost from its default to 2.
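
For instance, a minimal sketch of those knobs, plus a partial index as a
non-destructive alternative to the sentinel-value idea (the index name and
the work_mem figure are illustrative assumptions, not recommendations):

SET work_mem = '256MB';        -- session-level experiment only
SET random_page_cost = 2;

-- Targets exactly the IS NULL rows without rewriting the data:
CREATE INDEX xsegment_dim_nullseg_idx
ON xsegment_dim (customer_srcid, show_srcid, season_srcid, episode_srcid)
WHERE segment_srcid IS NULL;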

> select
> f14.xpublisher_dim_id,
> f14.xtime_dim_id,
> f14.xlocation_dim_id,
> f14.xreferrer_dim_id,
> f14.xsite_dim_id,
> f14.xsystem_cfg_dim_id,
> f14.xaffiliate_dim_id,
> f14.customer_id,
> pf_dts_id,
> episode_id,
> sessionid,
> bytes_received,
> bytes_transmitted,
> total_played_time_sec,
> segdim.xsegment_dim_id as episode_level_segid
> from
> bigtab_stats_fact_tmp14 f14,
> xsegment_dim segdim
> where
> f14.customer_id = segdim.customer_srcid
> and f14.show_id = segdim.show_srcid
> and f14.season_id = segdim.season_srcid
> and f14.episode_id = segdim.episode_srcid
> and segdim.segment_srcid is NULL;
>
>
>
>
>
>
> QUERY PLAN
> ---
> Merge Join  (cost=1757001.74..73569676.49 rows=3191677219 width=118)
> Merge Cond: ((segdim.episode_srcid = f14.episode_id) AND
> (segdim.customer_srcid = f14.customer_id) AND (segdim.show_srcid =
> f14.show_id) AND (segdim.season_srcid = f14.season_id))
> ->  Sort  (cost=1570.35..1579.46 rows=3643 width=40)
> Sort Key: segdim.episode_srcid, segdim.customer_srcid, segdim.show_srcid,
> segdim.season_srcid
> ->  Seq Scan on xsegment_dim segdim  (cost=0.00..1354.85 rows=3643 width=40)
> Filter: (segment_srcid IS NULL)
> ->  Sort  (cost=1755323.26..1780227.95 rows=9961874 width=126)
> Sort Key: f14.episode_id, f14.customer_id, f14.show_id, f14.season_id
> ->  Seq Scan on bigtab_stats_fact_tmp14 f14  (cost=0.00..597355.74
> rows=9961874 width=126)
> (9 rows)
>
>
>
>
>
>
>
>
>
> # \d bigtab_stats_fact_tmp14
> Table "public.bigtab_stats_fact_tmp14"
> Column  |Type | Modifiers
> --+-+---
> pf_dts_id | bigint  |
> pf_device_id   | bigint  |
> segment_id   | bigint  |
> cdn_id   | bigint  |
> collector_id | bigint  |
> digital_envoy_id | bigint  |
> maxmind_id   | bigint  |
> quova_id | bigint  |
> website_id   | bigint  |
> referrer_id  | bigint  |
> affiliate_id | bigint  |
> custom_info_id   | bigint  |
> start_dt | timestamp without time zone |
> total_played_time_sec| numeric(18,5)   |
> bytes_received   | bigint  |
> bytes_transmitted| bigint  |
> stall_count  | integer |
> stall_duration_sec   | numeric(18,5)   |
> hiccup_count | integer |
> hiccup_duration_sec  | numeric(18,5)   |
> watched_duration_sec | numeric(18,5)   |
> rewatched_duration_sec   | numeric(18,5)   |
> requested_start_position | numeric(18,5)   |
> requested_stop_position  | numeric(18,5)   |
> post_position| numeric(18,5)   |
> is_vod   | numeric(1,0)|
> sessionid| bigint  |
> create_dt| timestamp without time zone |
> segment_type_id  | bigint  |
> customer_id  | bigint  |
> content_publisher_id | bigint  |
> content_owner_id | bigint  |
> episode_id   | bigint  |
> duration_sec 

Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space

2008-05-16 Thread kevin kempter

Sorry, I goofed on the query text. Here's the correct query:

select
f14.xpublisher_dim_id,
f14.xtime_dim_id,
f14.xlocation_dim_id,
f14.xreferrer_dim_id,
f14.xsite_dim_id,
f14.xsystem_cfg_dim_id,
f14.xaffiliate_dim_id,
f14.customer_id,
f14.pf_dts_id,
f14.episode_id,
f14.sessionid,
f14.bytes_received,
f14.bytes_transmitted,
f14.total_played_time_sec,
segdim.xsegment_dim_id as episode_level_segid
from
bigtab_stats_fact_tmp14 f14,
xsegment_dim segdim
where
f14.customer_id = segdim.customer_srcid
and f14.show_id = segdim.show_srcid
and f14.season_id = segdim.season_srcid
and f14.episode_id = segdim.episode_srcid
and segdim.segment_srcid is NULL;






On May 16, 2008, at 12:31 AM, kevin kempter wrote:


Hi List;

I have a table with 9,961,914 rows in it (see the describe of  
bigtab_stats_fact_tmp14 below)


I also have a table with 7,785 rows in it (see the describe of  
xsegment_dim below)


I'm running the join shown below and it takes > 10 hours and  
eventually runs out of disk space on a 1.4TB file system


I've included below a describe of both tables, the join and an  
explain plan, any help / suggestions would be much appreciated !


I need to get this beast to run as quickly as possible (without  
filling up my file system)



Thanks in advance...











select
f14.xpublisher_dim_id,
f14.xtime_dim_id,
f14.xlocation_dim_id,
f14.xreferrer_dim_id,
f14.xsite_dim_id,
f14.xsystem_cfg_dim_id,
f14.xaffiliate_dim_id,
f14.customer_id,
pf_dts_id,
episode_id,
sessionid,
bytes_received,
bytes_transmitted,
total_played_time_sec,
segdim.xsegment_dim_id as episode_level_segid
from
bigtab_stats_fact_tmp14 f14,
xsegment_dim segdim
where
f14.customer_id = segdim.customer_srcid
and f14.show_id = segdim.show_srcid
and f14.season_id = segdim.season_srcid
and f14.episode_id = segdim.episode_srcid
and segdim.segment_srcid is NULL;






QUERY PLAN
---
Merge Join  (cost=1757001.74..73569676.49 rows=3191677219 width=118)
Merge Cond: ((segdim.episode_srcid = f14.episode_id) AND  
(segdim.customer_srcid = f14.customer_id) AND (segdim.show_srcid =  
f14.show_id) AND (segdim.season_srcid = f14.season_id))

->  Sort  (cost=1570.35..1579.46 rows=3643 width=40)
Sort Key: segdim.episode_srcid, segdim.customer_srcid,  
segdim.show_srcid, segdim.season_srcid
->  Seq Scan on xsegment_dim segdim  (cost=0.00..1354.85 rows=3643  
width=40)

Filter: (segment_srcid IS NULL)
->  Sort  (cost=1755323.26..1780227.95 rows=9961874 width=126)
Sort Key: f14.episode_id, f14.customer_id, f14.show_id, f14.season_id
->  Seq Scan on bigtab_stats_fact_tmp14 f14  (cost=0.00..597355.74  
rows=9961874 width=126)

(9 rows)









# \d bigtab_stats_fact_tmp14
Table "public.bigtab_stats_fact_tmp14"
Column  |Type | Modifiers
--+-+---
pf_dts_id | bigint  |
pf_device_id   | bigint  |
segment_id   | bigint  |
cdn_id   | bigint  |
collector_id | bigint  |
digital_envoy_id | bigint  |
maxmind_id   | bigint  |
quova_id | bigint  |
website_id   | bigint  |
referrer_id  | bigint  |
affiliate_id | bigint  |
custom_info_id   | bigint  |
start_dt | timestamp without time zone |
total_played_time_sec| numeric(18,5)   |
bytes_received   | bigint  |
bytes_transmitted| bigint  |
stall_count  | integer |
stall_duration_sec   | numeric(18,5)   |
hiccup_count | integer |
hiccup_duration_sec  | numeric(18,5)   |
watched_duration_sec | numeric(18,5)   |
rewatched_duration_sec   | numeric(18,5)   |
requested_start_position | numeric(18,5)   |
requested_stop_position  | numeric(18,5)   |
post_position| numeric(18,5)   |
is_vod   | numeric(1,0)|
sessionid| bigint  |
create_dt| timestamp without time zone |
segment_type_id  | bigint  |
customer_id  | bigint  |
content_publisher_id | bigint  |
content_owner_id | bigint  |
episode_id   | bigint  |
duration_sec | numeric(18,5)   |
device_id| bigint  |
os_id

Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space

2008-05-16 Thread Simon Riggs

On Fri, 2008-05-16 at 00:31 -0600, kevin kempter wrote:

> I'm running the join shown below and it takes > 10 hours and  
> eventually runs out of disk space on a 1.4TB file system

Well, running in 10 hours doesn't mean there's a software problem, nor
does running out of disk space.

Please crunch some numbers before you ask, such as how much disk space
was used by the query and how big you'd expect it to be, plus provide
information such as what the primary key of the large table is and what
your release level is, etc.

Are you sure you want to retrieve an estimated 3 billion rows? Can you
cope if that estimate is wrong and the true figure is much higher? Do
you think the estimate is realistic?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space

2008-05-16 Thread Richard Huxton

kevin kempter wrote:

Hi List;

I have a table with 9,961,914 rows in it (see the describe of 
bigtab_stats_fact_tmp14 below)


I also have a table with 7,785 rows in it (see the describe of 
xsegment_dim below)


I'm running the join shown below and it takes > 10 hours and eventually 
runs out of disk space on a 1.4TB file system



QUERY PLAN
--- 


Merge Join  (cost=1757001.74..73569676.49 rows=3191677219 width=118)


Dumb question Kevin, but are you really expecting 3.2 billion rows in 
the result-set? Because that's approaching 400GB of result-set without 
any overheads.
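
A back-of-the-envelope check of that figure from the plan's own estimates
(118 bytes/row times ~3.19 billion rows, ignoring per-tuple and temp-file
overhead); the query below is just an editorial calculator:

SELECT pg_size_pretty(118::bigint * 3191677219) AS estimated_result_size;
-- roughly 350 GB before any overhead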


--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space

2008-05-16 Thread kevin kempter
I'm expecting 9,961,914 rows returned. Each row in the big table
should have a corresponding key in the smaller table; I basically want to
"expand" the big table's column list by one, adding the appropriate key
from the smaller table for each row in the big table.
It's not a Cartesian product join.




On May 16, 2008, at 1:40 AM, Richard Huxton wrote:


kevin kempter wrote:

Hi List;
I have a table with 9,961,914 rows in it (see the describe of  
bigtab_stats_fact_tmp14 below)
I also have a table with 7,785 rows in it (see the describe of  
xsegment_dim below)
I'm running the join shown below and it takes > 10 hours and  
eventually runs out of disk space on a 1.4TB file system



QUERY PLAN
---
Merge Join  (cost=1757001.74..73569676.49 rows=3191677219 width=118)


Dumb question Kevin, but are you really expecting 3.2 billion rows  
in the result-set? Because that's approaching 400GB of result-set  
without any overheads.


--
 Richard Huxton
 Archonet Ltd



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space

2008-05-16 Thread Richard Huxton

kevin kempter wrote:
I'm expecting 9,961,914 rows returned. Each row in the big table should
have a corresponding key in the smaller table; I basically want to
"expand" the big table's column list by one, adding the appropriate
key from the smaller table for each row in the big table. It's not a
Cartesian product join.


Didn't seem likely, to be honest.

What happens if you try the query as a cursor, perhaps with an order-by 
on customer_id or something to encourage index use? Do you ever get a 
first row back?


In fact, what happens if you slap an index over all your join columns on 
xsegment_dim? With 7,000 rows that should make it a cheap test.
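
A hedged sketch of both tests (the index name and the trimmed column list
are editorial; the join predicates are taken from the posted query):

CREATE INDEX xsegment_dim_joinkeys_idx
ON xsegment_dim (customer_srcid, show_srcid, season_srcid, episode_srcid);

BEGIN;
DECLARE j CURSOR FOR
SELECT f14.customer_id, segdim.xsegment_dim_id
FROM bigtab_stats_fact_tmp14 f14, xsegment_dim segdim
WHERE f14.customer_id = segdim.customer_srcid
  AND f14.show_id = segdim.show_srcid
  AND f14.season_id = segdim.season_srcid
  AND f14.episode_id = segdim.episode_srcid
  AND segdim.segment_srcid IS NULL
ORDER BY f14.customer_id;
FETCH 10 FROM j;   -- do we ever get a first row back?
ROLLBACK;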


--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space

2008-05-16 Thread Richard Huxton

kevin kempter wrote:

Hi List;

I have a table with 9,961,914 rows in it (see the describe of 
bigtab_stats_fact_tmp14 below)


I also have a table with 7,785 rows in it (see the describe of 
xsegment_dim below)


Something else is puzzling me with this - you're joining over four fields.


from
bigtab_stats_fact_tmp14 f14,
xsegment_dim segdim
where
f14.customer_id = segdim.customer_srcid
and f14.show_id = segdim.show_srcid
and f14.season_id = segdim.season_srcid
and f14.episode_id = segdim.episode_srcid
and segdim.segment_srcid is NULL;


--- 


Merge Join  (cost=1757001.74..73569676.49 rows=3191677219 width=118)



->  Sort  (cost=1570.35..1579.46 rows=3643 width=40)



->  Sort  (cost=1755323.26..1780227.95 rows=9961874 width=126)


Here it's still expecting 320 matches against each row from the large 
table. That's ~ 10% of the small table (or that fraction of it that PG 
expects) which seems very high for four clauses ANDed together.
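
One hedged way to check whether that expectation reflects real duplication
(column names are from the posted schema; the query is an editorial
diagnostic) is to look for repeated join keys among the IS NULL rows:

SELECT customer_srcid, show_srcid, season_srcid, episode_srcid, count(*)
FROM xsegment_dim
WHERE segment_srcid IS NULL
GROUP BY customer_srcid, show_srcid, season_srcid, episode_srcid
HAVING count(*) > 1
ORDER BY count(*) DESC;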


--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space

2008-05-16 Thread Luke Lonergan
Try 'set enable_mergejoin = false' and see if you get a hash join.
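
A hedged way to test that without materializing the full result (the
count(*) wrapper and RESET are editorial additions; the join predicates
are from the posted query):

SET enable_mergejoin = off;
EXPLAIN
SELECT count(*)
FROM bigtab_stats_fact_tmp14 f14
JOIN xsegment_dim segdim
  ON f14.customer_id = segdim.customer_srcid
 AND f14.show_id = segdim.show_srcid
 AND f14.season_id = segdim.season_srcid
 AND f14.episode_id = segdim.episode_srcid
WHERE segdim.segment_srcid IS NULL;
RESET enable_mergejoin;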

- Luke

- Original Message -
From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
To: Richard Huxton <[EMAIL PROTECTED]>
Cc: pgsql-performance@postgresql.org 
Sent: Fri May 16 04:00:41 2008
Subject: Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of 
disk space

I'm expecting 9,961,914 rows returned. Each row in the big table
should have a corresponding key in the smaller table; I basically want to
"expand" the big table's column list by one, adding the appropriate key
from the smaller table for each row in the big table.
It's not a Cartesian product join.



On May 16, 2008, at 1:40 AM, Richard Huxton wrote:

> kevin kempter wrote:
>> Hi List;
>> I have a table with 9,961,914 rows in it (see the describe of  
>> bigtab_stats_fact_tmp14 below)
>> I also have a table with 7,785 rows in it (see the describe of  
>> xsegment_dim below)
>> I'm running the join shown below and it takes > 10 hours and  
>> eventually runs out of disk space on a 1.4TB file system
>
>> QUERY PLAN
>> ---
>>  Merge Join  (cost=1757001.74..73569676.49 rows=3191677219 width=118)
>
> Dumb question Kevin, but are you really expecting 3.2 billion rows  
> in the result-set? Because that's approaching 400GB of result-set  
> without any overheads.
>
> -- 
>  Richard Huxton
>  Archonet Ltd


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space

2008-05-16 Thread kevin kempter
On further investigation it turns out that I/we have a serious data
issue: my small table is full of 'UNKNOWN' tags, so my query cannot
associate the data correctly, and thus I will end up with 2+
billion rows.



Thanks everyone for your help




On May 16, 2008, at 1:38 AM, Simon Riggs wrote:



On Fri, 2008-05-16 at 00:31 -0600, kevin kempter wrote:


I'm running the join shown below and it takes > 10 hours and
eventually runs out of disk space on a 1.4TB file system


Well, running in 10 hours doesn't mean there's a software problem, nor
does running out of disk space.

Please crunch some numbers before you ask, such as how much disk space
was used by the query and how big you'd expect it to be, plus provide
information such as what the primary key of the large table is and what
your release level is, etc.

Are you sure you want to retrieve an estimated 3 billion rows? Can you
cope if that estimate is wrong and the true figure is much higher? Do
you think the estimate is realistic?

--
Simon Riggs   www.2ndQuadrant.com
PostgreSQL Training, Services and Support



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org 
)

To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] which ext3 fs type should I use for postgresql

2008-05-16 Thread david

On Thu, 15 May 2008, [EMAIL PROTECTED] wrote:


On Thu, 15 May 2008, Matthew Wakeling wrote:


On Thu, 15 May 2008, Philippe Amelant wrote:

using mkfs.ext3 I can use "-T" to tune the filesytem

mkfs.ext3 -T fs_type ...

fs_type are in /etc/mke2fs.conf (on debian)


If you look at that file, you'd see that tuning really doesn't change that 
much. In fact, the only thing it does change (if you avoid "small" and 
"floppy") is the number of inodes available in the filesystem. Since 
Postgres tends to produce few large files, you don't need that many inodes, 
so the "largefile" option may be best. However, note that the number of 
inodes is a hard limit of the filesystem - if you try to create more files 
on the filesystem than there are available inodes, then you will get an out 
of space error even if the filesystem has space left.
The only real benefit of having not many inodes is that you waste a little 
less space, so many admins are pretty generous with this setting.


IIRC postgres likes to do 1M/file, which isn't very large as far as the -T
setting goes.



Probably of more use are some of the other settings:

-m reserved-blocks-percentage - this reserves a portion of the filesystem
   that only root can write to. If root has no need for it, you can kill
   this by setting it to zero. The default is for 5% of the disc to be
   wasted.


Think twice about this. ext2/3 get slow when they fill up (they have
fragmentation problems when free space gets too small); this 5% that only
root can use also serves as a buffer against that.



-j turns the filesystem into ext3 instead of ext2 - many people say that
   for Postgres you shouldn't do this, as ext2 is faster.


For the partition with the WAL on it you may as well use ext2 (the WAL is
written synchronously and sequentially, so the journal doesn't help you), but
for the data partition you may benefit from the journal.


A fairly recent article on the subject:

http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/

David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] which ext3 fs type should I use for postgresql

2008-05-16 Thread Scott Marlowe
On Thu, May 15, 2008 at 9:38 AM, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
> Guillaume Cottenceau wrote:
>>
>> Matthew Wakeling  writes:
>
>> It is still relevant: with a 5% margin, you can afford to change it to
>> 0% with tune2fs just long enough to start PG and remove some data via
>> SQL, then shut down and set the margin back to 5% again.
>>
>
> I find that if you actually reach that level of capacity failure, it is due
> to lack of management, and there is likely much lower-hanging fruit left over
> by a lazy DBA or sysadmin than having to adjust filesystem-level parameters.
>
> Manage actively and the above change is absolutely irrelevant.

Sorry, but that's like saying that open heart surgery isn't a fix for
clogged arteries because you should have been taking aspirin every day
and exercising.  It might not be the best answer, but sometimes it's
the only answer you've got.

I know that being able to drop the margin from x% to 0% for 10 minutes
has pulled more than one db back from the brink for me (usually
consulting on other people's databases, only once or so on my own) :)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] which ext3 fs type should I use for postgresql

2008-05-16 Thread Joshua D. Drake
On Fri, 16 May 2008 11:07:17 -0600
"Scott Marlowe" <[EMAIL PROTECTED]> wrote:

> Sorry, but that's like saying that open heart surgery isn't a fix for
> clogged arteries because you should have been taking aspirin every day
> and exercising.  It might not be the best answer, but sometimes it's
> the only answer you've got.
> 
> I know that being able to drop the margin from x% to 0% for 10 minutes
> has pulled more than one db back from the brink for me (usually
> consulting on other people's databases, only once or so on my own) :)

My point is, if you are adjusting that parameter you probably have a
stray log or a bunch of rpms etc... that can be truncated to get
you where you need to be.

Of course there is always the last-ditch effort of what you suggest, but
first you should look for the more obvious possible solution.

Sincerely,

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate






[PERFORM] very slow left join

2008-05-16 Thread Ben
I've inherited an Oracle database that I'm porting to Postgres, and this 
has been going quite well until now. Unfortunately, I've found one view (a 
largish left join) that runs several orders of magnitude slower on 
Postgres than it did on Oracle.


=> select version();
 version
--
 PostgreSQL 8.2.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 
20070105 (Red Hat 4.1.1-52)
(1 row)


After analyzing the database, the explain analyze output for the query is:

 Nested Loop Left Join  (cost=133.51..15846.99 rows=1 width=312) (actual 
time=109.131..550711.374 rows=1248 loops=1)
   Join Filter: (log.logkey = ln.logkey)
   ->  Nested Loop  (cost=133.51..267.44 rows=1 width=306) (actual 
time=15.316..74.074 rows=1248 loops=1)
 ->  Merge Join  (cost=133.51..267.16 rows=1 width=325) (actual 
time=15.300..60.332 rows=1248 loops=1)
   Merge Cond: (log.eventkey = e.eventkey)
   Join Filter: ((e.clientkey = log.clientkey) AND (e.premiseskey = 
log.premiseskey))
   ->  Index Scan using log_eventkey_idx on log  
(cost=0.00..3732.14 rows=36547 width=167) (actual time=0.015..25.385 rows=36547 
loops=1)
 Filter: (logicaldel = 'N'::bpchar)
   ->  Sort  (cost=133.51..135.00 rows=595 width=328) (actual 
time=15.185..16.379 rows=1248 loops=1)
 Sort Key: e.eventkey
 ->  Hash Join  (cost=1.30..106.09 rows=595 width=328) 
(actual time=0.073..2.033 rows=1248 loops=1)
   Hash Cond: ((e.clientkey = p.clientkey) AND 
(e.premiseskey = p.premiseskey))
   ->  Seq Scan on event e  (cost=0.00..89.48 rows=1248 
width=246) (actual time=0.005..0.481 rows=1248 loops=1)
   ->  Hash  (cost=1.14..1.14 rows=11 width=82) (actual 
time=0.059..0.059 rows=11 loops=1)
 ->  Seq Scan on premises p  (cost=0.00..1.14 
rows=11 width=82) (actual time=0.004..0.020 rows=11 loops=1)
   Filter: (logicaldel = 'N'::bpchar)
 ->  Index Scan using severity_pk on severity s  (cost=0.00..0.27 
rows=1 width=49) (actual time=0.007..0.009 rows=1 loops=1248)
   Index Cond: (e.severitykey = s.severitykey)
   ->  Seq Scan on lognote ln1  (cost=0.00..15552.67 rows=1195 width=175) 
(actual time=1.173..440.695 rows=1244 loops=1248)
 Filter: ((logicaldel = 'N'::bpchar) AND (subplan))
 SubPlan
   ->  Limit  (cost=4.30..8.58 rows=1 width=34) (actual 
time=0.171..0.171 rows=1 loops=2982720)
 InitPlan
   ->  GroupAggregate  (cost=0.00..4.30 rows=1 width=110) 
(actual time=0.089..0.089 rows=1 loops=2982720)
 ->  Index Scan using lognote_pk on lognote 
(cost=0.00..4.28 rows=1 width=110) (actual time=0.086..0.087 rows=1 loops=2982720)
   Index Cond: ((clientkey = $0) AND (premiseskey = 
$1) AND (logkey = $2))
   Filter: ((logicaldel = 'N'::bpchar) AND 
((lognotetext ~~ '_%;%'::text) OR (lognotetext ~~ '_%has modified Respond 
Status to%'::text)))
 ->  Index Scan using lognote_pk on lognote  (cost=0.00..4.28 
rows=1 width=34) (actual time=0.170..0.170 rows=1 loops=2982720)
   Index Cond: ((clientkey = $0) AND (premiseskey = $1) AND 
(logkey = $2))
   Filter: ((logicaldel = 'N'::bpchar) AND (lognotetime = 
$3))
 Total runtime: 550712.393 ms
(31 rows)


Either side of the left join runs quite fast independently. (The full 
query also runs well when made into an inner join, but that's not the 
logic I want.) The biggest difference between running each side 
indpendently and together in a left join is that this line in the plan for 
the right side of the left join:


->  Index Scan using lognote_pk on lognote (cost=0.00..4.28 rows=1 width=110) 
(actual time=0.086..0.087 rows=1 loops=2982720)

...becomes this line when run independently:

->  Index Scan using lognote_pk on lognote  (cost=0.00..4.28 rows=1 width=110) 
(actual time=0.086..0.087 rows=1 loops=2390)

That's quite a few more loops in the left join. Am I right to think that 
it's looping so much because the analyzer is so far off when guessing the 
rows for the left side of the join (1 vs. 1248)? Or is there something 
else going on? I've tried bumping up analyze stats on a few columns, but 
I'm not too sure how to spot which columns it might help with and, sure 
enough, it didn't help.
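
For reference, a hedged sketch of "bumping up analyze stats" (the column
choices and the 1000 figure are illustrative; the default statistics target
in 8.2 is 10):

ALTER TABLE log ALTER COLUMN logkey SET STATISTICS 1000;
ALTER TABLE log ALTER COLUMN eventkey SET STATISTICS 1000;
ANALYZE log;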



The actual query:

select *
from
  (
select *
from
  event e,
  severity s,
  premises p,
  log
where
  p.clientkey = e.clientkey and
  p.premiseskey = e.premiseskey and
  p.logicaldel = 'N' and
  log.logicaldel = 'N' and
  e.clientkey = log.clientkey and
  e.premiseskey =

Re: [PERFORM] I/O on select count(*)

2008-05-16 Thread Greg Smith

On Thu, 15 May 2008, Alvaro Herrera wrote:


Starting a transaction does not write anything to pg_clog.


For Matt and others, some details here are in 
src/backend/access/transam/README:


"pg_clog records the commit status for each transaction that has been 
assigned an XID."


"Transactions and subtransactions are assigned permanent XIDs only when/if 
they first do something that requires one --- typically, 
insert/update/delete a tuple, though there are a few other places that 
need an XID assigned."


After reading the code and that documentation a bit, the part I'm still 
not sure about is whether the CLOG entry is created when the XID is 
assigned and then kept current as the state changes, or whether that isn't 
even in CLOG until the transaction is committed.  It seems like the 
latter, but there's some ambiguity in the wording and too many code paths 
for me to map right now.


From there, it doesn't make its way out to disk until the internal CLOG 
buffers are filled, at which point the least recently used buffer there is 
evicted to permanent storage.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] very slow left join

2008-05-16 Thread Scott Marlowe
On Fri, May 16, 2008 at 11:56 AM, Ben <[EMAIL PROTECTED]> wrote:
> I've inherited an Oracle database that I'm porting to Postgres, and this has
> been going quite well until now. Unfortunately, I've found one view (a
> largish left join) that runs several orders of magnitude slower on Postgres
> than it did on Oracle.
>
> => select version();
> version
> --
>  PostgreSQL 8.2.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
> 4.1.1 20070105 (Red Hat 4.1.1-52)
> (1 row)

1: Update to 8.2.7.  It's pretty painless, and who knows what
performance bugs you might be fighting that you don't really need to.

> After analyzing the database, the explain analyze output for the query is:
>
>  Nested Loop Left Join  (cost=133.51..15846.99 rows=1 width=312) (actual
> time=109.131..550711.374 rows=1248 loops=1)
>   Join Filter: (log.logkey = ln.logkey)
>   ->  Nested Loop  (cost=133.51..267.44 rows=1 width=306) (actual
> time=15.316..74.074 rows=1248 loops=1)
SNIP
>  Total runtime: 550712.393 ms

Just for giggles, try running the query like so:

set enable_nestloop = off;
explain analyze ...

and see what happens.  I'm guessing that the nested loops are bad choices here.

>(case when instr(lognotetext,';') = 0 then instr(lognotetext,' has
> modified')
>  else instr(lognotetext,';') end) = 0 then NULL

Try creating indexes on the functions above, and make sure you're
running the db in the C locale if you can. Note you may need to dump /
initdb --locale=C / reload your data if you're not in the C locale
already.  text_pattern_ops may be applicable here, but I'm not sure
how to use it in the above functions.
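
One hedged way to apply text_pattern_ops (the index name is an assumption;
note it only helps left-anchored LIKE patterns, so the leading '_%'
wildcards in the posted filters would still defeat it):

CREATE INDEX lognote_text_pattern_idx
ON lognote (lognotetext text_pattern_ops);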

> Table "public.event"
> Column |Type |   Modifiers
> +-+
>  clientkey  | character(30)   | not null
>  premiseskey| character(30)   | not null
>  eventkey   | character(30)   | not null
>  severitykey| character(30)   |

Do these really need to be character and not varchar?  varchar / text
are better optimized in pgsql, and character often needs to be cast
anyway, so you might as well start with varchar.  Unless you REALLY
need padding in your db, avoid char(x).

Don't see anything else, but who knows what someone else might see.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] very slow left join

2008-05-16 Thread Ben

On Fri, 16 May 2008, Scott Marlowe wrote:


Just for giggles, try running the query like so:

set enable_nestloop = off;
explain analyze ...

and see what happens.  I'm guessing that the nested loops are bad choices here.


You guess correctly, sir! Doing so shaves 3 orders of magnitude off the 
runtime. That's nice. :) But that brings up the question of why postgres 
thinks nested loops are the way to go? It would be handy if I could make 
it guess correctly to begin with and didn't have to turn nested loops off 
each time I run this.
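
One hedged way to avoid changing it globally is to scope the override to a
single transaction with SET LOCAL (the view name below is a hypothetical
stand-in for the posted query):

BEGIN;
SET LOCAL enable_nestloop = off;   -- reverts automatically at COMMIT/ROLLBACK
EXPLAIN ANALYZE SELECT * FROM my_slow_view;   -- hypothetical stand-in
COMMIT;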




Table "public.event"
Column |Type |   Modifiers
+-+
 clientkey  | character(30)   | not null
 premiseskey| character(30)   | not null
 eventkey   | character(30)   | not null
 severitykey| character(30)   |


Do these really need to be character and not varchar?  varchar / text
are better optimized in pgsql, and character often needs to be cast
anyway, so you might as well start with varchar.  Unless you REALLY
need padding in your db, avoid char(x).


Unfortunately, the people who created this database made all keys
30-character strings, and we're not near a place in our release cycle where
we can fix that.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] I/O on select count(*)

2008-05-16 Thread Simon Riggs

On Fri, 2008-05-16 at 14:05 -0400, Greg Smith wrote:
> After reading the code and that documentation a bit, the part I'm
> still not sure about is whether the CLOG entry is created when the XID
> is assigned and then kept current as the state changes, or whether
> that isn't even in CLOG until the transaction is committed.  It seems
> like the latter, but there's some ambiguity in the wording and too
> many code paths for me to map right now.

Alvaro already said this, I thought? The clog is updated only at sub or
main transaction end, thank goodness. When the transaction id is assigned,
the page of the clog that contains that transaction id is checked to see
if it already exists and, if not, it is initialised.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] very slow left join

2008-05-16 Thread Scott Marlowe
On Fri, May 16, 2008 at 12:21 PM, Ben <[EMAIL PROTECTED]> wrote:
> On Fri, 16 May 2008, Scott Marlowe wrote:
>
>> Just for giggles, try running the query like so:
>>
>> set enable_nestloop = off;
>> explain analyze ...
>>
>> and see what happens.  I'm guessing that the nested loops are bad choices
>> here.
>
> You guess correctly, sir! Doing so shaves 3 orders of magnitude off the
> runtime. That's nice. :) But that brings up the question of why postgres
> thinks nested loops are the way to go? It would be handy if I could make it
> guess correctly to begin with and didn't have to turn nested loops off each
> time I run this.

Well, I'm guessing that you aren't in locale=C and that the text
functions in your query aren't indexed.  Try creating an index on them
something like:

create index abc_txtfield_func on mytable (substring(textfield,1,5));

etc and see if that helps.

As for the char type, I totally understand the issue, having inherited
oracle dbs before...

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] very slow left join

2008-05-16 Thread Ben

On Fri, 16 May 2008, Scott Marlowe wrote:


Well, I'm guessing that you aren't in locale=C and that the text


Correct, I am not. And my understanding is that by moving to the C locale,
I would lose UTF-8 validation, so I don't want to go there. Though, it's
news to me that I would get any kind of select performance boost with
locale=C. Why would it help?



functions in your query aren't indexed.  Try creating an index on them
something like:

create index abc_txtfield_func on mytable (substring(textfield,1,5));

etc and see if that helps.


It does not. :(

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] very slow left join

2008-05-16 Thread Craig Ringer

Ben wrote:

On Fri, 16 May 2008, Scott Marlowe wrote:


Well, I'm guessing that you aren't in locale=C and that the text


Correct, I am not. And my understanding is that by moving to the C 
locale, I would lose UTF-8 validation, so I don't want to go there. 
Though, it's news to me that I would get any kind of select performance 
boost with locale=C. Why would it help?


As far as I know the difference is that in the "C" locale PostgreSQL can 
use simple byte-ordinal-oriented rules for sorting, character access, 
etc. It can ignore the possibility of a character being more than one 
byte in size. It can also avoid having to consider pairs of characters 
where the ordinality of the numeric byte value of the characters is not 
the same as the ordinality of the characters in the locale (ie they 
don't sort in byte-value order).


If I've understood it correctly ( I don't use "C" locale databases 
myself and I have not tested any of this ) that means that two UTF-8 
encoded strings stored in a "C" locale database might not compare how 
you expect. They might sort in a different order to what you expect, 
especially if one is a 2-byte or more char and the other is only 1 byte. 
They might compare non-equal even though they contain the same sequence 
of Unicode characters because one is in a decomposed form and one is in 
a precomposed form. The database neither knows the encoding of the 
strings nor cares about it; it's just treating them as byte sequences 
without any interest in their meaning.


If you only ever work with 7-bit ASCII, that might be OK. Ditto if you 
never rely on the database for text sorting and comparison.
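
A small, hedged illustration of that byte-ordinal sorting (the VALUES list
is editorial; the noted orderings assume a C-locale database storing UTF-8
bytes versus a typical linguistic locale such as en_US.UTF-8):

SELECT x
FROM (VALUES ('Zebra'), ('apple'), ('Äpfel')) AS t(x)
ORDER BY x;
-- C locale: Zebra, apple, Äpfel   ('Z' = 0x5A < 'a' = 0x61 < 'Ä' = 0xC3 0x84)
-- en_US.UTF-8 would typically give: apple, Äpfel, Zebra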


Someone please yell at me if I've mistaken something here.

--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] I/O on select count(*)

2008-05-16 Thread Alvaro Herrera
Greg Smith wrote:

> After reading the code and that documentation a bit, the part I'm still  
> not sure about is whether the CLOG entry is created when the XID is  
> assigned and then kept current as the state changes, or whether that 
> isn't even in CLOG until the transaction is committed.  It seems like the 
> latter, but there's some ambiguity in the wording and too many code paths 
> for me to map right now.

pg_clog is allocated in pages of 8kB apiece(*).  On allocation, pages are
zeroed, which is the bit pattern for "transaction in progress".  So when
a transaction starts, it only needs to ensure that the pg_clog page that
corresponds to it is allocated, but it need not write anything to it.

(*) Each transaction needs 2 bits, so on an 8 kB page there is space for
4 transactions/byte * 8192 bytes/page = 32k transactions.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Regexps - never completing join.

2008-05-16 Thread Scott Marlowe
On Wed, May 14, 2008 at 9:33 AM, Rusty Conover <[EMAIL PROTECTED]> wrote:
> Returning to this problem this morning, I made some more insight.
>
> One way I did find that worked to control the loop (but doesn't yield the
> same results because it's a left join)
>
> select wc_rule.id from wc_rule left join classifications on
> classifications.classification ~* wc_rule.regexp;

If you do that and exclude the extra rows added to the right with something like

and wc_rule.somefield IS NOT NULL

does it run fast and give you the same answers as the regular join?

I'm guessing that this could be optimized to use a hash agg method of
joining for text, but I'm no expert on the subject.
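
A hedged sketch of the IS NOT NULL idea; note the filter has to go on the
nullable (classifications) side of the left join for it to behave like the
plain inner join:

SELECT wc_rule.id
FROM wc_rule
LEFT JOIN classifications
  ON classifications.classification ~* wc_rule.regexp
WHERE classifications.classification IS NOT NULL;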

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] I/O on select count(*)

2008-05-16 Thread Alvaro Herrera
Alvaro Herrera wrote:

> pg_clog is allocated in pages of 8kB apiece(*).  On allocation, pages are
> zeroed, which is the bit pattern for "transaction in progress".  So when
> a transaction starts, it only needs to ensure that the pg_clog page that
> corresponds to it is allocated, but it need not write anything to it.

Of course, in 8.3 it's not when the transaction starts, but when the Xid
is assigned (i.e. when the transaction first calls a read-write
command).  In previous versions it happens when the first snapshot is
taken (i.e. normally on the first command of any type, with very few
exceptions.)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Please ignore ...

2008-05-16 Thread Alvaro Herrera
Marc G. Fournier wrote:

> Someone on this list has one of those 'confirm your email' filters on their 
> mailbox, which is bouncing back messages ... this is an attempt to try and 
> narrow down the address that is causing this ...

So it seems you're still unable to determine the problematic address?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Regexps - never completing join.

2008-05-16 Thread Rusty Conover


On May 16, 2008, at 2:35 PM, Scott Marlowe wrote:

On Wed, May 14, 2008 at 9:33 AM, Rusty Conover  
<[EMAIL PROTECTED]> wrote:

Returning to this problem this morning, I made some more insight.

One way I did find that worked to control the loop (but doesn't yield the
same results because it's a left join)

select wc_rule.id from wc_rule left join classifications on
classifications.classification ~* wc_rule.regexp;


If you do that and exclude the extra rows added to the right with
something like


and wc_rule.somefield IS NOT NULL

does it run fast and give you the same answers as the regular join?

I'm guessing that this could be optimized to use a hash agg method of
joining for text, but I'm no expert on the subject.


Hi Scott,

It's not really a hash agg problem, just a looping inside/outside
table selection problem.


The slowdown is really the repeated compilation of the regexps by
RE_compile_and_cache(), because the regexps are being run on the inside
of the loop rather than the outside.  And since the regexp cache is only
32 items big, every match results in a recompilation of the regexp, since
I have about 700 regexps.


Thanks,

Rusty
--
Rusty Conover
InfoGears Inc.
http://www.infogears.com



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Regexps - never completing join.

2008-05-16 Thread Scott Marlowe
On Fri, May 16, 2008 at 3:37 PM, Rusty Conover <[EMAIL PROTECTED]> wrote:
>
> On May 16, 2008, at 2:35 PM, Scott Marlowe wrote:
>
>> On Wed, May 14, 2008 at 9:33 AM, Rusty Conover <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Returning to this problem this morning, I made some more insight.
>>>
>>> One way I did find that worked to control the loop (but doesn't yield the
>>> same results because it's a left join)
>>>
>>> select wc_rule.id from wc_rule left join classifications on
>>> classifications.classification ~* wc_rule.regexp;
>>
>> If you do that and exclude the extra rows added to the right with something
>> like
>>
>> and wc_rule.somefield IS NOT NULL
>>
>> does it run fast and give you the same answers as the regular join?
>>
>> I'm guessing that this could be optimized to use a hash agg method of
>> joining for text, but I'm no expert on the subject.
>
> Hi Scott,
>
> It's not really a hash agg problem really just a looping inside/outside
> table selection problem.
>
> The slowdown is really the repeated compilation of the regexps by
> RE_compile_and_cache(), because the regexps are being run on the inside of
> the loop rather than the outside.  And since the regexp cache is only 32
> items big, every match results in a recompilation of the regexp, since
> I have about 700 regexps.

That's not what I meant.  What I meant was it seems like a good
candidate for a hash aggregate solution.  I'm pretty sure pgsql can't
use hashagg for something like this right now.

If you hashagged each regexp and each column fed through it, you could
probably get good performance, but that's a backend hacker thing, not
something I'd know how to do.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] I/O on select count(*)

2008-05-16 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Greg Smith wrote:
>> After reading the code and that documentation a bit, the part I'm still  
>> not sure about is whether the CLOG entry is created when the XID is  
>> assigned and then kept current as the state changes, or whether that 
>> isn't even in CLOG until the transaction is committed.

> pg_clog is allocated in pages of 8kB apiece(*).  On allocation, pages are
> zeroed, which is the bit pattern for "transaction in progress".  So when
> a transaction starts, it only needs to ensure that the pg_clog page that
> corresponds to it is allocated, but it need not write anything to it.

One additional point: this means that one transaction in every 32K
writing transactions *does* have to do extra work when it assigns itself
an XID, namely create and zero out the next page of pg_clog.  And that
doesn't just slow down the transaction in question, but the next few
guys that would like an XID but arrive on the scene while the
zeroing-out is still in progress.

This probably contributes to the behavior that Simon and Josh regularly
complain about, that our transaction execution time is subject to
unpredictable spikes.  I'm not sure how to get rid of it though.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance