Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space
Also, I'm running version 8.3 on a CentOS box with two dual-core CPUs and 32 GB of RAM.

On May 16, 2008, at 12:58 AM, kevin kempter wrote:

Sorry, I goofed on the query text. Here's the correct query:

select f14.xpublisher_dim_id, f14.xtime_dim_id, f14.xlocation_dim_id,
       f14.xreferrer_dim_id, f14.xsite_dim_id, f14.xsystem_cfg_dim_id,
       f14.xaffiliate_dim_id, f14.customer_id, f14.pf_dts_id, f14.episode_id,
       f14.sessionid, f14.bytes_received, f14.bytes_transmitted,
       f14.total_played_time_sec,
       segdim.xsegment_dim_id as episode_level_segid
from bigtab_stats_fact_tmp14 f14, xsegment_dim segdim
where f14.customer_id = segdim.customer_srcid
  and f14.show_id = segdim.show_srcid
  and f14.season_id = segdim.season_srcid
  and f14.episode_id = segdim.episode_srcid
  and segdim.segment_srcid is NULL;

On May 16, 2008, at 12:31 AM, kevin kempter wrote:

Hi List;

I have a table with 9,961,914 rows in it (see the describe of bigtab_stats_fact_tmp14 below). I also have a table with 7,785 rows in it (see the describe of xsegment_dim below).

I'm running the join shown below and it takes > 10 hours and eventually runs out of disk space on a 1.4TB file system.

I've included below a describe of both tables, the join and an explain plan; any help / suggestions would be much appreciated! I need to get this beast to run as quickly as possible (without filling up my file system).

Thanks in advance...

select f14.xpublisher_dim_id, f14.xtime_dim_id, f14.xlocation_dim_id,
       f14.xreferrer_dim_id, f14.xsite_dim_id, f14.xsystem_cfg_dim_id,
       f14.xaffiliate_dim_id, f14.customer_id, pf_dts_id, episode_id,
       sessionid, bytes_received, bytes_transmitted, total_played_time_sec,
       segdim.xsegment_dim_id as episode_level_segid
from bigtab_stats_fact_tmp14 f14, xsegment_dim segdim
where f14.customer_id = segdim.customer_srcid
  and f14.show_id = segdim.show_srcid
  and f14.season_id = segdim.season_srcid
  and f14.episode_id = segdim.episode_srcid
  and segdim.segment_srcid is NULL;

                              QUERY PLAN
---------------------------------------------------------------------------
 Merge Join  (cost=1757001.74..73569676.49 rows=3191677219 width=118)
   Merge Cond: ((segdim.episode_srcid = f14.episode_id) AND (segdim.customer_srcid = f14.customer_id) AND (segdim.show_srcid = f14.show_id) AND (segdim.season_srcid = f14.season_id))
   ->  Sort  (cost=1570.35..1579.46 rows=3643 width=40)
         Sort Key: segdim.episode_srcid, segdim.customer_srcid, segdim.show_srcid, segdim.season_srcid
         ->  Seq Scan on xsegment_dim segdim  (cost=0.00..1354.85 rows=3643 width=40)
               Filter: (segment_srcid IS NULL)
   ->  Sort  (cost=1755323.26..1780227.95 rows=9961874 width=126)
         Sort Key: f14.episode_id, f14.customer_id, f14.show_id, f14.season_id
         ->  Seq Scan on bigtab_stats_fact_tmp14 f14  (cost=0.00..597355.74 rows=9961874 width=126)
(9 rows)

# \d bigtab_stats_fact_tmp14
         Table "public.bigtab_stats_fact_tmp14"
          Column          |            Type             | Modifiers
--------------------------+-----------------------------+-----------
 pf_dts_id                | bigint                      |
 pf_device_id             | bigint                      |
 segment_id               | bigint                      |
 cdn_id                   | bigint                      |
 collector_id             | bigint                      |
 digital_envoy_id         | bigint                      |
 maxmind_id               | bigint                      |
 quova_id                 | bigint                      |
 website_id               | bigint                      |
 referrer_id              | bigint                      |
 affiliate_id             | bigint                      |
 custom_info_id           | bigint                      |
 start_dt                 | timestamp without time zone |
 total_played_time_sec    | numeric(18,5)               |
 bytes_received           | bigint                      |
 bytes_transmitted        | bigint                      |
 stall_count              | integer                     |
 stall_duration_sec       | numeric(18,5)               |
 hiccup_count             | integer                     |
 hiccup_duration_sec      | numeric(18,5)               |
 watched_duration_sec     | numeric(18,5)               |
 rewatched_duration_sec   | numeric(18,5)               |
 requested_start_position | numeric(18,5)               |
 requested_stop_position  | numeric(18,5)               |
 post_position            | numeric(18,5)               |
 is_vod                   | numeric(1,0)                |
 sessionid                | bigint                      |
 create_dt                | timestamp without time zone |
 segment_type_id          | bigint                      |
 customer_id              | bigint                      |
 content_publisher_id     | bigint                      |
 content_owner_id         | bigint                      |
 episode_id               | bigint
Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space
> I have a table with 9,961,914 rows in it (see the describe of
> bigtab_stats_fact_tmp14 below)
>
> I also have a table with 7,785 rows in it (see the describe of xsegment_dim
> below)
>
> I'm running the join shown below and it takes > 10 hours and eventually runs
> out of disk space on a 1.4TB file system
>
> I've included below a describe of both tables, the join and an explain plan,
> any help / suggestions would be much appreciated !
>
> I need to get this beast to run as quickly as possible (without filling up
> my file system)
>
> Thanks in advance...

What version of postgresql are you using? According to
http://www.postgresql.org/docs/8.2/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY
you may benefit from adjusting work_mem.

You also index segment_srcid (in table xsegment_dim), but if you search for
NULL and you have enough of those it defaults to a seq. scan:

>         ->  Seq Scan on xsegment_dim segdim  (cost=0.00..1354.85 rows=3643 width=40)
>               Filter: (segment_srcid IS NULL)

Maybe you could insert some default value into segment_srcid (some arbitrarily
large number) instead of NULL and then search for values greater than that?

You could also try to lower random_page_cost from the default to 2.

> select f14.xpublisher_dim_id, f14.xtime_dim_id, f14.xlocation_dim_id,
>        f14.xreferrer_dim_id, f14.xsite_dim_id, f14.xsystem_cfg_dim_id,
>        f14.xaffiliate_dim_id, f14.customer_id, pf_dts_id, episode_id,
>        sessionid, bytes_received, bytes_transmitted, total_played_time_sec,
>        segdim.xsegment_dim_id as episode_level_segid
> from bigtab_stats_fact_tmp14 f14, xsegment_dim segdim
> where f14.customer_id = segdim.customer_srcid
>   and f14.show_id = segdim.show_srcid
>   and f14.season_id = segdim.season_srcid
>   and f14.episode_id = segdim.episode_srcid
>   and segdim.segment_srcid is NULL;
>
>                               QUERY PLAN
> ---------------------------------------------------------------------------
>  Merge Join  (cost=1757001.74..73569676.49 rows=3191677219 width=118)
>    Merge Cond: ((segdim.episode_srcid = f14.episode_id) AND (segdim.customer_srcid = f14.customer_id) AND (segdim.show_srcid = f14.show_id) AND (segdim.season_srcid = f14.season_id))
>    ->  Sort  (cost=1570.35..1579.46 rows=3643 width=40)
>          Sort Key: segdim.episode_srcid, segdim.customer_srcid, segdim.show_srcid, segdim.season_srcid
>          ->  Seq Scan on xsegment_dim segdim  (cost=0.00..1354.85 rows=3643 width=40)
>                Filter: (segment_srcid IS NULL)
>    ->  Sort  (cost=1755323.26..1780227.95 rows=9961874 width=126)
>          Sort Key: f14.episode_id, f14.customer_id, f14.show_id, f14.season_id
>          ->  Seq Scan on bigtab_stats_fact_tmp14 f14  (cost=0.00..597355.74 rows=9961874 width=126)
> (9 rows)
>
> # \d bigtab_stats_fact_tmp14
>          Table "public.bigtab_stats_fact_tmp14"
>           Column          |            Type             | Modifiers
> --------------------------+-----------------------------+-----------
>  pf_dts_id                | bigint                      |
>  pf_device_id             | bigint                      |
>  segment_id               | bigint                      |
>  cdn_id                   | bigint                      |
>  collector_id             | bigint                      |
>  digital_envoy_id         | bigint                      |
>  maxmind_id               | bigint                      |
>  quova_id                 | bigint                      |
>  website_id               | bigint                      |
>  referrer_id              | bigint                      |
>  affiliate_id             | bigint                      |
>  custom_info_id           | bigint                      |
>  start_dt                 | timestamp without time zone |
>  total_played_time_sec    | numeric(18,5)               |
>  bytes_received           | bigint                      |
>  bytes_transmitted        | bigint                      |
>  stall_count              | integer                     |
>  stall_duration_sec       | numeric(18,5)               |
>  hiccup_count             | integer                     |
>  hiccup_duration_sec      | numeric(18,5)               |
>  watched_duration_sec     | numeric(18,5)               |
>  rewatched_duration_sec   | numeric(18,5)               |
>  requested_start_position | numeric(18,5)               |
>  requested_stop_position  | numeric(18,5)               |
>  post_position            | numeric(18,5)               |
>  is_vod                   | numeric(1,0)                |
>  sessionid                | bigint                      |
>  create_dt                | timestamp without time zone |
>  segment_type_id          | bigint                      |
>  customer_id              | bigint                      |
>  content_publisher_id     | bigint                      |
>  content_owner_id         | bigint                      |
>  episode_id               | bigint                      |
>  duration_sec
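A sketch of these suggestions in SQL, assuming the table and column names shown above. The partial index is an alternative to back-filling a sentinel value for the NULLs, and the settings are session-local, so they are easy to test and revert; the values and the index name are only examples, not something from the thread:

-- Session-local memory for the big sort (value is only an example).
SET work_mem = '512MB';

-- Make random I/O look cheaper to the planner (the default is 4.0).
SET random_page_cost = 2;

-- Instead of replacing NULLs with a sentinel, a partial index can target
-- exactly the rows the query filters on (segment_srcid IS NULL).
CREATE INDEX xsegment_dim_null_seg_idx
    ON xsegment_dim (customer_srcid, show_srcid, season_srcid, episode_srcid)
    WHERE segment_srcid IS NULL;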
Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space
Sorry, I goofed on the query text. Here's the correct query:

select f14.xpublisher_dim_id, f14.xtime_dim_id, f14.xlocation_dim_id,
       f14.xreferrer_dim_id, f14.xsite_dim_id, f14.xsystem_cfg_dim_id,
       f14.xaffiliate_dim_id, f14.customer_id, f14.pf_dts_id, f14.episode_id,
       f14.sessionid, f14.bytes_received, f14.bytes_transmitted,
       f14.total_played_time_sec,
       segdim.xsegment_dim_id as episode_level_segid
from bigtab_stats_fact_tmp14 f14, xsegment_dim segdim
where f14.customer_id = segdim.customer_srcid
  and f14.show_id = segdim.show_srcid
  and f14.season_id = segdim.season_srcid
  and f14.episode_id = segdim.episode_srcid
  and segdim.segment_srcid is NULL;

On May 16, 2008, at 12:31 AM, kevin kempter wrote:

Hi List;

I have a table with 9,961,914 rows in it (see the describe of bigtab_stats_fact_tmp14 below). I also have a table with 7,785 rows in it (see the describe of xsegment_dim below).

I'm running the join shown below and it takes > 10 hours and eventually runs out of disk space on a 1.4TB file system.

I've included below a describe of both tables, the join and an explain plan; any help / suggestions would be much appreciated! I need to get this beast to run as quickly as possible (without filling up my file system).

Thanks in advance...

select f14.xpublisher_dim_id, f14.xtime_dim_id, f14.xlocation_dim_id,
       f14.xreferrer_dim_id, f14.xsite_dim_id, f14.xsystem_cfg_dim_id,
       f14.xaffiliate_dim_id, f14.customer_id, pf_dts_id, episode_id,
       sessionid, bytes_received, bytes_transmitted, total_played_time_sec,
       segdim.xsegment_dim_id as episode_level_segid
from bigtab_stats_fact_tmp14 f14, xsegment_dim segdim
where f14.customer_id = segdim.customer_srcid
  and f14.show_id = segdim.show_srcid
  and f14.season_id = segdim.season_srcid
  and f14.episode_id = segdim.episode_srcid
  and segdim.segment_srcid is NULL;

                              QUERY PLAN
---------------------------------------------------------------------------
 Merge Join  (cost=1757001.74..73569676.49 rows=3191677219 width=118)
   Merge Cond: ((segdim.episode_srcid = f14.episode_id) AND (segdim.customer_srcid = f14.customer_id) AND (segdim.show_srcid = f14.show_id) AND (segdim.season_srcid = f14.season_id))
   ->  Sort  (cost=1570.35..1579.46 rows=3643 width=40)
         Sort Key: segdim.episode_srcid, segdim.customer_srcid, segdim.show_srcid, segdim.season_srcid
         ->  Seq Scan on xsegment_dim segdim  (cost=0.00..1354.85 rows=3643 width=40)
               Filter: (segment_srcid IS NULL)
   ->  Sort  (cost=1755323.26..1780227.95 rows=9961874 width=126)
         Sort Key: f14.episode_id, f14.customer_id, f14.show_id, f14.season_id
         ->  Seq Scan on bigtab_stats_fact_tmp14 f14  (cost=0.00..597355.74 rows=9961874 width=126)
(9 rows)

# \d bigtab_stats_fact_tmp14
         Table "public.bigtab_stats_fact_tmp14"
          Column          |            Type             | Modifiers
--------------------------+-----------------------------+-----------
 pf_dts_id                | bigint                      |
 pf_device_id             | bigint                      |
 segment_id               | bigint                      |
 cdn_id                   | bigint                      |
 collector_id             | bigint                      |
 digital_envoy_id         | bigint                      |
 maxmind_id               | bigint                      |
 quova_id                 | bigint                      |
 website_id               | bigint                      |
 referrer_id              | bigint                      |
 affiliate_id             | bigint                      |
 custom_info_id           | bigint                      |
 start_dt                 | timestamp without time zone |
 total_played_time_sec    | numeric(18,5)               |
 bytes_received           | bigint                      |
 bytes_transmitted        | bigint                      |
 stall_count              | integer                     |
 stall_duration_sec       | numeric(18,5)               |
 hiccup_count             | integer                     |
 hiccup_duration_sec      | numeric(18,5)               |
 watched_duration_sec     | numeric(18,5)               |
 rewatched_duration_sec   | numeric(18,5)               |
 requested_start_position | numeric(18,5)               |
 requested_stop_position  | numeric(18,5)               |
 post_position            | numeric(18,5)               |
 is_vod                   | numeric(1,0)                |
 sessionid                | bigint                      |
 create_dt                | timestamp without time zone |
 segment_type_id          | bigint                      |
 customer_id              | bigint                      |
 content_publisher_id     | bigint                      |
 content_owner_id         | bigint                      |
 episode_id               | bigint                      |
 duration_sec             | numeric(18,5)               |
 device_id                | bigint                      |
 os_id
Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space
On Fri, 2008-05-16 at 00:31 -0600, kevin kempter wrote:

> I'm running the join shown below and it takes > 10 hours and
> eventually runs out of disk space on a 1.4TB file system

Well, running for 10 hours doesn't mean there's a software problem, nor does running out of disk space.

Please crunch some numbers before you ask, such as how much disk space was used by the query, how big you'd expect it to be, etc., plus provide information such as what the primary key of the large table is, what your release level is, and so on.

Are you sure you want to retrieve an estimated 3 billion rows? Can you cope if that estimate is wrong and the true figure is much higher? Do you think the estimate is realistic?

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
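One inexpensive way to crunch those numbers, as a sketch against the query already posted: run the join as a COUNT(*) and compare the count with the planner's 3.2-billion-row estimate. It returns a single row to the client, so while the sorts still need some temp space, nothing like the full result set gets written out:

SELECT count(*)
FROM bigtab_stats_fact_tmp14 f14, xsegment_dim segdim
WHERE f14.customer_id = segdim.customer_srcid
  AND f14.show_id     = segdim.show_srcid
  AND f14.season_id   = segdim.season_srcid
  AND f14.episode_id  = segdim.episode_srcid
  AND segdim.segment_srcid IS NULL;
-- compare the result against the 9,961,914 rows you expect and the
-- 3,191,677,219 rows the planner expects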
Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space
kevin kempter wrote:
> Hi List;
> I have a table with 9,961,914 rows in it (see the describe of
> bigtab_stats_fact_tmp14 below)
> I also have a table with 7,785 rows in it (see the describe of xsegment_dim
> below)
> I'm running the join shown below and it takes > 10 hours and eventually
> runs out of disk space on a 1.4TB file system
>
> QUERY PLAN
> ---
>  Merge Join  (cost=1757001.74..73569676.49 rows=3191677219 width=118)

Dumb question Kevin, but are you really expecting 3.2 billion rows in the result-set? Because that's approaching 400GB of result-set without any overheads.

-- 
Richard Huxton
Archonet Ltd
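The arithmetic behind that figure, as a quick back-of-the-envelope check using only the row estimate and tuple width from the posted plan:

-- 3,191,677,219 rows x 118 bytes per row, expressed in GB
SELECT 3191677219::numeric * 118 / (1024*1024*1024) AS approx_result_gb;
-- => roughly 350 GB before any per-row overhead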
Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space
I'm expecting 9,961,914 rows returned. Each row in the big table should have a corresponding key in the smaller table. I want to basically "expand" the big table column list by one, by adding the appropriate key from the smaller table for each row in the big table. It's not a cartesian product join.

On May 16, 2008, at 1:40 AM, Richard Huxton wrote:

> kevin kempter wrote:
>> Hi List;
>> I have a table with 9,961,914 rows in it (see the describe of
>> bigtab_stats_fact_tmp14 below)
>> I also have a table with 7,785 rows in it (see the describe of
>> xsegment_dim below)
>> I'm running the join shown below and it takes > 10 hours and eventually
>> runs out of disk space on a 1.4TB file system
>>
>>  Merge Join  (cost=1757001.74..73569676.49 rows=3191677219 width=118)
>
> Dumb question Kevin, but are you really expecting 3.2 billion rows in the
> result-set? Because that's approaching 400GB of result-set without any
> overheads.
>
> --
> Richard Huxton
> Archonet Ltd
Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space
kevin kempter wrote:
> I'm expecting 9,961,914 rows returned. Each row in the big table should
> have a corresponding key in the smaller table. I want to basically "expand"
> the big table column list by one, by adding the appropriate key from the
> smaller table for each row in the big table. It's not a cartesian product
> join.

Didn't seem likely, to be honest.

What happens if you try the query as a cursor, perhaps with an order-by on customer_id or something to encourage index use? Do you ever get a first row back?

In fact, what happens if you slap an index over all your join columns on xsegment_dim? With 7,000 rows that should make it a cheap test.

-- 
Richard Huxton
Archonet Ltd
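A sketch of both suggestions, assuming the table and column names from the earlier \d output and plan; the index name is made up, and the cursor only fetches a handful of rows to see whether anything ever comes back:

-- Index over all four join columns on the small dimension table.
CREATE INDEX xsegment_dim_join_idx
    ON xsegment_dim (customer_srcid, show_srcid, season_srcid, episode_srcid);
ANALYZE xsegment_dim;

-- Run the query as a cursor and see whether a first row ever arrives.
BEGIN;
DECLARE c CURSOR FOR
    SELECT f14.customer_id, segdim.xsegment_dim_id
    FROM bigtab_stats_fact_tmp14 f14, xsegment_dim segdim
    WHERE f14.customer_id = segdim.customer_srcid
      AND f14.show_id     = segdim.show_srcid
      AND f14.season_id   = segdim.season_srcid
      AND f14.episode_id  = segdim.episode_srcid
      AND segdim.segment_srcid IS NULL
    ORDER BY f14.customer_id;
FETCH 10 FROM c;
COMMIT;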
Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space
kevin kempter wrote:
> Hi List;
> I have a table with 9,961,914 rows in it (see the describe of
> bigtab_stats_fact_tmp14 below)
> I also have a table with 7,785 rows in it (see the describe of xsegment_dim
> below)

Something else is puzzling me with this - you're joining over four fields.

> from bigtab_stats_fact_tmp14 f14, xsegment_dim segdim
> where f14.customer_id = segdim.customer_srcid
>   and f14.show_id = segdim.show_srcid
>   and f14.season_id = segdim.season_srcid
>   and f14.episode_id = segdim.episode_srcid
>   and segdim.segment_srcid is NULL;
>
> ---
>  Merge Join  (cost=1757001.74..73569676.49 rows=3191677219 width=118)
>    ->  Sort  (cost=1570.35..1579.46 rows=3643 width=40)
>    ->  Sort  (cost=1755323.26..1780227.95 rows=9961874 width=126)

Here it's still expecting 320 matches against each row from the large table. That's ~ 10% of the small table (or that fraction of it that PG expects), which seems very high for four clauses ANDed together.

-- 
Richard Huxton
Archonet Ltd
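A quick way to check how many dimension rows really match each key combination; this is a sketch that touches only the 7,785-row table, so it is cheap to run:

-- How many segment rows share each (customer, show, season, episode) key
-- among the rows the query can see (segment_srcid IS NULL)?
SELECT customer_srcid, show_srcid, season_srcid, episode_srcid,
       count(*) AS rows_per_key
FROM xsegment_dim
WHERE segment_srcid IS NULL
GROUP BY customer_srcid, show_srcid, season_srcid, episode_srcid
ORDER BY rows_per_key DESC
LIMIT 20;
-- Any key combination with a count above 1 will fan out the join.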
Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space
Try 'set enable_mergejoin=false' and see if you get a hashjoin.

- Luke

----- Original Message -----
From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
To: Richard Huxton <[EMAIL PROTECTED]>
Cc: pgsql-performance@postgresql.org
Sent: Fri May 16 04:00:41 2008
Subject: Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space

I'm expecting 9,961,914 rows returned. Each row in the big table should have a corresponding key in the smaller table. I want to basically "expand" the big table column list by one, by adding the appropriate key from the smaller table for each row in the big table. It's not a cartesian product join.

On May 16, 2008, at 1:40 AM, Richard Huxton wrote:

> kevin kempter wrote:
>> Hi List;
>> I have a table with 9,961,914 rows in it (see the describe of
>> bigtab_stats_fact_tmp14 below)
>> I also have a table with 7,785 rows in it (see the describe of
>> xsegment_dim below)
>> I'm running the join shown below and it takes > 10 hours and
>> eventually runs out of disk space on a 1.4TB file system
>
>> QUERY PLAN
>> ---
>>  Merge Join  (cost=1757001.74..73569676.49 rows=3191677219 width=118)
>
> Dumb question Kevin, but are you really expecting 3.2 billion rows
> in the result-set? Because that's approaching 400GB of result-set
> without any overheads.
>
> --
> Richard Huxton
> Archonet Ltd
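A sketch of that test; the setting is per-session, so it only affects the current connection and can be flipped back afterwards:

SET enable_mergejoin = off;   -- discourage the merge join for this session
EXPLAIN
SELECT f14.customer_id, segdim.xsegment_dim_id
FROM bigtab_stats_fact_tmp14 f14, xsegment_dim segdim
WHERE f14.customer_id = segdim.customer_srcid
  AND f14.show_id     = segdim.show_srcid
  AND f14.season_id   = segdim.season_srcid
  AND f14.episode_id  = segdim.episode_srcid
  AND segdim.segment_srcid IS NULL;
RESET enable_mergejoin;       -- restore the default afterwards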
Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space
On further investigation it turns out that I/we have a serious data issue, in that my small table is full of 'UNKNOWN' tags, so my query cannot associate the data correctly - thus I will end up with 2+ billion rows.

Thanks everyone for your help.

On May 16, 2008, at 1:38 AM, Simon Riggs wrote:

> On Fri, 2008-05-16 at 00:31 -0600, kevin kempter wrote:
>> I'm running the join shown below and it takes > 10 hours and
>> eventually runs out of disk space on a 1.4TB file system
>
> Well, running for 10 hours doesn't mean there's a software problem, nor
> does running out of disk space.
>
> Please crunch some numbers before you ask, such as how much disk space
> was used by the query, how big you'd expect it to be, etc., plus provide
> information such as what the primary key of the large table is, what
> your release level is, and so on.
>
> Are you sure you want to retrieve an estimated 3 billion rows? Can you
> cope if that estimate is wrong and the true figure is much higher? Do
> you think the estimate is realistic?
>
> --
> Simon Riggs
> www.2ndQuadrant.com
> PostgreSQL Training, Services and Support
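For the record, one sketch of a guard against this kind of fan-out, assuming the episode-level rows (segment_srcid IS NULL) are meant to be unique per key; the index name is made up:

-- Fails with a "could not create unique index" error while duplicate
-- ('UNKNOWN'-style) rows are still present, which pinpoints the offenders,
-- and prevents the data problem from being reintroduced once it succeeds.
CREATE UNIQUE INDEX xsegment_dim_episode_key
    ON xsegment_dim (customer_srcid, show_srcid, season_srcid, episode_srcid)
    WHERE segment_srcid IS NULL;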
Re: [PERFORM] which ext3 fs type should I use for postgresql
On Thu, 15 May 2008, [EMAIL PROTECTED] wrote:

On Thu, 15 May 2008, Matthew Wakeling wrote:

On Thu, 15 May 2008, Philippe Amelant wrote:
using mkfs.ext3 I can use "-T" to tune the filesystem
mkfs.ext3 -T fs_type ...
fs_type are in /etc/mke2fs.conf (on debian)

If you look at that file, you'd see that tuning really doesn't change that much. In fact, the only thing it does change (if you avoid "small" and "floppy") is the number of inodes available in the filesystem. Since Postgres tends to produce few large files, you don't need that many inodes, so the "largefile" option may be best. However, note that the number of inodes is a hard limit of the filesystem - if you try to create more files on the filesystem than there are available inodes, then you will get an out of space error even if the filesystem has space left. The only real benefit of having not many inodes is that you waste a little less space, so many admins are pretty generous with this setting.

IIRC postgres likes to do 1M/file, which isn't very large as far as the -T setting goes. Probably of more use are some of the other settings:

-m reserved-blocks-percentage - this reserves a portion of the filesystem that only root can write to. If root has no need for it, you can kill this by setting it to zero. The default is for 5% of the disc to be wasted.

think twice about this. ext2/3 get slow when they fill up (they have fragmentation problems when free space gets too small); this 5% that only root can use also serves as a buffer against that as well.

-j turns the filesystem into ext3 instead of ext2 - many people say that for Postgres you shouldn't do this, as ext2 is faster.

for the partition with the WAL on it you may as well do ext2 (the WAL is written synchronously and sequentially so the journal doesn't help you), but for the data partition you may benefit from the journal.

a fairly recent article on the subject:
http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/

David Lang
Re: [PERFORM] which ext3 fs type should I use for postgresql
On Thu, May 15, 2008 at 9:38 AM, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
> Guillaume Cottenceau wrote:
>>
>> Matthew Wakeling writes:
>>
>> It is still relevant, as with 5% margin, you can afford changing
>> that to 0% with tune2fs, just the time for you to start PG and
>> remove some data by SQL, then shutdown and set the margin to 5%
>> again.
>
> I find that if you actually reach that level of capacity failure it is due
> to lack of management and likely there is much lower hanging fruit left over
> by a lazy dba or sysadmin than having to adjust filesystem level parameters.
>
> Manage actively and the above change is absolutely irrelevant.

Sorry, but that's like saying that open heart surgery isn't a fix for clogged arteries because you should have been taking aspirin every day and exercising. It might not be the best answer, but sometimes it's the only answer you've got.

I know that being able to drop the margin from x% to 0% for 10 minutes has pulled more than one db back from the brink for me (usually consulting on other people's databases, only once or so on my own) :)
Re: [PERFORM] which ext3 fs type should I use for postgresql
On Fri, 16 May 2008 11:07:17 -0600 "Scott Marlowe" <[EMAIL PROTECTED]> wrote:

> Sorry, but that's like saying that open heart surgery isn't a fix for
> clogged arteries because you should have been taking aspirin every day
> and exercising. It might not be the best answer, but sometimes it's
> the only answer you've got.
>
> I know that being able to drop the margin from x% to 0% for 10 minutes
> has pulled more than one db back from the brink for me (usually
> consulting on other people's databases, only once or so on my own) :)

My point is, if you are adjusting that parameter you probably have a stray log or a bunch of rpms etc... that can be truncated to get you where you need to be. Of course there is always the last-ditch effort of what you suggest, but first you should look for the more obvious possible solution.

Sincerely,

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
[PERFORM] very slow left join
I've inherited an Oracle database that I'm porting to Postgres, and this has been going quite well until now. Unfortunately, I've found one view (a largish left join) that runs several orders of magnitude slower on Postgres than it did on Oracle.

=> select version();
                                        version
----------------------------------------------------------------------------------------
 PostgreSQL 8.2.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-52)
(1 row)

After analyzing the database, the explain analyze output for the query is:

 Nested Loop Left Join  (cost=133.51..15846.99 rows=1 width=312) (actual time=109.131..550711.374 rows=1248 loops=1)
   Join Filter: (log.logkey = ln.logkey)
   ->  Nested Loop  (cost=133.51..267.44 rows=1 width=306) (actual time=15.316..74.074 rows=1248 loops=1)
         ->  Merge Join  (cost=133.51..267.16 rows=1 width=325) (actual time=15.300..60.332 rows=1248 loops=1)
               Merge Cond: (log.eventkey = e.eventkey)
               Join Filter: ((e.clientkey = log.clientkey) AND (e.premiseskey = log.premiseskey))
               ->  Index Scan using log_eventkey_idx on log  (cost=0.00..3732.14 rows=36547 width=167) (actual time=0.015..25.385 rows=36547 loops=1)
                     Filter: (logicaldel = 'N'::bpchar)
               ->  Sort  (cost=133.51..135.00 rows=595 width=328) (actual time=15.185..16.379 rows=1248 loops=1)
                     Sort Key: e.eventkey
                     ->  Hash Join  (cost=1.30..106.09 rows=595 width=328) (actual time=0.073..2.033 rows=1248 loops=1)
                           Hash Cond: ((e.clientkey = p.clientkey) AND (e.premiseskey = p.premiseskey))
                           ->  Seq Scan on event e  (cost=0.00..89.48 rows=1248 width=246) (actual time=0.005..0.481 rows=1248 loops=1)
                           ->  Hash  (cost=1.14..1.14 rows=11 width=82) (actual time=0.059..0.059 rows=11 loops=1)
                                 ->  Seq Scan on premises p  (cost=0.00..1.14 rows=11 width=82) (actual time=0.004..0.020 rows=11 loops=1)
                                       Filter: (logicaldel = 'N'::bpchar)
         ->  Index Scan using severity_pk on severity s  (cost=0.00..0.27 rows=1 width=49) (actual time=0.007..0.009 rows=1 loops=1248)
               Index Cond: (e.severitykey = s.severitykey)
   ->  Seq Scan on lognote ln1  (cost=0.00..15552.67 rows=1195 width=175) (actual time=1.173..440.695 rows=1244 loops=1248)
         Filter: ((logicaldel = 'N'::bpchar) AND (subplan))
         SubPlan
           ->  Limit  (cost=4.30..8.58 rows=1 width=34) (actual time=0.171..0.171 rows=1 loops=2982720)
                 InitPlan
                   ->  GroupAggregate  (cost=0.00..4.30 rows=1 width=110) (actual time=0.089..0.089 rows=1 loops=2982720)
                         ->  Index Scan using lognote_pk on lognote  (cost=0.00..4.28 rows=1 width=110) (actual time=0.086..0.087 rows=1 loops=2982720)
                               Index Cond: ((clientkey = $0) AND (premiseskey = $1) AND (logkey = $2))
                               Filter: ((logicaldel = 'N'::bpchar) AND ((lognotetext ~~ '_%;%'::text) OR (lognotetext ~~ '_%has modified Respond Status to%'::text)))
                 ->  Index Scan using lognote_pk on lognote  (cost=0.00..4.28 rows=1 width=34) (actual time=0.170..0.170 rows=1 loops=2982720)
                       Index Cond: ((clientkey = $0) AND (premiseskey = $1) AND (logkey = $2))
                       Filter: ((logicaldel = 'N'::bpchar) AND (lognotetime = $3))
 Total runtime: 550712.393 ms
(31 rows)

Either side of the left join runs quite fast independently. (The full query also runs well when made into an inner join, but that's not the logic I want.)
The biggest difference between running each side independently and together in a left join is that this line in the plan for the right side of the left join:

   ->  Index Scan using lognote_pk on lognote  (cost=0.00..4.28 rows=1 width=110) (actual time=0.086..0.087 rows=1 loops=2982720)

...becomes this line when run independently:

   ->  Index Scan using lognote_pk on lognote  (cost=0.00..4.28 rows=1 width=110) (actual time=0.086..0.087 rows=1 loops=2390)

That's quite a few more loops in the left join. Am I right to think that it's looping so much because the analyzer is so far off when guessing the rows for the left side of the join (1 vs. 1248)? Or is there something else going on? I've tried bumping up analyze stats on a few columns, but I'm not too sure how to spot which columns it might help with and, sure enough, it didn't help.

The actual query:

select *
from (
  select *
  from event e, severity s, premises p, log
  where p.clientkey = e.clientkey
    and p.premiseskey = e.premiseskey
    and p.logicaldel = 'N'
    and log.logicaldel = 'N'
    and e.clientkey = log.clientkey
    and e.premiseskey =
Re: [PERFORM] I/O on select count(*)
On Thu, 15 May 2008, Alvaro Herrera wrote:

> Starting a transaction does not write anything to pg_clog.

For Matt and others, some details here are in src/backend/access/transam/README:

"pg_clog records the commit status for each transaction that has been assigned an XID."

"Transactions and subtransactions are assigned permanent XIDs only when/if they first do something that requires one --- typically, insert/update/delete a tuple, though there are a few other places that need an XID assigned."

After reading the code and that documentation a bit, the part I'm still not sure about is whether the CLOG entry is created when the XID is assigned and then kept current as the state changes, or whether that isn't even in CLOG until the transaction is committed. It seems like the latter, but there's some ambiguity in the wording and too many code paths for me to map right now.

From there, it doesn't make its way out to disk until the internal CLOG buffers are filled, at which point the least recently used buffer there is evicted to permanent storage.

--
* Greg Smith  [EMAIL PROTECTED]  http://www.gregsmith.com  Baltimore, MD
Re: [PERFORM] very slow left join
On Fri, May 16, 2008 at 11:56 AM, Ben <[EMAIL PROTECTED]> wrote:
> I've inherited an Oracle database that I'm porting to Postgres, and this has
> been going quite well until now. Unfortunately, I've found one view (a
> largish left join) that runs several orders of magnitude slower on Postgres
> than it did on Oracle.
>
> => select version();
>                                         version
> ----------------------------------------------------------------------------
>  PostgreSQL 8.2.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
>  4.1.1 20070105 (Red Hat 4.1.1-52)
> (1 row)

1: Update to 8.2.7. It's pretty painless, and who knows what performance bugs you might be fighting that you don't really need to.

> After analyzing the database, the explain analyze output for the query is:
>
>  Nested Loop Left Join  (cost=133.51..15846.99 rows=1 width=312) (actual
> time=109.131..550711.374 rows=1248 loops=1)
>    Join Filter: (log.logkey = ln.logkey)
>    ->  Nested Loop  (cost=133.51..267.44 rows=1 width=306) (actual
> time=15.316..74.074 rows=1248 loops=1)
SNIP
>  Total runtime: 550712.393 ms

Just for giggles, try running the query like so:

set enable_nestloop = off;
explain analyze ...

and see what happens. I'm guessing that the nested loops are bad choices here.

> (case when instr(lognotetext,';') = 0 then instr(lognotetext,' has modified')
>  else instr(lognotetext,';') end) = 0 then NULL

Try creating indexes on the functions above, and make sure you're running the db in the C locale if you can. Note you may need to dump / initdb --locale=C / reload your data if you're not in the C locale already. text_pattern_ops may be applicable here, but I'm not sure how to use it in the above functions.

>        Table "public.event"
>    Column    |     Type      | Modifiers
> -------------+---------------+-----------
>  clientkey   | character(30) | not null
>  premiseskey | character(30) | not null
>  eventkey    | character(30) | not null
>  severitykey | character(30) |

Do these really need to be character and not varchar? varchar / text are better optimized in pgsql, and character often needs to be cast anyway, so you might as well start with varchar. Unless you REALLY need padding in your db, avoid char(x).

Don't see anything else, but who knows what someone else might see.
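A sketch of the nested-loop test and the char-to-varchar change; the setting is session-local, and the ALTER (which rewrites the table and its indexes) is only an illustration against one of the columns shown above:

SET enable_nestloop = off;   -- per-session: planner strongly avoids nested loops
-- re-run the EXPLAIN ANALYZE of the view's query here and compare plans
RESET enable_nestloop;       -- back to the default for this session

-- If the char(30) keys can be changed (this rewrites the table):
ALTER TABLE event ALTER COLUMN clientkey TYPE varchar(30);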
Re: [PERFORM] very slow left join
On Fri, 16 May 2008, Scott Marlowe wrote:

> Just for giggles, try running the query like so:
>
> set enable_nestloop = off;
> explain analyze ...
>
> and see what happens. I'm guessing that the nested loops are bad choices
> here.

You guessed correctly, sir! Doing so shaves 3 orders of magnitude off the runtime. That's nice. :) But that brings up the question of why postgres thinks nested loops are the way to go? It would be handy if I could make it guess correctly to begin with and didn't have to turn nested loops off each time I run this.

>>        Table "public.event"
>>    Column    |     Type      | Modifiers
>> -------------+---------------+-----------
>>  clientkey   | character(30) | not null
>>  premiseskey | character(30) | not null
>>  eventkey    | character(30) | not null
>>  severitykey | character(30) |
>
> Do these really need to be character and not varchar? varchar / text are
> better optimized in pgsql, and character often needs to be cast anyway, so
> you might as well start with varchar. Unless you REALLY need padding in
> your db, avoid char(x).

Unfortunately, the people who created this database made all keys 30 character strings, and we're not near a place in our release cycle where we can fix that.
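If the setting has to be flipped every time, SET LOCAL keeps it scoped to a single transaction instead of the whole session; a sketch, where the view name is hypothetical:

BEGIN;
SET LOCAL enable_nestloop = off;   -- reverts automatically at COMMIT/ROLLBACK
SELECT * FROM the_slow_view;       -- hypothetical name for the problem view
COMMIT;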
Re: [PERFORM] I/O on select count(*)
On Fri, 2008-05-16 at 14:05 -0400, Greg Smith wrote:

> After reading the code and that documentation a bit, the part I'm
> still not sure about is whether the CLOG entry is created when the XID
> is assigned and then kept current as the state changes, or whether
> that isn't even in CLOG until the transaction is committed. It seems
> like the latter, but there's some ambiguity in the wording and too
> many code paths for me to map right now.

Alvaro already said this, I thought? The clog is updated only at sub or main transaction end, thank goodness. When the transactionid is assigned the page of the clog that contains that transactionid is checked to see if it already exists and if not, it is initialised.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [PERFORM] very slow left join
On Fri, May 16, 2008 at 12:21 PM, Ben <[EMAIL PROTECTED]> wrote:
> On Fri, 16 May 2008, Scott Marlowe wrote:
>
>> Just for giggles, try running the query like so:
>>
>> set enable_nestloop = off;
>> explain analyze ...
>>
>> and see what happens. I'm guessing that the nested loops are bad choices
>> here.
>
> You guessed correctly, sir! Doing so shaves 3 orders of magnitude off the
> runtime. That's nice. :) But that brings up the question of why postgres
> thinks nested loops are the way to go? It would be handy if I could make it
> guess correctly to begin with and didn't have to turn nested loops off each
> time I run this.

Well, I'm guessing that you aren't in locale=C and that the text functions in your query aren't indexed. Try creating an index on them, something like:

create index abc_txtfield_func on mytable (substring(textfield,1,5));

etc and see if that helps.

As for the char type, I totally understand the issue, having inherited oracle dbs before...
Re: [PERFORM] very slow left join
On Fri, 16 May 2008, Scott Marlowe wrote:

> Well, I'm guessing that you aren't in locale=C and that the text

Correct, I am not. And my understanding is that by moving to the C locale, I would lose utf8 validation, so I don't want to go there. Though, it's news to me that I would get any kind of select performance boost with locale=C. Why would it help?

> functions in your query aren't indexed. Try creating an index on them,
> something like:
>
> create index abc_txtfield_func on mytable (substring(textfield,1,5));
>
> etc and see if that helps.

It does not. :(
Re: [PERFORM] very slow left join
Ben wrote:
> On Fri, 16 May 2008, Scott Marlowe wrote:
>
>> Well, I'm guessing that you aren't in locale=C and that the text
>
> Correct, I am not. And my understanding is that by moving to the C locale,
> I would lose utf8 validation, so I don't want to go there. Though, it's
> news to me that I would get any kind of select performance boost with
> locale=C. Why would it help?

As far as I know the difference is that in the "C" locale PostgreSQL can use simple byte-ordinal-oriented rules for sorting, character access, etc. It can ignore the possibility of a character being more than one byte in size. It can also avoid having to consider pairs of characters where the ordinality of the numeric byte value of the characters is not the same as the ordinality of the characters in the locale (ie they don't sort in byte-value order).

If I've understood it correctly (I don't use "C" locale databases myself and I have not tested any of this) that means that two UTF-8 encoded strings stored in a "C" locale database might not compare how you expect. They might sort in a different order to what you expect, especially if one is a 2-byte or more char and the other is only 1 byte. They might compare non-equal even though they contain the same sequence of Unicode characters because one is in a decomposed form and one is in a precomposed form. The database neither knows the encoding of the strings nor cares about it; it's just treating them as byte sequences without any interest in their meaning.

If you only ever work with 7-bit ASCII, that might be OK. Ditto if you never rely on the database for text sorting and comparison.

Someone please yell at me if I've mistaken something here.

--
Craig Ringer
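A sketch of the text_pattern_ops idea mentioned earlier in the thread: it keeps the database in its current locale (so sorting and utf8 behaviour are unchanged) while giving left-anchored pattern matches a byte-wise index to use; the index name is made up:

-- text_pattern_ops compares byte-by-byte, so left-anchored LIKE / regex
-- matches can use this index even under a non-C locale.
CREATE INDEX lognote_text_like_idx
    ON lognote (lognotetext text_pattern_ops);

Whether it helps this particular view is doubtful, since its LIKE patterns start with a wildcard ('_%...'), but it is the usual middle ground between a non-C locale and a dump/initdb --locale=C/reload.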
Re: [PERFORM] I/O on select count(*)
Greg Smith wrote:

> After reading the code and that documentation a bit, the part I'm still
> not sure about is whether the CLOG entry is created when the XID is
> assigned and then kept current as the state changes, or whether that
> isn't even in CLOG until the transaction is committed. It seems like the
> latter, but there's some ambiguity in the wording and too many code paths
> for me to map right now.

pg_clog is allocated in pages of 8kB apiece(*). On allocation, pages are zeroed, which is the bit pattern for "transaction in progress". So when a transaction starts, it only needs to ensure that the pg_clog page that corresponds to it is allocated, but it need not write anything to it.

(*) Each transaction needs 2 bits, so on an 8 kB page there is space for 4 transactions/byte * 8192 bytes/page = 32k transactions.

--
Alvaro Herrera                          http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [PERFORM] Regexps - never completing join.
On Wed, May 14, 2008 at 9:33 AM, Rusty Conover <[EMAIL PROTECTED]> wrote:
> Returning to this problem this morning, I made some more insight.
>
> One way I did find that worked to control the loop (but doesn't yield the
> same results because it's a left join)
>
> select wc_rule.id from wc_rule left join classifications on
> classifications.classification ~* wc_rule.regexp;

If you do that and exclude the extra rows added to the right with something like

and wc_rule.somefield IS NOT NULL

does it run fast and give you the same answers as the regular join?

I'm guessing that this could be optimized to use a hash agg method of joining for text, but I'm no expert on the subject.
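Spelled out, that filter would look something like the sketch below; note that to reproduce the inner-join result it belongs on the nullable (classifications) side of the left join, and it comes with the planner caveat in the comment:

-- Filtering out the NULL-extended rows makes the result match the inner
-- join.  Caveat: a strict filter on the nullable side also allows the
-- planner to turn the left join back into a plain join, so the loop order
-- that made the left join attractive may not survive.
SELECT wc_rule.id
FROM wc_rule
LEFT JOIN classifications
       ON classifications.classification ~* wc_rule.regexp
WHERE classifications.classification IS NOT NULL;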
Re: [PERFORM] I/O on select count(*)
Alvaro Herrera wrote:

> pg_clog is allocated in pages of 8kB apiece(*). On allocation, pages are
> zeroed, which is the bit pattern for "transaction in progress". So when
> a transaction starts, it only needs to ensure that the pg_clog page that
> corresponds to it is allocated, but it need not write anything to it.

Of course, in 8.3 it's not when the transaction starts, but when the Xid is assigned (i.e. when the transaction first calls a read-write command). In previous versions it happens when the first snapshot is taken (i.e. normally on the first command of any type, with very few exceptions.)

--
Alvaro Herrera                          http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [PERFORM] Please ignore ...
Marc G. Fournier wrote:

> Someone on this list has one of those 'confirm your email' filters on their
> mailbox, which is bouncing back messages ... this is an attempt to try and
> narrow down the address that is causing this ...

So it seems you're still unable to determine the problematic address?

--
Alvaro Herrera                          http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [PERFORM] Regexps - never completing join.
On May 16, 2008, at 2:35 PM, Scott Marlowe wrote:

> On Wed, May 14, 2008 at 9:33 AM, Rusty Conover <[EMAIL PROTECTED]> wrote:
>> Returning to this problem this morning, I made some more insight.
>>
>> One way I did find that worked to control the loop (but doesn't yield the
>> same results because it's a left join)
>>
>> select wc_rule.id from wc_rule left join classifications on
>> classifications.classification ~* wc_rule.regexp;
>
> If you do that and exclude the extra rows added to the right with something
> like
>
> and wc_rule.somefield IS NOT NULL
>
> does it run fast and give you the same answers as the regular join?
>
> I'm guessing that this could be optimized to use a hash agg method of
> joining for text, but I'm no expert on the subject.

Hi Scott,

It's not really a hash agg problem, just a looping inside/outside table selection problem.

The slowdown is really the repeated compilation of the regexps by RE_compile_and_cache(), because the regexps are being run on the inside of the loop rather than the outside. And since the regexp cache is only 32 items big, every match results in a recompilation of the regexp, since I have about 700 regexps.

Thanks,

Rusty
--
Rusty Conover
InfoGears Inc.
http://www.infogears.com
Re: [PERFORM] Regexps - never completing join.
On Fri, May 16, 2008 at 3:37 PM, Rusty Conover <[EMAIL PROTECTED]> wrote:
>
> On May 16, 2008, at 2:35 PM, Scott Marlowe wrote:
>
>> On Wed, May 14, 2008 at 9:33 AM, Rusty Conover <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Returning to this problem this morning, I made some more insight.
>>>
>>> One way I did find that worked to control the loop (but doesn't yield the
>>> same results because it's a left join)
>>>
>>> select wc_rule.id from wc_rule left join classifications on
>>> classifications.classification ~* wc_rule.regexp;
>>
>> If you do that and exclude the extra rows added to the right with
>> something like
>>
>> and wc_rule.somefield IS NOT NULL
>>
>> does it run fast and give you the same answers as the regular join?
>>
>> I'm guessing that this could be optimized to use a hash agg method of
>> joining for text, but I'm no expert on the subject.
>
> Hi Scott,
>
> It's not really a hash agg problem, just a looping inside/outside table
> selection problem.
>
> The slowdown is really the repeated compilation of the regexps by
> RE_compile_and_cache(), because the regexps are being run on the inside of
> the loop rather than the outside. And since the regexp cache is only 32
> items big, every match results in a recompilation of the regexp, since I
> have about 700 regexps.

That's not what I meant. What I meant was it seems like a good candidate for a hash aggregate solution. I'm pretty sure pgsql can't use hashagg for something like this right now. If you hashagged each regexp and each column fed through it, you could probably get good performance. But that's a backend hacker thing, not something I'd know how to do.
Re: [PERFORM] I/O on select count(*)
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Greg Smith wrote:
>> After reading the code and that documentation a bit, the part I'm still
>> not sure about is whether the CLOG entry is created when the XID is
>> assigned and then kept current as the state changes, or whether that
>> isn't even in CLOG until the transaction is committed.

> pg_clog is allocated in pages of 8kB apiece(*). On allocation, pages are
> zeroed, which is the bit pattern for "transaction in progress". So when
> a transaction starts, it only needs to ensure that the pg_clog page that
> corresponds to it is allocated, but it need not write anything to it.

One additional point: this means that one transaction in every 32K writing transactions *does* have to do extra work when it assigns itself an XID, namely create and zero out the next page of pg_clog. And that doesn't just slow down the transaction in question, but the next few guys that would like an XID but arrive on the scene while the zeroing-out is still in progress.

This probably contributes to the behavior that Simon and Josh regularly complain about, that our transaction execution time is subject to unpredictable spikes. I'm not sure how to get rid of it though.

			regards, tom lane