[PERFORM] Does this query need to be optimized
[fragment of EXPLAIN ANALYZE output; the opening lines are missing and the long geometry literals are abbreviated]
   Hash Cond: (l.source_id = m.doc_id)
   ->  Seq Scan on loc_context_demo l  (cost=0.00..48108.71 rows=356 width=18) (actual time=0.986..316.129 rows=816 loops=1)
         Filter: ((geom && '010320E61001000A00935A97CF5D7E3A40...8BA46A930EAF5740'::geometry) AND _st_within(geom, '010320E61001000A00935A97CF5D7E3A40...8BA46A930EAF5740'::geometry))
   ->  Hash  (cost=37186.32..37186.32 rows=9248 width=55) (actual time=190.396..190.396 rows=9016 loops=1)
         ->  Seq Scan on metadata_demo m  (cost=0.00..37186.32 rows=9248 width=55) (actual time=38.895..183.396 rows=9016 loops=1)
               Filter: (doc_category = 'Terrorism'::bpchar)
 Total runtime: 1210.112 ms
(15 rows)

Best regards, Adarsh Sharma
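The plan above applies the geometry filter inside a Seq Scan on loc_context_demo. If there is no spatial index on geom, the usual first step is a GiST index; a minimal sketch (the index name is illustrative, assuming the PostGIS geometry column is geom as shown in the plan):

CREATE INDEX idx_loc_context_demo_geom ON loc_context_demo USING gist (geom);
ANALYZE loc_context_demo;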
[PERFORM] Does it require further tuning
[fragment of EXPLAIN ANALYZE output; the opening lines are missing]
 Total runtime: 64.396 ms

I don't know why it uses a Seq Scan on loc_context_terror, as I have indexes on the desired columns as well.

Thanks & best regards, Adarsh Sharma
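A common way to check whether the planner could use an index at all is to disable sequential scans for one session and compare plans; a diagnostic sketch, not a production setting:

SET enable_seqscan = off;
-- re-run: EXPLAIN ANALYZE <the original query on loc_context_terror>;
RESET enable_seqscan;

If the forced index plan comes out slower, the seq scan was the planner's right choice.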
[PERFORM] How to tune this query
[fragment of EXPLAIN ANALYZE output; the opening lines and a long geometry filter are not recoverable]
   ->  Hash  (cost=43457.32..43457.32 rows=82682 width=55) (actual time=255.707..255.707 rows=82443 loops=1)
         ->  Seq Scan on metadata_demo m  (cost=0.00..43457.32 rows=82682 width=55) (actual time=0.013..230.904 rows=82443 loops=1)
               Filter: (doc_category = 'Naxalism'::bpchar)
   ->  Index Scan using idx_crawled_id on page_content_demo p  (cost=0.00..9.24 rows=1 width=1353) (actual time=4.822..4.946 rows=1 loops=4129)
         Index Cond: (p.crawled_page_id = l.source_id)
 Total runtime: 21379.870 ms
(14 rows)

Yesterday, after some performance tuning (shared_buffers = 1GB, effective_cache_size = 2GB, work_mem = 64MB, maintenance_work_mem = 256MB) and creating indexes as:

CREATE INDEX idx1_source_id_l2 ON l1 USING btree (source_id, lat, lon);
CREATE INDEX idx_doc_id_m1 ON m1 USING btree (doc_id, doc_category);
CREATE INDEX idx_crawled_id_p1 ON p1 USING btree (crawled_page_id, heading, category, crawled_page_url);

my total runtime became: Total runtime: 704.383 ms. If I run the same EXPLAIN ANALYZE command again: Total runtime: 696.856 ms.

What is the reason that it takes so much time the first time? I know that the second time Postgres uses the cache. Is it possible to make it run faster the first time too? Please let me know.

Thanks & best regards, Adarsh Sharma
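The first-run slowness is typically a cold cache (actual disk reads). There was no built-in pre-warm facility in 8.x, but a crude warm-up is simply to read the tables once before the real query; a sketch, using table names from the plan above:

SELECT count(*) FROM metadata_demo;       -- full read pulls the table into cache
SELECT count(*) FROM page_content_demo;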
[PERFORM] Help with Query Tuning
Dear all,

I am facing a problem while creating an index to make the below query run faster. My table size is about 1065 MB with 428467 rows.

explain analyze select count(*) from page_content where publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and (content like '%Militant%' OR content like '%jihad%' OR content like '%Mujahid%' OR content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrorist%' OR content like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' or content like '%injure%');

Output:
 Aggregate  (cost=107557.78..107557.79 rows=1 width=0) (actual time=18564.631..18564.631 rows=1 loops=1)
   ->  Seq Scan on page_content  (cost=0.00..107466.82 rows=36381 width=0) (actual time=0.146..18529.371 rows=59918 loops=1)
         Filter: ((content IS NOT NULL) AND (publishing_date ~~ '%2010%'::text) AND (content_language = 'en'::bpchar) AND (isprocessable = 1) AND (((content)::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND (((content)::text ~~ '%Militant%'::text) OR ((content)::text ~~ '%jihad%'::text) OR ((content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~ '%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text) OR ((content)::text ~~ '%terrorist%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~ '%civilians%'::text) OR ((content)::text ~~ '%police%'::text) OR ((content)::text ~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR ((content)::text ~~ '%crpf%'::text) OR ((content)::text ~~ '%dsf%'::text) OR ((content)::text ~~ '%ssb%'::text)))
 Total runtime: 18564.673 ms

Index on that table:
CREATE INDEX idx_page_id ON page_content USING btree (crawled_page_id);

Index I created:
CREATE INDEX idx_page_id_content ON page_content USING btree (crawled_page_id, content_language, publishing_date, isprocessable);

Index that fails to create:
CREATE INDEX idx_page_id_content1 ON page_content USING btree (crawled_page_id, content);

ERROR: index row requires 13240 bytes, maximum size is 8191
SQL state: 54000

How do I resolve this error? Please give any suggestion to tune the query.

Thanks & best regards, Adarsh Sharma
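For reference, the error comes from the index-row size limit: a btree entry must fit in an index page, so indexing the whole content column will always fail for long documents. One workaround, in the spirit of "limit the size of content for the index" suggested below, is an expression index on a prefix (a sketch; the index name and prefix length are illustrative):

CREATE INDEX idx_content_prefix ON page_content (substr(content, 1, 1000));

Note, though, that no btree index can serve LIKE '%word%' patterns with a leading wildcard, so this alone will not speed up the query above.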
Re: [PERFORM] Help with Query Tuning
Thanks, Marshall. Would I need to change the data type of the content column to tsvector and create a GiST index on it?

Best regards, Adarsh

Kenneth Marshall wrote:
On Wed, Mar 16, 2011 at 02:43:38PM +0530, Adarsh Sharma wrote:
[original query, plan, and index errors quoted in full; trimmed here]

You should probably be looking at using full-text indexing:
http://www.postgresql.org/docs/9.0/static/textsearch.html
or limit the size of content for the index.

Cheers, Ken
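For what it's worth, changing the column type should not be necessary: PostgreSQL supports indexes on expressions, so to_tsvector('english', content) can be indexed directly, and GIN (rather than GiST) is the usual choice for mostly-read text search. A sketch; the query must use the same expression for the index to be considered:

CREATE INDEX idx_content_fts ON page_content USING gin (to_tsvector('english', content));

SELECT count(*) FROM page_content
WHERE to_tsvector('english', content) @@ to_tsquery('english', 'militant | jihad');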
Re: [PERFORM] Help with Query Tuning
Thanks, I understand it now. But one doubt of mine isn't clear.

Original query:
select count(*) from page_content where (content like '%Militant%' OR content like '%jihad%' OR content like '%Mujahid%' OR content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrorist%' OR content like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' OR content like '%injure%');

Output:
 count
-------
 57061
(1 row)
Time: 19726.555 ms

I need to tune it using full-text searching as:

Modified query:
SELECT count(*) from page_content WHERE publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and to_tsvector('english',content) @@ to_tsquery('english','Mujahid' || 'jihad' || 'Militant' || 'fedayeen' || 'insurgent' || 'terrorist' || 'cadre' || 'civilians' || 'police' || 'defence' || 'cops' || 'crpf' || 'dsf' || 'ssb');

Output:
 count
-------
     0
(1 row)
Time: 194685.125 ms

I also tried:
SELECT count(*) from page_content WHERE publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and to_tsvector('english',content) @@ to_tsquery('english','%Mujahid%' || '%jihad%' || '%Militant%' || '%fedayeen%' || '%insurgent%' || '%terrorist%' || '%cadre%' || '%civilians%' || '%police%' || '%defence%' || '%cops%' || '%crpf%' || '%dsf%' || '%ssb%');

 count
-------
     0
(1 row)
Time: 194722.468 ms

I know I have to create an index, but the index is the next step; first you have to get the correct result:

CREATE INDEX pgweb_idx ON page_content USING gin(to_tsvector('english', content));

Please guide me where I am going wrong.

Thanks & best regards, Adarsh Sharma

Kenneth Marshall wrote:
On Wed, Mar 16, 2011 at 02:43:38PM +0530, Adarsh Sharma wrote:
[original problem statement and full-text indexing suggestion quoted in full; trimmed here]
Re: [PERFORM] Help with Query Tuning
Thanks, it works now. :-) Here is the output:

pdc_uima=# SELECT count(*) from page_content WHERE publishing_date like '%2010%' and
pdc_uima-# content_language='en' and content is not null and isprocessable = 1 and
pdc_uima-# to_tsvector('english',content) @@ to_tsquery('english','Mujahid' || ' | '
pdc_uima(# || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen' || ' | '
pdc_uima(# || 'insurgent' || ' | ' || 'terrorist' || ' | ' || 'cadre' || ' | '
pdc_uima(# || 'civilians' || ' | ' || 'police' || ' | ' || 'cops' || 'crpf' || ' | '
pdc_uima(# || 'defence' || ' | ' || 'dsf' || ' | ' || 'ssb' );
 count
--------
 137193
(1 row)
Time: 195441.894 ms

But my original query also uses an AND, i.e.:

select count(*) from page_content where publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and (content like '%Militant%' OR content like '%jihad%' OR content like '%Mujahid%' OR content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrorist%' OR content like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' OR content like '%injure%');
 count
-------
 57061
(1 row)
Time: 19423.087 ms

So now I have to add the AND condition ( AND (content like '%kill%' OR content like '%injure%') ) as well.

Thanks & regards, Adarsh Sharma

t...@fuzzy.cz wrote:

>> Yes, I think we caught the problem, but it results in the below error:
>>
>> SELECT count(*) from page_content WHERE publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and to_tsvector('english',content) @@ to_tsquery('english','Mujahid ' || 'jihad ' || 'Militant ' || 'fedayeen ' || 'insurgent ' || 'terrorist ' || 'cadre ' || 'civilians ' || 'police ' || 'defence ' || 'cops ' || 'crpf ' || 'dsf ' || 'ssb');
>>
>> ERROR: syntax error in tsquery: "Mujahid jihad Militant fedayeen insurgent terrorist cadre civilians police defence cops crpf dsf ssb"
>
> The text passed to to_tsquery has to be a proper query, i.e. single tokens separated by boolean operators. In your case you should put '|' (which means OR) in there, to get something like 'Mujahid | jihad | Militant | ...'. Or you can use plainto_tsquery(), as that accepts simple text, but it puts '&' (AND) between the tokens, and I guess that's not what you want.
>
> Tomas

>> What to do to make it satisfy the OR condition, to match any of the to_tsquery values, as we got it right through like '%Mujahid%' or ... or ...
>
> You can't force plainto_tsquery to somehow use OR instead of AND. You need to modify the piece of code that produces the search text to put the '|' characters in there. So do something like this:
>
> SELECT count(*) from page_content WHERE publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and to_tsvector('english',content) @@ to_tsquery('english','Mujahid' || ' | ' || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen');
>
> Not sure where this text comes from, but you can do this in a higher-level language, e.g. in PHP. Something like this:
>
> $words = implode(' | ', explode(' ', $text));
>
> and then pass $words into the query. Or something like that.
>
> Tomas
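For the remaining AND condition, to_tsquery accepts parentheses plus & (AND) and | (OR), so both groups can be folded into one full-text query; a sketch (note that full-text search matches stemmed lexemes rather than substrings, so the count will not necessarily equal the LIKE version):

SELECT count(*) FROM page_content
WHERE publishing_date LIKE '%2010%'
  AND content_language = 'en'
  AND content IS NOT NULL
  AND isprocessable = 1
  AND to_tsvector('english', content) @@ to_tsquery('english',
      '(mujahid | jihad | militant | fedayeen | insurgent | terrorist | cadre | civilians | police | defence | cops | crpf | dsf | ssb) & (kill | injure)');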
[PERFORM] Reason of Slowness of query
Dear all,

I have 2 tables in my database, clause2 (4900 MB) and page_content (1582 MB). My table definitions are:

page_content:
CREATE TABLE page_content
(
  content_id integer,
  wkb_geometry geometry,
  link_level integer,
  isprocessable integer,
  isvalid integer,
  isanalyzed integer,
  islocked integer,
  content_language character(10),
  url_id integer,
  publishing_date character(40),
  heading character(150),
  category character(150),
  crawled_page_url character(500),
  keywords character(500),
  dt_stamp timestamp with time zone,
  "content" character varying,
  crawled_page_id bigint,
  id integer
) WITH ( OIDS=FALSE );

Indexes on it:
CREATE INDEX idx_page_id ON page_content USING btree (crawled_page_id);
CREATE INDEX idx_page_id_content ON page_content USING btree (crawled_page_id, content_language, publishing_date, isprocessable);
CREATE INDEX pgweb_idx ON page_content USING gin (to_tsvector('english'::regconfig, content::text));

clause2:
CREATE TABLE clause2
(
  id bigint NOT NULL DEFAULT nextval('clause_id_seq'::regclass),
  source_id integer,
  sentence_id integer,
  clause_id integer,
  tense character varying(30),
  clause text,
  CONSTRAINT pk_clause_demo_id PRIMARY KEY (id)
) WITH ( OIDS=FALSE );

Index on it:
CREATE INDEX idx_clause2_march10 ON clause2 USING btree (id, source_id);

I perform a join query on them as:

explain analyze select distinct(p.crawled_page_id) from page_content p, clause2 c where p.crawled_page_id != c.source_id;

It takes more than 1 hour to complete. I issued the EXPLAIN ANALYZE command but was not able to wait for the output, so I send my EXPLAIN output instead:

                                    QUERY PLAN
 Unique  (cost=927576.16..395122387390.13 rows=382659 width=8)
   ->  Nested Loop  (cost=927576.16..360949839832.15 rows=13669019023195 width=8)
         Join Filter: (p.crawled_page_id <> c.source_id)
         ->  Index Scan using idx_page_id on page_content p  (cost=0.00..174214.02 rows=428817 width=8)
         ->  Materialize  (cost=927576.16..1370855.12 rows=31876196 width=4)
               ->  Seq Scan on clause2 c  (cost=0.00..771182.96 rows=31876196 width=4)
(6 rows)

Please guide me how to make the above query run faster, as I am not able to do that.

Thanks, Adarsh
Re: [PERFORM] Reason of Slowness of query
Thanks Chetan, here is the output of your updated query:

explain select distinct(p.crawled_page_id) from page_content p where NOT EXISTS (select 1 from clause2 c where c.source_id = p.crawled_page_id);

                                    QUERY PLAN
---------------------------------------------------------------------------------
 HashAggregate  (cost=1516749.47..1520576.06 rows=382659 width=8)
   ->  Hash Anti Join  (cost=1294152.41..1515791.80 rows=383071 width=8)
         Hash Cond: (p.crawled_page_id = c.source_id)
         ->  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817 width=8)
         ->  Hash  (cost=771182.96..771182.96 rows=31876196 width=4)
               ->  Seq Scan on clause2 c  (cost=0.00..771182.96 rows=31876196 width=4)
(6 rows)

And my EXPLAIN ANALYZE output is:

                                    QUERY PLAN
 HashAggregate  (cost=1516749.47..1520576.06 rows=382659 width=8) (actual time=5.181..56669.270 rows=72 loops=1)
   ->  Hash Anti Join  (cost=1294152.41..1515791.80 rows=383071 width=8) (actual time=45740.789..56665.816 rows=74 loops=1)
         Hash Cond: (p.crawled_page_id = c.source_id)
         ->  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817 width=8) (actual time=0.012..715.915 rows=428467 loops=1)
         ->  Hash  (cost=771182.96..771182.96 rows=31876196 width=4) (actual time=45310.524..45310.524 rows=31853083 loops=1)
               ->  Seq Scan on clause2 c  (cost=0.00..771182.96 rows=31876196 width=4) (actual time=0.055..23408.884 rows=31853083 loops=1)
 Total runtime: 56687.660 ms
(7 rows)

But is there any option to tune it further? And one more thing: the output varies between 6 and 7 rows.

Thanks & best regards, Adarsh Sharma

Chetan Suttraway wrote:
On Wed, Mar 23, 2011 at 11:58 AM, Adarsh Sharma <adarsh.sha...@orkash.com> wrote:
[original message with table definitions, the join query, and its EXPLAIN output quoted in full; trimmed here]
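The plans from here on reference idx_clause2_source_id, whose creation is not shown in the thread; presumably it was created along these lines (an assumption):

CREATE INDEX idx_clause2_source_id ON clause2 USING btree (source_id);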
[PERFORM] Re-Reason of Slowness of Query
Thanks Chetan. After my lunch break, I tried the below steps.

My original query was:
explain analyze select distinct(p.crawled_page_id) from page_content p, clause2 c where p.crawled_page_id != c.source_id;
which hangs, because it is the wrong query to fetch the desired output.

Next, the updated query by Chetan Suttraway:
explain analyze select distinct(p.crawled_page_id) from page_content p where NOT EXISTS (select 1 from clause2 c where c.source_id = p.crawled_page_id);

                                    QUERY PLAN
-----------------------------------------------------------------
 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual time=7192.843..7195.923 rows=72 loops=1)
   ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) (actual time=0.040..7192.426 rows=74 loops=1)
         ->  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817 width=8) (actual time=0.009..395.599 rows=428467 loops=1)
         ->  Index Scan using idx_clause2_source_id on clause2 c  (cost=0.00..18.18 rows=781 width=4) (actual time=0.014..0.014 rows=1 loops=428467)
               Index Cond: (c.source_id = p.crawled_page_id)
 Total runtime: 7199.748 ms
(6 rows)

I think this is very much faster, but I don't understand the query:

explain select distinct(b) from t1,t2 where t1.b > t2.d union all select distinct(b) from t1,t2 where t1.b < t2.d;

I transformed it into my format as:

explain select distinct(p.crawled_page_id) from page_content p, clause2 c where p.crawled_page_id > c.source_id union all select distinct(p.crawled_page_id) from page_content p, clause2 c where p.crawled_page_id < c.source_id;

                                    QUERY PLAN
-----------------------------------------------------------------
 Append  (cost=0.00..296085951076.34 rows=765318 width=8)
   ->  Unique  (cost=0.00..148042971711.58 rows=382659 width=8)
         ->  Nested Loop  (cost=0.00..136655213119.84 rows=4555103436696 width=8)
               ->  Index Scan using idx_page_id on page_content p  (cost=0.00..174214.02 rows=428817 width=8)
               ->  Index Scan using idx_clause2_source_id on clause2 c  (cost=0.00..185898.05 rows=10622488 width=4)
                     Index Cond: (p.crawled_page_id > c.source_id)
   ->  Unique  (cost=0.00..148042971711.58 rows=382659 width=8)
         ->  Nested Loop  (cost=0.00..136655213119.84 rows=4555103436696 width=8)
               ->  Index Scan using idx_page_id on page_content p  (cost=0.00..174214.02 rows=428817 width=8)
               ->  Index Scan using idx_clause2_source_id on clause2 c  (cost=0.00..185898.05 rows=10622488 width=4)
                     Index Cond: (p.crawled_page_id < c.source_id)
(11 rows)

I don't think this is correct, because it produces 11 rows of output.

Any further suggestions? Please guide.

Thanks & best regards, Adarsh Sharma
Re: [PERFORM] Re-Reason of Slowness of Query
I just want to retrieve the ids from page_content which do not have any entry in the clause2 table.

Thanks, Adarsh

Vitalii Tymchyshyn wrote:
23.03.11 11:17, Adarsh Sharma wrote:

> I think it is very much faster, but I don't understand the query:
>
> explain select distinct(b) from t1,t2 where t1.b > t2.d union all select distinct(b) from t1,t2 where t1.b < t2.d;

I don't understand it either. What are you trying to get? Is it

select distinct(b) from t1 where b > (select min(d) from t2) or b < (select max(d) from t2)?

Can you explain in words, not SQL, what you expect to retrieve?

Best regards, Vitalii Tymchyshyn
Re: [PERFORM] Re-Reason of Slowness of Query
Vitalii Tymchyshyn wrote:
23.03.11 12:10, Adarsh Sharma wrote:

> I just want to retrieve the ids from page_content which do not have any entry in the clause2 table.

Then

select distinct(p.crawled_page_id) from page_content p where NOT EXISTS (select 1 from clause2 c where c.source_id = p.crawled_page_id);

is the correct query.

I can't understand how "select 1 from clause2 c where c.source_id = p.crawled_page_id" works either, though I get my output. What is the significance of the 1 here?

Thanks, Adarsh

Best regards, Vitalii Tymchyshyn.
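For what it's worth, the 1 has no significance of its own: EXISTS only tests whether the subquery returns at least one row and ignores the select list entirely, so, for illustration, these two are equivalent:

SELECT count(*) FROM page_content p
WHERE NOT EXISTS (SELECT 1 FROM clause2 c WHERE c.source_id = p.crawled_page_id);

SELECT count(*) FROM page_content p
WHERE NOT EXISTS (SELECT c.id FROM clause2 c WHERE c.source_id = p.crawled_page_id);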
Re: [PERFORM] Re-Reason of Slowness of Query
Thank you all for your nice support. Let me conclude the results; the below results were obtained after finding the needed queries.

First option:
pdc_uima=# explain analyze select distinct(p.crawled_page_id)
pdc_uima-# from page_content p left join clause2 c on (p.crawled_page_id =
pdc_uima(# c.source_id) where (c.source_id is null);

                                    QUERY PLAN
-----------------------------------------------------------------
 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual time=87927.000..87930.084 rows=72 loops=1)
   ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) (actual time=0.191..87926.546 rows=74 loops=1)
         ->  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817 width=8) (actual time=0.027..528.978 rows=428467 loops=1)
         ->  Index Scan using idx_clause2_source_id on clause2 c  (cost=0.00..18.18 rows=781 width=4) (actual time=0.202..0.202 rows=1 loops=428467)
               Index Cond: (p.crawled_page_id = c.source_id)
 Total runtime: 87933.882 ms  :-(
(6 rows)

Second option:
pdc_uima=# explain analyze select distinct(p.crawled_page_id) from page_content p
pdc_uima-# where NOT EXISTS (select 1 from clause2 c where c.source_id = p.crawled_page_id);

                                    QUERY PLAN
-----------------------------------------------------------------
 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual time=7047.259..7050.261 rows=72 loops=1)
   ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) (actual time=0.039..7046.826 rows=74 loops=1)
         ->  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817 width=8) (actual time=0.008..388.976 rows=428467 loops=1)
         ->  Index Scan using idx_clause2_source_id on clause2 c  (cost=0.00..18.18 rows=781 width=4) (actual time=0.013..0.013 rows=1 loops=428467)
               Index Cond: (c.source_id = p.crawled_page_id)
 Total runtime: 7054.074 ms  :-)
(6 rows)

Thanks & best regards, Adarsh Sharma

Chetan Suttraway wrote:
On Wed, Mar 23, 2011 at 4:08 PM, t...@fuzzy.cz wrote:

> > I just want to retrieve the ids from page_content which do not have any entry in the clause2 table.
>
> In that case the query probably does not work (at least the query you sent in the first post), as it will return even those IDs that have at least one other row in 'clause2' (not matching the != condition). At least that's how I understand it.

true.

> So instead of this
>
> select distinct(p.crawled_page_id) from page_content p, clause2 c where p.crawled_page_id != c.source_id;
>
> you should probably do this
>
> select distinct(p.crawled_page_id) from page_content p left join clause2 c on (p.crawled_page_id = c.source_id) where (c.source_id is null);
>
> I guess this will be much more efficient too.

This looks like it gives the expected results. Also note that the "is null" in the WHERE clause is really required and is not an optional predicate.

> regards, Tomas

--
Regards, Chetan Suttraway
EnterpriseDB (http://www.enterprisedb.com/), The Enterprise PostgreSQL company.
Re: [PERFORM] Re-Reason of Slowness of Query
Vitalii Tymchyshyn wrote:
23.03.11 13:21, Adarsh Sharma wrote:

> Thank you all for your nice support. Let me conclude the results.
> [the two plans from the previous message quoted in full; trimmed here]

Actually the plans are equal, so I suppose it depends on what was run first :). The slow query operates with data mostly on disk, while the fast one works with data in memory.

Yes, you're absolutely right. If I run it again, it displays the output as:

pdc_uima=# explain analyze select distinct(p.crawled_page_id) from page_content p left join clause2 c on (p.crawled_page_id = c.source_id) where (c.source_id is null);

                                    QUERY PLAN
-----------------------------------------------------------------
 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual time=7618.452..7621.427 rows=72 loops=1)
   ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) (actual time=0.131..7618.043 rows=74 loops=1)
         ->  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817 width=8) (actual time=0.020..472.811 rows=428467 loops=1)
         ->  Index Scan using idx_clause2_source_id on clause2 c  (cost=0.00..18.18 rows=781 width=4) (actual time=0.015..0.015 rows=1 loops=428467)
               Index Cond: (p.crawled_page_id = c.source_id)
 Total runtime: 7637.132 ms
(6 rows)

I'll let you know after a fresh start (session); then the true result will come, and further tuning, if required, can be performed.

Best regards, Adarsh

Best regards, Vitalii Tymchyshyn
Re: [PERFORM] Re-Reason of Slowness of Query
> Actually the plans are equal, so I suppose it depends on what was run first :). The slow query operates with data mostly on disk, while the fast one works with data in memory.

Yeah. Maybe the easiest way is to start a fresh session and fire the queries.

After the fresh start, the results obtained are:

pdc_uima=# explain analyze select distinct(p.crawled_page_id)
pdc_uima-# from page_content p left join clause2 c on (p.crawled_page_id =
pdc_uima(# c.source_id) where (c.source_id is null);

                                    QUERY PLAN
-----------------------------------------------------------------
 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual time=7725.132..7728.341 rows=72 loops=1)
   ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) (actual time=0.115..7724.713 rows=74 loops=1)
         ->  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817 width=8) (actual time=0.021..472.199 rows=428467 loops=1)
         ->  Index Scan using idx_clause2_source_id on clause2 c  (cost=0.00..18.18 rows=781 width=4) (actual time=0.015..0.015 rows=1 loops=428467)
               Index Cond: (p.crawled_page_id = c.source_id)
 Total runtime: 7731.840 ms
(6 rows)

pdc_uima=# explain analyze select distinct(p.crawled_page_id) from page_content p
pdc_uima-# where NOT EXISTS (select 1 from clause2 c where c.source_id = p.crawled_page_id);

                                    QUERY PLAN
-----------------------------------------------------------------
 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual time=6192.249..6195.368 rows=72 loops=1)
   ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) (actual time=0.036..6191.838 rows=74 loops=1)
         ->  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817 width=8) (actual time=0.008..372.489 rows=428467 loops=1)
         ->  Index Scan using idx_clause2_source_id on clause2 c  (cost=0.00..18.18 rows=781 width=4) (actual time=0.012..0.012 rows=1 loops=428467)
               Index Cond: (c.source_id = p.crawled_page_id)
 Total runtime: 6198.567 ms
(6 rows)

This gives a slight upper hand to the second query. Would it be possible to tune it further?

My postgresql.conf parameters are as follows (total RAM = 16 GB):

shared_buffers = 4GB
max_connections = 700
effective_cache_size = 6GB
work_mem = 16MB
maintenance_work_mem = 64MB

I am thinking of changing to work_mem = 64MB and maintenance_work_mem = 256MB. Will that have some effect?

Thanks & best regards, Adarsh Sharma

Best regards, Vitalii Tymchyshyn

--
Regards, Chetan Suttraway
EnterpriseDB (http://www.enterprisedb.com/), The Enterprise PostgreSQL company.
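Settings like work_mem can be tried per session before editing postgresql.conf; a sketch:

SET work_mem = '64MB';
-- re-run: EXPLAIN ANALYZE <the query under test>;
RESET work_mem;

(maintenance_work_mem affects VACUUM, CREATE INDEX, and the like, not ordinary queries, so it would not change these numbers.)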
Re: [PERFORM] Re-Reason of Slowness of Query
t...@fuzzy.cz wrote:

>> On 03/23/2011 04:17 AM, Adarsh Sharma wrote:
>>> explain analyze select distinct(p.crawled_page_id) from page_content p where NOT EXISTS (select 1 from clause2 c where c.source_id = p.crawled_page_id);
>>
>> You know... I'm surprised nobody has mentioned this, but DISTINCT is very slow unless you have a fairly recent version of Postgres that replaces it with something faster. Try this:
>
> Nobody mentioned that because the explain plan already uses a hash aggregate (instead of the old sort):
>
> HashAggregate (cost=100278.16..104104.75 rows=382659 width=8) (actual time=7047.259..7050.261 rows=72 loops=1)
>
> which means this is at least 8.4. Plus the 'distinct' step uses less than 1% of the total time, so even if you improve it the impact will be minimal.

Yes, you're absolutely right; I am using version 8.4SS, and I am satisfied with the below query results:

pdc_uima=# explain analyze select distinct(p.crawled_page_id) from page_content p
pdc_uima-# where NOT EXISTS (select 1 from clause2 c where c.source_id = p.crawled_page_id);

                                    QUERY PLAN
-----------------------------------------------------------------
 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual time=5149.308..5152.251 rows=72 loops=1)
   ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) (actual time=0.119..5148.954 rows=74 loops=1)
         ->  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817 width=8) (actual time=0.021..444.487 rows=428467 loops=1)
         ->  Index Scan using idx_clause2_source_id on clause2 c  (cost=0.00..18.18 rows=781 width=4) (actual time=0.009..0.009 rows=1 loops=428467)
               Index Cond: (c.source_id = p.crawled_page_id)
 Total runtime: 5155.874 ms
(6 rows)

I don't think the above results can be optimized further.

Thanks & best regards, Adarsh Sharma

> regards, Tomas
[PERFORM] Why Index is not used
Dear all,

Today a query got run internally from my application by more than 10 connections, and it performed very badly. The sizes of the tables are:

pdc_uima=# select pg_size_pretty(pg_total_relation_size('clause2'));
 pg_size_pretty
----------------
 5858 MB
(1 row)

pdc_uima=# select pg_size_pretty(pg_total_relation_size('svo2'));
 pg_size_pretty
----------------
 4719 MB
(1 row)

I ran EXPLAIN on the query after creating the indexes:

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id and s.doc_id=c.source_id and c.
pdc_uima-# sentence_id=s.sentence_id ;

                                    QUERY PLAN
--------------------------------------------------------------------------------
 Merge Join  (cost=5673831.05..34033959.87 rows=167324179 width=2053)
   Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id) AND (s.sentence_id = c.sentence_id))
   ->  Index Scan using idx_svo2 on svo2 s  (cost=0.00..24489343.65 rows=27471560 width=1993)
   ->  Materialize  (cost=5673828.74..6071992.29 rows=31853084 width=72)
         ->  Sort  (cost=5673828.74..5753461.45 rows=31853084 width=72)
               Sort Key: c.clause_id, c.source_id, c.sentence_id
               ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=72)

The indexes are:

CREATE INDEX idx_clause ON clause2 USING btree (clause_id, source_id, sentence_id);
CREATE INDEX idx_svo2 ON svo2 USING btree (clause_id, doc_id, sentence_id);

I don't know why it does not use an index scan for the clause2 table. Any suggestions to tune the query?

Thanks & best regards, Adarsh Sharma
Re: [PERFORM] Why Index is not used
Thanks Andreas. I was about to print the output, but it takes too much time. Below is the output of the EXPLAIN ANALYZE command:

pdc_uima=# explain analyze select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id and s.doc_id=c.source_id and c.
pdc_uima-# sentence_id=s.sentence_id ;

                                    QUERY PLAN
 Merge Join  (cost=5673831.05..34033959.87 rows=167324179 width=2053) (actual time=216281.162..630721.636 rows=30473117 loops=1)
   Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id) AND (s.sentence_id = c.sentence_id))
   ->  Index Scan using idx_svo2 on svo2 s  (cost=0.00..24489343.65 rows=27471560 width=1993) (actual time=0.130..177599.310 rows=27471560 loops=1)
   ->  Materialize  (cost=5673828.74..6071992.29 rows=31853084 width=72) (actual time=216280.596..370507.452 rows=52037763 loops=1)
         ->  Sort  (cost=5673828.74..5753461.45 rows=31853084 width=72) (actual time=216280.591..324707.956 rows=31853083 loops=1)
               Sort Key: c.clause_id, c.source_id, c.sentence_id
               Sort Method: external merge  Disk: 2616520kB
               ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=72) (actual time=0.025..25018.665 rows=31853083 loops=1)
 Total runtime: 647804.037 ms
(9 rows)

Thanks, Adarsh

Andreas Kretschmer wrote:
Adarsh Sharma wrote:
[original message with table sizes, EXPLAIN output, and index definitions quoted in full; trimmed here]

How many rows does clause2 contain? The planner expected 167324179 returned rows. Can you run the same EXPLAIN with ANALYZE to see the real number of returned rows?

Andreas
Re: [PERFORM] Why Index is not used
Chetan Suttraway wrote:
On Fri, Mar 25, 2011 at 12:05 PM, Adarsh Sharma <adarsh.sha...@orkash.com> wrote:
[original message with table sizes, EXPLAIN output, and index definitions quoted in full; trimmed here]

In this case there are no predicates or filters on the individual tables (something like c.source_id=10), so one of the 2 tables will have to go for a simple scan. Were you expecting a seq scan on svo2 and an index scan on clause2? Going by the sizes and the number of rows, I think the index scan on clause2 is better.

Your observation is valid, but I need to make this query perform faster. What is the reason behind the seq scan of clause2?

Regards, Adarsh
Re: [PERFORM] Why Index is not used
> Could you please post the output of the below queries:
>
> explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id;
> explain select c.clause, s.* from clause2 c, svo2 s where s.doc_id=c.source_id;
> explain select c.clause, s.* from clause2 c, svo2 s where c.sentence_id=s.sentence_id;

As per your instructions, please check the below output:

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id;

                                    QUERY PLAN
-----------------------------------------------------------------
 Hash Join  (cost=7828339.10..4349603998133.96 rows=379772050555842 width=2053)
   Hash Cond: (c.clause_id = s.clause_id)
   ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=64)
   ->  Hash  (cost=697537.60..697537.60 rows=27471560 width=1993)
         ->  Seq Scan on svo2 s  (cost=0.00..697537.60 rows=27471560 width=1993)
(5 rows)

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where s.doc_id=c.source_id;

                                    QUERY PLAN
-------------------------------------------------------------------
 Merge Join  (cost=43635232.12..358368926.66 rows=20954686217 width=2053)
   Merge Cond: (c.source_id = s.doc_id)
   ->  Sort  (cost=5596061.24..5675693.95 rows=31853084 width=64)
         Sort Key: c.source_id
         ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=64)
   ->  Materialize  (cost=38028881.02..38372275.52 rows=27471560 width=1993)
         ->  Sort  (cost=38028881.02..38097559.92 rows=27471560 width=1993)
               Sort Key: s.doc_id
               ->  Seq Scan on svo2 s  (cost=0.00..697537.60 rows=27471560 width=1993)
(9 rows)

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.sentence_id=s.sentence_id ;

                                    QUERY PLAN
-------------------------------------------------------------------
 Merge Join  (cost=43711844.03..241541026048.10 rows=[garbled in the archive] width=2053)
   Merge Cond: (c.sentence_id = s.sentence_id)
   ->  Sort  (cost=5596061.24..5675693.95 rows=31853084 width=64)
         Sort Key: c.sentence_id
         ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=64)
   ->  Materialize  (cost=38028881.02..38372275.52 rows=27471560 width=1993)
         ->  Sort  (cost=38028881.02..38097559.92 rows=27471560 width=1993)
               Sort Key: s.sentence_id
               ->  Seq Scan on svo2 s  (cost=0.00..697537.60 rows=27471560 width=1993)
(9 rows)

Please let me know if any other information is required.

--
Best regards, Adarsh Sharma
Re: [PERFORM] Why Index is not used
Chetan Suttraway wrote:
On Fri, Mar 25, 2011 at 2:25 PM, Adarsh Sharma <adarsh.sha...@orkash.com> wrote:
[the three EXPLAIN outputs quoted in full; trimmed here]

The idea is to have the maximum filtering occur on the leading column of the index. The first plan, with only the clause_id predicate, is estimated to return 379772050555842 rows, whereas the second plan, with the doc_id predicate, returns only 20954686217. So maybe you should consider re-ordering the index on clause2. I am thinking that you created the indexes by looking at the columns used in the WHERE clause. But it is not always helpful to create indexes based on the exact order of predicates specified in a query. Instead, the idea should be to consider which predicate is going to filter out the most results. Likewise, we should consider all possible uses of the index columns across all queries and then decide on the order of columns for the composite index to be created. What's your take on this?

I am sorry, but I am not able to follow your points completely. My table definitions are:

Clause2 table:
CREATE TABLE clause2
(
  id bigint NOT NULL DEFAULT nextval('clause_id_seq'::regclass),
  source_id integer,
  sentence_id integer,
  clause_id integer,
  tense character varying(30),
  clause text,
  CONSTRAINT pk_clause_demo_id PRIMARY KEY (id)
) WITH ( OIDS=FALSE );

CREATE INDEX idx_clause ON clause2 USING btree (clause_id, source_id, sentence_id);

svo2 table:
CREATE TABLE svo2
(
  svo_id bigint NOT NULL DEFAULT nextval('svo_svo_id_seq'::regclass),
  doc_id integer,
  sentence_id integer,
  clause_id integer,
  negation integer,
  subject character varying(3000),
  verb character varying(3000),
  "object" character varying(3000),
  preposition character varying(3000),
  subject_type character varying(3000),
  object_type character varying(3000),
  subject_attribute character varying(3000),
  object_attribute character varying(3000),
  ver[message truncated in the archive]
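To make Chetan's point concrete: if the doc_id/source_id predicate filters best, it should lead the composite index. Hypothetical re-orderings (the index names are illustrative, not from the thread) would look like:

CREATE INDEX idx_clause2_by_doc ON clause2 USING btree (source_id, sentence_id, clause_id);
CREATE INDEX idx_svo2_by_doc ON svo2 USING btree (doc_id, sentence_id, clause_id);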
[PERFORM] Postgres Performance Tuning
Dear all,

I have a Postgres database server with 16GB RAM. Our application runs by making connections to the Postgres server from different servers, selecting data from one table and inserting into the remaining tables in a database.

Below is the connection-count output:

postgres=# select datname,numbackends from pg_stat_database;
      datname      | numbackends
-------------------+-------------
 template1         |           0
 template0         |           0
 postgres          |           3
 template_postgis  |           0
 pdc_uima_dummy    |         107
 pdc_uima_version3 |           1
 pdc_uima_olap     |           0
 pdc_uima_s9       |           3
 pdc_uima          |           1
(9 rows)

I am totally confused about setting configuration parameters in Postgres. First of all, I researched some tuning parameters and set my postgresql.conf as:

max_connections = 1000
shared_buffers = 4096MB
temp_buffers = 16MB
work_mem = 64MB
maintenance_work_mem = 128MB
wal_buffers = 32MB
checkpoint_segments = 3
random_page_cost = 2.0
effective_cache_size = 8192MB

Then I got some complaints from application users that Postgres slows down, and the free command's output is:

[root@s8-mysd-2 ~]# free -g
             total       used       free     shared    buffers     cached
Mem:            15         15          0          0          0         14
-/+ buffers/cache:          0         14
Swap:           16          0         15

[root@s8-mysd-2 ~]# free
             total       used       free     shared    buffers     cached
Mem:      16299476   16202264      97212          0      58924   15231852
-/+ buffers/cache:     911488   15387988
Swap:     16787884     153136   16634748

I think there may be some problem in my configuration parameters, so I changed them to:

max_connections = 700
shared_buffers = 4096MB
temp_buffers = 16MB
work_mem = 64MB
maintenance_work_mem = 128MB
wal_buffers = 32MB
checkpoint_segments = 32
random_page_cost = 2.0
effective_cache_size = 4096MB

but the Postgres server still uses swap memory during SELECT & INSERT into database tables. Please check the attached postgresql.conf and share some views on how to tune this server. Do I need to increase my RAM, i.e., am I hitting a hardware limitation?

Thanks & best regards, Adarsh Sharma

[attachment: postgresql.conf; the stock comment header of the file is omitted here]
Re: [PERFORM] Postgres Performance Tuning
t...@fuzzy.cz wrote:

> max_connections = 700
> shared_buffers = 4096MB
> temp_buffers = 16MB
> work_mem = 64MB
> maintenance_work_mem = 128MB
> wal_buffers = 32MB
> checkpoint_segments = 32
> random_page_cost = 2.0
> effective_cache_size = 4096MB

First of all, there's no reason to increase wal_buffers above 32MB. AFAIK the largest sensible value is 16MB; I doubt increasing it further will improve performance. Second, effective_cache_size is just a hint about how much memory the operating system uses for the filesystem cache, so it does not influence the amount of allocated memory in any way.

> but the Postgres server still uses swap memory during SELECT & INSERT into database tables.

Are you sure it's PostgreSQL? What else is running on the box? Have you analyzed why the SQL queries are slow (using EXPLAIN)?

Thanks. Below are my action points:

max_connections = 300          (I don't think the application uses more than 300 connections)
shared_buffers = 4096MB
temp_buffers = 16MB
work_mem = 64MB
maintenance_work_mem = 128MB
wal_buffers = 16MB             (as per your suggestion)
checkpoint_segments = 32
random_page_cost = 2.0
effective_cache_size = 8192MB  (recommended 50% of RAM)

My shared memory variables are:

[root@s8-mysd-2 ~]# cat /proc/sys/kernel/shmmax
6442450944
[root@s8-mysd-2 ~]# cat /proc/sys/kernel/shmall
6442450944

Please let me know if any parameter needs a change, as I am now going to set my parameters as per the below link:

http://airumman.blogspot.com/2011/03/postgresql-parameters-for-new-dedicated.html

But one thing I am not able to understand from it is: "Start the server and find out how much memory is still available for the OS filesystem cache."

You're absolutely right; I am also researching the EXPLAIN of all SELECT statements, and I found one reason: poor indexing on TEXT columns.

Thanks & best regards, Adarsh Sharma

regards, Tomas
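After a reload or restart, the values actually in effect can be verified from inside the server, for example:

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('max_connections', 'shared_buffers', 'work_mem', 'wal_buffers', 'effective_cache_size');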
Re: [PERFORM] Postgres Performance Tuning
My database size is:

postgres=# select pg_size_pretty(pg_database_size('pdc_uima_dummy'));
 pg_size_pretty
----------------
 49 GB
(1 row)

I have a doubt regarding Postgres memory usage. Say my application makes connections to the database server (*.*.*.106) from the (*.*.*.111, *.*.*.113, *.*.*.114) servers, say 100 connections, and I check the top command:

top - 17:01:02 up 5:39, 4 users, load average: 0.00, 0.00, 0.00
Tasks: 170 total, 1 running, 169 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0% us, 0.2% sy, 0.0% ni, 99.6% id, 0.1% wa, 0.0% hi, 0.0% si, 0.0% st
Mem: 16299476k total, 16198784k used, 100692k free, 73776k buffers
Swap: 16787884k total, 148176k used, 16639708k free, 15585396k cached

  PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
 3401 postgres 20  0 4288m 3.3g 3.3g S    0 21.1  0:24.73 postgres
 3397 postgres 20  0 4286m 119m 119m S    0  0.8  0:00.36 postgres
 4083 postgres 20  0 4303m 104m 101m S    0  0.7  0:07.68 postgres
 3402 postgres 20  0 4288m  33m  32m S    0  0.2  0:03.67 postgres
 4082 postgres 20  0 4301m  27m  25m S    0  0.2  0:00.85 postgres
 4748 postgres 20  0 4290m 5160 3700 S    0  0.0  0:00.00 postgres
 4173 root     20  0 12340 3028 1280 S    0  0.0  0:00.12 bash
 4084 postgres 20  0 4290m 2952 1736 S    0  0.0  0:00.00 postgres
 4612 root     20  0 12340 2920 1276 S    0  0.0  0:00.06 bash
 4681 root     20  0 12340 2920 1276 S    0  0.0  0:00.05 bash
 4550 root     20  0 12208 2884 1260 S    0  0.0  0:00.08 bash
 4547 root     20  0 63580 2780 2204 S

and the free command says:

[root@s8-mysd-2 8.4SS]# free -g
             total       used       free     shared    buffers     cached
Mem:            15         15          0          0          0         14
-/+ buffers/cache:          0         15
Swap:           16          0         15

Now, when my job finishes and I close the connections from 2 servers, the top & free output remains the same. I don't know the reason behind this, as I have only 3 connections as per the below command:

postgres=# select datname, client_addr, current_query from pg_stat_activity;
    datname     |  client_addr  |                          current_query
----------------+---------------+-------------------------------------------------------------------
 postgres       |               | select datname, client_addr,current_query from pg_stat_activity;
 postgres       | 192.168.0.208 |
 pdc_uima_s9    | 192.168.0.208 |
 pdc_uima_s9    | 192.168.0.208 |
 pdc_uima_dummy | 192.168.0.208 |
 pdc_uima_dummy | 192.168.1.102 |
 pdc_uima_dummy | 192.168.1.102 |
 pdc_uima_dummy | 192.168.1.102 |
(8 rows)

Please help me understand how much memory one connection uses and how to set the server parameters accordingly.

Thanks & best regards, Adarsh Sharma

Raghavendra wrote:
Adarsh, what is the size of the database?

Best regards, Raghavendra, EnterpriseDB Corporation

On Mon, Apr 4, 2011 at 4:24 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
On Mon, Apr 4, 2011 at 4:43 AM, Scott Marlowe <scott.marl...@gmail.com> wrote:

>> [root@s8-mysd-2 ~]# free
>>              total       used       free     shared    buffers     cached
>> Mem:      16299476   16202264      97212          0      58924   15231852
>> -/+ buffers/cache:     911488   15387988
>> Swap:     16787884     153136   16634748
>
> There is nothing wrong here. You're using 153M out of 16G swap. 15.x Gig is shared buffers. If your system is slow, it's not because it's running out of memory or using too much swap.

Sorry, that 15.xG is system cache, not shared buffers. Anyw[message truncated in the archive]
Re: [PERFORM] Postgres Performance Tuning
Thanks Scott. My iostat package is not installed, but have a look at the output below:

[root@s8-mysd-2 8.4SS]# vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff    cache   si   so    bi    bo    in    cs us sy id wa st
 1  0 147664  93920  72332 15580748    0    1   113   170    47   177  6  1 92  1  0
 0  0 147664  94020  72348 15580748    0    0     0     4   993   565  0  0 100 0  0
 0  0 147664  93896  72364 15580748    0    0     0     5   993   571  0  0 100 0  0
 0  0 147664  93524  72416 15580860    0    0     0   160  1015   591  0  0 100 0  0
 0  0 147664  93524  72448 15580860    0    0     0     8  1019   553  0  0 100 0  0
 0  0 147664  93648  72448 15580860    0    0     0     0  1019   555  0  0 100 0  0
 0  0 147664  93648  72448 15580860    0    0     0     3  1023   560  0  0 100 0  0
[root@s8-mysd-2 8.4SS]# iostat
-bash: iostat: command not found
[root@s8-mysd-2 8.4SS]#

Best regards, Adarsh

Scott Marlowe wrote:
On Mon, Apr 4, 2011 at 5:34 AM, Adarsh Sharma wrote:

  Mem:  16299476k total, 16198784k used, 100692k free, 73776k buffers
  Swap: 16787884k total, 148176k used, 16639708k free, 15585396k cached

    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   3401 postgres  20   0 4288m 3.3g 3.3g S    0 21.1   0:24.73 postgres
   3397 postgres  20   0 4286m 119m 119m S    0  0.8   0:00.36 postgres

  Please help me understand how much memory one connection uses and how to set the server parameters accordingly.

OK, first, see the 15585396k cached? That's how much memory your OS is using to cache filesystems etc. Basically that's memory not being used by anything else right now, so the OS borrows it and uses it for caching.

Next, VIRT is how much memory your process would need to load every lib it might need but may not be using now, plus all the shared memory it might need, plus its own space etc. It's not memory in use; it's memory that might, under the worst circumstances, be used by that one process. RES is the amount of memory the process IS actually touching, including shared memory that other processes may be sharing. Finally, SHR is the amount of shared memory the process is touching.

So, taking your biggest process: it is linked to enough libraries and shared memory and its own private memory to add up to 4288M. It is currently actually touching 3.3G, and of that 3.3G, 3.3G is shared with other processes. So the difference between RES and SHR - the extra memory it's using besides shared memory - is zero (or very close to it, probably dozens or fewer of megabytes). So you're NOT running out of memory.

Remember when I mentioned iostat, vmstat, etc. up above? Have you run any of those?
Re: [PERFORM] Postgres Performance Tuning
Thank you all. I know some things to work on now; after some work & study on them, I will continue this discussion tomorrow.

Best Regards, Adarsh

Raghavendra wrote:

Adarsh,

  [root@s8-mysd-2 8.4SS]# iostat
  -bash: iostat: command not found

/usr/bin/iostat

  Our application runs by making connections to the Postgres server from different servers, selecting data from one table & inserting into the remaining tables in a database.

When you are doing bulk inserts you need to tune the AUTOVACUUM parameters, or change the autovacuum settings for those tables doing bulk INSERTs. Inserts need analyze.

#autovacuum = on                        # Enable autovacuum subprocess? 'on'
                                        # requires track_counts to also be on.
#log_autovacuum_min_duration = -1       # -1 disables, 0 logs all actions and
                                        # their durations, > 0 logs only
                                        # actions running at least this number
                                        # of milliseconds.
#autovacuum_max_workers = 3             # max number of autovacuum subprocesses
#autovacuum_naptime = 1min              # time between autovacuum runs
#autovacuum_vacuum_threshold = 50       # min number of row updates before vacuum
#autovacuum_analyze_threshold = 50      # min number of row updates before analyze
#autovacuum_vacuum_scale_factor = 0.2   # fraction of table size before vacuum
#autovacuum_analyze_scale_factor = 0.1  # fraction of table size before analyze
#autovacuum_freeze_max_age = 200000000  # maximum XID age before forced vacuum
                                        # (change requires restart)
#autovacuum_vacuum_cost_delay = 20ms    # default vacuum cost delay for
                                        # autovacuum, in milliseconds;
                                        # -1 means use vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1      # default vacuum cost limit for
                                        # autovacuum, -1 means use
                                        # vacuum_cost_limit

These are all the default AUTOVACUUM settings. If you are using PG 8.4 or above, try per-table AUTOVACUUM settings on the bulk-insert tables for better performance. You also need to tune 'autovacuum_naptime'. E.g.:

ALTER TABLE <table> SET (autovacuum_vacuum_threshold = x, autovacuum_analyze_threshold = y);

These parameters will also boost performance:

wal_buffers           # max useful value is 16MB
checkpoint_segments   # very low in your setting
checkpoint_timeout
temp_buffers          # if the application is using temp tables

Best Regards
Raghavendra
EnterpriseDB Corporation.
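To make the per-table form concrete: a sketch only, reusing the svo2 table named elsewhere in the thread, with illustrative numbers rather than recommendations:

  -- Raise the per-table trigger points so autovacuum fires per batch
  -- rather than per handful of rows (values are illustrative):
  ALTER TABLE svo2 SET (
      autovacuum_vacuum_threshold  = 10000,  -- row changes before autovacuum vacuums
      autovacuum_analyze_threshold = 10000   -- row changes before autovacuum analyzes
  );

  -- The per-table overrides are visible afterwards in pg_class.reloptions:
  SELECT relname, reloptions FROM pg_class WHERE relname = 'svo2';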
Re: [PERFORM] Postgres Performance Tuning
Hi, good morning to all of you.

Yesterday I did some research on my problems. Scott rightly suggested that I gather information before posting to the list, and I agree with him. Here is my first doubt, explained as follows.

My application makes several connections to the database server and they do their work. During this process, have a look at the output of the free command:

[root@s8-mysd-2 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         15917      15826         90          0        101      15013
-/+ buffers/cache:        711      15205
Swap:        16394        143      16250

It means 15 GB of memory is cached.

[root@s8-mysd-2 ~]# cat /proc/meminfo
MemTotal:       16299476 kB
MemFree:           96268 kB
Buffers:          104388 kB
Cached:         15370008 kB
SwapCached:         3892 kB
Active:          6574788 kB
Inactive:        8951884 kB
Active(anon):    3909024 kB
Inactive(anon):   459720 kB
Active(file):    2665764 kB
Inactive(file):  8492164 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      16787884 kB
SwapFree:       16640472 kB
Dirty:              1068 kB
Writeback:             0 kB
AnonPages:         48864 kB
Mapped:          4277000 kB
Slab:             481960 kB
SReclaimable:     466544 kB
SUnreclaim:        15416 kB
PageTables:        57860 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    24904852 kB
Committed_AS:    5022172 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      310088 kB
VmallocChunk:   34359422091 kB
HugePages_Total:      32
HugePages_Free:       32
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        3776 kB
DirectMap2M:    16773120 kB
[root@s8-mysd-2 ~]#

Now, can I know why the cached memory is not freed after the connections have done their work and there are no more connections?

pdc_uima_dummy=# select datname, numbackends from pg_stat_database;
      datname      | numbackends
-------------------+-------------
 template1         |           0
 template0         |           0
 postgres          |           2
 template_postgis  |           0
 pdc_uima_dummy    |          11
 pdc_uima_version3 |           0
 pdc_uima_olap     |           0
 pdc_uima_s9       |           0
 pdc_uima          |           0
(9 rows)

The output is the same as when it had 100 connections. Now I have to start more queries on the database server and issue new connections after some time. Why is the cached memory not freed? Is flushing the cache needed, and how can it use so much when I set effective_cache_size = 4096MB? I think that if I issue some new SELECT queries on a large set of data, it will use swap memory & degrade performance. Please correct me if I'm wrong.

Thanks & best Regards, Adarsh Sharma
[PERFORM] Which is better Index
Dear all,

I want to clear my doubts about creating several single-column indexes versus one multi-column index. My table schema is:

CREATE TABLE svo2 (
  svo_id bigint NOT NULL DEFAULT nextval('svo_svo_id_seq'::regclass),
  doc_id integer,
  sentence_id integer,
  clause_id integer,
  negation integer,
  subject character varying(3000),
  verb character varying(3000),
  "object" character varying(3000),
  preposition character varying(3000),
  subject_type character varying(3000),
  object_type character varying(3000),
  subject_attribute character varying(3000),
  object_attribute character varying(3000),
  verb_attribute character varying(3000),
  subject_concept character varying(100),
  object_concept character varying(100),
  subject_sense character varying(100),
  object_sense character varying(100),
  subject_chain character varying(5000),
  object_chain character varying(5000),
  sub_type_id integer,
  obj_type_id integer,
  CONSTRAINT pk_svo_id PRIMARY KEY (svo_id)
) WITH (OIDS=FALSE);

For e.g.:

CREATE INDEX idx_svo2_id_dummy ON svo2 USING btree (doc_id, clause_id, sentence_id);

or

CREATE INDEX idx_svo2_id_dummy  ON svo2 USING btree (doc_id);
CREATE INDEX idx_svo2_id_dummy1 ON svo2 USING btree (clause_id);
CREATE INDEX idx_svo2_id_dummy2 ON svo2 USING btree (sentence_id);

Which is better if a query uses all three columns in its join/WHERE clause?

Thanks & best regards, Adarsh Sharma
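One way to see which form the planner actually prefers (a sketch, not from the original thread; the literal values are made up) is to compare plans for a representative query under each indexing scheme:

  EXPLAIN ANALYZE
  SELECT *
    FROM svo2
   WHERE doc_id = 42 AND clause_id = 7 AND sentence_id = 3;  -- hypothetical values

  -- With the single multicolumn index, all three conditions can be satisfied
  -- by one index scan. With three single-column indexes, the planner has to
  -- combine them (e.g. a BitmapAnd of three bitmap index scans), which does
  -- more work per lookup.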
Re: [PERFORM] Postgres Performance Tuning
Scott Marlowe wrote:
On Tue, Apr 5, 2011 at 1:33 AM, Adarsh Sharma wrote:

  [root@s8-mysd-2 ~]# free -m
               total       used       free     shared    buffers     cached
  Mem:         15917      15826         90          0        101      15013
  -/+ buffers/cache:        711      15205
  Swap:        16394        143      16250

  It means 15 GB of memory is cached.

Note that the kernel takes all otherwise unused memory and uses it for cache. If at any time a process needs more memory, the kernel just dumps some cached data, frees up the memory, and hands it over; it's all automatic. As long as the cache is large, things are OK. You need to look at whether you're IO bound or CPU bound first, so vmstat (install the sysstat package) is the first thing to use.

Thanks a lot, Scott. :-)

Best Regards, Adarsh
Re: [PERFORM] The right SHMMAX and FILE_MAX
I am also in need of proper documentation that explains how to set the SHMMAX and SHMALL variables for Postgres. What things need to be taken into consideration before doing that? What would the values of SHMMAX & SHMALL be if you have 16 GB of RAM for the Postgres server?

Thanks

Phoenix Kiula wrote:

Hi. I'm on a 64-bit CentOS 5 system, quad-core processor, 8GB RAM and tons of data storage (1 TB SATA-II disks). The current SHMMAX and SHMALL are (commas added for legibility):

kernel.shmmax = 68,719,476,736
kernel.shmall = 4,294,967,296

Now, according to my reading in the PG manual and on this list, a good recommended value for SHMMAX is (shared_buffers * 8192).

My postgresql.conf settings at the moment are:

max_connections = 300
shared_buffers = 300MB
effective_cache_size = 2000MB

By this calculation, shared_buffers * 8192 will be 2,457,600,000,000. That's a humongous number, so either the principle for SHMMAX is amiss, or I am reading it wrongly?

Similarly with "fs.file-max". There are articles like this one:
http://tldp.org/LDP/solrhe/Securing-Optimizing-Linux-RH-Edition-v1.3/chap6sec72.html
Is this relevant for PostgreSQL performance at all, or should I skip it?

Thanks for any pointers!

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] The right SHMMAX and FILE_MAX
Tomas Vondra wrote:
Dne 2.5.2011 06:53, Adarsh Sharma napsal(a):

  I am also in need of proper documentation that explains how to set the SHMMAX and SHMALL variables for Postgres. What things need to be taken into consideration before doing that? What would the values of SHMMAX & SHMALL be if you have 16 GB of RAM for the Postgres server?

Well, those two values actually define kernel limits for shared memory segments (i.e. memory shared by multiple processes, in this case the postmaster process and the backends). So it's rather a question of tuning shared_buffers (because that's the shared memory segment) and then setting those two values accordingly.

  When I was tuning PostgreSQL for best performance, I set shared_buffers = 4096MB, i.e. 25% of RAM. So do I need to set my SHMMAX to 4096 MB? What should the SHMALL size be now?

SHMMAX - max. size of a single shared segment (in bytes)
SHMALL - total size of shared segments (in pages; a page is usually 4kB)

So if you decide you want 1GB of shared buffers, you'll need at least this:

SHMMAX = 1024 * 1024 * 1024  (i.e. 1GB in bytes)
SHMALL = 1024 * 256          (1GB in 4kB pages)

(although SHMALL should be higher, as there will be other processes that need shared memory). There's a lot of documentation about this, e.g. this one (it's mostly for Oracle, but it describes shared memory quite nicely):

http://www.puschitz.com/TuningLinuxForOracle.shtml#SettingSharedMemory

regards
Tomas
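Applying the same arithmetic to the 4096MB shared_buffers case asked about above gives, as a sketch (the exact headroom is an assumption, not something from the thread):

  # /etc/sysctl.conf -- shared_buffers = 4096MB plus headroom for other
  # shared-memory users; apply with "sysctl -p"
  kernel.shmmax = 6442450944    # 6 GB in bytes, comfortably above the 4 GB segment
  kernel.shmall = 1572864       # the same 6 GB expressed in 4 kB pages (6442450944 / 4096)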
[PERFORM] Why query takes so much time
Dear all,

I have a query on 3 tables in a database.

Explain Analyze output:

explain analyze
select c.clause, s.subject, s.object, s.verb, s.subject_type, s.object_type, s.doc_id, s.svo_id
from clause2 c, svo2 s, page_content p
where c.clause_id = s.clause_id
  and s.doc_id = c.source_id
  and c.sentence_id = s.sentence_id
  and s.doc_id = p.crawled_page_id
order by s.svo_id
limit 1000 offset 17929000

"Limit (cost=21685592.91..21686802.44 rows=1000 width=2624) (actual time=414601.802..414622.920 rows=1000 loops=1)"
"  ->  Nested Loop (cost=59.77..320659013645.28 rows=265112018116 width=2624) (actual time=0.422..404902.314 rows=1793 loops=1)"
"        ->  Nested Loop (cost=0.00..313889654.42 rows=109882338 width=2628) (actual time=0.242..174223.789 rows=17736897 loops=1)"
"              ->  Index Scan using pk_svo_id on svo2 s (cost=0.00..33914955.13 rows=26840752 width=2600) (actual time=0.157..14691.039 rows=14238271 loops=1)"
"              ->  Index Scan using idx_clause2_id on clause2 c (cost=0.00..10.36 rows=4 width=44) (actual time=0.007..0.008 rows=1 loops=14238271)"
"                    Index Cond: ((c.source_id = s.doc_id) AND (c.clause_id = s.clause_id) AND (c.sentence_id = s.sentence_id))"
"        ->  Bitmap Heap Scan on page_content p (cost=59.77..2885.18 rows=2413 width=8) (actual time=0.007..0.008 rows=1 loops=17736897)"
"              Recheck Cond: (p.crawled_page_id = s.doc_id)"
"              ->  Bitmap Index Scan on idx_crawled_id (cost=0.00..59.17 rows=2413 width=0) (actual time=0.005..0.005 rows=1 loops=17736897)"
"                    Index Cond: (p.crawled_page_id = s.doc_id)"
"Total runtime: 414623.634 ms"

My table & index definitions are as under. Estimated rows in the 3 tables are:

clause2        10341700
svo2           26008000
page_content     479785

CREATE TABLE clause2 (
  id bigint NOT NULL DEFAULT nextval('clause_id_seq'::regclass),
  source_id integer,
  sentence_id integer,
  clause_id integer,
  tense character varying(30),
  clause text,
  CONSTRAINT pk_clause_id PRIMARY KEY (id)
) WITH (OIDS=FALSE);

CREATE INDEX idx_clause2_id ON clause2 USING btree (source_id, clause_id, sentence_id);

CREATE TABLE svo2 (
  svo_id bigint NOT NULL DEFAULT nextval('svo_svo_id_seq'::regclass),
  doc_id integer,
  sentence_id integer,
  clause_id integer,
  negation integer,
  subject character varying(3000),
  verb character varying(3000),
  "object" character varying(3000),
  preposition character varying(3000),
  subject_type character varying(3000),
  object_type character varying(3000),
  subject_attribute character varying(3000),
  object_attribute character varying(3000),
  verb_attribute character varying(3000),
  subject_concept character varying(100),
  object_concept character varying(100),
  subject_sense character varying(100),
  object_sense character varying(100),
  subject_chain character varying(5000),
  object_chain character varying(5000),
  sub_type_id integer,
  obj_type_id integer,
  CONSTRAINT pk_svo_id PRIMARY KEY (svo_id)
) WITH (OIDS=FALSE);

CREATE INDEX idx_svo2_id_dummy ON svo2 USING btree (doc_id, clause_id, sentence_id);

CREATE TABLE page_content (
  content_id integer NOT NULL DEFAULT nextval('page_content_ogc_fid_seq'::regclass),
  wkb_geometry geometry,
  link_level integer,
  isprocessable integer,
  isvalid integer,
  isanalyzed integer,
  islocked integer,
  content_language character(10),
  url_id integer,
  publishing_date character(40),
  heading character(150),
  category character(150),
  crawled_page_url character(500),
  keywords character(500),
  dt_stamp timestamp with time zone,
  "content" character varying,
  crawled_page_id bigint,
  CONSTRAINT page_content_pk PRIMARY KEY (content_id),
  CONSTRAINT enforce_dims_wkb_geometry CHECK (st_ndims(wkb_geometry) = 2),
  CONSTRAINT enforce_srid_wkb_geometry CHECK (st_srid(wkb_geometry) = (-1))
) WITH (OIDS=FALSE);

CREATE INDEX idx_crawled_id ON page_content USING btree (crawled_page_id);
CREATE INDEX pgweb_idx ON page_content USING gin (to_tsvector('english'::regconfig, content::text));

Please let me know if I am doing something wrong, or suggest an alternate query to run it faster.

Thanks
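Most of that runtime goes into generating and then discarding the ~17.9 million rows that precede the OFFSET. A common alternative (a sketch, not something proposed in the thread) is keyset pagination: remember the last svo_id returned by the previous page and seek past it, so the pk_svo_id index scan can start where the last page ended instead of at the beginning:

  -- :last_svo_id is a placeholder for the largest svo_id from the previous page.
  select c.clause, s.subject, s.object, s.verb, s.subject_type, s.object_type,
         s.doc_id, s.svo_id
    from clause2 c
    join svo2 s on c.source_id = s.doc_id
               and c.clause_id = s.clause_id
               and c.sentence_id = s.sentence_id
    join page_content p on s.doc_id = p.crawled_page_id
   order by s.svo_id
   limit 1000;  -- with: where s.svo_id > :last_svo_id

The trade-off is that pages must be fetched in order, but each page then costs roughly the same instead of growing linearly with the offset.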
[PERFORM] Restore database after drop command
Dear all,

I am using Postgres 8.4.2 on a Windows system. I have 2 databases in my Postgres cluster: globedatabase (21GB) and urldatabase.

I restored globedatabase from a .sql file yesterday morning and then inserted some new data into that database. In the evening, by mistake, I issued a drop database globedatabase command. This morning, I restored the same database again from the backup (.sql) file. My .sql file has data only up to yesterday morning, but I want the newly inserted data now. Is that possible? Is it possible to get the data back to the state just before the drop database command?

My pg_log files are in the E:/data directory & the binary log (WAL) is also enabled. Please let me know if it is possible. It's urgent.

Thanks & Regards
Adarsh Sharma
Re: [PERFORM] [ADMIN] Restore database after drop command
I went through the link, so it is impossible to get the data back. I have the following files in my pg_xlog directory:

000000010000000700000091
000000010000000700000092
000000010000000700000093
000000010000000700000094
000000010000000700000095
000000010000000700000096
000000010000000700000097
000000010000000700000098

I think I issued the drop database command 1 month ago. From the manual, I understand that my segment files have been recycled into newer ones:

  The segment files are given numeric names that reflect their position in the abstract WAL sequence. When not using WAL archiving, the system normally creates just a few segment files and then "recycles" them by renaming no-longer-needed segment files to higher segment numbers. It's assumed that a segment file whose contents precede the checkpoint-before-last is no longer of interest and can be recycled.

My archive_status folder is empty. How would we know which data these segment files correspond to? I followed the steps below 1 month ago:

1. Loaded globedatabase through the backup.sql (21 GB) file.
2. Inserted some data (KBs) into about 3-4 tables.
3. Dropped database globedatabase.
4. Loaded globedatabase through the backup.sql (21 GB) file again.

Maybe there is a chance, because we work very rarely on that system. Now I have the backup file, but I want those 3-4 tables.

Thanks

Vibhor Kumar wrote:
On Jul 25, 2011, at 12:08 PM, Adarsh Sharma wrote:

  I restored globedatabase from a .sql file yesterday morning. I inserted some new data into that database. In the evening, by mistake, I issued a drop database globedatabase command. This morning, I restored the same database again from the backup (.sql) file. My .sql file has data till yesterday morning, but I want the newly inserted data now. Is it possible? Is it possible to get the data back to the state before the drop database command?

No, you won't be able to recover. If you have an online backup, then PITR would help you.

Thanks & Regards,
Vibhor Kumar
Blogs: http://vibhork.blogspot.com
http://vibhorkumar.wordpress.com
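For the future, the kind of recovery asked about here requires WAL archiving plus a base backup taken before the accident. A minimal sketch of the postgresql.conf side (the archive directory path is made up for illustration):

  # postgresql.conf -- enable WAL archiving so PITR is possible later
  archive_mode = on                                     # requires a server restart
  archive_command = 'copy "%p" "E:\\wal_archive\\%f"'   # Windows example; path is hypothetical

With archiving in place and a base backup on hand, the cluster can be rolled forward to any point in time before a mistaken DROP DATABASE.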
[PERFORM] How to Speed up Insert from Multiple Connections
Dear all,

I have researched a lot on PostgreSQL performance tuning and found some parameters to increase SELECT performance in PostgreSQL. By increasing shared_buffers, effective_cache_size, work_mem, maintenance_work_mem, etc., we can improve the performance of SELECT queries.

But in my application about 200 connections are made to the DB server, with inserts into 2 tables, and it takes more than hours to complete. I understand the checkpoint_segments variable & want to know whether there are any more ways to increase write performance.

Thanks

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
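Beyond configuration, the biggest wins for bulk loading usually come from changing how the rows are sent. A sketch (table, column and file names are hypothetical, not from the thread): batch many rows per transaction, or use COPY instead of row-at-a-time INSERTs:

  -- Row-at-a-time INSERTs under autocommit pay a WAL flush per row.
  -- Batching amortizes that cost:
  BEGIN;
  INSERT INTO target_table (col1, col2) VALUES (1, 'a'), (2, 'b'), (3, 'c');
  -- ... more multi-row INSERTs ...
  COMMIT;

  -- COPY is faster still for large batches (server-side path shown;
  -- use psql's \copy for a client-side file):
  COPY target_table (col1, col2) FROM '/tmp/batch.csv' WITH CSV;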
[PERFORM] Which Join is better
Dear all,

Just want to know which join is better for querying data faster.

I have 2 tables, A (70 GB) & B (7 MB). A has 10 columns & B has 3 columns. Indexes exist on both tables' ids.

select p.* from A p, B q where p.id = q.id;

or

select p.* from B q, A p where q.id = p.id;

Thanks

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
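A quick way to check (a sketch; for plain inner joins the planner reorders tables by estimated cost regardless of FROM order, up to join_collapse_limit, so the two spellings are equivalent):

  EXPLAIN ANALYZE select p.* from A p, B q where p.id = q.id;
  EXPLAIN ANALYZE select p.* from B q, A p where q.id = p.id;
  -- Expect identical plans from both, typically with the small table B
  -- on the build side of a hash join.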
[PERFORM] Need to tune for Heavy Write
Dear all,

For the last few days I have researched a lot on PostgreSQL performance tuning, due to the slow speed of my server. My application selects data from a MySQL database (about 10 rows), processes it & inserts into 2 Postgres tables, making about 45 connections.

I set my PostgreSQL parameters in postgresql.conf as below (OS: Ubuntu, RAM: 16 GB, Postgres: 8.4.2):

max_connections = 80
shared_buffers = 2048MB
work_mem = 32MB
maintenance_work_mem = 512MB
fsync = off
full_page_writes = off
synchronous_commit = off
checkpoint_segments = 32
checkpoint_completion_target = 0.7
effective_cache_size = 4096MB

After this, I moved my pg_xlog directory to a separate directory outside the data directory, by symlinking. The application issues INSERT statements through PostgreSQL connections only.

Please let me know if I am missing any other important configuration.

Thanks

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
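One cheap addition worth making before further tuning (a sketch, not something suggested in the thread) is to make checkpoint activity observable, so it is possible to tell whether checkpoint_segments = 32 is still too low for this write load:

  # postgresql.conf -- log every checkpoint with timing details
  log_checkpoints = on
  #checkpoint_warning = 30s   # default; warns when checkpoints come too close together

If the server log then shows "checkpoints are occurring too frequently" warnings, raising checkpoint_segments further is the usual next step.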
Re: [PERFORM] Parameters for PostgreSQL
I think RAID 10 is the best among all the RAID levels.

Thanks

Craig Ringer wrote:
On 04/08/11 11:42, Jayadevan M wrote:

  Hello,
  > The most important spec has been omitted. What's the storage subsystem?
  We have storage on SAN, RAID 5.

RAID 5? That's *really* not ideal for database workloads, either Pg or Oracle, unless your RAID 5 storage backend has enough battery-backed write cache to keep huge amounts of writes in RAM and reorder them really effectively. I hope each RAID 5 LUN is only across a few disks and is layered with RAID 1, though.

RAID 5 becomes less reliable than using a single disk when used with too many HDDs, because the probability of a double-disk failure becomes greater than that of a single standalone disk failing. After being bitten by that a few times, these days I'm using RAID 6 in most cases where RAID 10 isn't practical.

In any case, "SAN" can be anything from a Linux box running an iSCSI target on top of a RAID 5 `md' software RAID volume on four 5400RPM HDDs, right up to a giant hundreds-of-fast-disks monster filer full of dedicated ASICs and great gobs of battery-backed write cache DRAM. Are you able to be any more specific about what you're dealing with?

  > > We are suing weblogic.
  > ^ Best. Typo. Ever.
  > I hear most people who use it want to, you're just brave enough to do it :-P
  I wish I could make a few millions that way.

Thank you for all the replies. The first step is, of course, to migrate the data. I am working with ora2pg for that. I assume creating files with 'COPY' to work as input for PostgreSQL is the right approach? We don't have many stored procedures or packages, so that part should be OK.
Re: [PERFORM] How to track number of connections and hosts to Postgres cluster
pg_stat_activity keeps track of all this information:

select * from pg_stat_activity where datname = 'databasename';

Venkat Balaji wrote:

Thanks Guillaume!!

But if I put log_connections to on and log_disconnections to on, wouldn't Postgres be logging a lot of data? Will this not be IO intensive? I understand that this is the best way, but I would want to know if there is another way to reduce IO (maybe through queries to catalog tables).

Thanks
Venkat

On Wed, Aug 24, 2011 at 1:19 PM, Guillaume Lelarge <guilla...@lelarge.info> wrote:
On Wed, 2011-08-24 at 13:05 +0530, Venkat Balaji wrote:

  > Hello Everyone,
  >
  > I am working on an alert script to track the number of connections with the host IPs to the Postgres cluster.
  >
  > 1. I need all the host IPs making a connection to the Postgres cluster (even for a fraction of a second).

You should set log_connections to on.

  > 2. I would also want to track the number of IDLE connections, IDLE IN TRANSACTION connections and the length of the connections as well.

IDLE and IDLE in transaction are the kind of information you get in pg_stat_activity. Length of connections you can get with log_disconnections.

  > I would be making use of pg_stat_activity and also thought of enabling logging of the host IPs in the db server log files, which seems expensive to me (in terms of IO and logfile size).

Using pg_stat_activity won't catch really short connections. You need log_connections for that, and log_disconnections for the duration of connections. So you'll have to work on a tool that gets some information from queries on pg_stat_activity, and that can also read the PostgreSQL log files.

--
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com
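As a starting point for such a catalog-only check, a sketch that works on pre-9.2 servers (where the query text column is current_query and idle backends show '<IDLE>'):

  select client_addr,
         count(*) as total_connections,
         sum(case when current_query = '<IDLE>' then 1 else 0 end) as idle,
         sum(case when current_query = '<IDLE> in transaction' then 1 else 0 end) as idle_in_txn
    from pg_stat_activity
   group by client_addr
   order by total_connections desc;

As Guillaume notes, this only samples the moment it runs; connections that open and close between samples need log_connections / log_disconnections.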