[PERFORM] Is Query need to be optimized

2011-02-27 Thread Adarsh Sharma
ops=1)

Hash Cond: (l.source_id = m.doc_id)
->  Seq Scan on loc_context_demo l  
(cost=0.00..48108.71 rows=356 width=18) (actual time=0.986..316.129 
rows=816 loops=1)
  Filter: ((geom && '010320E61001000A00935A97CF5D7E3A408BA46A930EAF5740935A97CF5D7E3A408BA46A930EAF5740F023C92F807E3A403D5E90251CAF5740B2BD8E20E745394059E2DB9683CD5640A6A712BBC793364091548ABA9CD0564002B050A337C93540EBA0A9236E785740319F7772E0E33840758E85A069CA574003618D4205E43840B48CC28F72CA574003618D4205E43840B48CC28F72CA5740935A97CF5D7E3A408BA46A930EAF5740'::geometry) AND _st_within(geom, '010320E61001000A00935A97CF5D7E3A408BA46A930EAF5740935A97CF5D7E3A408BA46A930EAF5740F023C92F807E3A403D5E90251CAF5740B2BD8E20E745394059E2DB9683CD5640A6A712BBC793364091548ABA9CD0564002B050A337C93540EBA0A9236E785740319F7772E0E33840758E85A069CA574003618D4205E43840B48CC28F72CA574003618D4205E43840B48CC28F72CA5740935A97CF5D7E3A408BA46A930EAF5740'::geometry))
->  Hash  (cost=37186.32..37186.32 rows=9248 
width=55) (actual time=190.396..190.396 rows=9016 loops=1)
  ->  Seq Scan on metadata_demo m  
(cost=0.00..37186.32 rows=9248 width=55) (actual time=38.895..183.396 
rows=9016 loops=1)
        Filter: (doc_category = 
'Terrorism'::bpchar)

Total runtime: 1210.112 ms
(15 rows)



Best regards,
Adarsh Sharma



[PERFORM] Is it require further tuning

2011-03-02 Thread Adarsh Sharma
ops=1)

Total runtime: 64.396 ms


I don't know why it uses a Seq Scan on loc_context_terror, as I have 
indexes on the desired columns as well.



Thanks & best Regards,

Adarsh Sharma



[PERFORM] How to tune this query

2011-03-07 Thread Adarsh Sharma
8C5BE4C64526406D4676A905705240A55424D1E75F28402385E2519739524084C31EB73D7C2A401E60
E240D11D5240D786518A3F712D40F35BED31F40E52403C8BF8C96E62304026AE5FD5A7F15140A381208F97D531405E60389EC4E1514086BD630860DE324036012694EDDF5140AD741D40E17C3340E
00EA98AC6DE5140E9339DDE5C7D334057D066D3D6DE5140F428BE555EE63340D2983401FADD51406269CD198B4F3440B9FC8ECE16DD5140BC38DFDB0B503440CA8056C85BDD514085A28D108C5034
40E0F9A841A1DD514003F100B50B51344028F11A3BE7DD5140824556C68A513440364940B52DDE514052B27C71748834406197CA83AEF851402902454461C434405BB0871D3D30524058A819DA7FC
A3440C6EEBDFC22685240D7C07A36B7CA34407CC17F7F32685240F442A351EECA34402D04B1E84168524022BBE72B25CB344026AB843851685240FB4E9CC55BCB3440B8302D6F606852406DBD141F
92CB34408E96DC8C6F685240E45EA438C8CB3440FE66C4917E68524082259E12FECB344054B6157E8D685240419D54AD33CC3440182401529C68524030EC190969CC344052DCB60DAB685240ADD23
F269ECC3440CA9866B1B968524092AB1705D3CC344044A23F3DC8685240756CF2A507CD3440B8D170B1D6685240406C9864EAFA3340D89D889DF98452403FAD44C337FB334042381684FE84524089
B70D3485FB33403B021A7503855240EAD919B7D2FB3340E43D9E70088552406A7C8F4C20FC3340EA46AD760D855240772095F46DFC3340AD92518712855240146151AFBBFC334075B095A21785524
015F3EA7C09FD33409A4984C81C855240270BB5E1518C334075A7F23A044E5240'::geometry))
  ->  Hash  (cost=43457.32..43457.32 rows=82682 width=55) 
(actual time=255.707..255.707 rows=82443 loops=1)
->  Seq Scan on metadata_demo m  
(cost=0.00..43457.32 rows=82682 width=55) (actual time=0.013..230.904 
rows=82443 loops=1)

  Filter: (doc_category = 'Naxalism'::bpchar)
->  Index Scan using idx_crawled_id on page_content_demo p  
(cost=0.00..9.24 rows=1 width=1353) (actual time=4.822..4.946 rows=1 
loops=4129)

  Index Cond: (p.crawled_page_id = l.source_id)
Total runtime: 21379.870 ms
(14 rows)


Yesterday, after some performance tuning (shared_buffers=1GB, 
effective_cache_size=2GB, work_mem=64MB, maintenance_work_mem=256MB) and 
creating indexes as:


CREATE INDEX idx1_source_id_l2
 ON l1  USING btree(source_id,lat,lon);

CREATE INDEX idx_doc_id_m1
 ON m1  USING btree(doc_id,doc_category);

 CREATE INDEX idx_crawled_id_p1
 ON p1
 USING btree
 (crawled_page_id,heading,category,crawled_page_url);

my total runtime became: Total runtime: 704.383 ms

And if I run the same explain analyze command again: Total runtime: 696.856 ms

What is the reason that it takes so much time the first time? I know that 
the second time Postgres uses the cache.

Is it possible to make it run faster the first time too? Please let me know.
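
One way to approximate warm-cache timing on a cold start is to pre-read the 
tables the query touches before running it. A minimal sketch, assuming the 
table names from the plans in these threads (newer releases, 9.4+, offer the 
pg_prewarm extension for this, which the 8.4-era server here does not have):

-- Sequentially read each table so its pages land in the cache
-- before the real query is timed.
SELECT count(*) FROM loc_context_demo;
SELECT count(*) FROM metadata_demo;
SELECT count(*) FROM page_content_demo;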





Thanks & best Regards,

Adarsh Sharma




[PERFORM] Help with Query Tuning

2011-03-16 Thread Adarsh Sharma

Dear all,

I am facing a problem while creating an index to make the below query 
run faster. My table size is about 1065 MB, with 428467 rows.


explain analyze select  count(*)  from page_content where 
publishing_date like '%2010%' and content_language='en'  and content is 
not null and isprocessable = 1 and (content like '%Militant%'

OR content like '%jihad%' OR  content like '%Mujahid%'  OR
content like '%fedayeen%' OR content like '%insurgent%'  OR content 
like '%terrorist%' OR
 content like '%cadre%'  OR content like '%civilians%' OR content like 
'%police%' OR content like '%defence%' OR content like '%cops%' OR 
content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') 
AND (content like '%kill%' or content like '%injure%');


Output:

 Aggregate  (cost=107557.78..107557.79 rows=1 width=0) (actual 
time=18564.631..18564.631 rows=1 loops=1)
  ->  Seq Scan on page_content  (cost=0.00..107466.82 rows=36381 
width=0) (actual time=0.146..18529.371 rows=59918 loops=1)
        Filter: ((content IS NOT NULL) AND (publishing_date ~~ '%2010%'::text)
          AND (content_language = 'en'::bpchar) AND (isprocessable = 1)
          AND (((content)::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text))
          AND (((content)::text ~~ '%Militant%'::text) OR ((content)::text ~~ '%jihad%'::text)
           OR ((content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~ '%fedayeen%'::text)
           OR ((content)::text ~~ '%insurgent%'::text) OR ((content)::text ~~ '%terrorist%'::text)
           OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~ '%civilians%'::text)
           OR ((content)::text ~~ '%police%'::text) OR ((content)::text ~~ '%defence%'::text)
           OR ((content)::text ~~ '%cops%'::text) OR ((content)::text ~~ '%crpf%'::text)
           OR ((content)::text ~~ '%dsf%'::text) OR ((content)::text ~~ '%ssb%'::text)))
Total runtime: 18564.673 ms


Index on that table:

CREATE INDEX idx_page_id
 ON page_content
 USING btree
 (crawled_page_id);

Index I created:
CREATE INDEX idx_page_id_content
 ON page_content
 USING btree
 (crawled_page_id,content_language,publishing_date,isprocessable);

Index that failed to create:

CREATE INDEX idx_page_id_content1
 ON page_content
 USING btree
 (crawled_page_id,content);

Error:

ERROR:  index row requires 13240 bytes, maximum size is 8191
SQL state: 54000

How can I resolve this error?
Please give any suggestions to tune the query.
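
For reference: a btree index row must fit within a page (hence the 8191-byte 
limit in the error), so the full content column cannot be indexed directly. 
If a prefix index were useful, an expression index sidesteps the limit; a 
minimal sketch (the index name and prefix length are hypothetical). Note 
that a btree cannot serve LIKE '%word%' patterns in any case, so the 
full-text approach suggested in the replies below is the better fit here:

-- Index only a short prefix of the text so each entry stays small.
CREATE INDEX idx_page_id_content_prefix
 ON page_content
 USING btree
 (crawled_page_id, substr(content, 1, 100));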

Thanks & best Regards,

Adarsh Sharma



Re: [PERFORM] Help with Query Tuning

2011-03-16 Thread Adarsh Sharma
Thanks Marshall. Would I need to change the data type of the content 
column to tsvector and create a GiST index on it?


Best Regards,
Adarsh


Kenneth Marshall wrote:

On Wed, Mar 16, 2011 at 02:43:38PM +0530, Adarsh Sharma wrote:
  

[...]
You should probably be looking at using full-text indexing:

http://www.postgresql.org/docs/9.0/static/textsearch.html

or limit the size of content for the index.

Cheers,
Ken
  




Re: [PERFORM] Help with Query Tuning

2011-03-16 Thread Adarsh Sharma

Thanks, I understand it now :-)

But one doubt is still not clear:

Original query:

select  count(*)  from page_content where (content like '%Militant%'
OR content like '%jihad%' OR  content like '%Mujahid%'  OR
content like '%fedayeen%' OR content like '%insurgent%'  OR content 
like '%terrorist%' OR
 content like '%cadre%'  OR content like '%civilians%' OR content like 
'%police%' OR content like '%defence%' OR content like '%cops%' OR 
content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') 
AND (content like '%kill%' OR content like '%injure%');


Output:
count
---
57061
(1 row)

Time: 19726.555 ms

I need to tune it using full-text searching, as:

Modified query:

SELECT count(*)  from page_content
WHERE publishing_date like '%2010%' and content_language='en' and 
content is not null and isprocessable = 1 and 
to_tsvector('english',content) @@ to_tsquery('english','Mujahid' || 
'jihad' || 'Militant' || 'fedayeen' || 'insurgent' || 'terrorist' || 
'cadre' || 'civilians' || 'police' || 'defence' || 'cops' || 'crpf' || 
'dsf' || 'ssb');


Output:

count
---
0
(1 row)

Time: 194685.125 ms

I tried:

SELECT count(*)  from page_content
WHERE publishing_date like '%2010%' and content_language='en' and 
content is not null and isprocessable = 1 and 
to_tsvector('english',content) @@ to_tsquery('english','%Mujahid%' || 
'%jihad%' || '%Militant%' || '%fedayeen%' || '%insurgent%' || 
'%terrorist%' || '%cadre%' || '%civilians%' || '%police%' || '%defence%' 
|| '%cops%' || '%crpf%' || '%dsf%' || '%ssb%');


count
---
0
(1 row)

Time: 194722.468 ms

I know I have to create an index, but the index is the next step; first 
I have to get the correct result.


CREATE INDEX pgweb_idx ON page_content USING gin(to_tsvector('english', 
content));



Please guide me where I am going wrong.


Thanks & best Regards,

Adarsh Sharma



Re: [PERFORM] Help with Query Tuning

2011-03-17 Thread Adarsh Sharma

Thanks, it works now :-)

Here is the output :

pdc_uima=# SELECT count(*)  from page_content WHERE publishing_date like 
'%2010%' and
pdc_uima-# content_language='en' and content is not null and 
isprocessable = 1 and
pdc_uima-# to_tsvector('english',content) @@ 
to_tsquery('english','Mujahid' || ' | '

pdc_uima(# || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen' || ' | '
pdc_uima(# || 'insurgent' || ' | ' || 'terrorist' || ' | ' || 'cadre' || 
' | '
pdc_uima(# || 'civilians' || ' | ' || 'police' || ' | ' || 'cops' || 
'crpf' || ' | '

pdc_uima(# || 'defence' || ' | ' || 'dsf' || ' | ' || 'ssb' );

 count
--------
 137193
(1 row)

Time: 195441.894 ms


But my original query also uses AND, i.e.:

select  count(*)  from page_content where publishing_date like '%2010%' 
and content_language='en'  and content is not null and isprocessable = 1 
and (content like '%Militant%'

OR content like '%jihad%' OR  content like '%Mujahid%'  OR
content like '%fedayeen%' OR content like '%insurgent%'  OR content 
like '%terrorist%' OR
 content like '%cadre%'  OR content like '%civilians%' OR content like 
'%police%' OR content like '%defence%' OR content like '%cops%' OR 
content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') 
AND (content like '%kill%' OR content like '%injure%');


count
---
57061
(1 row)

Time: 19423.087 ms


Now I still have to add the AND condition (  AND (content like '%kill%' OR 
content like '%injure%')  ) as well.
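
For reference, to_tsquery accepts parentheses, so the OR group and the AND 
group can be combined in a single query. A sketch of the combined form, 
assuming the same word list (full-text search matches whole stemmed lexemes, 
so the counts will differ from the LIKE version):

SELECT count(*) FROM page_content
WHERE publishing_date LIKE '%2010%'
  AND content_language = 'en'
  AND content IS NOT NULL
  AND isprocessable = 1
  AND to_tsvector('english', content) @@ to_tsquery('english',
      '(Mujahid | jihad | Militant | fedayeen | insurgent | terrorist
        | cadre | civilians | police | defence | cops | crpf | dsf | ssb)
       & (kill | injure)');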



Thanks & Regards,
Adarsh Sharma



t...@fuzzy.cz wrote:

t...@fuzzy.cz wrote:


Yes, I think we caught the problem, but it results in the below error:

SELECT count(*)  from page_content
WHERE publishing_date like '%2010%' and content_language='en' and
content is not null and isprocessable = 1 and
to_tsvector('english',content) @@ to_tsquery('english','Mujahid ' ||
'jihad ' || 'Militant ' || 'fedayeen ' || 'insurgent ' || 'terrorist '
|| 'cadre ' || 'civilians ' || 'police ' || 'defence ' || 'cops ' ||
'crpf ' || 'dsf ' || 'ssb');

ERROR:  syntax error in tsquery: "Mujahid jihad Militant fedayeen
insurgent terrorist cadre civilians police defence cops crpf dsf ssb"



The text passed to to_tsquery has to be a proper query, i.e. single tokens
separated by boolean operators. In your case, you should put there '|'
(which means OR) to get something like this

  'Mujahid | jihad | Militant | ...'

or you can use plainto_tsquery() as that accepts simple text, but it puts
'&' (AND) between the tokens and I guess that's not what you want.

Tomas


  

What should I do to make it satisfy the OR condition, matching any of the
to_tsquery values, as we got correct results with LIKE '%Mujahid%' OR ...



You can't force the plainto_tsquery to somehow use the OR instead of AND.
You need to modify the piece of code that produces the search text to put
there '|' characters. So do something like this

SELECT count(*)  from page_content WHERE publishing_date like '%2010%' and
content_language='en' and content is not null and isprocessable = 1 and
to_tsvector('english',content) @@ to_tsquery('english','Mujahid' || ' | '
|| 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen');

Not sure where does this text come from, but you can do this in a higher
level language, e.g. in PHP. Something like this

$words = implode(' | ', explode(' ',$text));

and then pass the $words into the query. Or something like that.

Tomas

  




[PERFORM] Reason of Slowness of query

2011-03-22 Thread Adarsh Sharma

Dear all,

I have 2 tables in my database: clause2 (4900 MB) & page_content (1582 MB).


My table definitions are:

page_content:

CREATE TABLE page_content
(
 content_id integer,
 wkb_geometry geometry,
 link_level integer,
 isprocessable integer,
 isvalid integer,
 isanalyzed integer,
 islocked integer,
 content_language character(10),
 url_id integer,
 publishing_date character(40),
 heading character(150),
 category character(150),
 crawled_page_url character(500),
 keywords character(500),
 dt_stamp timestamp with time zone,
 "content" character varying,
 crawled_page_id bigint,
 id integer
)
WITH (
 OIDS=FALSE
);

Indexes on it:
CREATE INDEX idx_page_id  ON page_content  USING btree  (crawled_page_id);
CREATE INDEX idx_page_id_content   ON page_content  USING btree  
(crawled_page_id, content_language, publishing_date, isprocessable);
CREATE INDEX pgweb_idx  ON page_content   USING gin   
(to_tsvector('english'::regconfig, content::text));


clause2:

CREATE TABLE clause2
(
 id bigint NOT NULL DEFAULT nextval('clause_id_seq'::regclass),
 source_id integer,
 sentence_id integer,
 clause_id integer,
 tense character varying(30),
 clause text,
 CONSTRAINT pk_clause_demo_id PRIMARY KEY (id)
)WITH ( OIDS=FALSE);

Indexes on it:

CREATE INDEX idx_clause2_march10
 ON clause2
 USING btree
 (id, source_id);

I perform a join query on it as:

explain analyze select distinct(p.crawled_page_id) from page_content p 
, clause2  c where p.crawled_page_id != c.source_id ;


It takes more than 1 hour to complete. I issued the explain analyze 
command but could not wait for the output, so I send the plain explain 
output instead:

                                  QUERY PLAN
--------------------------------------------------------------------------------
Unique  (cost=927576.16..395122387390.13 rows=382659 width=8)
  ->  Nested Loop  (cost=927576.16..360949839832.15 rows=13669019023195 
width=8)

Join Filter: (p.crawled_page_id <> c.source_id)
->  Index Scan using idx_page_id on page_content p  
(cost=0.00..174214.02 rows=428817 width=8)

->  Materialize  (cost=927576.16..1370855.12 rows=31876196 width=4)
  ->  Seq Scan on clause2 c  (cost=0.00..771182.96 
rows=31876196 width=4)

(6 rows)


Please guide me on how to make the above query run faster, as I am not 
able to do so.



Thanks, Adarsh
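
The rows=13669019023195 estimate shows what is going wrong: with != as the 
join condition, almost every page_content row pairs with almost every 
clause2 row (428817 x 31876196 is about 1.4e13), so the query is effectively 
a Cartesian product, and it does not mean "ids with no entry in clause2" 
either. The follow-up messages below converge on an anti-join; a sketch of 
that form:

-- Return the ids from page_content that have no matching row in clause2.
SELECT DISTINCT p.crawled_page_id
FROM page_content p
WHERE NOT EXISTS (SELECT 1
                  FROM clause2 c
                  WHERE c.source_id = p.crawled_page_id);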





Re: [PERFORM] Reason of Slowness of query

2011-03-23 Thread Adarsh Sharma

Thanks Chetan, here is the output of your updated query :


explain  select distinct(p.crawled_page_id) from page_content p where 
NOT EXISTS (select 1 from  clause2 c where c.source_id = p.crawled_page_id);

                                  QUERY PLAN
---------------------------------------------------------------------------

HashAggregate  (cost=1516749.47..1520576.06 rows=382659 width=8)
  ->  Hash Anti Join  (cost=1294152.41..1515791.80 rows=383071 width=8)
Hash Cond: (p.crawled_page_id = c.source_id)
->  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8)

->  Hash  (cost=771182.96..771182.96 rows=31876196 width=4)
  ->  Seq Scan on clause2 c  (cost=0.00..771182.96 
rows=31876196 width=4)

(6 rows)

And my explain analyze output is :

                                  QUERY PLAN
---------------------------------------------------------------------------
HashAggregate  (cost=1516749.47..1520576.06 rows=382659 width=8) 
(actual time=5.181..56669.270 rows=72 loops=1)
  ->  Hash Anti Join  (cost=1294152.41..1515791.80 rows=383071 width=8) 
(actual time=45740.789..56665.816 rows=74 loops=1)

Hash Cond: (p.crawled_page_id = c.source_id)
->  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8) (actual time=0.012..715.915 rows=428467 loops=1)
->  Hash  (cost=771182.96..771182.96 rows=31876196 width=4) 
(actual time=45310.524..45310.524 rows=31853083 loops=1)
  ->  Seq Scan on clause2 c  (cost=0.00..771182.96 
rows=31876196 width=4) (actual time=0.055..23408.884 rows=31853083 loops=1)

Total runtime: 56687.660 ms
(7 rows)

But is there any option to tune it further? And one more thing: the output 
varies from 6 to 7 rows.



Thanks & best Regards,
Adarsh Sharma






Chetan Suttraway wrote:

On Wed, Mar 23, 2011 at 11:58 AM, Adarsh Sharma <adarsh.sha...@orkash.com> wrote:


[...]

[PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Adarsh Sharma

Thanks Chetan. After my lunch break, I tried the below steps:

My original query was:

explain analyze select distinct(p.crawled_page_id) from page_content p 
, clause2  c where p.crawled_page_id != c.source_id


which hangs because it is the wrong query to fetch the desired output.

Next updated query, by Chetan Suttraway:

explain analyze select distinct(p.crawled_page_id) from page_content p
where NOT EXISTS (select 1 from  clause2 c where c.source_id = 
p.crawled_page_id);


   
                                  QUERY PLAN
---------------------------------------------------------------------------
HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual 
time=7192.843..7195.923 rows=72 loops=1)
  ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) 
(actual time=0.040..7192.426 rows=74 loops=1)
->  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8) (actual time=0.009..395.599 rows=428467 loops=1)
->  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..18.18 rows=781 width=4) (actual time=0.014..0.014 rows=1 
loops=428467)

  Index Cond: (c.source_id = p.crawled_page_id)
Total runtime: 7199.748 ms
(6 rows)

I think it is very much faster, but I don't understand the query:

explain select distinct(b) from t1,t2 where t1.b > t2.d union all
select distinct(b) from t1,t2 where t1.b < t2.d;

As I transform it into my format as:

explain select distinct(p.crawled_page_id) from page_content p , clause2 
c where p.crawled_page_id > c.source_id union all  select 
distinct(p.crawled_page_id) from page_content p,clause2 c where 
p.crawled_page_id < c.source_id;


                                  QUERY PLAN
---------------------------------------------------------------------------

Append  (cost=0.00..296085951076.34 rows=765318 width=8)
  ->  Unique  (cost=0.00..148042971711.58 rows=382659 width=8)
->  Nested Loop  (cost=0.00..136655213119.84 rows=4555103436696 
width=8)
  ->  Index Scan using idx_page_id on page_content p  
(cost=0.00..174214.02 rows=428817 width=8)
  ->  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..185898.05 rows=10622488 width=4)

Index Cond: (p.crawled_page_id > c.source_id)
  ->  Unique  (cost=0.00..148042971711.58 rows=382659 width=8)
->  Nested Loop  (cost=0.00..136655213119.84 rows=4555103436696 
width=8)
  ->  Index Scan using idx_page_id on page_content p  
(cost=0.00..174214.02 rows=428817 width=8)
  ->  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..185898.05 rows=10622488 width=4)

Index Cond: (p.crawled_page_id < c.source_id)
(11 rows)

I don't think this is correct, because it produces 11 rows of output.

Any further suggestions? Please guide.

Thanks & best Regards,
Adarsh Sharma



Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Adarsh Sharma
I just want to retrieve that id 's from page_content which do not have 
any entry in clause2 table.


Thanks , Adarsh

Vitalii Tymchyshyn wrote:

23.03.11 11:17, Adarsh Sharma wrote:


I think it is very much faster, but I don't understand the query:

explain select distinct(b) from t1,t2 where t1.b > t2.d union all
select distinct(b) from t1,t2 where t1.b < t2.d;

I don't understand it either. What are you trying to get? Is it

select distinct(b) from t1 where b > (select min(d) from t2) or b < (select max(d) from t2)

?

Can you explain in words, not SQL, what you expect to retrieve?

Best regards, Vitalii Tymchyshyn




Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Adarsh Sharma

Vitalii Tymchyshyn wrote:

23.03.11 12:10, Adarsh Sharma wrote:
I just want to retrieve those ids from page_content which do not 
have any entry in the clause2 table.



Then
select distinct(p.crawled_page_id) from page_content p
 where NOT EXISTS (select 1 from  clause2 c where c.source_id = 
p.crawled_page_id);

is the correct query.



I can't understand how select 1 from  clause2 c where c.source_id = 
p.crawled_page_id works either; I get my output.

What is the significance of the 1 here?

Thanks , Adarsh

Best regards, Vitalii Tymchyshyn.
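
The 1 has no special significance: EXISTS only tests whether the subquery 
returns at least one row, and the selected value itself is discarded, so 
SELECT 1 is just a conventional placeholder. A quick illustration (any 
select list gives the same count):

-- These two are equivalent; the subquery's output column is never used.
SELECT count(*) FROM page_content p
WHERE NOT EXISTS (SELECT 1 FROM clause2 c WHERE c.source_id = p.crawled_page_id);

SELECT count(*) FROM page_content p
WHERE NOT EXISTS (SELECT c.* FROM clause2 c WHERE c.source_id = p.crawled_page_id);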




Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Adarsh Sharma

Thank you all for your nice support.

Let me conclude the results; the below results were obtained after finding 
the needed queries:


First option:

pdc_uima=# explain analyze select distinct(p.crawled_page_id)
pdc_uima-# from page_content p left join clause2 c on (p.crawled_page_id =
pdc_uima(# c.source_id) where (c.source_id is null);

                                  QUERY PLAN
---------------------------------------------------------------------------
HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual 
time=87927.000..87930.084 rows=72 loops=1)
  ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) 
(actual time=0.191..87926.546 rows=74 loops=1)
->  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8) (actual time=0.027..528.978 rows=428467 loops=1)
->  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..18.18 rows=781 width=4) (actual time=0.202..0.202 rows=1 
loops=428467)

  Index Cond: (p.crawled_page_id = c.source_id)
Total runtime: 87933.882 ms :-(
(6 rows)

Second option:

pdc_uima=# explain analyze select distinct(p.crawled_page_id) from 
page_content p
pdc_uima-#  where NOT EXISTS (select 1 from  clause2 c where c.source_id 
= p.crawled_page_id);

                                  QUERY PLAN
---------------------------------------------------------------------------
HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual 
time=7047.259..7050.261 rows=72 loops=1)
  ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) 
(actual time=0.039..7046.826 rows=74 loops=1)
->  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8) (actual time=0.008..388.976 rows=428467 loops=1)
->  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..18.18 rows=781 width=4) (actual time=0.013..0.013 rows=1 
loops=428467)

  Index Cond: (c.source_id = p.crawled_page_id)
Total runtime: 7054.074 ms :-)
(6 rows)


Thanks & best Regards,
Adarsh Sharma




Chetan Suttraway wrote:

On Wed, Mar 23, 2011 at 4:08 PM, <t...@fuzzy.cz> wrote:

> I just want to retrieve those ids from page_content which do not have
> any entry in the clause2 table.

In that case the query probably does not work (at least the query
you've
sent in the first post) as it will return even those IDs that have at
least one other row in 'clause2' (not matching the != condition).
At least
that's how I understand it.

true.
 


So instead of this

select distinct(p.crawled_page_id)
from page_content p, clause2 c where p.crawled_page_id !=
c.source_id ;

you should probably do this

select distinct(p.crawled_page_id)
from page_content p left join clause2 c on (p.crawled_page_id =
c.source_id) where (c.source_id is null);

I guess this will be much more efficient too.


This looks like it gives the expected results. Also note that the where 
clause "is null" is really required and is not an optional predicate.

 


regards
Tomas






--
Regards,
Chetan Suttraway
EnterpriseDB <http://www.enterprisedb.com/>, The Enterprise PostgreSQL 
<http://www.enterprisedb.com/> company.








Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Adarsh Sharma

Vitalii Tymchyshyn wrote:

23.03.11 13:21, Adarsh Sharma wrote:

[...]



Actually the plans are equal, so I suppose it depends on which was run 
first :). The slow query operates with data mostly on disk, while the 
fast one has its data in memory.


Yes, you're absolutely right; if I run it again, it displays the output as:

pdc_uima=# explain analyze select distinct(p.crawled_page_id) from 
page_content p left join clause2 c on (p.crawled_page_id = c.source_id) 
where (c.source_id is null);


   
                                  QUERY PLAN
---------------------------------------------------------------------------
HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual 
time=7618.452..7621.427 rows=72 loops=1)
  ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) 
(actual time=0.131..7618.043 rows=74 loops=1)
->  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8) (actual time=0.020..472.811 rows=428467 loops=1)
->  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..18.18 rows=781 width=4) (actual time=0.015..0.015 rows=1 
loops=428467)

  Index Cond: (p.crawled_page_id = c.source_id)
Total runtime: 7637.132 ms
(6 rows)

I will let you know after a fresh start (session). Then the true result 
will come, and any further tuning required can be performed.

Best Regards, Adarsh


Best regards, Vitalii Tymchyshyn




Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Adarsh Sharma



Actually the plans are equal, so I suppose it depends on which was
run first :). The slow query operates with data mostly on disk, while
the fast one has its data in memory.

Yeah, maybe the easiest way is to start a fresh session and fire the 
queries.



After the fresh start, the results obtained are:

pdc_uima=# explain analyze select distinct(p.crawled_page_id)
pdc_uima-#  from page_content p left join clause2 c on (p.crawled_page_id =
pdc_uima(#  c.source_id) where (c.source_id is null);

                                  QUERY PLAN
---------------------------------------------------------------------------
HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual 
time=7725.132..7728.341 rows=72 loops=1)
  ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) 
(actual time=0.115..7724.713 rows=74 loops=1)
->  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8) (actual time=0.021..472.199 rows=428467 loops=1)
->  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..18.18 rows=781 width=4) (actual time=0.015..0.015 rows=1 
loops=428467)

  Index Cond: (p.crawled_page_id = c.source_id)
Total runtime: 7731.840 ms
(6 rows)

pdc_uima=#  explain analyze select distinct(p.crawled_page_id) 
from page_content p
pdc_uima-#   where NOT EXISTS (select 1 from  clause2 c where 
c.source_id = p.crawled_page_id);

                                  QUERY PLAN
---------------------------------------------------------------------------
HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual 
time=6192.249..6195.368 rows=72 loops=1)
  ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) 
(actual time=0.036..6191.838 rows=74 loops=1)
->  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8) (actual time=0.008..372.489 rows=428467 loops=1)
->  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..18.18 rows=781 width=4) (actual time=0.012..0.012 rows=1 
loops=428467)

  Index Cond: (c.source_id = p.crawled_page_id)
Total runtime: 6198.567 ms
(6 rows)


This shows a slight upper hand for the second query.


Would it be possible to tune it further?
My postgresql.conf parameters are as follows (total RAM = 16 GB):

shared_buffers = 4GB
max_connections=700
effective_cache_size = 6GB
work_mem=16MB
maintenance_work_mem=64MB

I am thinking of changing to:

work_mem=64MB
maintenance_work_mem=256MB

Will that have some effect now?
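
As a side note, work_mem can be tried per session before touching 
postgresql.conf; a sketch (the value is illustrative). For this particular 
nested-loop anti-join plan there is no large sort or hash node, so a bigger 
work_mem is unlikely to change it much:

SET work_mem = '64MB';    -- session-local override, not a server-wide change
EXPLAIN ANALYZE
SELECT DISTINCT p.crawled_page_id
FROM page_content p
WHERE NOT EXISTS (SELECT 1 FROM clause2 c
                  WHERE c.source_id = p.crawled_page_id);
RESET work_mem;           -- back to the configured value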


Thanks & best Regards,
Adarsh Sharma



Best regards, Vitalii Tymchyshyn




--
Regards,
Chetan Suttraway
EnterpriseDB (http://www.enterprisedb.com/), The Enterprise PostgreSQL company.








Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Adarsh Sharma

t...@fuzzy.cz wrote:

On 03/23/2011 04:17 AM, Adarsh Sharma wrote:



explain analyze select distinct(p.crawled_page_id) from page_content
p where NOT EXISTS (select 1 from clause2 c where c.source_id =
p.crawled_page_id);
  

You know... I'm surprised nobody has mentioned this, but DISTINCT is
very slow unless you have a fairly recent version of Postgres that
replaces it with something faster. Try this:



Nobody mentioned that because the explain plan already uses hash aggregate
(instead of the old sort)

 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual
time=7047.259..7050.261 rows=72 loops=1)

which means this is at least 8.4. Plus the 'distinct' step uses less than
1% of total time, so even if you improve it the impact will be minimal.

  


Yes, you're absolutely right; I am using version 8.4SS, and I am satisfied 
with the below query results:


pdc_uima=# explain analyze select distinct(p.crawled_page_id) from 
page_content p
pdc_uima-#  where NOT EXISTS (select 1 from  clause2 c where c.source_id 
= p.crawled_page_id);

                                  QUERY PLAN
---------------------------------------------------------------------------
HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual 
time=5149.308..5152.251 rows=72 loops=1)
  ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) 
(actual time=0.119..5148.954 rows=74 loops=1)
->  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8) (actual time=0.021..444.487 rows=428467 loops=1)
->  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..18.18 rows=781 width=4) (actual time=0.009..0.009 rows=1 
loops=428467)

  Index Cond: (c.source_id = p.crawled_page_id)
Total runtime: 5155.874 ms
(6 rows)

I don't think that the above results can be optimized much further.


Thanks & best Regards,
Adarsh Sharma

regards
Tomas

  




[PERFORM] Why Index is not used

2011-03-24 Thread Adarsh Sharma

Dear all,

Today I had to run a query internally from my application through more than 
10 connections.

But the query performed very badly. The data sizes of the tables are:

pdc_uima=#  select pg_size_pretty(pg_total_relation_size('clause2'));
pg_size_pretty

5858 MB
(1 row)

pdc_uima=#  select pg_size_pretty(pg_total_relation_size('svo2'));  
pg_size_pretty


4719 MB
(1 row)


I explained the query, after creating the indexes, as:

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where 
c.clause_id=s.clause_id and s.doc_id=c.source_id and c.
pdc_uima-# sentence_id=s.sentence_id ;
                                  QUERY PLAN
--------------------------------------------------------------------------

Merge Join  (cost=5673831.05..34033959.87 rows=167324179 width=2053)
  Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id) 
AND (s.sentence_id = c.sentence_id))
  ->  Index Scan using idx_svo2 on svo2 s  (cost=0.00..24489343.65 
rows=27471560 width=1993)

  ->  Materialize  (cost=5673828.74..6071992.29 rows=31853084 width=72)
->  Sort  (cost=5673828.74..5753461.45 rows=31853084 width=72)
  Sort Key: c.clause_id, c.source_id, c.sentence_id
  ->  Seq Scan on clause2 c  (cost=0.00..770951.84 
rows=31853084 width=72)




Indexes are :

CREATE INDEX idx_clause  ON clause2  USING btree  (clause_id, source_id, 
sentence_id);
CREATE INDEX idx_svo2  ON svo2  USING btree (clause_id, doc_id, 
sentence_id);


I don't know why it does not use an index scan for the clause2 table.

Any suggestions to tune the query?


Thanks & best Regards,
Adarsh Sharma



Re: [PERFORM] Why Index is not used

2011-03-24 Thread Adarsh Sharma


Thanks Andreas. I was about to print the output, but it takes too much time.

Below is the output of the explain analyze command:
pdc_uima=# explain analyze select c.clause, s.* from clause2 c, svo2 s 
where c.clause_id=s.clause_id and s.doc_id=c.source_id and c.

pdc_uima-# sentence_id=s.sentence_id ;

                                  QUERY PLAN
--------------------------------------------------------------------------
Merge Join  (cost=5673831.05..34033959.87 rows=167324179 width=2053) 
(actual time=216281.162..630721.636 rows=30473117 loops=1)
  Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id) 
AND (s.sentence_id = c.sentence_id))
  ->  Index Scan using idx_svo2 on svo2 s  (cost=0.00..24489343.65 
rows=27471560 width=1993) (actual time=0.130..177599.310 rows=27471560 
loops=1)
  ->  Materialize  (cost=5673828.74..6071992.29 rows=31853084 width=72) 
(actual time=216280.596..370507.452 rows=52037763 loops=1)
->  Sort  (cost=5673828.74..5753461.45 rows=31853084 width=72) 
(actual time=216280.591..324707.956 rows=31853083 loops=1)

  Sort Key: c.clause_id, c.source_id, c.sentence_id
  Sort Method:  external merge  Disk: 2616520kB
  ->  Seq Scan on clause2 c  (cost=0.00..770951.84 
rows=31853084 width=72) (actual time=0.025..25018.665 rows=31853083 loops=1)

Total runtime: 647804.037 ms
(9 rows)
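
The line "Sort Method:  external merge  Disk: 2616520kB" is the detail to 
note here: the 31-million-row sort spills about 2.5 GB to disk because it 
does not fit in work_mem. Raising work_mem for this session can shrink the 
spill (a sketch; the value is illustrative and must fit in available RAM):

SET work_mem = '512MB';   -- session-local; plausible on a 16 GB machine
EXPLAIN ANALYZE
SELECT c.clause, s.*
FROM clause2 c, svo2 s
WHERE c.clause_id = s.clause_id
  AND s.doc_id = c.source_id
  AND c.sentence_id = s.sentence_id;
RESET work_mem;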


Thanks , Adarsh

Andreas Kretschmer wrote:

Adarsh Sharma  wrote:

  

[...]

How many rows does clause2 contain? The planner expected 167324179 returned
rows; can you run the same explain with ANALYZE to see the real number
of returned rows?


Andreas
  




Re: [PERFORM] Why Index is not used

2011-03-25 Thread Adarsh Sharma

Chetan Suttraway wrote:

On Fri, Mar 25, 2011 at 12:05 PM, Adarsh Sharma <adarsh.sha...@orkash.com> wrote:


[...]


In this case, there are no predicates or filters on an individual table 
(maybe something like c.source_id=10), so either of the 2 tables will have 
to go for a simple scan.

Are you expecting a seq scan on svo2 and an index scan on clause2?



Given the size considerations and the number of rows, I think an index scan 
on clause2 is better.

Your point is valid, but I need this query to perform faster. What is the 
reason behind the seq scan of clause2?




Regards,
Adarsh
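
For what it's worth: with no filtering predicate, the join has to read all 
~31 million clause2 rows, and a full-table index scan is normally costlier 
than a seq scan plus sort. One way to check the planner's judgment is to 
disable seq scans for a single session and compare the estimated costs (a 
diagnostic sketch only, not a production setting):

SET enable_seqscan = off;   -- diagnosis only: pushes the planner to index paths
EXPLAIN
SELECT c.clause, s.*
FROM clause2 c, svo2 s
WHERE c.clause_id = s.clause_id
  AND s.doc_id = c.source_id
  AND c.sentence_id = s.sentence_id;
RESET enable_seqscan;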







Re: [PERFORM] Why Index is not used

2011-03-25 Thread Adarsh Sharma




Could you please post output of below queries:
explain select c.clause, s.* from clause2 c, svo2 s where 
c.clause_id=s.clause_id;
explain select c.clause, s.* from clause2 c, svo2 s where 
s.doc_id=c.source_id;
explain select c.clause, s.* from clause2 c, svo2 s where 
c.sentence_id=s.sentence_id ;



As per your instructions, please check the below output:

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where 
c.clause_id=s.clause_id;
                                  QUERY PLAN
---------------------------------------------------------------------------
Hash Join  (cost=7828339.10..4349603998133.96 rows=379772050555842 
width=2053)

  Hash Cond: (c.clause_id = s.clause_id)
  ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=64)
  ->  Hash  (cost=697537.60..697537.60 rows=27471560 width=1993)
->  Seq Scan on svo2 s  (cost=0.00..697537.60 rows=27471560 
width=1993)

(5 rows)

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where 
s.doc_id=c.source_id;
                                  QUERY PLAN
---------------------------------------------------------------------------

Merge Join  (cost=43635232.12..358368926.66 rows=20954686217 width=2053)
  Merge Cond: (c.source_id = s.doc_id)
  ->  Sort  (cost=5596061.24..5675693.95 rows=31853084 width=64)
Sort Key: c.source_id
->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 
width=64)

  ->  Materialize  (cost=38028881.02..38372275.52 rows=27471560 width=1993)
->  Sort  (cost=38028881.02..38097559.92 rows=27471560 width=1993)
  Sort Key: s.doc_id
  ->  Seq Scan on svo2 s  (cost=0.00..697537.60 
rows=27471560 width=1993)

(9 rows)

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where 
c.sentence_id=s.sentence_id ;
                                  QUERY PLAN
---------------------------------------------------------------------------

Merge Join  (cost=43711844.03..241541026048.10 rows=... width=2053)
  Merge Cond: (c.sentence_id = s.sentence_id)
  ->  Sort  (cost=5596061.24..5675693.95 rows=31853084 width=64)
Sort Key: c.sentence_id
->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 
width=64)

  ->  Materialize  (cost=38028881.02..38372275.52 rows=27471560 width=1993)
->  Sort  (cost=38028881.02..38097559.92 rows=27471560 width=1993)
  Sort Key: s.sentence_id
  ->  Seq Scan on svo2 s  (cost=0.00..697537.60 
rows=27471560 width=1993)

(9 rows)

Please  let me know if any other information is required.







--
Best Regards,
Adarsh Sharma






Re: [PERFORM] Why Index is not used

2011-03-25 Thread Adarsh Sharma

Chetan Suttraway wrote:

On Fri, Mar 25, 2011 at 2:25 PM, Adarsh Sharma <adarsh.sha...@orkash.com> wrote:




[...]




The idea is to have maximum filtering occurring on the leading column of the 
index. The first plan, with only the clause_id predicate, returns an 
estimated 379772050555842 rows, whereas the second plan, with the doc_id 
predicate, returns only 20954686217.

So maybe you should consider re-ordering the index on clause2.

I am thinking that you created the indexes by looking at the columns used in 
the where clause. But it is not always helpful to create indexes based on 
the exact order of predicates specified in a query. Instead, the idea should 
be to consider the predicate which is going to filter out the most rows. 
Likewise, we should consider all possible uses of the index columns across 
all queries, and then decide on the order of columns for the composite index 
to be created.

What's your take on this?


I am sorry, but I am not able to get your points completely.

My table definitions are:

Clause2 table:

CREATE TABLE clause2
(
 id bigint NOT NULL DEFAULT nextval('clause_id_seq'::regclass),
 source_id integer,
 sentence_id integer,
 clause_id integer,
 tense character varying(30),
 clause text,
 CONSTRAINT pk_clause_demo_id PRIMARY KEY (id)
)
WITH (
 OIDS=FALSE
);
CREATE INDEX idx_clause  ON clause2  USING btree  (clause_id, source_id, 
sentence_id);


svo2 table:

CREATE TABLE svo2
(
 svo_id bigint NOT NULL DEFAULT nextval('svo_svo_id_seq'::regclass),
 doc_id integer,
 sentence_id integer,
 clause_id integer,
 negation integer,
 subject character varying(3000),
 verb character varying(3000),
 "object" character varying(3000),
 preposition character varying(3000),
 subject_type character varying(3000),
 object_type character varying(3000),
 subject_attribute character varying(3000),
 object_attribute character varying(3000),
 ver
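
A sketch of the reordering Chetan describes, leading with the column that 
filters best across the queries (the index names and exact column order here 
are hypothetical, based on the row-count comparison above):

-- doc_id/source_id filtered far better than clause_id in the plans above,
-- so it goes first; verify with EXPLAIN before dropping the old indexes.
CREATE INDEX idx_clause2_src_sent_cl
 ON clause2 USING btree (source_id, sentence_id, clause_id);

CREATE INDEX idx_svo2_doc_sent_cl
 ON svo2 USING btree (doc_id, sentence_id, clause_id);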

[PERFORM] Postgres Performance Tuning

2011-04-04 Thread Adarsh Sharma

Dear all,

I have a Postgres database server with 16GB RAM.
Our application runs by making connections to the Postgres server from 
different servers, selecting data from one table & inserting into the 
remaining tables in a database.


Below is the output showing the number of connections:

postgres=# select datname,numbackends from pg_stat_database;
 datname  | numbackends
---+-
template1 |   0
template0 |   0
postgres  |   3
template_postgis  |   0
pdc_uima_dummy| 107
pdc_uima_version3 |   1
pdc_uima_olap |   0
pdc_uima_s9   |   3
pdc_uima  |   1
(9 rows)

I am totally confused about setting the configuration parameters in 
Postgres:

First of all, I researched some tuning parameters and set my 
postgresql.conf as:


max_connections = 1000
shared_buffers = 4096MB
temp_buffers = 16MB   
work_mem = 64MB

maintenance_work_mem = 128MB
wal_buffers = 32MB
checkpoint_segments = 3   
random_page_cost = 2.0

effective_cache_size = 8192MB

Then I got some complaints from application users that Postgres slows 
down; the free command's output is:


[root@s8-mysd-2 ~]# free -g
             total       used       free     shared    buffers     cached
Mem:            15         15          0          0          0         14
-/+ buffers/cache:          0         14
Swap:           16          0         15
[root@s8-mysd-2 ~]# free
             total       used       free     shared    buffers     cached
Mem:      16299476   16202264      97212          0      58924   15231852
-/+ buffers/cache:     911488   15387988
Swap:     16787884     153136   16634748

I think there may be some problem in my configuration parameters, so I 
changed them to:


max_connections = 700
shared_buffers = 4096MB
temp_buffers = 16MB   
work_mem = 64MB

maintenance_work_mem = 128MB
wal_buffers = 32MB
checkpoint_segments = 32  
random_page_cost = 2.0

effective_cache_size = 4096MB

but the Postgres server still uses swap memory during SELECT & INSERT 
into database tables.


Please check the attached postgresql.conf.

Please also share some views on how to tune this server.

Do I need to increase my RAM, i.e., have I hit a hardware limitation?



Thanks & best Regards,
Adarsh Sharma





# -
# PostgreSQL configuration file
# -
#
# This file consists of lines of the form:
#
#   name = value
#
# (The "=" is optional.)  Whitespace may be used.  Comments are introduced with
# "#" anywhere on a line.  The complete list of parameter names and allowed
# values can be found in the PostgreSQL documentation.
#
# The commented-out settings shown in this file represent the default values.
# Re-commenting a setting is NOT sufficient to revert it to the default value;
# you need to reload the server.
#
# This file is read on server startup and when the server receives a SIGHUP
# signal.  If you edit the file on a running system, you have to SIGHUP the
# server for the changes to take effect, or use "pg_ctl reload".  Some
# parameters, which are marked below, require a server shutdown and restart to
# take effect.
#
# Any parameter can also be given as a command-line option to the server, e.g.,
# "postgres -c log_connections=on".  Some parameters can be changed at run time
# with the "SET" SQL command.
#
# Memory units:  kB = kilobytesTime units:  ms  = milliseconds
#MB = megabytes s   = seconds
#GB = gigabytes min = minutes
#   h   = hours
#   d   = days


#--
# FILE LOCATIONS
#--

# The default values of these variables are driven from the -D command-line
# option or PGDATA environment variable, represented here as ConfigDir.

#data_directory = 'ConfigDir'   # use data in another directory
# (change requires restart)
#hba_file = 'ConfigDir/pg_hba.conf' # host-based authentication file
# (change requires restart)
#ident_file = 'ConfigDir/pg_ident.conf' # ident configuration file
# (change requires restart)

# If external_pid_file is not explicitly set, no extra PID file is written.
#external_pid_file = '(none)'   # write an extra PID file
# (change requires restart)


#--
# CONNECTIONS AND AUTHENTICATION
#---

Re: [PERFORM] Postgres Performance Tuning

2011-04-04 Thread Adarsh Sharma

t...@fuzzy.cz wrote:

max_connections = 700
shared_buffers = 4096MB
temp_buffers = 16MB
work_mem = 64MB
maintenance_work_mem = 128MB
wal_buffers = 32MB
checkpoint_segments = 32
random_page_cost = 2.0
effective_cache_size = 4096MB


First of all, there's no reason to set wal_buffers to 32MB. AFAIK
the largest sensible value is 16MB - I doubt increasing it beyond that
will improve performance.

Second - effective_cache_size is just a hint about how much memory the
operating system uses for the filesystem cache. So it does not influence
the amount of allocated memory in any way.


but the Postgres server still uses swap memory during SELECT & INSERT into
database tables.


Are you sure it's PostgreSQL? What else is running on the box? Have you
analyzed why the SQL queries are slow (using EXPLAIN)?


Thanks. Below are my action points :-

max_connections = 300 ( I don't think the application uses more than
300 connections )

shared_buffers = 4096MB
temp_buffers = 16MB
work_mem = 64MB
maintenance_work_mem = 128MB
wal_buffers = 16MB ( as per your suggestion )
checkpoint_segments = 32
random_page_cost = 2.0
effective_cache_size = 8192MB ( Recommended 50% of RAM )


My shared memory variables are as below :-


[root@s8-mysd-2 ~]# cat /proc/sys/kernel/shmmax

6442450944

[root@s8-mysd-2 ~]# cat /proc/sys/kernel/shmall

6442450944

[root@s8-mysd-2 ~]


Please let me know if any parameter needs changing.

I am now going to change my parameters as per the link below :-

http://airumman.blogspot.com/2011/03/postgresql-parameters-for-new-dedicated.html

But one thing I am not able to understand is :-

"Start the server and find out how much memory is still available for the
OS filesystem cache"
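
(The "buffers" and "cached" columns of free report exactly that figure; with
the numbers from the free output shown earlier, for example:)

SELECT pg_size_pretty((58924::bigint + 15231852) * 1024) AS os_cache;
-- buffers (58924 kB) + cached (15231852 kB) from free, converted to bytes:
-- roughly 15 GB, the ballpark figure that effective_cache_size is meant
-- to reflect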



You're absolutely right. I am also researching the EXPLAIN output of all
SELECT statements, and I found one cause: poor indexing on TEXT columns.




Thanks & best Regards,
Adarsh Sharma





regards
Tomas






Re: [PERFORM] Postgres Performance Tuning

2011-04-04 Thread Adarsh Sharma

My database size is :-
postgres=# select pg_size_pretty(pg_database_size('pdc_uima_dummy'));
 pg_size_pretty
----------------
 49 GB
(1 row)
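
(A per-table breakdown is available the same way; a quick sketch:)

SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'          -- ordinary tables only
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;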

I have a doubt regarding Postgres memory usage :-

Say my application makes connections to the database server ( *.*.*.106 )
from the ( *.*.*.111, *.*.*.113, *.*.*.114 ) servers, and I check the top
command as below. Say it makes 100 connections.


top - 17:01:02 up  5:39,  4 users,  load average: 0.00, 0.00, 0.00
Tasks: 170 total,   1 running, 169 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0% us,  0.2% sy,  0.0% ni, 99.6% id,  0.1% wa,  0.0% hi,
0.0% si,  0.0% st

Mem:  16299476k total, 16198784k used,   100692k free,    73776k buffers
Swap: 16787884k total,   148176k used, 16639708k free, 15585396k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3401 postgres  20   0 4288m 3.3g 3.3g S    0 21.1   0:24.73 postgres
 3397 postgres  20   0 4286m 119m 119m S    0  0.8   0:00.36 postgres
 4083 postgres  20   0 4303m 104m 101m S    0  0.7   0:07.68 postgres
 3402 postgres  20   0 4288m  33m  32m S    0  0.2   0:03.67 postgres
 4082 postgres  20   0 4301m  27m  25m S    0  0.2   0:00.85 postgres
 4748 postgres  20   0 4290m 5160 3700 S    0  0.0   0:00.00 postgres
 4173 root      20   0 12340 3028 1280 S    0  0.0   0:00.12 bash
 4084 postgres  20   0 4290m 2952 1736 S    0  0.0   0:00.00 postgres
 4612 root      20   0 12340 2920 1276 S    0  0.0   0:00.06 bash
 4681 root      20   0 12340 2920 1276 S    0  0.0   0:00.05 bash
 4550 root      20   0 12208 2884 1260 S    0  0.0   0:00.08 bash
 4547 root      20   0 63580 2780 2204 S


and free command says :--
[root@s8-mysd-2 8.4SS]# free -g
             total       used       free     shared    buffers     cached
Mem:            15         15          0          0          0         14
-/+ buffers/cache:          0         15
Swap:           16          0         15
[root@s8-mysd-2 8.4SS]#


Now, when my job finishes and I close the connections from 2 servers,
the top & free output remains the same :-

I don't know what the reason behind this is, as I have only 3 connections
as per the command below:


postgres=# select datname, client_addr,current_query from pg_stat_activity;
    datname     |  client_addr  |                          current_query
----------------+---------------+-------------------------------------------------------------------
 postgres       |               | select datname, client_addr,current_query from pg_stat_activity;
 postgres       | 192.168.0.208 |
 pdc_uima_s9    | 192.168.0.208 |
 pdc_uima_s9    | 192.168.0.208 |
 pdc_uima_dummy | 192.168.0.208 |
 pdc_uima_dummy | 192.168.1.102 |
 pdc_uima_dummy | 192.168.1.102 |
 pdc_uima_dummy | 192.168.1.102 |
(8 rows)
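
(Counting the rows, rather than eyeballing them, makes the comparison with
numbackends easier; for example:)

SELECT datname, client_addr, count(*) AS connections
FROM pg_stat_activity
GROUP BY datname, client_addr;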


Please help me to understand how much memory one connection uses, and
how to set the server parameters accordingly.



Thanks & best Regards,
Adarsh Sharma



Raghavendra wrote:

Adarsh,

What is the Size of Database?

Best Regards,
Raghavendra
EnterpriseDB Corporation

On Mon, Apr 4, 2011 at 4:24 PM, Scott Marlowe wrote:


On Mon, Apr 4, 2011 at 4:43 AM, Scott Marlowe wrote:
>
>> [root@s8-mysd-2 ~]# free
>>              total       used       free     shared    buffers     cached
>> Mem:      16299476   16202264      97212          0      58924   15231852
>> -/+ buffers/cache:     911488   15387988
>> Swap:     16787884     153136   16634748
>
> There is nothing wrong here.  You're using 153M out of 16G swap.  15.x
> Gig is shared buffers.  If your system is slow, it's not because it's
> running out of memory or using too much swap.

Sorry, that 15.xG is system cache, not shared buffers.  Anyw

Re: [PERFORM] Postgres Performance Tuning

2011-04-04 Thread Adarsh Sharma



Thanks, Scott.

My iostat package is not installed, but have a look at the output below:

[root@s8-mysd-2 8.4SS]# vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff    cache   si   so    bi    bo    in    cs us sy id wa st
 1  0 147664  93920  72332 15580748    0    1   113   170    47   177  6  1 92  1  0
 0  0 147664  94020  72348 15580748    0    0     0     4   993   565  0  0 100 0  0
 0  0 147664  93896  72364 15580748    0    0     0     5   993   571  0  0 100 0  0
 0  0 147664  93524  72416 15580860    0    0     0   160  1015   591  0  0 100 0  0
 0  0 147664  93524  72448 15580860    0    0     0     8  1019   553  0  0 100 0  0
 0  0 147664  93648  72448 15580860    0    0     0     0  1019   555  0  0 100 0  0
 0  0 147664  93648  72448 15580860    0    0     0     3  1023   560  0  0 100 0  0


[root@s8-mysd-2 8.4SS]# iostat
-bash: iostat: command not found
[root@s8-mysd-2 8.4SS]#

Best regards,
Adarsh

Scott Marlowe wrote:

On Mon, Apr 4, 2011 at 5:34 AM, Adarsh Sharma  wrote:
  

Mem:  16299476k total, 16198784k used,   100692k free,    73776k buffers
Swap: 16787884k total,   148176k used, 16639708k free, 15585396k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3401 postgres  20   0 4288m 3.3g 3.3g S    0 21.1   0:24.73 postgres
 3397 postgres  20   0 4286m 119m 119m S    0  0.8   0:00.36 postgres
Please help me to understand how much memory one connection uses, and
how to set the server parameters accordingly.



OK, first, see the 15585396k cached?  That's how much memory your OS
is using to cache file systems etc.  Basically that's memory not being
used by anything else right now, so the OS borrows it and uses it for
caching.

Next, VIRT is how much memory your process would need to load every
lib it might need but may not be using now, plus all the shared memory
it might need, plus its own space etc.  It's not memory in use, it's
memory that might, under the worst circumstances, be used by that one
process.  RES is the amount of memory the process IS actually
touching, including shared memory that other processes may be sharing.
Finally, SHR is the amount of shared memory the process is touching.
So, taking your biggest process: it is linked to enough libraries and
shared memory and its own private memory to add up to 4288Meg.  It is
currently actually touching 3.3G.  Of the 3.3G it is touching, 3.3G is
shared with other processes.  So, the difference between RES and SHR
is 0, so the delta, or extra memory it's using besides shared memory,
is ZERO (or very close to it, probably dozens or fewer of megabytes).

So, you're NOT running out of memory.  Remember when I mentioned
iostat, vmstat, etc up above?  Have you run any of those?
  




Re: [PERFORM] Postgres Performance Tuning

2011-04-04 Thread Adarsh Sharma



Thank you all,

I now know some things to work on; after some work & study on them, I will
continue this discussion tomorrow.



Best  Regards,
Adarsh

Raghavendra wrote:

Adarsh,
 


[root@s8-mysd-2 8.4SS]# iostat
-bash: iostat: command not found

/usr/bin/iostat

Our application runs by making connections to Postgres Server from
different servers and selecting data from one table & insert into
remaining tables in a database.


When you are doing bulk inserts, you need to tune the AUTOVACUUM parameters
or change the autovacuum settings for those tables doing bulk INSERTs.
Inserts need analyze.


 


#autovacuum = on                        # Enable autovacuum subprocess?  'on'
                                        # requires track_counts to also be on.
#log_autovacuum_min_duration = -1       # -1 disables, 0 logs all actions and
                                        # their durations, > 0 logs only
                                        # actions running at least this number
                                        # of milliseconds.
#autovacuum_max_workers = 3             # max number of autovacuum subprocesses
#autovacuum_naptime = 1min              # time between autovacuum runs
#autovacuum_vacuum_threshold = 50       # min number of row updates before
                                        # vacuum
#autovacuum_analyze_threshold = 50      # min number of row updates before
                                        # analyze
#autovacuum_vacuum_scale_factor = 0.2   # fraction of table size before vacuum
#autovacuum_analyze_scale_factor = 0.1  # fraction of table size before analyze
#autovacuum_freeze_max_age = 200000000  # maximum XID age before forced vacuum
                                        # (change requires restart)
#autovacuum_vacuum_cost_delay = 20ms    # default vacuum cost delay for
                                        # autovacuum, in milliseconds;
                                        # -1 means use vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1      # default vacuum cost limit for
                                        # autovacuum, -1 means use
                                        # vacuum_cost_limit


These are all default AUTOVACUUM settings. If you are using PG 8.4 or
above, try per-table AUTOVACUUM settings on the bulk-insert tables for
better performance. You also need to tune 'autovacuum_naptime'.


Eg (the table name and threshold values here are placeholders):
 ALTER TABLE svo2 SET (autovacuum_vacuum_threshold = 1000,
                       autovacuum_analyze_threshold = 1000);
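
(The per-table overrides can afterwards be checked in pg_class.reloptions;
svo2 below is just an example table name from this thread:)

SELECT relname, reloptions
FROM pg_class
WHERE relname = 'svo2';   -- shows the storage options set via ALTER TABLE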


wal_buffers          // max is 16MB
checkpoint_segments  // it is very low in your setting
checkpoint_timeout
temp_buffers         // if the application uses temp tables



These parameters will also boost performance.

Best Regards
Raghavendra
EnterpriseDB Corporation.

 




 


  







Re: [PERFORM] Postgres Performance Tuning

2011-04-05 Thread Adarsh Sharma

Hi, Good Morning To All of You.

Yesterday I did some research on my problems. Scott rightly suggested
gathering information before posting to the list, and I agreed with him.


Here is my first doubt, which I explain as follows:

My application makes several connections to the database server & they do
their work.

During this process, have a look at the output of the free command below:

[root@s8-mysd-2 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         15917      15826         90          0        101      15013
-/+ buffers/cache:        711      15205
Swap:        16394        143      16250

It means 15 GB memory is cached.

[root@s8-mysd-2 ~]#  cat /proc/meminfo
MemTotal:   16299476 kB
MemFree:   96268 kB
Buffers:  104388 kB
Cached: 15370008 kB
SwapCached: 3892 kB
Active:  6574788 kB
Inactive:8951884 kB
Active(anon):3909024 kB
Inactive(anon):   459720 kB
Active(file):2665764 kB
Inactive(file):  8492164 kB
Unevictable:   0 kB
Mlocked:   0 kB
SwapTotal:  16787884 kB
SwapFree:   16640472 kB
Dirty:  1068 kB
Writeback: 0 kB
AnonPages: 48864 kB
Mapped:  4277000 kB
Slab: 481960 kB
SReclaimable: 466544 kB
SUnreclaim:15416 kB
PageTables:57860 kB
NFS_Unstable:  0 kB
Bounce:0 kB
WritebackTmp:  0 kB
CommitLimit:24904852 kB
Committed_AS:5022172 kB
VmallocTotal:   34359738367 kB
VmallocUsed:  310088 kB
VmallocChunk:   34359422091 kB
HugePages_Total:  32
HugePages_Free:   32
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize:   2048 kB
DirectMap4k:3776 kB
DirectMap2M:16773120 kB
[root@s8-mysd-2 ~]#

Now, can I know why the cached memory is not freed after the connections
have done their work & there are no more connections:


pdc_uima_dummy=# select datname,numbackends from pg_stat_database;
      datname      | numbackends
-------------------+-------------
 template1         |           0
 template0         |           0
 postgres          |           2
 template_postgis  |           0
 pdc_uima_dummy    |          11
 pdc_uima_version3 |           0
 pdc_uima_olap     |           0
 pdc_uima_s9       |           0
 pdc_uima          |           0
(9 rows)

The output is the same when it has 100 connections.

Now I have to start more queries on the database server and issue new
connections after some time. Why is the cached memory not freed?


Is flushing the cache memory needed? And how can the cache use so much if I set

effective_cache_size = 4096MB ?

I think that if I issue some new SELECT queries on a large set of data, it
will use swap memory & degrade performance.

Please correct me if I'm wrong.


Thanks & best Regards,
Adarsh Sharma















[PERFORM] Which is better Index

2011-04-05 Thread Adarsh Sharma

Dear all,

I want to clear my doubts regarding creating several single-column indexes
versus one multi-column index.

My table schema is :-

CREATE TABLE svo2
(
  svo_id bigint NOT NULL DEFAULT nextval('svo_svo_id_seq'::regclass),
  doc_id integer,
  sentence_id integer,
  clause_id integer,
  negation integer,
  subject character varying(3000),
  verb character varying(3000),
  "object" character varying(3000),
  preposition character varying(3000),
  subject_type character varying(3000),
  object_type character varying(3000),
  subject_attribute character varying(3000),
  object_attribute character varying(3000),
  verb_attribute character varying(3000),
  subject_concept character varying(100),
  object_concept character varying(100),
  subject_sense character varying(100),
  object_sense character varying(100),
  subject_chain character varying(5000),
  object_chain character varying(5000),
  sub_type_id integer,
  obj_type_id integer,
  CONSTRAINT pk_svo_id PRIMARY KEY (svo_id)
) WITH (OIDS=FALSE);



For example :-

CREATE INDEX idx_svo2_id_dummy  ON svo2  USING btree (doc_id, clause_id, 
sentence_id);


or

CREATE INDEX idx_svo2_id_dummy  ON svo2  USING btree (doc_id);
CREATE INDEX idx_svo2_id_dummy1  ON svo2  USING btree (clause_id);
CREATE INDEX idx_svo2_id_dummy2  ON svo2  USING btree (sentence_id);

Which is better if a query uses all three columns in its join/WHERE clause?
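
(For instance, a lookup that constrains all three columns by equality
(the values below are hypothetical) can be answered by the single
composite index:)

EXPLAIN
SELECT *
FROM svo2
WHERE doc_id = 123        -- hypothetical values, just to shape the query
  AND clause_id = 4
  AND sentence_id = 56;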



Thanks & best regards,
Adarsh Sharma



Re: [PERFORM] Postgres Performance Tuning

2011-04-05 Thread Adarsh Sharma

Scott Marlowe wrote:

On Tue, Apr 5, 2011 at 1:33 AM, Adarsh Sharma  wrote:
  

[root@s8-mysd-2 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         15917      15826         90          0        101      15013
-/+ buffers/cache:        711      15205
Swap:        16394        143      16250

It means 15 GB memory is cached.



Note that the kernel takes all otherwise unused memory and uses it for
cache.  If, at any time a process needs more memory, the kernel just
dumps some cached data and frees up the memory and hands it over, it's
all automatic.  As long as cache is large, things are OK.  You need to
be looking to see if you're IO bound or CPU bound first.  so, vmstat
(install the sysstat package) is the first thing to use.
  

Thanks a lot , Scott. :-)



Best Regards , Adarsh




Re: [PERFORM] The right SHMMAX and FILE_MAX

2011-05-01 Thread Adarsh Sharma
I am also in need of proper documentation that explains how to set the
SHMMAX and SHMALL variables for Postgres.


What things need to be taken into consideration before doing that?
What should the values of SHMMAX & SHMALL be if you have 16 GB RAM for the
Postgres server?




Thanks

Phoenix Kiula wrote:

Hi. I'm on a 64 Bit CentOS 5 system, quadcore processor, 8GB RAM and
tons of data storage (1 TB SATAII disks).

The current SHMMAX and SHMALL are (commas added for legibility) --

kernel.shmmax = 68,719,476,736
kernel.shmall = 4,294,967,296

Now, according to my reading in the PG manual and this list, a good
recommended value for SHMMAX is

   (shared_buffers * 8192)

My postgresql.conf settings at the moment are:

max_connections = 300
shared_buffers = 300MB
effective_cache_size = 2000MB

By this calculation, shared_b * 8192 will be:

 2,457,600,000,000

That's a humongous number. So either the principle for SHMMAX is
amiss, or I am reading this wrongly?

Similarly with "fs.file_max". There are articles like this one:
http://tldp.org/LDP/solrhe/Securing-Optimizing-Linux-RH-Edition-v1.3/chap6sec72.html

Is this relevant for PostgreSQL performance at all, or should I skip that?

Thanks for any pointers!

  



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The right SHMMAX and FILE_MAX

2011-05-02 Thread Adarsh Sharma

Tomas Vondra wrote:

On 2.5.2011 06:53, Adarsh Sharma wrote:
  
I am also in need of a proper documentation that explains how to set 
SHMAX and SHMALL variables in Postgres.


What things need to be taken in consideration before doing that ?
What is the value of SHMAX & SHMALL if u have 16 GB RAM for Postgres
Server ?



Well, those two values actually define kernel limits for shared memory
segments (i.e. memory shared by multiple processes, in this case the
postmaster proces and backends). So it's rather a question of tuning
shared_buffers (because that's the shared memory segment) and then
setting those two values.
  


When I was tuning PostgreSQL for best performance, I set my
shared_buffers = 4096MB, i.e. 25% of RAM ( 1/4 ).


So do I need to set my SHMMAX to 4096MB?

And what should the SHMALL size be now?


SHMMAX - max. size of a single shared segment (in bytes)
SHMALL - total size of shared segments (in pages, page is usually 4kB)

So if you decide you want 1GB shared buffers, you'll need at least this

SHMMAX = 1024 * 1024 * 1024 (i.e. 1GB)
SHMALL = 1024 * 256 (1GB in 4kB pages)

(although SHMALL should be higher, as there will be other processes
that need shared memory).
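
(Applying the same arithmetic to the 4096MB shared_buffers case asked about
above; a back-of-the-envelope check, runnable from psql:)

SELECT 4096::bigint * 1024 * 1024          AS shmmax_bytes,  -- 4096MB in bytes
       (4096::bigint * 1024 * 1024) / 4096 AS shmall_pages;  -- the same in 4kB pages
-- shmmax_bytes = 4294967296 and shmall_pages = 1048576; as noted, both
-- kernel settings should be somewhat higher than this minimum in practice.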

There's a lot of docs about this, e.g. this one (it's mostly for Oracle,
but it describes the shared memory quite nicely):

http://www.puschitz.com/TuningLinuxForOracle.shtml#SettingSharedMemory

regards
Tomas

  




[PERFORM] Why query takes so much time

2011-05-15 Thread Adarsh Sharma

Dear all,
I have a query on 3 tables in a database as :-

Explain Analyze output :-

explain analyze select c.clause, s.subject ,s.object , s.verb, 
s.subject_type , s.object_type ,s.doc_id ,s.svo_id from clause2 c, svo2 
s ,page_content p where c.clause_id=s.clause_id and s.doc_id=c.source_id 
and c.sentence_id=s.sentence_id and s.doc_id=p.crawled_page_id order by 
s.svo_id limit 1000 offset 17929000


"Limit  (cost=21685592.91..21686802.44 rows=1000 width=2624) (actual 
time=414601.802..414622.920 rows=1000 loops=1)"
"  ->  Nested Loop  (cost=59.77..320659013645.28 rows=265112018116 
width=2624) (actual time=0.422..404902.314 rows=1793 loops=1)"
"->  Nested Loop  (cost=0.00..313889654.42 rows=109882338 
width=2628) (actual time=0.242..174223.789 rows=17736897 loops=1)"
"  ->  Index Scan using pk_svo_id on svo2 s  
(cost=0.00..33914955.13 rows=26840752 width=2600) (actual 
time=0.157..14691.039 rows=14238271 loops=1)"
"  ->  Index Scan using idx_clause2_id on clause2 c  
(cost=0.00..10.36 rows=4 width=44) (actual time=0.007..0.008 rows=1 
loops=14238271)"
"Index Cond: ((c.source_id = s.doc_id) AND 
(c.clause_id = s.clause_id) AND (c.sentence_id = s.sentence_id))"
"->  Bitmap Heap Scan on page_content p  (cost=59.77..2885.18 
rows=2413 width=8) (actual time=0.007..0.008 rows=1 loops=17736897)"

"  Recheck Cond: (p.crawled_page_id = s.doc_id)"
"  ->  Bitmap Index Scan on idx_crawled_id  
(cost=0.00..59.17 rows=2413 width=0) (actual time=0.005..0.005 rows=1 
loops=17736897)"

"Index Cond: (p.crawled_page_id = s.doc_id)"
"Total runtime: 414623.634 ms"

My table & index definitions are as under :-

Estimated rows in the 3 tables are :-

clause2 10341700
svo2 26008000
page_content 479785

CREATE TABLE clause2
(
 id bigint NOT NULL DEFAULT nextval('clause_id_seq'::regclass),
 source_id integer,
 sentence_id integer,
 clause_id integer,
 tense character varying(30),
 clause text,
 CONSTRAINT pk_clause_id PRIMARY KEY (id)
)WITH ( OIDS=FALSE);
CREATE INDEX idx_clause2_id  ON clause2  USING btree (source_id, 
clause_id, sentence_id);


CREATE TABLE svo2
(
 svo_id bigint NOT NULL DEFAULT nextval('svo_svo_id_seq'::regclass),
 doc_id integer,
 sentence_id integer,
 clause_id integer,
 negation integer,
 subject character varying(3000),
 verb character varying(3000),
 "object" character varying(3000),
 preposition character varying(3000),
 subject_type character varying(3000),
 object_type character varying(3000),
 subject_attribute character varying(3000),
 object_attribute character varying(3000),
 verb_attribute character varying(3000),
 subject_concept character varying(100),
 object_concept character varying(100),
 subject_sense character varying(100),
 object_sense character varying(100),
 subject_chain character varying(5000),
 object_chain character varying(5000),
 sub_type_id integer,
 obj_type_id integer,
 CONSTRAINT pk_svo_id PRIMARY KEY (svo_id)
)WITH (  OIDS=FALSE);
CREATE INDEX idx_svo2_id_dummy  ON svo2  USING btree  (doc_id, 
clause_id, sentence_id);


CREATE TABLE page_content
(
 content_id integer NOT NULL DEFAULT 
nextval('page_content_ogc_fid_seq'::regclass),

 wkb_geometry geometry,
 link_level integer,
 isprocessable integer,
 isvalid integer,
 isanalyzed integer,
 islocked integer,
 content_language character(10),
 url_id integer,
 publishing_date character(40),
 heading character(150),
 category character(150),
 crawled_page_url character(500),
 keywords character(500),
 dt_stamp timestamp with time zone,
 "content" character varying,
 crawled_page_id bigint,
 CONSTRAINT page_content_pk PRIMARY KEY (content_id),
 CONSTRAINT enforce_dims_wkb_geometry CHECK (st_ndims(wkb_geometry) = 2),
 CONSTRAINT enforce_srid_wkb_geometry CHECK (st_srid(wkb_geometry) = (-1))
)WITH (  OIDS=FALSE);
CREATE INDEX idx_crawled_id  ON page_content  USING btree  
(crawled_page_id);
CREATE INDEX pgweb_idx  ON page_content  USING gin  
(to_tsvector('english'::regconfig, content::text));


If possible, please let me know if I am doing something wrong, or of any
alternate query that runs faster.
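
(One alternative worth noting, though not raised in this thread: OFFSET
17929000 forces the executor to assemble and throw away ~17.9 million joined
rows before returning 1000. Keyset pagination, remembering the last svo_id
already fetched, avoids that; a sketch, with :last_svo_id as a hypothetical
psql-style parameter:)

SELECT c.clause, s.subject, s.object, s.verb,
       s.subject_type, s.object_type, s.doc_id, s.svo_id
FROM svo2 s
JOIN clause2 c ON c.source_id = s.doc_id
              AND c.clause_id = s.clause_id
              AND c.sentence_id = s.sentence_id
JOIN page_content p ON p.crawled_page_id = s.doc_id
WHERE s.svo_id > :last_svo_id   -- hypothetical: last id from the previous page
ORDER BY s.svo_id
LIMIT 1000;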



Thanks


[PERFORM] Restore database after drop command

2011-07-24 Thread Adarsh Sharma

Dear all,

I am using Postgres-8.4.2 on a Windows system.
I have 2 databases in my Postgres server ( globedatabase (21GB),
urldatabase ).


I restored globedatabase from a .sql file yesterday morning. I inserted
some new data into that database.

In the evening, by mistake, I issued a *drop database globedatabase* command.

This morning, I restored the same database again from the backup (.sql) file.
My .sql file has data till yesterday morning, but now I want the newly
inserted data as well. Is that possible?


Is it possible to get the data back to the state just before the drop
database command?


My pglog files are in the E:/data directory & the binary log is also enabled.

Please let me know if it is possible. It's urgent.


Thanks & Regards
Adarsh Sharma


Re: [PERFORM] [ADMIN] Restore database after drop command

2011-07-25 Thread Adarsh Sharma

I went through the link, so it is impossible to get the data back.
I have following files in my pg_xlog directory :

000100070091
000100070092
000100070093
000100070094
000100070095
000100070096
000100070097
000100070098

I think I issued the drop database command 1 month ago.
From the manual, I understand that my segment files are recycled to 
newer ones :


"The segment files are given numeric names that reflect their position
in the abstract WAL sequence. When not using WAL archiving, the system
normally creates just a few segment files and then "recycles" them by
renaming no-longer-needed segment files to higher segment numbers. It's
assumed that a segment file whose contents precede the
checkpoint-before-last is no longer of interest and can be recycled."

My archive_status folder is empty.
How would we know which data these segment files correspond to?

I followed the steps below 1 month ago :
1. Load globdatabase through the backup.sql (21 GB) file.
2. Insert some data into about 3-4 tables ( KB) of data.
3. Drop database globdatabase.
4. Load globdatabase through the backup.sql (21GB) file.

Maybe there is a chance, because we work very rarely on that system.
Now I have the backup file, but I want those 3-4 tables.


Thanks

Vibhor Kumar wrote:

On Jul 25, 2011, at 12:08 PM, Adarsh Sharma wrote:

  

I restored globedatabase from a .sql file yesterday morning. I inserted some new
data into that database.
In the evening, by mistake, I issued a drop database globedatabase command.
This morning, I restored the same database again from the backup (.sql) file.
My .sql file has data till yesterday morning, but now I want the newly inserted
data as well. Is that possible?
Is it possible to get the data back to the state just before the drop database command?



No, you won't be able to recover.


If you have Online Backup, then PITR would help you.

Thanks & Regards,
Vibhor Kumar
Blogs: http://vibhork.blogspot.com
http://vibhorkumar.wordpress.com

  




[PERFORM] How to Speed up Insert from Multiple Connections

2011-08-01 Thread Adarsh Sharma

Dear all,

I have researched a lot on PostgreSQL performance tuning and found some
parameters that increase SELECT performance in PostgreSQL.
By increasing shared_buffers, effective_cache_size, work_mem,
maintenance_work_mem, etc., we can improve the performance of SELECT queries.

But in my application about 200 connections are made to the DB server, and
inserts into 2 tables occur.

And it takes hours to complete.

I understand the checkpoint_segments variable & want to know whether there
are any more ways to increase the write performance.
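
(Not from this thread, but for reference: the standard levers for bulk-insert
speed are fewer, larger transactions and multi-row statements; a minimal
sketch against a hypothetical table t:)

BEGIN;
-- one multi-row INSERT per batch carries far less per-row overhead than
-- many single-row INSERTs, each committing its own transaction
INSERT INTO t (id, payload) VALUES
  (1, 'a'),
  (2, 'b'),
  (3, 'c');
COMMIT;
-- for really large loads, COPY t FROM STDIN is faster still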



Thanks

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Which Join is better

2011-08-01 Thread Adarsh Sharma

Dear all,

I just want to know which join is better for querying data faster.

I have 2 tables A ( 70 GB ) & B ( 7 MB )

A has 10 columns & B has 3 columns. Indexes exist on both tables' ids.

select p.* from A p, B q where p.id=q.id

or

select p.* from B q, A p where q.id=p.id
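
(For what it's worth: for inner joins, PostgreSQL's planner is free to choose
the join order itself, up to join_collapse_limit, so the two forms should
plan identically; EXPLAIN can confirm, e.g.:)

EXPLAIN SELECT p.* FROM A p, B q WHERE p.id = q.id;
EXPLAIN SELECT p.* FROM B q, A p WHERE q.id = p.id;
-- both statements should produce the same plan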


Thanks

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Need to tune for Heavy Write

2011-08-03 Thread Adarsh Sharma

Dear all,

For the last few days, I have researched a lot on PostgreSQL performance
tuning due to the slow speed of my server.
My application selects about 10 rows of data from a MySQL database,
processes them & inserts into 2 Postgres tables, making about 45 connections.


I set my postgresql parameters in postgresql.conf as below: ( OS : 
Ubuntu, RAM : 16 GB, Postgres : 8.4.2 )


max_connections= 80
shared_buffers= 2048MB
work_mem = 32MB
maintenance_work_mem = 512MB
fsync=off 
full_page_writes=off 
synchronous_commit=off 
checkpoint_segments = 32
checkpoint_completion_target = 0.7  
effective_cache_size = 4096MB



After this, I moved my pg_xlog directory to a separate directory outside
the data directory, by symlinking.



The application issues INSERT statements through PostgreSQL connections only.

Please let me know if I am missing any other important configuration.



Thanks



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Parameters for PostgreSQL

2011-08-03 Thread Adarsh Sharma

I think RAID 10 is the best among all the RAID levels.


Thanks


Craig Ringer wrote:

On 04/08/11 11:42, Jayadevan M wrote:

Hello,

>The most important spec has been omitted. What's the storage subsystem?
We have storage on SAN, RAID 5.


RAID 5? That's *really* not ideal for database workloads, either Pg or 
Oracle, unless your RAID 5 storage backend has enough battery-backed 
write cache to keep huge amounts of writes in RAM and reorder them 
really effectively.


I hope each RAID 5 LUN is only across a few disks and is layered with 
RAID 1, though. RAID 5 becomes less reliable than using a single disk 
when used with too many HDDs, because the probability of a double-disk 
failure becomes greater than that of a single standalone disk failing. 
After being bitten by that a few times, these days I'm using RAID 6 in 
most cases where RAID 10 isn't practical.


In any case, "SAN" can be anything from a Linux box running an iSCSI 
target on top of a RAID 5 `md' software RAID volume on four 5400RPM 
HDDs, right up to a giant hundreds-of-fast-disks monster filer full of 
dedicated ASICs and great gobs of battery backed write cache DRAM. Are 
you able to be any more specific about what you're dealing with?


 
> > We are suing weblogic.

>   ^
> Best. Typo. Ever.
>
> I hear most people who use it want to, you're just brave enough to 
do it :-P

I wish I could make a few millions that way.


Thank you for all the replies. The first step is, of course, to
migrate the data. I am working with ora2pg for that. I assume
creating files with 'COPY' to use as input for PostgreSQL is the
right approach? We don't have many stored procedures or packages. So
that part should be OK.







Re: [PERFORM] How to track number of connections and hosts to Postgres cluster

2011-08-24 Thread Adarsh Sharma

pg_stat_activity keeps track of all this information.

select * from pg_stat_activity where datname='databasename';
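
For example, to count connections per host (client_addr is NULL for
connections over the local Unix socket):

SELECT client_addr, count(*) AS connections
FROM pg_stat_activity
GROUP BY client_addr
ORDER BY connections DESC;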



Venkat Balaji wrote:

Thanks Guillaume !!

But if I put log_connections to on and log_disconnections to on,
wouldn't Postgres be logging a lot of data?

Will this not be IO-intensive? I understand that this is the best
way, but I would want to know if there is another way to reduce IO
( maybe through queries to catalog tables ).


Thanks
Venkat

On Wed, Aug 24, 2011 at 1:19 PM, Guillaume Lelarge wrote:


On Wed, 2011-08-24 at 13:05 +0530, Venkat Balaji wrote:
> Hello Everyone,
>
> I am working on an alert script to track the number of
connections with the
> host IPs to the Postgres cluster.
>
> 1. I need all the host IPs making a connection to Postgres
Cluster (even for
> a fraction of second).

You should set log_connections to on.

> 2. I would also want to track number of IDLE connections, IDLE IN
> TRANSACTION connections and length of the connections as well.
>

IDLE and IDLE in transaction are the kind of information you get in
pg_stat_activity.

Length of connections, you can get it with log_disconnections.

> I would be making use of pg_stat_activity and also thought of
enabling
> logging the host ips in the db server log files which seems to
be expensive
> for me (in terms of IO and logfile size).
>

Using pg_stat_activity won't catch very short-lived connections. You need
log_connections for that, and log_disconnections for the duration of
connections. So you'll have to work on a tool that gets some
information with queries on pg_stat_activity, and that can also read
the PostgreSQL log files.


--
Guillaume
 http://blog.guillaume.lelarge.info
 http://www.dalibo.com