[PERFORM] Query is running very slow......

2017-05-24 Thread Dinesh Chandra 12108
Dear Expert,

While executing the below query, it is taking too long to fetch the output.
Could you please help to fine-tune it?

SELECT
date_trunc('day', insert_time),
workflow.project.project_name,
workflow.tool_performance.project_id,
workflow.tool_performance.user_id,
workflow.tool_performance.step_id,
count(*),
round(sum(execution_time)/1000) as Sum_time_sec,
round(((round(sum(execution_time)/1000))/60)/count(*),2) as Efficiency_Min,
round(((round(sum(execution_time)/1000)))/count(*),2) as Efficiency_sec
FROM workflow.project,workflow.tool_performance,workflow.evidence_to_do
WHERE
workflow.evidence_to_do.project_id = workflow.tool_performance.project_id AND
workflow.evidence_to_do.project_id = workflow.project.project_id AND
workflow.tool_performance.insert_time >'2017-05-19' AND
workflow.tool_performance.insert_time <'2017-05-20' AND
workflow.evidence_to_do.status_id in (15100,15150,15200,15300,15400,15500)
GROUP BY
date_trunc('day', insert_time),
workflow.project.project_name,
workflow.tool_performance.project_id,
workflow.tool_performance.user_id,
workflow.tool_performance.step_id
ORDER BY
workflow.tool_performance.project_id,
workflow.project.project_name,
workflow.tool_performance.step_id

I am using PostgreSQL 9.1 on a Linux platform.
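
For reference, here is the same query in explicit-JOIN form (a readability-only
sketch; join keys and filters unchanged), plus a hypothetical index on the
range column in case none exists yet:

SELECT
    date_trunc('day', tp.insert_time),
    p.project_name,
    tp.project_id,
    tp.user_id,
    tp.step_id,
    count(*),
    round(sum(execution_time)/1000) AS sum_time_sec,
    round(((round(sum(execution_time)/1000))/60)/count(*),2) AS efficiency_min,
    round(((round(sum(execution_time)/1000)))/count(*),2) AS efficiency_sec
FROM workflow.tool_performance tp
JOIN workflow.project p ON p.project_id = tp.project_id
JOIN workflow.evidence_to_do etd ON etd.project_id = tp.project_id
WHERE tp.insert_time > '2017-05-19'
  AND tp.insert_time < '2017-05-20'
  AND etd.status_id IN (15100, 15150, 15200, 15300, 15400, 15500)
GROUP BY 1, 2, 3, 4, 5
ORDER BY tp.project_id, p.project_name, tp.step_id;

-- Hypothetical index; may help the insert_time range filter if absent:
CREATE INDEX tool_performance_insert_time_idx
    ON workflow.tool_performance (insert_time);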

Regards,
Dinesh Chandra
|Database administrator (Oracle/PostgreSQL)| Cyient Ltd. Noida.
--
Mobile: +91-9953975849 | Ext 1078 
|dinesh.chan...@cyient.com
Plot No. 7, NSEZ, Phase-II, Noida-Dadri Road, Noida - 201 305, India.



Re: [PERFORM] Query is running very slow......

2017-05-24 Thread Tomas Vondra

Hi,

On 5/24/17 7:04 PM, Dinesh Chandra 12108 wrote:

Dear Expert,

While executing the below query, it is taking too long to fetch the output.

Could you please help to fine-tune it?



You'll have to provide far more details - the query alone is certainly 
not enough for anyone to guess why it's slow. Perhaps look at this:


https://wiki.postgresql.org/wiki/Slow_Query_Questions

In particular, you'll have to tell us

(a) something about the hardware it's running on

(b) amounts of data in the tables / databases

(c) EXPLAIN or even better EXPLAIN ANALYZE of the query

(d) configuration of the database (work_mem, shared_buffers etc.)
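
For example, (c) and (d) can be collected like this (a minimal sketch; put the
actual slow query after EXPLAIN in place of the stand-in):

-- (c) the actual plan with timing and buffer usage; the count(*) below is
-- only a placeholder for the real query:
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM workflow.tool_performance;

-- (d) the settings most relevant to the plan:
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('work_mem', 'shared_buffers',
               'effective_cache_size', 'random_page_cost');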


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[PERFORM] join estimate of subqueries with range conditions and constraint exclusion

2017-05-24 Thread Justin Pryzby
We got bitten again by what appears to be the same issue I reported (perhaps
poorly) here:
https://www.postgresql.org/message-id/20170326193344.GS31628%40telsasoft.com

We have PG 9.6.3 table hierarchies partitioned by time.  Our reports use
subqueries, each with its own copy of the range clause on the time column, as
needed to get constraint exclusion; reference:
https://www.postgresql.org/message-id/25076.1366321335%40sss.pgh.pa.us

SELECT * FROM
(SELECT * FROM t WHERE col>const) a JOIN
(SELECT * FROM t WHERE col>const) b USING (col)
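
For concreteness, a minimal self-contained sketch of that pattern (assumed
schema; classic inheritance partitioning with CHECK constraints, as used
before declarative partitioning):

CREATE TABLE log (col timestamptz);
CREATE TABLE log_2017_05 (CHECK (col >= '2017-05-01' AND col < '2017-06-01')) INHERITS (log);
CREATE TABLE log_2017_06 (CHECK (col >= '2017-06-01' AND col < '2017-07-01')) INHERITS (log);
-- With constraint_exclusion = partition (the default), the copy of the range
-- clause inside each subquery lets the planner drop children whose CHECK
-- constraint cannot be satisfied:
EXPLAIN SELECT * FROM
  (SELECT * FROM log WHERE col >= '2017-06-15') a JOIN
  (SELECT * FROM log WHERE col >= '2017-06-15') b USING (col);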

I'm diagnosing a bad estimate/plan due to an excessively high n_distinct
leading to an underestimated rowcount when selecting from a small fraction of
the table hierarchy.  This leads intermittently to bad things, specifically a
cascade of misestimates and associated nested loops around millions of rows.

Here is an artificial/contrived test case, involving a table with 99 instances
each of 99 values:

postgres=# CREATE TABLE t(i INT);
postgres=# TRUNCATE t; INSERT INTO t SELECT i FROM generate_series(1,99) i, generate_series(1,99); ANALYZE t;
postgres=# SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_mcv,
                  tablename, attname, n_distinct,
                  array_length(most_common_vals,1) n_mcv,
                  array_length(histogram_bounds,1) n_hist,
                  (SELECT MAX(x) FROM unnest(most_common_vals::text::text[]) x) maxmcv,
                  (histogram_bounds::text::text[])[array_length(histogram_bounds,1)] maxhist
           FROM pg_stats
           WHERE attname~'i' AND tablename='t'
           GROUP BY 1,2,3,4,5,6,7,8
           ORDER BY 1 DESC;
-[ RECORD 1 ]--
frac_mcv   | 1
tablename  | t
attname    | i
n_distinct | 99
n_mcv      | 99
n_hist     |
maxmcv     | 99
maxhist    |

range query (which could use constraint exclusion), but bad estimate:
postgres=# explain ANALYZE SELECT * FROM (SELECT * FROM t WHERE i<2) AS a
           JOIN (SELECT * FROM t WHERE i<2) AS b USING (i);
 Merge Join  (cost=339.59..341.57 rows=99 width=4) (actual time=8.272..16.892 rows=9801 loops=1)

range query which could NOT use constraint exclusion, good estimate:
postgres=# explain ANALYZE SELECT * FROM (SELECT * FROM t) AS a
           JOIN (SELECT * FROM t) AS b USING (i) WHERE i<2;
 Hash Join  (cost=264.52..541.54 rows=9801 width=4) (actual time=12.688..22.325 rows=9801 loops=1)

non-range query, good estimate:
postgres=# explain ANALYZE SELECT * FROM (SELECT * FROM t WHERE i=3) AS a
           JOIN (SELECT * FROM t WHERE i=3) AS b USING (i);
 Nested Loop  (cost=0.00..455.78 rows=9801 width=4) (actual time=0.482..15.820 rows=9801 loops=1)

My understanding:
Postgres estimates join selectivity using the number of distinct values of the
underlying column.  For the subqueries "a" and "b", the estimate is the same
as for the underlying table "t", even when selecting only a small fraction of
the table.  This is adt/selfuncs.c:eqjoinsel_inner().
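
A quick back-of-the-envelope check of that reading (my arithmetic, not the
planner's code): each "i<2" side is estimated at 99 rows, and the join
selectivity comes from whole-table stats (about 1/99 here), so the join
estimate is 99 * 99 * (1/99) = 99 rows, matching the Merge Join above, while
in fact all 99 * 99 = 9801 row pairs match:

postgres=# SELECT round(99 * 99 * (1/99.0)) AS estimated_rows,  -- whole-table selectivity
                  99 * 99                   AS actual_rows;     -- every row has i = 1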

Note, in my tests, report queries on the child table have correct estimates;
and queries with only a "pushed down" WHERE clause outside the subquery have a
correct estimate (but no constraint exclusion), apparently due to
calc_joinrel_size_estimate() returning the size of the parent table, planning
a join without the restriction clause, followed by filtering the join result,
at which point I guess the MCV list becomes useful and the estimate is perfect:

SELECT * FROM
(SELECT * FROM t)a JOIN(SELECT * FROM t)b
USING (col) WHERE col>const

So my original question basically still stands: is it possible to get both
good estimates/plans AND constraint exclusion?

Thanks
Justin




[PERFORM] Query is very much slow

2017-05-24 Thread Daulat Ram
Hi team,

We are getting a very slow response from this query.

SELECT
date_trunc('day', insert_time),
workflow.project.project_name,
workflow.tool_performance.project_id,
workflow.tool_performance.user_id,
workflow.tool_performance.step_id,
count(*),
round(sum(execution_time)/1000) as Sum_time_sec,
round(((round(sum(execution_time)/1000))/60)/count(*),2) as Efficiency_Min,
round(((round(sum(execution_time)/1000)))/count(*),2) as Efficiency_sec
FROM workflow.project, workflow.tool_performance, workflow.evidence_to_do
WHERE
workflow.project.project_id = workflow.tool_performance.project_id AND
insert_time > '2017-05-01' AND
insert_time < '2017-05-02' AND
workflow.evidence_to_do.status_id in (15100,15150,15200,15300,15400,15500)
GROUP BY
date_trunc('day', insert_time),
workflow.project.project_name,
workflow.tool_performance.project_id,
workflow.tool_performance.user_id,
workflow.tool_performance.step_id
ORDER BY
workflow.tool_performance.project_id,
workflow.project.project_name,
workflow.tool_performance.step_id;

The following indexes are created on the project & evidence_to_do tables.

"CREATE INDEX project_id_idx ON workflow.project USING btree (project_id)"
"CREATE INDEX evidence_to_do_status_id_index ON workflow.evidence_to_do USING 
btree (status_id)"
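
The plan below still seq-scans tool_performance for the insert_time range; a
hedged sketch of one more index, assuming insert_time is not indexed yet:

CREATE INDEX tool_performance_insert_time_idx
    ON workflow.tool_performance USING btree (insert_time);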


The explain plan of the query is:

"GroupAggregate  (cost=18675703613.60..20443753075.67 rows=6689718 width=69)"
"  ->  Sort  (cost=18675703613.60..18823015982.33 rows=58924947492 width=69)"
"Sort Key: tool_performance.project_id, project.project_name, 
tool_performance.step_id, (date_trunc('day'::text, 
tool_performance.insert_time)), tool_performance.user_id"
"->  Nested Loop  (cost=2.42..884042104.67 rows=58924947492 width=69)"
"  ->  Seq Scan on evidence_to_do  (cost=0.00..118722.17 
rows=554922 width=0)"
"Filter: (status_id = ANY 
('{15100,15150,15200,15300,15400,15500}'::bigint[]))"
"  ->  Materialize  (cost=2.42..49435.58 rows=106186 width=69)"
"->  Hash Join  (cost=2.42..48904.65 rows=106186 width=69)"
"  Hash Cond: (tool_performance.project_id = 
project.project_id)"
"  ->  Seq Scan on tool_performance  
(cost=0.00..47442.18 rows=106186 width=39)"
"Filter: ((insert_time > '2017-05-01 
00:00:00+05:30'::timestamp with time zone) AND (insert_time < '2017-05-02 
00:00:00+05:30'::timestamp with time zone))"
"  ->  Hash  (cost=1.63..1.63 rows=63 width=38)"
"->  Seq Scan on project  (cost=0.00..1.63 
rows=63 width=38)"
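
Note that the Nested Loop estimate is exactly 554922 * 106186 = 58924947492
rows: evidence_to_do appears in the FROM list with no join condition, so every
matching status row is cross-joined against the other two tables. A hedged
rewrite, assuming evidence_to_do is meant to join on project_id as in the
near-identical query at the top of this digest:

SELECT date_trunc('day', tp.insert_time),
       p.project_name, tp.project_id, tp.user_id, tp.step_id,
       count(*),
       round(sum(execution_time)/1000) AS sum_time_sec,
       round(((round(sum(execution_time)/1000))/60)/count(*),2) AS efficiency_min,
       round(((round(sum(execution_time)/1000)))/count(*),2) AS efficiency_sec
FROM workflow.tool_performance tp
JOIN workflow.project p ON p.project_id = tp.project_id
JOIN workflow.evidence_to_do etd ON etd.project_id = tp.project_id  -- assumed missing join key
WHERE tp.insert_time > '2017-05-01'
  AND tp.insert_time < '2017-05-02'
  AND etd.status_id IN (15100, 15150, 15200, 15300, 15400, 15500)
GROUP BY 1, 2, 3, 4, 5
ORDER BY tp.project_id, p.project_name, tp.step_id;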


We have 64 GB of RAM and:

CPU(s):             40
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s):          2


postgresql.conf parameters:
shared_buffers = 16GB
work_mem = 32MB

Would you please help with how we can tune this query at the database & code level?

Regards,
Daulat