[PERFORM] Incorrect row estimates in plan?

2007-09-26 Thread pgdba

Hi, I am having some trouble understanding a plan and was wondering if anyone
could guide me. The query in question here seems to be showing some
incorrect row counts. I have vacuumed and analyzed the table, but the
estimate versus the actual total seems to be way out (est 2870 vs actual
85k). Perhaps I am reading the plan incorrectly though. (hopefully the plan
below is readable)

db=# select version();
version
-
PostgreSQL 8.2.4 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2
20061115
(prerelease) (Debian 4.1.1-21)

db=# show shared_buffers ;
 shared_buffers

 300MB

#4GB ram, 2 SATA striped, XFS

db=# show default_statistics_target;
 default_statistics_target
---
 100

# stats have been raised to 1000 on both the destip and srcip columns
# create index slog_gri_idx on slog (gid,rule,(case when rule in (8,9) then
destip else srcip end)) WHERE (rule in (1, 2, 8, 9, 10));
# vacuum analyze verbose slog;

db=# show random_page_cost ;
 random_page_cost
--
 3

db=# select count(*) from slog
  count
-
 1,019,121

db=#select  count(*) as total
from slog
where gid=1::INTEGER
and rule in (1,2,8,9,10)
and (case when rule in (8,9) then destip else srcip
end)='192.168.10.23'::INET;
 total
---
 83,538

# problematic query
explain analyze
select  coalesce(uri,host((case when rule in (8,9) then srcip else destip
end))) as
destip,
case when rule in (8,9) then 'ext' else 'int' end as tp,
count(*) as total,
coalesce(sum(destbytes),0)+coalesce(sum(srcbytes),0) as bytes
from slog
where gid=1::INTEGER
and rule in (1,2,8,9,10)
and (case when rule in (8,9) then destip else srcip
end)='192.168.10.23'::INET
group by destip,tp
order by bytes desc,total desc,destip limit 20


Limit  (cost=6490.18..6490.23 rows=20 width=61) (actual
time=2036.968..2037.220 rows=20 loops=1)
->  Sort  (cost=6490.18..6490.90 rows=288 width=61) (actual
time=2036.960..2037.027 rows=20 loops=1)
Sort Key: (COALESCE(sum(destbytes), 0::numeric) +
COALESCE(sum(srcbytes), 0::numeric)), count(*), COALESCE(uri, host(CASE WHEN
(rule = ANY ('{8,9}'::integer[])) THEN srcip ELSE destip END))
->  HashAggregate  (cost=6470.50..6478.42 rows=288 width=61) (actual
time=2008.478..2022.125 rows=2057 loops=1)
->  Bitmap Heap Scan on slog  (cost=82.98..6434.62 rows=2870
width=61) (actual time=50.235..1237.948 rows=83538 loops=1)
Recheck Cond: ((gid = 1) AND (rule = ANY
('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY
('{8,9}'::integer[])) THEN destip ELSE srcip END = '192.168.10.23'::inet))
->  Bitmap Index Scan on slog_gri_idx  (cost=0.00..82.26
rows=2870 width=0) (actual time=41.306..41.306 rows=83538 loops=1)
Index Cond: ((gid = 1) AND (rule = ANY
('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY
('{8,9}'::integer[])) THEN destip ELSE srcip END = '192.168.10.23'::inet))
Total runtime: 2037.585 ms

Does anyone have any suggestions?

Thanks!
-- 
View this message in context: 
http://www.nabble.com/Incorrect-row-estimates-in-plan--tf4522692.html#a12902068
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Incorrect row estimates in plan?

2007-09-26 Thread pgdba

Hi Tom,

Tom Lane-2 wrote:
> 
> pgdba <[EMAIL PROTECTED]> writes:
>> ->  Bitmap Heap Scan on slog  (cost=82.98..6434.62 rows=2870
>> width=61) (actual time=50.235..1237.948 rows=83538 loops=1)
>> Recheck Cond: ((gid = 1) AND (rule = ANY
>> ('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY
>> ('{8,9}'::integer[])) THEN destip ELSE srcip END =
>> '192.168.10.23'::inet))
>> ->  Bitmap Index Scan on slog_gri_idx 
>> (cost=0.00..82.26
>> rows=2870 width=0) (actual time=41.306..41.306 rows=83538 loops=1)
>> Index Cond: ((gid = 1) AND (rule = ANY
>> ('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY
>> ('{8,9}'::integer[])) THEN destip ELSE srcip END =
>> '192.168.10.23'::inet))
> 
> [ blink... ]  Pray tell, what is the definition of this index?
> 
> With such a bizarre scan condition, it's unlikely you'll get any really
> accurate row estimate.
> 
>   regards, tom lane
> 
> 

Original index: "create index slog_gri_idx on slog (gid,rule,(case when rule
in (8,9) then
destip else srcip end)) WHERE (rule in (1, 2, 8, 9, 10))"

The purpose of that index is to match a specific query (one that gets run
frequently and needs to be fast). It is using the destip when rule 8/9, and
srcip when other, but only for a subset of the rules (1,2,8,9,10). There are
about 18 rules in total, but I'm only interested in those 5. I have tried a
couple of indices like:
create index test_destip_idx on slog (gid,destip) where rule in (8,9);
create index test_srcip_idx on slog (gid,srcip) where rule in (1,2,10);

But the original slog_gri_idx index was used instead. Is there a way that I
can rewrite that index then? Not that I'm a fan of a CASE statement in a
functional index, but I'm at a loss as to how else I can create this. Or
what else I can look into to make this faster?


-- 
View this message in context: 
http://www.nabble.com/Incorrect-row-estimates-in-plan--tf4522692.html#a12903194
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Incorrect row estimates in plan?

2007-09-26 Thread pgdba



Tom Lane-2 wrote:
> 
> pgdba <[EMAIL PROTECTED]> writes:
>> Tom Lane-2 wrote:
>> ->  Bitmap Index Scan on slog_gri_idx 
>> (cost=0.00..82.26
>> rows=2870 width=0) (actual time=41.306..41.306 rows=83538 loops=1)
>> Index Cond: ((gid = 1) AND (rule = ANY
>> ('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY
>> ('{8,9}'::integer[])) THEN destip ELSE srcip END =
>> '192.168.10.23'::inet))
>>> 
>>> [ blink... ]  Pray tell, what is the definition of this index?
> 
>> Original index: "create index slog_gri_idx on slog (gid,rule,(case when
>> rule
>> in (8,9) then
>> destip else srcip end)) WHERE (rule in (1, 2, 8, 9, 10))"
> 
>> The purpose of that index is to match a specific query (one that gets run
>> frequently and needs to be fast).
> 
> Ah.  I didn't think you would've put such a specific thing into an index
> definition, but if you're stuck supporting such badly written queries,
> maybe there's no other way.
> 
> I rather doubt that you're going to be able to make this query any
> faster than it is, short of buying enough RAM to keep the whole table
> RAM-resident.  Pulling 8 random rows in 1200 msec doesn't sound
> all that slow to me.
> 
> The ultimate solution might be to rethink your table designs ...
> 
>   regards, tom lane
> 

Badly written the query may be, but I do have the opportunity to change it.
Part of the problem is that I cannot come up with a better way of writing
it.

What about the discrepancy between the estimated row count and the actual
row count for that index access?
"Bitmap Index Scan on slog_gri_idx  (cost=0.00..82.26 rows=2870 width=0)
(actual time=41.306..41.306 rows=83538 loops=1)"

Is there anything I can do to influence that (not that it is likely to
change the plan, but...). I vacuumed and analyzed after I created the index,
so the stats should be at least be close (with stats target set to 1000
there).

-- 
View this message in context: 
http://www.nabble.com/Incorrect-row-estimates-in-plan--tf4522692.html#a12905186
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster