[BUGS] bitmap index scan problem?

2006-09-10 Thread stig erikson

hi
we have a table with some 30M records.
running PG 8.1.4 on Linux.

when we run with enable_bitmapscan true, PG begins by doing a bitmap index scan 
plus a BitmapAnd.

when we run with enable_bitmapscan false, PG finds a better index directly and 
chooses a much better plan.


below is some data, the query and the plans.
as you can see, when using the plain index scan PG reads 62 rows, but when using the bitmap index scan it reads 2563790 + 506 rows, which should never be better!?

the question is simply why the planner is not smart enough to skip the bitmap 
scan if normal operation is faster.




stat=# \d stats
                 Table "public.stats"
  Column   |            Type             |     Modifiers
-----------+-----------------------------+--------------------
 id        | bigint                      | not null default 0
 timestamp | timestamp without time zone |
 aid       | integer                     |
 i         | integer                     |
 ct        | integer                     |
 total     | bigint                      |
 bid       | integer                     | not null default 0
Indexes:
    "id_idx" btree (id)
    "bid_index" btree (bid)
    "ct_index" btree (ct)



--GOOD PLAN FIRST--
stat=# set enable_bitmapscan to false;
SET
Time: 0.645 ms
stat=# explain analyze select aid, ct, sum(total) from stats where ct='90' and bid=17675 GROUP BY aid, ct;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=28713.95..28714.63 rows=54 width=16) (actual time=1.072..1.080 rows=3 loops=1)
   ->  Index Scan using bid_index on stats  (cost=0.00..28709.92 rows=538 width=16) (actual time=0.100..0.804 rows=62 loops=1)
         Index Cond: (bid = 17675)
         Filter: (ct = 90)
 Total runtime: 1.163 ms
(5 rows)

Time: 2.692 ms




--NOW THE BAD PLAN--
stat=# set enable_bitmapscan to true;
SET
Time: 2.775 ms
stat=# explain analyze select aid, ct, sum(total) from stats where ct='90' and bid=17675 GROUP BY aid, ct;
                                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=15848.76..15849.44 rows=54 width=16) (actual time=13210.811..13210.818 rows=3 loops=1)
   ->  Bitmap Heap Scan on stats  (cost=13754.80..15844.73 rows=538 width=16) (actual time=13206.714..13210.525 rows=62 loops=1)
         Recheck Cond: ((bid = 17675) AND (ct = 90))
         ->  BitmapAnd  (cost=13754.80..13754.80 rows=538 width=0) (actual time=13206.659..13206.659 rows=0 loops=1)
               ->  Bitmap Index Scan on bid_index  (cost=0.00..44.51 rows=7576 width=0) (actual time=0.137..0.137 rows=506 loops=1)
                     Index Cond: (bid = 17675)
               ->  Bitmap Index Scan on ct_index  (cost=0.00..13710.04 rows=2409440 width=0) (actual time=13206.216..13206.216 rows=2563790 loops=1)
                     Index Cond: (ct = 90)
 Total runtime: 13210.918 ms
(9 rows)

Time: 13212.121 ms




/stig



Re: [BUGS] bitmap index scan problem?

2006-09-11 Thread stig erikson

Tom Lane wrote:

stig erikson <[EMAIL PROTECTED]> writes:

the question is simply why the planner is not smart enough to skip the bitmap 
scan if normal operation is faster.


Probably because it hasn't got good statistics about the distribution of
"bid":


->  Bitmap Index Scan on bid_index  (cost=0.00..44.51 rows=7576 width=0) (actual time=0.137..0.137 rows=506 loops=1)
      Index Cond: (bid = 17675)


When the rowcount estimate is off by more than a factor of 10, the costs
are going to be wrong too.  Try increasing the statistics target for this
table.

regards, tom lane




Hi.
thank you for your answer.
the last example did not quite match my intention, even though it still shows the problem. here is a better example.
i have not yet changed the default statistics target; how do i find out what i should change it to?
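
(one way to act on Tom's advice, as a minimal sketch: the knob can be raised per column and re-gathered with ANALYZE. the target value 100 here is only an illustration; 8.1 accepts values up to 1000.)

stat=# ALTER TABLE stats ALTER COLUMN bid SET STATISTICS 100;
ALTER TABLE
stat=# ANALYZE stats;
ANALYZE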

again i am quite confused that PG does not use the bid_index when 
enable_bitmapscan is true.

some other information:
- there are around 30 000 000 rows in total.
- there are usually 28-35 different values for ct at any given time; the number of times each value occurs varies from less than 10 to over 10 000 000.
- there are usually 23 000 different values for bid at any given time; the number of times each value occurs varies from 100 to 20 000.
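
(for reference, a sketch not in the original mail: the planner's current idea of these distributions can be inspected in the standard pg_stats view. with the default target of 10, n_distinct for bid is likely far below the real 23 000, which matches Tom's diagnosis.)

stat=# SELECT attname, n_distinct, null_frac
stat-#   FROM pg_stats
stat-#  WHERE tablename = 'stats' AND attname IN ('bid', 'ct');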




stat=# show default_statistics_target;
 default_statistics_target
---------------------------
 10
(1 row)

stat=# VACUUM FULL ANALYZE stats ;
VACUUM

stat=# set enable_bitmapscan to 1;
SET
stat=# explain analyze select aid, ct, sum(total) from stats where (ct='90' OR ct='212') and bid=17675 GROUP BY aid, ct;
                                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=18149.28..18149.94 rows=53 width=16) (actual time=14458.638..14458.644 rows=3 loops=1)
   ->  Bitmap Heap Scan on stats  (cost=15786.50..18144.74 rows=605 width=16) (actual time=14033.693..14458.260 rows=62 loops=1)
         Recheck Cond: ((bid = 17675) AND ((ct = 90) OR (ct = 212)))
         ->  BitmapAnd  (cost=15786.50..15786.50 rows=608 width=0) (actual time=14026.953..14026.953 rows=0 loops=1)
               ->  Bitmap Index Scan on bid_index  (cost=0.00..44.10 rows=7456 width=0) (actual time=79.293..79.293 rows=506 loops=1)
                     Index Cond: (bid = 17675)
               ->  BitmapOr  (cost=15742.16..15742.16 rows=2766331 width=0) (actual time=13947.348..13947.348 rows=0 loops=1)
                     ->  Bitmap Index Scan on ct_index  (cost=0.00..14675.92 rows=2579119 width=0) (actual time=13526.774..13526.774 rows=2563790 loops=1)
                           Index Cond: (ct = 90)
                     ->  Bitmap Index Scan on ct_index  (cost=0.00..1066.24 rows=187212 width=0) (actual time=420.564..420.564 rows=374354 loops=1)
                           Index Cond: (ct = 212)
 Total runtime: 14458.747 ms
(12 rows)



stat=# set enable_bitmapscan to 0;
SET
stat=# explain analyze select aid, ct, sum(total) from stats where (ct='90' OR ct='212') and bid=17675 GROUP BY aid, ct;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=28152.82..28153.48 rows=53 width=16) (actual time=7.759..7.768 rows=3 loops=1)
   ->  Index Scan using bid_index on stats  (cost=0.00..28148.28 rows=605 width=16) (actual time=0.100..7.483 rows=62 loops=1)
         Index Cond: (bid = 17675)
         Filter: ((ct = 90) OR (ct = 212))
 Total runtime: 7.858 ms
(5 rows)




/stig



[BUGS] 8.1.5 rpms for FC5

2006-10-17 Thread stig erikson

Hi.
There seem to be files missing in the rpm packages for postgresql 8.1.5.

[] rpm -Uvh postgresql*.rpm
error: Failed dependencies:
libpq.so is needed by postgresql-server-8.1.5-2PGDG.i686
libpq.so is needed by postgresql-tcl-1.5.2-4PGDG.i686

i got my rpms from ftp.no.postgresql.org
the files are:
postgresql-8.1.5-2PGDG.i686.rpm
postgresql-contrib-8.1.5-2PGDG.i686.rpm
postgresql-devel-8.1.5-2PGDG.i686.rpm
postgresql-docs-8.1.5-2PGDG.i686.rpm
postgresql-jdbc-8.1.407-4PGDG.i686.rpm
postgresql-libs-8.1.5-2PGDG.i686.rpm
postgresql-odbc-08.02.0100-2PGDG.i686.rpm
postgresql-pl-8.1.5-2PGDG.i686.rpm
postgresql-python-3.8.1-3PGDG.i686.rpm
postgresql-server-8.1.5-2PGDG.i686.rpm
postgresql-tcl-1.5.2-4PGDG.i686.rpm
postgresql-test-8.1.5-2PGDG.i686.rpm
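
(a quick check, not part of the original report: rpm's --provides query shows which of the downloaded packages is supposed to supply the missing libpq.so.)

[] rpm -qp --provides postgresql-libs-8.1.5-2PGDG.i686.rpm | grep libpq
[] rpm -qp --provides postgresql-devel-8.1.5-2PGDG.i686.rpm | grep libpq

(the unversioned libpq.so symlink normally ships in the -devel package, while -libs carries the versioned libpq.so.N; a server package that requires the unversioned name looks like a packaging bug.)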




Re: [BUGS] BUG #1257: add support for 1-byte integer and 4-bit integer

2004-09-18 Thread stig erikson
Peter Eisentraut wrote:
PostgreSQL Bugs List wrote:
why is this feature important?
having in mind the development of data warehouses with huge amounts of data (hundreds of millions, or billions of rows in fact tables), every byte is of importance.

Yet how many applications could make use of the limited range that a 
1-byte integer type would provide?  Compared to the mess this would 
introduce into the resolution of numeric literals and functions, this 
seems to be of limited use.  I suggest that you implement this as an 
extension module for your own use.

4-bit integers are out of the question.  The architecture doesn't allow 
it.

Thanks for your answer; as Tom suggested i will try the "char" type (a minimal sketch is further down).
I would like to shed some light on this. First, you are right: there are not many applications that could make use of a 1-byte integer.

In a data warehouse (OLAP), it might be a good idea.
Consider a data warehouse where the time dimension (fact dimensions could benefit as well) contains all kinds of data about the time the fact record describes; this could include:

  NAME AND TYPE                                  TYPICAL VALUES   REQUIRE
  dateid int8                                                     8 bytes
  dateyearmonthday date NOT NULL,                                 8 bytes
  dateyearmonth date NOT NULL,                                    8 bytes
  "year" int2 NOT NULL,                          2004             2 bytes
  monthinyear int2 NOT NULL,                     1-12             1 byte
  dayinmonth int2 NOT NULL,                      1-31             1 byte
  dayinyear int2 NOT NULL,                       365              2 bytes
  dayinweek int2 NOT NULL,                       1-7              1 byte
  calendarweeknumberinyear int2 NOT NULL,        1-52             1 byte
  calendarquarter int2 NOT NULL,                 1-4              1 byte
  fiscalmonthinyear int2 NOT NULL,               1-12             1 byte
  fiscaldayinmonth int2 NOT NULL,                1-31             1 byte
  fiscaldayinyear int2 NOT NULL,                 365              2 bytes
  fiscaldayinweek int2 NOT NULL,                 1-7              1 byte
  fiscalcalendarweeknumberinyear int2 NOT NULL,  1-52             1 byte
  fiscalcalendarquarter int2 NOT NULL,           1-4              1 byte
as you can see in this limited example, if int2 is used all along we end up with 50 bytes.
with a 1-byte integer for the ten columns whose values fit in one byte, we could get down to 40 bytes, a 20% saving. that is not negligible when you have many millions of rows; data warehouses often have billions of rows (10^9) or more.

in the example one could argue that using all the various ints instead of just a date is stupid; however, it fills a function in data warehouses (not explained here).

think of the analogy: in your wallet you have bank notes and coins. However, you don't throw the coins away saying "I'll stick to bank notes, the coins are worthless anyway". Of course we could argue that with a few coins you can't buy much, but think of the loss you would make if you always paid with a banknote and left the change behind: no big deal if you shop a few times, but if you have millions of transactions every coin is valuable. The coins fill a function; not always, but in some applications they are important. As is the 1-byte int.

if the current architecture does not allow it, i agree, stick with the 
current. but please give it a thought for the future.
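
(the "char" workaround mentioned above, as a minimal sketch; the demo table is made up for illustration. "char", with the quotes, is a single-byte type that casts to and from integer:)

stat=# create table timedim_demo (monthinyear "char" not null);
CREATE TABLE
stat=# insert into timedim_demo values (7::"char");
INSERT 0 1
stat=# select monthinyear::int as monthinyear from timedim_demo;
 monthinyear
-------------
           7
(1 row)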

by the way, can i enable table compression in some way?
i could not find it in the manual; i found only compression for backups and EXTENDED column storage, but nothing for integers.

kind regards
stig