[BUGS] bitmap index scan problem?
hi,
we have a table with some 30M records, running PG 8.1.4 on Linux. when we run with enable_bitmapscan true, PG begins by doing a bitmap index scan plus a BitmapAnd. when we run with enable_bitmapscan false, PG finds a better index directly and chooses a much better plan. below is some data, the query and the plans. as you can see, when using the Index Scan PG finds 62 rows, but when using the Bitmap Index Scan it finds 2563790 + 506 rows, which should never be better!? the question is simply why the planner is not smart enough to skip the bitmap scan if normal operation is faster.

stat=# \d stats
                    Table "public.stats"
  Column    |            Type             |      Modifiers
------------+-----------------------------+--------------------
 id         | bigint                      | not null default 0
 timestamp  | timestamp without time zone |
 aid        | integer                     |
 i          | integer                     |
 ct         | integer                     |
 total      | bigint                      |
 bid        | integer                     | not null default 0
Indexes:
    "id_idx" btree (id)
    "bid_index" btree (bid)
    "ct_index" btree (ct)

--GOOD PLAN FIRST--

stat=# set enable_bitmapscan to false;
SET
Time: 0.645 ms
stat=# explain analyze select aid, ct, sum(total) from stats where ct='90' and bid=17675 GROUP BY aid, ct;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=28713.95..28714.63 rows=54 width=16) (actual time=1.072..1.080 rows=3 loops=1)
   ->  Index Scan using bid_index on stats  (cost=0.00..28709.92 rows=538 width=16) (actual time=0.100..0.804 rows=62 loops=1)
         Index Cond: (bid = 17675)
         Filter: (ct = 90)
 Total runtime: 1.163 ms
(5 rows)

Time: 2.692 ms

--NOW THE BAD PLAN--

stat=# set enable_bitmapscan to true;
SET
Time: 2.775 ms
stat=# explain analyze select aid, ct, sum(total) from stats where ct='90' and bid=17675 GROUP BY aid, ct;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=15848.76..15849.44 rows=54 width=16) (actual time=13210.811..13210.818 rows=3 loops=1)
   ->  Bitmap Heap Scan on stats  (cost=13754.80..15844.73 rows=538 width=16) (actual time=13206.714..13210.525 rows=62 loops=1)
         Recheck Cond: ((bid = 17675) AND (ct = 90))
         ->  BitmapAnd  (cost=13754.80..13754.80 rows=538 width=0) (actual time=13206.659..13206.659 rows=0 loops=1)
               ->  Bitmap Index Scan on bid_index  (cost=0.00..44.51 rows=7576 width=0) (actual time=0.137..0.137 rows=506 loops=1)
                     Index Cond: (bid = 17675)
               ->  Bitmap Index Scan on ct_index  (cost=0.00..13710.04 rows=2409440 width=0) (actual time=13206.216..13206.216 rows=2563790 loops=1)
                     Index Cond: (ct = 90)
 Total runtime: 13210.918 ms
(9 rows)

Time: 13212.121 ms

/stig
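One way to see how rough the planner's picture of these two columns is (a sketch, assuming only the table and column names shown above) is to query the pg_stats view; a very short most_common_vals list or an n_distinct estimate far from reality would explain the rows=7576 vs rows=506 mismatch on bid_index:

-- Sketch: inspect the per-column statistics the planner uses for the
-- two indexed columns in the WHERE clause.
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  tablename = 'stats'
  AND  attname IN ('bid', 'ct');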
Re: [BUGS] bitmap index scan problem?
Tom Lane wrote:
> stig erikson <[EMAIL PROTECTED]> writes:
>> the question is simply why the planner is not smart enough to skip the
>> bitmap scan if normal operation is faster.
>
> Probably because it hasn't got good statistics about the distribution
> of "bid":
>
>>   ->  Bitmap Index Scan on bid_index  (cost=0.00..44.51 rows=7576 width=0) (actual time=0.137..0.137 rows=506 loops=1)
>>         Index Cond: (bid = 17675)
>
> When the rowcount estimate is off by more than a factor of 10, the costs
> are going to be wrong too. Try increasing the statistics target for this
> table.
>
> 			regards, tom lane

Hi, thank you for your answer.
the last example was not 100% according to what my intention was, even though it shows the problem. here is a better example. i have not yet changed the default statistics target; how do i find out what i should change it to?
again, i am quite confused that PG does not use the bid_index when enable_bitmapscan is true.

some other information:
- there are around 30 000 000 rows in total.
- there are usually 28-35 different values for ct at any given time; the number of times each value occurs varies from less than 10 to over 10 000 000.
- there are usually 23 000 different values for bid at any given time; the number of times each value occurs varies from 100 to 20 000.

stat=# show default_statistics_target;
 default_statistics_target
---------------------------
 10
(1 row)

stat=# VACUUM FULL ANALYZE stats;
VACUUM
stat=# set enable_bitmapscan to 1;
SET
stat=# explain analyze select aid, ct, sum(total) from stats where (ct='90' OR ct='212') and bid=17675 GROUP BY aid, ct;
                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=18149.28..18149.94 rows=53 width=16) (actual time=14458.638..14458.644 rows=3 loops=1)
   ->  Bitmap Heap Scan on stats  (cost=15786.50..18144.74 rows=605 width=16) (actual time=14033.693..14458.260 rows=62 loops=1)
         Recheck Cond: ((bid = 17675) AND ((ct = 90) OR (ct = 212)))
         ->  BitmapAnd  (cost=15786.50..15786.50 rows=608 width=0) (actual time=14026.953..14026.953 rows=0 loops=1)
               ->  Bitmap Index Scan on bid_index  (cost=0.00..44.10 rows=7456 width=0) (actual time=79.293..79.293 rows=506 loops=1)
                     Index Cond: (bid = 17675)
               ->  BitmapOr  (cost=15742.16..15742.16 rows=2766331 width=0) (actual time=13947.348..13947.348 rows=0 loops=1)
                     ->  Bitmap Index Scan on ct_index  (cost=0.00..14675.92 rows=2579119 width=0) (actual time=13526.774..13526.774 rows=2563790 loops=1)
                           Index Cond: (ct = 90)
                     ->  Bitmap Index Scan on ct_index  (cost=0.00..1066.24 rows=187212 width=0) (actual time=420.564..420.564 rows=374354 loops=1)
                           Index Cond: (ct = 212)
 Total runtime: 14458.747 ms
(12 rows)

stat=# set enable_bitmapscan to 0;
SET
stat=# explain analyze select aid, ct, sum(total) from stats where (ct='90' OR ct='212') and bid=17675 GROUP BY aid, ct;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=28152.82..28153.48 rows=53 width=16) (actual time=7.759..7.768 rows=3 loops=1)
   ->  Index Scan using bid_index on stats  (cost=0.00..28148.28 rows=605 width=16) (actual time=0.100..7.483 rows=62 loops=1)
         Index Cond: (bid = 17675)
         Filter: ((ct = 90) OR (ct = 212))
 Total runtime: 7.858 ms
(5 rows)

/stig
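On the question of what to change the statistics target to: a common approach (a sketch; the value 100 is just a starting point, not a recommendation from the thread) is to raise it per column for the badly estimated columns and re-analyze:

-- Raise the per-column statistics target for the columns whose row
-- estimates are off, then rebuild the statistics. In 8.1 the target can
-- go up to 1000; larger values mean longer ANALYZE runs and larger
-- pg_statistic entries.
ALTER TABLE stats ALTER COLUMN bid SET STATISTICS 100;
ALTER TABLE stats ALTER COLUMN ct  SET STATISTICS 100;
ANALYZE stats;

-- Or raise the global default for the session (or in postgresql.conf)
-- before re-analyzing:
SET default_statistics_target = 100;
ANALYZE stats;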
[BUGS] 8.1.5 rpms for FC5
Hi.
There seem to be files missing from the rpm packages for postgresql 8.1.5:

[] rpm -Uvh postgresql*.rpm
error: Failed dependencies:
        libpq.so is needed by postgresql-server-8.1.5-2PGDG.i686
        libpq.so is needed by postgresql-tcl-1.5.2-4PGDG.i686

i got my rpms from ftp.no.postgresql.org. the files are:
postgresql-8.1.5-2PGDG.i686.rpm
postgresql-contrib-8.1.5-2PGDG.i686.rpm
postgresql-devel-8.1.5-2PGDG.i686.rpm
postgresql-docs-8.1.5-2PGDG.i686.rpm
postgresql-jdbc-8.1.407-4PGDG.i686.rpm
postgresql-libs-8.1.5-2PGDG.i686.rpm
postgresql-odbc-08.02.0100-2PGDG.i686.rpm
postgresql-pl-8.1.5-2PGDG.i686.rpm
postgresql-python-3.8.1-3PGDG.i686.rpm
postgresql-server-8.1.5-2PGDG.i686.rpm
postgresql-tcl-1.5.2-4PGDG.i686.rpm
postgresql-test-8.1.5-2PGDG.i686.rpm
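A possible way to narrow this down (hedged: it assumes the usual packaging layout, where the unversioned libpq.so symlink ships in the -devel package) is to ask rpm which of the downloaded files actually provides libpq.so:

# Check which of the downloaded packages claims to provide libpq.so
# (assumption: normally the -devel package carries the .so symlink).
rpm -qp --provides postgresql-*.rpm | grep -i libpq

# List the files inside the devel package to look for the symlink itself.
rpm -qpl postgresql-devel-8.1.5-2PGDG.i686.rpm | grep 'libpq\.so'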
Re: [BUGS] BUG #1257: add support for 1-byte integer and 4-bit integer
Peter Eisentraut wrote:
> PostgreSQL Bugs List wrote:
>> why is this feature important? having in mind the development of
>> datawarehouses with huge amount of data (hundreds of millions, or
>> billions of rows in fact tables) every byte is of importance.
>
> Yet how many applications could make use of the limited range that a
> 1-byte integer type would provide? Compared to the mess this would
> introduce into the resolution of numeric literals and functions, this
> seems to be of limited use. I suggest that you implement this as an
> extension module for your own use.
>
> 4-bit integers are out of the question. The architecture doesn't allow it.

Thanks for your answer; as Tom suggested i will try the "char" type.

I would like to shed some light on this. First, you are right, there are not many applications that could make use of a 1-byte integer. In a data warehouse (OLAP), though, it might be a good idea. Consider a data warehouse where the time dimension (fact dimensions could benefit as well) contains all kinds of data about the time the fact record describes. This could include:

NAME AND TYPE                                    TYPICAL VALUES   REQUIRES
dateid int8                                                       8 bytes
dateyearmonthday date NOT NULL,                                   8 bytes
dateyearmonth date NOT NULL,                                      8 bytes
"year" int2 NOT NULL,                            2004             2 bytes
monthinyear int2 NOT NULL,                       1-12             1 byte
dayinmonth int2 NOT NULL,                        1-31             1 byte
dayinyear int2 NOT NULL,                         365              2 bytes
dayinweek int2 NOT NULL,                         1-7              1 byte
calendarweeknumberinyear int2 NOT NULL,          1-52             1 byte
calendarquarter int2 NOT NULL,                   1-4              1 byte
fiscalmonthinyear int2 NOT NULL,                 1-12             1 byte
fiscaldayinmonth int2 NOT NULL,                  1-31             1 byte
fiscaldayinyear int2 NOT NULL,                   365              2 bytes
fiscaldayinweek int2 NOT NULL,                   1-7              1 byte
fiscalcalendarweeknumberinyear int2 NOT NULL,    1-52             1 byte
fiscalcalendarquarter int2 NOT NULL,             1-4              1 byte

as you can see in this limited example, if int2 is used all along we end up with 50 bytes. if there were a 1-byte integer we could get down to only 38 bytes, a 24% saving. that is not negligible when you have many millions of rows; data warehouses often have billions of rows (10^9) or more.

in the example one could argue that using all the various ints instead of just date is stupid, but it fills a function in data warehouses (not explained here). think of the analogy: in your wallet you have bank notes and coins, yet you don't throw the coins away saying "I'll stick to bank notes, the coins are worthless anyway". of course one could argue that with a few coins you can't buy much, but think of the loss you would make if you always paid with a banknote and left the change behind; no big deal if you shop a few times, but if you have millions of transactions every coin is valuable. the coins fill a function, not always, but in some applications they are important. as is the 1-byte int. if the current architecture does not allow it, i agree, stick with the current. but please give it a thought for the future.

by the way, can i enable table compression in some way? i could not find it in the manual; i found only compression for backups and EXTENDED column storage, but nothing for integers.

kind regards
stig
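A minimal sketch of the "char" workaround mentioned above (the table and column names here are made up for illustration): "char" is a one-byte type, and small integers can be round-tripped through it by casting via the character code.

-- Hypothetical demo table: store small integers (ranges like 1-31) in the
-- 1-byte "char" type instead of int2.
CREATE TABLE date_dim_demo (
    dateid       int8,
    monthinyear  "char" NOT NULL,   -- 1-12 fits in one byte
    dayinmonth   "char" NOT NULL    -- 1-31 fits in one byte
);

-- Integers are cast to "char" on the way in ...
INSERT INTO date_dim_demo VALUES (1, 7::"char", 15::"char");

-- ... and cast back to int on the way out.
SELECT dateid, monthinyear::int AS monthinyear, dayinmonth::int AS dayinmonth
FROM date_dim_demo;

Whether this actually saves space on disk also depends on column ordering, since alignment padding between columns can eat up the bytes saved.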