Tom Lane wrote on Fri, 30.01.2004 at 16:48:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Also, what does an in-memory bitmapped index look like?
>
> One idea that might work: a binary search tree in which each node
> represents a single page of the table, and contains a bit array with
> one bit for each possible item number on the page.  You could not need
> more than BLCKSZ/(sizeof(HeapTupleHeaderData)+sizeof(ItemIdData)) bits
> in a node, or about 36 bytes at default BLCKSZ --- for most tables you
> could probably prove it would be a great deal less.  You only allocate
> nodes for pages that have at least one interesting row.
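[For concreteness, a minimal C sketch of such a node tree; all names
here are invented for illustration, not actual PostgreSQL structures,
and MAX_TUPLES_PER_PAGE merely stands in for the
BLCKSZ/(sizeof(HeapTupleHeaderData)+sizeof(ItemIdData)) bound above.]

    #include <stdint.h>
    #include <stdlib.h>

    #define MAX_TUPLES_PER_PAGE 291        /* roughly 8192/(24+4) */
    #define BITMAP_WORDS ((MAX_TUPLES_PER_PAGE + 31) / 32)

    typedef struct PageBitmapNode
    {
        uint32_t    blockno;                 /* page number, the search key */
        struct PageBitmapNode *left, *right; /* binary search tree links */
        uint32_t    itembits[BITMAP_WORDS];  /* one bit per possible item */
    } PageBitmapNode;

    /* Find or lazily create the node for a page, then set one item's bit.
     * Nodes exist only for pages with at least one interesting row. */
    static PageBitmapNode *
    bitmap_set(PageBitmapNode **root, uint32_t blockno, uint16_t itemno)
    {
        while (*root != NULL && (*root)->blockno != blockno)
            root = (blockno < (*root)->blockno) ? &(*root)->left
                                                : &(*root)->right;
        if (*root == NULL)
        {
            *root = calloc(1, sizeof(PageBitmapNode));
            (*root)->blockno = blockno;
        }
        (*root)->itembits[itemno / 32] |= UINT32_C(1) << (itemno % 32);
        return *root;
    }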
Another idea would be to use bitmaps where we have just one bit per
database page, and do a seq scan, but only over the marked pages.

Even when allocated in full, such an index would occupy just
1/(8k*8bit) of the amount of data it describes, so the index for a 1GB
table would be 1GB/(8k*8bit) = 16 kilobytes (2 pages).

Also, such indexes, if persistent, could be used (together with the
FSM) when deciding the placement of new tuples, so they would provide a
form of clustering. This would of course be most useful for
data-warehouse type operations, where the database is significantly
bigger than memory.

And the seq scan over the bitmap should not be done in simple page
order, but rather in two passes:

1. over those pages which are already in cache (either PostgreSQL's or
   the system's, if we find a way to get such info from the system);

2. in sequential order over the rest.

(A rough sketch of this bitmap appears below, after the sig.)

> I think this would represent a reasonable compromise between size and
> insertion speed.  It would only get large if the indexscan output
> demanded visiting many different pages --- but at some point you could
> abandon index usage and do a sequential scan, so I think that property
> is okay.

One case where an almost-full intermediate bitmap could be needed is
when doing a star join, or just an AND of several conditions, where
each single index spans a significant part of the table but the result
does not.

> A variant is to make the per-page bit arrays be entries in a hash table
> with page number as hash key.  This would reduce insertion to a nearly
> constant-time operation, but the drawback is that you'd need an explicit
> sort at the end to put the per-page entries into page number order
> before you scan 'em.  You might come out ahead anyway, not sure.
>
> Or we could try a true linear bitmap (indexed by page number times
> max-items-per-page plus item number) that's compressed in some fashion,
> probably just by eliminating large runs of zeroes.  The difficulty here
> is that inserting a new one-bit could be pretty expensive, and we need
> it to be cheap.
>
> Perhaps someone can come up with other better ideas ...

I have also contemplated a scenario where we could use a linear bitmap
with some not-quite-max power-of-2 bits per page, and mark intra-page
wraps (when we try to mark a point past that not-quite-max number in a
page) in a high bit (or in another bitmap), making the info for that
page "folded". An example would be setting bit 40 in a 32-bits-per-page
index: this would set bit 40 & 31 and mark the page folded.

When combining such indexes using AND or OR, we need some special
handling of folded pages, but we could still get non-folded (all-zero)
results out of an AND of two folded pages, if the bits are distributed
nicely. (This too is sketched below.)

--------------
Hannu
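[A rough C sketch of the one-bit-per-page bitmap described above,
including AND-combination and the two-pass scan order. All names are
invented; page_is_cached is a pure placeholder, since getting cache
information, especially from the OS, is an open question.]

    #include <stdint.h>
    #include <stddef.h>

    #define BLCKSZ       8192
    #define TABLE_PAGES  (1024 * 1024 * 1024 / BLCKSZ)  /* 131072 for 1GB */
    #define MAP_BYTES    (TABLE_PAGES / 8)              /* 16384 = 2 pages */

    typedef struct { uint8_t map[MAP_BYTES]; } PageBitmap;

    static void mark_page(PageBitmap *b, uint32_t pageno)
    { b->map[pageno / 8] |= (uint8_t) (1 << (pageno % 8)); }

    static int page_is_marked(const PageBitmap *b, uint32_t pageno)
    { return (b->map[pageno / 8] >> (pageno % 8)) & 1; }

    /* ANDing two such bitmaps, as for a star join: the result can end
     * up nearly empty even when each input covers much of the table. */
    static void bitmap_and(PageBitmap *dst, const PageBitmap *other)
    {
        for (size_t i = 0; i < MAP_BYTES; i++)
            dst->map[i] &= other->map[i];
    }

    /* Two-pass scan: first the marked pages already in cache, then the
     * rest of the marked pages in sequential order. */
    static void scan_marked_pages(const PageBitmap *b,
                                  int (*page_is_cached)(uint32_t),
                                  void (*read_page)(uint32_t))
    {
        for (uint32_t p = 0; p < TABLE_PAGES; p++)
            if (page_is_marked(b, p) && page_is_cached(p))
                read_page(p);
        for (uint32_t p = 0; p < TABLE_PAGES; p++)
            if (page_is_marked(b, p) && !page_is_cached(p))
                read_page(p);
    }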
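[And a rough sketch of the folded 32-bits-per-page scheme, again with
invented names, showing how an AND of two folded pages can still yield
an exact (non-folded) empty result.]

    #include <stdint.h>

    typedef struct
    {
        uint32_t bits;    /* item bits, wrapped modulo 32 */
        int      folded;  /* set once any item >= 32 was marked */
    } FoldedPageBits;

    /* Setting item 40 sets bit 40 & 31 = 8 and marks the page folded. */
    static void folded_set(FoldedPageBits *p, unsigned itemno)
    {
        p->bits |= UINT32_C(1) << (itemno & 31);
        if (itemno >= 32)
            p->folded = 1;
    }

    /* AND of two pages: if no bits coincide, the result is all-zero,
     * which is exact even when both inputs were folded -- no row on the
     * page can satisfy both conditions.  Otherwise any surviving bit is
     * only a "maybe" if either input was folded, so the result stays
     * folded and the page would need a recheck. */
    static FoldedPageBits folded_and(FoldedPageBits a, FoldedPageBits b)
    {
        FoldedPageBits r;
        r.bits = a.bits & b.bits;
        r.folded = (r.bits != 0) && (a.folded || b.folded);
        return r;
    }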