On Apr 24, 2008, at 10:43 AM, Bruce Momjian wrote: Bruce asked if these should be TODOs...
Index compression is possible in many ways, depending upon the situation. All of the following sound similar at a high level, but each covers a different use case.

* For long, similar data, e.g. text, we can use Prefix Compression. We still store one pointer per row, but we reduce the size of the index by reducing the size of the key values. This requires us to reach inside datatypes, so it isn't a very general solution, but it is probably an important one in the future for text.
I think what would be even more useful is doing this within the table itself, and then bubbling that up to the index.
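To make the idea concrete, here is a toy sketch of prefix compression in plain Python (not PostgreSQL internals; the key values are made up). Sorted text keys on a page share a common prefix, which is stored once while each entry keeps only its suffix:

```python
# Toy sketch of prefix compression: store the shared prefix once,
# plus only the distinct suffix for each key.
def prefix_compress(keys):
    """Compress a sorted list of strings into (prefix, suffixes)."""
    if not keys:
        return "", []
    prefix = keys[0]
    for k in keys[1:]:
        # shrink the candidate prefix until it matches this key too
        while not k.startswith(prefix):
            prefix = prefix[:-1]
    return prefix, [k[len(prefix):] for k in keys]

prefix, suffixes = prefix_compress(["customer_0001", "customer_0002", "customer_0137"])
# prefix == "customer_0", suffixes == ["001", "002", "137"]
```

The datatype dependence mentioned above shows up here as the `startswith` call: you need to know how to split a value into prefix and remainder, which is cheap for text but not something you can do generically for every type.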
* For unique/nearly-unique indexes we can use Range Compression. We reduce the size of the index by holding one index pointer per range of values, thus removing both keys and pointers. It's more efficient than prefix compression and isn't datatype-dependent.
Definitely.
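A toy sketch of the range idea (plain Python, names invented): keep only one boundary key per block of N entries, then resolve the exact row with a short scan inside the matching block.

```python
# Toy sketch of range compression: one separator key per block of
# block_size rows, instead of one (key, pointer) pair per row.
import bisect

def build_range_index(sorted_keys, block_size=4):
    # keep only the first key of each block as a separator
    return [sorted_keys[i] for i in range(0, len(sorted_keys), block_size)]

def find_block(range_index, key):
    # which block of block_size rows may contain the key
    return bisect.bisect_right(range_index, key) - 1

ridx = build_range_index(list(range(0, 100, 2)), block_size=4)  # keys 0, 2, ..., 98
# 50 keys compressed down to 13 separators; find_block(ridx, 10) == 1
```

Note the compression is datatype-independent: all it needs is the same ordering the index already uses.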
* For highly non-unique data we can use Duplicate Compression. This is the technique used by Bitmap Indexes. Efficient, but not useful for unique/nearly-unique data.
Also definitely. This would be hugely useful for things like "status" or "type" fields.
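For a "status"-style column the duplicate-compression win is easy to see in a toy sketch (plain Python, example values invented): each distinct value gets one key plus one bitmap of row positions, instead of one (key, pointer) pair per row.

```python
# Toy sketch of duplicate compression as in a bitmap index: each
# distinct value maps to a bitmap of the row positions holding it.
from collections import defaultdict

def build_bitmap_index(column):
    bitmaps = defaultdict(int)
    for rownum, value in enumerate(column):
        bitmaps[value] |= 1 << rownum  # set the bit for this row
    return dict(bitmaps)

bidx = build_bitmap_index(["active", "closed", "active", "active"])
# bidx["active"] has bits 0, 2, 3 set -> 0b1101
```

With only a handful of distinct values, the bitmaps themselves compress extremely well, which is exactly why this pays off for low-cardinality columns and not for unique ones.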
* Multi-Column Leading Value Compression - if you have a multi-column index, then leading columns are usually duplicated between rows inserted at the same time. Using an on-block dictionary we can remove duplicates. Only useful for multi-column indexes; possibly overlapping/contained subset of the GIT use case.
Also useful, though I generally try to put the most diverse values first in indexes to increase the odds of them being used. Perhaps if we had compression this would change.
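The on-block dictionary idea also fits in a small sketch (plain Python, column values invented): repeated leading values are replaced by a small dictionary code, while the trailing column is stored in full.

```python
# Toy sketch of leading-value dictionary compression for a
# multi-column index: the leading column is stored once per distinct
# value in a per-block dictionary, and entries keep only a small code.
def compress_leading(entries):
    """entries: sorted list of (leading, trailing) tuples."""
    dictionary, codes = [], {}
    compressed = []
    for lead, trail in entries:
        if lead not in codes:
            codes[lead] = len(dictionary)
            dictionary.append(lead)
        compressed.append((codes[lead], trail))
    return dictionary, compressed

d, c = compress_leading([("2008-04-24", 1), ("2008-04-24", 2), ("2008-04-25", 1)])
# d == ["2008-04-24", "2008-04-25"]; c == [(0, 1), (0, 2), (1, 1)]
```

This only wins when the leading column really does repeat within a block, which matches the observation above that rows inserted together tend to share their leading values.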
--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828