On Mon, Mar 26, 2018 at 3:10 AM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
> So, as I get you're proposing to introduce INDEX_ALT_TID_MASK flag
> which would indicate that we're storing something special in the t_tid
> offset. And that should help us not only for covering indexes, but also for
> further btree enhancements including suffix truncation. What exactly do
> you propose to store into t_tid offset when INDEX_ALT_TID_MASK flag
> is set? Is it number of attributes in this particular index tuple?

Yes. I think that once INDEX_ALT_TID_MASK is available, we should store
the number of attributes in that particular "separator key" tuple (which
has undergone suffix truncation), and always work off of that. You could
then have status bits in offset as follows:

* 1 bit that represents that this is a "separator key" IndexTuple (high
key or internal IndexTuple). Otherwise, it's a leaf IndexTuple with an
ordinary heap TID. (When INDEX_ALT_TID_MASK isn't set, it's the same as
today.)

* 3 reserved bits. I think that one of these bits can eventually be used
to indicate that the internal IndexTuple actually has a "normalized key"
representation [1], which seems like the best way to do suffix
truncation, long term. I think that we should support simple suffix
truncation, of the kind that this patch implements, alongside normalized
key suffix truncation. We need both for various reasons [2]. Not sure
what the other two flag bits might be used for, but they seem worth
having.

* 12 bits for the number of attributes, which should be more than
enough, even when INDEX_MAX_KEYS is significantly higher than 32. A
static assertion can keep this safe when INDEX_MAX_KEYS is set
ridiculously high.

I think that this scheme is future-proof. Maybe you have additional
ideas on the representation. Please let me know what you think.

When we eventually add optimizations that affect IndexTuples on the leaf
level, we can start using the block number (bi_hi + bi_lo) itself, much
like GIN posting lists. No need to further consider that (the leaf level
optimizations) today, because using block number provides us with many
more bits.

In internal page items, the block number is always a block number, so
internal IndexTuples are rather like GIN posting tree pointers in the
main entry tree (its leaf level) -- a conventional item pointer block
number is used, alongside unconventional use of the offset field, where
there are 16 bits available because no real offset is required.
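
To make the 1 + 3 + 12 bit split concrete, here is a minimal sketch in C of how the 16-bit offset field might be carved up. Every identifier with a SKETCH_ or sketch_ prefix is hypothetical and does not come from the patch. The choice of which bit positions hold which field is also an assumption, since the description above only fixes the field widths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout of the 16-bit t_tid offset when INDEX_ALT_TID_MASK
 * is set. Bit positions are an assumption; only the widths (1 + 3 + 12)
 * come from the proposal.
 */
#define SKETCH_INDEX_MAX_KEYS  32       /* stand-in for INDEX_MAX_KEYS */
#define SKETCH_SEPARATOR_BIT   0x8000u  /* 1 bit: "separator key" tuple */
#define SKETCH_RESERVED_MASK   0x7000u  /* 3 reserved bits */
#define SKETCH_NATTS_MASK      0x0FFFu  /* 12 bits: number of attributes */

/* Fail the build if the attribute count could overflow its 12 bits. */
static_assert(SKETCH_INDEX_MAX_KEYS <= SKETCH_NATTS_MASK,
              "INDEX_MAX_KEYS does not fit in the 12-bit attribute count");

/* Build an offset value; the reserved bits are always left clear. */
static inline uint16_t
sketch_pack_offset(bool is_separator, uint16_t natts)
{
    assert(natts <= SKETCH_NATTS_MASK);
    return (uint16_t) ((is_separator ? SKETCH_SEPARATOR_BIT : 0u) |
                       (natts & SKETCH_NATTS_MASK));
}

/* True if the tuple is a high key or internal (separator key) tuple. */
static inline bool
sketch_is_separator(uint16_t offset)
{
    return (offset & SKETCH_SEPARATOR_BIT) != 0;
}

/* Number of attributes actually stored in this tuple. */
static inline uint16_t
sketch_get_natts(uint16_t offset)
{
    return (uint16_t) (offset & SKETCH_NATTS_MASK);
}
```

With this layout, a separator key truncated to 3 attributes would be packed as sketch_pack_offset(true, 3), and a reader would recover the count with sketch_get_natts(). The static assertion is the compile-time check mentioned in the third bullet.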

[1] https://wiki.postgresql.org/wiki/Key_normalization#Optimizations_enabled_by_key_normalization
[2] https://wiki.postgresql.org/wiki/Key_normalization#How_big_can_normalized_keys_get.2C_and_is_it_worth_it.3F

--
Peter Geoghegan