Hi Andrey, On Sun, Dec 3, 2017 at 10:11 PM, Andrey Borodin <x4...@yandex-team.ru> wrote: > I like the idea of more compact B-tree. > Chances are that I didn't understood all your ideas. > > But ItemId's let you insert a tuple among two existing tuples without data > movement. New tuple is places wherever free space starts. You just shift > bytes in ItemId array. > And you always have to insert tuple in specific position, since B-tree relies > on tuple order.
It's certainly true that the need to memmove() more data is a new cost to be paid under my proposal: a simple random insertion must move almost 50% the entire page on average, rather than almost 10% on average, as is the case today. I think that this would probably be acceptable. We really only pay this extra cost during random writes into the index (sequential writes will still just append within a page). Random inserts are already much more expensive than sequential int4/int8 index inserts, because of the extra FPIs emitted for full_page_writes, the fact that more pages are dirtied, etc. It's not that I think that random insertions are rare. I think that they're a lot rarer with simple SERIAL/BIGSERIAL primary keys on fact tables, which are what this optimization is really about. The approach might actually be faster for a workload that consists only of random insertions into a table with a SERIAL/BIGSERIAL primary key -- the "worst case". I'm less confident about that, but it seems very possible. -- Peter Geoghegan