Hi Andrey,

On Sun, Dec 3, 2017 at 10:11 PM, Andrey Borodin <x4...@yandex-team.ru> wrote:
> I like the idea of more compact B-tree.
> Chances are that I didn't understand all your ideas.
>
> But ItemId's let you insert a tuple among two existing tuples without data 
> movement. New tuple is placed wherever free space starts. You just shift 
> bytes in ItemId array.
> And you always have to insert tuple in specific position, since B-tree relies 
> on tuple order.

It's certainly true that the need to memmove() more data is a new cost
to be paid under my proposal: a simple random insertion must move
almost 50% of the entire page on average, rather than almost 10% on
average, as is the case today. I think that this would probably be
acceptable.
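
A rough back-of-the-envelope sketch of where those two figures can come
from. This is a toy model, not PostgreSQL code: it assumes 4-byte line
pointers, 16-byte index tuples (8-byte header plus an aligned int4/int8
key), uniformly random insert positions, and it ignores the page header
and special space. The function name and the 400-tuple page are made up
for illustration.

```python
ITEM_ID_BYTES = 4   # size of one line pointer (ItemId)
TUPLE_BYTES = 16    # assumed: 8-byte index tuple header + 8-byte aligned key


def avg_fraction_moved(n_tuples, shifted_bytes_per_entry, bytes_per_entry):
    """Average fraction of occupied bytes shifted by one random insert."""
    positions = range(n_tuples + 1)  # new tuple can land in any of n+1 slots
    # Inserting at slot i shifts every entry that sorts after slot i.
    total_moved = sum((n_tuples - i) * shifted_bytes_per_entry for i in positions)
    avg_moved = total_moved / len(positions)
    return avg_moved / (n_tuples * bytes_per_entry)


# Today: only the line pointer array shifts; tuple data stays where it is.
today = avg_fraction_moved(400, ITEM_ID_BYTES, ITEM_ID_BYTES + TUPLE_BYTES)

# Hypothetical layout without line pointers: the tuples themselves shift.
packed = avg_fraction_moved(400, TUPLE_BYTES, TUPLE_BYTES)

print(f"today: {today:.0%}, packed: {packed:.0%}")  # today: 10%, packed: 50%
```

In this model the packed figure is 50% whatever the tuple size is, while
the figure for today's layout depends on the ratio of line pointer size
to tuple size.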

We really only pay this extra cost during random writes into the index
(sequential writes will still just append within a page). Random
inserts are already much more expensive than sequential int4/int8
index inserts, because of the extra FPIs emitted for full_page_writes,
the fact that more pages are dirtied, etc. I don't think that random
insertions are rare in general, but they are a lot rarer with
simple SERIAL/BIGSERIAL primary keys on fact tables, which are what
this optimization is really about.

The approach might actually be faster for a workload that consists
only of random insertions into a table with a SERIAL/BIGSERIAL primary
key -- the "worst case". I'm less confident about that, but it seems
very possible.

-- 
Peter Geoghegan
