Re: "Write amplification" is made worse by "getting tired" while inserting into nbtree secondary indexes (Was: Why B-Tree suffix truncation matters)

Peter Geoghegan Thu, 02 Aug 2018 13:33:53 -0700

On Tue, Jul 17, 2018 at 10:42 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> If we knew that we were never going to do deletes/non-HOT updates from
> the table we could continue to use the existing mechanism, otherwise
> we will be better off to use sorted index entries. However, it does
> appear as if keeping entries in sorted order would be a win on
> concurrency from reduced block contention on the first few blocks of
> the index key, so it may also be a win in cases where there are heavy
> concurrent inserts but no deletes.


I think so too. I also expect a big reduction in the number of FPIs in
the event of many duplicates.

> I hope we can see a patch that just adds the sorting-by-TID property
> so we can evaluate that aspect before we try to add other more
> advanced index ideas.

I can certainly see why that's desirable. Unfortunately, it isn't that simple.

If I want to sort on heap TID as a tie-breaker, I cannot cut any
corners. That is, it has to be just another column, at least as far as
the implementation is concerned (heap TID will have a different
representation in internal pages and leaf high keys, but nonetheless
it's essentially just another column in the keyspace). This means that
if I don't have suffix truncation, I'll regress performance in many
important cases that have no compensating benefit (e.g. pgbench).
There is no point in trying to assess that.
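
To make the "just another column" point concrete, here is a minimal
sketch in Python rather than the actual nbtree C code. The names Entry,
HeapTid, full_key, insert_sorted and find_exact are hypothetical and
exist only for illustration. The assumption is simply that the
comparison key is the user attributes followed by the heap TID:

```python
import bisect
from typing import List, NamedTuple, Tuple


class HeapTid(NamedTuple):
    block: int
    offset: int


class Entry(NamedTuple):
    key: Tuple[int, ...]   # user-visible index attributes
    tid: HeapTid           # heap TID, acting as the last key column


def full_key(e: Entry):
    # The heap TID takes part in comparisons like any other attribute.
    return (e.key, e.tid)


def insert_sorted(entries: List[Entry], new: Entry) -> int:
    # Every entry has exactly one legal position, duplicates included.
    pos = bisect.bisect_left([full_key(e) for e in entries], full_key(new))
    entries.insert(pos, new)
    return pos


def find_exact(entries: List[Entry], target: Entry) -> int:
    # A specific duplicate can be located by binary search alone.
    pos = bisect.bisect_left([full_key(e) for e in entries], full_key(target))
    if pos < len(entries) and entries[pos] == target:
        return pos
    return -1


entries: List[Entry] = []
for block, offset in [(7, 1), (2, 5), (2, 3), (9, 1)]:
    insert_sorted(entries, Entry((42,), HeapTid(block, offset)))
```

With all four entries sharing the user key 42, the list ends up ordered
by heap TID, and find_exact can locate one particular duplicate without
scanning its neighbours.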

It is true that I could opt to only "logically truncate" the heap TID
attribute during a leaf page split. That is, I could avoid adding a
heap TID to the new high key, without ever physically truncating user
attributes. But doing only that much saves very little code, since the
logic for deciding whether it's safe to omit the heap TID attribute
still has to involve an insertion scankey. It seems much more natural
to do everything at once. Again, the heap TID attribute is more or less
just another attribute. Also, the code for doing physical suffix
truncation already exists from the covering/INCLUDE index commit.
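
A rough sketch of why the two kinds of truncation share one decision
follows. It is Python for illustration only, not the patch. The
function names attrs_to_keep and make_separator are hypothetical, and
the sketch assumes both tuples have the same number of user attributes.
The same comparison of the last tuple on the left with the first tuple
on the right decides both how many user attributes the new high key
needs and whether a heap TID is needed at all:

```python
from typing import Optional, Tuple

Tid = Tuple[int, int]          # (block, offset)
LeafTuple = Tuple[tuple, Tid]  # (user attributes, heap TID)


def attrs_to_keep(lastleft_key: tuple, firstright_key: tuple) -> int:
    """Leading user attributes needed to tell the two tuples apart.

    Returns len(key) + 1 when every user attribute is equal, meaning
    the heap TID is needed as a tie-breaker.
    """
    for i, (a, b) in enumerate(zip(lastleft_key, firstright_key)):
        if a != b:
            return i + 1
    return len(firstright_key) + 1


def make_separator(lastleft: LeafTuple,
                   firstright: LeafTuple) -> Tuple[tuple, Optional[Tid]]:
    (lkey, ltid), (rkey, _rtid) = lastleft, firstright
    n = attrs_to_keep(lkey, rkey)
    if n <= len(rkey):
        # User attributes alone separate the pages: drop the trailing
        # attributes and leave the heap TID out entirely.
        return rkey[:n], None
    # All user attributes are equal: keep them all and use the left
    # tuple's heap TID as the upper bound for the left page.
    return rkey, ltid
```

For example, splitting between (1, 'a') and (2, 'a') yields the
separator (2,) with no heap TID, while splitting between two tuples
that are both (5, 'x') has to keep every attribute plus a heap TID.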

I've been working for weeks now on improving the logic for picking a
page split point in light of suffix truncation. I had
something that did quite well with the initial index sizes for TPC-C
and TPC-H, but then realized I'd totally regressed the motivating
example with many duplicates that I started this thread with. I now
have something that does both things well, which I'm trying to
simplify. Another thing to bear in mind is that page split logic for
suffix truncation also helps space utilization on the leaf level. I
can get the main TPC-C order_line pkey about 7% smaller with true
suffix truncation, even though the internal page index tuples can
never be any smaller due to alignment, and even though there are no
duplicates that would otherwise make the implementation "get tired".
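
To illustrate how the choice of split point interacts with truncation,
here is one plausible heuristic sketched in Python. It is an assumption
made for illustration, not the logic in the patch. The names
distinguishing_attrs and choose_split and the window parameter are
hypothetical. The idea is to look at a few candidate split points
around the middle of the page and prefer the one whose separator needs
the fewest attributes:

```python
from typing import List


def distinguishing_attrs(left: tuple, right: tuple) -> int:
    # Leading attributes needed to tell left and right apart.
    for i, (a, b) in enumerate(zip(left, right)):
        if a != b:
            return i + 1
    return len(right) + 1  # equal keys: a heap TID would be needed too


def choose_split(keys: List[tuple], window: int = 2) -> int:
    """Return i such that keys[:i] go left and keys[i:] go right.

    Assumes keys is sorted and holds at least two entries.
    """
    mid = len(keys) // 2
    lo = max(1, mid - window)
    hi = min(len(keys) - 1, mid + window)
    # Prefer the shortest separator; break ties by staying near the middle.
    return min(
        range(lo, hi + 1),
        key=lambda i: (distinguishing_attrs(keys[i - 1], keys[i]), abs(i - mid)),
    )


page = [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2)]
split_at = choose_split(page)
```

On this page a strict 50/50 split would land between (1, 3) and (1, 4)
and need both attributes in the separator. Moving one position to the
right splits between (1, 4) and (2, 1), where a single attribute is
enough.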

Can I really fix space utilization in a piecemeal fashion? I strongly doubt it.

-- 
Peter Geoghegan
