Re: Deleting older versions in unique indexes to avoid page splits

Peter Geoghegan Thu, 22 Oct 2020 10:43:12 -0700

On Thu, Oct 22, 2020 at 10:12 AM Simon Riggs <si...@2ndquadrant.com> wrote:
> > 18,988.762398 TPS for the patch
> > 11,123.551707 TPS for the master branch.
>
> Very good.

I'm happy with this result, but as I said it's not really the point. I
can probably get up to a 5x or more improvement in TPS if I simply add
enough indexes.

The point is that we're preventing pathological behavior. The patch
does not so much add something helpful as subtract something harmful.
You can contrive a case that has as much of that harmful element as
you like.

> The average latency is x2. What is the distribution of latencies?
> Occasional very long or all uniformly x2?

The latency is generally very even with the patch. There is a constant
hum of cleanup by the new mechanism in the case of the benchmark
workload. As opposed to a cascade of page splits, which occur in
clearly distinct correlated waves.

> I would guess that holding the page locks will also slow down SELECT
> workload, so I think you should also report that workload as well.
>
> Hopefully that will be better in the latest version.

But the same benchmark that you're asking about here has two SELECT
statements and only one UPDATE. It already is read-heavy in that
sense. And we see that the latency is also significantly improved for
the SELECT queries.

Even if there were often a performance hit rather than a benefit (which
is definitely not what we see), it would still probably be worth it.
Users create indexes for a reason. I believe that we are obligated to
maintain the indexes to a reasonable degree, and not just when it
happens to be convenient to do so in passing.

> I wonder whether we can put this work into a background process rather
> than pay the cost in the foreground? Perhaps that might not need us to
> hold page locks??

Holding a lock on the leaf page is unavoidable.

This patch is very effective because it intervenes only at precisely
the right moment, and only in precisely the right place. We don't
really have to understand anything about workload characteristics to
be sure of this, because it's all based on the enormous asymmetries
I've described, which are so huge that it just seems impossible that
anything else could matter. Trying to do any of this work in a
background process works against this local-first, bottom-up,
laissez-faire strategy. The strength of the design is in how clever it
isn't.
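
To make the "right moment, right place" idea concrete, here is a toy
model in Python. It is not the patch's nbtree code. A leaf page only
tries to clean itself up when an incoming entry would otherwise force a
split, and it only looks at its own entries. Everything here is
hypothetical: the names LeafPage, try_local_cleanup and insert, and the
`dead` flag, which stands in for the real check against the heap that
decides whether an old version is still visible to anyone.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IndexTuple:
    key: int
    # Hypothetical flag: True once no snapshot can still see the heap
    # version this entry points to (the real check consults the heap).
    dead: bool = False


@dataclass
class LeafPage:
    capacity: int
    tuples: List[IndexTuple] = field(default_factory=list)

    def is_full(self) -> bool:
        return len(self.tuples) >= self.capacity


def try_local_cleanup(page: LeafPage) -> int:
    """Remove dead entries from this one page only; return how many went."""
    before = len(page.tuples)
    page.tuples = [t for t in page.tuples if not t.dead]
    return before - len(page.tuples)


def insert(page: LeafPage, tup: IndexTuple) -> str:
    """Insert into a leaf page, cleaning up only when a split is imminent.

    Returns "inserted", "cleaned", or "split" (the caller would then split).
    The caller is assumed to already hold an exclusive lock on this page.
    """
    if not page.is_full():
        page.tuples.append(tup)
        return "inserted"
    # Page is full: this is exactly the point where a split would happen,
    # so this is the only point where cleanup is attempted.
    if try_local_cleanup(page) > 0:
        page.tuples.append(tup)  # at least one slot was freed
        return "cleaned"
    return "split"


# Usage: repeated updates of the same logical row (key 1) fill the page.
demo = LeafPage(capacity=4)
for _ in range(4):
    insert(demo, IndexTuple(key=1))
for old in demo.tuples[:3]:
    old.dead = True  # older versions are no longer visible to anyone
print(insert(demo, IndexTuple(key=1)))  # cleaned
print(len(demo.tuples))  # 2
```

No background process is involved: the work happens in the foreground,
on the one page that needs it, and only when not doing it would cost a
page split.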

-- 
Peter Geoghegan
