Re: Undo logs

Robert Haas Mon, 02 Dec 2019 07:10:37 -0800

On Sat, Nov 30, 2019 at 9:25 PM Michael Paquier <mich...@paquier.xyz> wrote:
> On Thu, Feb 07, 2019 at 07:35:31AM +0530, Andres Freund wrote:
> > It was JUST added ... :) thought I saw you reply on the other thread
> > about it, but I was wrong...
>
> Six months later without any activity, I am marking this entry as
> returned with feedback.  The latest patch set does not apply anymore,
> so having a rebase would be nice if submitted again.


Sounds fair, thanks.  Actually, we've rewritten large amounts of this,
but unfortunately not to the point where it's ready to post yet. If
anyone wants to see the development in progress, see
https://github.com/EnterpriseDB/zheap/commits/undo-record-set

This is not really an EnterpriseDB project any more because Andres and
Thomas decided to leave EnterpriseDB, but both expressed an intention
to continue working on the project. So hopefully we'll get there. That
being said, here's what the three of us are working towards:

- Undo locations are identified by a 64-bit UndoRecPtr, which is very
similar to a WAL LSN. However, each undo log (1TB of address space)
has its own insertion point, so that many backends can insert
simultaneously without contending on the insertion point. The code for
this is by Thomas and is mostly the same as before.

- To insert undo data, you create an UndoRecordSet, which has a record
set header followed by any number of records. In the common case, an
UndoRecordSet corresponds to the intersection of a transaction and a
persistence level; that is, XID 12345 could have up to 3
UndoRecordSets, one for permanent undo, one for unlogged undo, and one
for temporary undo. We might in the future have support for other
kinds of UndoRecordSets, e.g. for multixact-like things that are
associated with a group of transactions rather than just one. This
code is new, by Thomas with contributions from me.

- The records that get stored into an UndoRecordSet will be serialized
from an in-memory representation and then deserialized when the data
is read later. Andres is writing the code for this, but hasn't pushed
it to the branch yet. The idea here is to allow a lot of flexibility
about what gets stored, responding to criticisms of the earlier design
from Heikki, while still being efficient about what we actually write
on disk, since we know from testing that undo volume is a significant
performance concern.

- Each transaction that writes permanent or unlogged undo gets an
UndoRequest, which tracks the fact that there is work to do if the
transaction aborts. Undo can be applied either in the foreground right
after abort or in the background. The background path is necessary
because crashes or FATAL errors can abort transactions, leaving no
backend around to apply the undo. The foreground path matters because
it keeps the undo work from ballooning out of control in a workload
where people just abort transactions nonstop: making the aborting
backend do its own undo slows things down so that we can keep up. This code is by me,
based on a design sketch from Andres.
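
To make the first point concrete, here is a minimal C sketch of how a 64-bit UndoRecPtr could be split, assuming the high 24 bits name the undo log and the low 40 bits are a byte offset within it (2^40 bytes = 1TB). All names here (make_undo_rec_ptr, UndoLogSketch, undo_log_reserve) are hypothetical and not taken from the branch, and locking is ignored entirely. The sketch only shows why giving each log its own insertion point means backends attached to different logs never advance the same counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: high 24 bits = undo log number,
 * low 40 bits = byte offset within that log (2^40 bytes = 1TB). */
typedef uint64_t UndoRecPtr;
typedef uint32_t UndoLogNumber;
typedef uint64_t UndoLogOffset;

#define UNDO_LOG_OFFSET_BITS 40
#define UNDO_LOG_NUMBER_BITS (64 - UNDO_LOG_OFFSET_BITS)
#define UNDO_LOG_MAX_SIZE ((UndoLogOffset) 1 << UNDO_LOG_OFFSET_BITS)

static inline UndoRecPtr
make_undo_rec_ptr(UndoLogNumber logno, UndoLogOffset offset)
{
    assert(logno < ((UndoLogNumber) 1 << UNDO_LOG_NUMBER_BITS));
    assert(offset < UNDO_LOG_MAX_SIZE);
    return ((UndoRecPtr) logno << UNDO_LOG_OFFSET_BITS) | offset;
}

static inline UndoLogNumber
undo_rec_ptr_logno(UndoRecPtr ptr)
{
    return (UndoLogNumber) (ptr >> UNDO_LOG_OFFSET_BITS);
}

static inline UndoLogOffset
undo_rec_ptr_offset(UndoRecPtr ptr)
{
    return ptr & (UNDO_LOG_MAX_SIZE - 1);
}

/* Each undo log has its own insertion point; a backend attached to one
 * log advances only that log's counter (locking omitted in this sketch). */
typedef struct UndoLogSketch
{
    UndoLogNumber logno;
    UndoLogOffset insert;       /* next free byte in this log */
} UndoLogSketch;

/* Reserve size bytes in log; on success store the start address in *out. */
static bool
undo_log_reserve(UndoLogSketch *log, size_t size, UndoRecPtr *out)
{
    if (log->insert >= UNDO_LOG_MAX_SIZE ||
        size > UNDO_LOG_MAX_SIZE - log->insert)
        return false;           /* log full: caller would move to a new log */
    *out = make_undo_rec_ptr(log->logno, log->insert);
    log->insert += size;
    return true;
}
```

Like a WAL LSN, such a pointer is a plain integer that can be compared and stored cheaply; unlike a single WAL stream, the log number lets many insertion points coexist in one 64-bit space.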
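
The UndoRecordSet point can be illustrated with a hypothetical in-memory model: each transaction holds at most one record set per persistence level (permanent, unlogged, temporary), created lazily on first use, and each set is a header followed by length-prefixed records. None of these types or functions (XactUndoSketch, get_record_set, urs_append) come from the branch; they exist only to show the "intersection of a transaction and a persistence level" idea.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t TransactionId;

typedef enum UndoPersistence
{
    UNDO_PERMANENT, UNDO_UNLOGGED, UNDO_TEMP, UNDO_PERSISTENCE_LEVELS
} UndoPersistence;

/* Hypothetical record set header, written before the records. */
typedef struct UndoRecordSetHeader
{
    TransactionId xid;
    UndoPersistence persistence;
    uint32_t nrecords;
} UndoRecordSetHeader;

/* In-memory stand-in: header plus a buffer of length-prefixed records. */
typedef struct UndoRecordSetSketch
{
    UndoRecordSetHeader hdr;
    size_t used, capacity;
    unsigned char *data;
} UndoRecordSetSketch;

/* Per-transaction state: up to one record set per persistence level. */
typedef struct XactUndoSketch
{
    TransactionId xid;
    UndoRecordSetSketch *sets[UNDO_PERSISTENCE_LEVELS];   /* lazily created */
} XactUndoSketch;

static UndoRecordSetSketch *
get_record_set(XactUndoSketch *xact, UndoPersistence p)
{
    assert(p < UNDO_PERSISTENCE_LEVELS);
    if (xact->sets[p] == NULL)
    {
        UndoRecordSetSketch *urs = calloc(1, sizeof *urs);
        if (urs == NULL)
            return NULL;
        urs->hdr.xid = xact->xid;
        urs->hdr.persistence = p;
        xact->sets[p] = urs;
    }
    return xact->sets[p];
}

/* Append one record as a 4-byte length followed by its bytes. */
static int
urs_append(UndoRecordSetSketch *urs, const void *rec, uint32_t len)
{
    size_t need = urs->used + sizeof len + len;
    if (need > urs->capacity)
    {
        size_t newcap = urs->capacity ? urs->capacity * 2 : 64;
        while (newcap < need)
            newcap *= 2;
        unsigned char *d = realloc(urs->data, newcap);
        if (d == NULL)
            return -1;
        urs->data = d;
        urs->capacity = newcap;
    }
    memcpy(urs->data + urs->used, &len, sizeof len);
    memcpy(urs->data + urs->used + sizeof len, rec, len);
    urs->used = need;
    urs->hdr.nrecords++;
    return 0;
}

static int
count_record_sets(const XactUndoSketch *xact)
{
    int n = 0;
    for (int p = 0; p < UNDO_PERSISTENCE_LEVELS; p++)
        if (xact->sets[p] != NULL)
            n++;
    return n;
}

static void
xact_undo_free(XactUndoSketch *xact)
{
    for (int p = 0; p < UNDO_PERSISTENCE_LEVELS; p++)
    {
        if (xact->sets[p] != NULL)
        {
            free(xact->sets[p]->data);
            free(xact->sets[p]);
            xact->sets[p] = NULL;
        }
    }
}
```

A transaction that only touches permanent tables ends up with a single record set; one that also writes to a temporary table gets a second, independent set.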
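
As a rough illustration of the serialize/deserialize idea, the sketch below assumes a made-up in-memory record with optional fields and writes a flags byte so that absent fields take no space on disk. This is not Andres's code, which isn't on the branch yet, and the record layout (UnpackedUndoRecord, UREC_HAS_BLOCK, UREC_HAS_PAYLOAD) is invented. It only shows one way that flexibility in the in-memory form and compactness on disk can coexist. Fields are copied in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory undo record with optional parts. */
typedef struct UnpackedUndoRecord
{
    uint8_t type;
    bool has_block;
    uint32_t block;
    uint16_t offset;
    uint16_t payload_len;       /* 0 means no payload */
    const uint8_t *payload;
} UnpackedUndoRecord;

#define UREC_HAS_BLOCK   0x01
#define UREC_HAS_PAYLOAD 0x02

/* Write r into buf; return the number of bytes used. Optional fields
 * that are absent cost nothing beyond their bit in the flags byte. */
static size_t
undo_record_serialize(const UnpackedUndoRecord *r, uint8_t *buf)
{
    uint8_t *p = buf;
    uint8_t flags = 0;

    assert(r->payload_len == 0 || r->payload != NULL);
    if (r->has_block)
        flags |= UREC_HAS_BLOCK;
    if (r->payload_len > 0)
        flags |= UREC_HAS_PAYLOAD;
    *p++ = r->type;
    *p++ = flags;
    if (flags & UREC_HAS_BLOCK)
    {
        memcpy(p, &r->block, sizeof r->block);   p += sizeof r->block;
        memcpy(p, &r->offset, sizeof r->offset); p += sizeof r->offset;
    }
    if (flags & UREC_HAS_PAYLOAD)
    {
        memcpy(p, &r->payload_len, sizeof r->payload_len);
        p += sizeof r->payload_len;
        memcpy(p, r->payload, r->payload_len);
        p += r->payload_len;
    }
    return (size_t) (p - buf);
}

/* Rebuild the in-memory form; payload points into buf (no copy).
 * Returns the number of bytes consumed. */
static size_t
undo_record_deserialize(const uint8_t *buf, UnpackedUndoRecord *r)
{
    const uint8_t *p = buf;
    uint8_t flags;

    memset(r, 0, sizeof *r);
    r->type = *p++;
    flags = *p++;
    if (flags & UREC_HAS_BLOCK)
    {
        r->has_block = true;
        memcpy(&r->block, p, sizeof r->block);   p += sizeof r->block;
        memcpy(&r->offset, p, sizeof r->offset); p += sizeof r->offset;
    }
    if (flags & UREC_HAS_PAYLOAD)
    {
        memcpy(&r->payload_len, p, sizeof r->payload_len);
        p += sizeof r->payload_len;
        r->payload = p;
        p += r->payload_len;
    }
    return (size_t) (p - buf);
}
```

With this layout a record carrying a block reference and a 4-byte payload takes 14 bytes, while a bare record with neither takes 2, which is the kind of saving that matters when undo volume is a performance concern.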
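
The UndoRequest lifecycle can be sketched as a small state machine. Everything below is hypothetical (UndoRequestTable, undo_request_register, undo_request_orphan, undo_background_worker_pass are illustrative names): a request is registered when a transaction first writes permanent or unlogged undo, forgotten at commit, applied by the aborting backend itself on a normal abort, and marked pending when the backend goes away (FATAL or crash) so that a background pass can apply it later. Locking and crash-safe storage of the table are ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

typedef uint32_t TransactionId;

#define MAX_UNDO_REQUESTS 8

typedef enum UndoRequestState
{
    UREQ_FREE,                  /* slot unused */
    UREQ_IN_PROGRESS,           /* owning transaction still running */
    UREQ_PENDING                /* aborted, no backend left to apply undo */
} UndoRequestState;

typedef struct UndoRequestSketch
{
    UndoRequestState state;
    TransactionId xid;
    uint64_t undo_bytes;        /* how much undo would need applying */
} UndoRequestSketch;

typedef struct UndoRequestTable
{
    UndoRequestSketch req[MAX_UNDO_REQUESTS];
    uint64_t applied_bytes;     /* total undo applied, for illustration */
} UndoRequestTable;

/* Called when a transaction first writes permanent or unlogged undo. */
static UndoRequestSketch *
undo_request_register(UndoRequestTable *t, TransactionId xid)
{
    for (int i = 0; i < MAX_UNDO_REQUESTS; i++)
    {
        if (t->req[i].state == UREQ_FREE)
        {
            t->req[i].state = UREQ_IN_PROGRESS;
            t->req[i].xid = xid;
            t->req[i].undo_bytes = 0;
            return &t->req[i];
        }
    }
    return NULL;                /* table full: real code would wait or error */
}

/* Stand-in for actually rolling back the transaction's changes. */
static void
apply_undo(UndoRequestTable *t, UndoRequestSketch *r)
{
    assert(r->state != UREQ_FREE);
    t->applied_bytes += r->undo_bytes;
    r->state = UREQ_FREE;
}

/* Commit: nothing to undo, so just release the slot. */
static void
undo_request_commit(UndoRequestSketch *r)
{
    assert(r->state == UREQ_IN_PROGRESS);
    r->state = UREQ_FREE;
}

/* Normal abort: the aborting backend pays for its own undo, which
 * throttles workloads that abort transactions nonstop. */
static void
undo_request_abort_foreground(UndoRequestTable *t, UndoRequestSketch *r)
{
    assert(r->state == UREQ_IN_PROGRESS);
    apply_undo(t, r);
}

/* FATAL error or crash recovery: no backend will apply this undo. */
static void
undo_request_orphan(UndoRequestSketch *r)
{
    assert(r->state == UREQ_IN_PROGRESS);
    r->state = UREQ_PENDING;
}

/* Background worker: apply all pending requests; return how many. */
static int
undo_background_worker_pass(UndoRequestTable *t)
{
    int n = 0;
    for (int i = 0; i < MAX_UNDO_REQUESTS; i++)
    {
        if (t->req[i].state == UREQ_PENDING)
        {
            apply_undo(t, &t->req[i]);
            n++;
        }
    }
    return n;
}
```

Keeping in-progress and pending as separate states is what lets the background pass skip running transactions while still picking up work that a dead backend left behind.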

Getting all of this working has been harder and slower than I'd hoped,
but I think the new design fixes a lot of things that weren't right in
earlier iterations, so I feel like we are at least headed in the right
direction.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company