On Sat, Nov 30, 2019 at 9:25 PM Michael Paquier <mich...@paquier.xyz> wrote:
> On Thu, Feb 07, 2019 at 07:35:31AM +0530, Andres Freund wrote:
> > It was JUST added ... :) thought I saw you reply on the other thread
> > about it, but I was wrong...
>
> Six months later without any activity, I am marking this entry as
> returned with feedback. The latest patch set does not apply anymore,
> so having a rebase would be nice if submitted again.

Sounds fair, thanks. Actually, we've rewritten large amounts of this,
but unfortunately not to the point where it's ready to post yet. If
anyone wants to see the development in progress, see
https://github.com/EnterpriseDB/zheap/commits/undo-record-set

This is not really an EnterpriseDB project any more because Andres and
Thomas decided to leave EnterpriseDB, but both expressed an intention to
continue working on the project. So hopefully we'll get there. That
being said, here's what the three of us are working towards:

- Undo locations are identified by a 64-bit UndoRecPtr, which is very
similar to a WAL LSN. However, each undo log (1TB of address space) has
its own insertion point, so that many backends can insert
simultaneously without contending on the insertion point. The code for
this is by Thomas and is mostly the same as before.

- To insert undo data, you create an UndoRecordSet, which has a record
set header followed by any number of records. In the common case, an
UndoRecordSet corresponds to the intersection of a transaction and a
persistence level - that is, XID 12345 could have up to 3
UndoRecordSets, one for permanent undo, one for unlogged undo, and one
for temporary undo. We might in the future have support for other kinds
of UndoRecordSets, e.g. for multixact-like things that are associated
with a group of transactions rather than just one. This code is new, by
Thomas with contributions from me.

- The records that get stored into an UndoRecordSet will be serialized
from an in-memory representation and then deserialized when the data is
read later. Andres is writing the code for this, but hasn't pushed it
to the branch yet. The idea here is to allow a lot of flexibility about
what gets stored, responding to criticisms of the earlier design from
Heikki, while still being efficient about what we actually write on
disk, since we know from testing that undo volume is a significant
performance concern.

- Each transaction that writes permanent or unlogged undo gets an
UndoRequest, which tracks the fact that there is work to do if the
transaction aborts. Undo can be applied either in the foreground right
after abort or in the background. The latter case is necessary because
crashes or FATAL errors can abort transactions, but the former case is
important as a way of keeping the undo work from ballooning out of
control in a workload where people just abort transactions nonstop; we
have to slow things down so that we can keep up. This code is by me,
based on a design sketch from Andres.

Getting all of this working has been harder and slower than I'd hoped,
but I think the new design fixes a lot of things that weren't right in
earlier iterations, so I feel like we are at least headed in the right
direction.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
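
To make the UndoRecPtr description in the first bullet point concrete,
here is a minimal sketch of how a 64-bit undo location could be split
into an undo log number and an offset within that log. The 40-bit offset
follows from the stated 1TB of address space per undo log (2^40 bytes).
Everything else is an assumption for illustration: the function and
macro names are hypothetical, and using the remaining 24 high bits for
the log number is a guess at the layout, not something taken from the
branch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a 64-bit undo location, analogous to a WAL LSN. */
typedef uint64_t UndoRecPtr;

/* 1TB of address space per undo log is 2^40 bytes, so 40 offset bits. */
#define UNDO_LOG_OFFSET_BITS 40
#define UNDO_LOG_MAX_SIZE    (UINT64_C(1) << UNDO_LOG_OFFSET_BITS)
#define UNDO_LOG_OFFSET_MASK (UNDO_LOG_MAX_SIZE - 1)

/* Assumption: the remaining 24 high bits identify the undo log. */
#define UNDO_LOG_NUMBER_BITS 24

/* Combine an undo log number and an offset into one UndoRecPtr. */
static inline UndoRecPtr
make_undo_rec_ptr(uint32_t logno, uint64_t offset)
{
    assert(logno < (UINT32_C(1) << UNDO_LOG_NUMBER_BITS));
    assert(offset < UNDO_LOG_MAX_SIZE);
    return ((uint64_t) logno << UNDO_LOG_OFFSET_BITS) | offset;
}

/* Extract the undo log number (the high bits). */
static inline uint32_t
undo_rec_ptr_get_logno(UndoRecPtr ptr)
{
    return (uint32_t) (ptr >> UNDO_LOG_OFFSET_BITS);
}

/* Extract the byte offset within the undo log (the low 40 bits). */
static inline uint64_t
undo_rec_ptr_get_offset(UndoRecPtr ptr)
{
    return ptr & UNDO_LOG_OFFSET_MASK;
}
```

With a layout like this, each undo log can advance its own offset
independently, which is what lets backends attached to different logs
insert without contending on a single shared insertion point.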