On Fri, Jul 19, 2019 at 12:10 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> We are doing exactly what you have written in the last line of the
> next paragraph "stop the transaction from writing undo when the hash
> table is already too full.". So we will never face the problems
> related to repeated crash recovery. The definition of too full is
> that we stop allowing new transactions that can write undo when the
> hash table already has entries equivalent to
> (UndoRollbackHashTableSize() - MaxBackends). Does this make sense?
Oops, I was looking in the wrong place. Yes, that makes sense, but:

1. It looks like the test you added to PrepareUndoInsert has no
locking, and I don't see how that can be right.

2. It seems like this would result in testing for each new undo
insertion that gets prepared, whereas surely we would want to test
only when first attaching to an undo log. If you've already attached
to the undo log, there's no reason not to continue inserting into it,
because doing so doesn't increase the number of transactions (or
transaction-persistence-level combinations) that need undo.

3. I don't think the test itself is correct. It can fire even when
there's no problem. It is correct (or would be, if it said 2 *
MaxBackends) only if every other backend in the system is already
attached to an undo log (or two). But if they are not, it will block
transactions from being started for no reason. For instance, suppose
max_connections = 100 and there are another 100 slots for background
rollbacks. Now suppose that the system crashes when 101 slots are in
use -- 100 pushed into the background plus 1 that was aborted by the
crash. On recovery, this test will refuse to let any new transaction
start. Actually, it is OK for up to 99 transactions to write undo,
just not 100. Or, given that you have a slot per persistence level,
it's OK to have up to 199 transaction-persistence-level combinations
in flight, just not 200. And that is the difference between the
system being unusable after the crash until a rollback succeeds and
being almost fully usable immediately.
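To make that concrete, the shape of test I have in mind is something
like the sketch below. It is only a sketch: every name except
UndoRollbackHashTableSize() is invented for the example, and the
bookkeeping functions are assumed to exist somewhere.

#include "postgres.h"
#include "storage/lwlock.h"

extern LWLock *RollbackHTLock;          /* assumed: guards the hash table */
extern int  RollbackHTEntryCount(void); /* assumed: entries now in table */
extern int  UndoWritersAttached(void);  /* assumed: writers already attached */
extern int  UndoRollbackHashTableSize(void);

/*
 * Run once, when a transaction first attaches to an undo log -- not
 * on every PrepareUndoInsert.  Exclusive mode so that the check and
 * the caller's subsequent attachment are atomic with respect to
 * other would-be writers.
 */
static bool
UndoAdmitNewWriter(void)
{
    bool        admit;

    LWLockAcquire(RollbackHTLock, LW_EXCLUSIVE);

    /*
     * Reserve two slots (one per persistence level) for each backend
     * actually attached to an undo log, plus two for ourselves,
     * rather than reserving for every backend in the system.  The
     * entries already in the table cover aborted transactions and
     * pending background rollbacks.
     */
    admit = RollbackHTEntryCount() + 2 * UndoWritersAttached() + 2
        <= UndoRollbackHashTableSize();

    LWLockRelease(RollbackHTLock);

    return admit;
}

The details could look different, but the essential points are that
the test runs once per attach, under the lock that protects the
table, and that it refuses a writer only when admitting it could
actually overflow the table.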
> > I don't follow this. If you have a hash table where the key is XID,
> > there is no need to delete and reinsert anything just because you
> > discover that the XID has not only permanent undo but also unlogged
> > undo, or something of that sort.
>
> The size of the total undo to be processed will be changed if we find
> anyone (permanent or unlogged) later. Based on the size, the entry
> location should be changed in the size queue.

OK, true. But that's not a significant cost, either in runtime or
code complexity. I still don't really see any good reason for the
hash table key to be anything other than XID, or really, FXID. I
mean, sure, the data structure manipulations are a little different,
but not in any way that really matters. And it seems to me that there
are some benefits, the biggest of which is that the system becomes
easier for users to understand. We can simply say that there is a
limit on the number of transactions that either (1) are in progress
and have written undo or (2) have aborted and not all of the undo has
been processed. If the key is XID + persistence level, then it's a
limit on the number of transaction-and-persistence-level
combinations, which I feel is not so easy to understand. In most but
not all scenarios, it means that the limit is about double what you
think the limit is, and as the mistake in the current version of the
patch makes clear, even the people writing the code can forget about
that factor of two.

It affects a few other things, too. If you made the key XID and fixed
problems (2) and (3) from above, then you'd have a situation where a
transaction could fail at only one point: either it bombs the first
time it tries to write undo, or it works. As it is, there is a second
failure scenario: you do a bunch of work on permanent (or unlogged)
tables and then try to write to an unlogged (or permanent) table and
it fails because there are not enough slots. Is that the end of the
world? No, certainly not. The situation should be rare. But if we
have to fail transactions, it's best to fail them before they've
started doing any work, because that minimizes the amount of work we
waste by having to retry. Of course, a transaction that fails midway
through when it tries to write at a second persistence level is also
consuming an undo slot in a situation where we're short of undo
slots.

Another thing which Andres pointed out to me off-list is that we
might want to have a function that takes a transaction ID as an
argument and tells you the status of that transaction from the point
of view of the undo machinery: does it have any undo, and if so how
much? As you have it now, such a function would require searching the
whole hash table, because the user won't be able to provide an
UndoRecPtr to go with the XID. If the hash table key were in fact
<XID, undo persistence level> rather than <XID, UndoRecPtr>, then you
could do it with two lookups; if it were XID alone, you could do it
with one lookup. The difference between one lookup and two is not
significant, but having to search the whole hash table is.
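With an FXID-only key, such a function is a single dynahash probe.
Again, only a sketch: RollbackHT and the entry layout are invented
for the example, and locking is elided.

#include "postgres.h"
#include "access/transam.h"
#include "utils/hsearch.h"

typedef struct RollbackHashEntry
{
    FullTransactionId fxid;     /* hash key: the FXID alone */
    Size        undo_size;      /* total unprocessed undo for this xact */
    /* ... whatever else the entry needs ... */
} RollbackHashEntry;

extern HTAB *RollbackHT;        /* assumed: the shared rollback hash table */

/*
 * Report how much unprocessed undo a transaction has, or 0 if none.
 * One hash_search() call.  With an <XID, UndoRecPtr> key there is
 * nothing to look the entry up by, so the whole table must be
 * scanned instead.
 */
static Size
UndoSizeForXact(FullTransactionId fxid)
{
    bool        found;
    RollbackHashEntry *ent;

    ent = (RollbackHashEntry *) hash_search(RollbackHT, &fxid,
                                            HASH_FIND, &found);
    return found ? ent->undo_size : 0;
}

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company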