On Fri, Jul 19, 2019 at 12:52 PM Robert Haas <robertmh...@gmail.com> wrote: > Right, that's definitely a big part of the concern here, but I don't > really believe that retaining locks is absolutely required, or even > necessarily desirable. For instance, suppose that I create a table, > bulk-load a whole lotta data into it, and then abort. Further support > that by the time we start trying to process the undo in the > background, we can't get the lock. Well, that probably means somebody > is performing DDL on the table.
I believe that the primary reason why certain other database systems retain locks until rollback completes (or release their locks in reverse order, as UNDO processing progresses) is that application code will often repeat exactly the same actions on receiving a transient error, until the action finally completes successfully. Just like with serialization failures, or with manually implemented UPSERT loops that must sometimes retry. This is why UNDO is often (or always) processed synchronously, blocking progress of the client connection as its xact rolls back. Obviously these other systems could easily hand off the work of rolling back the transaction to an asynchronous worker process, and return success to the client that encounters an error (or asks to abort/roll back) almost immediately. I have to imagine that they haven't implemented this straightforward optimization because it makes sense that the cost of rolling back the transaction is primarily borne by the client that actually rolls back. And, as I said, because a lot of application code will immediately retry on failure, which needs to not deadlock with an asynchronous roll back process. > If they just did LOCK TABLE or ALTER > TABLE SET STATISTICS, we are going to need to execute that same undo > once the DDL is complete. However, if the DDL is DROP TABLE, we're > going to find that once we can get the lock, the undo is obsolete, and > we don't need to worry about it any more. Had we made it 100% certain > that the DROP TABLE couldn't go through until the undo was performed, > we could avoid having to worry about the undo having become obsolete > ... but that's hardly a win. We're better off allowing the drop and > then just chucking the undo. I'm sure that there are cases like that. And, I'm pretty sure that at least one of the other database systems that I'm thinking of isn't as naive as I suggest, without being sure of the specifics. The classic approach is to retain the locks, even though that sucks in some cases. That doesn't mean that you have to do it that way, but it's probably a good idea to present your design in a way that compares and contrasts with the classic approach. I'm pretty sure that this is related to the way in which other systems retain coarse-grained locks when bitmap indexes are used, even though that makes them totally unusable with OLTP apps. It seems like it would help users a lot if their bitmap indexes didn't come with that problem, but it's a price that they continue to have to pay. > Point being - there's at least some chance that the operations which > block forward progress also represent progress of another sort. That's good, provided that there isn't observable lock starvation. I don't think that you need to eliminate the theoretical risk of lock starvation. It deserves careful, ongoing consideration, though. It's difficult to codify exactly what I have in mind, but I can give you an informal definition now: It's probably okay if there is the occasional implementation-level deadlock because the user got unlucky once. However, it's not okay for there to be *continual* deadlocks because the user got unlucky just once. Even if the user had *extraordinarily* bad luck that one time. In short, my sense is that it's never okay for the system as a whole to "get stuck" in a deadlock or livelock loop. Actually, it might even be okay if somebody had a test case that exhibits "getting stuck" behavior, provided the test case is very delicate, and looks truly adversarial (i.e. it goes beyond being extraordinarily unlucky). I know that this is all pretty hand-wavy, and I don't expect you to have a definitive response. These are some high level concerns that I have, that may or may not apply to what you're trying to do. -- Peter Geoghegan