On Mon, 2007-02-26 at 23:04 -0500, Tom Lane wrote: > "Simon Riggs" <[EMAIL PROTECTED]> writes: > > COMMIT NOWAIT can co-exist with the normal form of COMMIT and does not > > threaten the consistency or robustness of other COMMIT modes. Read that > > again and think about it, before we go further, please. > > I read that, and thought about it, and don't think I believe it.
I don't mind saying its taken a while to believe it myself. And I very much want to have the concept tested so we all believe it. Doubt=>risk. > The > problem is that there are more interconnections between different > transactions than you're allowing for. In particular you need to > justify that the behavior is safe for non-transactional operations like > btree page splits and pg_clog buffer page writes. The idea that's > particularly bothering me at the moment is that after a system crash, > we might come back up in a state where a NOWAIT transaction appears > committed when its updates didn't all get to disk. "Database corrupt" > is a situation that threatens all your transactions... OK, I can see its possible for the following WAL history to occur: t1 btree split part1 t2 COMMIT; <---------- risk of inconsistent WAL t1 btree split part2 The COMMIT by t2 flushes WAL, so if a crash happens at the point shown we have an incomplete multi-part WAL operation. At this stage, lets assume that no data writes by either t1 or t2 have made it to disk, which is normal and likely. On recovery, we apply all the changes in WAL up to the end of WAL. At the last minute t2 sneaks in and gets committed. t1 was never committed, so we never found out whether it would be a COMMIT NOWAIT or a COMMIT. *Whichever* it will be the btree split is still incomplete and recovery understands this and acts accordingly. So the situation that sounds nasty is actually a normal situation currently, so presents no threat because we already handle this correctly. Multi-part operations seem good to me. > The idea that's > particularly bothering me at the moment is that after a system crash, > we might come back up in a state where a NOWAIT transaction appears > committed when its updates didn't all get to disk. "Database corrupt" > is a situation that threatens all your transactions... Well, I've spent 2 hours thinking this through, gone pale, thought hard and gone down a few blind alleys along the way. The bottom line is that COMMIT NOWAIT isn't fully safe, and thats part of the explicit non-full guarantee, written in big letters on the tin. If a transaction commits and then we crash before we flush WAL, then the transaction will be incomplete. As we define COMMIT now, this is broken and I use that word accurately: If you use COMMIT NOWAIT, you risk data loss *but* you have the choice of which transactions this applies to. However, WAL *is* consistent and there is no danger of database corruption. Simply put, this is not group commit. So if you issue COMMIT NOWAIT and then crash, the transactions that were marked committed need to be marked aborted in clog. So during recovery we will need to keep track of which transactions are in progress, so we can mark them explicitly aborted, rather than the current implicit mechanism. This would need to be done carefully, since a clog page that was created at transaction start may never have made it to disk by the time of the crash. We must extend clog in that case, even if the extension WAL never made it to WAL, either, so that we can record the aborted state of the Xids that exist somewhere on disk. Keeping track of transactions in progress won't take long. It isn't required at all during archive recovery, but that might form the basis for later use as a Snapshot creation mechanism for read-only access during startup. It isn't possible for a normal non-readonly transaction to change data that has been changed by a commit-nowait transaction, and then have the normal transaction commit, yet without also implicitly committing the commit-nowait transaction. Hopefully, I've got that right? > > New commit mode is available by explicit command, or as a default > > setting that will be applied to all COMMITs, or both. > > I dislike introducing new nonstandard syntax ("Oracle compatible" is not > standard). If we did this I'd vote for control via a GUC setting only; > I think that is more useful anyway, as an application can be made to run > with such a setting without invasive source code changes. OK. Having read through all of the above things again, ISTM that we should make this functionality available by a new GUC commit_fsync_delay, which must be set explicitly > 0 before this feature can be used at all. If I confused Tom by using commit_delay, then I'll confuse others also and group commit and deferred fsync are different techniques with different robustness guarantees. When enabled it should have a clear message in the log to show that some commits might be using commit_nowait. I'd even welcome a more descriptive term that summed up the relaxed transaction guarantee implied by the use of the deferred fsync technique. Perhaps even a very explicit USERSET GUC: transaction_guarantee = on (default) | off -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq