> > Torn pages (partial page write) are still a problem.
I revised the idea to use MINIMAL XLOG (instead of WITHOUT XLOG), as described below. I think this way we can always guarantee correctness and still improve it over time.

To Use It
---------

A "BEGIN TRANSACTION MINIMAL XLOG/END" block is a special "BEGIN/END" transaction block. It tries to avoid unnecessary xlog records but still preserves transaction semantics. It is meant for situations where the user wants to do a big data load. It is issued like this:

1. BEGIN TRANSACTION MINIMAL XLOG
2. ... /* statements */
3. END;

From the user's point of view, it is almost the same as an ordinary transaction: if everything from step 1 to step 3 runs smoothly, the transaction is made durable. If any step fails (including ABORT, a transaction error, or a system crash), it looks as if nothing happened. To keep things simple, no subtransactions are allowed.

To Implement It
---------------

At step 1, we disallow some operations, including VACUUM and PITR.

At step 2, only minimal xlog entries are logged. If anything inside fails, handle it like an ordinary transaction.

At step 3, we issue a checkpoint, then mark the transaction committed. If step 3 itself fails, handle it like an ordinary transaction.

The correctness argument is easy: if we let "minimal xlog" equal "all xlog", then the block is exactly an ordinary transaction plus a checkpoint inside the transaction block. Based on this argument, we can implement it in the following steps:

1. First build the framework without changing any XLogInsert() call, so the implementation is correct from the start;
2. Then examine each XLogInsert() call and differentiate what is logged depending on whether MINIMAL XLOG is set.

The importance of these steps is that there is no need to decide up front exactly what the MINIMAL XLOG content is; we can work it out gradually, in a non-invasive way.

Minimal Xlog
------------

The xlog of a failed transaction is not totally useless, since a later transaction may rely on something it created, for example a new btree page and its links.
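The implementation plan under "To Implement It" amounts to a filter in front of XLogInsert() that starts out keeping every record, plus a commit path that checkpoints before committing. A minimal C sketch of that framework follows, assuming a simplified in-memory model: XactState, minimal_xlog_keep_record(), xlog_insert(), begin_minimal_xlog() and end_minimal_xlog() are hypothetical names, not PostgreSQL functions, and the WAL and the checkpoint are only simulated.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical resource-manager ids, named after those in the post. */
typedef enum { RMGR_HEAP, RMGR_BTREE, RMGR_XACT } RmgrId;

/* Commit-path events, recorded so their ordering can be checked. */
typedef enum { EV_CHECKPOINT, EV_COMMIT, EV_ABORT } XactEvent;

/* Hypothetical per-backend state for one MINIMAL XLOG block. */
typedef struct {
    bool minimal_xlog;    /* inside BEGIN TRANSACTION MINIMAL XLOG */
    bool vacuum_allowed;  /* step 1 disallows VACUUM ... */
    bool pitr_allowed;    /* ... and PITR */
    int records_logged;   /* records that reached the (simulated) WAL */
    XactEvent events[4];
    int nevents;
} XactState;

static void add_event(XactState *s, XactEvent ev) {
    assert(s->nevents < 4);
    s->events[s->nevents++] = ev;
}

/* Plan step 1: keep every record, so the block is exactly an ordinary
 * transaction plus a checkpoint. Plan step 2 would refine this per
 * resource manager, one XLogInsert() call site at a time. */
static bool minimal_xlog_keep_record(RmgrId rmid) {
    (void) rmid;
    return true;
}

/* Stand-in for XLogInsert(): only counts the records that would be written. */
static void xlog_insert(XactState *s, RmgrId rmid) {
    if (s->minimal_xlog && !minimal_xlog_keep_record(rmid))
        return;  /* record skipped under MINIMAL XLOG */
    s->records_logged++;
}

/* BEGIN TRANSACTION MINIMAL XLOG */
static void begin_minimal_xlog(XactState *s) {
    memset(s, 0, sizeof *s);
    s->minimal_xlog = true;
    s->vacuum_allowed = false;
    s->pitr_allowed = false;
}

/* END: checkpoint first, then commit. If the checkpoint fails, the block
 * aborts like an ordinary transaction. checkpoint_ok simulates the outcome. */
static bool end_minimal_xlog(XactState *s, bool checkpoint_ok) {
    if (checkpoint_ok) {
        add_event(s, EV_CHECKPOINT);
        add_event(s, EV_COMMIT);
    } else {
        add_event(s, EV_ABORT);
    }
    s->minimal_xlog = false;
    s->vacuum_allowed = true;
    s->pitr_allowed = true;
    return checkpoint_ok;
}
```

Because minimal_xlog_keep_record() returns true here, the block logs exactly what an ordinary transaction would; step 2 of the plan would change only that function's body, one resource manager at a time.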
We have to keep such records, since later transactions depend on them.

RM_HEAP_ID: The problem for the heap is torn-page prevention. We currently copy the whole page into xlog the first time it is touched after a checkpoint, so we always have this copy to replace a data-file page that might have been torn on write. I haven't come up with a good way to handle this so far, so we keep it. (We could possibly avoid copying a P_NEW page, but that's another story.) So what we can avoid logging includes at least the inserts/updates/deletes on pages that don't need to be copied, which I think will give us about a 50% reduction in xlog volume and contention.

RM_BTREE_ID/RM_HASH_ID/RM_GIST_ID: For indexes, things get more complex. We need the xlog to maintain the index structure (for a btree, the pointers, high keys, etc.), but the content is not necessarily needed. More research is needed here.

RM_XLOG_ID/RM_XACT_ID/RM_SMGR_ID/RM_CLOG_ID/RM_DBASE_ID/RM_TBLSPC_ID/RM_MULTIXACT_ID/RM_SEQ_ID: It is hard to avoid much here, but these are not major contributors to xlog volume.

Regards,
Qingqing
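The RM_HEAP_ID rule above (always log the first touch of a page after a checkpoint with its full-page copy, and under MINIMAL XLOG skip later records for that page) can be sketched in C. This assumes a toy model in which WAL positions are plain integers and a page needs a full-page copy when its LSN is not newer than the last checkpoint's redo pointer; heap_log_decision() and count_wal_records() are hypothetical helpers, not PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;  /* hypothetical: WAL position as a plain integer */

typedef enum {
    LOG_WITH_PAGE_IMAGE,  /* first touch since checkpoint: record + full page */
    LOG_RECORD_ONLY,      /* ordinary transaction, page already copied */
    SKIP_RECORD           /* MINIMAL XLOG, page already copied */
} HeapLogDecision;

/* The page has not been modified since the checkpoint's redo pointer,
 * so its on-disk copy could be torn: keep a full-page image. */
static bool needs_page_image(XLogRecPtr page_lsn, XLogRecPtr redo_ptr) {
    return page_lsn <= redo_ptr;
}

static HeapLogDecision heap_log_decision(XLogRecPtr page_lsn,
                                         XLogRecPtr redo_ptr,
                                         bool minimal_xlog) {
    if (needs_page_image(page_lsn, redo_ptr))
        return LOG_WITH_PAGE_IMAGE;  /* torn-page protection is kept */
    return minimal_xlog ? SKIP_RECORD : LOG_RECORD_ONLY;
}

#define NPAGES 4

/* Replay a sequence of heap modifications (page numbers) and count how
 * many WAL records would be written. */
static int count_wal_records(const int *touches, int n, bool minimal_xlog) {
    XLogRecPtr redo_ptr = 100;            /* last checkpoint's redo pointer */
    XLogRecPtr insert_ptr = 100;          /* next WAL position */
    XLogRecPtr page_lsn[NPAGES] = {0};    /* all pages predate the checkpoint */
    int logged = 0;

    for (int i = 0; i < n; i++) {
        int p = touches[i];
        assert(p >= 0 && p < NPAGES);
        if (heap_log_decision(page_lsn[p], redo_ptr, minimal_xlog) == SKIP_RECORD)
            continue;                     /* change reaches disk at END's checkpoint */
        insert_ptr++;
        page_lsn[p] = insert_ptr;         /* page now carries the record's LSN */
        logged++;
    }
    return logged;
}
```

With six modifications spread over three pages ({0, 0, 0, 1, 1, 2}), an ordinary transaction writes six records while the MINIMAL XLOG variant writes three, one full-page record per page. This toy count only shows which records disappear; it is not a measurement of the xlog reduction estimated above.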