Paul McCullagh <paul.mccull...@primebase.org> writes: >> In particular this, flushing the data log (is this flush to disk?):
>> if (!thread->st_dlog_buf.dlb_flush_log(TRUE, thread)) { >> ok = FALSE; >> status = XT_LOG_ENT_ABORT; >> } > > Yes, this is a flush to disk. > > This could be done in the slow part (obviously this would be ideal). It occured to me that since we only do this (the new commit_ordered() API call) after having successfully run prepare(), the data log will already have been flushed to disk, right? So I suppose in this case, the data log flush will be a no-operation. In which case it is no problem to leave it in the "fast" part, or we could skip calling it. > But there is the following problem that should then be fixed. > > If we write the transaction log (i.e. commit the transaction), even if > we do not flush the transaction log. It may be flushed by some other > thread later. This will make the commit durable (in other words, on > recovery, this transaction will be rolled forward). > > If we do not flush the data log, then there is a chance that such a > commit transaction is incomplete, because the associated data log data > has not been committed. > > The way to fix this problem is to check the extend of flushing of both > the data and the transaction log on recovery. Simply put, on recover > we check if the data log part of each record is completely flushed (is > within the flush zone of the data log). > > If a data log record is missing, then recovery stops at that point in > the transaction log. Yes, I see, thanks for the explanation. > This will have to be built into the engine. And, it is easiest to do > this in PBXT 1.5 which handle transaction logs and data logs > identically. Ok. Well, maybe it's not necessary as per above observation. >> and this, at the end concerning the "sweeper": >> >> if (db->db_sw_faster) >> xt_wakeup_sweeper(db); > > Yes, this could be taken out of the fast part, although it is not > called all that often. Ok, I will omit it. >> Also, this statement definitely needs to be postponed to the "slow" >> part I >> guess: >> >> thread->st_xact_data = NULL; > > Actually, I don't think so. As far as PBXT is concerned, after the > fast part has run, the transaction is committed. It is just not > durable. > > This means that anything we do in the slow part should not need an > explicit reference to the transaction. Right, I see what you mean, I will keep it. > The flush log position is always increasing. Critical is when we > switch logs, e.g. from log_id=100, log_offset=80000, to log_id=101, > log_offset=0. > > I believe when this is done, the log_offset is first set to zero, then > the log_id is incremented (should check this). > > This would mean that the comparing function would err on the side of > flushing unnecessarily if the check comes between the to operations. Yes, that should work. However, you need a write memory barrier when you update the position, and a read memory barrier when you read it: xl_flush_log_offset = 0; wmb(); xl_flush_log_id = old_id + 1; ... local_id = xl_flush_log_id; rmb(); local_offset = xl_flush_log_offset; Without this, the CPU may do the reads or writes in the opposite order (or more likely, the newest optimisations in GCC will do it). > I would actually recommend a "lazy" approach to the implementation. > > Simply add a boolean to the current commit, which indicates a fast > commit should be done. > > Then we add a new "slow commit" function which does the parts not done > by the slow commit. Ok, thanks a lot for the advice, I will give it another shot. - Kristian. _______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp