On Mon, 2012-02-20 at 10:36 -0800, Richard Henderson wrote:
> On 02/20/12 08:54, Torvald Riegel wrote:
> > The following patches don't depend on each other but are all for the
> > gl_wt TM method.
> > 
> > patch1:
> > For memory transfers from source regions with the read-for-write
> > modifier, undo logging of these regions was missing. Also optimize the
> > number of accesses to gtm_thr().
> 
> Out of curiosity, does an optional argument as in
> 
>   static void pre_write(const void *addr, size_t len, gtm_thread *tx =
> gtm_thr())
> 
> work just as well as
> 
> +  static void pre_write(const void *addr, size_t len)
> +  {
> +    gtm_thread *tx = gtm_thr();
> +    pre_write(addr, len, tx);
> +  }
> 
> 
> +      if (dst_mod == WaW || dst_mod == NONTXNAL)
> +        tx = gtm_thr();
> 
> This sort of micro-optimization can't be good. If we are, for some reason,
> not unifying the two loads, it seems to me that it'd be better to simply
> unconditionally load the thread data here:
> 
> 
> +  gtm_thread *tx = 0;
> 
> The actual bug fix, calling pre_write for RfW, is of course ok.
> 
> > patch2:
> > Similar to the ml_wt code, we don't need to use acq_rel but just acquire
> > memory order when acquiring the orec. The following release fence is
> > still necessary.
> 
> Ok.
> 
> > 
> > patch3:
> > Recently, I changed how gtm_rwlock handles upgrades to serial mode.
> > This change was to fix privatization safety and basically consists of
> > not modifying gtm_thread::shared_state until trycommit() and/or
> > rollback() have finished after the upgrade (and before the switch to
> > serialirr dispatch). With this in place, we can avoid the previous
> > workarounds for the commit-in-serial-mode corner case.
> 
> Ok.
> 
> > 
> > patch4:
> > Privatization safety in gl_wt rollback was previously enforced with
> > something that probably worked but for which the precise C++11 mem model
> > reasoning was unclear. The patch fixes that (so it's clear why it works
> > on the basis of the language's memory model, not on the HW models), and
> > it less costly than the previously used seq_cst barrier.
> 
> Ok.
> > 
> > patch5:
> > Just put gl_wt's global orec on a separate cacheline, same as we do in
> > ml_wt.
> 
> +  atomic<gtm_word> orec __attribute__((aligned(HW_CACHELINE_SIZE)));
> +  char tailpadding[HW_CACHELINE_SIZE - sizeof(atomic<gtm_word>)];
> 
> The "tailpadding" structure should not be required. Since an element of
> the structure is aligned, the whole class will be aligned. And the
> compiler automatically pads aligned structures so that arrays of them
> work, and satisfy the alignment.
> 
> Or is the orec overlapping with the vtable or something?
It does overlap with the vtable pointer without the tail padding, but the virtual functions are currently called very infrequently, so this should not matter. I removed the tailpadding and added a comment instead. Committed with your changes applied.
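
For readers following along, here is a minimal standalone sketch of the layout point Richard makes above. The struct name orec_holder, the gtm_word typedef, and the HW_CACHELINE_SIZE value are assumptions for illustration, not the committed libitm code:

  // Sketch only: aligning one member raises the alignment of the whole
  // type, and the compiler pads the type to a multiple of that alignment,
  // so no explicit tailpadding array is needed to keep the member alone
  // on its cacheline relative to adjacent objects or array elements.
  #include <atomic>
  #include <cstdio>

  #define HW_CACHELINE_SIZE 64   // assumed value for a typical x86-64 target
  typedef unsigned long gtm_word;

  struct orec_holder                // hypothetical holder, not gl_mg itself
  {
    // The global orec; the member alignment propagates to the type.
    std::atomic<gtm_word> orec __attribute__((aligned(HW_CACHELINE_SIZE)));
  };

  int main()
  {
    // With GCC on a 64-byte-cacheline target, both print 64: the struct is
    // cacheline-aligned and padded out to the cacheline size.
    std::printf("alignof = %zu, sizeof = %zu\n",
                alignof(orec_holder), sizeof(orec_holder));
    return 0;
  }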
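On the patch1 question about an optional argument, a small self-contained sketch of the default-argument form may help; gtm_thread and gtm_thr() here are hypothetical stand-ins, not the real libitm definitions. The default expression is evaluated at each call that omits the argument, so it behaves like the explicit two-parameter forwarding overload shown in the patch:

  #include <cstddef>

  struct gtm_thread { int dummy; };

  static gtm_thread *gtm_thr()
  {
    static gtm_thread thr;
    return &thr;                  // stand-in for the real thread-local lookup
  }

  // Default-argument form: gtm_thr() runs at each call site that omits tx.
  static void pre_write(const void *addr, std::size_t len,
                        gtm_thread *tx = gtm_thr())
  {
    // ... undo-log [addr, addr + len) using tx ...
    (void) addr; (void) len; (void) tx;
  }

  int main()
  {
    char buf[8];
    pre_write(buf, sizeof buf);             // default argument: looks up tx here
    pre_write(buf, sizeof buf, gtm_thr());  // caller already has the pointer
    return 0;
  }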
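And for the patch2 description, a rough sketch of acquiring the orec with acquire memory order followed by a release fence; the names, signature, and the comment's rationale are assumptions for illustration, not the gl_wt code itself:

  #include <atomic>

  typedef unsigned long gtm_word;
  static std::atomic<gtm_word> orec{0};

  static bool try_acquire_orec(gtm_word self)
  {
    gtm_word expected = 0;
    // Acquire suffices for the CAS itself; acq_rel is not needed here.
    if (!orec.compare_exchange_strong(expected, self,
                                      std::memory_order_acquire,
                                      std::memory_order_relaxed))
      return false;
    // The release fence remains necessary: roughly, it keeps the locked
    // orec value from becoming visible later than the transactional writes
    // that follow, so a concurrent reader that sees one of those writes
    // also sees the orec as owned when it validates.
    std::atomic_thread_fence(std::memory_order_release);
    return true;
  }

  int main()
  {
    return try_acquire_orec(1) ? 0 : 1;
  }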