On Mon, 2012-02-20 at 10:36 -0800, Richard Henderson wrote:
> On 02/20/12 08:54, Torvald Riegel wrote:
> > The following patches don't depend on each other but are all for the
> > gl_wt TM method.
> > 
> > patch1:
> > For memory transfers from source regions with the read-for-write
> > modifier, undo logging of these regions was missing. Also optimize the
> > number of accesses to gtm_thr().
> 
> Out of curiosity, does an optional argument as in
> 
>   static void pre_write(const void *addr, size_t len, gtm_thread *tx =
> gtm_thr())
> 
> work just as well as
> 
> +  static void pre_write(const void *addr, size_t len)
> +  {
> +    gtm_thread *tx = gtm_thr();
> +    pre_write(addr, len, tx);
> +  }
> 
> 
> +      if (dst_mod == WaW || dst_mod == NONTXNAL)
> +        tx = gtm_thr();
> 
> This sort of micro-optimization can't be good. If we are, for some reason,
> not unifying the two loads, it seems to me that it'd be better to simply
> unconditionally load the thread data here:
> 
> 
> +  gtm_thread *tx = 0;
> 
> The actual bug fix, calling pre_write for RfW, is of course ok.
> 
> > patch2:
> > Similar to the ml_wt code, we don't need to use acq_rel but just acquire
> > memory order when acquiring the orec. The following release fence is
> > still necessary.
> 
> Ok.
> 
> > 
> > patch3:
> > Recently, I changed how gtm_rwlock handles upgrades to serial mode.
> > This change was to fix privatization safety and basically consists of
> > not modifying gtm_thread::shared_state until trycommit() and/or
> > rollback() have finished after the upgrade (and before the switch to
> > serialirr dispatch). With this in place, we can avoid the previous
> > workarounds for the commit-in-serial-mode corner case.
> 
> Ok.
> 
> > 
> > patch4:
> > Privatization safety in gl_wt rollback was previously enforced with
> > something that probably worked but for which the precise C++11 mem model
> > reasoning was unclear. The patch fixes that (so it's clear why it works
> > on the basis of the language's memory model, not on the HW models), and
> > it less costly than the previously used seq_cst barrier.
> 
> Ok.
> > 
> > patch5:
> > Just put gl_wt's global orec on a separate cacheline, same as we do in
> > ml_wt.
> 
> +  atomic<gtm_word> orec __attribute__((aligned(HW_CACHELINE_SIZE)));
> +  char tailpadding[HW_CACHELINE_SIZE - sizeof(atomic<gtm_word>)];
> 
> The "tailpadding" structure should not be required. Since an element of
> the structure is aligned, the whole class will be aligned. And the
> compiler automatically pads aligned structures so that arrays of them
> work, and satisfy the alignment.
> 
> Or is the orec overlapping with the vtable or something?
It does overlap with the vtable pointer without the tail padding, but the virtual functions are currently called very infrequently, so this should not matter. I removed the tailpadding and added a comment instead. Committed with your changes applied.
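
For readers following along, here is a minimal standalone sketch of the layout point Richard makes above. The struct name orec_holder, the gtm_word typedef, and the HW_CACHELINE_SIZE value are assumptions for illustration, not the committed libitm code:

  // Sketch only: aligning one member raises the alignment of the whole
  // type, and the compiler pads the type to a multiple of that alignment,
  // so no explicit tailpadding array is needed to keep the member alone
  // on its cacheline relative to adjacent objects or array elements.
  #include <atomic>
  #include <cstdio>

  #define HW_CACHELINE_SIZE 64   // assumed value for a typical x86-64 target
  typedef unsigned long gtm_word;

  struct orec_holder                // hypothetical holder, not gl_mg itself
  {
    // The global orec; the member alignment propagates to the type.
    std::atomic<gtm_word> orec __attribute__((aligned(HW_CACHELINE_SIZE)));
  };

  int main()
  {
    // With GCC on a 64-byte-cacheline target, both print 64: the struct is
    // cacheline-aligned and padded out to the cacheline size.
    std::printf("alignof = %zu, sizeof = %zu\n",
                alignof(orec_holder), sizeof(orec_holder));
    return 0;
  }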
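On the patch1 question about an optional argument, a small self-contained sketch of the default-argument form may help; gtm_thread and gtm_thr() here are hypothetical stand-ins, not the real libitm definitions. The default expression is evaluated at each call that omits the argument, so it behaves like the explicit two-parameter forwarding overload shown in the patch:

  #include <cstddef>

  struct gtm_thread { int dummy; };

  static gtm_thread *gtm_thr()
  {
    static gtm_thread thr;
    return &thr;                  // stand-in for the real thread-local lookup
  }

  // Default-argument form: gtm_thr() runs at each call site that omits tx.
  static void pre_write(const void *addr, std::size_t len,
                        gtm_thread *tx = gtm_thr())
  {
    // ... undo-log [addr, addr + len) using tx ...
    (void) addr; (void) len; (void) tx;
  }

  int main()
  {
    char buf[8];
    pre_write(buf, sizeof buf);             // default argument: looks up tx here
    pre_write(buf, sizeof buf, gtm_thr());  // caller already has the pointer
    return 0;
  }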
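And for the patch2 description, a rough sketch of acquiring the orec with acquire memory order followed by a release fence; the names, signature, and the comment's rationale are assumptions for illustration, not the gl_wt code itself:

  #include <atomic>

  typedef unsigned long gtm_word;
  static std::atomic<gtm_word> orec{0};

  static bool try_acquire_orec(gtm_word self)
  {
    gtm_word expected = 0;
    // Acquire suffices for the CAS itself; acq_rel is not needed here.
    if (!orec.compare_exchange_strong(expected, self,
                                      std::memory_order_acquire,
                                      std::memory_order_relaxed))
      return false;
    // The release fence remains necessary: roughly, it keeps the locked
    // orec value from becoming visible later than the transactional writes
    // that follow, so a concurrent reader that sees one of those writes
    // also sees the orec as owned when it validates.
    std::atomic_thread_fence(std::memory_order_release);
    return true;
  }

  int main()
  {
    return try_acquire_orec(1) ? 0 : 1;
  }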