Fwd: Re: [cxx-mem-model] compare_exchange implementation

Andrew MacLeod Wed, 19 Oct 2011 10:01:05 -0700

grr, reposting due to thunderbird apparently sneaking some HTML in andit being rejected... sigh.


On 10/19/2011 11:34 AM, Richard Henderson wrote:

!   weak = expand_expr (CALL_EXPR_ARG (exp, 3), NULL_RTX, ptr_mode,
!                     EXPAND_NORMAL);
!
!   if (weak != const0_rtx&&  weak != const1_rtx)
!     {
!       error ("strong/weak parameter must be 0 or 1 for 
%<__atomic_compare_exchange%>");
!       return NULL_RTX;
!     }

Really?  Are you even allowed to do that, like unto the variable memory model 
parameters?

well, they are explicitly 'compare_exchange_strong' and'compare_exchange_weak' calls... so yes, they have 'hardcoded' one orthe other :-)we could alternatively do 2 separate builtins, but I didn't see thepoint. But Im ambivalent.

+       /* Emit the compare_and_swap.  */
         create_output_operand (&ops[0], target, QImode);
         create_output_operand (&ops[1], mem, mode);
!       create_output_operand (&ops[2], current_val, mode);
!       create_convert_operand_to (&ops[3], expected_val, mode, true);
!       create_convert_operand_to (&ops[4], desired, mode, true);
!       create_integer_operand (&ops[5], success);
!       expand_insn (icode, 6, ops);

The mem operand must use create_fixed_operand, as with all of the other atomic 
builtins.
I strongly suggest swapping op 1 and op2, so that all of the "real" outputs 
come first.

oh, right, my error on the output operand...  I will shuffle.

I suggested on IRC that you examine the mode of the boolean output operand, and 
not
hard-code QImode.  Most targets will produce a word-sized boolean output that 
need not
be zero-extended.

I clearly misunderstood what you said then, I thought you said useQImode :-P. So, look at target and use that mode? target may not beset sometimes tho right? what do you use then? QImode?

!       emit_cmp_and_jump_insns (ops[0].value, const0_rtx, NE, const0_rtx,
!                              QImode, 1, true_label);
!
!       /* if not successful copy expected_val into *expected and issue the
!        failure fence.  */
!       emit_move_insn (gen_rtx_MEM (mode, expected), current_val);
!       expand_builtin_mem_thread_fence (failure);
!
!       /* If no success fence is required, we're done. Otherwise jump around 
the
!          code for TRUE and emitthe fence.  */
!       if (true_label != done_label)
!         {
!         emit_jump_insn (gen_jump (done_label));
!
!         emit_label (true_label);
!         if (success != MEMMODEL_RELAXED&&  success != MEMMODEL_RELEASE)
!           expand_builtin_mem_thread_fence (success);
!       }

Really?  We can do better than this.  In particular, by leaving it up to the 
target.
In the x86 case, cmpxchg is a full barrier.  Always.  No need for any followup 
barriers.

This is the tricky part. Actually, it is wrong but for a differentreason. I don't think the CAS needs a model. If I understand myprevious conversation with Lawrence, we need to know what mode thenative CAS is. If its seq-cst, then we don't need to do anything else.If it is relaxed though, then additional fences are required. Worstcase, if the native CAS is relaxed, the fences need to be issued as suchto implement


 compare_exchange (T* ptr, T* old, new, success, failure)

   memory_fence (success)         // release modes
   val = *old;
   res = relaxed_CAS (ptr, val, new)
   if (success)
      {
        memory_fence (success);   // acquire modes
        return true;
      }
   *old = res;
   memory_fence (failure)
   return false;

so leaving it up to the target for the CAS isnt enough.. we may need toissue one or more of these fences if the native CAS mode is weaker thanthe success and/or failure mode.


Or perhaps I misunderstood... Lawrence?

if instead it maps to

   val = *old;
   memory_fence (success)         // release modes
   res = relaxed_CAS (ptr, val, new)
   if (success_of_CAS)
      {
        memory_fence (success);   // acquire modes
        return true;
      }
   memory_fence (failure)
   *old = res;
   return false;

then the CAS could take care of issuing all the required fences withouta problem. I'd be very happy with that :-)

The ll/sc targets are going to generate code that looks like

        start-barrier
  restart:
        ll      curval, mem
        mov     t, 0
        cmp     curval, oldval
        bne     fail
        mov     t, newval
        sc      t, mem
        beq     t, restart      // strong-version only
  fail:
        end-barrier

   // outputs: t = 1/0 for success/failure
   //          curval = the current value of the memory, to be stored back into 
expected

where the position of fail: before or after the end-barrier depends on the 
relative
strength of the memory models.  No one is going to want to insert an extra jump
around a barrier, especially for a not-expected failure path.

We should encode the two memory model parameters and the weak parameter into a 
single
CONST_INT operand.  Probably with some macros in optabs.h to extract those 
parameters
in the targets.

So if the CAS can handle it all, why does it matter if the pattern has asingle "compressed" parameter for the 3 values, or whether it simplyexplicitly lists all three?

So you think its better not to have a seperate CAS_weak and CAS_strongpattern, but to unite them with a parameter? thats certainly easyenough...


Andrew

Fwd: Re: [cxx-mem-model] compare_exchange implementation

Reply via email to