grr, reposting due to thunderbird apparently sneaking some HTML in and it being rejected... sigh.

On 10/19/2011 11:34 AM, Richard Henderson wrote:
!   weak = expand_expr (CALL_EXPR_ARG (exp, 3), NULL_RTX, ptr_mode,
!                     EXPAND_NORMAL);
!
!   if (weak != const0_rtx&&  weak != const1_rtx)
!     {
!       error ("strong/weak parameter must be 0 or 1 for 
%<__atomic_compare_exchange%>");
!       return NULL_RTX;
!     }
Really?  Are you even allowed to do that, like unto the variable memory model 
parameters?

well, they are explicitly 'compare_exchange_strong' and 'compare_exchange_weak' calls... so yes, they have 'hardcoded' one or the other :-) we could alternatively do 2 separate builtins, but I didn't see the point. But Im ambivalent.
+       /* Emit the compare_and_swap.  */
         create_output_operand (&ops[0], target, QImode);
         create_output_operand (&ops[1], mem, mode);
!       create_output_operand (&ops[2], current_val, mode);
!       create_convert_operand_to (&ops[3], expected_val, mode, true);
!       create_convert_operand_to (&ops[4], desired, mode, true);
!       create_integer_operand (&ops[5], success);
!       expand_insn (icode, 6, ops);
The mem operand must use create_fixed_operand, as with all of the other atomic 
builtins.
I strongly suggest swapping op 1 and op2, so that all of the "real" outputs 
come first.
oh, right, my error on the output operand...  I will shuffle.

I suggested on IRC that you examine the mode of the boolean output operand, and 
not
hard-code QImode.  Most targets will produce a word-sized boolean output that 
need not
be zero-extended.

I clearly misunderstood what you said then, I thought you said use QImode :-P. So, look at target and use that mode? target may not be set sometimes tho right? what do you use then? QImode?
!       emit_cmp_and_jump_insns (ops[0].value, const0_rtx, NE, const0_rtx,
!                              QImode, 1, true_label);
!
!       /* if not successful copy expected_val into *expected and issue the
!        failure fence.  */
!       emit_move_insn (gen_rtx_MEM (mode, expected), current_val);
!       expand_builtin_mem_thread_fence (failure);
!
!       /* If no success fence is required, we're done. Otherwise jump around 
the
!          code for TRUE and emitthe fence.  */
!       if (true_label != done_label)
!         {
!         emit_jump_insn (gen_jump (done_label));
!
!         emit_label (true_label);
!         if (success != MEMMODEL_RELAXED&&  success != MEMMODEL_RELEASE)
!           expand_builtin_mem_thread_fence (success);
!       }
Really?  We can do better than this.  In particular, by leaving it up to the 
target.
In the x86 case, cmpxchg is a full barrier.  Always.  No need for any followup 
barriers.

This is the tricky part. Actually, it is wrong but for a different reason. I don't think the CAS needs a model. If I understand my previous conversation with Lawrence, we need to know what mode the native CAS is. If its seq-cst, then we don't need to do anything else. If it is relaxed though, then additional fences are required. Worst case, if the native CAS is relaxed, the fences need to be issued as such to implement

 compare_exchange (T* ptr, T* old, new, success, failure)

   memory_fence (success)         // release modes
   val = *old;
   res = relaxed_CAS (ptr, val, new)
   if (success)
      {
        memory_fence (success);   // acquire modes
        return true;
      }
   *old = res;
   memory_fence (failure)
   return false;


so leaving it up to the target for the CAS isnt enough.. we may need to issue one or more of these fences if the native CAS mode is weaker than the success and/or failure mode.

Or perhaps I misunderstood... Lawrence?

if instead it maps to

   val = *old;
   memory_fence (success)         // release modes
   res = relaxed_CAS (ptr, val, new)
   if (success_of_CAS)
      {
        memory_fence (success);   // acquire modes
        return true;
      }
   memory_fence (failure)
   *old = res;
   return false;

then the CAS could take care of issuing all the required fences without a problem. I'd be very happy with that :-)

The ll/sc targets are going to generate code that looks like

        start-barrier
  restart:
        ll      curval, mem
        mov     t, 0
        cmp     curval, oldval
        bne     fail
        mov     t, newval
        sc      t, mem
        beq     t, restart      // strong-version only
  fail:
        end-barrier

   // outputs: t = 1/0 for success/failure
   //          curval = the current value of the memory, to be stored back into 
expected

where the position of fail: before or after the end-barrier depends on the 
relative
strength of the memory models.  No one is going to want to insert an extra jump
around a barrier, especially for a not-expected failure path.

We should encode the two memory model parameters and the weak parameter into a 
single
CONST_INT operand.  Probably with some macros in optabs.h to extract those 
parameters
in the targets.


So if the CAS can handle it all, why does it matter if the pattern has a single "compressed" parameter for the 3 values, or whether it simply explicitly lists all three?

So you think its better not to have a seperate CAS_weak and CAS_strong pattern, but to unite them with a parameter? thats certainly easy enough...

Andrew




Reply via email to