Hi Palmer,

On Tue, Nov 14, 2017 at 12:30:59PM -0800, Palmer Dabbelt wrote:
> On Tue, 24 Oct 2017 07:10:33 PDT (-0700), will.dea...@arm.com wrote:
> >On Tue, Sep 26, 2017 at 06:56:31PM -0700, Palmer Dabbelt wrote:
> >>+ATOMIC_OPS(add, add, +,  i)
> >>+ATOMIC_OPS(sub, add, +, -i)
> >>+ATOMIC_OPS(and, and, &,  i)
> >>+ATOMIC_OPS( or,  or, |,  i)
> >>+ATOMIC_OPS(xor, xor, ^,  i)
> >
> >What is the point in the c_op parameter to these things?
>
> I guess there isn't one, it just lingered from a handful of refactorings.
> It's used in some of the other functions if we need to do a C operation
> after the atomic.  How does this look?
>
> commit 5db229491a205ad0e1aa18287e3b342176c62d30 (HEAD -> staging-mm)
> Author: Palmer Dabbelt <pal...@dabbelt.com>
> Date:   Tue Nov 14 11:35:37 2017 -0800
>
>     RISC-V: Remove c_op from ATOMIC_OP
>
>     This was an unused macro parameter.
>
>     Signed-off-by: Palmer Dabbelt <pal...@dabbelt.com>
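Just so we're talking about the same shape of thing, I'd expect the void ops
to collapse to roughly the sketch below once c_op is gone. This is a
simplified, 32-bit-only rendering of the usual pattern rather than your actual
diff, so the RV64 variants and the exact constraint strings are my guesses:

#define ATOMIC_OP(op, asm_op, I)					\
static __always_inline void atomic_##op(int i, atomic_t *v)		\
{									\
	/* AMO with the result discarded into the zero register. */	\
	__asm__ __volatile__ (						\
		"amo" #asm_op ".w zero, %1, %0"				\
		: "+A" (v->counter)					\
		: "r" (I)						\
		: "memory");						\
}

#define ATOMIC_OPS(op, asm_op, I)					\
	ATOMIC_OP(op, asm_op, I)

ATOMIC_OPS(add, add,  i)
ATOMIC_OPS(sub, add, -i)
ATOMIC_OPS(and, and,  i)
ATOMIC_OPS( or,  or,  i)
ATOMIC_OPS(xor, xor,  i)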
You can do the same thing for FETCH_OP, I think.

> >>+ATOMIC_OPS(add, add, +,  i,      , _relaxed)
> >>+ATOMIC_OPS(add, add, +,  i, .aq  , _acquire)
> >>+ATOMIC_OPS(add, add, +,  i, .rl  , _release)
> >>+ATOMIC_OPS(add, add, +,  i, .aqrl,         )
> >
> >Have you checked that .aqrl is equivalent to "ordered", since there are
> >interpretations where that isn't the case. Specifically:
> >
> >// all variables zero at start of time
> >P0:
> >WRITE_ONCE(x) = 1;
> >atomic_add_return(y, 1);
> >WRITE_ONCE(z) = 1;
> >
> >P1:
> >READ_ONCE(z) // reads 1
> >smp_rmb();
> >READ_ONCE(x) // must not read 0
>
> I haven't.  We don't quite have a formal memory model specification yet.
> I've added Daniel Lustig, who is creating that model.  He should have a
> better idea.

Thanks. You really do need to ensure that, as it's heavily relied upon.

> >>+/* These barriers do not need to enforce ordering on devices, just memory. */
> >>+#define smp_mb()	RISCV_FENCE(rw,rw)
> >>+#define smp_rmb()	RISCV_FENCE(r,r)
> >>+#define smp_wmb()	RISCV_FENCE(w,w)
> >>+
> >>+/*
> >>+ * These fences exist to enforce ordering around the relaxed AMOs.  The
> >>+ * documentation defines that
> >>+ * "
> >>+ * atomic_fetch_add();
> >>+ *   is equivalent to:
> >>+ * smp_mb__before_atomic();
> >>+ * atomic_fetch_add_relaxed();
> >>+ * smp_mb__after_atomic();
> >>+ * "
> >>+ * So we emit full fences on both sides.
> >>+ */
> >>+#define __smb_mb__before_atomic()	smp_mb()
> >>+#define __smb_mb__after_atomic()	smp_mb()
> >
> >Now I'm confused, because you're also spitting out .aqrl for those afaict.
> >Do you really need full barriers *and* .aqrl, or am I misunderstanding
> >something here?
>
> Here's the section of atomic_t.txt that I was reading
>
>     The barriers:
>       smp_mb__{before,after}_atomic()
>     only apply to the RMW ops and can be used to augment/upgrade the ordering
>     inherent to the used atomic op. These barriers provide a full smp_mb().
>     These helper barriers exist because architectures have varying implicit
>     ordering on their SMP atomic primitives. For example our TSO architectures
>     provide full ordered atomics and these barriers are no-ops.
>     Thus:
>       atomic_fetch_add();
>     is equivalent to:
>       smp_mb__before_atomic();
>       atomic_fetch_add_relaxed();
>       smp_mb__after_atomic();
>     However the atomic_fetch_add() might be implemented more efficiently.
>
> so I think what we've got there is correct: the barriers go with
> atomic_fetch_add_relaxed(), not atomic_fetch_add().  asm-generic/barrier.h
> has
>
>     #ifndef __smp_mb__before_atomic
>     #define __smp_mb__before_atomic()	__smp_mb()
>     #endif
>     #ifndef __smp_mb__after_atomic
>     #define __smp_mb__after_atomic()	__smp_mb()
>     #endif
>
> so I think we can just drop these entirely.  How does this look?
>
> commit da682f7ee5d2af4aae7026ef40b5b5fb8e103938
> Author: Palmer Dabbelt <pal...@dabbelt.com>
> Date:   Tue Nov 14 11:49:42 2017 -0800
>
>     RISC-V: Remove __smp_bp__{before,after}_atomic
>
>     These duplicate the asm-generic definitions and therefore aren't useful.
>
>     Signed-off-by: Palmer Dabbelt <pal...@dabbelt.com>

Yes, that looks good to me. I was misunderstanding things, but you're right
that you don't need to do anything special for this.
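To spell out what I think the end state is: the arch provides the four AMO
flavours and nothing else, and the generic helpers do any upgrading. Roughly
the sketch below, which is my reconstruction of the usual pattern (32-bit
only, and writing ATOMIC_FETCH_OP directly rather than your ATOMIC_OPS
wrapper) and not a copy of your code:

#define ATOMIC_FETCH_OP(op, asm_op, I, asm_or, c_or)			\
static __always_inline int atomic_fetch_##op##c_or(int i, atomic_t *v)	\
{									\
	register int ret;						\
	__asm__ __volatile__ (						\
		"amo" #asm_op ".w" #asm_or " %1, %2, %0"		\
		: "+A" (v->counter), "=r" (ret)				\
		: "r" (I)						\
		: "memory");						\
	return ret;							\
}

/* One instantiation per ordering variant, as in the hunk quoted above. */
ATOMIC_FETCH_OP(add, add,  i,      , _relaxed)
ATOMIC_FETCH_OP(add, add,  i, .aq  , _acquire)
ATOMIC_FETCH_OP(add, add,  i, .rl  , _release)
ATOMIC_FETCH_OP(add, add,  i, .aqrl,         )

With that in place, anybody who wants to strengthen the _relaxed forms gets
the asm-generic smp_mb__{before,after}_atomic() fallbacks, which are full
fences on RISC-V, so dropping your local definitions really is all that's
needed.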
> >>+
> >>+/*
> >>+ * These barriers prevent accesses performed outside a spinlock from being moved
> >>+ * inside a spinlock.  Since RISC-V sets the aq/rl bits on our spinlock only
> >>+ * enforce release consistency, we need full fences here.
> >>+ */
> >>+#define smb_mb__before_spinlock()	smp_mb()
> >
> >We killed this macro, so you don't need to define it.
>
> Thanks!
>
> commit 382a1f8b33a04fc0f991e062f70f4c65ca888bca
> Author: Palmer Dabbelt <pal...@dabbelt.com>
> Date:   Tue Nov 14 11:50:37 2017 -0800
>
>     RISC-V: Remove smb_mb__{before,after}_spinlock()
>
>     These are obsolete.
>
>     Signed-off-by: Palmer Dabbelt <pal...@dabbelt.com>
>
> diff --git a/arch/riscv/include/asm/barrier.h b/arch/riscv/include/asm/barrier.h
> index 455ee16127fb..773c4e039cd7 100644
> --- a/arch/riscv/include/asm/barrier.h
> +++ b/arch/riscv/include/asm/barrier.h
> @@ -38,14 +38,6 @@
>  #define smp_rmb()	RISCV_FENCE(r,r)
>  #define smp_wmb()	RISCV_FENCE(w,w)
>  
> -/*
> - * These barriers prevent accesses performed outside a spinlock from being moved
> - * inside a spinlock.  Since RISC-V sets the aq/rl bits on our spinlock only
> - * enforce release consistency, we need full fences here.
> - */
> -#define smb_mb__before_spinlock()	smp_mb()
> -#define smb_mb__after_spinlock()	smp_mb()

You might still need the __after version, particularly if your lock operation
has only acquire semantics; it's only the __before version that has been
removed.

> >>+#define __test_and_op_bit_ord(op, mod, nr, addr, ord)		\
> >>+({								\
> >>+	unsigned long __res, __mask;				\
> >>+	__mask = BIT_MASK(nr);					\
> >>+	__asm__ __volatile__ (					\
> >>+		__AMO(op) #ord " %0, %2, %1"			\
> >>+		: "=r" (__res), "+A" (addr[BIT_WORD(nr)])	\
> >>+		: "r" (mod(__mask))				\
> >>+		: "memory");					\
> >>+	((__res & __mask) != 0);				\
> >>+})
> >
> >This looks broken to me -- the value-returning test bitops need to be fully
> >ordered.
>
> Yep, you're right -- I just mis-read atomic_ops.rst (and re-misread it the
> first time when I went to check again).  I think this should do it
>
> commit 9951b6ed76bffb714517d81d9ffceb0eb1796229
> Author: Palmer Dabbelt <pal...@dabbelt.com>
> Date:   Tue Nov 14 12:06:06 2017 -0800
>
>     RISC-V: __test_and_op_bit_ord should be strongly ordered
>
>     I mis-read the documentation.
>
>     Signed-off-by: Palmer Dabbelt <pal...@dabbelt.com>
>
> diff --git a/arch/riscv/include/asm/bitops.h b/arch/riscv/include/asm/bitops.h
> index 7c281ef1d583..f30daf26f08f 100644
> --- a/arch/riscv/include/asm/bitops.h
> +++ b/arch/riscv/include/asm/bitops.h
> @@ -67,7 +67,7 @@
>  		: "memory");
>  
>  #define __test_and_op_bit(op, mod, nr, addr)			\
> -	__test_and_op_bit_ord(op, mod, nr, addr, )
> +	__test_and_op_bit_ord(op, mod, nr, addr, .aqrl)

Yes, modulo my earlier comment and litmus test.

> >>+#define arch_spin_lock_flags(lock, flags)	arch_spin_lock(lock)
> >>+#define arch_spin_is_locked(x)	((x)->lock != 0)
> >
> >Missing READ_ONCE.
>
> Thanks!
>
> commit 64e80b0bf3898a88de09f4e12090b002b57efede
> Author: Palmer Dabbelt <pal...@dabbelt.com>
> Date:   Tue Nov 14 12:18:49 2017 -0800
>
>     RISC-V: Add READ_ONCE in arch_spin_is_locked()
>
>     Signed-off-by: Palmer Dabbelt <pal...@dabbelt.com>

Also looks good.
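(I assume the fixed macro ends up as something like the line below -- I
haven't gone back and checked your actual commit, so treat it as a guess.)

#define arch_spin_is_locked(x)	(READ_ONCE((x)->lock) != 0)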
> >>+static inline void arch_read_lock(arch_rwlock_t *lock)
> >>+{
> >>+	int tmp;
> >>+
> >>+	__asm__ __volatile__(
> >>+		"1:	lr.w	%1, %0\n"
> >>+		"	bltz	%1, 1b\n"
> >>+		"	addi	%1, %1, 1\n"
> >>+		"	sc.w.aq	%1, %1, %0\n"
> >>+		"	bnez	%1, 1b\n"
> >>+		: "+A" (lock->lock), "=&r" (tmp)
> >>+		:: "memory");
> >>+}
> >>+
> >>+static inline void arch_write_lock(arch_rwlock_t *lock)
> >>+{
> >>+	int tmp;
> >>+
> >>+	__asm__ __volatile__(
> >>+		"1:	lr.w	%1, %0\n"
> >>+		"	bnez	%1, 1b\n"
> >>+		"	li	%1, -1\n"
> >>+		"	sc.w.aq	%1, %1, %0\n"
> >>+		"	bnez	%1, 1b\n"
> >>+		: "+A" (lock->lock), "=&r" (tmp)
> >>+		:: "memory");
> >>+}
> >
> >I think you have the same starvation issues as we have on arm64 here.  I
> >strongly recommend moving over to qrwlock :)

I still strongly recommend this!

> >>+#ifndef _ASM_RISCV_TLBFLUSH_H
> >>+#define _ASM_RISCV_TLBFLUSH_H
> >>+
> >>+#ifdef CONFIG_MMU
> >>+
> >>+/* Flush entire local TLB */
> >>+static inline void local_flush_tlb_all(void)
> >>+{
> >>+	__asm__ __volatile__ ("sfence.vma" : : : "memory");
> >>+}
> >>+
> >>+/* Flush one page from local TLB */
> >>+static inline void local_flush_tlb_page(unsigned long addr)
> >>+{
> >>+	__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory");
> >>+}
> >
> >Is this serialised against prior updates to the page table (set_pte) and
> >also against subsequent instruction fetch?
>
> This is a store -> (load, store) fence.  The ordering is between stores that
> touch paging data structures and the implicit loads that come from any
> memory access when paging is enabled.  As far as I can tell, it does not
> enforce any instruction fetch ordering constraints.

That sounds pretty suspicious to me. You need to ensure that speculative
instruction fetch doesn't fetch instructions from the old mapping. Also,
store -> (load, store) fences don't generally order the page table walk,
which is what you need to order here.

Will
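P.S. In case it's useful to feed into the formal model work, here's the
litmus test from above written out in herd-style syntax (with the arguments
in the kernel's (i, v) order). It's hand-written and untested, so forgive any
syntax slips; the point is that the "exists" clause must be forbidden:

C riscv-fully-ordered-rmw

{}

P0(int *x, int *z, atomic_t *y)
{
	WRITE_ONCE(*x, 1);
	atomic_add_return(1, y);
	WRITE_ONCE(*z, 1);
}

P1(int *x, int *z)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*z);
	smp_rmb();
	r1 = READ_ONCE(*x);
}

exists (1:r0=1 /\ 1:r1=0)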