Hi Palmer,

On Tue, Nov 14, 2017 at 12:30:59PM -0800, Palmer Dabbelt wrote:
> On Tue, 24 Oct 2017 07:10:33 PDT (-0700), will.dea...@arm.com wrote:
> >On Tue, Sep 26, 2017 at 06:56:31PM -0700, Palmer Dabbelt wrote:
> >>+ATOMIC_OPS(add, add, +,  i)
> >>+ATOMIC_OPS(sub, add, +, -i)
> >>+ATOMIC_OPS(and, and, &,  i)
> >>+ATOMIC_OPS( or,  or, |,  i)
> >>+ATOMIC_OPS(xor, xor, ^,  i)
> >
> >What is the point in the c_op parameter to these things?
>
> I guess there isn't one, it just lingered from a handful of refactorings.
> It's used in some of the other functions if we need to do a C operation
> after the atomic.  How does this look?
>
> commit 5db229491a205ad0e1aa18287e3b342176c62d30 (HEAD -> staging-mm)
> Author: Palmer Dabbelt <pal...@dabbelt.com>
> Date:   Tue Nov 14 11:35:37 2017 -0800
>
>     RISC-V: Remove c_op from ATOMIC_OP
>
>     This was an unused macro parameter.
>
>     Signed-off-by: Palmer Dabbelt <pal...@dabbelt.com>
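Just so we're talking about the same shape of thing, I'd expect the void ops
to collapse to roughly the sketch below once c_op is gone. This is a
simplified, 32-bit-only rendering of the usual pattern rather than your actual
diff, so the RV64 variants and the exact constraint strings are my guesses:

#define ATOMIC_OP(op, asm_op, I)					\
static __always_inline void atomic_##op(int i, atomic_t *v)		\
{									\
	/* AMO with the result discarded into the zero register. */	\
	__asm__ __volatile__ (						\
		"amo" #asm_op ".w zero, %1, %0"				\
		: "+A" (v->counter)					\
		: "r" (I)						\
		: "memory");						\
}

#define ATOMIC_OPS(op, asm_op, I)					\
	ATOMIC_OP(op, asm_op, I)

ATOMIC_OPS(add, add,  i)
ATOMIC_OPS(sub, add, -i)
ATOMIC_OPS(and, and,  i)
ATOMIC_OPS( or,  or,  i)
ATOMIC_OPS(xor, xor,  i)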
You can do the same thing for FETCH_OP, I think.

> >>+ATOMIC_OPS(add, add, +,  i,      , _relaxed)
> >>+ATOMIC_OPS(add, add, +,  i, .aq  , _acquire)
> >>+ATOMIC_OPS(add, add, +,  i, .rl  , _release)
> >>+ATOMIC_OPS(add, add, +,  i, .aqrl,         )
> >
> >Have you checked that .aqrl is equivalent to "ordered", since there are
> >interpretations where that isn't the case. Specifically:
> >
> >// all variables zero at start of time
> >P0:
> >WRITE_ONCE(x) = 1;
> >atomic_add_return(y, 1);
> >WRITE_ONCE(z) = 1;
> >
> >P1:
> >READ_ONCE(z) // reads 1
> >smp_rmb();
> >READ_ONCE(x) // must not read 0
>
> I haven't.  We don't quite have a formal memory model specification yet.
> I've added Daniel Lustig, who is creating that model.  He should have a
> better idea.

Thanks. You really do need to ensure that, as it's heavily relied upon.

> >>+/* These barriers do not need to enforce ordering on devices, just memory. */
> >>+#define smp_mb()	RISCV_FENCE(rw,rw)
> >>+#define smp_rmb()	RISCV_FENCE(r,r)
> >>+#define smp_wmb()	RISCV_FENCE(w,w)
> >>+
> >>+/*
> >>+ * These fences exist to enforce ordering around the relaxed AMOs.  The
> >>+ * documentation defines that
> >>+ * "
> >>+ * atomic_fetch_add();
> >>+ *   is equivalent to:
> >>+ * smp_mb__before_atomic();
> >>+ * atomic_fetch_add_relaxed();
> >>+ * smp_mb__after_atomic();
> >>+ * "
> >>+ * So we emit full fences on both sides.
> >>+ */
> >>+#define __smb_mb__before_atomic()	smp_mb()
> >>+#define __smb_mb__after_atomic()	smp_mb()
> >
> >Now I'm confused, because you're also spitting out .aqrl for those afaict.
> >Do you really need full barriers *and* .aqrl, or am I misunderstanding
> >something here?
>
> Here's the section of atomic_t.txt that I was reading
>
>     The barriers:
>       smp_mb__{before,after}_atomic()
>     only apply to the RMW ops and can be used to augment/upgrade the ordering
>     inherent to the used atomic op. These barriers provide a full smp_mb().
>     These helper barriers exist because architectures have varying implicit
>     ordering on their SMP atomic primitives. For example our TSO architectures
>     provide full ordered atomics and these barriers are no-ops.
>     Thus:
>       atomic_fetch_add();
>     is equivalent to:
>       smp_mb__before_atomic();
>       atomic_fetch_add_relaxed();
>       smp_mb__after_atomic();
>     However the atomic_fetch_add() might be implemented more efficiently.
>
> so I think what we've got there is correct: the barriers go with
> atomic_fetch_add_relaxed(), not atomic_fetch_add().  asm-generic/barrier.h
> has
>
>     #ifndef __smp_mb__before_atomic
>     #define __smp_mb__before_atomic()	__smp_mb()
>     #endif
>     #ifndef __smp_mb__after_atomic
>     #define __smp_mb__after_atomic()	__smp_mb()
>     #endif
>
> so I think we can just drop these entirely.  How does this look?
>
> commit da682f7ee5d2af4aae7026ef40b5b5fb8e103938
> Author: Palmer Dabbelt <pal...@dabbelt.com>
> Date:   Tue Nov 14 11:49:42 2017 -0800
>
>     RISC-V: Remove __smp_bp__{before,after}_atomic
>
>     These duplicate the asm-generic definitions and therefore aren't useful.
>
>     Signed-off-by: Palmer Dabbelt <pal...@dabbelt.com>

Yes, that looks good to me. I was misunderstanding things, but you're right
that you don't need to do anything special for this.
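To spell out what I think the end state is: the arch provides the four AMO
flavours and nothing else, and the generic helpers do any upgrading. Roughly
the sketch below, which is my reconstruction of the usual pattern (32-bit
only, and writing ATOMIC_FETCH_OP directly rather than your ATOMIC_OPS
wrapper) and not a copy of your code:

#define ATOMIC_FETCH_OP(op, asm_op, I, asm_or, c_or)			\
static __always_inline int atomic_fetch_##op##c_or(int i, atomic_t *v)	\
{									\
	register int ret;						\
	__asm__ __volatile__ (						\
		"amo" #asm_op ".w" #asm_or " %1, %2, %0"		\
		: "+A" (v->counter), "=r" (ret)				\
		: "r" (I)						\
		: "memory");						\
	return ret;							\
}

/* One instantiation per ordering variant, as in the hunk quoted above. */
ATOMIC_FETCH_OP(add, add,  i,      , _relaxed)
ATOMIC_FETCH_OP(add, add,  i, .aq  , _acquire)
ATOMIC_FETCH_OP(add, add,  i, .rl  , _release)
ATOMIC_FETCH_OP(add, add,  i, .aqrl,         )

With that in place, anybody who wants to strengthen the _relaxed forms gets
the asm-generic smp_mb__{before,after}_atomic() fallbacks, which are full
fences on RISC-V, so dropping your local definitions really is all that's
needed.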
> >>+
> >>+/*
> >>+ * These barriers prevent accesses performed outside a spinlock from being moved
> >>+ * inside a spinlock.  Since RISC-V sets the aq/rl bits on our spinlock only
> >>+ * enforce release consistency, we need full fences here.
> >>+ */
> >>+#define smb_mb__before_spinlock()	smp_mb()
> >
> >We killed this macro, so you don't need to define it.
>
> Thanks!
>
> commit 382a1f8b33a04fc0f991e062f70f4c65ca888bca
> Author: Palmer Dabbelt <pal...@dabbelt.com>
> Date:   Tue Nov 14 11:50:37 2017 -0800
>
>     RISC-V: Remove smb_mb__{before,after}_spinlock()
>
>     These are obsolete.
>
>     Signed-off-by: Palmer Dabbelt <pal...@dabbelt.com>
>
> diff --git a/arch/riscv/include/asm/barrier.h b/arch/riscv/include/asm/barrier.h
> index 455ee16127fb..773c4e039cd7 100644
> --- a/arch/riscv/include/asm/barrier.h
> +++ b/arch/riscv/include/asm/barrier.h
> @@ -38,14 +38,6 @@
>  #define smp_rmb()	RISCV_FENCE(r,r)
>  #define smp_wmb()	RISCV_FENCE(w,w)
>  
> -/*
> - * These barriers prevent accesses performed outside a spinlock from being moved
> - * inside a spinlock.  Since RISC-V sets the aq/rl bits on our spinlock only
> - * enforce release consistency, we need full fences here.
> - */
> -#define smb_mb__before_spinlock()	smp_mb()
> -#define smb_mb__after_spinlock()	smp_mb()

You might still need the __after version, particularly if your lock operation
has only acquire semantics; it's only the __before version that has been
removed.

> >>+#define __test_and_op_bit_ord(op, mod, nr, addr, ord)		\
> >>+({								\
> >>+	unsigned long __res, __mask;				\
> >>+	__mask = BIT_MASK(nr);					\
> >>+	__asm__ __volatile__ (					\
> >>+		__AMO(op) #ord " %0, %2, %1"			\
> >>+		: "=r" (__res), "+A" (addr[BIT_WORD(nr)])	\
> >>+		: "r" (mod(__mask))				\
> >>+		: "memory");					\
> >>+	((__res & __mask) != 0);				\
> >>+})
> >
> >This looks broken to me -- the value-returning test bitops need to be fully
> >ordered.
>
> Yep, you're right -- I just mis-read atomic_ops.rst (and re-misread it the
> first time when I went to check again).  I think this should do it
>
> commit 9951b6ed76bffb714517d81d9ffceb0eb1796229
> Author: Palmer Dabbelt <pal...@dabbelt.com>
> Date:   Tue Nov 14 12:06:06 2017 -0800
>
>     RISC-V: __test_and_op_bit_ord should be strongly ordered
>
>     I mis-read the documentation.
>
>     Signed-off-by: Palmer Dabbelt <pal...@dabbelt.com>
>
> diff --git a/arch/riscv/include/asm/bitops.h b/arch/riscv/include/asm/bitops.h
> index 7c281ef1d583..f30daf26f08f 100644
> --- a/arch/riscv/include/asm/bitops.h
> +++ b/arch/riscv/include/asm/bitops.h
> @@ -67,7 +67,7 @@
>  		: "memory");
>  
>  #define __test_and_op_bit(op, mod, nr, addr)			\
> -	__test_and_op_bit_ord(op, mod, nr, addr, )
> +	__test_and_op_bit_ord(op, mod, nr, addr, .aqrl)

Yes, modulo my earlier comment and litmus test.

> >>+#define arch_spin_lock_flags(lock, flags)	arch_spin_lock(lock)
> >>+#define arch_spin_is_locked(x)	((x)->lock != 0)
> >
> >Missing READ_ONCE.
>
> Thanks!
>
> commit 64e80b0bf3898a88de09f4e12090b002b57efede
> Author: Palmer Dabbelt <pal...@dabbelt.com>
> Date:   Tue Nov 14 12:18:49 2017 -0800
>
>     RISC-V: Add READ_ONCE in arch_spin_is_locked()
>
>     Signed-off-by: Palmer Dabbelt <pal...@dabbelt.com>

Also looks good.
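(I assume the fixed macro ends up as something like the line below -- I
haven't gone back and checked your actual commit, so treat it as a guess.)

#define arch_spin_is_locked(x)	(READ_ONCE((x)->lock) != 0)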
> >>+static inline void arch_read_lock(arch_rwlock_t *lock)
> >>+{
> >>+	int tmp;
> >>+
> >>+	__asm__ __volatile__(
> >>+		"1:	lr.w	%1, %0\n"
> >>+		"	bltz	%1, 1b\n"
> >>+		"	addi	%1, %1, 1\n"
> >>+		"	sc.w.aq	%1, %1, %0\n"
> >>+		"	bnez	%1, 1b\n"
> >>+		: "+A" (lock->lock), "=&r" (tmp)
> >>+		:: "memory");
> >>+}
> >>+
> >>+static inline void arch_write_lock(arch_rwlock_t *lock)
> >>+{
> >>+	int tmp;
> >>+
> >>+	__asm__ __volatile__(
> >>+		"1:	lr.w	%1, %0\n"
> >>+		"	bnez	%1, 1b\n"
> >>+		"	li	%1, -1\n"
> >>+		"	sc.w.aq	%1, %1, %0\n"
> >>+		"	bnez	%1, 1b\n"
> >>+		: "+A" (lock->lock), "=&r" (tmp)
> >>+		:: "memory");
> >>+}
> >
> >I think you have the same starvation issues as we have on arm64 here.  I
> >strongly recommend moving over to qrwlock :)

I still strongly recommend this!

> >>+#ifndef _ASM_RISCV_TLBFLUSH_H
> >>+#define _ASM_RISCV_TLBFLUSH_H
> >>+
> >>+#ifdef CONFIG_MMU
> >>+
> >>+/* Flush entire local TLB */
> >>+static inline void local_flush_tlb_all(void)
> >>+{
> >>+	__asm__ __volatile__ ("sfence.vma" : : : "memory");
> >>+}
> >>+
> >>+/* Flush one page from local TLB */
> >>+static inline void local_flush_tlb_page(unsigned long addr)
> >>+{
> >>+	__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory");
> >>+}
> >
> >Is this serialised against prior updates to the page table (set_pte) and
> >also against subsequent instruction fetch?
>
> This is a store -> (load, store) fence.  The ordering is between stores that
> touch paging data structures and the implicit loads that come from any
> memory access when paging is enabled.  As far as I can tell, it does not
> enforce any instruction fetch ordering constraints.

That sounds pretty suspicious to me. You need to ensure that speculative
instruction fetch doesn't fetch instructions from the old mapping. Also,
store -> (load, store) fences don't generally order the page table walk,
which is what you need to order here.

Will
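P.S. In case it's useful to feed into the formal model work, here's the
litmus test from above written out in herd-style syntax (with the arguments
in the kernel's (i, v) order). It's hand-written and untested, so forgive any
syntax slips; the point is that the "exists" clause must be forbidden:

C riscv-fully-ordered-rmw

{}

P0(int *x, int *z, atomic_t *y)
{
	WRITE_ONCE(*x, 1);
	atomic_add_return(1, y);
	WRITE_ONCE(*z, 1);
}

P1(int *x, int *z)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*z);
	smp_rmb();
	r1 = READ_ONCE(*x);
}

exists (1:r0=1 /\ 1:r1=0)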