On Sun, Jul 14, 2013 at 1:57 PM, Paolo Bonzini <pbonz...@redhat.com> wrote: > Il 14/07/2013 04:53, Liu Ping Fan ha scritto: >> Refcnt's atomic inc/dec ops are frequent and its idiom need no seq_cst >> order. So to get better performance, it worth to adopt _relaxed >> other than _seq_cst memory model on them. >> >> We resort to gcc builtins. If gcc supports C11 memory model, __atomic_* >> buitlins is used, otherwise __sync_* builtins. >> >> Signed-off-by: Liu Ping Fan <pingf...@linux.vnet.ibm.com> > > No, not at all. :( > > First of all, I'd like to understand how you benchmarked this. For > inc/dec, relaxed vs. seq_cst has no effect except on PPC and ARM. And
Not based on benchmark. Just based on the code releated with barrier implement, since PPC and ARM can just lock cacheline without flushing write buffer. > if the refcount ops are frequent enough, I strongly suspect cacheline > bouncing has a bigger effect than the memory barriers. > When out of biglock, object_ref/unref to pin the Device will be quite often, and can it be marked "frequent"? Or how can we say something is frequent? > Second, it is wrong because you need a further read memory barrier when > you are removing the last reference > Oh, yes, the last one. > Third, it is making the API completely unorthogonal, and "tend to be > exceptions" is not a justification. > > The justification here could be, rather than the performance, having to > remember how to use atomic_fetch_dec in the unref side. I don't really > buy that, but if you really care, do something like > > #define atomic_ref(ptr, field) \ > __atomic_fetch_add(&((ptr)->field), 1, __ATOMIC_RELAXED) > #define atomic_unref(ptr, field, releasefn) ( \ > __atomic_fetch_add(&((ptr)->field), -1, __ATOMIC_RELEASE) == 1 \ > ? (__atomic_thread_fence(__ATOMIC_ACQUIRE), (releasefn)(ptr)) : false) > > i.e. define a new interface similar to kref_get/kref_put and, since you > are at it, optimize it. > Thanks, a abstract layer for refct is what I need. Regards Pingfan