Or maybe performance gets worse, not because of that one additional
instruction/cycle in ring buffer enqueue and dequeue, but because function or
loop alignment changed for one or more functions.

When the benchmarking noise (possibly several percent due to changes in code
alignment) is bigger than the effect you are trying to measure (1 cycle per
ring buffer enqueue/dequeue), benchmarking is not the right approach.

-- Ola

On 08/10/2018, 07:27, "Honnappa Nagarahalli" <honnappa.nagaraha...@arm.com> wrote:

    >     >
    >     > I doubt it is possible to benchmark with such a precision so to see the
    >     > potential difference of one ADD instruction.
    >     > Just changes in function alignment can affect performance by percents. And
    >     > the natural variation when not using a 100% deterministic system is going to
    >     > be a lot larger than one cycle per ring buffer operation.
    >     >
    >     > Some of the other patches are also for correctness (e.g. load-acquire of tail)
    >     The discussion is about this patch alone. Other patches are already Acked.
    > So the benchmarking then makes zero sense.
    The whole point is to prove the effect of 1 instruction either way. IMO, it
    is simple enough: follow the memory model to the full extent. We have to
    keep other architectures in mind as well. Maybe that additional instruction
    is not required on other architectures.
    
    > 
    > 
    
