On 21/08/2019 at 13:44, Segher Boessenkool wrote:
Hi!

On Wed, Aug 21, 2019 at 02:59:59PM +0530, Santosh Sivaraj wrote:
Except for a couple of calls (a 1 or 2 nsec reduction), there are no
improvements in the call times. Or is 10 nsec the minimum granularity?

So I don't know if it's even worth updating vdso64, except to keep vdso32 and
vdso64 equal.

Calls are cheap, in principle...  It is the LR stuff that can make it
slower on some cores, and a lot of calling sequence stuff may have
considerable overhead of course.

On an 8xx, a taken branch is 2 cycles and a not-taken branch is 1 cycle (plus the refetch if it was not the anticipated branch).


+.macro get_datapage ptr, tmp
+       bcl     20,31,888f
+888:
+       mflr    \ptr
+       addi    \ptr, \ptr, __kernel_datapage_offset - 888b
+       lwz     \tmp, 0(\ptr)
+       add     \ptr, \tmp, \ptr
+.endm

(You can just write that as
        bcl 20,31,$+4
        mflr \ptr
etc.  Useless labels are useless :-) )
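For illustration, the whole macro might read as follows with the local label dropped. This is an untested sketch, not the patch itself: it assumes the assembler accepts `$` as the current location counter (as the `$+4` suggestion implies), and it replaces `888b` with `$ - 4`, because at the `addi` the location counter is one 4-byte instruction past the `mflr` whose address `bcl` leaves in LR.

```asm
.macro get_datapage ptr, tmp
       bcl     20,31,$+4        /* branch-and-link to the next insn: LR = address of the mflr */
       mflr    \ptr             /* \ptr = runtime address of the mflr */
       /* ($ - 4) is the address of the mflr, i.e. what 888b used to name */
       addi    \ptr, \ptr, __kernel_datapage_offset - ($ - 4)
       lwz     \tmp, 0(\ptr)    /* load the offset stored at __kernel_datapage_offset */
       add     \ptr, \tmp, \ptr /* add it to that address, as the original macro does */
.endm
```

Apart from the label, the instruction sequence is the same as in the macro above, so the generated code should not change.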

Nice trick. Will use that.


One thing you might want to do to improve performance is to do this without
the bcl etc., because you cannot really hide the LR latency of that.  But
that isn't very many ns either...  Superscalar helps, OoO helps, but it is
mostly just that >100MHz helps ;-)

Good idea. Did you have a look at my similar vdso32 patch? https://patchwork.ozlabs.org/patch/1148274/

Do you have any idea how to avoid that bcl/mflr stuff?

Christophe
