On Wed, May 25, 2011 at 10:21:43AM -0400, Nicolas Pitre wrote:
> On Wed, 25 May 2011, Dave Martin wrote:
>
> > On Tue, May 24, 2011 at 11:45:04PM -0400, Nicolas Pitre wrote:
> > > + * typedef int (__kernel_cmpxchg64_t)(const int64_t *oldval,
> > > + *                                    const int64_t *newval,
> > > + *                                    volatile int64_t *ptr);
> > > + * #define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *)0xffff0f60)
> > > + *
> > > + * Atomically store newval in *ptr if *ptr is equal to oldval for user
> > > + * space.
> > > + * Return zero if *ptr was changed or non-zero if no exchange happened.
> > > + * The C flag is also set if *ptr was changed to allow for assembly
> > > + * optimization in the calling code.
> >
> >  * Do not attempt to call this function unless __kernel_helper_version >= 5.
> >  *
>
> Yep.  I will queue your other patch prior to this one and make this
> blurb consistent with the rest.

Oh, OK.  I hadn't had feedback on that yet, so I wasn't sure whether
anyone was picking it up.  I will repost to lakml, but I was waiting for
the merge window to close since this was just a tidy-up rather than
something urgent.

This warning is more important for the new helper than for the existing
ones (where really we could omit it without much risk).
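To make the contract concrete, a minimal user space caller honouring
that version check could look something like the following.  This is
just a sketch for illustration -- the typedef and the two fixed vectors
page addresses come from the patch and the existing kuser helper docs,
but the wrapper function and its name are made up:

#include <stdint.h>
#include <stdlib.h>

/* Fixed entry points in the vectors page, per the kuser helper docs. */
typedef int (__kernel_cmpxchg64_t)(const int64_t *oldval,
                                   const int64_t *newval,
                                   volatile int64_t *ptr);
#define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *)0xffff0f60)
#define __kernel_helper_version (*(int32_t *)0xffff0ffc)

/* Hypothetical wrapper: atomic 64-bit increment built on the helper. */
static int64_t atomic64_inc_return(volatile int64_t *ptr)
{
	int64_t old, new;

	if (__kernel_helper_version < 5)
		abort();	/* this kernel does not provide cmpxchg64 */

	do {
		old = *ptr;	/* re-sample on every retry */
		new = old + 1;
	} while (__kernel_cmpxchg64(&old, &new, ptr));

	return new;
}

(In practice the version check would be cached rather than re-read from
the vectors page on every call; failing loudly on old kernels beats
silently corrupting the counter.)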
> > > +kuser_cmpxchg64_fixup:
> > > +	@ Called from kuser_cmpxchg_fixup.
> > > +	@ r2 = address of interrupted insn (must be preserved).
> > > +	@ sp = saved regs. r7 and r8 are clobbered.
> > > +	@ 1b = first critical insn, 2b = last critical insn.
> > > +	@ If r2 >= 1b and r2 <= 2b then saved pc_usr is set to 1b.
> > > +	mov	r7, #0xffff0fff
> > > +	sub	r7, r7, #(0xffff0fff - (0xffff0f60 + (1b - __kuser_cmpxchg64)))
> > > +	subs	r8, r2, r7
> > > +	rsbcss	r8, r8, #(2b - 1b)
> > > +	strcs	r7, [sp, #S_PC]
> > > +#if __LINUX_ARM_ARCH__ < 6
> > > +	b	kuser_cmpxchg32_fixup
> >
> > Can we just have movcs pc,lr here, and put kuser_cmpxchg32_fixup
> > immediately after?
> >
> > This would allow us to skip the branch, and the initial "mov r7" in
> > the kuser_cmpxchg32_fixup code.
>
> The 'mov r7' is still needed unless the second instruction uses another
> target register.

What I meant is that at the end of the above sequence, r7 = 1b.  So we
can just fall through into the 32-bit fixup code and add the appropriate
small offset to r7 so that it then points to the corresponding 1b label
for __kuser_cmpxchg32, without needing to re-derive the address using
mov+sub.

It's a pretty minor tweak though.

> I thought about the possibility of "movcs pc, lr", especially since the
> .text segments are simply concatenated and therefore the branch is
> effectively branching to the very next instruction.  So there could be
> like a common preamble, then a list of concatenated fixup segments
> (only two of them now) and finally a postamble which would simply be
> "mov pc, lr".  This would all be put contiguous at link time.  However
> I'm not sure yet if this is worth optimizing that far since this code
> is far from being hot, and clarity would also be affected.

Also, the number of fixups is probably never going to grow very large.

> > There's a fair amount of duplicated logic from the 32-bit case.
> > Is it worth trying to merge the two?
>
> The core logic spans 5 instructions.  Only 3 of them are actually the
> same in both cases i.e. those with no references to 1b or 2b, and
> they're perfectly interlaced in between the others.  Trying to make
> this into common runtime code would result in even bigger code I'm
> afraid.  This could be made into common source code via a macro though.

Fair enough -- a macro might be worth a try _if_ it simplifies things in
the source, but I think you're right that merging the actual code
probably isn't worth it just to save a few words in the vectors page
(which eats up 4K regardless of what we put in it) and a few cycles per
fault (which already costs many, many cycles).

Cheers
---Dave
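P.S. For anyone following along: the subs/rsbcss/strcs sequence quoted
above is the usual branchless unsigned range check.  Roughly, in C (my
own illustration, not code from the patch; the function and parameter
names are made up):

#include <stdint.h>

/*
 * Equivalent of:
 *	subs	r8, r2, r7		@ r8 = pc - start, C set iff pc >= start
 *	rsbcss	r8, r8, #(2b - 1b)	@ if C was set: C set iff r8 <= end - start
 *	strcs	r7, [sp, #S_PC]		@ if still in range, restart at 'start'
 *
 * One unsigned comparison covers both bounds, since pc < start makes
 * the subtraction wrap around to a value larger than (end - start).
 */
static void fixup_pc(uint32_t *pc_usr, uint32_t start, uint32_t end)
{
	if ((uint32_t)(*pc_usr - start) <= (end - start))
		*pc_usr = start;	/* roll back to the first critical insn */
}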