On Fri, Jul 17, 2020 at 3:09 AM Tanu Kaskinen <ta...@iki.fi> wrote:
> On Thu, 2020-03-26 at 14:23 -0700, Andre McCurdy wrote:
> > On Thu, Mar 26, 2020 at 1:26 PM Adrian Bunk <b...@stusta.de> wrote:
> > > On Thu, Mar 26, 2020 at 12:53:08PM -0700, Andre McCurdy wrote:
> > > > On Thu, Mar 26, 2020 at 12:16 PM Adrian Bunk <b...@stusta.de> wrote:
> > > > > On Thu, Mar 26, 2020 at 05:26:29PM +0200, Stefan Ghinea wrote:
> > > > > > ...
> > > > > > > When compiling for Thumb or Thumb2, frame pointers _must_ be disabled
> > > > > > > since the Thumb frame pointer in r7
> > > > > > ...
> > > > >
> > > > > How are you reproducing the problem in pulseaudio?
> > > > >
> > > > > > This sounds like a workaround for a bug in musl that was fixed 2 years ago.
> > > >
> > > > The problem can show up anywhere that inline asm is trying to use r7.
> > > > In this case it looks like:
> > > >
> > > >   https://github.com/pulseaudio/pulseaudio/blob/master/src/pulsecore/remap_neon.c#L50
> > > > ...
> > >
> > > After looking at the pulseaudio code I suspected the patch description
> > > claiming pulseaudio syscall code would be the problem was rubbish, and
> > > that this NEON code was the problem.
> >
> > Yes, the comment looks like it was copied and pasted and doesn't
> > really apply in this case (since pulseaudio isn't making syscalls).
> > That should be updated.
> >
> > > But when I tried to reproduce the problem it built for me with both
> > > glibc and musl in master (the patch didn't mention that this was a
> > > musl-only problem).
> > >
> > > Then I saw that this was fixed in musl upstream 2 years ago:
> > > https://git.musl-libc.org/cgit/musl/commit/?id=e3c682ab5257aaa6739ef242a9676d897370e78e
> >
> > Right, it's not related to musl or glibc. I suspect it can be
> > reproduced by building for an ARM target which supports NEON, ensuring
> > that DEFAULTTUNE doesn't forcefully disable Thumb (e.g. it should be
> > armv7vethf-neon, not armv7vehf-neon), setting ARM_INSTRUCTION_SET to
> > thumb and then compiling with frame pointers enabled (e.g. by adding
> > -fno-omit-frame-pointer to CFLAGS).
> >
> > In terms of a fix, then changing the code to use r12 instead of r7 is
> > probably the best solution (assuming it works), but would need careful
> > testing. Appending -fomit-frame-pointer to CFLAGS for ARM machines
> > building for Thumb is safe and should fix the issue too. Presumably
> > limiting the -fomit-frame-pointer workaround to ARM machines which
> > support NEON building for Thumb would be an even more targeted
> > solution.
>
> I finally found time to test fixing the assembly code to use r12
> instead of r7. Seems to work fine (I was first baffled by incorrect
> behaviour, because I changed "{r4-r7}" to "{r4-r12}" without realizing
> that "r4-r12" meant a range of all registers from r4 to r12).
>
> Can you enlighten me: why did you choose r12 instead of r8? Why did the
> original author use registers r4-r7 instead of r0-r3? Is it somehow
> advisable to avoid registers r0-r3 and r8-r11? The code seems to work
> fine with any set of registers, except r7.

The compiler will work around whichever set of registers you want to
use (apart from r7 in some cases), so it's expected that other
combinations will work fine. Some combinations will be more efficient
than others (i.e. the compiler will need to do less shuffling of values
between registers, or less saving of register values to the stack, in
order to make the registers you specify available to you). Using r12
instead of r8 is just an educated guess about which combination will
allow the compiler to generate the most efficient code.
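
To make "working around" concrete, here is a minimal sketch. It is not
the actual pulseaudio code: the function name dup_words and what it
does (writing each input word to the output twice) are made up for
illustration. The point is the clobber list, which is how the asm tells
the compiler which registers it overwrites so that the compiler can
keep its own values out of them. The asm path is only compiled for ARM
or Thumb-2 builds with a GCC-style compiler; everywhere else the plain
C loop does all the work.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example, not taken from pulseaudio:
 * write every word of src to dst twice (dst must hold 2 * n words). */
void dup_words(uint32_t *dst, const uint32_t *src, size_t n)
{
#if defined(__GNUC__) && defined(__arm__) && \
    (!defined(__thumb__) || defined(__thumb2__))
    /* Handle two input words per iteration using r4, r5, r6 and r12.
     * r7 is deliberately avoided. */
    for (; n >= 2; n -= 2) {
        __asm__ __volatile__ (
            "ldm  %[src]!, {r4, r6}          \n\t"
            "mov  r5, r4                     \n\t"
            "mov  r12, r6                    \n\t"
            "stm  %[dst]!, {r4, r5, r6, r12} \n\t"
            : [dst] "+r" (dst), [src] "+r" (src)  /* read-write operands */
            :                                     /* no input-only operands */
            : "memory", "r4", "r5", "r6", "r12"   /* clobber list */
        );
    }
#endif
    /* Portable tail loop; on non-ARM builds it processes everything. */
    while (n--) {
        *dst++ = *src;
        *dst++ = *src++;
    }
}
```

Naming r7 in that clobber list instead of r12 is the kind of thing that
can fail when building for Thumb with frame pointers enabled, since the
compiler then wants r7 for itself.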

Registers r0-r3 and r12 can be used within a function without needing
to preserve their previous contents, so if a function needs registers
it's more efficient to use these registers first. Other registers
(r4-r11) need to be preserved (i.e. saved to the stack) before use.

Registers r0-r3 are also used to pass non-floating point arguments to
a function, so if a function takes 4 or more non-floating point
arguments, then r0-r3 will already contain values which the function
will need to use.

Note that in this particular function, the first argument (i.e. the
pointer m) is never actually used, so it may be that using r0, r4, r5
and r12 will give the best result. The function is pretty trivial
though, so I guess that with a recent compiler, just writing the whole
thing in C will give close to the optimal result too, without all the
maintenance issues.
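
As a rough sketch of what "just writing it in C" could look like: the
names below (struct remap_ctx, mono_to_stereo_c) are placeholders, and
the behaviour (copying each input sample to two output channels) is my
assumption for the sake of the example, not something taken from the
pulseaudio source. The unused first argument is kept only to mirror the
signature discussed above.

```c
#include <stddef.h>

/* Placeholder for the type behind the unused pointer m (hypothetical). */
struct remap_ctx;

/* Assumed behaviour: duplicate each of the n input samples into two
 * adjacent output samples. dst must have room for 2 * n floats. */
void mono_to_stereo_c(struct remap_ctx *m, float *dst,
                      const float *src, size_t n)
{
    (void)m; /* never used, same as in the asm version */

    for (size_t i = 0; i < n; i++) {
        dst[2 * i]     = src[i];
        dst[2 * i + 1] = src[i];
    }
}
```

There are no register names here at all, so the r7 frame pointer
conflict can't come up, and the compiler is free to pick whatever
registers (or vector instructions) suit the target.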