On Fri, 2020-07-17 at 12:19 -0700, Andre McCurdy wrote:
> On Fri, Jul 17, 2020 at 3:09 AM Tanu Kaskinen <ta...@iki.fi> wrote:
> > On Thu, 2020-03-26 at 14:23 -0700, Andre McCurdy wrote:
> > > On Thu, Mar 26, 2020 at 1:26 PM Adrian Bunk <b...@stusta.de> wrote:
> > > > On Thu, Mar 26, 2020 at 12:53:08PM -0700, Andre McCurdy wrote:
> > > > > On Thu, Mar 26, 2020 at 12:16 PM Adrian Bunk <b...@stusta.de> wrote:
> > > > > > On Thu, Mar 26, 2020 at 05:26:29PM +0200, Stefan Ghinea wrote:
> > > > > > > ...
> > > > > > > When compiling for Thumb or Thumb2, frame pointers _must_ be 
> > > > > > > disabled
> > > > > > > since the Thumb frame pointer in r7
> > > > > > > ...
> > > > > > 
> > > > > > How are you reproducing the problem in pulseaudio?
> > > > > > 
> > > > > > This sounds like a workaround for a bug in musl that was fixed 2 
> > > > > > years ago.
> > > > > 
> > > > > The problem can show up anywhere that inline asm is trying to use r7.
> > > > > In this case it looks like:
> > > > > 
> > > > >   
> > > > > https://github.com/pulseaudio/pulseaudio/blob/master/src/pulsecore/remap_neon.c#L50
> > > > > ...
> > > > 
> > > > After looking at the pulseaudio code I suspected the patch description
> > > > claiming pulseaudio syscall code would be the problem was rubbish, and
> > > > that this NEON code was the problem.
> > > 
> > > Yes, the comment looks like it was copied and pasted and doesn't
> > > really apply in this case (since pulseaudio isn't making syscalls).
> > > That should be updated.
> > > 
> > > > But when I tried to reproduce the problem it built for me with both
> > > > glibc and musl in master (the patch didn't mention that this was a
> > > > musl-only problem).
> > > > 
> > > > Then I saw that this was fixed in musl upstream 2 years ago:
> > > > https://git.musl-libc.org/cgit/musl/commit/?id=e3c682ab5257aaa6739ef242a9676d897370e78e
> > > 
> > > Right, it's not related to musl or glibc. I suspect it can be
> > > reproduced by building for an ARM target which supports NEON, ensuring
> > > that DEFAULTTUNE doesn't forcefully disable Thumb (e.g. it should be
> > > armv7vethf-neon, not armv7vehf-neon), setting ARM_INSTRUCTION_SET to
> > > thumb and then compiling with frame pointers enabled (e.g. by adding
> > > -fno-omit-frame-pointer to CFLAGS).
> > > 
> > > In terms of a fix, then changing the code to use r12 instead of r7 is
> > > probably the best solution (assuming it works), but would need careful
> > > testing. Appending -fomit-frame-pointer to CFLAGS for ARM machines
> > > building for Thumb is safe and should fix the issue too. Presumably
> > > limiting the -fomit-frame-pointer workaround to ARM machines which
> > > support NEON building for Thumb would be an even more targeted
> > > solution.
> > 
> > I finally found time to test fixing the assembly code to use r12
> > instead of r7. Seems to work fine (I was first baffled by incorrect
> > behaviour, because I changed "{r4-r7}" to "{r4-r12}" without realizing
> > that "r4-r12" meant a range of all registers from r4 to r12).
> > 
> > Can you enlighten me: why did you choose r12 instead of r8? Why did the
> > original author use registers r4-r7 instead of r0-r3? Is it somehow
> > advisable to avoid registers r0-r3 and r8-r11? The code seems to work
> > fine with any set of registers, except r7.
> 
> The compiler will work around whichever set of registers you want to
> use (apart from r7 in some cases) so it's expected that other
> combinations will work fine. Some combinations will be more efficient
> than others (ie the compiler will need to do less shuffling values
> between registers or saving register values to the stack in order to
> make registers you specify available to you). Using r12 instead of r8
> is just an educated guess about the combination will allow the
> compiler to generate the most efficient code.
> 
> Registers r0-r3 and r12 can be used within a function without needing
> to preserve their previous contents, so if a function needs registers
> it's more efficient to use these registers first. Other registers need
> to be preserved (ie saved to the stack) before use.
> 
> Registers r0-r3 are also used to pass non-floating point arguments to
> a function, so if a function takes 4 or more non-floating point
> arguments, then r0-r3 will already contain values which the function
> will need to use.
> 
> Note that in this particular function, the first argument (ie the
> pointer m) is never actually used, so it may be that using r0, r4, r5
> and r12 will give the best result. The function is pretty trivial
> though so I guess with a recent compiler just writing the whole thing
> in C will give close to the optimal result too without all the
> maintenance issues.

Thanks for explaining!

-- 
Tanu

https://www.patreon.com/tanuk
https://liberapay.com/tanuk

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#140787): 
https://lists.openembedded.org/g/openembedded-core/message/140787
Mute This Topic: https://lists.openembedded.org/mt/72566057/21656
Group Owner: openembedded-core+ow...@lists.openembedded.org
Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub  
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-

Reply via email to