On Fri, 2020-07-17 at 12:19 -0700, Andre McCurdy wrote: > On Fri, Jul 17, 2020 at 3:09 AM Tanu Kaskinen <ta...@iki.fi> wrote: > > On Thu, 2020-03-26 at 14:23 -0700, Andre McCurdy wrote: > > > On Thu, Mar 26, 2020 at 1:26 PM Adrian Bunk <b...@stusta.de> wrote: > > > > On Thu, Mar 26, 2020 at 12:53:08PM -0700, Andre McCurdy wrote: > > > > > On Thu, Mar 26, 2020 at 12:16 PM Adrian Bunk <b...@stusta.de> wrote: > > > > > > On Thu, Mar 26, 2020 at 05:26:29PM +0200, Stefan Ghinea wrote: > > > > > > > ... > > > > > > > When compiling for Thumb or Thumb2, frame pointers _must_ be > > > > > > > disabled > > > > > > > since the Thumb frame pointer in r7 > > > > > > > ... > > > > > > > > > > > > How are you reproducing the problem in pulseaudio? > > > > > > > > > > > > This sounds like a workaround for a bug in musl that was fixed 2 > > > > > > years ago. > > > > > > > > > > The problem can show up anywhere that inline asm is trying to use r7. > > > > > In this case it looks like: > > > > > > > > > > > > > > > https://github.com/pulseaudio/pulseaudio/blob/master/src/pulsecore/remap_neon.c#L50 > > > > > ... > > > > > > > > After looking at the pulseaudio code I suspected the patch description > > > > claiming pulseaudio syscall code would be the problem was rubbish, and > > > > that this NEON code was the problem. > > > > > > Yes, the comment looks like it was copied and pasted and doesn't > > > really apply in this case (since pulseaudio isn't making syscalls). > > > That should be updated. > > > > > > > But when I tried to reproduce the problem it built for me with both > > > > glibc and musl in master (the patch didn't mention that this was a > > > > musl-only problem). > > > > > > > > Then I saw that this was fixed in musl upstream 2 years ago: > > > > https://git.musl-libc.org/cgit/musl/commit/?id=e3c682ab5257aaa6739ef242a9676d897370e78e > > > > > > Right, it's not related to musl or glibc. I suspect it can be > > > reproduced by building for an ARM target which supports NEON, ensuring > > > that DEFAULTTUNE doesn't forcefully disable Thumb (e.g. it should be > > > armv7vethf-neon, not armv7vehf-neon), setting ARM_INSTRUCTION_SET to > > > thumb and then compiling with frame pointers enabled (e.g. by adding > > > -fno-omit-frame-pointer to CFLAGS). > > > > > > In terms of a fix, then changing the code to use r12 instead of r7 is > > > probably the best solution (assuming it works), but would need careful > > > testing. Appending -fomit-frame-pointer to CFLAGS for ARM machines > > > building for Thumb is safe and should fix the issue too. Presumably > > > limiting the -fomit-frame-pointer workaround to ARM machines which > > > support NEON building for Thumb would be an even more targeted > > > solution. > > > > I finally found time to test fixing the assembly code to use r12 > > instead of r7. Seems to work fine (I was first baffled by incorrect > > behaviour, because I changed "{r4-r7}" to "{r4-r12}" without realizing > > that "r4-r12" meant a range of all registers from r4 to r12). > > > > Can you enlighten me: why did you choose r12 instead of r8? Why did the > > original author use registers r4-r7 instead of r0-r3? Is it somehow > > advisable to avoid registers r0-r3 and r8-r11? The code seems to work > > fine with any set of registers, except r7. > > The compiler will work around whichever set of registers you want to > use (apart from r7 in some cases) so it's expected that other > combinations will work fine. Some combinations will be more efficient > than others (ie the compiler will need to do less shuffling values > between registers or saving register values to the stack in order to > make registers you specify available to you). Using r12 instead of r8 > is just an educated guess about the combination will allow the > compiler to generate the most efficient code. > > Registers r0-r3 and r12 can be used within a function without needing > to preserve their previous contents, so if a function needs registers > it's more efficient to use these registers first. Other registers need > to be preserved (ie saved to the stack) before use. > > Registers r0-r3 are also used to pass non-floating point arguments to > a function, so if a function takes 4 or more non-floating point > arguments, then r0-r3 will already contain values which the function > will need to use. > > Note that in this particular function, the first argument (ie the > pointer m) is never actually used, so it may be that using r0, r4, r5 > and r12 will give the best result. The function is pretty trivial > though so I guess with a recent compiler just writing the whole thing > in C will give close to the optimal result too without all the > maintenance issues.
Thanks for explaining! -- Tanu https://www.patreon.com/tanuk https://liberapay.com/tanuk
-=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#140787): https://lists.openembedded.org/g/openembedded-core/message/140787 Mute This Topic: https://lists.openembedded.org/mt/72566057/21656 Group Owner: openembedded-core+ow...@lists.openembedded.org Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-