[FFmpeg-devel] [PATCH] avcodec/aarch64: Access externs via GOT with PIC

2022-07-10 Thread Triang3l
Android, starting from version 6 (API level 23, from the year 2015), 
requires all shared libraries to be position-independent, and refuses to 
load shared libraries (which are the most common native binary type on 
Android as in Android applications, native code is placed in shared 
libraries accessed via the Java Native Interface) containing dynamic 
relocations.


To support PIC, all AArch64 assembly code in FFmpeg uses the `movrel` 
macro to obtain addresses of labels, such as lookup table constants, in 
a way that with CONFIG_PIC being 1, PC-relative addresses of labels are 
computed via the `adrp` instruction. This approach, however, is suitable 
only for labels defined in the same object file. For `adrp` to work 
directly between object files, the linker has to perform a text 
relocation. And in my scenario (libavcodec being a static library linked 
to a shared library, though I'm not sure if this is relevant), this 
resulted in the following LLVM linker errors, making my application not 
buildable for Android:


"relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used against symbol 
ff_cos_32; recompile with -fPIC"
"can't create dynamic relocation R_AARCH64_ADD_ABS_LO12_NC against 
symbol: ff_cos_32 in readonly segment; recompile object files with -fPIC 
or pass '-Wl,-z,notext' to allow text relocations in the output"


This commit brings the solution that is already implemented in FFmpeg on 
AArch32 to AArch64 - a separate macro, `movrelx`, which emits 
instructions for computing the address of an external label through the 
Global Object Table (GOT).


The same targets as `movrel` is implemented for are covered by this 
commit. For Apple targets, the instruction sequence is the one that is 
generated for referencing an extern variable in C code by Clang for the 
`aarch64-apple-darwin` target triple. For other targets (Linux), this is 
the sequence emitted by Clang for the `aarch64-unknown-linux-gnu` 
target, by GCC, and specified in the "Assembly expressions" section of 
the Arm Compiler Reference Guide. The hardware-assisted AddressSanitizer 
has no effect on the sequence - Clang generates the same `:got:` and 
`:got_lo12:` addressing regardless of whether `-fsanitize=hwaddress` is 
used. Windows has no concept of PIC, and Windows builds should be done 
with CONFIG_PIC being 0, so it doesn't need to be handled.


The literal offset, however, is always applied using a separate `add` or 
`sub` instruction as the actual address is loaded indirectly for an 
extern object that is the whole lookup table itself. The only place 
where the offset is currently used with `movrelx` is VP9, with the 
offset being 256 or 512 bytes there. Unfortunately, that offset can't be 
moved to the positive immediate offset encoded in load/store 
instructions there without major restructuring, as the actual memory 
accesses are performed in a function that is common to different offset 
values, with the offset being pre-applied to one of its arguments 
instead. Without PIC though, `movrelx` is implemented exactly the same 
as `movrel`, with the offset applied directly to the `ldr` literal, so 
the non-PIC path is unaffected by this change.


Testing was performed on my local build setup for Android based on 
ndk-build. Two things were tested:


- Regression testing was done using the `libavcodec/tests/fft.c` test. 
`libavcodec` was built as a static library, and the test was built as a 
native executable (which, unlike shared libraries, isn't required to be 
position-independent). Both the executable without the changes and the 
executable with the new code were launched on a physical AArch64 device 
using Termux. As the length of the instruction sequences for `movrel` 
and `movrelx` without the offset is the same, comparing the two binaries 
in a diff tool has shown the expected 13 differences in the code - 12 in 
`fftN_neon` for different transform sizes, and 1 in `ff_fft_calc_neon`. 
The results for the FFT test were the same for both executables with 
different transform size values.


- To check sufficiency and suitability for fixing the original issue, 
the `fft.c` test was converted into a shared library (with the `main` 
function renamed), and a proxy executable performing `dlopen` of the 
library and invoking the main test function from it via `dlsym`. Termux 
is built with `targetSdkVersion` 28, so the `dlopen` rule of Android API 
levels 23 and above disallowing dynamic relocations should apply to it. 
The testing device is running Android 11 (API level 30). The test was 
executed successfully, meaning that no relocations incompatible with PIC 
are required by libavcodec anymore.


Signed-off-by: Triang3l 
---
 libavcodec/aarch64/fft_neon.S |  4 ++--
 libavcodec/aarch64/sbrdsp_neon.S  |  2 +-
 libavcodec/aarch64/vp9mc_16bpp_neon.S |  4 ++--
 libavcodec/aarch64/vp9mc_neon.S   |  4 ++--
 libavutil/aarch64/asm.S   | 19 ++

Re: [FFmpeg-devel] [PATCH] avcodec/aarch64: Access externs via GOT with PIC

2022-07-10 Thread Triang3l
Hi Martin, thanks for a quick and detailed review!Oops, Thunderbird (or 
something else in the chain) has added an extraneous space in the beginning of 
every line, though the +- deltas seem to be unaffected. I can try sending the 
patch some other way if needed, like as a binary attachment, or via git 
send-email, but possibly the "^ " (without quotes) per-line regular expression 
should be fine for cleaning it up as well. This is my first patch submission to 
this repository, and overall to a mailing list, so I'm just starting learning 
the best practices… hi everyone!!!Yes, the original issue was likely a 
consequence of us using a completely different build process than FFmpeg is 
designed to be built with. We have our own Premake 5 scripts for FFmpeg 
components, that are used for generation of Android ndk-build files via my 
Premake generator, Triang3l/premake-androidndk on GitHub. So yes, we don't have 
any link options imported from a file.I agree, it was very suprising to see 
that dynamic relocations were done between object files within a single shared 
library at all. I suspected ASLR at first, but I don't know if it can randomize 
the locations of individual object files this way, that's probably not the case 
though. I'll see what can be done to reproduce the effect of libavcodec.v — 
since my shared library is the final application logic, not a library that's 
actually shareable, none of the FFmpeg objects need to be exported from it at 
all. However, static libraries barely have anything to configure overall as far 
as I know, so disabling exports specifically for FFmpeg may be complicated — 
but thankfully, we can (and even should, to reduce the file size) use 
-fvisibility=hidden globally in our application, if that helps fix the issue. 
-Wl,-Bsymbolic should be fine too, and possibly even less intrusive.If using 
__attribute__((visibility("hidden"))) for the lookup tables prevents dynamic 
relocations from being inserted, and if that doesn't break common usages of 
libavcodec, yes, it should be a way better solution than introducing an 
additional memory load at runtime. I'll check if that works for us tomorrow or 
in a couple of days.If we're able to avoid using the global object table this 
way though, maybe it could be desirable to also get rid of `movrelx` in the 
AArch32 code as well?By the way, I'm also super confused by how the offset is 
applied in the local `movrel` currently, it looks very inconsistent. The `adrp` 
and `add` combination, as I understand its operation, should work for any 
32-bit literal value, not specifically for addresses of known objects — `adrp` 
accepting the upper 20 bits as a literal and adding them to the PC, and then 
`add` just adding the lower 12 bits, the offset within the page, also taken as 
a literal.While `movrelx` most likely requires the offset to be applied 
explicitly using 0…4095 `add` or `sub` afterwards because it's first loaded 
from a lookup table of addresses of, if I understand correctly, actual objects 
in the code (I wouldn't expect it to be filled with object+8, object+16, 
object+24 pointers in case of a large object for loading time reasons, though I 
haven't researched this topic in detail), if everything `movrel` does is adding 
the PC to the input literal… do we even need to emit `sub` for negative offsets 
in it?The current Linux implementation of `movrel` just adds the offset 
directly to the label using + regardless of whether it's positive or negative, 
and there already are instances of negative offsets in the code 
(`subpel_filters` minus 16 in vp8dsp_neon.S). The AArch32 version of this code 
also doesn't use an offset parameter, rather, passing `subpel_filters-16` 
directly as the label argument.However, the Apple AArch64 implementation does 
have two paths for applying the offset — directly to the literal if it's 
positive, but via a `sub` instruction for a negative one. This is also true for 
the Windows implementation — whose existence overall is questionable, as 
Windows DLLs use a different relocation method, and PIC doesn't apply to them 
at all if I understand correctly; and also the `movrel` implementation for 
Windows with a non-negative offset is identical to the Linux version, minus the 
hardware-assisted ASan path on Linux.I'll likely play with the assembler later 
to see how negative offsets are interpreted, but still, is there a reason to 
emit the subtraction instruction that you can remember, or would it be safe to 
possibly even remove the offset argument completely?For `movrelx`, however, if 
it's needed at all, the offset argument is desirable as without PIC, it will be 
applied to the label literal for free.Thank you!— Tri
On Mon, 11 Jul 2022, Martin Storsjö wrote:> Hi,>> Thanks for your patch! While 
the patch probably is worthwhile to pursue, > ffmpeg does work on Andro

Re: [FFmpeg-devel] [PATCH] avcodec/aarch64: Access externs via GOT with PIC

2022-07-10 Thread Triang3l

oh no… I have no idea what's happening, all the newlines in the previous 
email were destroyed, sorry ugggh… let's try without disabling rich text in the 
Samsung client this time :/Hi Martin, thanks for a quick and detailed 
review!Oops, Thunderbird (or something else in the chain) has added an 
extraneous space in the beginning of every line, though the +- deltas seem to 
be unaffected. I can try sending the patch some other way if needed, like as a 
binary attachment, or via git send-email, but possibly the "^ " (without 
quotes) per-line regular expression should be fine for cleaning it up as well. 
This is my first patch submission to this repository, and overall to a mailing 
list, so I'm just starting learning the best practices… hi everyone!!!Yes, the 
original issue was likely a consequence of us using a completely different 
build process than FFmpeg is designed to be built with. We have our own Premake 
5 scripts for FFmpeg components, that are used for generation of Android 
ndk-build files via my Premake generator, Triang3l/premake-androidndk on 
GitHub. So yes, we don't have any link options imported from a file.I agree, it 
was very suprising to see that dynamic relocations were done between object 
files within a single shared library at all. I suspected ASLR at first, but I 
don't know if it can randomize the locations of individual object files this 
way, that's probably not the case though. I'll see what can be done to 
reproduce the effect of libavcodec.v — since my shared library is the final 
application logic, not a library that's actually shareable, none of the FFmpeg 
objects need to be exported from it at all. However, static libraries barely 
have anything to configure overall as far as I know, so disabling exports 
specifically for FFmpeg may be complicated — but thankfully, we can (and even 
should, to reduce the file size) use -fvisibility=hidden globally in our 
application, if that helps fix the issue. -Wl,-Bsymbolic should be fine too, 
and possibly even less intrusive.If using __attribute__((visibility("hidden"))) 
for the lookup tables prevents dynamic relocations from being inserted, and if 
that doesn't break common usages of libavcodec, yes, it should be a way better 
solution than introducing an additional memory load at runtime. I'll check if 
that works for us tomorrow or in a couple of days.If we're able to avoid using 
the global object table this way though, maybe it could be desirable to also 
get rid of `movrelx` in the AArch32 code as well?By the way, I'm also super 
confused by how the offset is applied in the local `movrel` currently, it looks 
very inconsistent. The `adrp` and `add` combination, as I understand its 
operation, should work for any 32-bit literal value, not specifically for 
addresses of known objects — `adrp` accepting the upper 20 bits as a literal 
and adding them to the PC, and then `add` just adding the lower 12 bits, the 
offset within the page, also taken as a literal.While `movrelx` most likely 
requires the offset to be applied explicitly using 0…4095 `add` or `sub` 
afterwards because it's first loaded from a lookup table of addresses of, if I 
understand correctly, actual objects in the code (I wouldn't expect it to be 
filled with object+8, object+16, object+24 pointers in case of a large object 
for loading time reasons, though I haven't researched this topic in detail), if 
everything `movrel` does is adding the PC to the input literal… do we even need 
to emit `sub` for negative offsets in it?The current Linux implementation of 
`movrel` just adds the offset directly to the label using + regardless of 
whether it's positive or negative, and there already are instances of negative 
offsets in the code (`subpel_filters` minus 16 in vp8dsp_neon.S). The AArch32 
version of this code also doesn't use an offset parameter, rather, passing 
`subpel_filters-16` directly as the label argument.However, the Apple AArch64 
implementation does have two paths for applying the offset — directly to the 
literal if it's positive, but via a `sub` instruction for a negative one. This 
is also true for the Windows implementation — whose existence overall is 
questionable, as Windows DLLs use a different relocation method, and PIC 
doesn't apply to them at all if I understand correctly; and also the `movrel` 
implementation for Windows with a non-negative offset is identical to the Linux 
version, minus the hardware-assisted ASan path on Linux.I'll likely play with 
the assembler later to see how negative offsets are interpreted, but still, is 
there a reason to emit the subtraction instruction that you can remember, or 
would it be safe to possibly even remove the offset argument completely?For 
`movrelx`, however, if it's needed at all, the offset argument is desirable as 
without PIC, it will be applied to the lab

Re: [FFmpeg-devel] [PATCH 1/2] libavutil: Add av_visibility_hidden for setting hidden symbol visibility

2022-07-11 Thread Triang3l
Yes, making everything except for av_ hidden by default would be more 
consistent with the current build process, which includes libavcodec.v. 
Though, this is a special case that results not only in increasing the 
shared object file size if libavcodec.v is not used, which is 
undesirable, yet harmless, but also in making the library not linkable 
with PIC at all unless those symbols are hidden or forced to be resolved 
at link time some other way.


Thanks for implementing the fix very quickly, by the way!

I'd also suggest writing a comment in the code describing specifically 
the original issue that the current instances of the usage of 
visibility("hidden") resolves, so the reason why it's used there is not 
forgotten, and there's a clear pattern of relation between movrel X() 
and av_visibility_hidden to follow when adding new assembly code. Though 
if the convention is to rely on `git blame` for this purpose, that 
shouldn't be necessary.


— Triang3l

On 11/07/2022 15:12, Henrik Gramner wrote:

On Mon, Jul 11, 2022 at 11:19 AM Martin Storsjö  wrote:

+#if (AV_GCC_VERSION_AT_LEAST(4,0) || defined(__clang__)) && (defined(__ELF__) 
|| defined(__MACH__))
+#define av_visibility_hidden __attribute__((visibility("hidden")))
+#else
+#define av_visibility_hidden
+#endif

The usual approach is to compile with -fvisibility=hidden and
explicitly flag exported API symbols.

Is there a reason for doing this the other way around?
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".



___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".