Re: AArch64 and -moutline-atomics
The 05/20/2020 09:02, Florian Weimer via Gcc wrote: > * Richard Henderson: > > On 5/19/20 3:38 AM, Florian Weimer via Gcc wrote: > >> One minor improvement would be to document __aarch64_have_lse_atomics as > >> interposable on the GCC side and define that directly in glibc, so that > >> lse-init.o is not linked in anymore and __aarch64_have_lse_atomics can > >> be initialized as soon as ld.so has the hwcap information. > > > > The __aarch64_have_lse_atomics symbol is not interposable. > > We use a direct pc-relative reference to it from each lse thunk. > > What I meant that users are allowed to supply their own definition in a > static link. Sorry, not sure what the correct terminology is here. I > don't think any code changes would be needed for that, it's just a > matter of documentation (and being careful about future evolution of the > code). are you proposing to put it in libc_nonshared.a/crt1.o? (and then ld.so would treat it specially when loading a module to initialize it early) or only dealing with it in libc.so, and let other modules still initialize it late (in case there are higher prio ctors or ifunc resolvers using atomics)?
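to make this concrete, a minimal sketch (not the actual libgcc/glibc code) of what a libc-side definition initialized from the hwcaps could look like; getauxval, AT_HWCAP and HWCAP_ATOMICS are the existing linux/aarch64 interfaces, the init hook name is made up:

#include <sys/auxv.h>   /* getauxval, AT_HWCAP; HWCAP_ATOMICS via the aarch64 hwcap header */

/* the lse thunks in libgcc reference this symbol pc-relatively, so the
   definition has to be statically linked into the same module as the
   thunks (libc_nonshared.a/crt1.o style) for the shared libc case
   discussed above.  */
_Bool __aarch64_have_lse_atomics;

/* hypothetical early init hook, run before any user code (ctors, ifunc
   resolvers) that might use the outlined atomics.  */
void
__init_have_lse_atomics_sketch (void)
{
  __aarch64_have_lse_atomics = (getauxval (AT_HWCAP) & HWCAP_ATOMICS) != 0;
}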
Re: Code size issues on FP-emulation on libgcc compared to LLVM's compiler_rt
On 01/07/15 16:34, Zinovy Nis wrote: > The only idea on size difference I have is: > > headers text in many of FP-emulation files from compiler_rt contains lines > like: > > // This file implements quad-precision soft-float addition ***with the > IEEE-754 default rounding*** (to nearest, ties to even). that is, compiler_rt assumes to-nearest rounding and no exception flags; in other words, it assumes no fenv access, which is one source of the size difference.
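for illustration, a small example of the kind of code that only works if the soft-float routines honour the dynamic rounding mode and exception flags; an implementation that hard-codes round-to-nearest and never raises flags is smaller but silently breaks it (only meaningful where long double arithmetic is emulated in software, e.g. aarch64 where long double is binary128; note gcc does not implement the FENV_ACCESS pragma, which is a separate, compiler-side limitation):

#include <fenv.h>
#include <stdio.h>

#pragma STDC FENV_ACCESS ON

int
main (void)
{
  volatile long double x = 1.0L, y = 0x1p-120L;
  fesetround (FE_UPWARD);
  long double z = x + y;  /* a conforming soft-float add must round up and raise FE_INEXACT */
  printf ("inexact=%d z=%La\n", fetestexcept (FE_INEXACT) != 0, z);
  return 0;
}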
Re: Testing and dynamic linking on remote target
On 09/07/15 16:56, David Talmage wrote: > I'm looking for a way to specify the LD_LIBRARY_PATH or LD_PRELOAD on the > target system when running one of the DejaGNU test suites. I'm testing a gcc > cross-compiler on a development board. I can't replace existing versions of > libraries under test because other people are using the development board > when > I'm testing. > > I found a thread about this in the archives: "Is anyone testing for a > (cross-) > target (board) with dynlinking?" > (https://gcc.gnu.org/ml/gcc/2008-02/msg00201.html). The best suggestion at > the > time was to NFS mount the cross-compiled library directory and use "-Wl,- > dynamic-linker -Wl,-rpath" in LDFLAGS. > > NFS mounting isn't an option for me, alas. > i think if you copy the libraries somewhere on the target then you can use -rpath-link=/libs/on/host -rpath=/libs/on/target (these are linker options, so they are passed as -Wl,-rpath-link=... and -Wl,-rpath=... through the gcc driver: -rpath-link tells the host linker where to resolve shared library dependencies at static link time, while -rpath records the run-time search path the dynamic linker uses on the target).
Re: Compiler support for erasure of sensitive data
* Zack Weinberg [2015-09-09 15:03:50 -0400]: > On 09/09/2015 02:02 PM, paul_kon...@dell.com wrote: > >> On Sep 9, 2015, at 1:54 PM, David Edelsohn > >> wrote: > >> > >> What level of erasure of sensitive data are you trying to ensure? > >> Assuming that overwriting values in the ISA registers actually > >> completely clears and destroys the values is delusionally naive. > > > > Could you point to some references about that? > > I *assume* David is referring to register renaming, which is not > architecturally visible... > or async signal handler copying all the register state on sigaltstack or internal counters and debug features making sensitive info observable or timing/cache-effect side channels that let other processes get info or compiling to a highlevel language (js) with different kind of leaks or running under emulator/debugger that can make secrets visible or... > I would consider data leaks via state inaccessible to a program > executing at the same privilege level as the code to be hardened to be > out of scope. (Which does mean that *when hardening an OS kernel* one specifying the info leak at the abstract c machine level is not useful (the memset is not observable there, unless you assign meaning to undefined behaviour which is a can of worms), but you do have to specify the leak on some abstraction level (that is applicable to the targets of a compiler and gives useful security properties in practice) otherwise the attribute is not meaningful. leaks can happen for many reasons that are layers below the control of the compiler, but still observable by high level code.
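for reference, this is roughly the technique existing code uses just to keep such a memset from being optimized away (a sketch of the idea behind explicit_bzero, not a complete solution: it says nothing about register copies, spills or the other leak channels listed above):

#include <string.h>
#include <stddef.h>

static void
erase_sketch (void *p, size_t n)
{
  memset (p, 0, n);
  /* pretend the memory may still be read, so the store is not dead in
     the abstract machine; this only constrains the compiler, nothing
     below it in the stack.  */
  __asm__ __volatile__ ("" : : "r" (p) : "memory");
}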
Re: Clarifying attribute-const
On 25/09/15 21:16, Eric Botcazou wrote: First, a belated follow-up to https://gcc.gnu.org/PR66512 . The bug is asking why attribute-const appears to have a weaker effect in C++, compared to C. The answer in that bug is that GCC assumes that attribute-const function can terminate by throwing an exception. FWIW there is an equivalent semantics in Ada: the "const" functions can throw and the language explicitly allows them to be CSEd in this case, etc. i think a throwing interface that may be moved around by the compiler makes reasoning about exception safety hard.. (i.e. the spec cannot be hand-wavy about the allowed optimizations). i guess the inconsistency stems from c++ making extern c apis throw by default (causing some amount of misery: in c one cannot throw nor declare something nothrow, so c api is pessimized in c++). That doesn't actually seem reasonable. Consider that C counterpart to throwing is longjmp; it seems to me that GCC should behave consistently: either assume that attribute-const may both longjmp and throw (I guess nobody wants that), or that it may not longjmp nor throw. Intuitively, if "const" means "free of side effects so that calls can be moved speculatively or duplicated", then non-local control flow transfer via throwing should be disallowed as well. This would pessimize a lot languages where exceptions are pervasive. i think if the language tends to allow catching invalid input to pure computation as an exception (e.g. division by zero) then throwing is preferred. if bad input is treated as undefined behavior then nothrow is preferred. In any case, it would be nice the intended compiler behavior could be explicitely stated in the manual. Agreed. +1
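a small example (with a hypothetical function) of why the throwing question matters for the allowed optimizations:

/* declared const: gcc may assume no side effects and no dependence on
   global memory, so the loop-invariant call below may be hoisted above
   the loop, i.e. it may run even when n == 0.  if the call can throw
   (or longjmp), that transformation changes observable behaviour.  */
__attribute__ ((const)) extern int checked_recip (int d); /* hypothetical, may throw on d == 0 in c++ */

int
scale_sum (const int *v, int n, int d)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += v[i] * checked_recip (d);  /* invariant call, candidate for hoisting */
  return s;
}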
Re: [RFC] Kernel livepatching support in GCC
On 22/10/15 10:23, libin wrote: From: Jiangjiji Date: Sat, 10 Oct 2015 15:29:57 +0800 Subject: [PATCH] * gcc/config/aarch64/aarch64.opt: Add a new option. * gcc/config/aarch64/aarch64.c: Add some new functions and Macros. * gcc/config/aarch64/aarch64.h: Modify PROFILE_HOOK and FUNCTION_PROFILER. this patch might be worth submitting to gcc-patches. i assume this is not redundant with respect to the nop-padding work. Signed-off-by: Jiangjiji Signed-off-by: Li Bin --- gcc/config/aarch64/aarch64.c | 23 +++ gcc/config/aarch64/aarch64.h | 13 - gcc/config/aarch64/aarch64.opt |4 3 files changed, 35 insertions(+), 5 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 752df4e..c70b161 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -440,6 +440,17 @@ aarch64_is_long_call_p (rtx sym) return aarch64_decl_is_long_call_p (SYMBOL_REF_DECL (sym)); } +void +aarch64_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED) +{ + if (flag_fentry) + { + fprintf (file, "\tmov\tx9, x30\n"); + fprintf (file, "\tbl\t__fentry__\n"); + fprintf (file, "\tmov\tx30, x9\n"); + } +} + you can even omit the mov x30,x9 at the call site if __fentry__ does stp x9,x30,[sp,#-16]! //... profiling ldp x30,x9,[sp],#16 ret x9 is there a problem with this? i think the rest of the patch means that -pg retains the old behaviour and -pg -mfentry emits this new entry. note that -pg rejects -fomit-frame-pointer (for no good reason), that should be fixed separately (it seems the kernel now relies on frame pointers on aarch64, but the mcount abi does not require this and e.g. the glibc mcount does not use it.) /* Return true if the offsets to a zero/sign-extract operation represent an expression that matches an extend operation. The operands represent the paramters from @@ -7414,6 +7425,15 @@ aarch64_emit_unlikely_jump (rtx insn) add_int_reg_note (insn, REG_BR_PROB, very_unlikely); } +/* Return true, if profiling code should be emitted before + * prologue. Otherwise it returns false. + * Note: For x86 with "hotfix" it is sorried. */ +static bool +aarch64_profile_before_prologue (void) +{ + return flag_fentry != 0; +} + /* Expand a compare and swap pattern. */ void @@ -8454,6 +8474,9 @@ aarch64_cannot_change_mode_class (enum machine_mode from, #undef TARGET_ASM_ALIGNED_SI_OP #define TARGET_ASM_ALIGNED_SI_OP "\t.word\t" +#undef TARGET_PROFILE_BEFORE_PROLOGUE +#define TARGET_PROFILE_BEFORE_PROLOGUE aarch64_profile_before_prologue + #undef TARGET_ASM_CAN_OUTPUT_MI_THUNK #define TARGET_ASM_CAN_OUTPUT_MI_THUNK \ hook_bool_const_tree_hwi_hwi_const_tree_true diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 77b2bb9..65e34fc 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -804,13 +804,16 @@ do { \ #define PROFILE_HOOK(LABEL) \ { \ rtx fun, lr; \ -lr = get_hard_reg_initial_val (Pmode, LR_REGNUM); \ -fun = gen_rtx_SYMBOL_REF (Pmode, MCOUNT_NAME); \ -emit_library_call (fun, LCT_NORMAL, VOIDmode, 1, lr, Pmode); \ + if (!flag_fentry) + { + lr = get_hard_reg_initial_val (Pmode, LR_REGNUM); \ + fun = gen_rtx_SYMBOL_REF (Pmode, MCOUNT_NAME); \ + emit_library_call (fun, LCT_NORMAL, VOIDmode, 1, lr, Pmode); \ + } } -/* All the work done in PROFILE_HOOK, but still required. */ -#define FUNCTION_PROFILER(STREAM, LABELNO) do { } while (0) +#define FUNCTION_PROFILER(STREAM, LABELNO) + aarch64_function_profiler(STREAM, LABELNO) /* For some reason, the Linux headers think they know how to define these macros. They don't!!! 
*/ diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt index 266d873..9e4b408 100644 --- a/gcc/config/aarch64/aarch64.opt +++ b/gcc/config/aarch64/aarch64.opt @@ -124,3 +124,7 @@ Enum(aarch64_abi) String(ilp32) Value(AARCH64_ABI_ILP32) EnumValue Enum(aarch64_abi) String(lp64) Value(AARCH64_ABI_LP64) + +mfentry +Target Report Var(flag_fentry) Init(0) +Emit profiling counter call at function entry immediately after prologue.
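for reference, a sketch of the __fentry__ entry suggested above, written as a gnu c top-level asm (not the kernel's actual implementation); with this the compiler could drop the mov x30,x9 at each call site:

__asm__ (
  ".global __fentry__\n"
  ".type __fentry__, %function\n"
  "__fentry__:\n"
  "  stp x9, x30, [sp, #-16]!\n"  /* x9 = instrumented function's original lr, x30 = return address back into it */
  "  // ... profiling code, free to clobber x30 ...\n"
  "  ldp x30, x9, [sp], #16\n"    /* restore the original lr into x30, return address into x9 */
  "  ret x9\n"                    /* return to the instrumented function, lr already restored */
);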
Re: arm64:, Re: [RFC] Kernel livepatching support in GCC
On 22/10/15 11:14, AKASHI Takahiro wrote: On 10/22/2015 06:07 PM, libin wrote: 在 2015/5/28 16:39, Maxim Kuvyrkov 写道: Our proposal is that instead of adding -mfentry/-mnop-count/-mrecord-mcount options to other architectures, we should implement a target-independent option -fprolog-pad=N, which will generate a pad of N nops at the beginning of each function and add a section entry describing the pad similar to -mrecord-mcount [1]. Since adding NOPs is much less architecture-specific then outputting call instruction sequences, this option can be handled in a target-independent way at least for some/most architectures. Comments? As I found out today, the team from Huawei has implemented [2], which follows x86 example of -mfentry option generating a hard-coded call sequence. I hope that this proposal can be easily incorporated into their work since most of the livepatching changes are in the kernel. Thanks very much for your effort for this, and the arch-independed implementation is very good to me, but only have one question that how to enture the atomic replacement of multi instructions in kernel side? I have one idea, but we'd better discuss this topic in, at least including, linux-arm-kernel. And before this arch-independed option, can we consider the arch-depended -mfentry implemention for arm64 like arch x86 firstly? I will post it soon. livepatch for arm64 based on this arm64 -mfentry feature on github: https://github.com/libin2015/livepatch-for-arm64.git master I also have my own version of livepatch support for arm64 using yet-coming "-fprolog-add=N" option :) As we discussed before, the main difference will be how we should preserve LR register when invoking a ftrace hook (ftrace_regs_caller). But again, this is a topic to discuss mainly in linux-arm-kernel. (I have no intention of excluding gcc ml from the discussions.) is -fprolog-add=N enough from gcc? i assume it solves the live patching, but i thought -mfentry might be still necessary when live patching is not used. or is the kernel fine with the current mcount abi for that? (note that changes the code generation in leaf functions and currently the kernel relies on frame pointers etc.)
Re: arm64:, Re: [RFC] Kernel livepatching support in GCC
On 23/10/15 10:11, AKASHI Takahiro wrote: On 10/22/2015 07:26 PM, Szabolcs Nagy wrote: On 22/10/15 11:14, AKASHI Takahiro wrote: I also have my own version of livepatch support for arm64 using yet-coming "-fprolog-add=N" option :) As we discussed before, the main difference will be how we should preserve LR register when invoking a ftrace hook (ftrace_regs_caller). But again, this is a topic to discuss mainly in linux-arm-kernel. (I have no intention of excluding gcc ml from the discussions.) is -fprolog-add=N enough from gcc? Yes, as far as I correctly understand this option. i assume it solves the live patching, but i thought -mfentry might be still necessary when live patching is not used. No. - Livepatch depends on ftrace's DYNAMIC_FTRACE_WITH_REGS feature - DYNAMIC_FTRACE_WITH_REGS can be implemented either with -fprolog-add=N or -mfentry - x86 is the only architecture that supports -mfentry AFAIK - and it is used in the kernel solely to implement this ftrace feature AFAIK - So once a generic option, fprolog-add=N, is supported, we have no reason to add arch-specific -mfentry. or is the kernel fine with the current mcount abi for that? (note that changes the code generation in leaf functions Can you please elaborate your comments in more details? I didn't get your point here. ok, i may be confused. i thought there is a static ftrace (functions are instrumented with mcount using -pg) and a dynamic one where the code is modified at runtime. then i thought adding -fprolog-pad=N would be good for the dynamic case, but not for the static case. the static case may need improvements too because the current way (using regular c call abi for mcount) affects code generation more significantly than the proposed -mfentry solution would (e.g. leaf functions turn into non-leaf ones). hence the question: is the kernel satisfied with -pg mcount for the static ftrace or does it want -mfentry behaviour instead?
Re: Linux-abi group
* H.J. Lu [2016-02-08 11:24:53 -0800]: > I created a mailing list to discuss Linux specific,.processor independent > modification and extension of generic System V Application Binary Interface: > > https://groups.google.com/d/forum/linux-abi > > I will start to document existing Linux extensions, like STT_GNU_IFUNC. > I will propose some new extensions soon. > seems to require a registered email address at google. (and the archive does not work from any console based browser or using direct http get tools.) the kernel seems to have a lot of mailing lists, may be they can handle this list too? thanks
Re: gnu-gabi group
On 15/02/16 16:03, H.J. Lu wrote: > On Mon, Feb 15, 2016 at 7:37 AM, Alexandre Oliva wrote: >> On Feb 12, 2016, Pedro Alves wrote: >> >> wonderful. I am not a big fan of google groups mailinglists, they seem to make it hard to subscribe and don't have easy to access archives. Having a local gnu-gabi group on sourceware.org would be better IMHO. >> >>> +1 >> >> +1 >> >> Since it's GNU tools we're talking about, we'd better use a medium that >> we've all already agreed to use, than one that a number of us objects >> to. I, for one, have closed my Google account several Valentine's Days >> ago, for privacy reasons, and this makes the archives of lists hidden >> there unusable for me. > > Please don't spread false information. Anyone can subscribe Linux-ABI > group and its archive is to open to everyone. You don't need a gmail account > for any of those. There are quite a few non-gmail users. You don't have > to take my word for it. I can add your email to Linux-ABI group and you > can check it out yourself :-). > you as a group admin can do that, others cannot join without creating a account at google (which requires the acceptance of the google tos etc). you also have censorship rights over others. even if you add users to the list they cannot access the archive through standard http or https, they need to allow google to execute javascript code on their machine. (so wget does not work). and the url through which you visit a post is not a reliable permanent link so linking to posts is hard. i think google groups is not an acceptable forum for discussing open standards publicly.
Re: gnu-gabi group
On 15/02/16 17:36, Mike Frysinger wrote: > On 15 Feb 2016 16:18, Szabolcs Nagy wrote: >> you as a group admin can do that, others cannot join >> without creating a account at google (which requires >> the acceptance of the google tos etc). > > that is annoying i didn't know about list+subscr...@googlegroups.com (thanks Florian and Joseph) >> you also have censorship rights over others. > > umm, every mailing list has that. Google Groups is no different. it's better if admin right is at some discussion related organization. (e.g. in case anything happens to H.J.Lu) >> even if you add users to the list they cannot access >> the archive through standard http or https, > > you're conflating things here. of course access is through "standard > http or https" -- that's the transport protocol that everyone has to > implement according to the standard in order to work. Goole is not > different here. the contents cannot be accessed with an http or https client. (unless you know the magic urls below) >> they need to allow google to execute javascript code on their >> machine. > > complaining that the web interface executes JS is a bit luddite-ish. some of us tend to browse the web from terminal (== no js). >> (so wget does not work). > > every message has a link to the raw message you can use to fetch the > mail directly. > > perm link: > https://groups.google.com/d/msg/x32-abi/IHmCJvigOEg/TyjZJYZ63DMJ redirects me to https://groups.google.com/forum/#!msg/x32-abi/IHmCJvigOEg/TyjZJYZ63DMJ > which has a link to the raw message: > https://groups.google.com/forum/message/raw?msg=x32-abi/IHmCJvigOEg/TyjZJYZ63DMJ i didn't know about this raw url, it seems there is https://groups.google.com/forum/print/msg/x32-abi/IHmCJvigOEg/TyjZJYZ63DMJ too, so if i always change the urls i can browse the archive. (this is not discoverable without js as far as i can see) with the +subscribe@ and the raw msg options i'm no longer against google groups hosting public discussions (provided the project documents these somewhere), i still prefer more accessible alternatives though. > it's actually nicer than mailmain (i.e. sourceware) as it doesn't do all > the trivial content mangling (s/@/ at/g). it's not like e-mail scrapers > today can't reverse that easily. > >> and the url through which you visit a post is not a >> reliable permanent link so linking to posts is hard. > > every post has a "link" option to get a perm link. needing the location > in the URL bar be the perm link is a weak (dumb imo) requirement. > -mike >
Re: Q: (d = NAN) != NAN?
On 08/04/16 11:09, Ulrich Windl wrote: > Probably I'm doing something wrong, but I have some problems comparing a > double with NAN: The value is NAN, but the test fails. Probably I should use > isnan(). yes, that's how ieee works: any comparison with a nan is unordered, so nan != nan is true and nan == nan is false; use isnan() for the test.
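for completeness, the portable way to write the intended test:

#include <math.h>

int
is_missing (double d)
{
  return isnan (d);       /* or d != d */
  /* return d == NAN; */  /* wrong: always 0, even when d is a NaN */
}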
Re: Preventing preemption of 'protected' symbols in GNU ld 2.26 [aka should we revert the fix for 65248]
On 19/04/16 09:20, Richard Biener wrote: > On Tue, Apr 19, 2016 at 7:08 AM, Alan Modra wrote: >> On Mon, Apr 18, 2016 at 07:59:50AM -0700, H.J. Lu wrote: >>> On Mon, Apr 18, 2016 at 7:49 AM, Alan Modra wrote: On Mon, Apr 18, 2016 at 11:01:48AM +0200, Richard Biener wrote: > To summarize: there is currently no testcase for a wrong-code issue > because there is no wrong-code issue. >> >> I've added a testcase at >> https://sourceware.org/bugzilla/show_bug.cgi?id=19965#c3 >> that shows the address problem (&x != x) with older gcc *or* older >> glibc, and shows the program behaviour problem with current >> binutils+gcc+glibc. > > Thanks. > > So with all this it sounds that current protected visibility is just broken > and we should forgo with it, making it equal to default visibility? > the test cases pass for me on musl libc, it's just a glibc dynamic linker bug that it does not handle extern protected visibility correctly. > At least I couldn't decipher a solution that solves all of the issues > with protected visibility apart from trying to error at link-time > (or runtime?) for the cases that are tricky (impossible?) to solve. > > glibc uses "protected visibility" via its using of local aliases, correct? > But it doesn't use anything like that for data symbols? > > Richard. > >> -- >> Alan Modra >> Australia Development Lab, IBM >
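for reference, a sketch of the address problem (two translation units shown together, mirroring the bugzilla test case linked above rather than adding anything new): if the executable gets a copy relocation for x while the dso keeps using its protected definition, the two modules can disagree about the address (and contents) of x unless the toolchain or the dynamic linker handles this case.

/* dso.c, built with -fPIC -shared */
__attribute__ ((visibility ("protected"))) int x = 1;
int *
dso_addr_of_x (void)
{
  return &x;
}

/* main.c, linked non-PIC against the dso */
extern int x;
extern int *dso_addr_of_x (void);
int
main (void)
{
  return dso_addr_of_x () == &x ? 0 : 1;  /* may return 1: the two modules see different addresses */
}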
Re: SafeStack proposal in GCC
On 13/04/16 14:01, Cristina Georgiana Opriceana wrote: > I bring to your attention SafeStack, part of a bigger research project > - CPI/CPS [1], which offers complete protection against stack-based > control flow hijacks. i think it does not provide complete protection. it cannot instrument the c runtime or dsos and attacks can be retried on a forking server which has fixed memory layout, so there is still significant attack surface. (it would be nice if security experts made such claims much more carefully). > In GCC, we propose a design composed of an instrumentation module > (implemented as a GIMPLE pass) and a runtime library. ... > The runtime support will have to deal with unsafe stack allocation - a > hook in the pthread create/destroy functions to create per-thread > stack regions. This runtime support might be reused from the Clang > implementation. the SafeStack runtime in compiler-rt has various issues that should be clearly documented. it seems the runtime * aborts the process on allocation failure. * deallocates the unsafe stack using tsd dtors, but signal handlers may run between dtors and the actual thread exit.. without a mapped unsafe stack. * determines the main stack with broken heuristic (since the rlimit can change at runtime i don't think this is possible to do correctly in general). * interposes pthread_create but not c11 thrd_create so conforming c11 code will crash. (same for non-standard usage of raw clone.) * sigaltstack and swapcontext are broken too. i think the runtime issues are more likely to cause problems than the compiler parts: it has to be reliable and abi stable since safestack is advertised for production use. (i think gcc should raise the bar for runtime code quality higher than that, but there is precedent for much worse runtimes in gcc so this should not block the safestack porting work, however consider these issues when communicating about it to upstream or to potential users.)
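for readers unfamiliar with the scheme, a rough hand-written equivalent of what the instrumentation does (a sketch, not the actual pass output; __safestack_unsafe_stack_ptr is the thread-local pointer name used by the llvm/compiler-rt implementation, shown here as char * for the pointer arithmetic):

extern __thread char *__safestack_unsafe_stack_ptr;

void
f (void (*use) (char *))
{
  /* an address-taken, overflowable local is moved off the native stack,
     so an overflow of it cannot reach return addresses or spill slots.  */
  char *saved_sp = __safestack_unsafe_stack_ptr;
  char *buf = saved_sp - 128;
  __safestack_unsafe_stack_ptr = buf;
  use (buf);
  __safestack_unsafe_stack_ptr = saved_sp;  /* restored on function exit */
}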
GCC 6 symbol poisoning and c++ header usage is fragile
building gcc6 using musl based gcc6 fails with symbol poisoning error (details at the end of the mail). the root cause is c++: c++ headers include random libc headers with _GNU_SOURCE ftm so all sorts of unexpected symbols are defined/declared. since it's unlikely the c++ standard gets fixed (to properly specify the namespace rules) it is not acceptable to include std headers after system.h, where the poisoning happens, because trivial libc header change will break the build. c++ header use in gcc seems inconsistent, e.g. there are cases where - is included before system.h (go-system.h) - is included after system.h (indirectly through ) - is included in system.h because INCLUDE_STRING is defined. - is included in system.h and in source files using it.. sometimes i think it should be consistently before system.h (i'm not sure what's going on with the INCLUDE_STRING macro), fortunately not many files are affected in gcc/: auto-profile.c diagnostic.c graphite-isl-ast-to-gimple.c ipa-icf.c ipa-icf-gimple.c pretty-print.c toplev.c i can prepare a patch moving the c++ includes up and i'm open to other suggestions. (including libc headers is also problematic because of _GNU_SOURCE, but still safer than what is happening in c++ land where #include makes all locale.h, pthread.h, time.h, sched.h, etc symbols visible). x86_64-linux-musl-g++ -fno-PIE -c -g -O2 -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -DHAVE_CONFIG_H -I. -I. -I/src/gcc/gcc -I/src/gcc/gcc/. -I/src/gcc/gcc/../include -I/src/gcc/gcc/../libcpp/include -I/build/host-tools/include -I/build/host-tools/include -I/build/host-tools/include -I/src/gcc/gcc/../libdecnumber -I/src/gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I/src/gcc/gcc/../libbacktrace -o auto-profile.o -MT auto-profile.o -MMD -MP -MF ./.deps/auto-profile.TPo /src/gcc/gcc/auto-profile.c In file included from /tool/x86_64-linux-musl/include/pthread.h:30:0, from /tool/x86_64-linux-musl/include/c++/6.0.1/x86_64-linux-musl/bits/gthr-default.h:35, from /tool/x86_64-linux-musl/include/c++/6.0.1/x86_64-linux-musl/bits/gthr.h:148, from /tool/x86_64-linux-musl/include/c++/6.0.1/ext/atomicity.h:35, from /tool/x86_64-linux-musl/include/c++/6.0.1/bits/basic_string.h:39, from /tool/x86_64-linux-musl/include/c++/6.0.1/string:52, from /tool/x86_64-linux-musl/include/c++/6.0.1/stdexcept:39, from /tool/x86_64-linux-musl/include/c++/6.0.1/array:39, from /tool/x86_64-linux-musl/include/c++/6.0.1/tuple:39, from /tool/x86_64-linux-musl/include/c++/6.0.1/bits/stl_map.h:63, from /tool/x86_64-linux-musl/include/c++/6.0.1/map:61, from /src/gcc/gcc/auto-profile.c:36: /tool/x86_64-linux-musl/include/sched.h:74:7: error: attempt to use poisoned "calloc" void *calloc(size_t, size_t); ^ /tool/x86_64-linux-musl/include/sched.h:114:36: error: attempt to use poisoned "calloc" #define CPU_ALLOC(n) ((cpu_set_t *)calloc(1,CPU_ALLOC_SIZE(n))) ^
Re: GCC 6 symbol poisoning and c++ header usage is fragile
On 21/04/16 12:36, Richard Biener wrote: > On Thu, Apr 21, 2016 at 1:11 PM, Szabolcs Nagy wrote: >> building gcc6 using musl based gcc6 fails with symbol poisoning error >> (details at the end of the mail). >> >> the root cause is c++: c++ headers include random libc headers with >> _GNU_SOURCE ftm so all sorts of unexpected symbols are defined/declared. >> >> since it's unlikely the c++ standard gets fixed (to properly specify >> the namespace rules) it is not acceptable to include std headers after >> system.h, where the poisoning happens, because trivial libc header >> change will break the build. >> >> c++ header use in gcc seems inconsistent, e.g. there are cases where >> - is included before system.h (go-system.h) >> - is included after system.h (indirectly through ) >> - is included in system.h because INCLUDE_STRING is defined. >> - is included in system.h and in source files using it.. sometimes >> >> i think it should be consistently before system.h (i'm not sure what's >> going on with the INCLUDE_STRING macro), fortunately not many files are >> affected in gcc/: > > system headers should be included from _within_ system.h. To avoid including > them everywhere we use sth like > > /* Include before "safe-ctype.h" to avoid GCC poisoning >the ctype macros through safe-ctype.h */ > > #ifdef __cplusplus > #ifdef INCLUDE_STRING > # include > #endif > #endif > > so sources do > > #define INCLUDE_STRING > #include "config.h" > #include "system.h" > > So the cases can be simplified with INCLUDE_STRING and the > case should be added similarly (unless we decide is cheap > enough to be always included). > is always included already. there is also , , usage and go-system.h is special. (and gmp.h includes when built with c++) so i can prepare a patch with INCLUDE_{MAP,SET,LIST} and remove the explicit libc/libstdc++ includes. > Richard. > >> auto-profile.c >> diagnostic.c >> graphite-isl-ast-to-gimple.c >> ipa-icf.c >> ipa-icf-gimple.c >> pretty-print.c >> toplev.c >> >> i can prepare a patch moving the c++ includes up and i'm open to >> other suggestions. (including libc headers is also problematic because >> of _GNU_SOURCE, but still safer than what is happening in c++ land >> where #include makes all locale.h, pthread.h, time.h, sched.h, >> etc symbols visible). >> >> >> x86_64-linux-musl-g++ -fno-PIE -c -g -O2 -DIN_GCC -fno-exceptions >> -fno-rtti -fasynchronous-unwind-tables >> -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual >> -Wmissing-format-attribute -Woverloaded-virtual -pedantic >> -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings >> -DHAVE_CONFIG_H -I. -I. -I/src/gcc/gcc >> -I/src/gcc/gcc/. 
-I/src/gcc/gcc/../include -I/src/gcc/gcc/../libcpp/include >> -I/build/host-tools/include >> -I/build/host-tools/include -I/build/host-tools/include >> -I/src/gcc/gcc/../libdecnumber >> -I/src/gcc/gcc/../libdecnumber/dpd -I../libdecnumber >> -I/src/gcc/gcc/../libbacktrace -o auto-profile.o -MT >> auto-profile.o -MMD -MP -MF ./.deps/auto-profile.TPo >> /src/gcc/gcc/auto-profile.c >> In file included from /tool/x86_64-linux-musl/include/pthread.h:30:0, >> from >> /tool/x86_64-linux-musl/include/c++/6.0.1/x86_64-linux-musl/bits/gthr-default.h:35, >> from >> /tool/x86_64-linux-musl/include/c++/6.0.1/x86_64-linux-musl/bits/gthr.h:148, >> from >> /tool/x86_64-linux-musl/include/c++/6.0.1/ext/atomicity.h:35, >> from >> /tool/x86_64-linux-musl/include/c++/6.0.1/bits/basic_string.h:39, >> from /tool/x86_64-linux-musl/include/c++/6.0.1/string:52, >> from /tool/x86_64-linux-musl/include/c++/6.0.1/stdexcept:39, >> from /tool/x86_64-linux-musl/include/c++/6.0.1/array:39, >> from /tool/x86_64-linux-musl/include/c++/6.0.1/tuple:39, >> from >> /tool/x86_64-linux-musl/include/c++/6.0.1/bits/stl_map.h:63, >> from /tool/x86_64-linux-musl/include/c++/6.0.1/map:61, >> from /src/gcc/gcc/auto-profile.c:36: >> /tool/x86_64-linux-musl/include/sched.h:74:7: error: attempt to use poisoned >> "calloc" >> void *calloc(size_t, size_t); >>^ >> /tool/x86_64-linux-musl/include/sched.h:114:36: error: attempt to use >> poisoned "calloc" >> #define CPU_ALLOC(n) ((cpu_set_t *)calloc(1,CPU_ALLOC_SIZE(n))) >> ^ >> >
Re: GCC 6 symbol poisoning and c++ header usage is fragile
On 21/04/16 12:52, Jonathan Wakely wrote: > On 21 April 2016 at 12:11, Szabolcs Nagy wrote: >> the root cause is c++: c++ headers include random libc headers with >> _GNU_SOURCE ftm so all sorts of unexpected symbols are defined/declared. > > Yes, I'd really like to be able to stop defining _GNU_SOURCE > unconditionally. It needs some libstdc++ and glibc changes for that to > happen, I'll be looking at it for gcc 7. > > >> since it's unlikely the c++ standard gets fixed (to properly specify >> the namespace rules) > > Fixed how? What's wrong with the rules? (I'd like to understand what's > wrong here before I try to change anything, and I don't understand the > comment above). > posix has "namespace rules" specifying what symbols are reserved for the implementation when certain headers are included. (it's not entirely trivial, i have a collected list http://port70.net/~nsz/c/posix/reserved.txt http://port70.net/~nsz/c/posix/README.txt i use for testing musl headers, glibc also does such namespace checks.) e.g. the declared function names in a header are reserved to be defined as macros. c++ does not specify how its headers interact with posix headers except for a few c standard headers where it requires no macro definition for functions (and imposes some other requirements on the libc like being valid c++ syntax, using extern "C" where appropriate etc). so from a libc implementor's point of view, including libc headers into c++ code is undefined behaivour (neither posix nor c++ specifies what should happen). without a specification libc headers just piling #ifdef __cplusplus hacks when ppl run into problems. e.g. c++ code uses ::pthread_equal(a,b), but musl used a macro for pthread_equal (the only sensible implementation is (a)==(b), this has to be suppressed for c++, which now uses an extern call to do the same), i'm also pretty sure a large number of c++ code would break if unistd.h defined "read", "write", "link" etc as macros, since these are often used as method names in c++, but this would be a conforming libc implementation.
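the pthread_equal case spelled out (a sketch of the conflict, not code from any particular libc):

/* in a libc header (allowed by the posix namespace rules): */
#define pthread_equal(a, b) ((a) == (b))

/* c++ user code doing
       bool same = ::pthread_equal (t1, t2);
   preprocesses to the ill-formed "::((t1) == (t2))", so the libc has to
   suppress the macro for c++ and fall back to an extern call.  the same
   kind of breakage would hit code using read/write/link as method names
   if unistd.h defined those as macros, which a libc could conformingly do.  */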
Re: SafeStack proposal in GCC
On 09/05/16 22:45, Michael Matz wrote: > On Mon, 9 May 2016, Rich Felker wrote: > >>> Done. I never understood why they left in the hugely unuseful >>> {sig,}{set,long}jmp() but removed the actually useful *context() >>> (amended somehow like above). >> >> Because those are actually part of the C language > > Sure. Same QoI bug in my book. (And I'm not motivated enough to find out > if the various C standards weren't just following POSIX whe setjmp was > included, or really the other way around). > >> (the non-sig versions, but the sig versions are needed to work around >> broken unices that made the non-sig versions save/restore signal mask >> and thus too slow to ever use). They're also much more useful for >> actually reasonable code (non-local exit across functions that were >> badly designed with no error paths) > > Trivially obtainable with getcontext/setcontext as well. > >> as opposed to just nasty hacks that >> are mostly/entirely UB anyway (coroutines, etc.). > > Well, we differ in the definition of reasonable :) And I certainly don't > see any material difference in undefined behaviour between both classes of > functions. Both are "special" regarding compilers (e.g. returning > multiple times) and usage. But as the *jmp() functions can be implemented > with *context(), but not the other way around, it automatically follows no, no, no, don't try to present getcontext as equal to setjmp, getcontext is broken while setjmp is just ugly. setjmp is defined so that the compiler can treat it specially and the caller has to make sure certain objects are volatile, cannot appear in arbitrary places (e.g. in the declaration of a vla), longjmp must be in same thread etc. all those requirements that make setjmp implementible at all were missing from the getcontext specs, so you can call it through a function pointer and access non-volatile modified local state after the second return, etc. (the compiler treating "getcontext" specially is a hack, not justified by any standard.) i think both gccgo and qemu can setcontext into another thread, so when getcontext returns all tls object addresses are wrong.. the semantics of this case was not properly defined anywhere (and there are implementation internal objects with thread local storage duration like fenv so this matters even if the caller does not use tls). this is unlikely to work correctly with whatever safestack implementation. setcontext were originally specified to be able to use the ucontext from async signal handlers.. this turned out to be problematic for several reasons (kernel saved ucontext is different from user space ucontext and sigaltstack needs special treatment). if setcontext finishes executing the last linked context in the main thread it was not clearly specified what cleanups will be performed. there is just a never ending list of issues with these apis, so unless there is an actual proposal how to tighten their specification, any caller of the context apis rely on undefined semantics. > (to me!) that the latter are more useful, if for nothing else than basic > building blocks. (there are coroutine libs that try to emulate a real > makecontext with setjmp/longjmp on incapable architectures. As this is > impossible for all corner cases they are broken and generally awful on > them :) ) > > > Ciao, > Michael. >
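to make the asymmetry concrete, here is the setjmp rule that makes it implementable: non-volatile automatic objects modified between setjmp and longjmp are indeterminate after the second return, so the compiler is allowed to keep them in registers that longjmp does not restore. getcontext/setcontext have the same register problem but no wording that permits it.

#include <setjmp.h>
#include <stdio.h>

int
main (void)
{
  jmp_buf env;
  int i = 0;  /* not volatile: indeterminate after longjmp */
  if (setjmp (env) == 0)
    {
      i = 1;
      longjmp (env, 1);
    }
  printf ("%d\n", i);  /* may legitimately print 0 or 1 */
  return 0;
}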
Re: LTO and undefined reference to typeinfo
On 23/05/16 12:36, MM wrote: > Hello, > > g++ (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6) > GNU gold (version 2.25-17.fc23) 1.11 > I successfully link a executable in debug mode (-std=c++11 -g) but not in > release mode (-std=c++11 -flto -O3). All sources are compiled with the same > option. Shared libraries are used. > The compiler driver is used to launch the final link line: > /bin/c++-std=c++11 -Wno-multichar -O3 -DNDEBUG -flto > -o -rdynamic Wl,-rpath, > > These are the errors I see (only in release, not in debug): > ... [clone .constprop.79]: error: undefined reference to > 'typeinfo for market [clone .lto_priv.1353]' > > Both the debug and release version of the object referencing this show the > same with gcc-nm: > > U typeinfo for market > Note this bit " [clone .lto_priv.1353]" is not in the symbol at all. > > This is what gcc-nm says for the object where the symbol is defined > (market.cpp.o, which is part of libmarkets.so): > > 1. In DEBUG > gcc-nm -C market.cpp.o | grep 'typeinfo for market' > V typeinfo for market > > 2. In RELEASE > gcc-nm -C market.cpp.o | grep 'typeinfo for market' > W typeinfo for market > This is the one that fails. > Given the versions of gcc and ld, the default behaviour for lto should be > straightforward? > Any ideas what's going on? > typeinfo seems to be a weak object symbol which is known to be broken with lto, so this may be related to: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69271 > Thanks > MM >
Re: LTO and undefined reference to typeinfo
On 23/05/16 14:24, MM wrote: > On 23 May 2016 at 12:53, Szabolcs Nagy wrote: >> On 23/05/16 12:36, MM wrote: >>> Hello, >>> >>> g++ (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6) >>> GNU gold (version 2.25-17.fc23) 1.11 >>> I successfully link a executable in debug mode (-std=c++11 -g) but not in >>> release mode (-std=c++11 -flto -O3). All sources are compiled with the same >>> option. Shared libraries are used. >>> The compiler driver is used to launch the final link line: >>> /bin/c++-std=c++11 -Wno-multichar -O3 -DNDEBUG -flto >>> -o -rdynamic Wl,-rpath, >>> >>> These are the errors I see (only in release, not in debug): >>> ... [clone .constprop.79]: error: undefined reference to >>> 'typeinfo for market [clone .lto_priv.1353]' >>> >>> Both the debug and release version of the object referencing this show the >>> same with gcc-nm: >>> >>> U typeinfo for market >>> Note this bit " [clone .lto_priv.1353]" is not in the symbol at all. >>> >>> This is what gcc-nm says for the object where the symbol is defined >>> (market.cpp.o, which is part of libmarkets.so): >>> >>> 1. In DEBUG >>> gcc-nm -C market.cpp.o | grep 'typeinfo for market' >>> V typeinfo for market >>> >>> 2. In RELEASE >>> gcc-nm -C market.cpp.o | grep 'typeinfo for market' >>> W typeinfo for market >>> This is the one that fails. >>> Given the versions of gcc and ld, the default behaviour for lto should be >>> straightforward? >>> Any ideas what's going on? >>> >> >> typeinfo seems to be a weak object symbol >> which is known to be broken with lto, so >> this may be related to: >> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69271 >> > > Is it a workaround to not compile the referencing cpp and the referred > cpp without lto, yet compile all the rest and link with lto? > Otherwise, I'll turn off LTO until that bug is resolved. it is not clear if this the same issue as pr692771, so i think you should submit a bug report with test code if possible. > > Thanks >
Re: Should we import gnulib under gcc/ or at the top-level like libiberty?
On 23/06/16 12:18, Pedro Alves wrote: > gdb doesn't put that gnulib wrapper library at the top level, mainly > just because of history -- we didn't always have that wrapper > library -- and the fact that gdb/gdbserver/ itself is not at top > level either, even though it would be better moved to top level. > > See this long email, explaining how the current gdb's gnulib import > is set up: > > https://sourceware.org/ml/gdb-patches/2012-04/msg00426.html > > I suggest gcc reuses the whole of gdb's wrapper library and scripts: > > > https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=tree;f=gdb/gnulib;h=cdf326774716ae427dc4fb47c9a410fcdf715563;hb=HEAD > > ... but put it in the top level instead. if both gcc and binutils used a toplevel gnulib directory then shared tree build would have the same problem as libiberty has now: gcc and binutils can depend on different versions of libiberty and then the build can fail. as far as i know the shared tree build is the only way to build a toolchain without install (using in tree binutils) and it would be nice to fix that use case.
Re: GCC libatomic ABI specification draft
On 17/11/16 20:12, Bin Fan wrote: > > Although this ABI specification specifies that 16-byte properly aligned > atomics are inlineable on platforms > supporting cmpxchg16b, we document the caveats here for further discussion. > If we decide to change the > inlineable attribute for those atomics, then this ABI, the compiler and the > runtime implementation should be > updated together at the same time. > > > The compiler and runtime need to check the availability of cmpxchg16b to > implement this ABI specification. > Here is how it would work: The compiler can get the information either from > the compiler flags or by > inquiring the hardware capabilities. When the information is not available, > the compiler should assume that > cmpxchg16b instruction is not supported. The runtime library implementation > can also query the hardware > compatibility and choose the implementation at runtime. Assuming the user > provides correct compiler options with this abi the runtime implementation *must* query the hardware (because there might be inlined cmpxchg16b in use in another module on a hardware that supports it and the runtime must be able to sync with it). currently gcc libatomic does not guarantee this which is dangerously broken: if gcc is configured with --disable-gnu-indirect-function (or on targets without ifunc support: solaris, bsd, android, musl,..) the compiler may inline cmpxchg16b in one translation unit but use incompatible runtime function in another. there is PR 70191 but this issue has wider scope. > and the inquiry returns the correct information, on a platform that supports > cmpxchg16b, the code generated > by the compiler will both use cmpxchg16b; on a platform that does not support > cmpxchg16b, the code generated > by the compiler, including the code generated for a generic platform, always > call the support function, so > there is no compatibility problem.
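a sketch of the interoperability hazard (not from the libatomic sources): if the access in one module is inlined as cmpxchg16b while the access in another goes through a libatomic build that falls back to internal locks, the two do not synchronize even though they name the same object.

#include <stdatomic.h>

typedef struct { long lo, hi; } pair;   /* 16 bytes on lp64, 16-byte aligned as _Atomic */
extern _Atomic pair shared;

/* translation unit compiled so that the 16-byte atomic is inlined: */
void
writer (pair v)
{
  atomic_store (&shared, v);
}

/* translation unit whose access becomes an __atomic_load_16 call into a
   libatomic that uses an internal lock table: it cannot synchronize with
   the inlined sequence above.  */
pair
reader (void)
{
  return atomic_load (&shared);
}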
Re: GCC libatomic ABI specification draft
On 20/12/16 13:26, Ulrich Weigand wrote: > Torvald Riegel wrote: >> On Fri, 2016-12-02 at 12:13 +0100, Gabriel Paubert wrote: >>> On Thu, Dec 01, 2016 at 11:13:37AM -0800, Bin Fan at Work wrote: Thanks for the comment. Yes, the ABI requires libatomic must query the hardware. This is necessary if we want the compiler to generate inlined code for 16-byte atomics. Note that this particular issue only affects x86. >>> >>> Why? Power (at least recent ones) has 128 bit atomic instructions >>> (lqarx/stqcx.) and Z has 128 bit compare and swap. >> >> That's not the only factor affecting whether cmpxchg16b or such is used >> for atomics. If the HW just offers a wide CAS but no wide atomic load, >> then even an atomic load is not truly just a load, which breaks (1) >> atomic loads on read-only mapped memory and (2) volatile atomic loads >> (unless we claim that an idempotent store is like a load, which is quite >> a stretch for volatile I think). > > I may have missed the context of the discussion, but just on the > specific ISA question here: both Power and z not only have the > 16-byte CAS (or load-and-reserve/store-conditional), but they also both > have specific 16-byte atomic load and store instructions (lpq/stpq > on z, lq/stq on Power). > > Those are available on any system supporting z/Architecture (z900 and up), > and on any Power system supporting the V2.07 ISA (POWER8 and up). GCC > does in fact use those instructions to implement atomic operations on > 16-byte data types on those machines. that's a bug. at least i don't see how gcc makes sure the libatomic calls can interoperate with inlined atomics.
Re: GCC libatomic ABI specification draft
On 22/12/16 17:37, Segher Boessenkool wrote: > We do not always have all atomic instructions. Not all processors have > all, and it depends on the compiler flags used which are used. How would > libatomic know what compiler flags are used to compile the program it is > linked to? > > Sounds like a job for multilibs? x86_64 uses ifunc dispatch to always use atomic instructions if available (which is bad because ifunc is not supported on all platforms). either such runtime feature detection and dispatch is needed in libatomic or different abis have to be supported (with the usual hassle).
Re: .../lib/gcc//7.1.1/ vs. .../lib/gcc//7/
On 06/01/17 12:48, Jakub Jelinek wrote: > SUSE and some other distros use a hack that omits the minor and patchlevel > versions from the directory layout, just uses the major number, it is very what is the benefit?
Re: .../lib/gcc//7.1.1/ vs. .../lib/gcc//7/
On 06/01/17 13:11, Jakub Jelinek wrote: > On Fri, Jan 06, 2017 at 01:07:23PM +0000, Szabolcs Nagy wrote: >> On 06/01/17 12:48, Jakub Jelinek wrote: >>> SUSE and some other distros use a hack that omits the minor and patchlevel >>> versions from the directory layout, just uses the major number, it is very >> >> what is the benefit? > > Various packages use the paths to gcc libraries/includes etc. in various > places (e.g. libtool, *.la files, etc.). So any time you upgrade gcc it is a bug that gcc installs libtool la files, because a normal cross toolchain is relocatable but the la files have abs path in them. that would be nice to fix, so build scripts don't have to manually delete the bogus la files. > (say from 6.1.0 to 6.2.0 or 6.2.0 to 6.2.1), everything that has those paths > needs to be rebuilt. By having only the major number in the paths (which is > pretty much all that matters), you only have to rebuild when the major > version of gcc changes (at which time one usually want to mass rebuild > everything anyway). i thought only the gcc driver needs to know these paths because there are no shared libs there that are linked into binaries so no binary references those paths so nothing have to be rebuilt.
weak pthread symbols in libgcc/gthr-posix.h cause issues
the weakref magic in libgcc/gthr-posix.h is not guaranteed to work which can at least break libstdc++ with static linking and dlopen there are several bugs here: - fallback code (unknown posix systems) should assume multi-threaded application instead of using a fragile threadedness test - determining threadedness with weak symbols is broken for dynamic loading and static linking as well (dlopened library can pull in pthread dependency at runtime, and with static linking a symbol does not indicate the availability of another) - using symbols through weak references at runtime is wrong with static linking (it just happens to work with hacks that put a single .o into libpthread.a) see this analysis for more details and crashing example code: http://www.openwall.com/lists/musl/2014/10/18/5 the static linking issue there was fixed by unconditionally disabling the weak symbols in libgcc/gthr.h when building the toolchain: #define GTHREAD_USE_WEAK 0 i sent this report to the libstdc++ list first but got redirected here: https://gcc.gnu.org/ml/libstdc++/2014-11/msg00122.html the static linking issue there was worked around by using linker flags '-Wl,--whole-archive -lpthread -Wl,--no-whole-archive' i think upstream should fix this properly
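the fragile pattern reduced to a sketch (gthr-posix.h uses weakref aliases, this shows the underlying idea): the address of a weak reference is used to guess whether libpthread is present, which is unreliable with static linking and with dlopen as described above.

#include <pthread.h>

#pragma weak pthread_mutex_lock

static int
multithreaded_guess (void)
{
  /* null if no libpthread definition was linked in; with static linking
     this depends on which archive members happened to be pulled in, and
     a later dlopen can make the process multi-threaded anyway.  */
  return pthread_mutex_lock != 0;
}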
Re: RFC: Creating a more efficient sincos interface
On 13/09/18 14:52, Florian Weimer wrote: > On 09/13/2018 03:27 PM, Wilco Dijkstra wrote: >> Hi, >> >> The existing sincos functions use 2 pointers to return the sine and cosine >> result. In >> most cases 4 memory accesses are necessary per call. This is inefficient and >> often >> significantly slower than returning values in registers. I ran a few >> experiments on the >> new optimized sincosf implementation in GLIBC using the following interface: >> >> __complex__ float sincosf2 (float); >> >> This has 50% higher throughput and a 25% reduction in latency on Cortex-A72 >> for >> random inputs in the range +-PI/4. Larger inputs take longer and thus have >> lower >> gains, but there is still a 5% gain on the (rarely used) path with full >> range reduction. >> Given sincos is used in various HPC applications this can give a worthwile >> speedup. > > I think this is totally fine if you call it expif or something like that (and > put the sine in the imaginary part, of course). > gcc seems to have a __builtin_cexpif https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/builtins.c;h=58ea7475ef7bb2a8abad2463b896efaa8fd79650;hb=HEAD#l2439 but i dont see it documented, may be we can add an actual cexpif symbol with the above signature? > In general, I would object to using complex numbers for arbitrary pairs, but > this doesn't apply to this case. > > Thanks, > Florian
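a sketch of the proposed interface in terms of the existing gnu sincosf (cos in the real part, sin in the imaginary part, i.e. e^(i*x) as suggested above; sincosf2 does not exist in glibc, and the real gain would come from implementing it natively so both results stay in registers, not from wrapping):

#define _GNU_SOURCE
#include <complex.h>
#include <math.h>

__complex__ float
sincosf2 (float x)
{
  float s, c;
  sincosf (x, &s, &c);   /* existing gnu interface, results via pointers */
  return CMPLXF (c, s);  /* both results returned in registers */
}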
Re: TLSDESC clobber ABI stability/futureproofness?
On 11/10/18 04:53, Alexandre Oliva wrote: > On Oct 10, 2018, Rich Felker wrote: >> For aarch64 at least, according to discussions I had with Szabolcs >> Nagy, there is an intent that any new extensions to the aarch64 >> register file be treated as clobbered by tlsdesc functions, rather >> than preserved. > > That's unfortunate. I'm not sure I understand the reasoning behind this > intent. Maybe we should discuss it further? > sve registers overlap with existing float registers so float register access clobbers them. so new code is suddenly not compatible with existing tlsdesc entry points in the libc. i think extensions should not cause such abi break. we could mark binaries so they fail to load on an old system instead of failing randomly at runtime, but requiring new libc for a new system is suboptimal (you cannot deploy stable linux distros then). if we update the libc then the tlsdesc entry has to save/restore all sve regs, which can be huge state (cause excessive stack usage), but more importantly suddenly the process becomes "sve enabled" even if it otherwise does not use sve at all (linux kernel keeps track of which processes use sve instructions, ones that don't can enter the kernel more quickly as the sve state does not have to be saved) i don't see a good solution for this, but in practice it's unlikely that user code would need tls access and sve together much, so it seems reasonable to just add sve registers to tlsdesc call clobber list and do the same for future extensions too (tlsdesc call will not be worse than a normal indirect call). (in principle it's possible that tlsdesc entry avoids using any float regs, but in practice that requires hackery in the libc: marking every affected translation units with -mgeneral-regs-only or similar)
Re: Parallelize the compilation using Threads
On 15/11/18 10:29, Richard Biener wrote: > In my view (I proposed the thing) the most interesting parts are > getting GCCs global state documented and reduced. The parallelization > itself is an interesting experiment but whether there will be any > substantial improvement for builds that can already benefit from make > parallelism remains a question. in the common case (project with many small files, much more than core count) i'd expect a regression: if gcc itself tries to parallelize that introduces inter thread synchronization and potential false sharing in gcc (e.g. malloc locks) that does not exist with make parallelism (glibc can avoid some atomic instructions when a process is single threaded).
Re: autovectorization in gcc
On 10/01/2019 08:19, Richard Biener wrote: > On Wed, 9 Jan 2019, Jakub Jelinek wrote: > >> On Wed, Jan 09, 2019 at 11:10:25AM -0500, David Malcolm wrote: >>> extern void vf1() >>> { >>>#pragma vectorize enable >>>for ( int i = 0 ; i < 32768 ; i++ ) >>> data [ i ] = std::sqrt ( data [ i ] ) ; >>> } >>> >>> Compiling on this x86_64 box with -fopt-info-vec-missed shows the >> >>> _7 = .SQRT (_1); >>> if (_1 u>= 0.0) >>> goto ; [99.95%] >>> else >>> goto ; [0.05%] >>> >>>[local count: 1062472912]: >>> goto ; [100.00%] >>> >>>[local count: 531495]: >>> __builtin_sqrtf (_1); >>> >>> I'm not sure where that control flow came from: it isn't in >>> sqrt-test.cc.104t.stdarg >>> but is in >>> sqrt-test.cc.105t.cdce >>> so I think it's coming from the argument-range code in cdce. >>> >>> Arguably the location on the statement is wrong: it's on the loop >>> header, when it presumably should be on the std::sqrt call. >> >> See my either mail, it is the result of the -fmath-errno default, >> the inline emitted sqrt doesn't handle errno setting and we emit >> essentially x = sqrt (arg); if (__builtin_expect (arg < 0.0, 0)) sqrt (arg); >> where >> the former sqrt is inline using HW instructions and the latter is the >> library call. >> >> With some extra work we could vectorize it; e.g. if we make it handle >> OpenMP #pragma omp ordered simd efficiently, it would be the same thing >> - allow non-vectorizable portions of vectorized loops by doing there a >> scalar loop from 0 to vf-1 doing the non-vectorizable stuff + drop the >> limitation >> that the vectorized loop is a single bb. Essentially, in this case it would >> be >> vec1 = vec_load (data + i); >> vec2 = vec_sqrt (vec1); >> if (__builtin_expect (any (vec2 < 0.0))) >> { >> for (int i = 0; i < vf; i++) >> sqrt (vec2[i]); >> } >> vec_store (data + i, vec2); >> If that would turn to be way too hard, we could for the vectorization >> purposes hide that into the .SQRT internal fn, say add a fndecl argument to >> it if it should treat the exceptional cases some way so that the control >> flow isn't visible in the vectorized loop. > > If we decide it's worth the trouble I'd rather do that in the epilogue > and thus make the any (vec2 < 0.0) a reduction. Like > >smallest = min(smallest, vec1); > > and after the loop do the errno thing on the smallest element. > > That said, this is a transform that is probably worthwhile even > on scalar code, possibly easiest to code-gen right from the start > in the call-dce pass. if this is useful other than errno handling then fine, but i think it's a really bad idea to add optimization complexity because of errno handling: nobody checks errno after sqrt (other than conformance test code). -fno-math-errno is almost surely closer to what the user wants than trying to vectorize the errno handling.
Vector Function ABI specifications for AArch64 update
Arm released an update (2019Q1.1) of the Vector Function ABI specifications for AArch64 that uses the `declare variant` directive from OpenMP 5.0 to support user defined vector functions. The mechanism is introduced in chapter 4, and it is in beta release status to allow feedback from the open source community. The mechanism also allows declaring SVE and AdvSIMD vector functions independently which is not possible with the current OpenMP and attribute(simd) support in gcc. Feedback needs to be provided at arm.eabi (at) arm.com by end of June 16th (AOE). https://developer.arm.com/tools-and-software/server-and-hpc/arm-architecture-tools/arm-compiler-for-hpc/vector-function-abi Thanks.
Re: aarch64 TLS optimizations?
On 17/05/2019 14:51, Tom Horsley wrote: > I'm trying (for reason too complex to go into) to > locate the TLS offset of the tcache_shutting_down > variable from malloc in the ubuntu provided > glibc on aarch64 ubuntu 18.04. > > Various "normal" TLS variables appear to operate > much like x86_64 with a GOT table entry where the > TLS offset of the variable gets stashed. this is more of a glibc question than a gcc one (i.e. libc-help list would be better). tls in glibc uses the initial-exec tls access model, (tls object is at a fixed offset from tp across threads), that requires a GOT entry for the offset which is set up via a R_*_TPREL dynamic reloc at startup time. (note: if a symbol is internal to the module its TPREL reloc is not tied to a symbol, it only has an addend for the offset within the module) > But in the ubuntu glibc there is no GOT entry for > that variable, and disassembly of the code shows > that it seems to "just know" the offset to use. i see adrp+ldr sequences that access GOT entries. e.g. in the objdump of libc.so.6: 000771d0 <__libc_malloc@@GLIBC_2.17>: ... 77400: f6c0adrpx0, 152000 77404: f9470c00ldr x0, [x0, #3608] 77408: d53bd041mrs x1, tpidr_el0 you can verify that 0x152000 + 3608 == 0x152e18 is indeed a GOT entry (falls into .got) and there is a 00152e18 R_AARCH64_TLS_TPREL64 *ABS*+0x0010 dynamic relocation for that entry as expected. (but i don't know which symbol this entry is for, only that the symbol must be a local tls sym) > Is there some kind of magic TLS optimization that > can happen for certain variables on aarch64? I'm trying > to understand how it could know the offset like > it appears to do in the code. there is no magic.
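for reference, the source-level equivalent of the access pattern; compiling something like this with -O2 -fPIC -ftls-model=initial-exec (the model glibc uses internally, as noted above) produces the adrp+ldr of a GOT slot plus mrs tpidr_el0 sequence seen in the disassembly:

/* hypothetical stand-in for the malloc-internal tls variable */
static __thread char tcache_shutting_down;

char
get_flag (void)
{
  /* roughly:
       mrs  x1, tpidr_el0
       adrp x0, :gottprel:tcache_shutting_down
       ldr  x0, [x0, #:gottprel_lo12:tcache_shutting_down]
       ldrb w0, [x1, x0]
     with an R_AARCH64_TLS_TPREL64 dynamic reloc on the GOT slot.  */
  return tcache_shutting_down;
}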
Re: aarch64 TLS optimizations?
On 20/05/2019 16:59, Tom Horsley wrote: > On Mon, 20 May 2019 15:43:53 + > Szabolcs Nagy wrote: > >> you can verify that 0x152000 + 3608 == 0x152e18 is >> indeed a GOT entry (falls into .got) and there is a >> >> 00152e18 R_AARCH64_TLS_TPREL64 *ABS*+0x0010 > > There are a couple of other TLS variables in malloc, and I > suspect this is one of them, where it is actually looking > at tcache_shutting_down (verified with debug info and disassembly), > it is simply using the tpidr_el0 value still laying around > in the register from the 1st TLS reference and loading > tcache_shutting_down from an offset which appears for all the > world to simply be hard coded, no GOT reference involved. > > I suppose at some point I'll be forced to understand how to build > glibc from the ubuntu source package so I can see exactly > what options and ifdefs are used and check the relocations in > the malloc.o file from before it is incorporated with libc.so in my build of malloc.os in glibc in the symtab i see 84: 0 TLS LOCAL DEFAULT 10 .LANCHOR3 85: 8 TLS LOCAL DEFAULT 10 thread_arena 86: 0008 8 TLS LOCAL DEFAULT 10 tcache 87: 0010 1 TLS LOCAL DEFAULT 10 tcache_shutting_down and the R_*_TLSIE_* relocs are for .LANCHOR3 + 0, so there will be one GOT entry for the 3 objects and you should see tp + got_value + (0 or 8 or 16) address computation to access the 3 objects. e.g. in __malloc_arena_thread_freeres i see 4e04: d53bd056mrs x22, tpidr_el0 4e08: 9015adrpx21, 0 <_dl_tunable_set_mmap_threshold> 4e08: R_AARCH64_TLSIE_ADR_GOTTPREL_PAGE21 .LANCHOR3 4e0c: f94002b5ldr x21, [x21] 4e0c: R_AARCH64_TLSIE_LD64_GOTTPREL_LO12_NC .LANCHOR3 4e10: a90153f3stp x19, x20, [sp, #16] 4e14: 8b1502c0add x0, x22, x21 // x0 = tp + got_value 4e18: f9400414ldr x20, [x0, #8] // read from tcache 4e1c: f9001bf7str x23, [sp, #48] 4e20: b4000234cbz x20, 4e64 <__malloc_arena_thread_freeres+0x6c> 4e24: 52800021mov w1, #0x1// #1 4e28: 91010293add x19, x20, #0x40 4e2c: 91090297add x23, x20, #0x240 4e30: f900041fstr xzr, [x0, #8] // write to tcache 4e34: 39004001strbw1, [x0, #16] // write to tchace_shutting_down i doubt ubuntu changed this, but if the offset is a fixed const in the binary that means they moved that variable into the glibc internal pthread struct (which is at a fixed offset from tp).
[AArch64 ELF ABI] Vector calls and lazy binding on AArch64
The lazy binding code of aarch64 currently only preserves q0-q7 of the fp registers, but for an SVE call [AAPCS64+SVE] it should preserve p0-p3 and z0-z23, and for an AdvSIMD vector call [VABI64] it should preserve q0-q23. (Vector calls are extensions of the base PCS [AAPCS64].)

A possible fix is to save and restore the additional register state in the lazy binding entry code, this was discussed in

https://sourceware.org/ml/libc-alpha/2018-08/msg00017.html

the main objections were

(1) Linux may optimize the kernel entry code for processes that don't use SVE, so lazy binding should avoid accessing SVE registers.

(2) If this is fixed in the dynamic linker, vector calls will not be backward compatible with old glibc.

(3) The saved SVE register state can be large (> 8K), so binaries that work today may run out of stack space on an SVE system during lazy binding (which can e.g. happen in a signal handler on a tiny stack).

and the proposed solution was to force bind now semantics for vector functions e.g. by not calling them via PLT. This turned out to be harder than I expected.

I no longer think (1) and (2) are critically important, but (3) is a correctness issue which is hard to argue away (it would require larger stack allocations to accommodate the worst case stack size increase, but the stack allocation is not always under the control of glibc, so it cannot provide strict guarantees).

Some approaches to make symbols "bind now" were discussed at

https://groups.google.com/forum/#!topic/generic-abi/Bfb2CwX-u4M

The ABI change draft is below the notes, it requires marking symbols in the ELF symbol table that follow the vector PCS (or other variant PCS conventions). This is most relevant to dynamic linkers with lazy binding support and to ELF linkers targeting AArch64, but assemblers will need to be updated too.

Note 1: the dynamic linker may have to run user code during lazy binding because of ifunc resolvers, so it cannot avoid clobbering fp regs.

Note 2: the tlsdesc entry is also affected by (3), so either the initial DTV setup should avoid clobbering fp regs or the SVE register state should not be callee-preserved by the tlsdesc call ABI (the latter was chosen, which is backward compatible with old dynamic linkers, but tls access from SVE code is as expensive as an extern call now: the caller has to spill).

Note 3: signal frame and SVE register spills in code using SVE can also lead to variable stack usage (AT_MINSIGSTKSZ was introduced to address the former issue on linux) so it is a valid approach to just increase min stack size limits on aarch64 compared to other targets (this is less invasive, but does not fix old binaries).

Note 4: the proposal requires marking symbols in asm and elf objects, so it is not compatible with existing tooling (old as or ld cannot create valid vector function symbol references or definitions) and it is only effective with a new dynamic linker.

Note 5: -fno-plt style code generation for vector function calls might have worked too, but on aarch64 it requires compiler and linker changes to avoid PLT in position dependent code when that is emitted for the sake of pointer equality. It also requires tightening the ABI to ensure the static linker does not introduce PLT when processing certain static relocations. This approach would generate suboptimal static linked code (the no-plt code is hard to relax into direct calls on aarch64), would be fragile (easy to accidentally introduce a PLT) and would be hard to diagnose.
Note 6: the proposed solution applies to both SVE calls and AdvSIMD vector calls, even though some issues only apply to SVE.

Note 7: a separate dynamic linker entry point for variant PCS calls may be introduced (requires further ELF changes for a PLT0 like stub) or the dynamic linker may decide to always preserve all registers or decide to always bind symbols at load time.

AAELF64: in the Symbol Table section add

st_other Values

The st_other member of a symbol table entry specifies the symbol's visibility in the lowest 2 bits. The top 6 bits are unused in the generic ELF ABI [SCO-ELF], and while there are no values reserved for processor-specific semantics, many other architectures have used these bits.

The defined processor-specific st_other flag values are listed in Table 4-5-1.

Table 4-5-1, Processor specific st_other flags

+-------------------------+------+----------------------+
| Name                    | Mask | Comment              |
+-------------------------+------+----------------------+
| STO_AARCH64_VARIANT_PCS | 0x80 | The function         |
|                         |      | associated with the  |
|                         |      | symbol may follow a  |
|                         |      | variant procedure    |
|                         |      | call standard with   |
|                         |      | different register   |
|                         |      | usage convention.    |
+-------------------------+------+----------------------+

A symbol table entry that is marked with the STO_AARCH64_VARIANT_PCS flag set in its st_other field may be associated with a function that follows a variant procedure call standard with different register usage convention from the one defined in the base procedure call standard for the list of argument, caller-saved and callee-saved registers [AAPCS64]. The rules in the Call and Jump relocations section still apply to such functions, and if a subroutine is called via a symbol reference that is marked with STO_AARCH64_VARIANT_PCS then code that runs between the calling routine and called subroutine must preserve the contents of all registers except IP0, IP1 and the condition code flags [AAPCS64].
Re: [AArch64 ELF ABI] Vector calls and lazy binding on AArch64
On 22/05/2019 16:06, Florian Weimer wrote:
> * Szabolcs Nagy:
>
>> AAELF64: in the Symbol Table section add
>>
>> st_other Values
>> The st_other member of a symbol table entry specifies the symbol's
>> visibility in the lowest 2 bits. The top 6 bits are unused in the
>> generic ELF ABI [SCO-ELF], and while there are no values reserved for
>> processor-specific semantics, many other architectures have used these
>> bits.
>>
>> The defined processor-specific st_other flag values are listed in
>> Table 4-5-1.
>>
>> Table 4-5-1, Processor specific st_other flags
>> +-------------------------+------+----------------------+
>> | Name                    | Mask | Comment              |
>> +-------------------------+------+----------------------+
>> | STO_AARCH64_VARIANT_PCS | 0x80 | The function         |
>> |                         |      | associated with the  |
>> |                         |      | symbol may follow a  |
>> |                         |      | variant procedure    |
>> |                         |      | call standard with   |
>> |                         |      | different register   |
>> |                         |      | usage convention.    |
>> +-------------------------+------+----------------------+
>>
>> A symbol table entry that is marked with the STO_AARCH64_VARIANT_PCS
>> flag set in its st_other field may be associated with a function that
>> follows a variant procedure call standard with different register
>> usage convention from the one defined in the base procedure call
>> standard for the list of argument, caller-saved and callee-saved
>> registers [AAPCS64]. The rules in the Call and Jump relocations
>> section still apply to such functions, and if a subroutine is called
>> via a symbol reference that is marked with STO_AARCH64_VARIANT_PCS
>> then code that runs between the calling routine and called subroutine
>> must preserve the contents of all registers except IP0, IP1 and the
>> condition code flags [AAPCS64].
>
> Can you clarify if there has to be a valid stack at this point which can
> be used during the call transfer? What about the stack alignment
> requirement?

the intention is to only allow 'register usage convention' to be relaxed compared to the base PCS (which has rules for stack etc), and even the register usage convention has to be compatible with the 'Call and Jump relocations section' which essentially says that veneers inserted by the linker between calls can clobber IP0, IP1 and the condition flags.

i.e. a variant pcs function follows the same rules as base pcs, but it may use different caller-/callee-saved/argument registers.

when SVE pcs is merged into the current AAPCS document, then i hope the 'variant pcs' term used here will be properly specified so the ELF ABI will just refer back to that.
Re: [AArch64 ELF ABI] Vector calls and lazy binding on AArch64
On 22/05/2019 16:34, Florian Weimer wrote: > * Szabolcs Nagy: > >> On 22/05/2019 16:06, Florian Weimer wrote: >>> * Szabolcs Nagy: >>> >>>> AAELF64: in the Symbol Table section add >>>> >>>> st_other Values >>>> The st_other member of a symbol table entry specifies the symbol's >>>> visibility in the lowest 2 bits. The top 6 bits are unused in the >>>> generic ELF ABI [SCO-ELF], and while there are no values reserved for >>>> processor-specific semantics, many other architectures have used these >>>> bits. >>>> >>>> The defined processor-specific st_other flag values are listed in >>>> Table 4-5-1. >>>> >>>> Table 4-5-1, Processor specific st_other flags >>>> ++--+-+ >>>> |Name| Mask | Comment | >>>> ++--+-+ >>>> |STO_AARCH64_VARIANT_PCS | 0x80 | Thefunction | >>>> || | associated with the | >>>> || | symbol may follow a | >>>> || | variant procedure | >>>> || | call standard with | >>>> || | different register | >>>> || | usage convention. | >>>> ++--+-+ >>>> >>>> A symbol table entry that is marked with the STO_AARCH64_VARIANT_PCS >>>> flag set in its st_other field may be associated with a function that >>>> follows a variant procedure call standard with different register >>>> usage convention from the one defined in the base procedure call >>>> standard for the list of argument, caller-saved and callee-saved >>>> registers [AAPCS64]. The rules in the Call and Jump relocations >>>> section still apply to such functions, and if a subroutine is called >>>> via a symbol reference that is marked with STO_AARCH64_VARIANT_PCS >>>> then code that runs between the calling routine and called subroutine >>>> must preserve the contents of all registers except IP0, IP1 and the >>>> condition code flags [AAPCS64]. >>> >>> Can you clarify if there has to be a valid stack at this point which can >>> be used during the call transfer? What about the stack alignment >>> requirement? >> >> the intention is to only allow 'register usage convention' to be >> relaxed compared to the base PCS (which has rules for stack etc), >> and even the register usage convention has to be compatible with >> the 'Call and Jump relocations section' which essentially says that >> veneers inserted by the linker between calls can clobber IP0, IP1 >> and the condition flags. >> >> i.e. a variant pcs function follows the same rules as base pcs, but >> it may use different caller-/callee-saved/argument regiseters. >> >> when SVE pcs is merged into the current AAPCS document, then i hope >> the 'variant pcs' term used here will be properly specified so the >> ELF ABI will just refer back to that. > > My concern is that with the current language, it's not clear whether > it's possible to use the stack as a scratch area during the call > transition, or rely on a valid TCB. I think this is rather > underspecified. i think that's underspecified in general for normal calls too, currently the glibc dynamic linker assumes it can use some stack space and do various async signal safe operations (some of which may even fail), variant pcs does not change any of this. it only provides a per symbol escape hatch for functions with a bit special call convention, and i plan to use the symbol marking in glibc as 'force bind now for these symbols', because other behaviour may not be forward compatible if the architecture changes again (if lazy binding turns out to be very important for these symbols i'd prefer introducing a second entry point for them instead of checking the elf flags from the entry asm). 
i'll try to post patches implementing this abi soon.
Re: [AArch64 ELF ABI] Vector calls and lazy binding on AArch64
On 22/05/2019 15:42, Szabolcs Nagy wrote:
> [AAELF64]: ELF for the Arm 64-bit Architecture (AArch64)
>   https://developer.arm.com/docs/ihi0056/latest
> [VABI64]: Vector Function ABI Specification for AArch64
>   https://developer.arm.com/tools-and-software/server-and-hpc/arm-architecture-tools/arm-compiler-for-hpc/vector-function-abi

the new ABI has been published with minor wording changes compared to the draft version. the ABI is implemented in gcc, binutils and glibc in a series of patches listed below.

gcc:

commit 779640c76d37b32f4d8a7b97637ed9e345d750b4
Commit:     nsz
CommitDate: 2019-06-03 13:50:53 +0000
    aarch64: emit .variant_pcs for aarch64_vector_pcs symbol references
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@271869 138bc75d-0d04-0410-961f-82ee72b054a4

commit d403a7711c2cf9a7a4892d76b875a1c99a690f89
Commit:     nsz
CommitDate: 2019-06-04 16:16:52 +0000
    aarch64: fix asm visibility for extern symbols
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@271913 138bc75d-0d04-0410-961f-82ee72b054a4

commit 042371f341a956de8c76557df700ebdc1af9ab4f
Commit:     nsz
CommitDate: 2019-06-18 11:11:07 +0000
    aarch64: fix gcc.target/aarch64/pcs_attribute-2.c on non-gnu targets
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272414 138bc75d-0d04-0410-961f-82ee72b054a4

binutils:

commit 2301ed1c9af1316b4bad3747d2b03f7d44940f87
Commit:     Szabolcs Nagy
CommitDate: 2019-05-24 15:05:57 +0100
    aarch64: add STO_AARCH64_VARIANT_PCS and DT_AARCH64_VARIANT_PCS

commit f166ae0188dcb89c5ae925034260a708a254ab2f
Commit:     Szabolcs Nagy
CommitDate: 2019-05-24 15:07:42 +0100
    aarch64: handle .variant_pcs directive in gas

commit 0b4eac57c44ec4c9e13f5201b40936c3b3e6c639
Commit:     Szabolcs Nagy
CommitDate: 2019-05-24 15:09:06 +0100
    aarch64: override default elf .set handling in gas

commit 823710d5856996d1f54f04ecb2f7647aeae99b5b
Commit:     Szabolcs Nagy
CommitDate: 2019-05-24 15:11:00 +0100
    aarch64: handle STO_AARCH64_VARIANT_PCS in bfd

commit 65f381e729bedb933f3e1376e7f53f0ff63ac9a8
Commit:     Szabolcs Nagy
CommitDate: 2019-05-28 12:03:51 +0100
    aarch64: fix variant_pcs ld tests

glibc:

commit 55f82d328d2dd1c7c13c1992f4b9bf9c95b57551
Commit:     Szabolcs Nagy
CommitDate: 2019-06-13 09:44:44 +0100
    aarch64: add STO_AARCH64_VARIANT_PCS and DT_AARCH64_VARIANT_PCS

commit 82bc69c012838a381c4167c156a06f4598f34227
Commit:     Szabolcs Nagy
CommitDate: 2019-06-13 09:45:00 +0100
    aarch64: handle STO_AARCH64_VARIANT_PCS
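for completeness, a small sketch of how this is used from c (the function names are made up): declaring a function with the vector PCS attribute makes gcc emit .variant_pcs for its symbol, the static linker then sets DT_AARCH64_VARIANT_PCS when there are PLT relocations against such symbols, and the glibc dynamic linker binds them at load time instead of lazily:

#include <arm_neon.h>

/* callee uses the vector PCS register convention (more SIMD registers
   are callee-saved than in the base PCS) */
__attribute__ ((aarch64_vector_pcs)) float64x2_t vec_fn (float64x2_t);

float64x2_t
call_it (float64x2_t x)
{
  return vec_fn (x);   /* the reference gets STO_AARCH64_VARIANT_PCS */
}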
Re: Implicit function declarations and GCC 10
On 04/07/2019 12:27, Florian Weimer wrote:
> Implicit function declarations were removed from C99, more than twenty
> years ago. So far, GCC only warns about them because there were too
> many old configure scripts where an error would lead to incorrect
> configure check failures.
>
> I can try to fix the remaining configure scripts in Fedora and submit
> the required changes during this summer and fall.
>
> I would appreciate if GCC 10 refused to declare functions implicitly by
> default.

+1

> According to my observations, lack of an error diagnostic has turned
> into a major usability issue. For bugs related to pointer truncation,
> we could perhaps change the C front end to produce a hard error if an
> int value returned from an implicitly declared function is converted to
> a pointer. But the other case involves functions defined as returning
> _Bool, and the result is used in a boolean context. The x86-64 ABI only
> requires that the lowest 8 bits of the return value are defined, so an
> implicit int results in int values which incorrectly compare as unequal
> to zero.
>
> Given that the pointer truncation issue is only slightly more common
> than the _Bool issue, I don't think the diagnostic improvement for
> pointers would be very helpful, and we should just transition to errors.
>
> Implicit int we should remove as well. Checking configure scripts for
> both issues at the same time would not be much more work.

+1 for making implicit int an error by default.

> Thanks,
> Florian
Re: PPC64 libmvec implementation of sincos
On 27/09/2019 20:23, GT wrote: > I am attempting to create a vector version of sincos for PPC64. > The relevant discussion thread is on the GLIBC libc-alpha mailing list. > Navigate it beginning at > https://sourceware.org/ml/libc-alpha/2019-09/msg00334.html > > The intention is to reuse as much as possible from the existing GCC > implementation of other libmvec functions. > My questions are: Which function(s) in GCC; > > 1. Gather scalar function input arguments, from multiple loop iterations, > into a single vector input argument for the vector function version? > 2. Distribute scalar function outputs, to appropriate loop iteration result, > from the single vector function output result? > > I am referring especially to vectorization of sin and cos. i wonder if gcc can auto-vectorize scalar sincos calls, the vectorizer seems to want the calls to have no side-effect, but attribute pure or const is not appropriate for sincos (which has no return value but takes writable pointer args) "#pragma omp simd" on a loop seems to work but i could not get unannotated sincos loops to vectorize. it seems it would be nice if we could add pure/const somehow (maybe to the simd variant only? afaik openmp requires no sideeffects for simd variants, but that's probably only for explicitly marked loops?)
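for reference, the kind of loop i mean (a sketch): with the '#pragma omp simd' annotation and -fopenmp-simd (plus glibc's simd declarations in math.h) it vectorizes for me, without the annotation it does not:

#define _GNU_SOURCE
#include <math.h>

void
foo (double *restrict s, double *restrict c, const double *restrict x, int n)
{
#pragma omp simd
  for (int i = 0; i < n; i++)
    sincos (x[i], &s[i], &c[i]);   /* writes through pointers, so neither
                                      pure nor const is appropriate */
}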
Re: PPC64 libmvec implementation of sincos
On 30/09/2019 18:30, GT wrote: > ‐‐‐ Original Message ‐‐‐ > On Monday, September 30, 2019 9:52 AM, Szabolcs Nagy > wrote: > >> On 27/09/2019 20:23, GT wrote: >> >>> I am attempting to create a vector version of sincos for PPC64. >>> The relevant discussion thread is on the GLIBC libc-alpha mailing list. >>> Navigate it beginning at >>> https://sourceware.org/ml/libc-alpha/2019-09/msg00334.html >>> The intention is to reuse as much as possible from the existing GCC >>> implementation of other libmvec functions. >>> My questions are: Which function(s) in GCC; >>> >>> 1. Gather scalar function input arguments, from multiple loop iterations, >>> into a single vector input argument for the vector function version? >>> 2. Distribute scalar function outputs, to appropriate loop iteration >>> result, from the single vector function output result? >>> >>> I am referring especially to vectorization of sin and cos. >> >> i wonder if gcc can auto-vectorize scalar sincos >> calls, the vectorizer seems to want the calls to >> have no side-effect, but attribute pure or const >> is not appropriate for sincos (which has no return >> value but takes writable pointer args) > > 1. Do you mean whether x86_64 already does auto-vectorize sincos? any current target with simd attribute or omp delcare simd support. > 2. Where in the code do you see the vectorizer require no side-effect? i don't know where it is in the code, but __attribute__((simd)) float foo (float); void bar (float *restrict a, float *restrict b) { for(int i=0; i<4000; i++) a[i] = foo (b[i]); } is not vectorized, however it gets vectorized if i add __attribute__((const)) to foo OR if i add '#pragma omp simd' to the loop and compile with -fopenmp-simd. (which makes sense to me: you don't want to vectorize if you don't know the side-effects, otoh, there is no attribute to say that i know there will be no side-effects in functions taking pointer arguments so i don't see how sincos can get vectorized) >> "#pragma omp simd" on a loop seems to work but i >> could not get unannotated sincos loops to vectorize. >> >> it seems it would be nice if we could add pure/const >> somehow (maybe to the simd variant only? afaik openmp >> requires no sideeffects for simd variants, but that's >> probably only for explicitly marked loops?) > > 1. Example 1 and Example 2 at https://sourceware.org/glibc/wiki/libmvec show > the 2 different > ways to activate auto-vectorization. When you refer to "unannotated sincos", > which of > the 2 techniques do you mean? example 1 annotates the loop with #pragma omp simd. (and requires -fopenmp-simd cflag to work) example 2 is my goal where -ftree-vectorize is enough without annotation. > 2. Which function was auto-vectorized by "pragma omp simd" in the loop? see example above.
Re: Commit messages and the move to git
On 19/11/2019 23:44, Joseph Myers wrote: > I do think "Related to PR N (description)" or similar is a good > summary line to insert where the present summary line is just a ChangeLog > date/author line. i agree.
Re: -fpatchable-function-entry should set SHF_WRITE and create one __patchable_function_entries per function
On 07/01/2020 07:25, Fangrui Song wrote: > On 2020-01-06, Fangrui Song wrote: >> The addresses of NOPs are collected in a section named >> __patchable_function_entries. >> A __patchable_function_entries entry is relocated by a symbolic relocation >> (e.g. R_X86_64_64, R_AARCH64_ABS64, R_PPC64_ADDR64). >> In -shared or -pie mode, the linker will create a dynamic relocation >> (non-preemptible: relative relocation (e.g. R_X86_64_RELATIVE); >> preemptible: symbolic relocation (e.g. R_X86_64_64)). >> >> In either case, the section contents will be modified at runtime. >> Thus, the section should have the SHF_WRITE flag to avoid text relocations >> (DF_TEXTREL). pie/pic should either imply writable __patchable_function_entries, or __patchable_function_entries should be documented to be offsets from some base address in the module: the users of it have to modify .text and do lowlevel hacks so they should be able to handle such arithmetics. i think it's worth opening a gcc bug report. >> When -ffunction-sections is used, ideally GCC should emit one >> __patchable_function_entries (SHF_LINK_ORDER) per .text.foo . >> If the corresponding .text.foo is discarded (--gc-sections, COMDAT, >> /DISCARD/), the linker can discard the associated >> __patchable_function_entries. This can be seen as a lightweight COMDAT >> section group. (A section group adds an extra section and costs 3 words) >> Currently lld (LLVM linker) has implemented such SHF_LINK_ORDER collecting >> features. GNU ld and gold don't have the features. >> >> I have summarized the feature requests in this post >> https://sourceware.org/ml/binutils/2019-11/msg00266.html >> >> gcc -fpatchable-function-entry=2 -ffunction-sections -c a.c >> >> [ 4] .text.f0 PROGBITS 40 09 00 >> AX 0 0 1 >> ### No W flag >> ### One __patchable_function_entries instead of 3. >> [ 5] __patchable_function_entries PROGBITS 49 >> 18 00 A 0 0 1 >> [ 6] .rela__patchable_function_entries RELA >> 000280 48 18 I 13 5 8 >> [ 7] .text.f1 PROGBITS 61 09 00 >> AX 0 0 1 >> [ 8] .text.f2 PROGBITS 6a 09 00 >> AX 0 0 1 > > Emitting a __patchable_function_entries for each function may waste > object file sizes (64 bytes per function on ELF64). If zeros are > allowed, emitting a single __patchable_function_entries should be fine. > > If we do want to emit unique sections, the condition should be either > -ffunction-sections or COMDAT is used. again it's worth raising a gcc bug i think. there is another known issue: aarch64 -mbranch-protect=bti (and presumably x86_64 -fcf-protection=branch) has to add landing pad at the begining of each indirectly called function so the patchable nops can only come after that. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 no matter how this gets resolved i think this will require documentation changes too.
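a tiny sketch of the feature being discussed: compiling this with -fpatchable-function-entry=2 emits two nops at the entry of f and records the address of the nop area in __patchable_function_entries via an absolute relocation, which is what forces either text relocations or a writable section in pic/pie links:

/* gcc -O2 -fpatchable-function-entry=2 -ffunction-sections -c a.c */
int
f (int x)
{
  return x + 1;
}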
Re: Successful bootstrap and install of gcc (GCC) 6.3.0 on aarch64-unknown-linux-gnu
On 25/01/17 19:02, Aaro Koskinen wrote: > Configured with: ../gcc-6.3.0/configure --with-arch=armv8-a+crc > --with-cpu=cortex-a53 --disable-multilib --disable-nls > --prefix=/home/aaro/gcctest/newcompiler --enable-languages=c,c++ > --host=aarch64-unknown-linux-gnu --build=aarch64-unknown-linux-gnu > --target=aarch64-unknown-linux-gnu --with-system-zlib --with-sysroot=/ > host: raspberrypi-3 > distro: los.git rootfs=96c66f native=96c66f > kernel: Linux 4.9.0-rpi3-los_8e2f1c > binutils: GNU binutils 2.27 > make: GNU Make 4.2.1 > libc: GNU C Library (GNU libc) stable release version 2.24 > zlib: 1.2.8 > mpfr: 3.1.3 > gmp:6 ... > processor : 0 > BogoMIPS : 38.40 > Features : fp asimd evtstrm crc32 > CPU implementer : 0x41 > CPU architecture: 8 > CPU variant : 0x0 > CPU part : 0xd03 > CPU revision : 4 this seems to be an r0p4 revision of cortex-a53, if you use your toolchain to build binaries that are potentially executed on such hw then i think the safe way is to configure gcc with --enable-fix-cortex-a53-835769 --enable-fix-cortex-a53-843419 since it may not be easy to tell what software is affected on a case by case basis (there are flags to turn these on/off at compile time if you want to go that way).
Re: [contribution] C11 threads implementation for Unix and Windows environments
On 20/02/17 07:49, Sebastian Huber wrote:
> Hello Gokan,
>
> you may have a look at:
>
> https://svnweb.freebsd.org/base/head/lib/libstdthreads/

note that despite the looks this is a non-portable and non-conforming implementation, it is way better than the posted github code, but it can still clobber errno, leak resources (and introduces cancellation points which may or may not be conforming depending on how posix will integrate c11)

as far as i'm aware the only c11 conforming open source implementation is the one in musl libc (and that's not portable to other libcs either).
Re: RFC: Add ___tls_get_addr
On 05/07/17 16:38, H.J. Lu wrote: > On x86-64, __tls_get_addr has to realigns stack so that binaries compiled by > GCCs older than GCC 4.9.4: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066 > > continue to work even if vector instructions are used by functions called > from __tls_get_addr, which assumes 16-byte stack alignment as specified > by x86-64 psABI. > > We are considering to add an alternative interface, ___tls_get_addr, to > glibc, which doesn't realign stack. Compilers, which properly align stack > for TLS, call generate call to ___tls_get_addr, instead of __tls_get_addr, > if ___tls_get_addr is available. > > Any comments? > > what happens when new compiler generating the new symbol is used with old glibc?
Re: RFC: Add ___tls_get_addr
On 05/07/17 17:18, H.J. Lu wrote: > On Wed, Jul 5, 2017 at 8:53 AM, Szabolcs Nagy wrote: >> On 05/07/17 16:38, H.J. Lu wrote: >>> On x86-64, __tls_get_addr has to realigns stack so that binaries compiled by >>> GCCs older than GCC 4.9.4: >>> >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066 >>> >>> continue to work even if vector instructions are used by functions called >>> from __tls_get_addr, which assumes 16-byte stack alignment as specified >>> by x86-64 psABI. >>> >>> We are considering to add an alternative interface, ___tls_get_addr, to >>> glibc, which doesn't realign stack. Compilers, which properly align stack >>> for TLS, call generate call to ___tls_get_addr, instead of __tls_get_addr, >>> if ___tls_get_addr is available. >>> >>> Any comments? >>> >>> >> >> what happens when new compiler generating the new symbol >> is used with old glibc? >> > > Compiler shouldn't do that. > i don't see how can the compiler not do that e.g. somebody building an old glibc from source with new compiler, then runs the tests, all tls tests would fail because the compiler generated the new symbol. or users interposing __tls_get_addr (asan) need to update their code. or there are cases when libraries built against one libc is used with another (e.g. musl can mostly use a libstdc++ compiled against glibc on x86_64) i think introducing new libc<->compiler abi should be done conservatively when it is really necessary and from Rich's mail it seems there is no need for new abi here.
Re: [Bug web/?????] New: Fwd: failure notice: Bugzilla down.
On 15/08/17 04:10, Martin Sebor wrote:
> On 08/14/2017 04:22 PM, Eric Gallager wrote:
>> I'm emailing this manually to the list because Bugzilla is down and I
>> can't file a bug on Bugzilla about Bugzilla being down. The error
>> message looks like this:
>
> Bugzilla and the rest of gcc.gnu.org have been down much of
> the afternoon/evening due to a hard drive upgrade (the old disk
> apparently failed). You're not the only one who found out about
> it the hard way. I (and I suspect most others) also discovered
> it when things like Git and SVN (and Bugzilla) stopped working.
>
> I've CC'd the gcc list to let others know (not sure what list
> to subscribe to in order to get a heads up on these kinds of
> maintenance issues).

it seems the database got corrupted. at least one of my bugs was overwritten by another:

original 81846: https://gcc.gnu.org/ml/gcc-bugs/2017-08/msg01574.html
current 81846: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81846

similarly there are two bugs on the bug mailing list for 81845 and later bugs are missing.
Re: Behaviour of __forced_unwind with noexcept
On 15/08/17 16:21, Ron wrote: > On Tue, Aug 15, 2017 at 01:31:10PM +0200, Richard Biener wrote: >> On Tue, Aug 15, 2017 at 1:28 PM, Jonathan Wakely >> wrote: >>> On 15 August 2017 at 11:24, Richard Biener >>> wrote: On Tue, Aug 15, 2017 at 6:44 AM, Ron wrote: > On Mon, Aug 14, 2017 at 06:22:39PM +0100, Jonathan Wakely wrote: >> On 13 August 2017 at 19:20, Ron wrote: >>> >>> Hi, >>> >>> I'm looking for some clarification of how the __forced_unwind thread >>> cancellation exceptions intersect with noexcept. I've long been a >>> big fan of the __forced_unwind idiom, but now that C++14 is the default >>> since GCC 6.1, and many methods including destructors are implicitly >>> noexcept, using it safely appears to have become a lot more tricky. >>> >>> The closest I've found so far to an "authoritative" statement of the >>> expected behaviour is the comments from Jonathan Wakely here: >>> >>> https://stackoverflow.com/questions/14268080/cancelling-a-thread-that-has-a-mutex-locked-does-not-unlock-the-mutex >>> >>> In particular: "It interacts with noexcept as you'd expect: >>> std::terminate() is called if a __forced_unwind escapes a noexcept >>> function, so noexcept functions are really noexcept, they won't >>> unexpectedly throw some 'special' type" >>> >>> Which does seem logical, but unless I'm missing something this makes >>> it unsafe to perform any operation in a destructor which might cross >>> a cancellation point, unless that destructor is noexcept(false). >> >> Unfortunately I still think that's true. >> >> This was also raised in >> https://gcc.gnu.org/ml/gcc-help/2015-08/msg00040.html > > Ouch. Had you considered the option of having any scope that is > noexcept(true) also be treated as if it was implicitly in a scoped > pthread_setcancelstate(PTHREAD_CANCEL_DISABLE), restoring the > old state when it leaves that scope? > > Would it be feasible for the compiler to automatically generate that? > > For any toolchain which does use the unwinding exceptions extension, > that also seems like a logical extension to the noexcept behaviour, > since allowing cancellation will otherwise result in an exception and > process termination. If people really need cancellation in such > scopes, then they can more manageably mark just those noexcept(false). > > > It would need to be done by the compiler, since in user code I can't > do that in a destructor in a way that will also protect unwinding > members of a class (which may have destructors in code I don't > control). > > I can't even completely mitigate this by just always using -std=c++03 > because presumably I'm also exposed to (at least) libstdc++.so being > built with the new compiler default of C++14 or later. > > > I'd be really sad to lose the stack unwinding we currently have when > a thread is cancelled. I've always known it was an extension (and I'm > still a bit surprised it hasn't become part of the official standard), > but it is fairly portable in practice. > > On Linux (or on Debian at least) clang also supports it. It's also > supported by gcc on FreeBSD and MacOS (though not by clang there). > It's supported by mingw for Windows builds. OpenBSD is currently > the only platform I know of where even its gcc toolchain doesn't > support this (but they're also missing support for standard locale > functionality so it's a special snowflake anyway). 
> > > It seems that we need to find some way past the status-quo though, > because "don't ever use pthread_cancel" is the same as saying that > there's no longer any use for the forced_unwind extension. Or that > "you can have a pthread_cancel which leaks resources, or none at all". > > Having a pthread_cancel that only works on cancellation points that > aren't noexcept seems like a reasonable compromise and extension to > the shortcomings of the standard to me. Am I missing something there > which makes that solution not a viable option either? Have glibc override the abort () from the forced_unwind if in pthread_cancel context? >>> >>> If the forced_unwind exception escapes a noexcept function then the >>> compiler calls std::terminate(). That can be replaced by the user so >>> that it doesn't call abort(). It must not return, but a user-supplied >>> terminate handler could trap or raise SIGKILL or something else. >>> >>> Required behavior: A terminate_handler shall terminate execution of >>> the program without returning >>> to the caller. >>> Default behavior: The implementation’s default terminate_handler calls >>> abort(). >>> >>> I don't think glibc can help, I think the compiler would need to >>> change to not call std::terminate(). >> >> Maybe it could call
Re: Behaviour of __forced_unwind with noexcept
On 15/08/17 16:47, Richard Biener wrote: > On Tue, Aug 15, 2017 at 5:21 PM, Ron wrote: >> Is changing the cancellation state really an expensive operation? >> Moreso than the checking which I assume already needs to be done for >> noexcept to trap errant exceptions? > > The noexcept checking only needs to happen if an exception is thrown > while the pthread cancel state needs to be adjusted whenever we are > about to enter/exit such function. > >> If it really is, I guess we could also have an attribute which declares >> a stronger guarantee than noexcept, to claim there are no cancellation >> points in that scope, if people have something in a hot path where a few >> cycles really matter to them and this protection is not actually needed. >> Which could also be an automatic optimisation if the compiler is able to >> prove there are no cancellation points? > > I guess that's possible. > > I suppose prototyping this would be wrapping all noexcept calls in > > try { pthread_setcancelstate (PTHREAD_CANCEL_DISABLE, &old); call > (); } finally { pthread_setcancelstate (old, &old); } > i think changing the state this way is only valid if call itself does not change the state, which we don't know.
Re: [Bug web/?????] New: Fwd: failure notice: Bugzilla down.
On 16/08/17 18:38, Joseph Myers wrote: > On Wed, 16 Aug 2017, Eric Gallager wrote: >> I see Richi redid all his 7.2 release changes; does that imply that >> the server restore is now complete? > > No, there's still a search process ongoing to identify corrupted or > missing files by comparison with the last backup. > > My expectation is that all the other Bugzilla changes from 13 and 14 > August UTC need redoing manually (recreating bugs with new numbers in the > case of new bugs filed during that period, if those bugs are still > relevant, repeating comments, etc. - and possibly recreating accounts for > people who created accounts and filed bugs during that period). But I > haven't seen any official announcement from overseers to all affected > projects (for both GCC and Sourceware Bugzillas) yet. > can i resubmit my lost bug reports now?
libmvec simd math functions in fortran
is there a way to get vectorized math functions in fortran? in c code there is attribute simd declarations or openmp declare simd pragma to tell the compiler which functions have simd variant, but i see no such thing in fortran. some targets have -mveclibabi=type which allows vectorizing a set of math functions, but this does not support the libmvec abi of glibc.
Re: libmvec simd math functions in fortran
On 01/11/17 16:26, Jakub Jelinek wrote:
> On Wed, Nov 01, 2017 at 04:23:11PM +0000, Szabolcs Nagy wrote:
>> is there a way to get vectorized math functions in fortran?
>>
>> in c code there is attribute simd declarations or openmp
>> declare simd pragma to tell the compiler which functions
>> have simd variant, but i see no such thing in fortran.
>
> !$omp declare simd should work fine in fortran (with -fopenmp
> or -fopenmp-simd).

1) i don't want to change the fortran.

2) it does not work for me.

i want this to call vector powf in libmvec:

      subroutine foo(a,b,c)
      real(4) a(8000),b(8000),c(8000)
      do j=1,8000
      a(j)=b(j)**c(j)
      end do
      end

where do i put

!$omp declare simd (powf)

?
Re: libmvec simd math functions in fortran
On 01/11/17 16:47, Szabolcs Nagy wrote: > On 01/11/17 16:26, Jakub Jelinek wrote: >> On Wed, Nov 01, 2017 at 04:23:11PM +, Szabolcs Nagy wrote: >>> is there a way to get vectorized math functions in fortran? >>> >>> in c code there is attribute simd declarations or openmp >>> declare simd pragma to tell the compiler which functions >>> have simd variant, but i see no such thing in fortran. >> >> !$omp declare simd should work fine in fortran (with -fopenmp >> or -fopenmp-simd). >> > > 1) i don't want to change the fortran. > > 2) it does not work for me. > > i want this to call vector powf in libmvec: > > subroutine foo(a,b,c) > real(4) a(8000),b(8000),c(8000) > do j=1,8000 > a(j)=b(j)**c(j) > end do > end > > where do i put > > !$omp declare simd (powf) > > ? to answer my question.. it seems fortran cannot express the type signature of mathematical functions because arguments are passed by reference. so there is no way to declare math interfaces and then add omp declare simd to them to get simd versions. (it's not clear to me how omp declare simd is supposed to work in fortran, but it is not useful for vectorizing loops with math functions.) so gfortran will need a different mechanism to do the vectorization, e.g. an option like -mveclibabi=glibc, but the list of available vector functions need to be specified somewhere.
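for contrast, this is the kind of declaration the c front end gets from glibc's math.h and that has no fortran equivalent (a sketch, glibc actually spells it via its __DECL_SIMD_* macros):

__attribute__ ((__simd__ ("notinbranch"))) float powf (float, float);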
Re: -static-pie and -static -pie
On 31/01/18 15:44, Cory Fields wrote: After looking at this for quite a while, I'm afraid I'm unsure how to proceed. As of now, static and static-pie are mutually exclusive. So given the GNU_USER_TARGET_STARTFILE_SPEC you pasted earlier, "static" matches before "static-pie", causing the wrong start files. It seems to me that the static-pie target complicates things more than matching against static+pie individually. If I convert -static + -pie to -static-pie, then "static" won't be matched in specs, where maybe it otherwise should. Same for -pie. you can change PIE_SPEC to pie|static-pie and occurrences of static to static|static-pie (and !static: to !static:%{!static-pie: etc.), except where it is used to mean "no-pie static", those should be changed to PIE_SPEC:;static: (and i think --no-dynamic-linker should always be passed to ld in LD_PIE_SPEC for static pie, not just on linux systems and selected targets.) then there should be no difference between -static -pie and -static-pie. (and the new -static-pie flag would be redundant.) this would e.g. break static linking with default pie toolchain on systems where the static libc is not pie or missing the rcrt startup file after upgrading to gcc-8. i'm not sure if this is a good enough reason to introduce the -static-pie mess, however if we don't want to break any previously working configuration then -static-pie has to be different from -static -pie. Would you prefer to swallow -static and -pie and pass along only -static-pie? Or forward them all along, and fix the specs which look for static before static-pie ? Regards, Cory On Tue, Jan 30, 2018 at 2:36 PM, H.J. Lu wrote: On Tue, Jan 30, 2018 at 11:18 AM, Cory Fields wrote: On Tue, Jan 30, 2018 at 2:14 PM, H.J. Lu wrote: On Tue, Jan 30, 2018 at 11:07 AM, Cory Fields wrote: On Tue, Jan 30, 2018 at 1:35 PM, H.J. Lu wrote: On Tue, Jan 30, 2018 at 10:26 AM, Cory Fields wrote: Hi list I'm playing with -static-pie and musl, which seems to be in good shape for 8.0.0. Nice work :) However, the fact that "gcc -static -pie" and "gcc -static-pie" produce different results is very unexpected. I understand the case for the new link-type, but merging the options when possible would be a huge benefit to existing buildsystems that already cope with both individually. My use-case: I'd like to build with --enable-default-pie, and by adding "-static" Why not adding "-static-pie" instead of "-static"? to my builds, produce static-pie binaries. But at the moment, that attempts to add an interp section. So my question is, if no conflicting options are found, why not hoist "-static -pie" to "-static-pie" ? Regards, Cory -- H.J. My build system, and plenty of others I'm sure, already handle -static and -pie. Having that understood to mean "static-pie" would mean that the combination would now just work. Asking a different way, if I request -static and -pie, without -nopie, quietly creating non-pie binary seems like a bug. Is there a reason _not_ to interpret it as -static-pie in that case? GNU_USER_TARGET_STARTFILE_SPEC is defined as #define GNU_USER_TARGET_STARTFILE_SPEC \ "%{shared:; \ pg|p|profile:%{static-pie:grcrt1.o%s;:gcrt1.o%s}; \ static:crt1.o%s; \ static-pie:rcrt1.o%s; \ " PIE_SPEC ":Scrt1.o%s; \ :crt1.o%s} \ crti.o%s \ %{static:crtbeginT.o%s; \ shared|static-pie|" PIE_SPEC ":crtbeginS.o%s; \ :crtbegin.o%s} \ %{fvtable-verify=none:%s; \ fvtable-verify=preinit:vtv_start_preinit.o%s; \ fvtable-verify=std:vtv_start.o%s} \ " CRTOFFLOADBEGIN to pick a suitable crt1.o for static PIE when -static-pie is used. 
If gcc.c can convert ... -static ... -pie and ... -pie ... -static ... to -static-pic for GNU_USER_TARGET_STARTFILE_SPEC, it should work. -- H.J. Great, that's how I've fixed it locally. Would you consider accepting a patch for this? I'd like to see it in GCC 8. Please open a GCC bug and submit your patch against it. Thanks. -- H.J.
Re: GCC interpretation of C11 atomics (DR 459)
On 26/02/18 04:00, Ruslan Nikolaev via gcc wrote:
> 1. Not consistent with clang/llvm which completely supports double-width
> atomics for arm32, arm64, x86 and x86-64 making it possible to write
> portable code (w/o specific extensions or assembly code) across all these
> architectures (which is finally possible with C11!)

this should be reported as a bug against clang. there is no abi guarantee that double-width atomics will be able to synchronize with code in other modules, you have to introduce a new abi to do this whatever that takes (new elf flag, new dynamic linker name,..).

> 4. atomic_load can be implemented using read-modify-write as it is the
> only option for x86-64 and arm64 (see below).

no, it can't be.

[..]

> The actual nature of read-only memory and how it can be used are outside
> the scope of the standard, so there is nothing to prevent atomic_load
> from being implemented as a read-modify-write operation.

rmw load is only valid if the implementation can guarantee that atomic objects are never read-only. current implementations on linux (including clang) don't do that, so an rmw load can observably break conforming c code: a static global const object is placed in .rodata section and thus rmw on it is a crash at runtime contrary to c standard requirements.

on an aarch64 machine clang miscompiles this code:

$ cat a.c
#include <stdatomic.h>
static const _Atomic struct S {long i,j;} x;
int f(const _Atomic struct S *p)
{
	struct S y = *p;
	return y.i;
}
int main()
{
	return f(&x);
}
$ gcc a.c -latomic
$ ./a.out
$ clang a.c -latomic
$ ./a.out
Segmentation fault (core dumped)
Re: GCC interpretation of C11 atomics (DR 459)
On 26/02/18 13:56, Alexander Monakov wrote: On Mon, 26 Feb 2018, Szabolcs Nagy wrote: rmw load is only valid if the implementation can guarantee that atomic objects are never read-only. OK, but that sounds like a matter of not emitting atomic objects into .rodata, which shouldn't be a big problem, if not for backwards compatibility concern? well gcc wants to allow atomic access on non-atomic objects too, otherwise public interfaces may need to change to use the _Atomic qualifier (which is not even valid in c++ so it would cause all sorts of breakage). i think it would be valid to put _Atomic stuff in writable section and then say atomic load is only supported on const objects if it is declared with _Atomic, this would make all strictly conforming c code work as well as most code that ppl write in practice (they probably don't use atomics on global consts). current implementations on linux (including clang) don't do that, so an rmw load can observably break conforming c code: a static global const object is placed in .rodata section and thus rmw on it is a crash at runtime contrary to c standard requirements. Note that in your example GCC emits 'x' as a common symbol, you need '... x = { 0 };' for it to appear in .rodata, i see. static ... x = {0}; and static ... x; are equivalent in c, so if gcc treats them differently that's a gcc weirdness, but does not change the issue that there is no guarantee about readonlyness. on an aarch64 machine clang miscompiles this code: [...] and then with new enough libatomic on Glibc this segfaults with GCC on x86_64 too due to IFUNC redirection mentioned in the other subthread. that's yet another issue, that this is not fully fixed in x86 gcc.
Re: Fw: GCC interpretation of C11 atomics (DR 459)
On 27/02/18 12:56, Ruslan Nikolaev wrote:
> Formally speaking, either implementation satisfies C11 because the
> standard allows much leeway in the interpretation here.

no,

1) your proposal would make gcc non-conforming to iso c unless it changes how static const objects are emitted.

2) the two implementations are not abi compatible, the choice is already made, changing it is an abi break.

3) Torvald pointed out further considerations such as users expecting lock-free atomic loads to be faster than stores.

the solution is to add a language extension, but that requires careful design.
libmvec in gcc to have vector math in fortran
i had a query earlier about libmvec vector functions in fortran:

https://gcc.gnu.org/ml/gcc/2017-11/msg7.html

but there were no simple solutions to make math functions vectorizable in fortran, because it's hard to make libc headers with simd attributes visible to the fortran front end.

i think a possible workaround is to have a dummy libmvec implementation in libgcc.a (or more likely as a separate libgccmvec.a) that just calls scalar functions from libm like

vdouble _ZGVbN2v_sin(vdouble x)
{
	return (vdouble){sin(x[0]), sin(x[1])};
}

and similarly for all relevant single and double precision functions for all vector lengths and other supported variants.

then gcc knows that there is an implementation for these functions available and with the right link order a better implementation from libmvec can override these dummy implementations. (the cost model cannot assume a faster vector algorithm than the scalar one though)

- this allows vectorizing loops with math functions even in fortran,
- and on targets without a libmvec implementation (but with a vector abi),
- and allows users to provide their own vector math implementation more easily without hacking around glibc math.h (which may not support vector math or only enable it for a small subset of math functions).

gcc needs a new cflag and ldflag to enable this. (maybe -mveclibabi= already present in x86 and ppc can be used for this)
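spelled out, one such dummy wrapper could look like this (a sketch, the vdouble type here is just a 2-lane gcc vector type standing in for the type the vector function abi prescribes):

#include <math.h>

typedef double vdouble __attribute__ ((vector_size (16)));

vdouble
_ZGVbN2v_sin (vdouble x)
{
  /* scalar fallback: one libm call per lane */
  return (vdouble){ sin (x[0]), sin (x[1]) };
}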
Re: libmvec in gcc to have vector math in fortran
On 10/04/18 11:14, Janne Blomqvist wrote: As I mentioned previously in that thread you linked to, the fortran frontend never generates a direct call to libm sin(), or for that matter ZGVbN2v_sin(). Instead it generates a "call" to __builtin_sin(). And similarly for other libm functions that have gcc builtins. The middle-end optimizers are then free to do whatever optimizations they like on that __builtin_sin call, such as constant folding, and at least as far as the fortran frontend is concerned, vectorizing if -mveclibabi= or such is in effect. the generated builtin call is not the issue (same happens in c), the knowledge about libc declarations is. the middle-end has no idea what functions can be vectorized, only the libc knows it and declares this in c headers. this is the problem i'm trying to solve.
Re: libmvec in gcc to have vector math in fortran
On 10/04/18 14:27, Richard Biener wrote: On April 10, 2018 3:06:55 PM GMT+02:00, Jakub Jelinek wrote: On Tue, Apr 10, 2018 at 02:55:43PM +0200, Richard Biener wrote: I wonder if it is possible for glibc to ship a "module" for fortran instead containing the appropriate declarations and gfortran auto-include that (if present). Then we'd run into module binary format changing every release, so hard for glibc to ship that. Another thing is how would we express it in the module, we could just use OpenMP syntax, interface function sin(x) bind(C,name="__builtin_sin") result(res) import !$omp declare simd notinbranch real(c_double) :: res real(c_double),value :: x end function end interface but we'd need to temporarily enable OpenMP while parsing that module. I see Fortran now supports already !GCC$ attributes stdcall, fastcall::test Could we support !GCC$ attributes simd and !GCC$ attributes simd('notinbranch') too? Maybe we can also generate this module in a fixinlclude way? ideally everything should work magically but i think it's good to have a big hammer solution that's easy to reason about. the gcc vectorizer should be testable independently of glibc, and users should be able to specify what can be vectorized. if this is via a per-frontend declaration syntax, then i see implementation and usability issues, while those are sorted out a single flag that requests every function known to gcc to be vectorized sounds to me a viable big hammer solution: easy to implement and enables users to start experimenting with simd math. (the implementation may use a preincluded fortran module internally, but i think it makes sense to have a single flag ui too)
Re: libmvec in gcc to have vector math in fortran
On 15/06/18 08:59, Florian Weimer wrote: * Richard Biener: 'pure' makes it pure but there doesn't seem to be a way to make it const? Does Fortran support setting the rounding mode? yes, but vec math is only enabled with -ffast-math (so it can assume -fno-rounding-math) In C, sin is not const because it depends on the current rounding mode. hm i don't see const in glibc even in case of -ffast-math compilation, i wonder if that can be changed.
Re: How to get GCC on par with ICC?
On 11/06/18 11:05, Martin Jambor wrote: The int rate numbers (running 1 copy only) were not too bad, GCC was only about 2% slower and only 525.x264_r seemed way slower with GCC. The fp rate numbers (again only 1 copy) showed a larger difference, around 20%. 521.wrf_r was more than twice as slow when compiled with GCC instead of ICC and 503.bwaves_r and 510.parest_r also showed significant slowdowns when compiled with GCC vs. ICC. Keep in mind that when discussing FP benchmarks, the used math library can be (almost) as important as the compiler. In the case of 481.wrf, we found that the GCC 8 + glibc 2.26 (so the "out-of-the box" GNU) performance is about 70% of ICC's. When we just linked against AMD's libm, we got to 83%. When we instructed GCC to generate calls to Intel's SVML library and linked against it, we got to 91%. Using both SVML and AMD's libm, we achieved 93%. i think glibc 2.27 should outperform amd's libm on wrf (since i upstreamed the single precision code from https://github.com/ARM-software/optimized-routines/ ) the 83% -> 93% diff is because gcc fails to vectorize math calls in fortran to libmvec calls. That means that there likely still is 7% to be gained from more clever optimizations in GCC but the real problem is in GNU libm. And 481.wrf is perhaps the most extreme example but definitely not the only one. there is no longer a problem in gnu libm for the most common single precision calls and if things go well then glibc 2.28 will get double precision improvements too. but gcc has to learn how to use libmvec in fortran.
Re: Subnormal float support in armv7(with -msoft-float) for intrinsics
On 12/07/18 16:20, Umesh Kalappa wrote: Hi everyone, we have our source base ,that was compiled for armv7 on gcc8.1 with soft-float and for following input a=0x0010 b=0x0001 result = a - b ; we are getting the result as "0x000e" and with -mhard-float (disabled the flush to zero mode ) we are getting the result as ""0x000f" as expected. please submit it as a bug report to bugzilla while debugging the soft-float code,we see that ,the compiler calls the intrinsic "__aeabi_dsub" with arm calling conventions i.e passing "a" in r0 and r1 registers and respectively for "b". we are investigating the routine "__aeabi_dsub" that comes from libgcc for incorrect result and meanwhile we would like to know that a)do libgcc routines/intrinsic for float operations support or consider the subnormal values ? ,if so how we can enable the same. Thank you ~Umesh
Re: [RFC] man7/system_data_types.7: Document [unsigned] __int128
The 10/01/2020 12:14, Alejandro Colomar via Gcc wrote: > Here is the rendered intmax_t: > > intmax_t > Include: . Alternatively, . > > A signed integer type capable of representing any value of any > signed integer type supported by the implementation. According > to the C language standard, it shall be capable of storing val- > ues in the range [INTMAX_MIN, INTMAX_MAX]. > > The macro INTMAX_C() expands its argument to an integer constant > of type intmax_t. > > The length modifier for intmax_t for the printf(3) and the > scanf(3) families of functions is j; resulting commonly in %jd > or %ji for printing intmax_t values. > > Bugs: intmax_t is not large enough to represent values of type > __int128 in implementations where __int128 is defined and long > long is less than 128 bits wide. or __int128 is not an integer type. integer types are either standard or extended. and __int128 is neither because it can be larger than intmax_t and stdint.h does not provide the necessary macros for it. > > Conforming to: C99 and later; POSIX.1-2001 and later. > > See also the uintmax_t type in this page.
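a two line illustration of the point (sketch):

#include <stdint.h>

/* valid as a gcc extension, but on typical implementations the value is
   larger than INTMAX_MAX, and there is no printf length modifier and no
   INTMAX-style constant macro for __int128 */
unsigned __int128 big = (unsigned __int128) UINT64_MAX + 1;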
Re: unnormal Intel 80-bit long doubles and isnanl
The 11/24/2020 16:23, Siddhesh Poyarekar wrote: > Hi, > > The Intel 80-bit long double format has a concept of "unnormal" numbers that > have a non-zero exponent and zero integer bit (i.e. bit 63) in the mantissa; > all valid long double numbers have their integer bit set to 1. Unnormal > numbers are mentioned in "8.2.2 Unsupported Double Extended-Precision > Floating-Point Encodings and Pseudo-Denormals" and listed in Table 8-3 in > the Intel 64 and IA-32 Architectures Software Developer’s Manual Volume > 1:Basic Architecture. > > As per the manual, these numbers are considered unsupported and generate an > invalid-operation exception if they are used as operands to any floating > point instructions. The question of this email is how the toolchain > (including glibc) should treat these numbers since as things stand today, > glibc and gcc disagree when it comes to isnanl. ideally fpclassify (and other classification macros) would handle all representations. architecturally invalid or trap representations can be a non-standard class but i think classifying them as FP_NAN would break the least amount of code. > glibc evaluates the bit pattern of the 80-bit long double and in the > process, ignores the integer bit, i.e. bit 63. As a result, it considers > the unnormal number as a valid long double and isnanl returns 0. i think m68k and x86 are different here. > > gcc on the other hand, simply uses the number in a floating point comparison > and uses the parity flag (which indicates an unordered compare, signalling a > NaN) to decide if the number is a NaN. The unnormal numbers behave like > NaNs in this respect, in that they set the parity flag and with > -fsignalling-nans, would result in an invalid-operation exception. As a > result, __builtin_isnanl returns 1 for an unnormal number. compiling isnanl to a quiet fp compare is wrong with -fsignalling-nans: classification is not supposed to signal exceptions for snan. > > So the question is, which behaviour should be considered correct? Strictly > speaking, unnormal numbers are listed separately from NaNs in the document > and as such are distinct from NaNs. So on the question of "is nan?" the > answer ought to be "No". > > On the flip side, the behaviour described (and experienced through code) is > exactly the same as a NaN, i.e. a floating point operation sets the parity > flag and generates an invalid-operation exception. So if it looks like a > NaN, behaves like a NaN, then even if the document hints (and it is just a > hint right, since it doesn't specifically state it?) that it's different, it > likely is a NaN. What's more, one of the fixes to glibc[1] assumes that > __builtin_isnanl will do the right thing. > > The third alternative (which seems like a step back to me, but will concede > that it is a valid resolution) is to state that unnormal input to isnanl > would result in undefined behaviour and hence it is the responsibility of > the application to ensure that inputs to isnanl are never unnormal. > > Thoughts? > > Siddhesh > > [1] > https://sourceware.org/git/?p=glibc.git;h=0474cd5de60448f31d7b872805257092faa626e4
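a small test that shows the disagreement described above (a sketch, x86 only; it builds an unnormal by clearing the explicit integer bit, bit 63, of a normal 80-bit value):

#define _GNU_SOURCE
#include <math.h>
#include <stdio.h>
#include <string.h>

int
main (void)
{
  long double x = 1.5L;   /* significand 0xc000..., exponent 0x3fff */
  unsigned char b[sizeof x];
  memcpy (b, &x, sizeof x);
  b[7] &= 0x7f;           /* clear bit 63 of the significand -> unnormal */
  memcpy (&x, b, sizeof x);
  printf ("isnanl: %d  __builtin_isnanl: %d\n",
          isnanl (x), (int) __builtin_isnanl (x));
  return 0;
}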
AArch64 vector ABI vs OpenMP
Last time aarch64 libmvec was discussed, the OpenMP declare variant syntax support was not ready in gcc and there were open questions around how simd isa variants would be supported.

https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532940.html

The vector function ABI for aarch64 allows the declare variant syntax and that is the only way to declare vector math functions for a particular isa only.

https://github.com/ARM-software/abi-aa/blob/main/vfabia64/vfabia64.rst#aarch64-variant-traits

I would like to get feedback if there may be anything preventing declare variant simd support on aarch64 like

float64x2_t simd_cos (float64x2_t);

#pragma omp declare variant(simd_cos) \
    match(construct={simd(simdlen(2), notinbranch)}, device={isa("simd")})
double cos (double);

where isa("simd") means simd_cos can be used when auto vectorizing cos calls with advanced simd.

Our hope is that this enables libmvec on aarch64 such that at least advanced simd variants of some math functions can be declared in math.h and implemented in libm, suitable for vectorization. (Using the vector ABI names of those functions.)

Eventually we want to add isa("sve") support too, but that may require further work on how scalable vector length is represented.

Please let me know if there are outstanding issues with this approach.

thanks.
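and the kind of user code this is meant to serve (a sketch): with a declaration like the one above visible, gcc could vectorize the loop into calls to the advanced simd variant when generating simd code:

#include <math.h>

void
f (double *restrict y, const double *restrict x, int n)
{
  for (int i = 0; i < n; i++)
    y[i] = cos (x[i]);
}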
Re: Adding file descriptor attribute(s) to gcc and glibc
The 07/12/2022 18:25, David Malcolm via Libc-alpha wrote: > On Tue, 2022-07-12 at 18:16 -0400, David Malcolm wrote: > > On Tue, 2022-07-12 at 23:03 +0530, Mir Immad wrote: > > GCC's attribute syntax here: > > https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html > > allows for a parenthesized list of parameters for the attribute, which > > can be: > > (a) An identifier > > (b) An identifier followed by a comma and a non-empty comma-separated > > list of expressions > > (c) A possibly empty comma-separated list of expressions > > > > I'd hoped to have an argument number, with an optional extra param > > describing the direction of the access, but syntax (b) puts the > > identifier first, alas. > > > > Here's one possible way of doing it with a single attribute, via syntax > > (b): > > e.g. > > __attribute__((fd_argument (access, 1)) > > __attribute__((fd_argument (read, 1)) > > __attribute__((fd_argument (write, 1)) > > > > meaning that argument 1 of the function is expected to be an open file- > > descriptor, and that it must be possible to read from/write to that fd > > for cases 2 and 3. > > > > Here are some possible examples of how glibc might use this syntax: > > > > int dup (int oldfd) > > __attribute((fd_argument (access, 1)); > > > > int ftruncate (int fd, off_t length) > > __attribute((fd_argument (access, 1)); > > > > ssize_t pread(int fd, void *buf, size_t count, off_t offset) > > __attribute((fd_argument (read, 1)); > > > > ssize_t pwrite(int fd, const void *buf, size_t count, > > off_t offset); > > __attribute((fd_argument (write, 1)); > > > > ...but as I said, I'm most interested in input from glibc developers on > > this. note that glibc headers have to be namespace clean so it would be more like __attribute__((__fd_argument (__access, 1))) __attribute__((__fd_argument (__read, 1))) __attribute__((__fd_argument (__write, 1))) so it would be even shorter to write __attribute__((__fd_argument_access (1))) __attribute__((__fd_argument_read (1))) __attribute__((__fd_argument_write (1))) > > I just realized that the attribute could accept both the single integer > argument number (syntax (c)) for the "don't care about access > direction" case, or the ({read|write}, N) of syntax (b) above, giving > e.g.: > > int dup (int oldfd) > __attribute((fd_argument (1)); > > int ftruncate (int fd, off_t length) > __attribute((fd_argument (1)); > > ssize_t pread(int fd, void *buf, size_t count, off_t offset) > __attribute((fd_argument (read, 1)); > > ssize_t pwrite(int fd, const void *buf, size_t count, >off_t offset); > __attribute((fd_argument (write, 1)); > > for the above examples. > > How does that look? > Dave i think fd in ftruncate should be open for writing. to be honest, i'd expect interesting fd bugs to be dynamic and not easy to statically analyze. the use-after-unchecked-open maybe useful. i would not expect the access direction to catch many bugs.
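to make the namespace point concrete, here is a sketch of how a glibc header declaration might look with the shorter spelling; the attribute name is the hypothetical one from this thread, not an existing gcc attribute, and the parameter names only illustrate the reserved-identifier convention:

  /* hypothetical glibc-style declaration using only reserved identifiers */
  extern ssize_t pread (int __fd, void *__buf, size_t __nbytes, __off_t __offset)
       __attribute__ ((__fd_argument_read (1)));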
Re: Adding file descriptor attribute(s) to gcc and glibc
The 07/13/2022 12:55, David Malcolm wrote: > On Wed, 2022-07-13 at 16:01 +0200, Florian Weimer wrote: > > * David Malcolm: > GCC trunk's -fanalyzer implements the new warnings via a state machine > for file-descriptor values; it currently has rules for handling "open", > "close", "read", and "write", and these functions are currently hard- > coded inside the analyzer. > > Here are some examples on Compiler Explorer of what it can/cannot > detect: > https://godbolt.org/z/nqPadvM4f > > Probably the most important one IMHO is the leak detection. nice. > Would it be helpful to have some kind of attribute for "returns a new > open FD"? Are there other ways to close a FD other than calling > "close" on it? (Would converting that to some kind of "closes" > attribute be a good idea?)
  dup2(oldfd, newfd)
  dup3(oldfd, newfd, flags)
closes newfd (and also opens it to be a dup of oldfd) unless the call fails.
  close_range(first, last, flags)
  fclose(fdopen(fd, mode))
but users can write all sorts of wrappers around close too.
> > Are there any other "magic" values for file-descriptors we should be > aware of? > mmap may require fd==-1 for anonymous maps.
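as a concrete example of the leak detection mentioned above, a minimal sketch of the kind of bug the fd state machine can flag (assuming a gcc new enough to have the -fanalyzer fd checks); the function is made up for illustration:

  #include <fcntl.h>
  #include <unistd.h>

  int read_config (const char *path, char *buf, size_t len)
  {
      int fd = open (path, O_RDONLY);
      if (fd < 0)
          return -1;
      if (read (fd, buf, len) < 0)
          return -1;                /* fd leaks on this error path */
      close (fd);
      return 0;
  }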
Re: Division by zero on A53 which does not raise an exception
The 11/28/2022 21:37, Stephen Smith via Binutils wrote: > I am working on a project which is using an A53 core. The core does not > raise an exception if there is a division by zero (for either integer or > floating point division). floating-point division by zero signals the FE_DIVBYZERO exception. you can test this via fetestexcept(FE_DIVBYZERO). integer operations must not affect fenv status flags so integer division by zero does not do that. if you want to *trap* division by zero, there is no reliable way to do that in c (this is not related to particular cpus though).
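a minimal sketch of testing the flag from c (fetestexcept may need -lm on some systems; the volatiles are there so the division is not folded away at compile time):

  #include <fenv.h>
  #include <stdio.h>

  int main (void)
  {
      volatile double num = 1.0, den = 0.0;
      feclearexcept (FE_ALL_EXCEPT);
      volatile double r = num / den;        /* sets FE_DIVBYZERO, r is +inf */
      (void) r;
      printf ("FE_DIVBYZERO set: %d\n", fetestexcept (FE_DIVBYZERO) != 0);
      return 0;
  }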
Re: New TLS usage in libgcc_s.so.1, compatibility impact
The 01/13/2024 13:49, Florian Weimer wrote: > This commit > > commit 8abddb187b33480d8827f44ec655f45734a1749d > Author: Andrew Burgess > Date: Sat Aug 5 14:31:06 2023 +0200 > > libgcc: support heap-based trampolines > > Add support for heap-based trampolines on x86_64-linux, aarch64-linux, > and x86_64-darwin. Implement the __builtin_nested_func_ptr_created and > __builtin_nested_func_ptr_deleted functions for these targets. > > Co-Authored-By: Maxim Blinov > Co-Authored-By: Iain Sandoe > Co-Authored-By: Francois-Xavier Coudert > > added TLS usage to libgcc_s.so.1. The way that libgcc_s is currently > built, it ends up using a dynamic TLS variant on the Linux targets. > This means that there is no up-front TLS allocation with glibc (but > there would be one with musl). > > There is still a compatibility impact because glibc assigns a TLS module > ID upfront. This seems to be what causes the > ust/libc-wrapper/test_libc-wrapper test in lttng-tools to fail. We end > up with an infinite regress during process termination because > libgcc_s.so.1 has been loaded, resulting in a DTV update. When this > happens, the bottom of the stack looks like this: > > #4447 0x77f288f0 in free () from /lib64/liblttng-ust-libc-wrapper.so.1 > #4448 0x77fdb142 in free (ptr=) > at ../include/rtld-malloc.h:50 > #4449 _dl_update_slotinfo (req_modid=3, new_gen=2) at ../elf/dl-tls.c:822 > #4450 0x77fdb214 in update_get_addr (ti=0x77f2bfc0, > gen=) at ../elf/dl-tls.c:916 > #4451 0x77fddccc in __tls_get_addr () > at ../sysdeps/x86_64/tls_get_addr.S:55 > #4452 0x77f288f0 in free () from /lib64/liblttng-ust-libc-wrapper.so.1 > #4453 0x77fdb142 in free (ptr=) > at ../include/rtld-malloc.h:50 > #4454 _dl_update_slotinfo (req_modid=2, new_gen=2) at ../elf/dl-tls.c:822 > #4455 0x77fdb214 in update_get_addr (ti=0x77f39fa0, > gen=) at ../elf/dl-tls.c:916 > #4456 0x77fddccc in __tls_get_addr () > at ../sysdeps/x86_64/tls_get_addr.S:55 > #4457 0x77f36113 in lttng_ust_cancelstate_disable_push () >from /lib64/liblttng-ust-common.so.1 > #4458 0x77f4c2e8 in ust_lock_nocheck () from /lib64/liblttng-ust.so.1 > #4459 0x77f5175a in lttng_ust_cleanup () from /lib64/liblttng-ust.so.1 > #4460 0x77fca0f2 in _dl_call_fini ( > closure_map=closure_map@entry=0x77fbe000) at dl-call_fini.c:43 > #4461 0x77fce06e in _dl_fini () at dl-fini.c:114 > #4462 0x77d82fe6 in __run_exit_handlers () from /lib64/libc.so.6 > > Cc:ing for awareness. > > The issue also requires a recent glibc with changes to DTV management: > commit d2123d68275acc0f061e73d5f86ca504e0d5a344 ("elf: Fix slow tls > access after dlopen [BZ #19924]"). If I understand things correctly, > before this glibc change, we didn't deallocate the old DTV, so there was > no call to the free function. with 19924 fixed, after a dlopen or dlclose every thread updates its dtv on the next dynamic tls access. before that, dtv was only updated up to the generation of the module being accessed for a particular tls access. so hitting the free in the dtv update path is now more likely but the free is not new, it was there before. also note that this is unlikely to happen on aarch64 since tlsdesc only does dynamic tls access after a 512byte static tls reservation runs out. > > On the glibc side, we should recommend that intercepting mallocs and its > dependencies use initial-exec TLS because that kind of TLS does not use > malloc. If intercepting mallocs using dynamic TLS work at all, that's > totally by accident, and was in the past helped by glibc bug 19924. (I right. 
> don't think there is anything special about libgcc_s.so.1 that triggers > the test failure above, it is just an object with dynamic TLS that is > implicitly loaded via dlopen at the right stage of the test.) In this > particular case, we can also paper over the test failure in glibc by not > call free at all because the argument is a null pointer: > > diff --git a/elf/dl-tls.c b/elf/dl-tls.c > index 7b3dd9ab60..14c71cbd06 100644 > --- a/elf/dl-tls.c > +++ b/elf/dl-tls.c > @@ -819,7 +819,8 @@ _dl_update_slotinfo (unsigned long int req_modid, size_t > new_gen) >dtv entry free it. Note: this is not AS-safe. */ > /* XXX Ideally we will at some point create a memory >pool. */ > - free (dtv[modid].pointer.to_free); > + if (dtv[modid].pointer.to_free != NULL) > + free (dtv[modid].pointer.to_free); > dtv[modid].pointer.val = TLS_DTV_UNALLOCATED; > dtv[modid].pointer.to_free = NULL; can be done, but !=NULL is more likely since we do modid reuse after dlclose. there is also a realloc in dtv resizing which happens when more than 16 modules with tls are loaded after thread creation (DTV_SURPLUS). i'm not sure if it's worth supporting malloc
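to illustrate the initial-exec recommendation quoted earlier in this thread: an interposing malloc can mark its per-thread state so that accessing it never goes through the dynamic tls path (and so never calls back into malloc). a sketch with a made-up state type:

  /* hypothetical per-thread state in an interposing malloc library;
     initial-exec tls is allocated up front from the static tls reservation,
     so no __tls_get_addr (and no malloc) on the access path */
  struct alloc_state { void *free_list; unsigned long count; };
  static __thread struct alloc_state state
      __attribute__ ((tls_model ("initial-exec")));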
Re: [RFC] Linux system call builtins
The 04/08/2024 06:19, Matheus Afonso Martins Moreira via Gcc wrote: > __builtin_linux_system_call(long n, ...) ... > Calling these builtins will make GCC place all the parameters > in the correct registers for the system call, emit the appropriate > instruction for the target architecture and return the result. > In other words, they would implement the calling convention[1] of > the Linux system calls. note: some syscalls / features don't work without asm (posix thread cancellation, vfork, signal return,..) and using raw syscalls outside of the single runtime the application is using is problematic (at least on linux). > + It doesn't make sense for libraries to support it > > There are libraries out there that provide > system call functionality. The various libcs do. > However they usually don't support the full set > of Linux system calls. Using certain system calls > could invalidate global state in these libraries > which leads to them not being supported. Clone is > the quintessential example. So I think libraries > are not the proper place for this functionality. i don't follow the reasoning here, where should the syscall be if not in a library like libc? clone cannot even be used from c code in general as CLONE_VM is not compatible with c semantics without a new stack (child clobbers the parent stack), so the c builtin would not always work, but it is also a syscall that only freestanding application can use not something that calls into the libc, and even in a freestanding application it is tricky to use right (especially in a portable way or with features like shadow stack), so i don't see why clone is the quintessential example. > + It allows freestanding software to easily target Linux > > Freestanding code usually refers to bare metal > targets but Linux is also a viable target. > This will make it much easier for developers > to create freestanding nolibc no dependency > software targeting Linux without having to > write any assembly code at all, making GCC > ever more useful. i think the asm call convention bit is by far not the hardest part in providing portable linux syscall wrappers. my main worry is that the builtins encourage the use of raw syscalls and outside of libc development it is not well understood how to do that correctly, but i guess it's ok if it is by default an error outside of -ffreestanding.
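for comparison, the portable-ish way to issue a raw syscall today is the libc syscall() function, which already deals with the per-target call convention (the caveats above about raw syscalls bypassing the libc still apply). a minimal sketch:

  #define _GNU_SOURCE
  #include <stdio.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  int main (void)
  {
      /* gettid is a harmless example; older glibc has no wrapper for it */
      long tid = syscall (SYS_gettid);
      printf ("tid: %ld\n", tid);
      return 0;
  }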
Re: [RFC] Linux system call builtins
The 04/09/2024 23:59, Matheus Afonso Martins Moreira via Gcc wrote: > > and using raw syscalls outside of the single runtime the > > application is using is problematic (at least on linux). > > Why do you say they are problematic on Linux though? Please elaborate. because the portable c api layer and syscall abi layer has a large enough gap that applications can break libc internals by doing raw syscalls. and it's not just the call convention that's target specific (this makes the c syscall() function hard to use on linux) and linux evolves fast enough that raw syscalls have to be adjusted over time (to support new features) which is harder when they are all over the place instead of in the libc only. > > The ABI being stable should mean that I can for example > strace a program, analyze the system calls and implement > a new version of it that performs the same functions. you could do that with syscall() but it is not very useful as the state of the system is not the same when you rerun a process so syscalls would likely fail or do different things than in the first run. > > clone cannot even be used from c code in general > > as CLONE_VM is not compatible with c semantics > > without a new stack (child clobbers the parent stack) > > so the c builtin would not always work > > it is also a syscall that only freestanding > > application can use not something that calls > > into the libc > > There are major projects out there which do use it regardless. that does not make it right. > For example, systemd: > > https://github.com/systemd/systemd/blob/main/src/basic/raw-clone.h > https://github.com/systemd/systemd/blob/main/src/shared/async.h > https://github.com/systemd/systemd/blob/main/src/shared/async.c > https://github.com/systemd/systemd/blob/main/docs/CODING_STYLE.md > > > even in a freestanding application it is tricky to use right > > No argument from me there. It is tricky... > The compiler should make it possible though. > > > so i don't see why clone is the quintessential example. > > I think it is the best example because attempting to use clone > is not actually supported by glibc. > > https://sourceware.org/bugzilla/show_bug.cgi?id=10311 > > "If you use clone() you're on your own." should be "if you use clone() *or* raw clone syscall then you're on your own" which is roughly what i said in that discussion. so your proposal does not fix this particular issue, just provide a simpler footgun. > > i guess it's ok if it is by default an error > > outside of -ffreestanding. > > Hosted C programs could also make good use of them. they should not. > They could certainly start out exclusive to freestanding C > and then made available to general code if there's demand.