Re: AArch64 and -moutline-atomics

2020-05-20 Thread Szabolcs Nagy
The 05/20/2020 09:02, Florian Weimer via Gcc wrote:
> * Richard Henderson:
> > On 5/19/20 3:38 AM, Florian Weimer via Gcc wrote:
> >> One minor improvement would be to document __aarch64_have_lse_atomics as
> >> interposable on the GCC side and define that directly in glibc, so that
> >> lse-init.o is not linked in anymore and __aarch64_have_lse_atomics can
> >> be initialized as soon as ld.so has the hwcap information.
> >
> > The __aarch64_have_lse_atomics symbol is not interposable.
> > We use a direct pc-relative reference to it from each lse thunk.
> 
> What I meant was that users are allowed to supply their own definition in a
> static link.  Sorry, not sure what the correct terminology is here.  I
> don't think any code changes would be needed for that, it's just a
> matter of documentation (and being careful about future evolution of the
> code).

are you proposing to put it in libc_nonshared.a/crt1.o?
(and then ld.so would treat it specially when loading a
module to initialize it early)

or only dealing with it in libc.so, and let other modules
still initialize it late (in case there are higher prio
ctors or ifunc resolvers using atomics)?
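
for illustration, a rough sketch of what a user- or libc-supplied definition
could look like if the symbol were documented as interposable (my assumption
of the shape, not actual glibc or libgcc code):

  #include <sys/auxv.h>   /* getauxval */
  #include <asm/hwcap.h>  /* HWCAP_ATOMICS */

  /* would override the libgcc lse-init.o copy in a static link */
  _Bool __aarch64_have_lse_atomics;

  __attribute__((constructor))
  static void init_have_lse_atomics (void)
  {
    /* a constructor still runs "late"; doing this in ld.so as soon as the
       hwcap information is available is the point of the proposal above */
    __aarch64_have_lse_atomics = (getauxval (AT_HWCAP) & HWCAP_ATOMICS) != 0;
  }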


Re: Code size issues on FP-emulation on libgcc compared to LLVM's compiler_rt

2015-07-01 Thread Szabolcs Nagy
On 01/07/15 16:34, Zinovy Nis wrote:
> The only idea on size difference I have is:
> 
> the header text in many of the FP-emulation files from compiler_rt contains
> lines like:
> 
> // This file implements quad-precision soft-float addition ***with the
> IEEE-754 default rounding*** (to nearest, ties to even).
> 

nearest rounding and no exception flags.

in other words they assume no fenv access.



Re: Testing and dynamic linking on remote target

2015-07-10 Thread Szabolcs Nagy
On 09/07/15 16:56, David Talmage wrote:
> I'm looking for a way to specify the LD_LIBRARY_PATH or LD_PRELOAD on the
> target system when running one of the DejaGNU test suites. I'm testing a gcc
> cross-compiler on a development board.  I can't replace existing versions of
> libraries under test because other people are using the development board
> when I'm testing.
> 
> I found a thread about this in the archives: "Is anyone testing for a
> (cross-) target (board) with dynlinking?"
> (https://gcc.gnu.org/ml/gcc/2008-02/msg00201.html). The best suggestion at
> the time was to NFS mount the cross-compiled library directory and use
> "-Wl,-dynamic-linker -Wl,-rpath" in LDFLAGS.
> 
> NFS mounting isn't an option for me, alas.
> 

i think if you copy the libraries somewhere on the target
then you can use

 -rpath-link=/libs/on/host -rpath=/libs/on/target
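
e.g. something like this (the paths and compiler name are made up for
illustration; -rpath-link points at the copies the linker can see on the
host, -rpath at where the libraries were copied on the board):

  aarch64-none-linux-gnu-gcc test.c -o test \
    -Wl,-rpath-link=/home/me/sysroot/lib \
    -Wl,-rpath=/data/testlibs \
    -Wl,--dynamic-linker=/data/testlibs/ld-linux-aarch64.so.1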



Re: Compiler support for erasure of sensitive data

2015-09-09 Thread Szabolcs Nagy
* Zack Weinberg  [2015-09-09 15:03:50 -0400]:
> On 09/09/2015 02:02 PM, paul_kon...@dell.com wrote:
> >> On Sep 9, 2015, at 1:54 PM, David Edelsohn 
> >> wrote:
> >> 
> >> What level of erasure of sensitive data are you trying to ensure? 
> >> Assuming that overwriting values in the ISA registers actually 
> >> completely clears and destroys the values is delusionally naive.
> > 
> > Could you point to some references about that?
> 
> I *assume* David is referring to register renaming, which is not
> architecturally visible...
> 

or async signal handler copying all the register state on sigaltstack
or internal counters and debug features making sensitive info observable
or timing/cache-effect side channels that let other processes get info
or compiling to a high-level language (js) with different kinds of leaks
or running under emulator/debugger that can make secrets visible
or...

> I would consider data leaks via state inaccessible to a program
> executing at the same privilege level as the code to be hardened to be
> out of scope.  (Which does mean that *when hardening an OS kernel* one

specifying the info leak at the abstract c machine level is not useful
(the memset is not observable there, unless you assign meaning to
undefined behaviour, which is a can of worms), but you do have to specify
the leak at some abstraction level (one that is applicable to the targets
of a compiler and gives useful security properties in practice), otherwise
the attribute is not meaningful.

leaks can happen for many reasons that are layers below the control
of the compiler, but still observable by high level code.
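
as an aside, a minimal sketch of why erasure needs help from the toolchain at
all (this is the usual volatile-function-pointer workaround, not a guaranteed
or proposed mechanism):

  #include <string.h>

  /* calling memset through a volatile pointer keeps the compiler from
     proving the call is a removable dead store on a dying buffer */
  static void *(*const volatile memset_v) (void *, int, size_t) = memset;

  void use_key (void)
  {
    char key[32];
    /* ... derive and use the key ... */
    memset_v (key, 0, sizeof key);  /* a plain memset here is typically
                                       removed by dead-store elimination */
  }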


Re: Clarifying attribute-const

2015-09-29 Thread Szabolcs Nagy

On 25/09/15 21:16, Eric Botcazou wrote:

First, a belated follow-up to https://gcc.gnu.org/PR66512 . The bug is
asking why attribute-const appears to have a weaker effect in C++, compared
to C. The answer in that bug is that GCC assumes that an attribute-const
function can terminate by throwing an exception.


FWIW there is an equivalent semantics in Ada: the "const" functions can throw
and the language explicitly allows them to be CSEd in this case, etc.



i think a throwing interface that may be moved around
by the compiler makes reasoning about exception safety
hard (i.e. the spec cannot be hand-wavy about the
allowed optimizations).

i guess the inconsistency stems from c++ making extern
"C" apis throw by default (causing some amount of misery:
in c one cannot throw nor declare something nothrow,
so a c api is pessimized in c++).


That doesn't actually seem reasonable.  Consider that the C counterpart to
throwing is longjmp; it seems to me that GCC should behave consistently:
either assume that attribute-const may both longjmp and throw (I guess
nobody wants that), or that it may not longjmp nor throw.  Intuitively, if
"const" means "free of side effects so that calls can be moved
speculatively or duplicated", then non-local control flow transfer via
throwing should be disallowed as well.


This would pessimize a lot of languages where exceptions are pervasive.



i think if the language tends to allow catching invalid
input to pure computation as an exception (e.g. division
by zero) then throwing is preferred.

if bad input is treated as undefined behavior then
nothrow is preferred.
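
a small example of the ambiguity being discussed (a sketch only; the exact
gcc behaviour is what this thread asks to have documented):

  /* if lookup() may throw, CSEing or hoisting the two identical calls
     changes which call site the exception escapes from; if it may not
     throw (nor longjmp), the transformation is clearly valid */
  extern int lookup (int) __attribute__ ((const));

  int f (int a, int b)
  {
    return lookup (a) + lookup (a) + lookup (b);
  }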


In any case, it would be nice if the intended compiler behavior could be
explicitly stated in the manual.


Agreed.



+1



Re: [RFC] Kernel livepatching support in GCC

2015-10-22 Thread Szabolcs Nagy

On 22/10/15 10:23, libin wrote:

From: Jiangjiji 
Date: Sat, 10 Oct 2015 15:29:57 +0800
Subject: [PATCH] * gcc/config/aarch64/aarch64.opt: Add a new option.
  * gcc/config/aarch64/aarch64.c: Add some new functions and Macros.
  * gcc/config/aarch64/aarch64.h: Modify PROFILE_HOOK and FUNCTION_PROFILER.



this patch might be worth submitting to gcc-patches.

i assume this is not redundant with respect to the
nop-padding work.


Signed-off-by: Jiangjiji 
Signed-off-by: Li Bin 
---
  gcc/config/aarch64/aarch64.c   |   23 +++
  gcc/config/aarch64/aarch64.h   |   13 -
  gcc/config/aarch64/aarch64.opt |4 
  3 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 752df4e..c70b161 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -440,6 +440,17 @@ aarch64_is_long_call_p (rtx sym)
return aarch64_decl_is_long_call_p (SYMBOL_REF_DECL (sym));
  }

+void
+aarch64_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
+{
+   if (flag_fentry)
+   {
+   fprintf (file, "\tmov\tx9, x30\n");
+   fprintf (file, "\tbl\t__fentry__\n");
+   fprintf (file, "\tmov\tx30, x9\n");
+   }
+}
+


you can even omit the mov x30,x9 at the call site if
__fentry__ does

  stp x9,x30,[sp,#-16]!
  //... profiling
  ldp x30,x9,[sp],#16
  ret x9

is there a problem with this?

i think the rest of the patch means that -pg retains
the old behaviour and -pg -mfentry emits this new entry.

note that -pg rejects -fomit-frame-pointer (for no good
reason), that should be fixed separately (it seems the
kernel now relies on frame pointers on aarch64, but the
mcount abi does not require this and e.g. the glibc
mcount does not use it.)


  /* Return true if the offsets to a zero/sign-extract operation
 represent an expression that matches an extend operation.  The
 operands represent the paramters from
@@ -7414,6 +7425,15 @@ aarch64_emit_unlikely_jump (rtx insn)
add_int_reg_note (insn, REG_BR_PROB, very_unlikely);
  }

+/* Return true, if profiling code should be emitted before
+ * prologue. Otherwise it returns false.
+ * Note: For x86 with "hotfix" it is sorried.  */
+static bool
+aarch64_profile_before_prologue (void)
+{
+   return flag_fentry != 0;
+}
+
  /* Expand a compare and swap pattern.  */

  void
@@ -8454,6 +8474,9 @@ aarch64_cannot_change_mode_class (enum machine_mode from,
  #undef TARGET_ASM_ALIGNED_SI_OP
  #define TARGET_ASM_ALIGNED_SI_OP "\t.word\t"

+#undef TARGET_PROFILE_BEFORE_PROLOGUE
+#define TARGET_PROFILE_BEFORE_PROLOGUE aarch64_profile_before_prologue
+
  #undef TARGET_ASM_CAN_OUTPUT_MI_THUNK
  #define TARGET_ASM_CAN_OUTPUT_MI_THUNK \
hook_bool_const_tree_hwi_hwi_const_tree_true
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 77b2bb9..65e34fc 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -804,13 +804,16 @@ do {  \
   #define PROFILE_HOOK(LABEL)   \
 {   \
   rtx fun, lr;  \
-lr = get_hard_reg_initial_val (Pmode, LR_REGNUM);  \
-fun = gen_rtx_SYMBOL_REF (Pmode, MCOUNT_NAME); \
-emit_library_call (fun, LCT_NORMAL, VOIDmode, 1, lr, Pmode);   \
+   if (!flag_fentry)
+ {
+   lr = get_hard_reg_initial_val (Pmode, LR_REGNUM); \
+   fun = gen_rtx_SYMBOL_REF (Pmode, MCOUNT_NAME); \
+   emit_library_call (fun, LCT_NORMAL, VOIDmode, 1, lr, Pmode); \
+ }
 }

-/* All the work done in PROFILE_HOOK, but still required.  */
-#define FUNCTION_PROFILER(STREAM, LABELNO) do { } while (0)
+#define FUNCTION_PROFILER(STREAM, LABELNO)
+   aarch64_function_profiler(STREAM, LABELNO)

  /* For some reason, the Linux headers think they know how to define
 these macros.  They don't!!!  */
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 266d873..9e4b408 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -124,3 +124,7 @@ Enum(aarch64_abi) String(ilp32) Value(AARCH64_ABI_ILP32)

  EnumValue
  Enum(aarch64_abi) String(lp64) Value(AARCH64_ABI_LP64)
+
+mfentry
+Target Report Var(flag_fentry) Init(0)
+Emit profiling counter call at function entry immediately after prologue.





Re: arm64:, Re: [RFC] Kernel livepatching support in GCC

2015-10-22 Thread Szabolcs Nagy

On 22/10/15 11:14, AKASHI Takahiro wrote:

On 10/22/2015 06:07 PM, libin wrote:

在 2015/5/28 16:39, Maxim Kuvyrkov 写道:

Our proposal is that instead of adding -mfentry/-mnop-count/-mrecord-mcount
options to other architectures, we should implement a target-independent
option -fprolog-pad=N, which will generate a pad of N nops at the beginning
of each function and add a section entry describing the pad similar to
-mrecord-mcount [1].

Since adding NOPs is much less architecture-specific than outputting call
instruction sequences, this option can be handled in a target-independent
way at least for some/most architectures.

Comments?

As I found out today, the team from Huawei has implemented [2], which follows
the x86 example of an -mfentry option generating a hard-coded call sequence.
I hope that this proposal can be easily incorporated into their work since
most of the livepatching changes are in the kernel.



Thanks very much for your effort on this. The arch-independent implementation
is very good to me, but I have one question: how do we ensure the atomic
replacement of multiple instructions on the kernel side?


I have one idea, but we'd better discuss this topic at least on
linux-arm-kernel.


And before this arch-independent option, can we first consider an
arch-dependent -mfentry implementation for arm64, like x86 has? I will post
it soon.

livepatch for arm64 based on this arm64 -mfentry feature is on github:
https://github.com/libin2015/livepatch-for-arm64.git  master



I also have my own version of livepatch support for arm64 using the
yet-to-come "-fprolog-add=N" option :)
As we discussed before, the main difference will be how we should preserve
the LR register when invoking a ftrace hook (ftrace_regs_caller).
But again, this is a topic to discuss mainly in linux-arm-kernel.
(I have no intention of excluding the gcc ml from the discussions.)


is -fprolog-add=N enough from gcc?

i assume it solves the live patching, but i thought -mfentry
might be still necessary when live patching is not used.

or is the kernel fine with the current mcount abi for that?
(note that it changes the code generation in leaf functions
and currently the kernel relies on frame pointers etc.)



Re: arm64:, Re: [RFC] Kernel livepatching support in GCC

2015-10-23 Thread Szabolcs Nagy

On 23/10/15 10:11, AKASHI Takahiro wrote:

On 10/22/2015 07:26 PM, Szabolcs Nagy wrote:

On 22/10/15 11:14, AKASHI Takahiro wrote:


I also have my own version of livepatch support for arm64 using the
yet-to-come "-fprolog-add=N" option :)
As we discussed before, the main difference will be how we should preserve
the LR register when invoking a ftrace hook (ftrace_regs_caller).
But again, this is a topic to discuss mainly in linux-arm-kernel.
(I have no intention of excluding the gcc ml from the discussions.)


is -fprolog-add=N enough from gcc?


Yes, as far as I correctly understand this option.


i assume it solves the live patching, but i thought -mfentry
might be still necessary when live patching is not used.


No.
- Livepatch depends on ftrace's DYNAMIC_FTRACE_WITH_REGS feature
- DYNAMIC_FTRACE_WITH_REGS can be implemented either with -fprolog-add=N or
  -mfentry
- x86 is the only architecture that supports -mfentry AFAIK
- and it is used in the kernel solely to implement this ftrace feature AFAIK
- So once a generic option, -fprolog-add=N, is supported, we have no reason
  to add arch-specific -mfentry.


or is the kernel fine with the current mcount abi for that?
(note that changes the code generation in leaf functions


Can you please elaborate your comments in more detail?
I didn't get your point here.



ok, i may be confused.

i thought there is a static ftrace (functions are
instrumented with mcount using -pg) and a dynamic one
where the code is modified at runtime.

then i thought adding -fprolog-pad=N would be good for the
dynamic case, but not for the static case.

the static case may need improvements too because the
current way (using regular c call abi for mcount) affects
code generation more significantly than the proposed
-mfentry solution would (e.g. leaf functions turn into
non-leaf ones).

hence the question: is the kernel satisfied with -pg mcount
for the static ftrace or does it want -mfentry behaviour
instead?
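
to illustrate the code generation point above: with plain -pg even a leaf
function grows a frame (a rough sketch, not the exact gcc output):

  /* source: */
  int leaf (int x) { return x * 2; }

  /* with -pg the aarch64 entry becomes roughly
        stp  x29, x30, [sp, #-16]!   // frame set up only for the mcount call
        mov  x0, x30
        bl   _mcount
     so the function is no longer a leaf; an -mfentry / nop-pad style entry
     would avoid forcing this on every instrumented function.  */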



Re: Linux-abi group

2016-02-08 Thread Szabolcs Nagy
* H.J. Lu  [2016-02-08 11:24:53 -0800]:
> I created a mailing list to discuss Linux specific,.processor independent
> modification and extension of generic System V Application Binary Interface:
> 
> https://groups.google.com/d/forum/linux-abi
> 
> I will start to document existing Linux extensions, like STT_GNU_IFUNC.
> I will propose some new extensions soon.
> 

seems to require a registered email address at google.
(and the archive does not work from any console-based browser
or using direct http get tools.)

the kernel seems to have a lot of mailing lists, maybe
they can handle this list too?

thanks


Re: gnu-gabi group

2016-02-15 Thread Szabolcs Nagy
On 15/02/16 16:03, H.J. Lu wrote:
> On Mon, Feb 15, 2016 at 7:37 AM, Alexandre Oliva  wrote:
>> On Feb 12, 2016, Pedro Alves  wrote:
>>
>>
 wonderful. I am not a big fan of google groups mailing lists, they seem
 to make it hard to subscribe and don't have easy-to-access archives.
 Having a local gnu-gabi group on sourceware.org would be better IMHO.
>>
>>> +1
>>
>> +1
>>
>> Since it's GNU tools we're talking about, we'd better use a medium that
>> we've all already agreed to use, than one that a number of us objects
>> to.  I, for one, have closed my Google account several Valentine's Days
>> ago, for privacy reasons, and this makes the archives of lists hidden
>> there unusable for me.
> 
> Please don't spread false information.  Anyone can subscribe to the Linux-ABI
> group and its archive is open to everyone.  You don't need a gmail account
> for any of those.  There are quite a few non-gmail users.  You don't have
> to take my word for it.  I can add your email to Linux-ABI group and you
> can check it out yourself :-).
> 

you as a group admin can do that, others cannot join
without creating an account at google (which requires
the acceptance of the google tos etc).

you also have censorship rights over others.

even if you add users to the list they cannot access
the archive through standard http or https, they need
to allow google to execute javascript code on their
machine. (so wget does not work).

and the url through which you visit a post is not a
reliable permanent link so linking to posts is hard.

i think google groups is not an acceptable forum for
discussing open standards publicly.



Re: gnu-gabi group

2016-02-16 Thread Szabolcs Nagy
On 15/02/16 17:36, Mike Frysinger wrote:
> On 15 Feb 2016 16:18, Szabolcs Nagy wrote:
>> you as a group admin can do that, others cannot join
>> without creating a account at google (which requires
>> the acceptance of the google tos etc).
> 
> that is annoying

i didn't know about list+subscr...@googlegroups.com
(thanks Florian and Joseph)

>> you also have censorship rights over others.
> 
> umm, every mailing list has that.  Google Groups is no different.

it's better if the admin rights are held by some discussion-related
organization (e.g. in case anything happens to H.J. Lu).

>> even if you add users to the list they cannot access
>> the archive through standard http or https,
> 
> you're conflating things here.  of course access is through "standard
> http or https" -- that's the transport protocol that everyone has to
> implement according to the standard in order to work.  Google is not
> different here.

the contents cannot be accessed with an http or https client.
(unless you know the magic urls below)

>> they need to allow google to execute javascript code on their
>> machine.
> 
> complaining that the web interface executes JS is a bit luddite-ish.

some of us tend to browse the web from terminal (== no js).

>> (so wget does not work).
> 
> every message has a link to the raw message you can use to fetch the
> mail directly.
> 
> perm link:
> https://groups.google.com/d/msg/x32-abi/IHmCJvigOEg/TyjZJYZ63DMJ

redirects me to
https://groups.google.com/forum/#!msg/x32-abi/IHmCJvigOEg/TyjZJYZ63DMJ

> which has a link to the raw message:
> https://groups.google.com/forum/message/raw?msg=x32-abi/IHmCJvigOEg/TyjZJYZ63DMJ

i didn't know about this raw url, it seems there is
https://groups.google.com/forum/print/msg/x32-abi/IHmCJvigOEg/TyjZJYZ63DMJ
too, so if i always change the urls i can browse the archive.
(this is not discoverable without js as far as i can see)

with the +subscribe@ and the raw msg options i'm no longer
against google groups hosting public discussions (provided
the project documents these somewhere), i still prefer more
accessible alternatives though.

> it's actually nicer than mailmain (i.e. sourceware) as it doesn't do all
> the trivial content mangling (s/@/ at/g).  it's not like e-mail scrapers
> today can't reverse that easily.
> 
>> and the url through which you visit a post is not a
>> reliable permanent link so linking to posts is hard.
> 
> every post has a "link" option to get a perm link.  needing the location
> in the URL bar be the perm link is a weak (dumb imo) requirement.
> -mike
> 



Re: Q: (d = NAN) != NAN?

2016-04-08 Thread Szabolcs Nagy
On 08/04/16 11:09, Ulrich Windl wrote:
> Probably I'm doing something wrong, but I have some problems comparing a
> double with NAN: The value is NAN, but the test fails. Probably I should
> use isnan().

yes, that's how ieee works, nan != nan is true.
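
tiny example:

  #include <math.h>

  int check (void)
  {
    double d = NAN;
    /* (d == NAN) is 0 and (d != NAN) is 1 even though d is NaN: no NaN
       compares equal to anything, including itself, so use isnan() */
    return (d != NAN) && isnan (d);   /* 1 */
  }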



Re: Preventing preemption of 'protected' symbols in GNU ld 2.26 [aka should we revert the fix for 65248]

2016-04-19 Thread Szabolcs Nagy
On 19/04/16 09:20, Richard Biener wrote:
> On Tue, Apr 19, 2016 at 7:08 AM, Alan Modra  wrote:
>> On Mon, Apr 18, 2016 at 07:59:50AM -0700, H.J. Lu wrote:
>>> On Mon, Apr 18, 2016 at 7:49 AM, Alan Modra  wrote:
 On Mon, Apr 18, 2016 at 11:01:48AM +0200, Richard Biener wrote:
> To summarize: there is currently no testcase for a wrong-code issue
> because there is no wrong-code issue.
>>
>> I've added a testcase at
>> https://sourceware.org/bugzilla/show_bug.cgi?id=19965#c3
>> that shows the address problem (&x != x) with older gcc *or* older
>> glibc, and shows the program behaviour problem with current
>> binutils+gcc+glibc.
> 
> Thanks.
> 
> So with all this it sounds that current protected visibility is just broken
> and we should forgo with it, making it equal to default visibility?
> 

the test cases pass for me on musl libc,
it's just a glibc dynamic linker bug
that it does not handle extern protected
visibility correctly.

> At least I couldn't decipher a solution that solves all of the issues
> with protected visibility apart from trying to error at link-time
> (or runtime?) for the cases that are tricky (impossible?) to solve.
> 
> glibc uses "protected visibility" via its using of local aliases, correct?
> But it doesn't use anything like that for data symbols?
> 
> Richard.
> 
>> --
>> Alan Modra
>> Australia Development Lab, IBM
> 



Re: SafeStack proposal in GCC

2016-04-20 Thread Szabolcs Nagy
On 13/04/16 14:01, Cristina Georgiana Opriceana wrote:
> I bring to your attention SafeStack, part of a bigger research project
> - CPI/CPS [1], which offers complete protection against stack-based
> control flow hijacks.

i think it does not provide complete protection.

it cannot instrument the c runtime or dsos and attacks
can be retried on a forking server which has fixed memory
layout, so there is still significant attack surface.

(it would be nice if security experts made such claims
much more carefully).

> In GCC, we propose a design composed of an instrumentation module
> (implemented as a GIMPLE pass) and a runtime library.
...
> The runtime support will have to deal with unsafe stack allocation - a
> hook in the pthread create/destroy functions to create per-thread
> stack regions. This runtime support might be reused from the Clang
> implementation.

the SafeStack runtime in compiler-rt has various issues
that should be clearly documented.

it seems the runtime

* aborts the process on allocation failure.

* deallocates the unsafe stack using tsd dtors, but
signal handlers may run between the dtors and the actual
thread exit, without a mapped unsafe stack.

* determines the main stack with broken heuristic
(since the rlimit can change at runtime i don't think
this is possible to do correctly in general).

* interposes pthread_create but not c11 thrd_create
so conforming c11 code will crash. (same for non-standard
usage of raw clone.)

* sigaltstack and swapcontext are broken too.

i think the runtime issues are more likely to cause
problems than the compiler parts: it has to be reliable
and abi stable since safestack is advertised for
production use.

(i think gcc should raise the bar for runtime code
quality higher than that, but there is precedent
for much worse runtimes in gcc so this should not
block the safestack porting work, however consider
these issues when communicating about it to upstream
or to potential users.)



GCC 6 symbol poisoning and c++ header usage is fragile

2016-04-21 Thread Szabolcs Nagy
building gcc6 using musl based gcc6 fails with symbol poisoning error
(details at the end of the mail).

the root cause is c++: c++ headers include random libc headers with the
_GNU_SOURCE feature test macro, so all sorts of unexpected symbols are
defined/declared.

since it's unlikely the c++ standard gets fixed (to properly specify
the namespace rules) it is not acceptable to include std headers after
system.h, where the poisoning happens, because a trivial libc header
change will break the build.

c++ header use in gcc seems inconsistent, e.g. there are cases where
-  is included before system.h (go-system.h)
-  is included after system.h (indirectly through )
-  is included in system.h because INCLUDE_STRING is defined.
-  is included in system.h and in source files using it.. sometimes

i think it should be consistently before system.h (i'm not sure what's
going on with the INCLUDE_STRING macro), fortunately not many files are
affected in gcc/:

auto-profile.c
diagnostic.c
graphite-isl-ast-to-gimple.c
ipa-icf.c
ipa-icf-gimple.c
pretty-print.c
toplev.c

i can prepare a patch moving the c++ includes up and i'm open to
other suggestions. (including libc headers is also problematic because
of _GNU_SOURCE, but still safer than what is happening in c++ land
where #include  makes all locale.h, pthread.h, time.h, sched.h,
etc symbols visible).


x86_64-linux-musl-g++ -fno-PIE -c   -g -O2 -DIN_GCC -fno-exceptions 
-fno-rtti -fasynchronous-unwind-tables
-W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute 
-Woverloaded-virtual -pedantic
-Wno-long-long -Wno-variadic-macros -Wno-overlength-strings   -DHAVE_CONFIG_H 
-I. -I. -I/src/gcc/gcc
-I/src/gcc/gcc/. -I/src/gcc/gcc/../include -I/src/gcc/gcc/../libcpp/include 
-I/build/host-tools/include
-I/build/host-tools/include -I/build/host-tools/include  
-I/src/gcc/gcc/../libdecnumber
-I/src/gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
-I/src/gcc/gcc/../libbacktrace   -o auto-profile.o -MT
auto-profile.o -MMD -MP -MF ./.deps/auto-profile.TPo /src/gcc/gcc/auto-profile.c
In file included from /tool/x86_64-linux-musl/include/pthread.h:30:0,
 from 
/tool/x86_64-linux-musl/include/c++/6.0.1/x86_64-linux-musl/bits/gthr-default.h:35,
 from 
/tool/x86_64-linux-musl/include/c++/6.0.1/x86_64-linux-musl/bits/gthr.h:148,
 from 
/tool/x86_64-linux-musl/include/c++/6.0.1/ext/atomicity.h:35,
 from 
/tool/x86_64-linux-musl/include/c++/6.0.1/bits/basic_string.h:39,
 from /tool/x86_64-linux-musl/include/c++/6.0.1/string:52,
 from /tool/x86_64-linux-musl/include/c++/6.0.1/stdexcept:39,
 from /tool/x86_64-linux-musl/include/c++/6.0.1/array:39,
 from /tool/x86_64-linux-musl/include/c++/6.0.1/tuple:39,
 from 
/tool/x86_64-linux-musl/include/c++/6.0.1/bits/stl_map.h:63,
 from /tool/x86_64-linux-musl/include/c++/6.0.1/map:61,
 from /src/gcc/gcc/auto-profile.c:36:
/tool/x86_64-linux-musl/include/sched.h:74:7: error: attempt to use poisoned 
"calloc"
 void *calloc(size_t, size_t);
   ^
/tool/x86_64-linux-musl/include/sched.h:114:36: error: attempt to use poisoned 
"calloc"
 #define CPU_ALLOC(n) ((cpu_set_t *)calloc(1,CPU_ALLOC_SIZE(n)))
^



Re: GCC 6 symbol poisoning and c++ header usage is fragile

2016-04-21 Thread Szabolcs Nagy
On 21/04/16 12:36, Richard Biener wrote:
> On Thu, Apr 21, 2016 at 1:11 PM, Szabolcs Nagy  wrote:
>> building gcc6 using musl based gcc6 fails with symbol poisoning error
>> (details at the end of the mail).
>>
>> the root cause is c++: c++ headers include random libc headers with
>> _GNU_SOURCE ftm so all sorts of unexpected symbols are defined/declared.
>>
>> since it's unlikely the c++ standard gets fixed (to properly specify
>> the namespace rules) it is not acceptable to include std headers after
>> system.h, where the poisoning happens, because trivial libc header
>> change will break the build.
>>
>> c++ header use in gcc seems inconsistent, e.g. there are cases where
>> -  is included before system.h (go-system.h)
>> -  is included after system.h (indirectly through )
>> -  is included in system.h because INCLUDE_STRING is defined.
>> -  is included in system.h and in source files using it.. sometimes
>>
>> i think it should be consistently before system.h (i'm not sure what's
>> going on with the INCLUDE_STRING macro), fortunately not many files are
>> affected in gcc/:
> 
> system headers should be included from _within_ system.h.  To avoid including
> them everywhere we use sth like
> 
> /* Include  before "safe-ctype.h" to avoid GCC poisoning
>the ctype macros through safe-ctype.h */
> 
> #ifdef __cplusplus
> #ifdef INCLUDE_STRING
> # include 
> #endif
> #endif
> 
> so sources do
> 
> #define INCLUDE_STRING
> #include "config.h"
> #include "system.h"
> 
> So the  cases can be simplified with INCLUDE_STRING and the
>  case should be added similarly (unless we decide  is cheap
> enough to be always included).
> 

 is always included already.

there is also , ,  usage and go-system.h is special.
(and gmp.h includes  when built with c++)

so i can prepare a patch with INCLUDE_{MAP,SET,LIST} and remove
the explicit libc/libstdc++ includes.
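
i.e. something along these lines in system.h (a sketch following the existing
INCLUDE_STRING pattern, not the final patch):

  #ifdef __cplusplus
  # ifdef INCLUDE_MAP
  #  include <map>
  # endif
  # ifdef INCLUDE_SET
  #  include <set>
  # endif
  # ifdef INCLUDE_LIST
  #  include <list>
  # endif
  #endif

and sources would then do "#define INCLUDE_MAP" before including system.h
instead of including <map> directly.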

> Richard.
> 
>> auto-profile.c
>> diagnostic.c
>> graphite-isl-ast-to-gimple.c
>> ipa-icf.c
>> ipa-icf-gimple.c
>> pretty-print.c
>> toplev.c
>>
>> i can prepare a patch moving the c++ includes up and i'm open to
>> other suggestions. (including libc headers is also problematic because
>> of _GNU_SOURCE, but still safer than what is happening in c++ land
>> where #include  makes all locale.h, pthread.h, time.h, sched.h,
>> etc symbols visible).
>>
>>
>> x86_64-linux-musl-g++ -fno-PIE -c   -g -O2 -DIN_GCC -fno-exceptions 
>> -fno-rtti -fasynchronous-unwind-tables
>> -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual 
>> -Wmissing-format-attribute -Woverloaded-virtual -pedantic
>> -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings   
>> -DHAVE_CONFIG_H -I. -I. -I/src/gcc/gcc
>> -I/src/gcc/gcc/. -I/src/gcc/gcc/../include -I/src/gcc/gcc/../libcpp/include 
>> -I/build/host-tools/include
>> -I/build/host-tools/include -I/build/host-tools/include  
>> -I/src/gcc/gcc/../libdecnumber
>> -I/src/gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
>> -I/src/gcc/gcc/../libbacktrace   -o auto-profile.o -MT
>> auto-profile.o -MMD -MP -MF ./.deps/auto-profile.TPo 
>> /src/gcc/gcc/auto-profile.c
>> In file included from /tool/x86_64-linux-musl/include/pthread.h:30:0,
>>  from 
>> /tool/x86_64-linux-musl/include/c++/6.0.1/x86_64-linux-musl/bits/gthr-default.h:35,
>>  from 
>> /tool/x86_64-linux-musl/include/c++/6.0.1/x86_64-linux-musl/bits/gthr.h:148,
>>  from 
>> /tool/x86_64-linux-musl/include/c++/6.0.1/ext/atomicity.h:35,
>>  from 
>> /tool/x86_64-linux-musl/include/c++/6.0.1/bits/basic_string.h:39,
>>  from /tool/x86_64-linux-musl/include/c++/6.0.1/string:52,
>>  from /tool/x86_64-linux-musl/include/c++/6.0.1/stdexcept:39,
>>  from /tool/x86_64-linux-musl/include/c++/6.0.1/array:39,
>>  from /tool/x86_64-linux-musl/include/c++/6.0.1/tuple:39,
>>  from 
>> /tool/x86_64-linux-musl/include/c++/6.0.1/bits/stl_map.h:63,
>>  from /tool/x86_64-linux-musl/include/c++/6.0.1/map:61,
>>  from /src/gcc/gcc/auto-profile.c:36:
>> /tool/x86_64-linux-musl/include/sched.h:74:7: error: attempt to use poisoned 
>> "calloc"
>>  void *calloc(size_t, size_t);
>>^
>> /tool/x86_64-linux-musl/include/sched.h:114:36: error: attempt to use 
>> poisoned "calloc"
>>  #define CPU_ALLOC(n) ((cpu_set_t *)calloc(1,CPU_ALLOC_SIZE(n)))
>> ^
>>
> 



Re: GCC 6 symbol poisoning and c++ header usage is fragile

2016-04-21 Thread Szabolcs Nagy
On 21/04/16 12:52, Jonathan Wakely wrote:
> On 21 April 2016 at 12:11, Szabolcs Nagy wrote:
>> the root cause is c++: c++ headers include random libc headers with
>> _GNU_SOURCE ftm so all sorts of unexpected symbols are defined/declared.
> 
> Yes, I'd really like to be able to stop defining _GNU_SOURCE
> unconditionally. It needs some libstdc++ and glibc changes for that to
> happen, I'll be looking at it for gcc 7.
> 
> 
>> since it's unlikely the c++ standard gets fixed (to properly specify
>> the namespace rules)
> 
> Fixed how? What's wrong with the rules? (I'd like to understand what's
> wrong here before I try to change anything, and I don't understand the
> comment above).
> 

posix has "namespace rules" specifying what symbols
are reserved for the implementation when certain
headers are included.
(it's not entirely trivial, i have a collected list
http://port70.net/~nsz/c/posix/reserved.txt
http://port70.net/~nsz/c/posix/README.txt
i use for testing musl headers, glibc also does
such namespace checks.)

e.g. the declared function names in a header are
reserved to be defined as macros.

c++ does not specify how its headers interact with
posix headers except for a few c standard headers
where it requires no macro definition for functions
(and imposes some other requirements on the libc
like being valid c++ syntax, using extern "C" where
appropriate etc).

so from a libc implementor's point of view, including
libc headers into c++ code is undefined behaviour
(neither posix nor c++ specifies what should happen).
without a specification, libc headers just keep piling
up #ifdef __cplusplus hacks when people run into problems.

e.g. c++ code uses ::pthread_equal(a,b), but musl used
a macro for pthread_equal (the only sensible
implementation is (a)==(b), this has to be suppressed
for c++, which now uses an extern call to do the
same). i'm also pretty sure a large amount of c++
code would break if unistd.h defined "read", "write",
"link" etc as macros, since these are often used as
method names in c++, but this would be a conforming
libc implementation.
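
a concrete illustration (sketch; the macro is the kind of definition posix
permits, it is not quoted from any particular libc header):

  /* a conforming libc may implement a declared function as a macro: */
  #define pthread_equal(a, b) ((a) == (b))

  /* c++ code commonly writes the qualified call ::pthread_equal(t1, t2);
     with the macro above that expands to ::((t1) == (t2)), which does not
     compile, so the macro has to be suppressed for c++.  the same would
     happen to c++ methods named read, write, link, ... if unistd.h defined
     those names as (conforming) function-like macros. */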



Re: SafeStack proposal in GCC

2016-05-10 Thread Szabolcs Nagy
On 09/05/16 22:45, Michael Matz wrote:
> On Mon, 9 May 2016, Rich Felker wrote:
> 
>>> Done.  I never understood why they left in the hugely unuseful 
>>> {sig,}{set,long}jmp() but removed the actually useful *context() 
>>> (amended somehow like above).
>>
>> Because those are actually part of the C language
> 
> Sure.  Same QoI bug in my book.  (And I'm not motivated enough to find out 
> if the various C standards weren't just following POSIX whe setjmp was 
> included, or really the other way around).
> 
>> (the non-sig versions, but the sig versions are needed to work around 
>> broken unices that made the non-sig versions save/restore signal mask 
>> and thus too slow to ever use). They're also much more useful for 
>> actually reasonable code (non-local exit across functions that were 
>> badly designed with no error paths)
> 
> Trivially obtainable with getcontext/setcontext as well.
> 
>> as opposed to just nasty hacks that 
>> are mostly/entirely UB anyway (coroutines, etc.).
> 
> Well, we differ in the definition of reasonable :)  And I certainly don't 
> see any material difference in undefined behaviour between both classes of 
> functions.  Both are "special" regarding compilers (e.g. returning 
> multiple times) and usage.  But as the *jmp() functions can be implemented 
> with *context(), but not the other way around, it automatically follows 

no, no, no,

don't try to present getcontext as equal to setjmp,
getcontext is broken while setjmp is just ugly.

setjmp is defined so that the compiler can treat it
specially and the caller has to make sure certain
objects are volatile, cannot appear in arbitrary
places (e.g. in the declaration of a vla), longjmp
must be in the same thread, etc.

all those requirements that make setjmp implementable
at all were missing from the getcontext specs, so you
can call it through a function pointer and access
non-volatile modified local state after the second
return, etc. (the compiler treating "getcontext"
specially is a hack, not justified by any standard.)
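
for contrast, the setjmp side has explicit rules the caller must follow
(a minimal sketch of the c standard requirement, nothing getcontext-specific):

  #include <setjmp.h>

  static jmp_buf env;

  int f (void)
  {
    volatile int n = 0;  /* must be volatile: it is modified between the
                            setjmp and the longjmp, otherwise its value is
                            indeterminate after the second return */
    if (setjmp (env) == 0)
      {
        n = 1;
        longjmp (env, 1);  /* control returns to setjmp a second time */
      }
    return n;  /* reliably 1 only because n is volatile */
  }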

i think both gccgo and qemu can setcontext into another
thread, so when getcontext returns all tls object
addresses are wrong. the semantics of this case were
not properly defined anywhere (and there are
implementation internal objects with thread local
storage duration like fenv so this matters even if
the caller does not use tls). this is unlikely to
work correctly with whatever safestack implementation.

setcontext were originally specified to be able to
use the ucontext from async signal handlers.. this
turned out to be problematic for several reasons
(kernel saved ucontext is different from user space
ucontext and sigaltstack needs special treatment).

if setcontext finishes executing the last linked
context in the main thread it was not clearly
specified what cleanups will be performed.

there is just a never ending list of issues with
these apis, so unless there is an actual proposal
how to tighten their specification, any caller of
the context apis rely on undefined semantics.

> (to me!) that the latter are more useful, if for nothing else than basic 
> building blocks.  (there are coroutine libs that try to emulate a real 
> makecontext with setjmp/longjmp on incapable architectures.  As this is 
> impossible for all corner cases they are broken and generally awful on 
> them :) )
> 
> 
> Ciao,
> Michael.
> 



Re: LTO and undefined reference to typeinfo

2016-05-23 Thread Szabolcs Nagy
On 23/05/16 12:36, MM wrote:
> Hello,
> 
> g++ (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
> GNU gold (version 2.25-17.fc23) 1.11
> I successfully link a executable in debug mode (-std=c++11 -g) but not in
> release mode (-std=c++11 -flto -O3). All sources are compiled with the same
> option. Shared libraries are used.
> The compiler driver is used to launch the final link line:
> /bin/c++-std=c++11 -Wno-multichar -O3 -DNDEBUG -flto   
>  -o  -rdynamic  Wl,-rpath,
> 
> These are the errors I see (only in release, not in debug):
>  ... [clone .constprop.79]: error: undefined reference to
>  'typeinfo for market [clone .lto_priv.1353]'
> 
> Both the debug and release version of the object referencing this show the
> same with gcc-nm:
> 
>  U typeinfo for market
>  Note this bit   " [clone .lto_priv.1353]" is not in the symbol at all.
> 
> This is what gcc-nm says for the object where the symbol is defined
> (market.cpp.o, which is part of libmarkets.so):
> 
> 1. In DEBUG
> gcc-nm -C market.cpp.o |  grep 'typeinfo for market'
>   V typeinfo for market
> 
> 2. In RELEASE
> gcc-nm -C market.cpp.o |  grep 'typeinfo for market'
>  W typeinfo for market
> This is the one that fails.
> Given the versions of gcc and ld, the default behaviour for lto should be
> straightforward?
> Any ideas what's going on?
> 

typeinfo seems to be a weak object symbol
which is known to be broken with lto, so
this may be related to:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69271


> Thanks
> MM
> 



Re: LTO and undefined reference to typeinfo

2016-05-24 Thread Szabolcs Nagy
On 23/05/16 14:24, MM wrote:
> On 23 May 2016 at 12:53, Szabolcs Nagy  wrote:
>> On 23/05/16 12:36, MM wrote:
>>> Hello,
>>>
>>> g++ (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
>>> GNU gold (version 2.25-17.fc23) 1.11
>>> I successfully link a executable in debug mode (-std=c++11 -g) but not in
>>> release mode (-std=c++11 -flto -O3). All sources are compiled with the same
>>> option. Shared libraries are used.
>>> The compiler driver is used to launch the final link line:
>>> /bin/c++-std=c++11 -Wno-multichar -O3 -DNDEBUG -flto   
>>>  -o  -rdynamic  Wl,-rpath,
>>>
>>> These are the errors I see (only in release, not in debug):
>>>  ... [clone .constprop.79]: error: undefined reference to
>>>  'typeinfo for market [clone .lto_priv.1353]'
>>>
>>> Both the debug and release version of the object referencing this show the
>>> same with gcc-nm:
>>>
>>>  U typeinfo for market
>>>  Note this bit   " [clone .lto_priv.1353]" is not in the symbol at all.
>>>
>>> This is what gcc-nm says for the object where the symbol is defined
>>> (market.cpp.o, which is part of libmarkets.so):
>>>
>>> 1. In DEBUG
>>> gcc-nm -C market.cpp.o |  grep 'typeinfo for market'
>>>   V typeinfo for market
>>>
>>> 2. In RELEASE
>>> gcc-nm -C market.cpp.o |  grep 'typeinfo for market'
>>>  W typeinfo for market
>>> This is the one that fails.
>>> Given the versions of gcc and ld, the default behaviour for lto should be
>>> straightforward?
>>> Any ideas what's going on?
>>>
>>
>> typeinfo seems to be a weak object symbol
>> which is known to be broken with lto, so
>> this may be related to:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69271
>>
> 
> Is it a workaround to not compile the referencing cpp and the referred
> cpp without lto, yet compile all the rest and link with lto?
> Otherwise, I'll turn off LTO until that bug is resolved.

it is not clear if this is the same issue as pr69271,
so i think you should submit a bug report with
test code if possible.

> 
> Thanks
> 



Re: Should we import gnulib under gcc/ or at the top-level like libiberty?

2016-06-23 Thread Szabolcs Nagy
On 23/06/16 12:18, Pedro Alves wrote:
> gdb doesn't put that gnulib wrapper library at the top level, mainly
> just because of history -- we didn't always have that wrapper
> library -- and the fact that gdb/gdbserver/ itself is not at top
> level either, even though it would be better moved to top level.
> 
> See this long email, explaining how the current gdb's gnulib import
> is set up:
> 
>  https://sourceware.org/ml/gdb-patches/2012-04/msg00426.html
> 
> I suggest gcc reuses the whole of gdb's wrapper library and scripts:
> 
>  
> https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=tree;f=gdb/gnulib;h=cdf326774716ae427dc4fb47c9a410fcdf715563;hb=HEAD
> 
> ... but put it in the top level instead.

if both gcc and binutils used a toplevel gnulib directory
then a shared-tree build would have the same problem as
libiberty has now: gcc and binutils can depend on different
versions of libiberty and then the build can fail.

as far as i know the shared-tree build is the only way to
build a toolchain without an install step (using in-tree
binutils) and it would be nice to fix that use case.



Re: GCC libatomic ABI specification draft

2016-11-29 Thread Szabolcs Nagy
On 17/11/16 20:12, Bin Fan wrote:
> 
> Although this ABI specification specifies that 16-byte properly aligned
> atomics are inlineable on platforms supporting cmpxchg16b, we document the
> caveats here for further discussion. If we decide to change the inlineable
> attribute for those atomics, then this ABI, the compiler and the runtime
> implementation should be updated together at the same time.
> 
> 
> The compiler and runtime need to check the availability of cmpxchg16b to
> implement this ABI specification. Here is how it would work: The compiler
> can get the information either from the compiler flags or by inquiring the
> hardware capabilities. When the information is not available, the compiler
> should assume that cmpxchg16b instruction is not supported. The runtime
> library implementation can also query the hardware compatibility and choose
> the implementation at runtime. Assuming the user provides correct compiler
> options

with this abi the runtime implementation *must* query the hardware
(because there might be inlined cmpxchg16b in use in another module
on a hardware that supports it and the runtime must be able to sync
with it).

currently gcc libatomic does not guarantee this which is dangerously
broken: if gcc is configured with --disable-gnu-indirect-function
(or on targets without ifunc support: solaris, bsd, android, musl,..)
the compiler may inline cmpxchg16b in one translation unit but use
an incompatible runtime function in another.

there is PR 70191 but this issue has wider scope.

> and the inquiry returns the correct information, on a platform that supports
> cmpxchg16b, the code generated by the compiler will both use cmpxchg16b; on
> a platform that does not support cmpxchg16b, the code generated by the
> compiler, including the code generated for a generic platform, always call
> the support function, so there is no compatibility problem.



Re: GCC libatomic ABI specification draft

2016-12-20 Thread Szabolcs Nagy
On 20/12/16 13:26, Ulrich Weigand wrote:
> Torvald Riegel wrote:
>> On Fri, 2016-12-02 at 12:13 +0100, Gabriel Paubert wrote:
>>> On Thu, Dec 01, 2016 at 11:13:37AM -0800, Bin Fan at Work wrote:
>>>> Thanks for the comment. Yes, the ABI requires libatomic must query the
>>>> hardware. This is necessary if we want the compiler to generate inlined
>>>> code for 16-byte atomics. Note that this particular issue only affects x86.
>>>
>>> Why? Power (at least recent ones) has 128 bit atomic instructions
>>> (lqarx/stqcx.) and Z has 128 bit compare and swap. 
>>
>> That's not the only factor affecting whether cmpxchg16b or such is used
>> for atomics.  If the HW just offers a wide CAS but no wide atomic load,
>> then even an atomic load is not truly just a load, which breaks (1)
>> atomic loads on read-only mapped memory and (2) volatile atomic loads
>> (unless we claim that an idempotent store is like a load, which is quite
>> a stretch for volatile I think).
> 
> I may have missed the context of the discussion, but just on the
> specific ISA question here: both Power and z not only have the
> 16-byte CAS (or load-and-reserve/store-conditional), but they also both
> have specific 16-byte atomic load and store instructions (lpq/stpq
> on z, lq/stq on Power).
> 
> Those are available on any system supporting z/Architecture (z900 and up),
> and on any Power system supporting the V2.07 ISA (POWER8 and up).  GCC
> does in fact use those instructions to implement atomic operations on
> 16-byte data types on those machines.

that's a bug.

at least i don't see how gcc makes sure the libatomic
calls can interoperate with inlined atomics.



Re: GCC libatomic ABI specification draft

2017-01-04 Thread Szabolcs Nagy
On 22/12/16 17:37, Segher Boessenkool wrote:
> We do not always have all atomic instructions.  Not all processors have
> all, and it depends on the compiler flags used which are used.  How would
> libatomic know what compiler flags are used to compile the program it is
> linked to?
> 
> Sounds like a job for multilibs?

x86_64 uses ifunc dispatch to always use atomic
instructions if available (which is bad because
ifunc is not supported on all platforms).

either such runtime feature detection and dispatch
is needed in libatomic or different abis have to
be supported (with the usual hassle).



Re: .../lib/gcc//7.1.1/ vs. .../lib/gcc//7/

2017-01-06 Thread Szabolcs Nagy
On 06/01/17 12:48, Jakub Jelinek wrote:
> SUSE and some other distros use a hack that omits the minor and patchlevel
> versions from the directory layout, just uses the major number, it is very

what is the benefit?



Re: .../lib/gcc//7.1.1/ vs. .../lib/gcc//7/

2017-01-06 Thread Szabolcs Nagy
On 06/01/17 13:11, Jakub Jelinek wrote:
> On Fri, Jan 06, 2017 at 01:07:23PM +0000, Szabolcs Nagy wrote:
>> On 06/01/17 12:48, Jakub Jelinek wrote:
>>> SUSE and some other distros use a hack that omits the minor and patchlevel
>>> versions from the directory layout, just uses the major number, it is very
>>
>> what is the benefit?
> 
> Various packages use the paths to gcc libraries/includes etc. in various
> places (e.g. libtool, *.la files, etc.).  So any time you upgrade gcc

it is a bug that gcc installs libtool la files,
because a normal cross toolchain is relocatable
but the la files have absolute paths in them.

that would be nice to fix, so build scripts don't
have to manually delete the bogus la files.

> (say from 6.1.0 to 6.2.0 or 6.2.0 to 6.2.1), everything that has those paths
> needs to be rebuilt.  By having only the major number in the paths (which is
> pretty much all that matters), you only have to rebuild when the major
> version of gcc changes (at which time one usually want to mass rebuild
> everything anyway).

i thought only the gcc driver needs to know
these paths because there are no shared libs
there that are linked into binaries so no binary
references those paths so nothing have to be
rebuilt.



weak pthread symbols in libgcc/gthr-posix.h cause issues

2014-11-16 Thread Szabolcs Nagy
the weakref magic in libgcc/gthr-posix.h is not guaranteed to work,
which can at least break libstdc++ with static linking and dlopen.

there are several bugs here:

- fallback code (unknown posix systems) should assume multi-threaded
application instead of using a fragile threadedness test

- determining threadedness with weak symbols is broken for dynamic
loading and static linking as well (dlopened library can pull in pthread
dependency at runtime, and with static linking a symbol does not indicate
the availability of another)

- using symbols through weak references at runtime is wrong with static
linking (it just happens to work with hacks that put a single .o into
libpthread.a); a simplified sketch of the pattern is below
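
a simplified sketch of the weak-symbol pattern (not the exact gthr-posix.h
code, which goes through __gthrw* aliases, but the same idea):

  #include <pthread.h>
  #pragma weak pthread_create

  static int gthread_active_p (void)
  {
    /* "multi-threaded iff pthread_create is linked in": wrong for static
       linking (the symbol can resolve to 0, or be pulled in independently
       of thread use) and for libraries that dlopen libpthread later */
    return pthread_create != 0;
  }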

see this analysis for more details and crashing example code:

http://www.openwall.com/lists/musl/2014/10/18/5

the static linking issue there was fixed by unconditionally disabling
the weak symbols in libgcc/gthr.h when building the toolchain:

#define GTHREAD_USE_WEAK 0


i sent this report to the libstdc++ list first but got redirected here:

https://gcc.gnu.org/ml/libstdc++/2014-11/msg00122.html

the static linking issue there was worked around by using linker flags
'-Wl,--whole-archive -lpthread -Wl,--no-whole-archive'

i think upstream should fix this properly


Re: RFC: Creating a more efficient sincos interface

2018-09-13 Thread Szabolcs Nagy
On 13/09/18 14:52, Florian Weimer wrote:
> On 09/13/2018 03:27 PM, Wilco Dijkstra wrote:
>> Hi,
>>
>> The existing sincos functions use 2 pointers to return the sine and cosine
>> result. In most cases 4 memory accesses are necessary per call. This is
>> inefficient and often significantly slower than returning values in
>> registers. I ran a few experiments on the new optimized sincosf
>> implementation in GLIBC using the following interface:
>>
>> __complex__ float sincosf2 (float);
>>
>> This has 50% higher throughput and a 25% reduction in latency on Cortex-A72
>> for random inputs in the range +-PI/4. Larger inputs take longer and thus
>> have lower gains, but there is still a 5% gain on the (rarely used) path
>> with full range reduction. Given sincos is used in various HPC applications
>> this can give a worthwhile speedup.
> 
> I think this is totally fine if you call it expif or something like that (and 
> put the sine in the imaginary part, of course).
> 

gcc seems to have a __builtin_cexpif
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/builtins.c;h=58ea7475ef7bb2a8abad2463b896efaa8fd79650;hb=HEAD#l2439

but i don't see it documented, maybe we
can add an actual cexpif symbol with the
above signature?
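
i.e. something like this (a sketch only: the cexpif name and the wrapper are
illustrative, neither gcc nor glibc exports such a symbol today):

  #define _GNU_SOURCE   /* for sincosf */
  #include <complex.h>
  #include <math.h>

  static inline _Complex float cexpif (float x)
  {
    float s, c;
    sincosf (x, &s, &c);
    return CMPLXF (c, s);  /* cos in the real part, sin in the imaginary
                              part, i.e. cexpif(x) = cos(x) + i*sin(x) */
  }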

> In general, I would object to using complex numbers for arbitrary pairs, but 
> this doesn't apply to this case.
> 
> Thanks,
> Florian



Re: TLSDESC clobber ABI stability/futureproofness?

2018-10-11 Thread Szabolcs Nagy
On 11/10/18 04:53, Alexandre Oliva wrote:
> On Oct 10, 2018, Rich Felker  wrote:
>> For aarch64 at least, according to discussions I had with Szabolcs
>> Nagy, there is an intent that any new extensions to the aarch64
>> register file be treated as clobbered by tlsdesc functions, rather
>> than preserved.
> 
> That's unfortunate.  I'm not sure I understand the reasoning behind this
> intent.  Maybe we should discuss it further?
> 

sve registers overlap with existing float registers
so float register access clobbers them.

so new code is suddenly not compatible with existing
tlsdesc entry points in the libc.

i think extensions should not cause such abi break.
we could mark binaries so they fail to load on an old
system instead of failing randomly at runtime, but
requiring a new libc for a new system is suboptimal
(you cannot deploy stable linux distros then).

if we update the libc then the tlsdesc entry has to
save/restore all sve regs, which can be huge state
(cause excessive stack usage), but more importantly
suddenly the process becomes "sve enabled" even if it
otherwise does not use sve at all (linux kernel keeps
track of which processes use sve instructions, ones
that don't can enter the kernel more quickly as the
sve state does not have to be saved)

i don't see a good solution for this, but in practice
it's unlikely that user code would need tls access and
sve together much, so it seems reasonable to just add
sve registers to tlsdesc call clobber list and do the
same for future extensions too (tlsdesc call will not
be worse than a normal indirect call).

(in principle it's possible that the tlsdesc entry avoids
using any float regs, but in practice that requires
hackery in the libc: marking every affected translation
unit with -mgeneral-regs-only or similar)


Re: Parallelize the compilation using Threads

2018-11-15 Thread Szabolcs Nagy
On 15/11/18 10:29, Richard Biener wrote:
> In my view (I proposed the thing) the most interesting parts are
> getting GCCs global state documented and reduced.  The parallelization
> itself is an interesting experiment but whether there will be any
> substantial improvement for builds that can already benefit from make
> parallelism remains a question.

in the common case (project with many small files, much more than
core count) i'd expect a regression:

if gcc itself tries to parallelize, that introduces inter-thread
synchronization and potential false sharing in gcc (e.g. malloc
locks) that does not exist with make parallelism (glibc can avoid
some atomic instructions when a process is single-threaded).


Re: autovectorization in gcc

2019-01-10 Thread Szabolcs Nagy
On 10/01/2019 08:19, Richard Biener wrote:
> On Wed, 9 Jan 2019, Jakub Jelinek wrote:
> 
>> On Wed, Jan 09, 2019 at 11:10:25AM -0500, David Malcolm wrote:
>>> extern void vf1()
>>> {
>>>#pragma vectorize enable
>>>for ( int i = 0 ; i < 32768 ; i++ )
>>>  data [ i ] = std::sqrt ( data [ i ] ) ;
>>> }
>>>
>>> Compiling on this x86_64 box with -fopt-info-vec-missed shows the
>>
>>>   _7 = .SQRT (_1);
>>>   if (_1 u>= 0.0)
>>> goto ; [99.95%]
>>>   else
>>> goto ; [0.05%]
>>>
>>>[local count: 1062472912]:
>>>   goto ; [100.00%]
>>>
>>>[local count: 531495]:
>>>   __builtin_sqrtf (_1);
>>>
>>> I'm not sure where that control flow came from: it isn't in
>>>   sqrt-test.cc.104t.stdarg
>>> but is in
>>>   sqrt-test.cc.105t.cdce
>>> so I think it's coming from the argument-range code in cdce.
>>>
>>> Arguably the location on the statement is wrong: it's on the loop
>>> header, when it presumably should be on the std::sqrt call.
>>
>> See my either mail, it is the result of the -fmath-errno default,
>> the inline emitted sqrt doesn't handle errno setting and we emit
>> essentially x = sqrt (arg); if (__builtin_expect (arg < 0.0, 0)) sqrt (arg); 
>> where
>> the former sqrt is inline using HW instructions and the latter is the
>> library call.
>>
>> With some extra work we could vectorize it; e.g. if we make it handle
>> OpenMP #pragma omp ordered simd efficiently, it would be the same thing
>> - allow non-vectorizable portions of vectorized loops by doing there a
>> scalar loop from 0 to vf-1 doing the non-vectorizable stuff + drop the 
>> limitation
>> that the vectorized loop is a single bb.  Essentially, in this case it would
>> be
>>   vec1 = vec_load (data + i);
>>   vec2 = vec_sqrt (vec1);
>>   if (__builtin_expect (any (vec2 < 0.0)))
>> {
>>   for (int i = 0; i < vf; i++)
>> sqrt (vec2[i]);
>> }
>>   vec_store (data + i, vec2);
>> If that would turn to be way too hard, we could for the vectorization
>> purposes hide that into the .SQRT internal fn, say add a fndecl argument to
>> it if it should treat the exceptional cases some way so that the control
>> flow isn't visible in the vectorized loop.
> 
> If we decide it's worth the trouble I'd rather do that in the epilogue
> and thus make the any (vec2 < 0.0) a reduction.  Like
> 
>smallest = min(smallest, vec1);
> 
> and after the loop do the errno thing on the smallest element.
> 
> That said, this is a transform that is probably worthwhile even
> on scalar code, possibly easiest to code-gen right from the start
> in the call-dce pass.

if this is useful for things other than errno handling then fine,
but i think it's a really bad idea to add optimization
complexity because of errno handling: nobody checks
errno after sqrt (other than conformance test code).

-fno-math-errno is almost surely closer to what the user
wants than trying to vectorize the errno handling.


Vector Function ABI specifications for AArch64 update

2019-05-13 Thread Szabolcs Nagy
Arm released an update (2019Q1.1) of the Vector Function ABI specifications for
AArch64 that uses the `declare variant` directive from OpenMP 5.0 to support
user-defined vector functions. The mechanism is introduced in chapter 4, and it
is in beta release status to allow feedback from the open source community.
The mechanism also allows declaring SVE and AdvSIMD vector functions
independently, which is not possible with the current OpenMP and
attribute(simd) support in gcc.
Feedback needs to be provided at arm.eabi (at) arm.com by the end of June 16th
(AOE).

https://developer.arm.com/tools-and-software/server-and-hpc/arm-architecture-tools/arm-compiler-for-hpc/vector-function-abi

Thanks.


Re: aarch64 TLS optimizations?

2019-05-20 Thread Szabolcs Nagy
On 17/05/2019 14:51, Tom Horsley wrote:
> I'm trying (for reason too complex to go into) to
> locate the TLS offset of the tcache_shutting_down
> variable from malloc in the ubuntu provided
> glibc on aarch64 ubuntu 18.04.
> 
> Various "normal" TLS variables appear to operate
> much like x86_64 with a GOT table entry where the
> TLS offset of the variable gets stashed.

this is more of a glibc question than a gcc one
(i.e. libc-help list would be better).

tls in glibc uses the initial-exec tls access model,
(tls object is at a fixed offset from tp across threads),
that requires a GOT entry for the offset which is set
up via a R_*_TPREL dynamic reloc at startup time.

(note: if a symbol is internal to the module its TPREL
reloc is not tied to a symbol, it only has an addend
for the offset within the module)
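
(a small illustrative sketch, not from the original mail: a C-level
initial-exec tls access looks like this; the variable and function
names are made up)

__thread int tvar __attribute__ ((tls_model ("initial-exec")));

int get_tvar (void)
{
  /* compiles to roughly: mrs of the thread pointer; adrp+ldr of the
     GOT slot holding the tp-relative offset (filled in by a TPREL
     dynamic reloc at load time); then a load at tp + offset */
  return tvar;
}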

> But in the ubuntu glibc there is no GOT entry for
> that variable, and disassembly of the code shows
> that it seems to "just know" the offset to use.

i see adrp+ldr sequences that access GOT entries.

e.g. in the objdump of libc.so.6:

000771d0 <__libc_malloc@@GLIBC_2.17>:
...
   77400:   f6c0        adrp    x0, 152000
   77404:   f9470c00    ldr     x0, [x0, #3608]
   77408:   d53bd041    mrs     x1, tpidr_el0

you can verify that 0x152000 + 3608 == 0x152e18 is
indeed a GOT entry (falls into .got) and there is a

00152e18 R_AARCH64_TLS_TPREL64  *ABS*+0x0010

dynamic relocation for that entry as expected.
(but i don't know which symbol this entry is for,
only that the symbol must be a local tls sym)

> Is there some kind of magic TLS optimization that
> can happen for certain variables on aarch64? I'm trying
> to understand how it could know the offset like
> it appears to do in the code.

there is no magic.


Re: aarch64 TLS optimizations?

2019-05-20 Thread Szabolcs Nagy
On 20/05/2019 16:59, Tom Horsley wrote:
> On Mon, 20 May 2019 15:43:53 +
> Szabolcs Nagy wrote:
> 
>> you can verify that 0x152000 + 3608 == 0x152e18 is
>> indeed a GOT entry (falls into .got) and there is a
>>
>> 00152e18 R_AARCH64_TLS_TPREL64  *ABS*+0x0010
> 
> There are a couple of other TLS variables in malloc, and I
> suspect this is one of them, where it is actually looking
> at tcache_shutting_down (verified with debug info and disassembly),
> it is simply using the tpidr_el0 value still laying around
> in the register from the 1st TLS reference and loading
> tcache_shutting_down from an offset which appears for all the
> world to simply be hard coded, no GOT reference involved.
> 
> I suppose at some point I'll be forced to understand how to build
> glibc from the ubuntu source package so I can see exactly
> what options and ifdefs are used and check the relocations in
> the malloc.o file from before it is incorporated with libc.so

in my build of malloc.os in glibc in the symtab i see

84: 0000000000000000     0 TLS     LOCAL  DEFAULT   10 .LANCHOR3
85: 0000000000000000     8 TLS     LOCAL  DEFAULT   10 thread_arena
86: 0000000000000008     8 TLS     LOCAL  DEFAULT   10 tcache
87: 0000000000000010     1 TLS     LOCAL  DEFAULT   10 tcache_shutting_down

and the R_*_TLSIE_* relocs are for .LANCHOR3 + 0,
so there will be one GOT entry for the 3 objects
and you should see

tp + got_value + (0 or 8 or 16)

address computation to access the 3 objects.

e.g. in __malloc_arena_thread_freeres i see

4e04:   d53bd056    mrs     x22, tpidr_el0
4e08:   9015        adrp    x21, 0 <_dl_tunable_set_mmap_threshold>
                        4e08: R_AARCH64_TLSIE_ADR_GOTTPREL_PAGE21  .LANCHOR3
4e0c:   f94002b5    ldr     x21, [x21]
                        4e0c: R_AARCH64_TLSIE_LD64_GOTTPREL_LO12_NC  .LANCHOR3
4e10:   a90153f3    stp     x19, x20, [sp, #16]
4e14:   8b1502c0    add     x0, x22, x21      // x0 = tp + got_value
4e18:   f9400414    ldr     x20, [x0, #8]     // read from tcache
4e1c:   f9001bf7    str     x23, [sp, #48]
4e20:   b4000234    cbz     x20, 4e64 <__malloc_arena_thread_freeres+0x6c>
4e24:   52800021    mov     w1, #0x1          // #1
4e28:   91010293    add     x19, x20, #0x40
4e2c:   91090297    add     x23, x20, #0x240
4e30:   f900041f    str     xzr, [x0, #8]     // write to tcache
4e34:   39004001    strb    w1, [x0, #16]     // write to tcache_shutting_down

i doubt ubuntu changed this, but if the offset is
a fixed const in the binary that means they moved
that variable into the glibc internal pthread struct
(which is at a fixed offset from tp).



[AArch64 ELF ABI] Vector calls and lazy binding on AArch64

2019-05-22 Thread Szabolcs Nagy
The lazy binding code of aarch64 currently only preserves q0-q7 of the
fp registers, but for an SVE call [AAPCS64+SVE] it should preserve p0-p3
and z0-z23, and for an AdvSIMD vector call [VABI64] it should preserve
q0-q23. (Vector calls are extensions of the base PCS [AAPCS64].)

A possible fix is to save and restore the additional register state in
the lazy binding entry code, this was discussed in

  https://sourceware.org/ml/libc-alpha/2018-08/msg00017.html

the main objections were

(1) Linux may optimize the kernel entry code for processes that don't
use SVE, so lazy binding should avoid accessing SVE registers.

(2) If this is fixed in the dynamic linker, vector calls will not be
backward compatible with old glibc.

(3) The saved SVE register state can be large (> 8K), so binaries that
work today may run out of stack space on an SVE system during lazy
binding (which can e.g. happen in a signal handler on a tiny stack).

and the proposed solution was to force bind now semantics for vector
functions e.g. by not calling them via PLT. This turned out to be harder
than I expected. I no longer think (1) and (2) are critically important,
but (3) is a correctness issue which is hard to argue away (would
require larger stack allocations to accommodate the worst case stack
size increase, but the stack allocation is not always under the control
of glibc, so it cannot provide strict guarantees).

Some approaches to make symbols "bind now" were discussed at

  https://groups.google.com/forum/#!topic/generic-abi/Bfb2CwX-u4M

The ABI change draft is below the notes, it requires marking symbols
in the ELF symbol table that follow the vector PCS (or other variant
PCS conventions). This is most relevant to dynamic linkers with lazy
binding support and to ELF linkers targeting AArch64, but assemblers
will need to be updated too.

Note 1: the dynamic linker may have to run user code during lazy binding
because of ifunc resolvers, so it cannot avoid clobbering fp regs.

Note 2: the tlsdesc entry is also affected by (3), so either the
initial DTV setup should avoid clobbering fp regs or the SVE register
state should not be callee-preserved by the tlsdesc call ABI (the latter
was chosen, which is backward compatible with old dynamic linkers, but
tls access from SVE code is as expensive as an extern call now: the
caller has to spill).

Note 3: signal frame and SVE register spills in code using SVE can also
lead to variable stack usage (AT_MINSIGSTKSZ was introduced to address
the former issue on linux) so it is a valid approach to just increase
min stack size limits on aarch64 compared to other targets (this is less
invasive, but does not fix old binaries).

Note 4: the proposal requires marking symbols in asm and elf objects, so
it is not compatible with existing tooling (old as or ld cannot create
valid vector function symbol references or definitions) and it is only
effective with a new dynamic linker.
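
(illustrative sketch, not part of the proposal text: from C the marking
is driven by the vector PCS attribute; gcc emits a .variant_pcs directive
for references to such symbols and the assembler sets the
STO_AARCH64_VARIANT_PCS st_other flag)

__attribute__ ((aarch64_vector_pcs)) void vec_fn (void);

void caller (void)
{
  vec_fn ();   /* this reference marks vec_fn in the object's symtab */
}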

Note 5: -fno-plt style code generation for vector function calls might
have worked too, but on aarch64 it requires compiler and linker changes
to avoid PLT in position dependent code when that is emitted for the
sake of pointer equality. It also requires tightening the ABI to ensure
the static linker does not introduce PLT when processing certain static
relocations. This approach would generate suboptimal static linked code
(the no-plt code is hard to relax into direct calls on aarch64) fragile
(easy to accidentally introduce a PLT) and hard to diagnose.

Note 6: the proposed solution applies to both SVE calls and AdvSIMD
vector calls, even though some issues only apply to SVE.

Note 7: a separate dynamic linker entry point for variant PCS calls
may be introduced (requires further ELF changes for a PLT0 like stub)
or the dynamic linker may decide to always preserve all registers or
decide to always bind symbols at load time.


AAELF64: in the Symbol Table section add

 st_other Values
 The  st_other  member  of  a symbol table entry specifies the symbol's
 visibility in the lowest 2 bits.  The top 6 bits  are  unused  in  the
 generic  ELF ABI [SCO-ELF], and while there are no values reserved for
 processor-specific semantics, many other architectures have used these
 bits.

 The  defined  processor-specific  st_other  flag  values are listed in
 Table 4-5-1.

 Table 4-5-1, Processor specific st_other flags
 ++--+-+
 |Name| Mask | Comment |
 ++--+-+
 |STO_AARCH64_VARIANT_PCS | 0x80 | Thefunction |
 ||  | associated with the |
 ||  | symbol may follow a |
 ||  | variant   procedure |
 ||  | call  standard with |
 ||  | different  register |
 ||  | usage convention.   |
 ++--+-+

 A  symbol  table entry that is marked with the STO_AARCH64_VARIANT_PCS
 flag set in its st_other field may be associated with a function  that
 follows  a  variant  procedure  call  standard with different register
 usage convention from the one  defined  in  the  base  procedure  call
 standard  for  the  list  of  argument,  caller-saved and callee-saved
 registers [AAPCS64].  The rules  in  the  Call  and  Jump  relocations
 section  still  apply to such functions, and if a subroutine is called
 via a symbol reference that  is  marked  with  STO_AARCH64_VARIANT_PCS
 then  code that runs between the calling routine and called subroutine
 must preserve the contents of all registers except IP0,  IP1  and  the
 condition code flags [AAPCS64].

Re: [AArch64 ELF ABI] Vector calls and lazy binding on AArch64

2019-05-22 Thread Szabolcs Nagy
On 22/05/2019 16:06, Florian Weimer wrote:
> * Szabolcs Nagy:
> 
>> AAELF64: in the Symbol Table section add
>>
>>  st_other Values
>>  The  st_other  member  of  a symbol table entry specifies the symbol's
>>  visibility in the lowest 2 bits.  The top 6 bits  are  unused  in  the
>>  generic  ELF ABI [SCO-ELF], and while there are no values reserved for
>>  processor-specific semantics, many other architectures have used these
>>  bits.
>>
>>  The  defined  processor-specific  st_other  flag  values are listed in
>>  Table 4-5-1.
>>
>>  Table 4-5-1, Processor specific st_other flags
>>  ++--+-+
>>  |Name| Mask | Comment |
>>  ++--+-+
>>  |STO_AARCH64_VARIANT_PCS | 0x80 | Thefunction |
>>  ||  | associated with the |
>>  ||  | symbol may follow a |
>>  ||  | variant   procedure |
>>  ||  | call  standard with |
>>  ||  | different  register |
>>  ||  | usage convention.   |
>>  ++--+-+
>>
>>  A  symbol  table entry that is marked with the STO_AARCH64_VARIANT_PCS
>>  flag set in its st_other field may be associated with a function  that
>>  follows  a  variant  procedure  call  standard with different register
>>  usage convention from the one  defined  in  the  base  procedure  call
>>  standard  for  the  list  of  argument,  caller-saved and callee-saved
>>  registers [AAPCS64].  The rules  in  the  Call  and  Jump  relocations
>>  section  still  apply to such functions, and if a subroutine is called
>>  via a symbol reference that  is  marked  with  STO_AARCH64_VARIANT_PCS
>>  then  code that runs between the calling routine and called subroutine
>>  must preserve the contents of all registers except IP0,  IP1  and  the
>>  condition code flags [AAPCS64].
> 
> Can you clarify if there has to be a valid stack at this point which can
> be used during the call transfer?  What about the stack alignment
> requirement?

the intention is to only allow 'register usage convention' to be
relaxed compared to the base PCS (which has rules for stack etc),
and even the register usage convention has to be compatible with
the 'Call and Jump relocations section' which essentially says that
veneers inserted by the linker between calls can clobber IP0, IP1
and the condition flags.

i.e. a variant pcs function follows the same rules as base pcs, but
it may use different caller-/callee-saved/argument registers.

when SVE pcs is merged into the current AAPCS document, then i hope
the 'variant pcs' term used here will be properly specified so the
ELF ABI will just refer back to that.



Re: [AArch64 ELF ABI] Vector calls and lazy binding on AArch64

2019-05-22 Thread Szabolcs Nagy
On 22/05/2019 16:34, Florian Weimer wrote:
> * Szabolcs Nagy:
> 
>> On 22/05/2019 16:06, Florian Weimer wrote:
>>> * Szabolcs Nagy:
>>>
>>>> AAELF64: in the Symbol Table section add
>>>>
>>>>  st_other Values
>>>>  The  st_other  member  of  a symbol table entry specifies the symbol's
>>>>  visibility in the lowest 2 bits.  The top 6 bits  are  unused  in  the
>>>>  generic  ELF ABI [SCO-ELF], and while there are no values reserved for
>>>>  processor-specific semantics, many other architectures have used these
>>>>  bits.
>>>>
>>>>  The  defined  processor-specific  st_other  flag  values are listed in
>>>>  Table 4-5-1.
>>>>
>>>>  Table 4-5-1, Processor specific st_other flags
>>>>  ++--+-+
>>>>  |Name| Mask | Comment |
>>>>  ++--+-+
>>>>  |STO_AARCH64_VARIANT_PCS | 0x80 | Thefunction |
>>>>  ||  | associated with the |
>>>>  ||  | symbol may follow a |
>>>>  ||  | variant   procedure |
>>>>  ||  | call  standard with |
>>>>  ||  | different  register |
>>>>  ||  | usage convention.   |
>>>>  ++--+-+
>>>>
>>>>  A  symbol  table entry that is marked with the STO_AARCH64_VARIANT_PCS
>>>>  flag set in its st_other field may be associated with a function  that
>>>>  follows  a  variant  procedure  call  standard with different register
>>>>  usage convention from the one  defined  in  the  base  procedure  call
>>>>  standard  for  the  list  of  argument,  caller-saved and callee-saved
>>>>  registers [AAPCS64].  The rules  in  the  Call  and  Jump  relocations
>>>>  section  still  apply to such functions, and if a subroutine is called
>>>>  via a symbol reference that  is  marked  with  STO_AARCH64_VARIANT_PCS
>>>>  then  code that runs between the calling routine and called subroutine
>>>>  must preserve the contents of all registers except IP0,  IP1  and  the
>>>>  condition code flags [AAPCS64].
>>>
>>> Can you clarify if there has to be a valid stack at this point which can
>>> be used during the call transfer?  What about the stack alignment
>>> requirement?
>>
>> the intention is to only allow 'register usage convention' to be
>> relaxed compared to the base PCS (which has rules for stack etc),
>> and even the register usage convention has to be compatible with
>> the 'Call and Jump relocations section' which essentially says that
>> veneers inserted by the linker between calls can clobber IP0, IP1
>> and the condition flags.
>>
>> i.e. a variant pcs function follows the same rules as base pcs, but
>> it may use different caller-/callee-saved/argument registers.
>>
>> when SVE pcs is merged into the current AAPCS document, then i hope
>> the 'variant pcs' term used here will be properly specified so the
>> ELF ABI will just refer back to that.
> 
> My concern is that with the current language, it's not clear whether
> it's possible to use the stack as a scratch area during the call
> transition, or rely on a valid TCB.  I think this is rather
> underspecified.

i think that's underspecified in general for normal calls too,
currently the glibc dynamic linker assumes it can use some stack
space and do various async signal safe operations (some of which
may even fail), variant pcs does not change any of this.

it only provides a per symbol escape hatch for functions with a
bit special call convention, and i plan to use the symbol marking
in glibc as 'force bind now for these symbols', because other
behaviour may not be forward compatible if the architecture
changes again (if lazy binding turns out to be very important
for these symbols i'd prefer introducing a second entry point
for them instead of checking the elf flags from the entry asm).

i'll try to post patches implementing this abi soon.


Re: [AArch64 ELF ABI] Vector calls and lazy binding on AArch64

2019-06-28 Thread Szabolcs Nagy
On 22/05/2019 15:42, Szabolcs Nagy wrote:
> [AAELF64]: ELF for the Arm 64-bit Architecture (AArch64)
>https://developer.arm.com/docs/ihi0056/latest
> [VABI64]:  Vector Function ABI Specification for AArch64
>
> https://developer.arm.com/tools-and-software/server-and-hpc/arm-architecture-tools/arm-compiler-for-hpc/vector-function-abi

the new ABI has been published with minor wording changes
compared to the draft version.

the ABI is implemented in gcc, binutils and glibc in a
series of patches listed below.


gcc:

commit 779640c76d37b32f4d8a7b97637ed9e345d750b4
Commit: nsz 
CommitDate: 2019-06-03 13:50:53 +

aarch64: emit .variant_pcs for aarch64_vector_pcs symbol references
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@271869 
138bc75d-0d04-0410-961f-82ee72b054a4

commit d403a7711c2cf9a7a4892d76b875a1c99a690f89
Commit: nsz 
CommitDate: 2019-06-04 16:16:52 +

aarch64: fix asm visibility for extern symbols
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@271913 
138bc75d-0d04-0410-961f-82ee72b054a4

commit 042371f341a956de8c76557df700ebdc1af9ab4f
Commit: nsz 
CommitDate: 2019-06-18 11:11:07 +

aarch64: fix gcc.target/aarch64/pcs_attribute-2.c on non-gnu targets
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272414 
138bc75d-0d04-0410-961f-82ee72b054a4


binutils:

commit 2301ed1c9af1316b4bad3747d2b03f7d44940f87
Commit:     Szabolcs Nagy 
CommitDate: 2019-05-24 15:05:57 +0100

aarch64: add STO_AARCH64_VARIANT_PCS and DT_AARCH64_VARIANT_PCS

commit f166ae0188dcb89c5ae925034260a708a254ab2f
Commit:     Szabolcs Nagy 
CommitDate: 2019-05-24 15:07:42 +0100

aarch64: handle .variant_pcs directive in gas

commit 0b4eac57c44ec4c9e13f5201b40936c3b3e6c639
Commit:     Szabolcs Nagy 
CommitDate: 2019-05-24 15:09:06 +0100

aarch64: override default elf .set handling in gas

commit 823710d5856996d1f54f04ecb2f7647aeae99b5b
Commit:     Szabolcs Nagy 
CommitDate: 2019-05-24 15:11:00 +0100

aarch64: handle STO_AARCH64_VARIANT_PCS in bfd

commit 65f381e729bedb933f3e1376e7f53f0ff63ac9a8
Commit:     Szabolcs Nagy 
CommitDate: 2019-05-28 12:03:51 +0100

aarch64: fix variant_pcs ld tests


glibc:

commit 55f82d328d2dd1c7c13c1992f4b9bf9c95b57551
Commit:     Szabolcs Nagy 
CommitDate: 2019-06-13 09:44:44 +0100

aarch64: add STO_AARCH64_VARIANT_PCS and DT_AARCH64_VARIANT_PCS

commit 82bc69c012838a381c4167c156a06f4598f34227
Commit:     Szabolcs Nagy 
CommitDate: 2019-06-13 09:45:00 +0100

aarch64: handle STO_AARCH64_VARIANT_PCS



Re: Implicit function declarations and GCC 10

2019-07-05 Thread Szabolcs Nagy
On 04/07/2019 12:27, Florian Weimer wrote:
> Implicit function declarations were removed from C99, more than twenty
> years ago.  So far, GCC only warns about them because there were too
> many old configure scripts where an error would lead to incorrect
> configure check failures.
> 
> I can try to fix the remaining configure scripts in Fedora and submit
> the required changes during this summer and fall.
> 
> I would appreciate if GCC 10 refused to declare functions implicitly by
> default.

+1

> 
> According to my observations, lack of an error diagnostic has turned
> into a major usability issue.  For bugs related to pointer truncation,
> we could perhaps change the C front end to produce a hard error if an
> int value returned from an implicitly declared function is converted to
> a pointer.  But the other case involves functions defined as returning
> _Bool, and the result is used in a boolean context.  The x86-64 ABI only
> requires that the lowest 8 bits of the return value are defined, so an
> implicit int results in int values which incorrectly compare as unequal
> to zero.
> 
> Given that the pointer truncation issue is only slightly more common,
> than the _Bool issue, I don't think the diagnostic improvement for
> pointers would be very helpful, and we should just transition to errors.
> 
> Implicit int we should remove as well.  Checking configure scripts for
> both issues at the same time would not be much more work.

+1 for making implicit int an error by default.

> 
> Thanks,
> Florian
> 



Re: PPC64 libmvec implementation of sincos

2019-09-30 Thread Szabolcs Nagy
On 27/09/2019 20:23, GT wrote:
> I am attempting to create a vector version of sincos for PPC64.
> The relevant discussion thread is on the GLIBC libc-alpha mailing list.
> Navigate it beginning at 
> https://sourceware.org/ml/libc-alpha/2019-09/msg00334.html
> 
> The intention is to reuse as much as possible from the existing GCC 
> implementation of other libmvec functions.
> My questions are: Which function(s) in GCC;
> 
> 1. Gather scalar function input arguments, from multiple loop iterations, 
> into a single vector input argument for the vector function version?
> 2. Distribute scalar function outputs, to appropriate loop iteration result, 
> from the single vector function output result?
> 
> I am referring especially to vectorization of sin and cos.

i wonder if gcc can auto-vectorize scalar sincos
calls, the vectorizer seems to want the calls to
have no side-effect, but attribute pure or const
is not appropriate for sincos (which has no return
value but takes writable pointer args)

"#pragma omp simd" on a loop seems to work but i
could not get unannotated sincos loops to vectorize.

it seems it would be nice if we could add pure/const
somehow (maybe to the simd variant only? afaik openmp
requires no sideeffects for simd variants, but that's
probably only for explicitly marked loops?)
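
a sketch of the problem case (the function and variable names are made
up; sincos is the GNU extension from math.h): the writable pointer args
mean neither pure nor const fits, so without an annotation the
vectorizer has to assume side-effects and keeps the loop scalar.

#define _GNU_SOURCE
#include <math.h>

void foo (double *restrict s, double *restrict c,
          const double *restrict x, int n)
{
  for (int i = 0; i < n; i++)
    sincos (x[i], &s[i], &c[i]);   /* stays scalar without #pragma omp simd */
}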


Re: PPC64 libmvec implementation of sincos

2019-09-30 Thread Szabolcs Nagy
On 30/09/2019 18:30, GT wrote:
> ‐‐‐ Original Message ‐‐‐
> On Monday, September 30, 2019 9:52 AM, Szabolcs Nagy  
> wrote:
> 
>> On 27/09/2019 20:23, GT wrote:
>>
>>> I am attempting to create a vector version of sincos for PPC64.
>>> The relevant discussion thread is on the GLIBC libc-alpha mailing list.
>>> Navigate it beginning at 
>>> https://sourceware.org/ml/libc-alpha/2019-09/msg00334.html
>>> The intention is to reuse as much as possible from the existing GCC 
>>> implementation of other libmvec functions.
>>> My questions are: Which function(s) in GCC;
>>>
>>> 1.  Gather scalar function input arguments, from multiple loop iterations, 
>>> into a single vector input argument for the vector function version?
>>> 2.  Distribute scalar function outputs, to appropriate loop iteration 
>>> result, from the single vector function output result?
>>>
>>> I am referring especially to vectorization of sin and cos.
>>
>> i wonder if gcc can auto-vectorize scalar sincos
>> calls, the vectorizer seems to want the calls to
>> have no side-effect, but attribute pure or const
>> is not appropriate for sincos (which has no return
>> value but takes writable pointer args)
> 
> 1.  Do you mean whether x86_64 already does auto-vectorize sincos?

any current target with simd attribute or omp delcare simd support.

> 2.  Where in the code do you see the vectorizer require no side-effect?

i don't know where it is in the code, but

__attribute__((simd)) float foo (float);

void bar (float *restrict a, float *restrict b)
{
for(int i=0; i<4000; i++)
a[i] = foo (b[i]);
}

is not vectorized, however it gets vectorized if

i add __attribute__((const)) to foo
OR
if i add '#pragma omp simd' to the loop and compile with
-fopenmp-simd.

(which makes sense to me: you don't want to vectorize
if you don't know the side-effects, otoh, there is no
attribute to say that i know there will be no side-effects
in functions taking pointer arguments so i don't see
how sincos can get vectorized)

>> "#pragma omp simd" on a loop seems to work but i
>> could not get unannotated sincos loops to vectorize.
>>
>> it seems it would be nice if we could add pure/const
>> somehow (maybe to the simd variant only? afaik openmp
>> requires no sideeffects for simd variants, but that's
>> probably only for explicitly marked loops?)
> 
> 1. Example 1 and Example 2 at https://sourceware.org/glibc/wiki/libmvec show 
> the 2 different
> ways to activate auto-vectorization. When you refer to "unannotated sincos", 
> which of
> the 2 techniques do you mean?

example 1 annotates the loop with #pragma omp simd.
(and requires -fopenmp-simd cflag to work)

example 2 is my goal where -ftree-vectorize is enough
without annotation.

> 2. Which function was auto-vectorized by "pragma omp simd" in the loop?

see example above.


Re: Commit messages and the move to git

2019-11-20 Thread Szabolcs Nagy
On 19/11/2019 23:44, Joseph Myers wrote:
> I do think "Related to PR N (description)" or similar is a good 
> summary line to insert where the present summary line is just a ChangeLog 
> date/author line.

i agree.



Re: -fpatchable-function-entry should set SHF_WRITE and create one __patchable_function_entries per function

2020-01-07 Thread Szabolcs Nagy
On 07/01/2020 07:25, Fangrui Song wrote:
> On 2020-01-06, Fangrui Song wrote:
>> The addresses of NOPs are collected in a section named 
>> __patchable_function_entries.
>> A __patchable_function_entries entry is relocated by a symbolic relocation 
>> (e.g. R_X86_64_64, R_AARCH64_ABS64, R_PPC64_ADDR64).
>> In -shared or -pie mode, the linker will create a dynamic relocation 
>> (non-preemptible: relative relocation (e.g. R_X86_64_RELATIVE);
>> preemptible: symbolic relocation (e.g. R_X86_64_64)).
>>
>> In either case, the section contents will be modified at runtime.
>> Thus, the section should have the SHF_WRITE flag to avoid text relocations 
>> (DF_TEXTREL).

pie/pic should either imply writable __patchable_function_entries,
or __patchable_function_entries should be documented to be offsets
from some base address in the module: the users of it have to modify
.text and do lowlevel hacks so they should be able to handle such
arithmetic.

i think it's worth opening a gcc bug report.
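
(a small sketch of the issue; the file name and commands are
illustrative: each recorded entry is an absolute address, so in a
PIC/PIE link it needs a dynamic relocation at runtime, and that only
works without text relocations if the section is writable)

void f (void) { }

/* illustrative commands:
   gcc -O2 -fPIC -fpatchable-function-entry=2 -c f.c
   readelf -r f.o    # shows an absolute reloc against f (its patchable
                     # entry address) in .rela__patchable_function_entries */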

>> When -ffunction-sections is used, ideally GCC should emit one 
>> __patchable_function_entries (SHF_LINK_ORDER) per .text.foo .
>> If the corresponding .text.foo is discarded (--gc-sections, COMDAT, 
>> /DISCARD/), the linker can discard the associated
>> __patchable_function_entries. This can be seen as a lightweight COMDAT 
>> section group. (A section group adds an extra section and costs 3 words)
>> Currently lld (LLVM linker) has implemented such SHF_LINK_ORDER collecting 
>> features. GNU ld and gold don't have the features.
>>
>> I have summarized the feature requests in this post 
>> https://sourceware.org/ml/binutils/2019-11/msg00266.html
>>
>> gcc -fpatchable-function-entry=2 -ffunction-sections -c a.c
>>
>>  [ 4] .text.f0                          PROGBITS  0000000000000000  000040  000009  00  AX  0   0  1
>>  ### No W flag
>>  ### One __patchable_function_entries instead of 3.
>>  [ 5] __patchable_function_entries      PROGBITS  0000000000000000  000049  000018  00   A  0   0  1
>>  [ 6] .rela__patchable_function_entries RELA      0000000000000000  000280  000048  18   I 13   5  8
>>  [ 7] .text.f1                          PROGBITS  0000000000000000  000061  000009  00  AX  0   0  1
>>  [ 8] .text.f2                          PROGBITS  0000000000000000  00006a  000009  00  AX  0   0  1
> 
> Emitting a __patchable_function_entries for each function may waste
> object file sizes (64 bytes per function on ELF64). If zeros are
> allowed, emitting a single __patchable_function_entries should be fine.
> 
> If we do want to emit unique sections, the condition should be either
> -ffunction-sections or COMDAT is used.

again it's worth raising a gcc bug i think.

there is another known issue: aarch64 -mbranch-protection=bti
(and presumably x86_64 -fcf-protection=branch) has to add
landing pad at the begining of each indirectly called function
so the patchable nops can only come after that.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424

no matter how this gets resolved i think this will require
documentation changes too.


Re: Successful bootstrap and install of gcc (GCC) 6.3.0 on aarch64-unknown-linux-gnu

2017-01-27 Thread Szabolcs Nagy
On 25/01/17 19:02, Aaro Koskinen wrote:
> Configured with: ../gcc-6.3.0/configure --with-arch=armv8-a+crc 
> --with-cpu=cortex-a53 --disable-multilib --disable-nls 
> --prefix=/home/aaro/gcctest/newcompiler --enable-languages=c,c++ 
> --host=aarch64-unknown-linux-gnu --build=aarch64-unknown-linux-gnu 
> --target=aarch64-unknown-linux-gnu --with-system-zlib --with-sysroot=/
> host:   raspberrypi-3
> distro: los.git rootfs=96c66f native=96c66f
> kernel: Linux 4.9.0-rpi3-los_8e2f1c
> binutils: GNU binutils 2.27
> make:   GNU Make 4.2.1
> libc:   GNU C Library (GNU libc) stable release version 2.24
> zlib:   1.2.8
> mpfr:   3.1.3
> gmp:6
...
> processor : 0
> BogoMIPS  : 38.40
> Features  : fp asimd evtstrm crc32
> CPU implementer   : 0x41
> CPU architecture: 8
> CPU variant   : 0x0
> CPU part  : 0xd03
> CPU revision  : 4

this seems to be an r0p4 revision of cortex-a53, if you
use your toolchain to build binaries that are potentially
executed on such hw then i think the safe way is to
configure gcc with

--enable-fix-cortex-a53-835769
--enable-fix-cortex-a53-843419

since it may not be easy to tell what software is
affected on a case by case basis (there are flags to
turn these on/off at compile time if you want to go
that way).



Re: [contribution] C11 threads implementation for Unix and Windows environments

2017-02-20 Thread Szabolcs Nagy
On 20/02/17 07:49, Sebastian Huber wrote:
> Hello Gokan,
> 
> you may have a look at:
> 
> https://svnweb.freebsd.org/base/head/lib/libstdthreads/

note that despite the looks this is a non-portable
and non-conforming implementation, it is way better
than the posted github code, but it can still clobber
errno, leak resources (and introduces cancellation
points which may or may not be conforming depending
on how posix will integrate c11)

as far as i'm aware the only c11 conforming open source
implementation is the one in musl libc (and that's not
portable to other libcs either).



Re: RFC: Add ___tls_get_addr

2017-07-05 Thread Szabolcs Nagy
On 05/07/17 16:38, H.J. Lu wrote:
> On x86-64, __tls_get_addr has to realign the stack so that binaries compiled by
> GCCs older than GCC 4.9.4:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> 
> continue to work even if vector instructions are used by functions called
> from __tls_get_addr, which assumes 16-byte stack alignment as specified
> by x86-64 psABI.
> 
> We are considering to add an alternative interface, ___tls_get_addr, to
> glibc, which doesn't realign stack.  Compilers, which properly align stack
> for TLS, call generate call to ___tls_get_addr, instead of __tls_get_addr,
> if ___tls_get_addr is available.
> 
> Any comments?
> 
> 

what happens when new compiler generating the new symbol
is used with old glibc?




Re: RFC: Add ___tls_get_addr

2017-07-06 Thread Szabolcs Nagy
On 05/07/17 17:18, H.J. Lu wrote:
> On Wed, Jul 5, 2017 at 8:53 AM, Szabolcs Nagy  wrote:
>> On 05/07/17 16:38, H.J. Lu wrote:
>>> On x86-64, __tls_get_addr has to realign the stack so that binaries compiled by
>>> GCCs older than GCC 4.9.4:
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
>>>
>>> continue to work even if vector instructions are used by functions called
>>> from __tls_get_addr, which assumes 16-byte stack alignment as specified
>>> by x86-64 psABI.
>>>
>>> We are considering to add an alternative interface, ___tls_get_addr, to
>>> glibc, which doesn't realign stack.  Compilers, which properly align stack
>>> for TLS, call generate call to ___tls_get_addr, instead of __tls_get_addr,
>>> if ___tls_get_addr is available.
>>>
>>> Any comments?
>>>
>>>
>>
>> what happens when new compiler generating the new symbol
>> is used with old glibc?
>>
> 
> Compiler shouldn't do that.
> 

i don't see how the compiler can not do that

e.g. somebody building an old glibc from
source with new compiler, then runs the tests,
all tls tests would fail because the compiler
generated the new symbol.

or users interposing __tls_get_addr (asan) need
to update their code.

or there are cases when libraries built against
one libc is used with another (e.g. musl can
mostly use a libstdc++ compiled against glibc
on x86_64)

i think introducing new libc<->compiler abi
should be done conservatively when it is really
necessary and from Rich's mail it seems there
is no need for new abi here.



Re: [Bug web/?????] New: Fwd: failure notice: Bugzilla down.

2017-08-15 Thread Szabolcs Nagy
On 15/08/17 04:10, Martin Sebor wrote:
> On 08/14/2017 04:22 PM, Eric Gallager wrote:
>> I'm emailing this manually to the list because Bugzilla is down and I
>> can't file a bug on Bugzilla about Bugzilla being down. The error
>> message looks like this:
> 
> Bugzilla and the rest of gcc.gnu.org have been down much of
> the afternoon/evening due to a hard drive upgrade (the old disk
> apparently failed).  You're not the only one who found out about
> it the hard way.  I (and I suspect most others) also discovered
> it when things like Git and SVN (and Bugzilla) stopped working.
> 
> I've CC'd the gcc list to let others know (not sure what list
> to subscribe to in order to get a heads up on these kinds of
> maintenance issues).
> 

it seems the database got corrupted.

at least one of my bugs was overwritten by another:

original 81846:
https://gcc.gnu.org/ml/gcc-bugs/2017-08/msg01574.html
current 81846:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81846

similarly there are two bugs on the bug mailing list
for 81845 and later bugs are missing.



Re: Behaviour of __forced_unwind with noexcept

2017-08-15 Thread Szabolcs Nagy
On 15/08/17 16:21, Ron wrote:
> On Tue, Aug 15, 2017 at 01:31:10PM +0200, Richard Biener wrote:
>> On Tue, Aug 15, 2017 at 1:28 PM, Jonathan Wakely  
>> wrote:
>>> On 15 August 2017 at 11:24, Richard Biener  
>>> wrote:
 On Tue, Aug 15, 2017 at 6:44 AM, Ron  wrote:
> On Mon, Aug 14, 2017 at 06:22:39PM +0100, Jonathan Wakely wrote:
>> On 13 August 2017 at 19:20, Ron wrote:
>>>
>>> Hi,
>>>
>>> I'm looking for some clarification of how the __forced_unwind thread
>>> cancellation exceptions intersect with noexcept.  I've long been a
>>> big fan of the __forced_unwind idiom, but now that C++14 is the default
>>> since GCC 6.1, and many methods including destructors are implicitly
>>> noexcept, using it safely appears to have become a lot more tricky.
>>>
>>> The closest I've found so far to an "authoritative" statement of the
>>> expected behaviour is the comments from Jonathan Wakely here:
>>>
>>> https://stackoverflow.com/questions/14268080/cancelling-a-thread-that-has-a-mutex-locked-does-not-unlock-the-mutex
>>>
>>> In particular: "It interacts with noexcept as you'd expect:
>>> std::terminate() is called if a __forced_unwind escapes a noexcept
>>> function, so noexcept functions are really noexcept, they won't
>>> unexpectedly throw some 'special' type"
>>>
>>> Which does seem logical, but unless I'm missing something this makes
>>> it unsafe to perform any operation in a destructor which might cross
>>> a cancellation point, unless that destructor is noexcept(false).
>>
>> Unfortunately I still think that's true.
>>
>> This was also raised in 
>> https://gcc.gnu.org/ml/gcc-help/2015-08/msg00040.html
>
> Ouch.  Had you considered the option of having any scope that is
> noexcept(true) also be treated as if it was implicitly in a scoped
> pthread_setcancelstate(PTHREAD_CANCEL_DISABLE), restoring the
> old state when it leaves that scope?
>
> Would it be feasible for the compiler to automatically generate that?
>
> For any toolchain which does use the unwinding exceptions extension,
> that also seems like a logical extension to the noexcept behaviour,
> since allowing cancellation will otherwise result in an exception and
> process termination.  If people really need cancellation in such
> scopes, then they can more manageably mark just those noexcept(false).
>
>
> It would need to be done by the compiler, since in user code I can't
> do that in a destructor in a way that will also protect unwinding
> members of a class (which may have destructors in code I don't
> control).
>
> I can't even completely mitigate this by just always using -std=c++03
> because presumably I'm also exposed to (at least) libstdc++.so being
> built with the new compiler default of C++14 or later.
>
>
> I'd be really sad to lose the stack unwinding we currently have when
> a thread is cancelled.  I've always known it was an extension (and I'm
> still a bit surprised it hasn't become part of the official standard),
> but it is fairly portable in practice.
>
> On Linux (or on Debian at least) clang also supports it.  It's also
> supported by gcc on FreeBSD and MacOS (though not by clang there).
> It's supported by mingw for Windows builds.  OpenBSD is currently
> the only platform I know of where even its gcc toolchain doesn't
> support this (but they're also missing support for standard locale
> functionality so it's a special snowflake anyway).
>
>
> It seems that we need to find some way past the status-quo though,
> because "don't ever use pthread_cancel" is the same as saying that
> there's no longer any use for the forced_unwind extension.  Or that
> "you can have a pthread_cancel which leaks resources, or none at all".
>
> Having a pthread_cancel that only works on cancellation points that
> aren't noexcept seems like a reasonable compromise and extension to
> the shortcomings of the standard to me.  Am I missing something there
> which makes that solution not a viable option either?

 Have glibc override the abort () from the forced_unwind if in 
 pthread_cancel
 context?
>>>
>>> If the forced_unwind exception escapes a noexcept function then the
>>> compiler calls std::terminate(). That can be replaced by the user so
>>> that it doesn't call abort(). It must not return, but a user-supplied
>>> terminate handler could trap or raise SIGKILL or something else.
>>>
>>> Required behavior: A terminate_handler shall terminate execution of
>>> the program without returning
>>> to the caller.
>>> Default behavior: The implementation’s default terminate_handler calls 
>>> abort().
>>>
>>> I don't think glibc can help, I think the compiler would need to
>>> change to not call std::terminate().
>>
>> Maybe it could call 

Re: Behaviour of __forced_unwind with noexcept

2017-08-15 Thread Szabolcs Nagy
On 15/08/17 16:47, Richard Biener wrote:
> On Tue, Aug 15, 2017 at 5:21 PM, Ron  wrote:
>> Is changing the cancellation state really an expensive operation?
>> Moreso than the checking which I assume already needs to be done for
>> noexcept to trap errant exceptions?
> 
> The noexcept checking only needs to happen if an exception is thrown
> while the pthread cancel state needs to be adjusted whenever we are
> about to enter/exit such function.
> 
>> If it really is, I guess we could also have an attribute which declares
>> a stronger guarantee than noexcept, to claim there are no cancellation
>> points in that scope, if people have something in a hot path where a few
>> cycles really matter to them and this protection is not actually needed.
>> Which could also be an automatic optimisation if the compiler is able to
>> prove there are no cancellation points?
> 
> I guess that's possible.
> 
> I suppose prototyping this would be wrapping all noexcept calls in
> 
>   try { pthread_setcancelstate (PTHREAD_CANCEL_DISABLE, &old); call (); }
>   finally { pthread_setcancelstate (old, &old); }
> 

i think changing the state this way is only valid if call
itself does not change the state, which we don't know.



Re: [Bug web/?????] New: Fwd: failure notice: Bugzilla down.

2017-08-25 Thread Szabolcs Nagy
On 16/08/17 18:38, Joseph Myers wrote:
> On Wed, 16 Aug 2017, Eric Gallager wrote:
>> I see Richi redid all his 7.2 release changes; does that imply that
>> the server restore is now complete?
> 
> No, there's still a search process ongoing to identify corrupted or 
> missing files by comparison with the last backup.
> 
> My expectation is that all the other Bugzilla changes from 13 and 14 
> August UTC need redoing manually (recreating bugs with new numbers in the 
> case of new bugs filed during that period, if those bugs are still 
> relevant, repeating comments, etc. - and possibly recreating accounts for 
> people who created accounts and filed bugs during that period).  But I 
> haven't seen any official announcement from overseers to all affected 
> projects (for both GCC and Sourceware Bugzillas) yet.
> 

can i resubmit my lost bug reports now?



libmvec simd math functions in fortran

2017-11-01 Thread Szabolcs Nagy
is there a way to get vectorized math functions in fortran?

in c code there is attribute simd declarations or openmp
declare simd pragma to tell the compiler which functions
have simd variant, but i see no such thing in fortran.

some targets have -mveclibabi=type which allows vectorizing
a set of math functions, but this does not support the
libmvec abi of glibc.
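
(an illustrative sketch of what the c side looks like, added here for
contrast; the loop and names are made up: glibc's math.h attaches a simd
attribute like this to math functions, which is exactly the information
the fortran front end has no way to see)

__attribute__ ((__simd__ ("notinbranch"))) double sin (double);

void foo (double *restrict a, const double *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = sin (b[i]);   /* with -O3 -ffast-math this can become a
                            _ZGVbN2v_sin / _ZGVcN4v_sin etc. call */
}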


Re: libmvec simd math functions in fortran

2017-11-01 Thread Szabolcs Nagy
On 01/11/17 16:26, Jakub Jelinek wrote:
> On Wed, Nov 01, 2017 at 04:23:11PM +0000, Szabolcs Nagy wrote:
>> is there a way to get vectorized math functions in fortran?
>>
>> in c code there is attribute simd declarations or openmp
>> declare simd pragma to tell the compiler which functions
>> have simd variant, but i see no such thing in fortran.
> 
> !$omp declare simd should work fine in fortran (with -fopenmp
> or -fopenmp-simd).
> 

1) i don't want to change the fortran.

2) it does not work for me.

i want this to call vector powf in libmvec:

subroutine foo(a,b,c)
  real(4) a(8000),b(8000),c(8000)
  do j=1,8000
    a(j)=b(j)**c(j)
  end do
end

where do i put

!$omp declare simd (powf)

?



Re: libmvec simd math functions in fortran

2017-11-02 Thread Szabolcs Nagy
On 01/11/17 16:47, Szabolcs Nagy wrote:
> On 01/11/17 16:26, Jakub Jelinek wrote:
>> On Wed, Nov 01, 2017 at 04:23:11PM +, Szabolcs Nagy wrote:
>>> is there a way to get vectorized math functions in fortran?
>>>
>>> in c code there is attribute simd declarations or openmp
>>> declare simd pragma to tell the compiler which functions
>>> have simd variant, but i see no such thing in fortran.
>>
>> !$omp declare simd should work fine in fortran (with -fopenmp
>> or -fopenmp-simd).
>>
> 
> 1) i don't want to change the fortran.
> 
> 2) it does not work for me.
> 
> i want this to call vector powf in libmvec:
> 
> subroutine foo(a,b,c)
>   real(4) a(8000),b(8000),c(8000)
>   do j=1,8000
> a(j)=b(j)**c(j)
>   end do
> end
> 
> where do i put
> 
> !$omp declare simd (powf)
> 
> ?

to answer my question..

it seems fortran cannot express the type signature
of mathematical functions because arguments are
passed by reference.

so there is no way to declare math interfaces
and then add omp declare simd to them to get
simd versions.

(it's not clear to me how omp declare simd is
supposed to work in fortran, but it is not useful
for vectorizing loops with math functions.)

so gfortran will need a different mechanism to
do the vectorization, e.g. an option like
-mveclibabi=glibc, but the list of available
vector functions need to be specified somewhere.



Re: -static-pie and -static -pie

2018-02-02 Thread Szabolcs Nagy

On 31/01/18 15:44, Cory Fields wrote:

After looking at this for quite a while, I'm afraid I'm unsure how to proceed.

As of now, static and static-pie are mutually exclusive. So given the
GNU_USER_TARGET_STARTFILE_SPEC you pasted
earlier, "static" matches before "static-pie", causing the wrong start files.

It seems to me that the static-pie target complicates things more than
matching against static+pie individually.

If I convert -static + -pie to -static-pie, then "static" won't be
matched in specs, where maybe it otherwise should. Same for -pie.



you can change PIE_SPEC to pie|static-pie
and occurrences of static to static|static-pie
(and !static: to !static:%{!static-pie: etc.),
except where it is used to mean "no-pie static",
those should be changed to PIE_SPEC:;static:
(and i think --no-dynamic-linker should always
be passed to ld in LD_PIE_SPEC for static pie,
not just on linux systems and selected targets.)

then there should be no difference between -static -pie
and -static-pie. (and the new -static-pie flag would
be redundant.)

this would e.g. break static linking with default pie
toolchain on systems where the static libc is not pie
or missing the rcrt startup file after upgrading to gcc-8.
i'm not sure if this is a good enough reason to introduce
the -static-pie mess, however if we don't want to break
any previously working configuration then -static-pie has
to be different from -static -pie.


Would you prefer to swallow -static and -pie and pass along only
-static-pie? Or forward them all along, and fix the specs which look
for static before static-pie ?

Regards,
Cory

On Tue, Jan 30, 2018 at 2:36 PM, H.J. Lu  wrote:

On Tue, Jan 30, 2018 at 11:18 AM, Cory Fields  wrote:

On Tue, Jan 30, 2018 at 2:14 PM, H.J. Lu  wrote:

On Tue, Jan 30, 2018 at 11:07 AM, Cory Fields  wrote:

On Tue, Jan 30, 2018 at 1:35 PM, H.J. Lu  wrote:

On Tue, Jan 30, 2018 at 10:26 AM, Cory Fields  wrote:

Hi list

I'm playing with -static-pie and musl, which seems to be in good shape
for 8.0.0. Nice work :)

However, the fact that "gcc -static -pie" and "gcc -static-pie"
produce different results is very unexpected. I understand the case
for the new link-type, but merging the options when possible would be
a huge benefit to existing buildsystems that already cope with both
individually.

My use-case:
I'd like to build with --enable-default-pie, and by adding "-static"


Why not adding "-static-pie" instead of "-static"?


to my builds, produce static-pie binaries. But at the moment, that
attempts to add an interp section.

So my question is, if no conflicting options are found, why not hoist
"-static -pie" to "-static-pie" ?

Regards,
Cory




--
H.J.


My build system, and plenty of others I'm sure, already handle -static
and -pie. Having that understood to mean "static-pie" would mean that
the combination would now just work.

Asking a different way, if I request -static and -pie, without -nopie,
quietly creating non-pie binary seems like a bug. Is there a reason
_not_ to interpret it as -static-pie in that case?


GNU_USER_TARGET_STARTFILE_SPEC is defined as

#define GNU_USER_TARGET_STARTFILE_SPEC \
   "%{shared:; \
  pg|p|profile:%{static-pie:grcrt1.o%s;:gcrt1.o%s}; \
  static:crt1.o%s; \
  static-pie:rcrt1.o%s; \
  " PIE_SPEC ":Scrt1.o%s; \
  :crt1.o%s} \
crti.o%s \
%{static:crtbeginT.o%s; \
  shared|static-pie|" PIE_SPEC ":crtbeginS.o%s; \
  :crtbegin.o%s} \
%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_start_preinit.o%s; \
  fvtable-verify=std:vtv_start.o%s} \
" CRTOFFLOADBEGIN

to pick a suitable crt1.o for static PIE when -static-pie is used.

If gcc.c can convert ... -static ... -pie and ... -pie ... -static ... to
-static-pie for GNU_USER_TARGET_STARTFILE_SPEC, it
should work.

--
H.J.


Great, that's how I've fixed it locally. Would you consider accepting
a patch for this?


I'd like to see it in GCC 8.  Please open a GCC bug and submit your
patch against it.

Thanks.

--
H.J.




Re: GCC interpretation of C11 atomics (DR 459)

2018-02-26 Thread Szabolcs Nagy

On 26/02/18 04:00, Ruslan Nikolaev via gcc wrote:

1. Not consistent with clang/llvm which completely supports double-width 
atomics for arm32, arm64, x86 and x86-64 making it possible to write portable 
code (w/o specific extensions or assembly code) across all these architectures 
(which is finally possible with C11!)

this should be reported as a bug against clang.

there is no abi guarantee that double-width atomics
will be able to synchronize with code in other modules,
you have to introduce a new abi to do this whatever
that takes (new elf flag, new dynamic linker name,..).


4. atomic_load can be implemented using read-modify-write as it is the only 
option for x86-64 and arm64 (see below).



no, it can't be.


      [..]  The actual nature of read-only memory and how it can be used are 
outside the scope of the standard, so there is nothing to prevent atomic_load 
from being implemented as a read-modify-write operation.



rmw load is only valid if the implementation can
guarantee that atomic objects are never read-only.

current implementations on linux (including clang)
don't do that, so an rmw load can observably break
conforming c code: a static global const object is
placed in .rodata section and thus rmw on it is a
crash at runtime contrary to c standard requirements.

on an aarch64 machine clang miscompiles this code:

$ cat a.c
#include <stdatomic.h>

static const _Atomic struct S {long i,j;} x;

int f(const _Atomic struct S *p)
{
  struct S y = *p;
  return y.i;
}

int main()
{
  return f(&x);
}
$ gcc a.c -latomic
$ ./a.out
$ clang a.c -latomic
$ ./a.out
Segmentation fault (core dumped)



Re: GCC interpretation of C11 atomics (DR 459)

2018-02-26 Thread Szabolcs Nagy

On 26/02/18 13:56, Alexander Monakov wrote:

On Mon, 26 Feb 2018, Szabolcs Nagy wrote:


rmw load is only valid if the implementation can
guarantee that atomic objects are never read-only.


OK, but that sounds like a matter of not emitting atomic
objects into .rodata, which shouldn't be a big problem,
if not for backwards compatibility concern?



well gcc wants to allow atomic access on non-atomic
objects too, otherwise public interfaces may need to
change to use the _Atomic qualifier (which is not even
valid in c++ so it would cause all sorts of breakage).

i think it would be valid to put _Atomic stuff in
writable section and then say atomic load is only
supported on const objects if it is declared with
_Atomic, this would make all strictly conforming
c code work as well as most code that ppl write in
practice (they probably don't use atomics on global
consts).


current implementations on linux (including clang)
don't do that, so an rmw load can observably break
conforming c code: a static global const object is
placed in .rodata section and thus rmw on it is a
crash at runtime contrary to c standard requirements.


Note that in your example GCC emits 'x' as a common symbol,
you need '... x = { 0 };' for it to appear in .rodata,



i see.

static ... x = {0}; and static ... x; are
equivalent in c, so if gcc treats them differently
that's a gcc weirdness, but does not change the
issue that there is no guarantee about readonlyness.


on an aarch64 machine clang miscompiles this code:

[...]

and then with new enough libatomic on Glibc this segfaults
with GCC on x86_64 too due to IFUNC redirection mentioned
in the other subthread.



that's yet another issue, that this is not fully
fixed in x86 gcc.


Re: Fw: GCC interpretation of C11 atomics (DR 459)

2018-02-27 Thread Szabolcs Nagy

On 27/02/18 12:56, Ruslan Nikolaev wrote:

Formally speaking, either implementation satisfies C11 because the standard 
allows much leeway in the interpretation here.


no,

1) your proposal would make gcc non-conforming to iso c unless it changes how 
static const objects are emitted.
2) the two implementations are not abi compatible, the choice is already made, 
changing it is an abi break.
3) Torvald pointed out further considerations such as users expecting lock-free 
atomic loads to be faster than stores.

the solution is to add a language extension, but that requires careful design.


libmvec in gcc to have vector math in fortran

2018-04-10 Thread Szabolcs Nagy

i had a query earlier about libmvec vector functions in fortran:
https://gcc.gnu.org/ml/gcc/2017-11/msg7.html

but there were no simple solutions to make math functions vectorizable
in fortran, because it's hard to make libc headers with simd attributes
visible to the fortran front end.

i think a possible workaround is to have a dummy libmvec implementation
in libgcc.a (or more likely as a separate libgccmvec.a) that just calls
scalar functions from libm like

vdouble _ZGVbN2v_sin(vdouble x)
{
  return (vdouble){sin(x[0]), sin(x[1])};
}

and similarly for all relevant single and double precision functions
for all vector lengths and other supported variants.

then gcc knows that there is an implementation for these functions
available and with the right link order a better implementation from
libmvec can override these dummy implementations. (the cost model
cannot assume a faster vector algorithm than the scalar one though)

- this allows vectorizing loops with math functions even in fortran,
- and on targets without a libmvec implementation (but with a vector abi),
- and allows users to provide their own vector math implementation
more easily without hacking around glibc math.h (which may not support
vector math or only enable it for a small subset of math functions).

gcc needs a new cflag and ldflag to enable this.
(maybe -mveclibabi= already present in x86 and ppc can be used for this)



Re: libmvec in gcc to have vector math in fortran

2018-04-10 Thread Szabolcs Nagy

On 10/04/18 11:14, Janne Blomqvist wrote:
As I mentioned previously in that thread you linked to, the fortran frontend never generates a direct call to libm sin(), or for that matter 
ZGVbN2v_sin(). Instead it generates a "call" to __builtin_sin(). And similarly for other libm functions that have gcc builtins. The middle-end 
optimizers are then free to do whatever optimizations they like on that __builtin_sin call, such as constant folding, and at least as far as the 
fortran frontend is concerned, vectorizing if -mveclibabi= or such is in effect.


the generated builtin call is not the issue (same happens in c),
the knowledge about libc declarations is.

the middle-end has no idea what functions can be vectorized,
only the libc knows it and declares this in c headers.

this is the problem i'm trying to solve.


Re: libmvec in gcc to have vector math in fortran

2018-04-17 Thread Szabolcs Nagy

On 10/04/18 14:27, Richard Biener wrote:

On April 10, 2018 3:06:55 PM GMT+02:00, Jakub Jelinek  wrote:

On Tue, Apr 10, 2018 at 02:55:43PM +0200, Richard Biener wrote:

I wonder if it is possible for glibc to ship a "module" for fortran

instead

containing the appropriate declarations and gfortran auto-include

that

(if present).


Then we'd run into module binary format changing every release, so hard
for
glibc to ship that.

Another thing is how would we express it in the module,
we could just use OpenMP syntax,
  interface
function sin(x) bind(C,name="__builtin_sin") result(res)
  import
  !$omp declare simd notinbranch
  real(c_double) :: res
  real(c_double),value :: x
end function
  end interface
but we'd need to temporarily enable OpenMP while parsing that module.
I see Fortran now supports already
!GCC$ attributes stdcall, fastcall::test
Could we support
!GCC$ attributes simd
and
!GCC$ attributes simd('notinbranch')
too?


Maybe we can also generate this module in a fixinlclude way?



ideally everything should work magically but i think
it's good to have a big hammer solution that's easy
to reason about.

the gcc vectorizer should be testable independently
of glibc, and users should be able to specify what
can be vectorized.

if this is via a per-frontend declaration syntax,
then i see implementation and usability issues, while
those are sorted out a single flag that requests every
function known to gcc to be vectorized sounds to me a
viable big hammer solution: easy to implement and
enables users to start experimenting with simd math.

(the implementation may use a preincluded fortran
module internally, but i think it makes sense to
have a single flag ui too)


Re: libmvec in gcc to have vector math in fortran

2018-06-15 Thread Szabolcs Nagy

On 15/06/18 08:59, Florian Weimer wrote:

* Richard Biener:


'pure' makes it pure but there doesn't seem to be a way to make it const?


Does Fortran support setting the rounding mode?



yes, but vec math is only enabled with -ffast-math (so it can
assume -fno-rounding-math)


In C, sin is not const because it depends on the current rounding
mode.



hm i don't see const in glibc even in case of -ffast-math compilation,
i wonder if that can be changed.


Re: How to get GCC on par with ICC?

2018-06-22 Thread Szabolcs Nagy

On 11/06/18 11:05, Martin Jambor wrote:

The int rate numbers (running 1 copy only) were not too bad, GCC was
only about 2% slower and only 525.x264_r seemed way slower with GCC.
The fp rate numbers (again only 1 copy) showed a larger difference,
around 20%.  521.wrf_r was more than twice as slow when compiled with
GCC instead of ICC and 503.bwaves_r and 510.parest_r also showed
significant slowdowns when compiled with GCC vs. ICC.



Keep in mind that when discussing FP benchmarks, the used math library
can be (almost) as important as the compiler.  In the case of 481.wrf,
we found that the GCC 8 + glibc 2.26 (so the "out-of-the box" GNU)
performance is about 70% of ICC's.  When we just linked against AMD's
libm, we got to 83%. When we instructed GCC to generate calls to Intel's
SVML library and linked against it, we got to 91%.  Using both SVML and
AMD's libm, we achieved 93%.



i think glibc 2.27 should outperform amd's libm on wrf
(since i upstreamed the single precision code from
https://github.com/ARM-software/optimized-routines/ )

the 83% -> 93% diff is because gcc fails to vectorize
math calls in fortran to libmvec calls.


That means that there likely still is 7% to be gained from more clever
optimizations in GCC but the real problem is in GNU libm.  And 481.wrf
is perhaps the most extreme example but definitely not the only one.


there is no longer a problem in gnu libm for the most
common single precision calls and if things go well
then glibc 2.28 will get double precision improvements
too.

but gcc has to learn how to use libmvec in fortran.


Re: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-12 Thread Szabolcs Nagy

On 12/07/18 16:20, Umesh Kalappa wrote:

Hi everyone,

we have our source base ,that was compiled for armv7 on gcc8.1 with
soft-float and for following input

a=0x0010
b=0x0001

  result = a - b ;

we are getting the result as "0x000e" and with
-mhard-float (disabled the flush to zero mode ) we are getting the
result as ""0x000f" as expected.



please submit it as a bug report to bugzilla
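
for reference, a generic check of subnormal handling (the values here are
made up, not the truncated ones from the report): with flush-to-zero
disabled, subtracting a small value from DBL_MIN must give a subnormal
result rather than a flushed zero.

  #include <stdio.h>
  #include <math.h>
  #include <float.h>

  int main (void)
  {
    double a = DBL_MIN;        /* smallest normal double */
    double b = DBL_MIN / 16;   /* a subnormal */
    double r = a - b;          /* subnormal result, must not be flushed */
    printf ("%a - %a = %a, subnormal: %d\n", a, b, r,
            fpclassify (r) == FP_SUBNORMAL);
    return 0;
  }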


while debugging the soft-float code,we see that ,the compiler calls
the intrinsic "__aeabi_dsub" with arm calling conventions i.e passing
"a" in r0 and r1 registers and respectively for "b".

we are investigating the routine "__aeabi_dsub" that comes from libgcc
for incorrect result  and meanwhile we would like to know that

a)do libgcc routines/intrinsic for float operations support or
consider the subnormal values ? ,if so how we can enable the same.

Thank you
~Umesh





Re: [RFC] man7/system_data_types.7: Document [unsigned] __int128

2020-10-01 Thread Szabolcs Nagy via Gcc
The 10/01/2020 12:14, Alejandro Colomar via Gcc wrote:
> Here is the rendered intmax_t:
> 
> intmax_t
>   Include: .  Alternatively, .
> 
>   A  signed  integer type capable of representing any value of any
>   signed integer type supported by the implementation.   According
>   to  the C language standard, it shall be capable of storing val-
>   ues in the range [INTMAX_MIN, INTMAX_MAX].
> 
>   The macro INTMAX_C() expands its argument to an integer constant
>   of type intmax_t.
> 
>   The  length  modifier  for  intmax_t  for  the printf(3) and the
>   scanf(3) families of functions is j; resulting commonly  in  %jd
>   or %ji for printing intmax_t values.
> 
>   Bugs:  intmax_t  is not large enough to represent values of type
>   __int128 in implementations where __int128 is defined  and  long
>   long is less than 128 bits wide.

or __int128 is not an integer type.

integer types are either standard or extended.
and __int128 is neither because it can be
larger than intmax_t and stdint.h does not
provide the necessary macros for it.
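
a small illustration of that gap (x86_64, where long long is 64-bit):

  #include <stdint.h>
  #include <stdio.h>

  int main (void)
  {
    __int128 big = (__int128) INTMAX_MAX + 1;  /* fits in __int128 only */
    /* there is no INT128_MAX/INT128_C in stdint.h and no printf length
       modifier for it, so it cannot be treated as an integer type here */
    printf ("fits in intmax_t: %d\n", big <= INTMAX_MAX);  /* prints 0 */
    return 0;
  }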

> 
>   Conforming to: C99 and later; POSIX.1-2001 and later.
> 
>   See also the uintmax_t type in this page.



Re: unnormal Intel 80-bit long doubles and isnanl

2020-11-24 Thread Szabolcs Nagy via Gcc
The 11/24/2020 16:23, Siddhesh Poyarekar wrote:
> Hi,
> 
> The Intel 80-bit long double format has a concept of "unnormal" numbers that
> have a non-zero exponent and zero integer bit (i.e. bit 63) in the mantissa;
> all valid long double numbers have their integer bit set to 1.  Unnormal
> numbers are mentioned in "8.2.2 Unsupported Double Extended-Precision
> Floating-Point Encodings and Pseudo-Denormals" and listed in Table 8-3 in
> the Intel 64 and IA-32 Architectures Software Developer’s Manual Volume
> 1:Basic Architecture.
> 
> As per the manual, these numbers are considered unsupported and generate an
> invalid-operation exception if they are used as operands to any floating
> point instructions.  The question of this email is how the toolchain
> (including glibc) should treat these numbers since as things stand today,
> glibc and gcc disagree when it comes to isnanl.

ideally fpclassify (and other classification macros) would
handle all representations.

architecturally invalid or trap representations can be a
non-standard class but i think classifying them as FP_NAN
would break the least amount of code.

> glibc evaluates the bit pattern of the 80-bit long double and in the
> process, ignores the integer bit, i.e. bit 63.  As a result, it considers
> the unnormal number as a valid long double and isnanl returns 0.

i think m68k and x86 are different here.

> 
> gcc on the other hand, simply uses the number in a floating point comparison
> and uses the parity flag (which indicates an unordered compare, signalling a
> NaN) to decide if the number is a NaN.  The unnormal numbers behave like
> NaNs in this respect, in that they set the parity flag and with
> -fsignalling-nans, would result in an invalid-operation exception.  As a
> result, __builtin_isnanl returns 1 for an unnormal number.

compiling isnanl to a quiet fp compare is wrong with
-fsignaling-nans: classification is not supposed to
signal exceptions for snan.
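
a minimal x86-only sketch of the disagreement (not code from either
project; build at -O0 so the classification is not constant-folded): a
pure bit-level classifier reports 0 for an unnormal, while the
compare-based __builtin_isnanl reports 1 because the compare comes back
unordered.

  #include <stdio.h>
  #include <stdint.h>
  #include <string.h>

  int main (void)
  {
    long double x = 1.5L;
    unsigned char b[16] = {0};

    memcpy (b, &x, 10);    /* low 10 bytes hold the 80-bit encoding */
    b[7] &= 0x7f;          /* clear the explicit integer bit (bit 63) */
    memcpy (&x, b, 10);    /* x is now an "unnormal" */

    uint64_t mant;
    uint16_t se;
    memcpy (&mant, b, 8);
    memcpy (&se, b + 8, 2);
    unsigned exp = se & 0x7fff;

    /* bit-level view: exponent in (0, max) with the integer bit clear */
    int unnormal = exp != 0 && exp != 0x7fff && !(mant >> 63);

    printf ("unnormal=%d __builtin_isnanl=%d\n",
            unnormal, __builtin_isnanl (x));
    return 0;
  }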

> 
> So the question is, which behaviour should be considered correct? Strictly
> speaking, unnormal numbers are listed separately from NaNs in the document
> and as such are distinct from NaNs.  So on the question of "is nan?" the
> answer ought to be "No".
> 
> On the flip side, the behaviour described (and experienced through code) is
> exactly the same as a NaN, i.e. a floating point operation sets the parity
> flag and generates an invalid-operation exception.  So if it looks like a
> NaN, behaves like a NaN, then even if the document hints (and it is just a
> hint right, since it doesn't specifically state it?) that it's different, it
> likely is a NaN.  What's more, one of the fixes to glibc[1] assumes that
> __builtin_isnanl will do the right thing.
> 
> The third alternative (which seems like a step back to me, but will concede
> that it is a valid resolution) is to state that unnormal input to isnanl
> would result in undefined behaviour and hence it is the responsibility of
> the application to ensure that inputs to isnanl are never unnormal.
> 
> Thoughts?
> 
> Siddhesh
> 
> [1] 
> https://sourceware.org/git/?p=glibc.git;h=0474cd5de60448f31d7b872805257092faa626e4


AArch64 vector ABI vs OpenMP

2022-06-29 Thread Szabolcs Nagy via Gcc
Last time aarch64 libmvec was discussed, the OpenMP
declare variant syntax support was not ready in gcc
and there were open questions around how simd isa
variants would be supported.

https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532940.html

The vector function ABI for aarch64 allows the declare
variant syntax and that is the only way to declare
vector math functions for a particular isa only.

https://github.com/ARM-software/abi-aa/blob/main/vfabia64/vfabia64.rst#aarch64-variant-traits

I would like to get feedback if there may be anything
preventing declare variant simd support on aarch64 like

  float64x2_t simd_cos (float64x2_t);

  #pragma omp declare variant(simd_cos) \
 match(construct={simd(simdlen(2), notinbranch)}, device={isa("simd")})
  double cos (double);

where isa("simd") means simd_cos can be used when
auto vectorizing cos calls with advanced simd.

Our hope is that this enables libmvec on aarch64
such that at least advanced simd variants of
some math functions can be declared in math.h
and implemented in libm, suitable for vectorization.
(Using the vector ABI names of those functions.)

Eventually we want to add isa("sve") support too,
but that may require further work on how scalable
vector length is represented.

Please let me know if there are outstanding issues
with this approach.

thanks.


Re: Adding file descriptor attribute(s) to gcc and glibc

2022-07-13 Thread Szabolcs Nagy via Gcc
The 07/12/2022 18:25, David Malcolm via Libc-alpha wrote:
> On Tue, 2022-07-12 at 18:16 -0400, David Malcolm wrote:
> > On Tue, 2022-07-12 at 23:03 +0530, Mir Immad wrote:
> > GCC's attribute syntax here:
> >   https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html
> > allows for a parenthesized list of parameters for the attribute, which
> > can be:
> >  (a) An identifier
> >  (b) An identifier followed by a comma and a non-empty comma-separated
> > list of expressions
> >  (c) A possibly empty comma-separated list of expressions
> > 
> > I'd hoped to have an argument number, with an optional extra param
> > describing the direction of the access, but syntax (b) puts the
> > identifier first, alas.
> > 
> > Here's one possible way of doing it with a single attribute, via syntax
> > (b):
> > e.g.
> >    __attribute__((fd_argument (access, 1))
> >    __attribute__((fd_argument (read, 1))
> >    __attribute__((fd_argument (write, 1))
> > 
> > meaning that argument 1 of the function is expected to be an open file-
> > descriptor, and that it must be possible to read from/write to that fd
> > for cases 2 and 3.
> > 
> > Here are some possible examples of how glibc might use this syntax:
> > 
> >     int dup (int oldfd)
> >   __attribute((fd_argument (access, 1)); 
> > 
> >     int ftruncate (int fd, off_t length)
> >   __attribute((fd_argument (access, 1)); 
> > 
> >     ssize_t pread(int fd, void *buf, size_t count, off_t offset)
> >   __attribute((fd_argument (read, 1));
> > 
> >     ssize_t pwrite(int fd, const void *buf, size_t count, 
> >    off_t offset);
> >   __attribute((fd_argument (write, 1));
> > 
> > ...but as I said, I'm most interested in input from glibc developers on
> > this.

note that glibc headers have to be namespace clean so it
would be more like

  __attribute__((__fd_argument (__access, 1)))
  __attribute__((__fd_argument (__read, 1)))
  __attribute__((__fd_argument (__write, 1)))

so it would be even shorter to write

  __attribute__((__fd_argument_access (1)))
  __attribute__((__fd_argument_read (1)))
  __attribute__((__fd_argument_write (1)))

> 
> I just realized that the attribute could accept both the single integer
> argument number (syntax (c)) for the "don't care about access
> direction" case, or the ({read|write}, N) of syntax (b) above, giving
> e.g.:
> 
> int dup (int oldfd)
>   __attribute((fd_argument (1)); 
> 
> int ftruncate (int fd, off_t length)
>   __attribute((fd_argument (1)); 
> 
> ssize_t pread(int fd, void *buf, size_t count, off_t offset)
>   __attribute((fd_argument (read, 1));
> 
> ssize_t pwrite(int fd, const void *buf, size_t count, 
>off_t offset);
>   __attribute((fd_argument (write, 1));
> 
> for the above examples.
> 
> How does that look?
> Dave

i think fd in ftruncate should be open for writing.

to be honest, i'd expect interesting fd bugs to be
dynamic and not easy to statically analyze.
the use-after-unchecked-open may be useful. i would
not expect the access direction to catch many bugs.


Re: Adding file descriptor attribute(s) to gcc and glibc

2022-07-14 Thread Szabolcs Nagy via Gcc
The 07/13/2022 12:55, David Malcolm wrote:
> On Wed, 2022-07-13 at 16:01 +0200, Florian Weimer wrote:
> > * David Malcolm:
> GCC trunk's -fanalyzer implements the new warnings via a state machine
> for file-descriptor values; it currently has rules for handling "open",
> "close", "read", and "write", and these functions are currently hard-
> coded inside the analyzer.
> 
> Here are some examples on Compiler Explorer of what it can/cannot
> detect:
>   https://godbolt.org/z/nqPadvM4f
> 
> Probably the most important one IMHO is the leak detection.

nice.
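
for reference, a small sketch of the kind of bug the analyzer can flag
(gcc -fanalyzer on trunk / gcc 13; the function name is made up): both
the unchecked use of the descriptor and the leak on the early return are
visible statically.

  #include <fcntl.h>
  #include <unistd.h>

  int read_first_byte (const char *path)
  {
    int fd = open (path, O_RDONLY);  /* may be -1, never checked */
    char c;
    if (read (fd, &c, 1) != 1)       /* use of possibly-invalid fd */
      return -1;                     /* fd leaks on this path */
    close (fd);
    return c;
  }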

> Would it be helpful to have some kind of attribute for "returns a new
> open FD"?  Are there other ways to close a FD other than calling
> "close" on it?  (Would converting that to some kind of "closes"
> attribute be a good idea?)

dup2(oldfd, newfd)
dup3(oldfd, newfd, flags)

closes newfd (and also opens it to be a dup of oldfd)
unless the call fails.

close_range(first, last, flags)

fclose(fdopen(fd, mode))

but users can write all sorts of wrappers around close too.

> 
> Are there any other "magic" values for file-descriptors we should be
> aware of?
> 

mmap may require fd==-1 for anonymous maps.


Re: Division by zero on A53 which does not raise an exception

2022-11-29 Thread Szabolcs Nagy via Gcc
The 11/28/2022 21:37, Stephen Smith via Binutils wrote:
> I am working on a project which is using an A53 core.   The core does not 
> raise an exception if there is a division by zero (for either integer or 
> floating point division).

floating-point division by zero signals the FE_DIVBYZERO exception.
you can test this via fetestexcept(FE_DIVBYZERO).
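
a minimal sketch (link with -lm where needed; strict FENV_ACCESS rules
left aside):

  #include <fenv.h>
  #include <stdio.h>

  int main (void)
  {
    feclearexcept (FE_ALL_EXCEPT);
    volatile double zero = 0.0;
    volatile double r = 1.0 / zero;  /* returns +inf, raises FE_DIVBYZERO */
    printf ("r=%g divbyzero=%d\n", r,
            fetestexcept (FE_DIVBYZERO) != 0);
    return 0;
  }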

integer operations must not affect fenv status flags so integer
division by zero does not do that.

if you want to *trap* division by zero, there is no reliable way
to do that in c (this is not related to particular cpus though).


Re: New TLS usage in libgcc_s.so.1, compatibility impact

2024-01-15 Thread Szabolcs Nagy via Gcc
The 01/13/2024 13:49, Florian Weimer wrote:
> This commit
> 
> commit 8abddb187b33480d8827f44ec655f45734a1749d
> Author: Andrew Burgess 
> Date:   Sat Aug 5 14:31:06 2023 +0200
> 
> libgcc: support heap-based trampolines
> 
> Add support for heap-based trampolines on x86_64-linux, aarch64-linux,
> and x86_64-darwin. Implement the __builtin_nested_func_ptr_created and
> __builtin_nested_func_ptr_deleted functions for these targets.
> 
> Co-Authored-By: Maxim Blinov 
> Co-Authored-By: Iain Sandoe 
> Co-Authored-By: Francois-Xavier Coudert 
> 
> added TLS usage to libgcc_s.so.1.  The way that libgcc_s is currently
> built, it ends up using a dynamic TLS variant on the Linux targets.
> This means that there is no up-front TLS allocation with glibc (but
> there would be one with musl).
> 
> There is still a compatibility impact because glibc assigns a TLS module
> ID upfront.  This seems to be what causes the
> ust/libc-wrapper/test_libc-wrapper test in lttng-tools to fail.  We end
> up with an infinite regress during process termination because
> libgcc_s.so.1 has been loaded, resulting in a DTV update.  When this
> happens, the bottom of the stack looks like this:
> 
> #4447 0x77f288f0 in free () from /lib64/liblttng-ust-libc-wrapper.so.1
> #4448 0x77fdb142 in free (ptr=)
> at ../include/rtld-malloc.h:50
> #4449 _dl_update_slotinfo (req_modid=3, new_gen=2) at ../elf/dl-tls.c:822
> #4450 0x77fdb214 in update_get_addr (ti=0x77f2bfc0, 
> gen=) at ../elf/dl-tls.c:916
> #4451 0x77fddccc in __tls_get_addr ()
> at ../sysdeps/x86_64/tls_get_addr.S:55
> #4452 0x77f288f0 in free () from /lib64/liblttng-ust-libc-wrapper.so.1
> #4453 0x77fdb142 in free (ptr=)
> at ../include/rtld-malloc.h:50
> #4454 _dl_update_slotinfo (req_modid=2, new_gen=2) at ../elf/dl-tls.c:822
> #4455 0x77fdb214 in update_get_addr (ti=0x77f39fa0, 
> gen=) at ../elf/dl-tls.c:916
> #4456 0x77fddccc in __tls_get_addr ()
> at ../sysdeps/x86_64/tls_get_addr.S:55
> #4457 0x77f36113 in lttng_ust_cancelstate_disable_push ()
>from /lib64/liblttng-ust-common.so.1
> #4458 0x77f4c2e8 in ust_lock_nocheck () from /lib64/liblttng-ust.so.1
> #4459 0x77f5175a in lttng_ust_cleanup () from /lib64/liblttng-ust.so.1
> #4460 0x77fca0f2 in _dl_call_fini (
> closure_map=closure_map@entry=0x77fbe000) at dl-call_fini.c:43
> #4461 0x77fce06e in _dl_fini () at dl-fini.c:114
> #4462 0x77d82fe6 in __run_exit_handlers () from /lib64/libc.so.6
> 
> Cc:ing  for awareness.
> 
> The issue also requires a recent glibc with changes to DTV management:
> commit d2123d68275acc0f061e73d5f86ca504e0d5a344 ("elf: Fix slow tls
> access after dlopen [BZ #19924]").  If I understand things correctly,
> before this glibc change, we didn't deallocate the old DTV, so there was
> no call to the free function.

with 19924 fixed, after a dlopen or dlclose every thread updates
its dtv on the next dynamic tls access.

before that, dtv was only updated up to the generation of the
module being accessed for a particular tls access.

so hitting the free in the dtv update path is now more likely
but the free is not new, it was there before.

also note that this is unlikely to happen on aarch64 since
tlsdesc only does dynamic tls access after a 512byte static
tls reservation runs out.

> 
> On the glibc side, we should recommend that intercepting mallocs and its
> dependencies use initial-exec TLS because that kind of TLS does not use
> malloc.  If intercepting mallocs using dynamic TLS work at all, that's
> totally by accident, and was in the past helped by glibc bug 19924.  (I

right.
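
a minimal sketch of what that recommendation looks like in an interposer
(only __libc_malloc is a real glibc symbol here, the rest is made up for
illustration): the guard variable lives in initial-exec tls, so accessing
it never allocates.

  #include <stddef.h>

  extern void *__libc_malloc (size_t);  /* glibc's exported allocator */

  static __thread int in_hook
    __attribute__ ((tls_model ("initial-exec")));

  void *malloc (size_t n)
  {
    if (in_hook)
      return __libc_malloc (n);   /* re-entered: skip instrumentation */
    in_hook = 1;
    void *p = __libc_malloc (n);  /* ...record the allocation here... */
    in_hook = 0;
    return p;
  }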

> don't think there is anything special about libgcc_s.so.1 that triggers
> the test failure above, it is just an object with dynamic TLS that is
> implicitly loaded via dlopen at the right stage of the test.)  In this
> particular case, we can also paper over the test failure in glibc by not
> call free at all because the argument is a null pointer:
> 
> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index 7b3dd9ab60..14c71cbd06 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -819,7 +819,8 @@ _dl_update_slotinfo (unsigned long int req_modid, size_t 
> new_gen)
>dtv entry free it.  Note: this is not AS-safe.  */
> /* XXX Ideally we will at some point create a memory
>pool.  */
> -   free (dtv[modid].pointer.to_free);
> +   if (dtv[modid].pointer.to_free != NULL)
> + free (dtv[modid].pointer.to_free);
> dtv[modid].pointer.val = TLS_DTV_UNALLOCATED;
> dtv[modid].pointer.to_free = NULL;

can be done, but !=NULL is more likely since we do modid reuse
after dlclose.

there is also a realloc in dtv resizing which happens when more
than 16 modules with tls are loaded after thread creation
(DTV_SURPLUS).

i'm not sure if it's worth supporting malloc
interposers that use dynamic tls.

Re: [RFC] Linux system call builtins

2024-04-09 Thread Szabolcs Nagy via Gcc
The 04/08/2024 06:19, Matheus Afonso Martins Moreira via Gcc wrote:
> __builtin_linux_system_call(long n, ...)
...
> Calling these builtins will make GCC place all the parameters
> in the correct registers for the system call, emit the appropriate
> instruction for the target architecture and return the result.
> In other words, they would implement the calling convention[1] of
> the Linux system calls.

note: some syscalls / features don't work without asm
(posix thread cancellation, vfork, signal return,..)

and using raw syscalls outside of the single runtime the
application is using is problematic (at least on linux).

>   + It doesn't make sense for libraries to support it
> 
> There are libraries out there that provide
> system call functionality. The various libcs do.
> However they usually don't support the full set
> of Linux system calls. Using certain system calls
> could invalidate global state in these libraries
> which leads to them not being supported. Clone is
> the quintessential example. So I think libraries
> are not the proper place for this functionality.

i don't follow the reasoning here, where should the
syscall be if not in a library like libc?
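
today the libc answer is the syscall() wrapper, which already hides the
call convention; a trivial sketch (gettid is just a harmless example):

  #define _GNU_SOURCE
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <stdio.h>

  int main (void)
  {
    long tid = syscall (SYS_gettid);  /* raw syscall via the libc wrapper */
    printf ("tid=%ld\n", tid);
    return 0;
  }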

clone cannot even be used from c code in general as
CLONE_VM is not compatible with c semantics without
a new stack (child clobbers the parent stack), so
the c builtin would not always work, but it is also
a syscall that only a freestanding application can use,
not something that calls into the libc, and even in
a freestanding application it is tricky to use right
(especially in a portable way or with features like
shadow stack), so i don't see why clone is the
quintessential example.

>   + It allows freestanding software to easily target Linux
> 
> Freestanding code usually refers to bare metal
> targets but Linux is also a viable target.
> This will make it much easier for developers
> to create freestanding nolibc no dependency
> software targeting Linux without having to
> write any assembly code at all, making GCC
> ever more useful.

i think the asm call convention bit is by far not the
hardest part in providing portable linux syscall wrappers.

my main worry is that the builtins encourage the use of raw
syscalls and outside of libc development it is not well
understood how to do that correctly, but i guess it's ok if
it is by default an error outside of -ffreestanding.


Re: [RFC] Linux system call builtins

2024-04-10 Thread Szabolcs Nagy via Gcc
The 04/09/2024 23:59, Matheus Afonso Martins Moreira via Gcc wrote:
> > and using raw syscalls outside of the single runtime the
> > application is using is problematic (at least on linux).
> 
> Why do you say they are problematic on Linux though? Please elaborate.

because the portable c api layer and syscall abi layer
has a large enough gap that applications can break
libc internals by doing raw syscalls.

and it's not just the call convention that's target
specific (this makes the c syscall() function hard to
use on linux)

and linux evolves fast enough that raw syscalls have
to be adjusted over time (to support new features)
which is harder when they are all over the place
instead of in the libc only.

> 
> The ABI being stable should mean that I can for example
> strace a program, analyze the system calls and implement
> a new version of it that performs the same functions.

you could do that with syscall() but it is not very
useful as the state of the system is not the same
when you rerun a process so syscalls would likely
fail or do different things than in the first run.

> > clone cannot even be used from c code in general
> > as CLONE_VM is not compatible with c semantics
> > without a new stack (child clobbers the parent stack)
> > so the c builtin would not always work
> > it is also a syscall that only freestanding
> > application can use not something that calls
> > into the libc
> 
> There are major projects out there which do use it regardless.

that does not make it right.

> For example, systemd:
> 
> https://github.com/systemd/systemd/blob/main/src/basic/raw-clone.h
> https://github.com/systemd/systemd/blob/main/src/shared/async.h
> https://github.com/systemd/systemd/blob/main/src/shared/async.c
> https://github.com/systemd/systemd/blob/main/docs/CODING_STYLE.md
> 
> > even in a freestanding application it is tricky to use right
> 
> No argument from me there. It is tricky...
> The compiler should make it possible though.
> 
> > so i don't see why clone is the quintessential example.
> 
> I think it is the best example because attempting to use clone
> is not actually supported by glibc.
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=10311
> 
> "If you use clone() you're on your own."

should be

"if you use clone() *or* raw clone syscall then
 you're on your own"

which is roughly what i said in that discussion.

so your proposal does not fix this particular issue,
it just provides a simpler footgun.

> > i guess it's ok if it is by default an error
> > outside of -ffreestanding.
> 
> Hosted C programs could also make good use of them.

they should not.

> They could certainly start out exclusive to freestanding C
> and then made available to general code if there's demand.