getting spam
Hi. These days I began to get spam in my mail box. I found, that my mail address ([EMAIL PROTECTED]) is published on : http://gcc.gnu.org/ml/gcc/2006-08/msg00227.html Please, remove it from there, thanks.
How implemented "typeof"
Hello! How I can know more about implementation at 'tree' level such extension as 'typeof'? I am not want to explore and change sources now, maybe someone cam help? -- Best regards, Alexander mailto:[EMAIL PROTECTED]
How can I get access to tree representation
I started exploring code base of cc1plus, and now I have little question - how I can get access to tree representation of program (I should do it after gcc/cp/parser.c:cp_parser_translation unit(...), isnt it?) If I wasnt mistaken, RTL began build only if parser says that syntax OK? -- Best regards, Alexander mailto:[EMAIL PROTECTED]
Re[2]: How can I get access to tree representation
Hello Gabriel, Monday, December 12, 2005, 12:47:17 PM, you wrote: > Alexander <[EMAIL PROTECTED]> writes: > | I started exploring code base of cc1plus, and now I have little > | question - how I can get access to tree representation of program (I > | should do it after gcc/cp/parser.c:cp_parser_translation unit(...), isnt > it?) > | If I wasnt mistaken, RTL began build only if parser says that syntax > | OK? > In an ideal world, that is what we would like to have. But currently, > things are a bit intertwined -- and historically, it has been worse :-). > In the current state, the gimplifiers run even if -fsyntax-only. > Worse, they run even when we know the translation unit contains an error. > However, you might get a starting point from the codes that call hooks for > tree dumping. > -- Gaby Uhh... sad :-) When cc1plus make tree dumping can we be sure, that tree correctly represented program structure, i.e. symbolic information unharmed, and we could make back-transformation - from tree to source-without-comments level? -- Best regards, Alexandermailto:[EMAIL PROTECTED]
Re: Fwd: Re: gcc 4.1.1 for mcore
Hello Nick, Thanks for your support. this is the error message I'm getting: " ... make[4]: Leaving directory `/proj/tec/alpeca_lite/users/alexgr/gcc/objdir1/gcc' /proj/tec/alpeca_lite/users/alexgr/gcc/objdir1/./gcc/xgcc -B/proj/tec/alpeca_lite/users/alexgr/gcc/objdir1/./gcc/ -B/usr/local/mcore-elf/bin/ -B/usr/local/mcore-elf/lib/ -isystem /usr/local/mcore-elf/include -isystem /usr/local/mcore-elf/sys-include -O2 -O2 -g -O2 -DIN_GCC -DCROSS_COMPILE -DDONT_HAVE_STDIO -DDONT_HAVE_SETJMP -Dinhibit_libc -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -isystem ./include -O3 -DNO_FLOATLIB_FIXUNSDFSI -g -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -Dinhibit_libc -I. -I. -I../../gcc-4.1.1/gcc -I../../gcc-4.1.1/gcc/. -I../../gcc-4.1.1/gcc/../include -I../../gcc-4.1.1/gcc/../libcpp/include -DL_floatdisf -c ../../gcc-4.1.1/gcc/libgcc2.c -o libgcc/./_floatdisf.o /tmp/ccvk5vjH.s: Assembler messages: /tmp/ccvk5vjH.s:38: Error: operand must be absolute in range 1..32, not 53 make[3]: *** [libgcc/./_floatdisf.o] Error 1 make[3]: Leaving directory `/proj/tec/alpeca_lite/users/alexgr/gcc/objdir1/gcc' make[2]: *** [stmp-multilib] Error 2 make[2]: Leaving directory `/proj/tec/alpeca_lite/users/alexgr/gcc/objdir1/gcc' make[1]: *** [all-gcc] Error 2 make[1]: Leaving directory `/proj/tec/alpeca_lite/users/alexgr/gcc/objdir1' make: *** [all] Error 2 attached are the library file compiled with -S option with the bmaski r4,53 wrong instruction form. and my configuration file. I run on a Linux machine with AMD CPU (x86_64). It runs 3.1.3-6.10 RedHat KDE release 2.4.21-32.EL Thanks again Alex. --- Nick Clifton <[EMAIL PROTECTED]> wrote: > Hi Alex, > > > I have some troubles with the MCORE GCC target, > (see > > below). Do have any idea what it can be and how to > > solve the problem? > > Yes - First unless you are constrained to only use > the 4.1.1 sources > please try using the current mainline gcc > development sources. It is > possible, but unlikely, that the bug has already > been fixed. > > Assuming that the problem persists, please could you > put together a > small test case to demonstrate it, including details > of how you > configured the toolchain and what sort of host > machine you are using. > If you could send this to me, but, *important*, CC > the email to the gcc > mailing list as well, then I can investigate and try > to fix the problem. > > >>> The compiler generates illigal form of bmaski > >>> instruction with operand out of range. > > Interesting. I have looked at the code locations > where this instruction > is generated and it appears to be correct. I will > be intrigued to see > how this bug is produced. > > Cheers >Nick > > > libgcc2.s Description: 846145001-libgcc2.s config.status Description: 881791177-config.status
Pointer addition/subtraction tree node
As part of adding a new pass to GCC I am intercepting addition to and subtraction from pointers. These are represented by PLUS_EXPR and MINUS_EXPR tree nodes. I need to be able to find out which of the node's two operands is the actual pointer and which is the integer that has been added to it. Subtraction is not a problem. The real pointer always seems to be the first operand (makes sense). If the integer is a constant (e.g. p = p + 4) I can catch it by checking the type: TREE_CODE (TREE_OPERAND (t, 0)) == INTEGER_CST But in other cases (e.g. p = p + i) I can't do this by looking at the type as they both have the same type as the result pointer. Is there another way to find out which is which? Thanks. Alex
RE: Pointer addition/subtraction tree node
Code of the form int[10] a; int* p = a; int* q = a; int i = 3; p = q + i; is transformed into int * D.900; unsigned int D.899; unsigned int i.0; : i = 3; p = &a; q = &a; i.0 = (unsigned int) i; D.899 = i.0 * 4; D.900 = (int *) D.899; p = D.900 + q; by the time it reaches the fixupcfg pass. It has been suggested to me that a solution might be to trace back through the tree to find which of the operands is derived from a non-pointer variable. I am new to GCC development. How might I go about doing this? Another approach I tried was to detect which of the operands was a compiler intermediate (my theory being that this would always be the non-pointer operand) by using DECL_ARTIFICIAL (TREE_OPERAND (t, 0)) but this breaks if tried on an operand that is not a VAR_DECL. I don't think my theory is sounds but if it is, is there a way to get this to work? Thanks. Alex. > -Original Message- > From: Andrew Pinski [mailto:[EMAIL PROTECTED] > Sent: 19 March 2007 00:47 > To: Alexander Lamaison > Cc: gcc@gcc.gnu.org > Subject: Re: Pointer addition/subtraction tree node > > On 3/18/07, Alexander Lamaison <[EMAIL PROTECTED]> wrote: > > As part of adding a new pass to GCC I am intercepting addition to and > > subtraction from pointers. These are represented by PLUS_EXPR and > > MINUS_EXPR tree nodes. I need to be able to find out which of the > node's > > two operands is the actual pointer and which is the integer that has > been > > added to it. > > > > Is there another way to find out which is which? > > Not right now, I have been working on a new representation of pointer > arithmetic for the tree level. The basic implementation is already > done, see the pointer_plus branch. > > Thanks, > Andrew Pinski
RE: Pointer addition/subtraction tree node
> -Original Message- > From: Andrew Pinski [mailto:[EMAIL PROTECTED] > Sent: 19 March 2007 00:47 > To: Alexander Lamaison > Cc: gcc@gcc.gnu.org > Subject: Re: Pointer addition/subtraction tree node > > On 3/18/07, Alexander Lamaison <[EMAIL PROTECTED]> wrote: > > As part of adding a new pass to GCC I am intercepting addition to and > > subtraction from pointers. These are represented by PLUS_EXPR and > > MINUS_EXPR tree nodes. I need to be able to find out which of the > node's > > two operands is the actual pointer and which is the integer that has > been > > added to it. > > > > Is there another way to find out which is which? > > Not right now, I have been working on a new representation of pointer > arithmetic for the tree level. The basic implementation is already > done, see the pointer_plus branch. > > Thanks, > Andrew Pinski Apologies for top-posting before: Code of the form int[10] a; int* p = a; int* q = a; int i = 3; p = q + i; is transformed into int * D.900; unsigned int D.899; unsigned int i.0; : i = 3; p = &a; q = &a; i.0 = (unsigned int) i; D.899 = i.0 * 4; D.900 = (int *) D.899; p = D.900 + q; by the time it reaches the fixupcfg pass. It has been suggested to me that a solution might be to trace back through the tree to find which of the operands is derived from a non-pointer variable. I am new to GCC development. How might I go about doing this? Another approach I tried was to detect which of the operands was a compiler intermediate (my theory being that this would always be the non-pointer operand) by using DECL_ARTIFICIAL (TREE_OPERAND (t, 0)) but this breaks if tried on an operand that is not a VAR_DECL. I don't think my theory is sounds but if it is, is there a way to get this to work? Thanks. Alex.
Using SSA
I am adding a new optimisation pass to GCC and I have found that I probably need to make use of SSA's definition-finding. The problem I am having is that the trees I am working on don't seem to be in SSA form (i.e. not SSA_NAME nodes). I have looked endlessly and can't find any documentation on the basics of getting set up to use SSA. I've tried looking at existing optimisation passes but I can't find anything that they do in common in the way of set-up. The tree_opt_pass for my pass has PROP_ssa set in the properties_required field. Is this all I need to do? Any help is greatly appreciated. Alex
RE: Using SSA
> > The tree_opt_pass for my pass has PROP_ssa set in the > properties_required > > field. Is this all I need to do? > > You need to put your pass after pass_build_ssa. Setting PROP_ssa does > not build SSA itself, but it will cause an assertion failure if the > pass is run while SSA is (not yet) available. > > Paolo I think (if I'm correctly interpreting the list in passes.c) it is. It's right after pass_warn_function_noreturn, just before pass_mudflap_2. Is this right? I don't get any assertion about SSA not being available. Thanks. Alex
RE: Using SSA
> > I think (if I'm correctly interpreting the list in passes.c) it is. > It's > > right after pass_warn_function_noreturn, just before pass_mudflap_2. > Is > > this right? I don't get any assertion about SSA not being available. > > In this case, it is also after pass_del_ssa, which means SSA has > already > been destroyed. Oh, ok. Thanks! I had assumed the mudflap passes would have SSA enabled as the 'Tree-SSA passes' section of the GCC internal manual listed them: http://gcc.gnu.org/onlinedocs/gccint/Tree_002dSSA-passes.html#Tree_002dSSA-p asses. Thanks. Alex.
SoC Project: Propagating array data dependencies from Tree-SSA to RTL
Melnik: http://gcc.gnu.org/ml/gcc-patches/2005-11/msg01518.html -- Alexander Monakov
Re: SoC Project: Propagating array data dependencies from Tree-SSA to RTL
On Sun, 25 Mar 2007, Daniel Berlin wrote: Ayal has not signed up to be a mentor (as of yet). If he doesn't, i'd be happy to mentor you here, since i wrote part of tree-data-ref.c Thanks, I'll be very glad to work with you. On Mon, 26 Mar 2007, Ayal Zaks wrote: Sorry, I fear I may have too little time to devote to this; plus, it would be very useful to start with a good understanding of tree-data-ref from which to start propagating the dep info. Vladimir Yanovsky and I will be able to help when it comes to what/how to feed the modulo scheduler. Thank you for your attention. I hope I will have a chance to ask you for help in the frame of GSoC project. -- Alexander Monakov
libcc_s.so and libc.so curcular dependency on FreeBSD
Hi, I am working on integrating GCC 4.1.x series into FreeBSD src/ tree. I've been running with the new compiler on FreeBSD 7.0 for quite a while now, but was hesitant to commit my changes because of a couple of unsolved issues. I would really appreciate your input on the way to overcome them. One of the main goals for the upcoming compiler refresh in FreeBSD was to start using shared libgcc_s.so.1, something we did not do before. Ideally I want libgcc_s.so.1 built from FreeBSD src/ tree to be 100% binary compatible with the library build from stock GCC sources as checked out from FSF SVN repository. This is where the first problem lies. libgcc_s.so.1 depends on libc.so.1 (and libpthread.so) for symbols like malloc, free, pthread_once, pthread_mutex_lock, etc. libc in turn depends on libgcc_s.so.1 due to the default way exception frame info information registration is implemented in FreeBSD. Each shared object is expected to issue calls to __register_frame_info and __deregister_frame_info in its startup/shutdown code in order for exception handlers to work across shared library boundaries. This creates a dependency cycle that I need to break. The simplest way to go appears to follow Linux's lead and eliminate the need for shared modules to have explicit frame into registration calls at startup and allow exception handling code to locate necessary info with the help of dynamic loader using dl_iterate_phdr call. I went ahead and implemented necessary code in FreeBSD's ld-elf.so.1 and with little changes in gcc crtstuff.c and unwind-dw2-fde-glibc.c I was able to get things working. I did not find any breakage yet and everything seems to run smoothly so far. I wonder if you could confirm that this is a good resolution for the circular dependency issue or is there a better way. Since both libc and libgcc_s will both have symbol version support turned on on them, I cannot import the new compiler into FreeBSD until this dilemma is solved in one way or another. I also noticed that on Linux glibc implements some of the libgcc symbols, namely _Unwind_Find_FDE and __register_frame_info_bases family of functions. I wonder why it is done and if I missed something obvious here. I wonder if GCC team will be willing to accept gcc part of my changes into mainstream sources if I submit a patch. What are chances of something like this to be committed into not only trunk, but also GCC 4.1, 4.2 and 4.3 branches? FreeBSD 7.0 wants to ship with GCC 4.1, but newer version is likely to be imported into trunk shortly after CVS is branched for 7.0 release. All I had to do is to extend checks for GLIBC 2.2.4 in above mentioned GCC files to also check for suitable __FreeBSD_version__ and they compile just fine. I also had to add Linux-compatible definition for struct dl_phdr_info along with dl_iterate_phdr function prototype into FreeBSD's link.h header file. -- Alexander Kabaev signature.asc Description: PGP signature
Re: Questions about trampolines
On Mar 14, 2005, at 8:11 AM, Marc Espie wrote: In article <[EMAIL PROTECTED]> you write: Well as I said above, trampolines or an equivalent are currently critically needed by some front ends (and of course by anyone using the (very useful IMO) extension of nested functions in C). This is your opinion, but I've yet to find an actual piece of code in a real project that uses that extension. I use it; it involves writing less code than a userdata pointer would. I admit that I've never encountered any other code that takes the address of local functions, and I could easily use inlining to stop it using trampolines.
Re: GCC 4.0, Fast Math, and Acovea
On May 3, 2005, at 4:54 PM, Diego Novillo wrote: On Tue, May 03, 2005 at 04:45:55PM -0400, Scott Robert Ladd wrote: If you have a suggestion for better benchmarks, I'm listening. Is your ray tracer available? I recently heard of Openbench, a project to create an open version of the SPEC benchmarks http://www.exactcode.de/oss/openbench/ There's also this benchmark project, although it's nowhere near complete yet: http://arsware.org/cms/showpage.php?cid=104
Re: [PATCH, v3] wwwdocs: e-mail subject lines for contributions
On Mon, 3 Feb 2020, Richard Earnshaw (lists) wrote: > I've not seen any follow-up to this version. Should we go ahead and adopt > this? Can we please go with 'committed' (lowercase) rather than all-caps COMMITTED? Spelling this with all-caps seems like a recent thing on gcc-patches, before everyone used the lowercase version, which makes more sense (no need to shout about the thing that didn't need any discussion before applying the patch). Also, while tools like 'git format-patch' will automatically put [PATCH] in the subject, for '[COMMITTED]' it will be the human typing that out, and it makes little sense to require people to meticulously type that out in caps. Especially when the previous practice was opposite. Thanks. Alexander
Re: [PATCH, v3] wwwdocs: e-mail subject lines for contributions
On Mon, 3 Feb 2020, Richard Earnshaw (lists) wrote: > Upper case is what glibc has, though it appears that it's a rule that is not > strictly followed. If we change it, then it becomes another friction point > between developer groups. Personally, I'd leave it as is, then turn a blind > eye to such minor non-conformance. In that case can we simply say that both 'committed' and 'COMMITTED' are okay, if we know glibc doesn't follow that rule and anticipate we will not follow it either? Thanks. Alexander
Re: Missed optimization with endian and alignment independent memory access on x64
On Thu, 6 Feb 2020, Moritz Strübe wrote: > Why is this so hard optimize? As it's quite a common pattern I'd expect that > there would be at least some hand-coded special case optimizer. (This isn't > criticism - I'm honestly curious.) Or is there a reason gcc shouldn't optimize > this / Why it doesn't matter that this is missed? The compiler would need to exploit the fact that signed overflow is undefined, or deduce it cannot happen. Imagine what happens in a more general case if i is INT_MAX (so without undefined overflow i+1 would be INT_MIN): int f(unsigned char *ptr, int i) { return ptr[i] | ptr[i+1] << 8; } With 64-bit address space this might access two bytes 4GB apart. But you're right that it's a missed optimization in GCC, so you can file it to the GCC Bugzilla. > Is there a way to write such code that gcc optimizes? Simply write a function that accepts one pointer: int load_16be(unsigned char *ptr) { return ptr[0] << 8 | ptr[1]; } and use it as load_16be(data+i) or load_16be(&data[i]). > From a performance point of view: If I actually need two consecutive bytes, > wouldn't it be better to load them as word and split them at the register > level? The question is not entirely clear to me, but usually the answer is that it depends on the microarchitecture and details of the computations that need to be done with loaded values. Often you'd need more than one instruction to "split" the wide load, so it wouldn't be profitable. Alexander
Re: Branch instructions that depend on target distance
On Mon, 24 Feb 2020, Andreas Schwab wrote: > On Feb 24 2020, Petr Tesarik wrote: > > > On Mon, 24 Feb 2020 12:29:40 +0100 > > Andreas Schwab wrote: > > > >> On Feb 24 2020, Petr Tesarik wrote: > >> > >> > This works great ... until there's some inline asm() statement, for > >> > which gcc cannot keep track of the length attribute, so it is probably > >> > taken as zero. > >> > >> GCC computes it by counting the number of asm insns. You can use > >> ADJUST_INSN_LENGTH to adjust this as needed. > > > > Hmm, that's interesting, but does it work for inline asm() statements? > > Yes, for a suitable definition of work. > > > The argument is essentially a free-form string (with some > > substitution), and the compiler cannot know how many bytes they occupy. > > That's why ADJUST_INSN_LENGTH can adjust it. I think Petr might be unaware of the fact that GCC counts the **number of instructions in an inline asm statement** by counting separators in the asm string. This may overcount when a separator appears in a string literal for example, but triggering under-counting is trickier. Petr, please see https://gcc.gnu.org/onlinedocs/gcc/Size-of-an-asm.html for some more discussion. Alexander
GSoC topic: precise lifetimes in GIMPLE
Hi, following the conversation in PR 90348, I wonder if it would make sense to suggest the idea presented there as a potential GSoC topic? Like this: **Enhance GIMPLE IR to represent lifetimes explicitly** At the moment, GCC internal representation GIMPLE lacks precise lifetime information for addressable variables: GIMPLE marks the end of the lifetime by the so-called "GIMPLE clobber" statement, corresponding to the point where the variable goes out of scope in the original program. However, the event of the "birth" of a variable (where it appears in scope) is lost, making the IR ambiguous and opening the door for invalid optimizations, as shown in bug #90348. The project would entail inventing a way to represent "lifetime start" in GIMPLE, adjusting front-ends to emit it, implementing a verifier to check that optimizations do not move references outside of the variable's lifetime, and potentially enhancing optimizations to move lifetime markers, expanding the lifetime, where necessary. I know we already have good project ideas, and I suspect this idea may be too complicated for GSoC, but on the other hand it sounds useful, and gives an "experimental" topic that may be interesting for students. What do you think? Thanks. Alexander
Re: GSoC topic: precise lifetimes in GIMPLE
On Mon, 2 Mar 2020, Richard Biener wrote: > PR90348 is certainly entertaining. But I guess for a GSoC project > we need a more elaborate implementation plan. The above suggesting > of a "lifetime start" is IMHO a no-go btw. Instead I think the > only feasible way is to make all references indirect and thus > make both "allocation" and "deallocation" points explicit. Then > there's a data dependence on the "allocation" statement which > provides upward safety and the "deallocation" statement would > need to act as a barrier in some to be determined way. That is, > how to make optimizers preserve the lifetime end is still unsolved. I think a verifier that ensures that all references are dominated by "lifetime start" and post-dominated by clobbers/lifetime-end would be a substantial step towards that. Agreed that data dependence on allocation would automatically ensure part of that verification, but then the problem with deallocation remains, as you say. > IMHO whatever we do should combine with the CLOBBERs we have now, > not be yet another mechanism. This seems contradictory with the ideas in your previous paragraph. I agree though, CLOBBER-as-lifetime-end makes sense and does not call for a replacement. Thanks. Alexander
[PATCH 1/5] gcc-plugins/stackleak: Exclude alloca() from the instrumentation logic
Some time ago Variable Length Arrays (VLA) were removed from the kernel. The kernel is built with '-Wvla'. Let's exclude alloca() from the instrumentation logic and make it simpler. The build-time assertion against alloca() is added instead. Unfortunately, for that assertion we can't simply check cfun->calls_alloca during RTL phase. It turned out that gcc before version 7 called allocate_dynamic_stack_space() from expand_stack_vars() for runtime alignment of constant-sized stack variables. That caused cfun->calls_alloca to be set for functions that don't use alloca(). Signed-off-by: Alexander Popov --- scripts/gcc-plugins/stackleak_plugin.c | 51 +++--- 1 file changed, 21 insertions(+), 30 deletions(-) diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c index cc75eeba0be1..1ecfe50d0bf5 100644 --- a/scripts/gcc-plugins/stackleak_plugin.c +++ b/scripts/gcc-plugins/stackleak_plugin.c @@ -9,10 +9,9 @@ * any of the gcc libraries * * This gcc plugin is needed for tracking the lowest border of the kernel stack. - * It instruments the kernel code inserting stackleak_track_stack() calls: - * - after alloca(); - * - for the functions with a stack frame size greater than or equal - * to the "track-min-size" plugin parameter. + * It instruments the kernel code inserting stackleak_track_stack() calls + * for the functions with a stack frame size greater than or equal to + * the "track-min-size" plugin parameter. * * This plugin is ported from grsecurity/PaX. For more information see: * https://grsecurity.net/ @@ -46,7 +45,7 @@ static struct plugin_info stackleak_plugin_info = { "disable\t\tdo not activate the plugin\n" }; -static void stackleak_add_track_stack(gimple_stmt_iterator *gsi, bool after) +static void stackleak_add_track_stack(gimple_stmt_iterator *gsi) { gimple stmt; gcall *stackleak_track_stack; @@ -56,12 +55,7 @@ static void stackleak_add_track_stack(gimple_stmt_iterator *gsi, bool after) /* Insert call to void stackleak_track_stack(void) */ stmt = gimple_build_call(track_function_decl, 0); stackleak_track_stack = as_a_gcall(stmt); - if (after) { - gsi_insert_after(gsi, stackleak_track_stack, - GSI_CONTINUE_LINKING); - } else { - gsi_insert_before(gsi, stackleak_track_stack, GSI_SAME_STMT); - } + gsi_insert_before(gsi, stackleak_track_stack, GSI_SAME_STMT); /* Update the cgraph */ bb = gimple_bb(stackleak_track_stack); @@ -87,14 +81,13 @@ static bool is_alloca(gimple stmt) /* * Work with the GIMPLE representation of the code. Insert the - * stackleak_track_stack() call after alloca() and into the beginning - * of the function if it is not instrumented. + * stackleak_track_stack() call into the beginning of the function. */ static unsigned int stackleak_instrument_execute(void) { basic_block bb, entry_bb; - bool prologue_instrumented = false, is_leaf = true; - gimple_stmt_iterator gsi; + bool is_leaf = true; + gimple_stmt_iterator gsi = { 0 }; /* * ENTRY_BLOCK_PTR is a basic block which represents possible entry @@ -111,27 +104,17 @@ static unsigned int stackleak_instrument_execute(void) */ FOR_EACH_BB_FN(bb, cfun) { for (gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) { - gimple stmt; - - stmt = gsi_stmt(gsi); + gimple stmt = gsi_stmt(gsi); /* Leaf function is a function which makes no calls */ if (is_gimple_call(stmt)) is_leaf = false; - if (!is_alloca(stmt)) - continue; - - /* Insert stackleak_track_stack() call after alloca() */ - stackleak_add_track_stack(&gsi, true); - if (bb == entry_bb) - prologue_instrumented = true; + /* Variable Length Arrays are forbidden in the kernel */ + gcc_assert(!is_alloca(stmt)); } } - if (prologue_instrumented) - return 0; - /* * Special cases to skip the instrumentation. * @@ -168,7 +151,7 @@ static unsigned int stackleak_instrument_execute(void) bb = single_succ(ENTRY_BLOCK_PTR_FOR_FN(cfun)); } gsi = gsi_after_labels(bb); - stackleak_add_track_stack(&gsi, false); + stackleak_add_track_stack(&gsi); return 0; } @@ -185,12 +168,20 @@ static bool large_stack_frame(void) /* * Work with the RTL representation of the code. * Remove the unneeded stackleak_track_stack()
[PATCH 0/5] Improvements of the stackleak gcc plugin
In this patch series I collected various improvements of the stackleak gcc plugin. The first patch excludes alloca() from the stackleak instrumentation logic to make it simpler. The second patch is the main improvement. It eliminates an unwanted side-effect of kernel code instrumentation. This patch is a deep reengineering of the idea described on grsecurity blog: https://grsecurity.net/resolving_an_unfortunate_stackleak_interaction The third patch adds 'verbose' plugin parameter for printing additional info about the kernel code instrumentation. Two other patches disable unneeded stackleak instrumentation for some files. I would like to thank Alexander Monakov for his advisory on gcc internals. This patch series was tested for gcc version 4.8, 5, 6, 7, 8, 9, and 10 on x86_64, i386 and arm64. That was done using the project 'kernel-build-containers': https://github.com/a13xp0p0v/kernel-build-containers Alexander Popov (5): gcc-plugins/stackleak: Exclude alloca() from the instrumentation logic gcc-plugins/stackleak: Use asm instrumentation to avoid useless register saving gcc-plugins/stackleak: Add 'verbose' plugin parameter gcc-plugins/stackleak: Don't instrument itself gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO arch/arm64/kernel/vdso/Makefile| 3 +- include/linux/compiler_attributes.h| 13 ++ kernel/Makefile| 1 + kernel/stackleak.c | 16 +- scripts/Makefile.gcc-plugins | 2 + scripts/gcc-plugins/stackleak_plugin.c | 260 - 6 files changed, 232 insertions(+), 63 deletions(-) -- 2.25.2
[PATCH 2/5] gcc-plugins/stackleak: Use asm instrumentation to avoid useless register saving
The kernel code instrumentation in stackleak gcc plugin works in two stages. At first, stack tracking is added to GIMPLE representation of every function (except some special cases). And later, when stack frame size info is available, stack tracking is removed from the RTL representation of the functions with small stack frame. There is an unwanted side-effect for these functions: some of them do useless work with caller-saved registers. As an example of such case, proc_sys_write without instrumentation: 55 push %rbp 41 b8 01 00 00 00 mov$0x1,%r8d 48 89 e5mov%rsp,%rbp e8 11 ff ff ff callq 81284610 5d pop%rbp c3 retq 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 66 2e 0f 1f 84 00 00nopw %cs:0x0(%rax,%rax,1) 00 00 00 proc_sys_write with instrumentation: 55 push %rbp 48 89 e5mov%rsp,%rbp 41 56 push %r14 41 55 push %r13 41 54 push %r12 53 push %rbx 49 89 f4mov%rsi,%r12 48 89 fbmov%rdi,%rbx 49 89 d5mov%rdx,%r13 49 89 cemov%rcx,%r14 4c 89 f1mov%r14,%rcx 4c 89 eamov%r13,%rdx 4c 89 e6mov%r12,%rsi 48 89 dfmov%rbx,%rdi 41 b8 01 00 00 00 mov$0x1,%r8d e8 f2 fe ff ff callq 81298e80 5b pop%rbx 41 5c pop%r12 41 5d pop%r13 41 5e pop%r14 5d pop%rbp c3 retq 66 0f 1f 84 00 00 00nopw 0x0(%rax,%rax,1) 00 00 Let's improve the instrumentation to avoid this: 1. Make stackleak_track_stack() save all register that it works with. Use no_caller_saved_registers attribute for that function. This attribute is available for x86_64 and i386 starting from gcc-7. 2. Insert calling stackleak_track_stack() in asm: asm volatile("call stackleak_track_stack" :: "r" (current_stack_pointer)) Here we use ASM_CALL_CONSTRAINT trick from arch/x86/include/asm/asm.h. The input constraint is taken into account during gcc shrink-wrapping optimization. It is needed to be sure that stackleak_track_stack() call is inserted after the prologue of the containing function, when the stack frame is prepared. This work is a deep reengineering of the idea described on grsecurity blog https://grsecurity.net/resolving_an_unfortunate_stackleak_interaction Signed-off-by: Alexander Popov --- include/linux/compiler_attributes.h| 13 ++ kernel/stackleak.c | 16 +- scripts/Makefile.gcc-plugins | 2 + scripts/gcc-plugins/stackleak_plugin.c | 206 + 4 files changed, 196 insertions(+), 41 deletions(-) diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h index cdf016596659..522d57ae8532 100644 --- a/include/linux/compiler_attributes.h +++ b/include/linux/compiler_attributes.h @@ -41,6 +41,7 @@ # define __GCC4_has_attribute___nonstring__ 0 # define __GCC4_has_attribute___no_sanitize_address__ (__GNUC_MINOR__ >= 8) # define __GCC4_has_attribute___fallthrough__ 0 +# define __GCC4_has_attribute___no_caller_saved_registers__ 0 #endif /* @@ -175,6 +176,18 @@ */ #define __mode(x) __attribute__((__mode__(x))) +/* + * Optional: only supported since gcc >= 7 + * + * gcc: https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#index-no_005fcaller_005fsaved_005fregisters-function-attribute_002c-x86 + * clang: https://clang.llvm.org/docs/AttributeReference.html#no-caller-saved-registers + */ +#if __has_attribute(__no_caller_saved_registers__) +# define __no_caller_saved_registers __attribute__((__no_caller_saved_registers__)) +#else +# define __no_caller_saved_registers +#endif + /* * Optional: not supported by clang * diff --git a/kernel/stackleak.c b/kernel/stackleak.c index b193a59fc05b..a8fc9ae1d03d 100644 --- a/kernel/stackleak.c +++ b/kernel/stackleak.c @@ -104,19 +104,9 @@ asmlinkage void notrace stackleak_erase(void) } NOKPROBE_SYMBOL(stackleak_erase); -void __used notrace stackleak_track_stack(void) +void __used __no_caller_saved_registers notrace stackleak_track_stack(void) { - /* -* N.B. stackleak_erase() fills the kernel stack with the poison value, -* which has the register width. That code assumes that the value -* of 'lowest_stack' is aligned on the register width boundary. -* -* That is true for x86 and x86_64 because of the kernel stack -* alignment on these platforms (for details, see 'c
[PATCH 4/5] gcc-plugins/stackleak: Don't instrument itself
There is no need to try instrumenting functions in kernel/stackleak.c. Otherwise that can cause issues if the cleanup pass of stackleak gcc plugin is disabled. Signed-off-by: Alexander Popov --- kernel/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/Makefile b/kernel/Makefile index 4cb4130ced32..d372134ac9ec 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -118,6 +118,7 @@ obj-$(CONFIG_RSEQ) += rseq.o obj-$(CONFIG_SYSCTL_KUNIT_TEST) += sysctl-test.o +CFLAGS_stackleak.o += $(DISABLE_STACKLEAK_PLUGIN) obj-$(CONFIG_GCC_PLUGIN_STACKLEAK) += stackleak.o KASAN_SANITIZE_stackleak.o := n KCOV_INSTRUMENT_stackleak.o := n -- 2.25.2
[PATCH 5/5] gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO
Don't try instrumenting functions in arch/arm64/kernel/vdso/vgettimeofday.c. Otherwise that can cause issues if the cleanup pass of stackleak gcc plugin is disabled. Signed-off-by: Alexander Popov --- arch/arm64/kernel/vdso/Makefile | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile index 3862cad2410c..9b84cafbd2da 100644 --- a/arch/arm64/kernel/vdso/Makefile +++ b/arch/arm64/kernel/vdso/Makefile @@ -32,7 +32,8 @@ UBSAN_SANITIZE:= n OBJECT_FILES_NON_STANDARD := y KCOV_INSTRUMENT:= n -CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables +CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables \ + $(DISABLE_STACKLEAK_PLUGIN) ifneq ($(c-gettimeofday-y),) CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y) -- 2.25.2
[PATCH 3/5] gcc-plugins/stackleak: Add 'verbose' plugin parameter
Add 'verbose' plugin parameter for stackleak gcc plugin. It can be used for printing additional info about the kernel code instrumentation. For using it add the following to scripts/Makefile.gcc-plugins: gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_STACKLEAK) \ += -fplugin-arg-stackleak_plugin-verbose Signed-off-by: Alexander Popov --- scripts/gcc-plugins/stackleak_plugin.c | 31 +- 1 file changed, 26 insertions(+), 5 deletions(-) diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c index 0769c5b9156d..19358712d4ed 100644 --- a/scripts/gcc-plugins/stackleak_plugin.c +++ b/scripts/gcc-plugins/stackleak_plugin.c @@ -33,6 +33,8 @@ __visible int plugin_is_GPL_compatible; static int track_frame_size = -1; static bool build_for_x86 = false; static const char track_function[] = "stackleak_track_stack"; +static bool disable = false; +static bool verbose = false; /* * Mark these global variables (roots) for gcc garbage collector since @@ -45,6 +47,7 @@ static struct plugin_info stackleak_plugin_info = { .help = "track-min-size=nn\ttrack stack for functions with a stack frame size >= nn bytes\n" "arch=target_arch\tspecify target build arch\n" "disable\t\tdo not activate the plugin\n" + "verbose\t\tprint info about the instrumentation\n" }; static void add_stack_tracking_gcall(gimple_stmt_iterator *gsi) @@ -98,6 +101,10 @@ static tree get_current_stack_pointer_decl(void) return var; } + if (verbose) { + fprintf(stderr, "stackleak: missing current_stack_pointer in %s()\n", + DECL_NAME_POINTER(current_function_decl)); + } return NULL_TREE; } @@ -366,6 +373,7 @@ static bool remove_stack_tracking_gasm(void) */ static unsigned int stackleak_cleanup_execute(void) { + const char *fn = DECL_NAME_POINTER(current_function_decl); bool removed = false; /* @@ -376,11 +384,17 @@ static unsigned int stackleak_cleanup_execute(void) * For more info see gcc commit 7072df0aae0c59ae437e. * Let's leave such functions instrumented. */ - if (cfun->calls_alloca) + if (cfun->calls_alloca) { + if (verbose) + fprintf(stderr, "stackleak: instrument %s() old\n", fn); return 0; + } - if (large_stack_frame()) + if (large_stack_frame()) { + if (verbose) + fprintf(stderr, "stackleak: instrument %s()\n", fn); return 0; + } if (lookup_attribute_spec(get_identifier("no_caller_saved_registers"))) removed = remove_stack_tracking_gasm(); @@ -506,9 +520,6 @@ __visible int plugin_init(struct plugin_name_args *plugin_info, /* Parse the plugin arguments */ for (i = 0; i < argc; i++) { - if (!strcmp(argv[i].key, "disable")) - return 0; - if (!strcmp(argv[i].key, "track-min-size")) { if (!argv[i].value) { error(G_("no value supplied for option '-fplugin-arg-%s-%s'"), @@ -531,6 +542,10 @@ __visible int plugin_init(struct plugin_name_args *plugin_info, if (!strcmp(argv[i].value, "x86")) build_for_x86 = true; + } else if (!strcmp(argv[i].key, "disable")) { + disable = true; + } else if (!strcmp(argv[i].key, "verbose")) { + verbose = true; } else { error(G_("unknown option '-fplugin-arg-%s-%s'"), plugin_name, argv[i].key); @@ -538,6 +553,12 @@ __visible int plugin_init(struct plugin_name_args *plugin_info, } } + if (disable) { + if (verbose) + fprintf(stderr, "stackleak: disabled for this translation unit\n"); + return 0; + } + /* Give the information about the plugin */ register_callback(plugin_name, PLUGIN_INFO, NULL, &stackleak_plugin_info); -- 2.25.2
Re: [PATCH 5/5] gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO
On 04.06.2020 17:14, Jann Horn wrote: > On Thu, Jun 4, 2020 at 3:58 PM Will Deacon wrote: >> On Thu, Jun 04, 2020 at 04:49:57PM +0300, Alexander Popov wrote: >>> Don't try instrumenting functions in arch/arm64/kernel/vdso/vgettimeofday.c. >>> Otherwise that can cause issues if the cleanup pass of stackleak gcc plugin >>> is disabled. >>> >>> Signed-off-by: Alexander Popov >>> --- >>> arch/arm64/kernel/vdso/Makefile | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/arch/arm64/kernel/vdso/Makefile >>> b/arch/arm64/kernel/vdso/Makefile >>> index 3862cad2410c..9b84cafbd2da 100644 >>> --- a/arch/arm64/kernel/vdso/Makefile >>> +++ b/arch/arm64/kernel/vdso/Makefile >>> @@ -32,7 +32,8 @@ UBSAN_SANITIZE := n >>> OBJECT_FILES_NON_STANDARD:= y >>> KCOV_INSTRUMENT := n >>> >>> -CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables >>> +CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables \ >>> + $(DISABLE_STACKLEAK_PLUGIN) >> >> I can pick this one up via arm64, thanks. Are there any other plugins we >> should be wary of? I can't tell exactly. I'm sure Kees has the whole picture. >> It looks like x86 filters out $(GCC_PLUGINS_CFLAGS) >> when building the vDSO. Yes, that's why building x86 vDSO doesn't need $(DISABLE_STACKLEAK_PLUGIN). > Maybe at some point we should replace exclusions based on > GCC_PLUGINS_CFLAGS and KASAN_SANITIZE and UBSAN_SANITIZE and > OBJECT_FILES_NON_STANDARD and so on with something more generic... > something that says "this file will not be built into the normal > kernel, it contains code that runs in realmode / userspace / some > similarly weird context, and none of our instrumentation > infrastructure is available there"... Good idea. I would also add 'notrace' to that list. Best regards, Alexander
Re: [PATCH 5/5] gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO
On 04.06.2020 17:25, Jann Horn wrote: > On Thu, Jun 4, 2020 at 4:21 PM Alexander Popov wrote: >> On 04.06.2020 17:14, Jann Horn wrote: >>> Maybe at some point we should replace exclusions based on >>> GCC_PLUGINS_CFLAGS and KASAN_SANITIZE and UBSAN_SANITIZE and >>> OBJECT_FILES_NON_STANDARD and so on with something more generic... >>> something that says "this file will not be built into the normal >>> kernel, it contains code that runs in realmode / userspace / some >>> similarly weird context, and none of our instrumentation >>> infrastructure is available there"... >> >> Good idea. I would also add 'notrace' to that list. > > Hm? notrace code should definitely still be subject to sanitizer > instrumentation. I mean ftrace is sometimes disabled for functions that are executed in those weird contexts. As well as kcov instrumentation. It would be nice if that generic mechanism could help with choosing which kernel code instrumentation technologies should be disabled in the given context. Best regards, Alexander
Re: [PATCH 1/5] gcc-plugins/stackleak: Exclude alloca() from the instrumentation logic
On 04.06.2020 17:01, Jann Horn wrote: > On Thu, Jun 4, 2020 at 3:51 PM Alexander Popov wrote: >> Some time ago Variable Length Arrays (VLA) were removed from the kernel. >> The kernel is built with '-Wvla'. Let's exclude alloca() from the >> instrumentation logic and make it simpler. The build-time assertion >> against alloca() is added instead. > [...] >> + /* Variable Length Arrays are forbidden in the >> kernel */ >> + gcc_assert(!is_alloca(stmt)); > > There is a patch series from Elena and Kees on the kernel-hardening > list that deliberately uses __builtin_alloca() in the syscall entry > path to randomize the stack pointer per-syscall - see > <https://lore.kernel.org/kernel-hardening/20200406231606.37619-4-keesc...@chromium.org/>. Thanks, Jann. At first glance, leaving alloca() handling in stackleak instrumentation logic would allow to integrate stackleak and this version of random_kstack_offset. Kees, Elena, did you try random_kstack_offset with upstream stackleak? It looks to me that without stackleak erasing random_kstack_offset can be weaker. I mean, if next syscall has a bigger stack randomization gap, the data on thread stack from the previous syscall is not overwritten and can be used. Am I right? Another aspect: CONFIG_STACKLEAK_METRICS can be used for guessing kernel stack offset, which is bad. It should be disabled if random_kstack_offset is on. Best regards, Alexander
Re: [PATCH 0/5] Improvements of the stackleak gcc plugin
On 09.06.2020 22:15, Kees Cook wrote: > On Thu, Jun 04, 2020 at 04:49:52PM +0300, Alexander Popov wrote: >> In this patch series I collected various improvements of the stackleak >> gcc plugin. > > Thanks! > >> Alexander Popov (5): >> gcc-plugins/stackleak: Exclude alloca() from the instrumentation logic >> gcc-plugins/stackleak: Use asm instrumentation to avoid useless >> register saving > > These look like they might need tweaks (noted in their separate > replies). Thanks for the review, Kees. >> gcc-plugins/stackleak: Add 'verbose' plugin parameter >> gcc-plugins/stackleak: Don't instrument itself > > If you wanted to reorder the series and move these first, I could take > these into my tree right away (they're logically separate from the other > fixes). Ok, I will put "don't instrument itself" at the beginning of v2. The patch adding 'verbose' plugin parameter depends on the previous patches, so I will not move it. >> gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO > > This seems good -- though I'm curious about 32-bit ARM and the other > HAVE_GCC_PLUGINS architectures with vDSOs (which appears to be all of > them except um). (going to reply in a separate email) Best regards, Alexander
Re: [PATCH 5/5] gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO
On 10.06.2020 10:30, Will Deacon wrote: > On Tue, Jun 09, 2020 at 12:09:27PM -0700, Kees Cook wrote: >> On Thu, Jun 04, 2020 at 02:58:06PM +0100, Will Deacon wrote: >>> On Thu, Jun 04, 2020 at 04:49:57PM +0300, Alexander Popov wrote: >>>> Don't try instrumenting functions in >>>> arch/arm64/kernel/vdso/vgettimeofday.c. >>>> Otherwise that can cause issues if the cleanup pass of stackleak gcc plugin >>>> is disabled. >>>> >>>> Signed-off-by: Alexander Popov >>>> --- >>>> arch/arm64/kernel/vdso/Makefile | 3 ++- >>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/arch/arm64/kernel/vdso/Makefile >>>> b/arch/arm64/kernel/vdso/Makefile >>>> index 3862cad2410c..9b84cafbd2da 100644 >>>> --- a/arch/arm64/kernel/vdso/Makefile >>>> +++ b/arch/arm64/kernel/vdso/Makefile >>>> @@ -32,7 +32,8 @@ UBSAN_SANITIZE := n >>>> OBJECT_FILES_NON_STANDARD := y >>>> KCOV_INSTRUMENT := n >>>> >>>> -CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables >>>> +CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables \ >>>> + $(DISABLE_STACKLEAK_PLUGIN) >>> >>> I can pick this one up via arm64, thanks. Are there any other plugins we >>> should be wary of? It looks like x86 filters out $(GCC_PLUGINS_CFLAGS) >>> when building the vDSO. >> >> I didn't realize/remember that arm64 retained the kernel build flags for >> vDSO builds. (I'm used to x86 throwing all its flags away for its vDSO.) >> >> How does 32-bit ARM do its vDSO? >> >> My quick run-through on plugins: >> >> arm_ssp_per_task_plugin.c >> 32-bit ARM only (but likely needs disabling for 32-bit ARM vDSO?) > > On arm64, the 32-bit toolchain is picked up via CC_COMPAT -- does that still > get the plugins? > >> cyc_complexity_plugin.c >> compile-time reporting only >> >> latent_entropy_plugin.c >> this shouldn't get triggered for the vDSO (no __latent_entropy >> nor __init attributes in vDSO), but perhaps explicitly disabling >> it would be a sensible thing to do, just for robustness? >> >> randomize_layout_plugin.c >> this shouldn't get triggered (again, lacking attributes), but >> should likely be disabled too. >> >> sancov_plugin.c >> This should be tracking the KCOV directly (see >> scripts/Makefile.kcov), which is already disabled here. >> >> structleak_plugin.c >> This should be fine in the vDSO, but there's not security >> boundary here, so it wouldn't be important to KEEP it enabled. > > Thanks for going through these. In general though, it seems like an > opt-in strategy would make more sense, as it doesn't make an awful lot > of sense to me for the plugins to be used to build the vDSO. > > So I would prefer that this patch filters out $(GCC_PLUGINS_CFLAGS). Ok, I will do that in the v2 of the patch series. Best regards, Alexander
Re: [PATCH 1/5] gcc-plugins/stackleak: Exclude alloca() from the instrumentation logic
On 09.06.2020 21:39, Kees Cook wrote: > On Thu, Jun 04, 2020 at 06:23:38PM +0300, Alexander Popov wrote: >> On 04.06.2020 17:01, Jann Horn wrote: >>> On Thu, Jun 4, 2020 at 3:51 PM Alexander Popov wrote: >>>> Some time ago Variable Length Arrays (VLA) were removed from the kernel. >>>> The kernel is built with '-Wvla'. Let's exclude alloca() from the >>>> instrumentation logic and make it simpler. The build-time assertion >>>> against alloca() is added instead. >>> [...] >>>> + /* Variable Length Arrays are forbidden in the >>>> kernel */ >>>> + gcc_assert(!is_alloca(stmt)); >>> >>> There is a patch series from Elena and Kees on the kernel-hardening >>> list that deliberately uses __builtin_alloca() in the syscall entry >>> path to randomize the stack pointer per-syscall - see >>> <https://lore.kernel.org/kernel-hardening/20200406231606.37619-4-keesc...@chromium.org/>. >> >> Thanks, Jann. >> >> At first glance, leaving alloca() handling in stackleak instrumentation logic >> would allow to integrate stackleak and this version of random_kstack_offset. > > Right, it seems there would be a need for this coverage to remain, > otherwise the depth of stack erasure might be incorrect. > > It doesn't seem like the other patches strictly depend on alloca() > support being removed, though? Ok, I will leave alloca() support, reorganize the patch series and send v2. >> Kees, Elena, did you try random_kstack_offset with upstream stackleak? > > I didn't try that combination yet, no. It seemed there would likely > still be further discussion about the offset series first (though the > thread has been silent -- I'll rebase and resend it after rc2). Ok, please add me to CC list. Best regards, Alexander >> It looks to me that without stackleak erasing random_kstack_offset can be >> weaker. I mean, if next syscall has a bigger stack randomization gap, the >> data >> on thread stack from the previous syscall is not overwritten and can be >> used. Am >> I right? > > That's correct. I think the combination is needed, but I don't think > they need to be strictly tied together. > >> Another aspect: CONFIG_STACKLEAK_METRICS can be used for guessing kernel >> stack >> offset, which is bad. It should be disabled if random_kstack_offset is on. > > Agreed.
Re: [PATCH 2/5] gcc-plugins/stackleak: Use asm instrumentation to avoid useless register saving
On 09.06.2020 21:46, Kees Cook wrote: > On Thu, Jun 04, 2020 at 04:49:54PM +0300, Alexander Popov wrote: >> Let's improve the instrumentation to avoid this: >> >> 1. Make stackleak_track_stack() save all register that it works with. >> Use no_caller_saved_registers attribute for that function. This attribute >> is available for x86_64 and i386 starting from gcc-7. >> >> 2. Insert calling stackleak_track_stack() in asm: >> asm volatile("call stackleak_track_stack" :: "r" (current_stack_pointer)) >> Here we use ASM_CALL_CONSTRAINT trick from arch/x86/include/asm/asm.h. >> The input constraint is taken into account during gcc shrink-wrapping >> optimization. It is needed to be sure that stackleak_track_stack() call is >> inserted after the prologue of the containing function, when the stack >> frame is prepared. > > Very cool; nice work! > >> +static void add_stack_tracking(gimple_stmt_iterator *gsi) >> +{ >> +/* >> + * The 'no_caller_saved_registers' attribute is used for >> + * stackleak_track_stack(). If the compiler supports this attribute for >> + * the target arch, we can add calling stackleak_track_stack() in asm. >> + * That improves performance: we avoid useless operations with the >> + * caller-saved registers in the functions from which we will remove >> + * stackleak_track_stack() call during the stackleak_cleanup pass. >> + */ >> +if (lookup_attribute_spec(get_identifier("no_caller_saved_registers"))) >> +add_stack_tracking_gasm(gsi); >> +else >> +add_stack_tracking_gcall(gsi); >> +} > > The build_for_x86 flag is only ever used as an assert() test against > no_caller_saved_registers, but we're able to test for that separately. > Why does the architecture need to be tested? (i.e. when this flag > becomes supported o other architectures, why must it still be x86-only?) The inline asm statement that is used for instrumentation is arch-specific. Trying to add asm volatile("call stackleak_track_stack") in gcc plugin on aarch64 makes gcc break spectacularly. I pass the target arch name to the plugin and check it explicitly to avoid that. Moreover, I'm going to create a gcc enhancement request for supporting no_caller_saved_registers attribute on aarch64. Best regards, Alexander
Re: [PATCH 3/5] gcc-plugins/stackleak: Add 'verbose' plugin parameter
On 09.06.2020 21:47, Kees Cook wrote: > On Thu, Jun 04, 2020 at 04:49:55PM +0300, Alexander Popov wrote: >> Add 'verbose' plugin parameter for stackleak gcc plugin. >> It can be used for printing additional info about the kernel code >> instrumentation. >> >> For using it add the following to scripts/Makefile.gcc-plugins: >> gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_STACKLEAK) \ >> += -fplugin-arg-stackleak_plugin-verbose >> >> Signed-off-by: Alexander Popov > > Acked-by: Kees Cook I see that I will change this patch after leaving alloca() support. I'm going to add debug printing about functions that call alloca(). I have to omit your 'acked-by' for the changed patch, right? Best regards, Alexander
Re: [PATCH 2/5] gcc-plugins/stackleak: Use asm instrumentation to avoid useless register saving
On 10.06.2020 23:03, Kees Cook wrote: > On Wed, Jun 10, 2020 at 06:47:14PM +0300, Alexander Popov wrote: >> On 09.06.2020 21:46, Kees Cook wrote: >> The inline asm statement that is used for instrumentation is arch-specific. >> Trying to add >> asm volatile("call stackleak_track_stack") >> in gcc plugin on aarch64 makes gcc break spectacularly. > > Ah! Thank you, that eluded my eyes. :) > >> I pass the target arch name to the plugin and check it explicitly to avoid >> that. >> >> Moreover, I'm going to create a gcc enhancement request for supporting >> no_caller_saved_registers attribute on aarch64. > > For arm64 right now it looks like the plugin will just remain > "inefficient" in these cleanup, as before, yes? Yes, for arm64 the instrumentation didn't change in this patch series. I checked the disasm and see the similar issue with useless register saving. I'm going to add asm instrumentation for arm64 when (I hope) the no_caller_saved_registers attribute becomes available for that platform. Best regards, Alexander
Re: [PATCH 5/5] gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO
On 10.06.2020 10:30, Will Deacon wrote: > On Tue, Jun 09, 2020 at 12:09:27PM -0700, Kees Cook wrote: >> arm_ssp_per_task_plugin.c >> 32-bit ARM only (but likely needs disabling for 32-bit ARM vDSO?) I tested: on 32-bit arm vDSO is built with plugin flags. I will filter them out in a separate patch in v2 of the series. > On arm64, the 32-bit toolchain is picked up via CC_COMPAT -- does that still > get the plugins? I tested it with this command: make ARCH=arm64 CROSS_COMPILE_COMPAT=arm-linux-gnueabi- CROSS_COMPILE=aarch64-linux-gnu- V=1 I see that COMPAT_VDSO is built without plugin flags. So it's ok. Best regards, Alexander
[PATCH v2 1/5] gcc-plugins/stackleak: Don't instrument itself
There is no need to try instrumenting functions in kernel/stackleak.c. Otherwise that can cause issues if the cleanup pass of stackleak gcc plugin is disabled. Signed-off-by: Alexander Popov Acked-by: Kees Cook --- kernel/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/Makefile b/kernel/Makefile index f3218bc5ec69..155b5380500a 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -125,6 +125,7 @@ obj-$(CONFIG_WATCH_QUEUE) += watch_queue.o obj-$(CONFIG_SYSCTL_KUNIT_TEST) += sysctl-test.o +CFLAGS_stackleak.o += $(DISABLE_STACKLEAK_PLUGIN) obj-$(CONFIG_GCC_PLUGIN_STACKLEAK) += stackleak.o KASAN_SANITIZE_stackleak.o := n KCSAN_SANITIZE_stackleak.o := n -- 2.25.4
[PATCH v2 0/5] Improvements of the stackleak gcc plugin
This is the v2 of the patch series with various improvements of the stackleak gcc plugin. The first three patches disable unneeded gcc plugin instrumentation for some files. The fourth patch is the main improvement. It eliminates an unwanted side-effect of kernel code instrumentation performed by stackleak gcc plugin. This patch is a deep reengineering of the idea described on grsecurity blog: https://grsecurity.net/resolving_an_unfortunate_stackleak_interaction The final patch adds 'verbose' stackleak parameter for printing additional info about the kernel code instrumentation during kernel building. I would like to thank Alexander Monakov for his advisory on gcc internals. This patch series was tested for gcc version 4.8, 5, 6, 7, 8, 9, and 10 on x86_64, i386 and arm64. That was done using the project 'kernel-build-containers': https://github.com/a13xp0p0v/kernel-build-containers Changes from v1: - rebase onto 5.8.0-rc2; - don't exclude alloca() from the instrumentation logic, because it will be used in kernel stack offset randomization; - reorder patches in the series; - don't use gcc plugins for building vgettimeofday.c in arm and arm64 vDSO; - follow alphabetic order in include/linux/compiler_attributes.h. Link to v1: https://lore.kernel.org/lkml/20200604134957.505389-1-alex.po...@linux.com/ Alexander Popov (5): gcc-plugins/stackleak: Don't instrument itself ARM: vdso: Don't use gcc plugins for building vgettimeofday.c arm64: vdso: Don't use gcc plugins for building vgettimeofday.c gcc-plugins/stackleak: Use asm instrumentation to avoid useless register saving gcc-plugins/stackleak: Add 'verbose' plugin parameter arch/arm/vdso/Makefile | 2 +- arch/arm64/kernel/vdso/Makefile| 2 +- include/linux/compiler_attributes.h| 13 ++ kernel/Makefile| 1 + kernel/stackleak.c | 16 +- scripts/Makefile.gcc-plugins | 2 + scripts/gcc-plugins/stackleak_plugin.c | 248 + 7 files changed, 239 insertions(+), 45 deletions(-) -- 2.25.4
[PATCH v2 3/5] arm64: vdso: Don't use gcc plugins for building vgettimeofday.c
Don't use gcc plugins for building arch/arm64/kernel/vdso/vgettimeofday.c to avoid unneeded instrumentation. Signed-off-by: Alexander Popov --- arch/arm64/kernel/vdso/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile index 556d424c6f52..0f1ad63b3326 100644 --- a/arch/arm64/kernel/vdso/Makefile +++ b/arch/arm64/kernel/vdso/Makefile @@ -29,7 +29,7 @@ ldflags-y := -shared -nostdlib -soname=linux-vdso.so.1 --hash-style=sysv \ ccflags-y := -fno-common -fno-builtin -fno-stack-protector -ffixed-x18 ccflags-y += -DDISABLE_BRANCH_PROFILING -CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) +CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) $(GCC_PLUGINS_CFLAGS) KBUILD_CFLAGS += $(DISABLE_LTO) KASAN_SANITIZE := n UBSAN_SANITIZE := n -- 2.25.4
[PATCH v2 2/5] ARM: vdso: Don't use gcc plugins for building vgettimeofday.c
Don't use gcc plugins for building arch/arm/vdso/vgettimeofday.c to avoid unneeded instrumentation. Signed-off-by: Alexander Popov --- arch/arm/vdso/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile index d3c9f03e7e79..a54f70731d9f 100644 --- a/arch/arm/vdso/Makefile +++ b/arch/arm/vdso/Makefile @@ -29,7 +29,7 @@ CPPFLAGS_vdso.lds += -P -C -U$(ARCH) CFLAGS_REMOVE_vdso.o = -pg # Force -O2 to avoid libgcc dependencies -CFLAGS_REMOVE_vgettimeofday.o = -pg -Os +CFLAGS_REMOVE_vgettimeofday.o = -pg -Os $(GCC_PLUGINS_CFLAGS) ifeq ($(c-gettimeofday-y),) CFLAGS_vgettimeofday.o = -O2 else -- 2.25.4
[PATCH v2 5/5] gcc-plugins/stackleak: Add 'verbose' plugin parameter
Add 'verbose' plugin parameter for stackleak gcc plugin. It can be used for printing additional info about the kernel code instrumentation. For using it add the following to scripts/Makefile.gcc-plugins: gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_STACKLEAK) \ += -fplugin-arg-stackleak_plugin-verbose Signed-off-by: Alexander Popov --- scripts/gcc-plugins/stackleak_plugin.c | 47 +++--- 1 file changed, 42 insertions(+), 5 deletions(-) diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c index a18b0d4af456..48e141e07956 100644 --- a/scripts/gcc-plugins/stackleak_plugin.c +++ b/scripts/gcc-plugins/stackleak_plugin.c @@ -34,6 +34,8 @@ __visible int plugin_is_GPL_compatible; static int track_frame_size = -1; static bool build_for_x86 = false; static const char track_function[] = "stackleak_track_stack"; +static bool disable = false; +static bool verbose = false; /* * Mark these global variables (roots) for gcc garbage collector since @@ -46,6 +48,7 @@ static struct plugin_info stackleak_plugin_info = { .help = "track-min-size=nn\ttrack stack for functions with a stack frame size >= nn bytes\n" "arch=target_arch\tspecify target build arch\n" "disable\t\tdo not activate the plugin\n" + "verbose\t\tprint info about the instrumentation\n" }; static void add_stack_tracking_gcall(gimple_stmt_iterator *gsi, bool after) @@ -102,6 +105,10 @@ static tree get_current_stack_pointer_decl(void) return var; } + if (verbose) { + fprintf(stderr, "stackleak: missing current_stack_pointer in %s()\n", + DECL_NAME_POINTER(current_function_decl)); + } return NULL_TREE; } @@ -195,6 +202,11 @@ static unsigned int stackleak_instrument_execute(void) if (!is_alloca(stmt)) continue; + if (verbose) { + fprintf(stderr, "stackleak: be careful, alloca() in %s()\n", + DECL_NAME_POINTER(current_function_decl)); + } + /* Insert stackleak_track_stack() call after alloca() */ add_stack_tracking(&gsi, true); if (bb == entry_bb) @@ -384,13 +396,31 @@ static bool remove_stack_tracking_gasm(void) */ static unsigned int stackleak_cleanup_execute(void) { + const char *fn = DECL_NAME_POINTER(current_function_decl); bool removed = false; - if (cfun->calls_alloca) + /* +* Leave stack tracking in functions that call alloca(). +* Additional case: +* gcc before version 7 called allocate_dynamic_stack_space() from +* expand_stack_vars() for runtime alignment of constant-sized stack +* variables. That caused cfun->calls_alloca to be set for functions +* that in fact don't use alloca(). +* For more info see gcc commit 7072df0aae0c59ae437e. +* Let's leave such functions instrumented as well. +*/ + if (cfun->calls_alloca) { + if (verbose) + fprintf(stderr, "stackleak: instrument %s(): calls_alloca\n", fn); return 0; + } - if (large_stack_frame()) + /* Leave stack tracking in functions with large stack frame */ + if (large_stack_frame()) { + if (verbose) + fprintf(stderr, "stackleak: instrument %s()\n", fn); return 0; + } if (lookup_attribute_spec(get_identifier("no_caller_saved_registers"))) removed = remove_stack_tracking_gasm(); @@ -516,9 +546,6 @@ __visible int plugin_init(struct plugin_name_args *plugin_info, /* Parse the plugin arguments */ for (i = 0; i < argc; i++) { - if (!strcmp(argv[i].key, "disable")) - return 0; - if (!strcmp(argv[i].key, "track-min-size")) { if (!argv[i].value) { error(G_("no value supplied for option '-fplugin-arg-%s-%s'"), @@ -541,6 +568,10 @@ __visible int plugin_init(struct plugin_name_args *plugin_info, if (!strcmp(argv[i].value, "x86")) build_for_x86 = true; + } else if (!strcmp(argv[i].key, "disable")) { + disable = true; + } else if (!strcmp(argv[i].key, "verbose")) { + verbose = true; } else { error(G_("unknown option '-fplugin-arg-%s-%s'"),
[PATCH v2 4/5] gcc-plugins/stackleak: Use asm instrumentation to avoid useless register saving
The kernel code instrumentation in stackleak gcc plugin works in two stages. At first, stack tracking is added to GIMPLE representation of every function (except some special cases). And later, when stack frame size info is available, stack tracking is removed from the RTL representation of the functions with small stack frame. There is an unwanted side-effect for these functions: some of them do useless work with caller-saved registers. As an example of such case, proc_sys_write without() instrumentation: 55 push %rbp 41 b8 01 00 00 00 mov$0x1,%r8d 48 89 e5mov%rsp,%rbp e8 11 ff ff ff callq 81284610 5d pop%rbp c3 retq 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 66 2e 0f 1f 84 00 00nopw %cs:0x0(%rax,%rax,1) 00 00 00 proc_sys_write() with instrumentation: 55 push %rbp 48 89 e5mov%rsp,%rbp 41 56 push %r14 41 55 push %r13 41 54 push %r12 53 push %rbx 49 89 f4mov%rsi,%r12 48 89 fbmov%rdi,%rbx 49 89 d5mov%rdx,%r13 49 89 cemov%rcx,%r14 4c 89 f1mov%r14,%rcx 4c 89 eamov%r13,%rdx 4c 89 e6mov%r12,%rsi 48 89 dfmov%rbx,%rdi 41 b8 01 00 00 00 mov$0x1,%r8d e8 f2 fe ff ff callq 81298e80 5b pop%rbx 41 5c pop%r12 41 5d pop%r13 41 5e pop%r14 5d pop%rbp c3 retq 66 0f 1f 84 00 00 00nopw 0x0(%rax,%rax,1) 00 00 Let's improve the instrumentation to avoid this: 1. Make stackleak_track_stack() save all register that it works with. Use no_caller_saved_registers attribute for that function. This attribute is available for x86_64 and i386 starting from gcc-7. 2. Insert calling stackleak_track_stack() in asm: asm volatile("call stackleak_track_stack" :: "r" (current_stack_pointer)) Here we use ASM_CALL_CONSTRAINT trick from arch/x86/include/asm/asm.h. The input constraint is taken into account during gcc shrink-wrapping optimization. It is needed to be sure that stackleak_track_stack() call is inserted after the prologue of the containing function, when the stack frame is prepared. This work is a deep reengineering of the idea described on grsecurity blog https://grsecurity.net/resolving_an_unfortunate_stackleak_interaction Signed-off-by: Alexander Popov Acked-by: Miguel Ojeda --- include/linux/compiler_attributes.h| 13 ++ kernel/stackleak.c | 16 +- scripts/Makefile.gcc-plugins | 2 + scripts/gcc-plugins/stackleak_plugin.c | 205 + 4 files changed, 196 insertions(+), 40 deletions(-) diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h index cdf016596659..551ea8cb70b1 100644 --- a/include/linux/compiler_attributes.h +++ b/include/linux/compiler_attributes.h @@ -37,6 +37,7 @@ # define __GCC4_has_attribute___copy__0 # define __GCC4_has_attribute___designated_init__ 0 # define __GCC4_has_attribute___externally_visible__ 1 +# define __GCC4_has_attribute___no_caller_saved_registers__ 0 # define __GCC4_has_attribute___noclone__ 1 # define __GCC4_has_attribute___nonstring__ 0 # define __GCC4_has_attribute___no_sanitize_address__ (__GNUC_MINOR__ >= 8) @@ -175,6 +176,18 @@ */ #define __mode(x) __attribute__((__mode__(x))) +/* + * Optional: only supported since gcc >= 7 + * + * gcc: https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#index-no_005fcaller_005fsaved_005fregisters-function-attribute_002c-x86 + * clang: https://clang.llvm.org/docs/AttributeReference.html#no-caller-saved-registers + */ +#if __has_attribute(__no_caller_saved_registers__) +# define __no_caller_saved_registers __attribute__((__no_caller_saved_registers__)) +#else +# define __no_caller_saved_registers +#endif + /* * Optional: not supported by clang * diff --git a/kernel/stackleak.c b/kernel/stackleak.c index b193a59fc05b..a8fc9ae1d03d 100644 --- a/kernel/stackleak.c +++ b/kernel/stackleak.c @@ -104,19 +104,9 @@ asmlinkage void notrace stackleak_erase(void) } NOKPROBE_SYMBOL(stackleak_erase); -void __used notrace stackleak_track_stack(void) +void __used __no_caller_saved_registers notrace stackleak_track_stack(void) { - /* -* N.B. stackleak_erase() fills the kernel stack with the poison value, -* which has the register width. That code assumes that the value -* of 'lowest_stack'
Re: [PATCH v2 2/5] ARM: vdso: Don't use gcc plugins for building vgettimeofday.c
On 24.06.2020 15:52, Luis Chamberlain wrote: > On Wed, Jun 24, 2020 at 03:33:27PM +0300, Alexander Popov wrote: >> Don't use gcc plugins for building arch/arm/vdso/vgettimeofday.c to >> avoid unneeded instrumentation. >> >> Signed-off-by: Alexander Popov > > But why is skipping it safe? Hello Luis, Kees and Will discussed that in detail in v1 of the series: https://lore.kernel.org/lkml/20200610073046.GA15939@willie-the-truck/ Best regards, Alexander
Re: [PATCH v2 5/5] gcc-plugins/stackleak: Add 'verbose' plugin parameter
On 24.06.2020 15:53, Luis Chamberlain wrote: > On Wed, Jun 24, 2020 at 03:33:30PM +0300, Alexander Popov wrote: >> Add 'verbose' plugin parameter for stackleak gcc plugin. >> It can be used for printing additional info about the kernel code >> instrumentation. >> >> For using it add the following to scripts/Makefile.gcc-plugins: >> gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_STACKLEAK) \ >> += -fplugin-arg-stackleak_plugin-verbose > > Would be nice if we instead could pass an argument to make which lets > us enable this. This feature is useful only for debugging stackleak gcc plugin. The cflag that enables it is similar to -fplugin-arg-structleak_plugin-verbose, which is used for debugging the structleak plugin. This debugging feature clutters the kernel build output, I don't think that many people will use it. So IMO creating a separate argument for make is not really needed. Thanks! Best regards, Alexander
Clarifying attribute-const
Hello, I'd like to ask for community input regarding __attribute__((const)) (and "pure", where applicable). My main goal is to clarify unclear cases and improve existing documentation, if possible. First, a belated follow-up to https://gcc.gnu.org/PR66512 . The bug is asking why attribute-const appears to have a weaker effect in C++, compared to C. The answer in that bug is that GCC assumes that attribute-const function can terminate by throwing an exception. That doesn't actually seem reasonable. Consider that C counterpart to throwing is longjmp; it seems to me that GCC should behave consistently: either assume that attribute-const may both longjmp and throw (I guess nobody wants that), or that it may not longjmp nor throw. Intuitively, if "const" means "free of side effects so that calls can be moved speculatively or duplicated", then non-local control flow transfer via throwing should be disallowed as well. In any case, it would be nice the intended compiler behavior could be explicitely stated in the manual. Second, there is an interesting mismatch between documentation and existing usage. Among most prominent users of the attribute there are two glibc functions: __errno_location(void) and pthread_self(void). Both return a pointer to thread-local storage, so the functions are not "const" globally in a multi-threaded process. A sufficiently advanced compiler can cause the following testcase to abort: #include #include static void *errno_pointer; static void *thr(void *unused) { errno_pointer = &errno; return 0; } int main() { errno_pointer = &errno; pthread_t t; pthread_create(&t, 0, thr, 0); pthread_join(t, 0); if (errno_pointer == &errno) abort(); } (errno_pointer is static, so the compiler can observe that it does not escape the translation unit, and all stores in the TU assign the same "const" value) Does GCC need to be concerned about eventually "miscompiling" such cases? If not, can we document an explicit promise that attribute-const may include pointers-to-TLS? Thanks. Alexander
Re: Clarifying attribute-const
On Fri, 25 Sep 2015, Eric Botcazou wrote: > > First, a belated follow-up to https://gcc.gnu.org/PR66512 . The bug is > > asking why attribute-const appears to have a weaker effect in C++, compared > > to C. The answer in that bug is that GCC assumes that attribute-const > > function can terminate by throwing an exception. > > FWIW there is an equivalent semantics in Ada: the "const" functions can throw > and the language explicitly allows them to be CSEd in this case, etc. Can you expand on the "etc." a bit, i.e., may the compiler ... - move a call to a "const" function above a conditional branch, causing a conditional throw to happen unconditionally? - move a call to a "const" function below a conditional branch, causing an unconditional throw to happen only conditionally? - reorder calls to "const" functions w.r.t. code with side effects, or other throwing functions? (all of the above in the context of Ada) Thanks. Alexander
GCC-Bridge: A Gimple Compiler targeting the JVM
I wanted to share a project we've been working on for sometime within the context of Renjin, a new interpreter for the R language running on the JVM. We basically needed a way to compile C and Fortran code to JVM classes, and for the last year or two we've been working on tool chain that's composed of a GCC plugin which dumps Gimple trees out to a JSON file, and a Java program which reads the JSON and compiles it to Java classfiles. I've written a bit more about it today here: http://www.renjin.org/blog/2016-01-31-introducing-gcc-bridge.html And you can find the whole project here: https://github.com/bedatadriven/renjin/tree/master/tools/gcc-bridge The compiler is part of the Renjin project, but can also be used in a standalone way to compile arbitrary C/Fortran code to Java classfiles, though the focus has been on pure scientific code, so we haven't bothered with some rather obvious things like fopen(). Anyway, using the GCC plugin interface has been terrific, and the gimple trees have been great to work with! -Alex
Re: GCC-Bridge: A Gimple Compiler targeting the JVM
Thanks Mikhail and Manuel for the reactions! Mikhail, thanks for the tip on xmalloc, will take a look if that can help clean up the plugin code. Manuel, 0) Yes, we hope to make it faster! 1) Initially coding within GCC would have been too intimidating, but I think i've started to get a feel for the system and porting the compiler to some sort of C++ module might be a good long term to strategy to avoid drift as GCC internal evolve. I'm not sure exactly where it would fit in however- I don't think it could be described with the machine description language. There is alot of complexity involved in handling things like addressable local variables, which have to be allocated as unit length arrays so that we can pass around a reference to them. Would it be possible to write a backend that generates code from Gimple and not RTL? 2) I'm hoping we can get within 10-20% slowdown. However, the ultimate goal is to be include "native" code in Renjin's auto parallelization feature, which operates more on the level of aSQL query planner than at the instruction level, which would lead to a net speedup. That's the working hypothesis at least. I'm planning on doing a round of benchmarking in the next 1-2 months along with a comparison of the assembly generated by GCC on one hand for a given source, and the assembly ultimately generated by the JVM's JIT compiler. 3) Good to know! Is this right the mailing list to ask questions about some of the internal structure? The GCC Internals manual is very useful, and the source provides a lot of answers, but sometimes I run into questions, for example, on how exactly UNORDERED_EXPR is defined, or how to access the byte offset for COMPONENT_REF expression when field names vary. Best, Alex On Mon, Feb 1, 2016 at 10:25 PM, Manuel López-Ibáñez wrote: > On 01/02/16 12:34, Bertram, Alexander wrote: >> >> I wanted to share a project we've been working on for sometime within >> the context of Renjin, >> a new interpreter for the R language running on the JVM. >> >> We basically needed a way to compile C and Fortran code to JVM >> classes, and for the last year or two we've been working on tool chain >> that's composed of a GCC plugin which dumps Gimple trees out to a JSON >> file, and a Java program which reads the JSON and compiles it to Java >> classfiles. > > > This sounds interesting! (R is so slow) > > 1) Wouldn't it be better to directly compile to Java bytecode? I think GJC > was able to do that in the past, but I'm not sure how bit-rotted that code > is by now. https://gcc.gnu.org/ml/gcc/2000-02/msg00161.html It may be a more > robust solution in the long run. > > 2) The reason that R uses so much C and Fortran code is because when > compiled, that code is much faster than R. How much is lost by compiling and > running in JVM? > > 3) Gimple is not completely target independent, but I think we would like to > move towards that direction, so reporting bugs about that with specific > testcases may be helpful. Of course, a JVM target would make this concern > irrelevant. > >> Anyway, using the GCC plugin interface has been terrific, and the >> gimple trees have been great to work with! > > > Please, note that the plugin API is mainly driven by its users. If you want > to see improvements, don't hesitate to propose patches. > > Cheers, > > Manuel. > -- Alex Bertram Technical Director bedatadriven Web: http://bedatadriven.com Email: a...@bedatadriven.com Tel. Nederlands: +31(0)647205388 Skype: akbertram
Re: nonnull, -Wnonnull, and do/while
On Tue, 16 Feb 2016, Marek Polacek wrote: > Well, it's just that "s" has the nonnull attribute so the compiler thinks it > should never be null in which case comparing it to null should be redundant. > Doesn't seem like a false positive to me, but maybe someone else feels > otherwise. Please look at the posted code again: static void f(const char *s) { do { printf("%s\n",s); s = NULL; } while (s != NULL); } Since 's' is assigned to, the constraint from 'printf' is no longer useful for warning at the point of comparison. It clearly looks like a false positive to me. Alexander
Re: who owns stack args?
On Wed, 24 Feb 2016, DJ Delorie wrote: > The real question is: are stack arguments call-clobbered or > call-preserved? Does the answer depend on the "pure" attribute? Stack area holding stack arguments should belong to the callee for tail-calls to work (the callee will trash that area when laying out arguments for the tail call; thanks to Rich Felker for pointing that out to me). Thus it cannot depend on attribute-pure. Alexander
GCC Bugzilla whines broken?
Hello, Can anyone quickly confirm whether "whining" feature in the GCC Bugzilla is supposed to be functioning at the moment? The lastest thread I could find indicates that it is actually supposed to be working: https://gcc.gnu.org/ml/gcc/2010-09/msg00569.html . However I've tried to setup a whine for myself a week ago, and it never produced the emails. Actually, I want a different feature than whining: notifications for bugs matching a certain predicate, e.g. for a specific target; ideally being automatically Cc'ed to such bugs, with an option to un-cc myself if needed. I can somewhat emulate that with whine searches restricted to "last N days". Is anybody doing something like that? Thanks. Alexander
Re: out of bounds access in insn-automata.c
Hi, On Thu, 24 Mar 2016, Bernd Schmidt wrote: > On 03/24/2016 11:17 AM, Aldy Hernandez wrote: > > On 03/23/2016 10:25 AM, Bernd Schmidt wrote: > > > It looks like this block of code is written by a helper function that is > > > really intended for other purposes than for maximal_insn_latency. Might > > > be worth changing to > > > int insn_code = dfa_insn_code (as_a (insn)); > > > gcc_assert (insn_code <= DFA__ADVANCE_CYCLE); > > > > dfa_insn_code_* and friends can return > DFA__ADVANCE_CYCLE so I can't > > put that assert on the helper function. > > So don't use the helper function? Just emit the block above directly. Let me chime in :) The function under scrutiny, maximal_insn_latency, was added as part of selective scheduling merge; at the same time, output_default_latencies was factored out of output_internal_insn_latency_func, and the pair of new functions output_internal_maximal_insn_latency_func/output_maximal_insn_latency_func tried to mirror existing pair of output_internal_insn_latency_func/output_insn_latency_func. In particular, output_insn_latency_func also invokes output_internal_insn_code_evaluation (twice, for each argument). This means that generated 'insn_latency' can also call 'internal_insn_latency' with DFA__ADVANCE_CYCLE in arguments. However, 'internal_insn_latency' then has a specially emitted 'if' statement that checks if either of the arguments is ' >= DFA__ADVANCE_CYCLE', and returns 0 in that case. So ultimately pre-existing code was checking ' > DFA__ADVANCE_CYCLE' first and ' >= DFA_ADVANCE_CYCLE' second (for no good reason as far as I can see), and when the new '_maximal_' functions were introduced, the second check was not duplicated in the new copy. So as long we are not looking for hacking it up further, I'd like to clean up both functions at the same time. If calling the 'internal_' variants with DFA__ADVANCE_CYCLE is rare, extending 'default_insn_latencies' by 1 zero element corresponding to DFA__ADVANCE_CYCLE is a simple suitable fix. If either DFA__ADVANCE_CYCLE is not guaranteed to be rare, or extending the table in that style is undesired, I suggest creating a variant of 'output_internal_insn_code_evaluation' that performs a '>=' rather than '>' test in the first place, and use it in both output_insn_latency_func and output_maximal_insn_latency_func. If acknowledged, I volunteer to regstrap on x86_64 and submit that in stage1. Thoughts? Thanks. Alexander
[PATCH] clean up insn-automata.c (was: Re: out of bounds access in insn-automata.c)
On Wed, 30 Mar 2016, Bernd Schmidt wrote: > On 03/25/2016 04:43 AM, Aldy Hernandez wrote: > > If Bernd is fine with this, I'm happy to retract my patch and any > > possible followups. I'm just interested in having no path causing a > > possible out of bounds access. If your patch will do that, I'm cool. > > I'll need to see that patch first to comment :-) Here's the proposed patch. I've found that there's only one user of the current fancy logic in output_internal_insn_code_evaluation: handling of NULL_RTX and const0_rtx is only useful for 'state_transition' (generated by output_trans_func), so it's possible to inline the extended handling there, and handle only plain non-null rtx_insns in output_internal_insn_code_evaluation. This change allows to remove extra checks and casting in output_internal_insn_latency_func, as done by the patch. As a nice bonus, it constrains prototypes of three automata functions to accept insn_rtx rather than just rtx. Bootstrapped and regtested on x86_64, OK? Thanks. Alexander * genattr.c (main): Change 'rtx' to 'rtx_insn *' in prototypes of 'insn_latency', 'maximal_insn_latency', 'min_insn_conflict_delay'. * genautomata.c (output_internal_insn_code_evaluation): Simplify. Move handling of non-insn arguments inline into the sole user: (output_trans_func): ...here. (output_min_insn_conflict_delay_func): Change 'rtx' to 'rtx_insn *' in emitted function prototype. (output_internal_insn_latency_func): Ditto. Simplify. (output_internal_maximal_insn_latency_func): Ditto. Delete always-unused argument. (output_insn_latency_func): Ditto. (output_maximal_insn_latency_func): Ditto. diff --git a/gcc/genattr.c b/gcc/genattr.c index 656a9a7..77380e7 100644 --- a/gcc/genattr.c +++ b/gcc/genattr.c @@ -240,11 +240,11 @@ main (int argc, const char **argv) printf ("/* Insn latency time on data consumed by the 2nd insn.\n"); printf (" Use the function if bypass_p returns nonzero for\n"); printf (" the 1st insn. */\n"); - printf ("extern int insn_latency (rtx, rtx);\n\n"); + printf ("extern int insn_latency (rtx_insn *, rtx_insn *);\n\n"); printf ("/* Maximal insn latency time possible of all bypasses for this insn.\n"); printf (" Use the function if bypass_p returns nonzero for\n"); printf (" the 1st insn. */\n"); - printf ("extern int maximal_insn_latency (rtx);\n\n"); + printf ("extern int maximal_insn_latency (rtx_insn *);\n\n"); printf ("\n#if AUTOMATON_ALTS\n"); printf ("/* The following function returns number of alternative\n"); printf (" reservations of given insn. It may be used for better\n"); @@ -290,8 +290,8 @@ main (int argc, const char **argv) printf ("state_transition should return negative value for\n"); printf ("the insn and the state). Data dependencies between\n"); printf ("the insns are ignored by the function. */\n"); - printf - ("extern int min_insn_conflict_delay (state_t, rtx, rtx);\n"); + printf ("extern int " + "min_insn_conflict_delay (state_t, rtx_insn *, rtx_insn *);\n"); printf ("/* The following function outputs reservations for given\n"); printf (" insn as they are described in the corresponding\n"); printf (" define_insn_reservation. */\n"); diff --git a/gcc/genautomata.c b/gcc/genautomata.c index dcde604..92c8b5c 100644 --- a/gcc/genautomata.c +++ b/gcc/genautomata.c @@ -8113,14 +8113,10 @@ output_internal_trans_func (void) /* Output code - if (insn != 0) -{ - insn_code = dfa_insn_code (insn); - if (insn_code > DFA__ADVANCE_CYCLE) -return code; -} - else -insn_code = DFA__ADVANCE_CYCLE; + gcc_checking_assert (insn != 0); + insn_code = dfa_insn_code (insn); + if (insn_code >= DFA__ADVANCE_CYCLE) +return code; where insn denotes INSN_NAME, insn_code denotes INSN_CODE_NAME, and code denotes CODE. */ @@ -8129,21 +8125,12 @@ output_internal_insn_code_evaluation (const char *insn_name, const char *insn_code_name, int code) { - fprintf (output_file, "\n if (%s == 0)\n", insn_name); - fprintf (output_file, "%s = %s;\n\n", - insn_code_name, ADVANCE_CYCLE_VALUE_NAME); - if (collapse_flag) -{ - fprintf (output_file, "\n else if (%s == const0_rtx)\n", insn_name); - fprintf (output_file, "%s = %s;\n\
_Bool and trap representations
Hi! If a variable of type _Bool contains something different from 0 and 1 its use amounts to UB in gcc and clang. There is a couple of examples in [1] ([2] is also interesting). [1] https://github.com/TrustInSoft/tis-interpreter/issues/39 [2] https://github.com/TrustInSoft/tis-interpreter/issues/100 But my question is about the following example: -- #include int main() { _Bool b; *(char *)&b = 123; printf("%d\n", *(char *)&b); } -- Results: -- $ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out 123 $ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out 1 -- gcc version: gcc (GCC) 7.0.0 20160604 (experimental) It seems that padding in _Bool is treated as permanently unspecified. Is this behavior intentional? What's the theory behind it? One possible explanations is C11, 6.2.6.2p1, which reads: "The values of any padding bits are unspecified." But it's somewhat a stretch to conclude from it that the values of padding bits cannot be specified even with explicit assignment. Another possible approach is to refer to Committee Response for Question 1 in DR 260 which reads: "Values may have any bit-pattern that validly represents them and the implementation is free to move between alternate representations (for example, it may normalize pointers, floating-point representations etc.). [...] the actual bit-pattern may change without direct action of the program." Is similar behavior expected from other types of padding (padding in long double, padding bytes/bits in structs/unions) in the future? Maybe even normalization of pointers (randomly aligning misaligned pointers)? -- Alexander Cherepanov
Re: _Bool and trap representations
On 2016-06-08 10:29, Richard Biener wrote: On Wed, Jun 8, 2016 at 8:36 AM, Alexander Cherepanov wrote: [skip] But my question is about the following example: -- #include int main() { _Bool b; *(char *)&b = 123; printf("%d\n", *(char *)&b); } -- Results: -- $ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out 123 $ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out 1 -- [skip] Another explanation is that this is a bug. It manifests itself at the time we re-write 'b' into SSA form, disregarding the fact that we access it via a type that while matching in size does not match in precision. Oh, that's much more boring outcome:-) Can you open a bugreport? Sure, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71452 . -- Alexander Cherepanov
Re: _Bool and trap representations
On 2016-06-08 17:37, Martin Sebor wrote: On 06/08/2016 12:36 AM, Alexander Cherepanov wrote: Hi! If a variable of type _Bool contains something different from 0 and 1 its use amounts to UB in gcc and clang. There is a couple of examples in [1] ([2] is also interesting). [1] https://github.com/TrustInSoft/tis-interpreter/issues/39 [2] https://github.com/TrustInSoft/tis-interpreter/issues/100 But my question is about the following example: -- #include int main() { _Bool b; *(char *)&b = 123; printf("%d\n", *(char *)&b); } -- Results: -- $ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out 123 $ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out 1 -- gcc version: gcc (GCC) 7.0.0 20160604 (experimental) Similar example with long double: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71522 It seems that padding in _Bool is treated as permanently unspecified. Is this behavior intentional? What's the theory behind it? One possible explanations is C11, 6.2.6.2p1, which reads: "The values of any padding bits are unspecified." But it's somewhat a stretch to conclude from it that the values of padding bits cannot be specified even with explicit assignment. Another possible approach is to refer to Committee Response for Question 1 in DR 260 which reads: "Values may have any bit-pattern that validly represents them and the implementation is free to move between alternate representations (for example, it may normalize pointers, floating-point representations etc.). [...] the actual bit-pattern may change without direct action of the program." There has been quite a bit of discussion among the committee on this subject lately (the last part is the subject of DR #451, though it's discussed in the context of uninitialized objects with indeterminate values). Are there notes from these discussions or something? I would hesitate to call it consensus but I think it would be fair to say that the opinion of the vocal majority is that implementations aren't intended to spontaneously change valid (i.e., determinate) representations of objects in the absence of an access to the value of the object. Thanks for the info. IMHO this part of DR 260 has even more serious consequences than the part about pointer provenance. It effectively prohibits manual byte-by-byte (or any non-atomic) copying of objects for types like long double. If an implementation decides to normalize a value in a variable during copying it will see an inconsistent representation, e.g. a trap representation. It's a sure way to get total garbage. I don't know if allowing implementations to normalize values is useful but the current language in DR 260 allows too much. As for valid/determinate representation this is another place where distinction between a value and a representation is worth stressing. Uninitialized variables are a clear case -- both its value and representation are indeterminate. But what if we set some part of representation of a variable -- it doesn't yet have a determinate value but we want the part that we have set to be preserved. Another interesting example is a pointer after free() -- its representation is kinda determinate but its value is indeterminate. -- Alexander Cherepanov
Re: _Bool and trap representations
On 2016-06-13 22:51, Joseph Myers wrote: On Mon, 13 Jun 2016, Alexander Cherepanov wrote: Thanks for the info. IMHO this part of DR 260 has even more serious consequences than the part about pointer provenance. It effectively prohibits manual byte-by-byte (or any non-atomic) copying of objects for types like long double. If an implementation decides to normalize a value in a variable during copying it will see an inconsistent representation, e.g. a trap representation. It's a sure way to get total garbage. I don't know if allowing No, that's not the case; even if representations can change during byte-by-byte copying, such copying of long double values is *still* safe. All long double values for x86 long double have exactly one valid representation in the value bits, and if the padding bits change during copying it doesn't matter; it's only representations that are already trap representations (unnormals, pseudo-* etc.) that might be interpreted inconsistently. The problem is that parts of representations of two different ordinary values can form a trap representation. Suppose x = 1.0 and y = 0.0, i.e. they have the following representations (from high bytes to low bytes): padding signint & frac & exp |---| |---| |-| x: 00 00 00 00 00 00 3f ff 80 00 00 00 00 00 00 00 y: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Suppose that we copy from x to y byte-by-byte starting from high bytes. And suppose the normalization kicks in after copying 8 bytes. We have already copied the sign and the exponent but haven't yet overwritten the 'Integer' bit of Significand so we have the following representation: z: 00 00 00 00 00 00 3f ff 00 00 00 00 00 00 00 00 This is an unnormal and current gcc normalization converts it into 0.0 throwing the exponent away. Copying the remaining 8 bytes leads to a pseudo-denormal: w: 00 00 00 00 00 00 00 00 80 00 00 00 00 00 00 00 But this is already a minor detail. The code to see how gcc normalizes 'z': -- #include #include int main() { long double d0, d; memcpy(&d0, "\x00\x00\x00\x00\x00\x00\x00\x00\xff\x3f\x00\x00\x00\x00\x00\x00", sizeof d0); d = d0; printf("d = %Lf\n", d); for (unsigned char *p = (unsigned char *)&d + sizeof d; p > (unsigned char *)&d;) printf("%02x ", *--p); printf("\n"); } -- Results: -- $ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out d = 0.00 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 -- gcc version: gcc (GCC) 7.0.0 20160613 (experimental) -- Alexander Cherepanov
Re: _Bool and trap representations
On 2016-06-14 00:13, Joseph Myers wrote: On Tue, 14 Jun 2016, Alexander Cherepanov wrote: The problem is that parts of representations of two different ordinary values can form a trap representation. Oh, you're talking about normalizing the destination rather than the source of the copy? Yes. I don't see this problem with a current gcc so the problem is hypothetical AFAICT. -- Alexander Cherepanov
Re: _Bool and trap representations
On 2016-06-15 17:15, Martin Sebor wrote: There has been quite a bit of discussion among the committee on this subject lately (the last part is the subject of DR #451, though it's discussed in the context of uninitialized objects with indeterminate values). Are there notes from these discussions or something? Notes from discussions during committee meetings are in the meeting minutes that are posted along with other committee documents on the public site. Those that relate to aspects of defect reports are usually captured in the Committee Discussion and Committee Response to the DR. Other than that, committee discussions that take place on the committee mailing list (such as the recent ones on this topic) are archived for reference of committee members (unlike C++, the C archives are not intended to be open to the public). So it seems the discussion you referred to is not public, that's unfortunate. And to clarify what you wrote about stability of valid representations, is padding expected to be stable when it's not specifically set? I.e. is the following optimization supposed to be conforming or not? Source code: -- #include int main(int argc, char **argv) { (void)argv; struct { char c; int i; } s = {0, 0}; printf("%d\n", argc ? ((unsigned char *)&s)[1] : 5); printf("%d\n", argc ? ((unsigned char *)&s)[1] : 7); } -- Results: -- $ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out 5 7 -- gcc version: gcc (GCC) 7.0.0 20160616 (experimental) Of course, clang does essentially the same but the testcase is a bit more involved (I can post it if somebody is interested). OTOH clang is more predictable in this area because rules for dealing with undefined values in llvm are more-or-less documented -- http://llvm.org/docs/LangRef.html#undefined-values . I don't see gcc treating padding in long double as indeterminate in the same way as in structs but clang seems to treat them equally. -- Alexander Cherepanov
Re: [RFD] Extremely large alignment of read-only strings
On Wed, 27 Jul 2016, Thorsten Glaser wrote: First of all, I think option -malign-data=abi (new in GCC 5) addresses your need: it can be used to reduce the default (excessive) alignment to just the psABI-dictated value (you can play with this at https://gcc.godbolt.org even if you can't install GCC-5 locally). Note that like with other ABI-affecting options you need to consider implications for linking with code you're not building yourself: if the other code expects bigger alignment, you'll have a bug. One comment to your email below. > After some (well, lots) more debugging, I eventually > discovered -fdump-translation-unit (which, in the version > I was using, also worked for C, not just C++), which showed > me that the alignment was 256 even (only later reduced to > 32 as that’s the maximum alignment for i386). Most likely the quoted figures from GCC dumps are in bits, not bytes. HTH Alexander
Re: [libgomp] No references to env.c -> no libgomp construction
Hello, On Tue, 29 Nov 2016, Sebastian Huber wrote: > * env.c: Split out ICV definitions into... > * icv.c: ...here (new file) and... > * icv-device.c: ...here. New file. > > the env.c contains now only local symbols (at least for target *-rtems*-*): > [...] > > Thus the libgomp constructor is not linked in into executables. Thanks for the report. This issue affects only static libgomp.a (and not on NVPTX where env.c is deliberately empty). I think the minimal solution here is to #include from icv.c instead of compiling it separately (using <> inclusion rather than "" so in case of NVPTX we pick up the empty config/nvptx/env.c from toplevel icv.c). A slightly more involved but perhaps a preferable approach is to remove config/nvptx/env.c, introduce LIBGOMP_OFFLOADED_ONLY macro, and use it to guard inclusion of env.c from icv.c (which then can use the #include "env.c" form). Thanks. Alexander
Aliasing of arrays
Hi! Pascal Cuoq communicated to me the following example: int ar1(int (*p)[3], int (*q)[3]) { (*p)[0] = 1; (*q)[1] = 2; return (*p)[0]; } gcc of versions 4.9.2 and 7.0.0 20161129 optimize it with -O2 on the premise that elements with different indices don't alias: : 0: c7 47 0c 01 00 00 00movl $0x1,0xc(%rdi) 7: b8 01 00 00 00 mov$0x1,%eax c: c7 46 10 02 00 00 00movl $0x2,0x10(%rsi) 13: c3 retq That's fine. But then I expect that gcc will also assume that arrays of different known lengths don't alias, i.e. that gcc will optimize this example: int ar2(int (*p)[8], int (*q)[7]) { (*p)[3] = 1; (*q)[3] = 2; return (*p)[3]; } But this is not optimized: 0020 : 20: c7 47 0c 01 00 00 00movl $0x1,0xc(%rdi) 27: c7 46 0c 02 00 00 00movl $0x2,0xc(%rsi) 2e: 8b 47 0cmov0xc(%rdi),%eax Is this behavior fully intentional, is the first example optimized too aggressively, is an optimization missed in the second example, or is the situation more complex? -- Alexander Cherepanov
Re: How to avoid constant propagation into functions?
[adding gcc@ for the compiler-testsuite-related discussion, please drop either gcc@ or gcc-help@ from Cc: as appropriate in replies] On Wed, 7 Dec 2016, Segher Boessenkool wrote: > > For example, this might have impact on writing test for GCC: > > > > When I am writing a test with noinline + noclone then my > > expectation is that no such propagation happens, because > > otherwise a test might turn trivial... > > The usual ways to prevent that are to add some volatile, or an > asm("" : "+g"(some_var)); etc. No, that doesn't sound right. As far as I can tell from looking that the GCC testsuite, the prevailing way is actually the noinline+noclone combo, not the per-argument asms or volatiles. This behavior is new in gcc-7 due to new IPA-VRP functionality. So -fno-ipa-vrp gets the old behavior. I think from the testsuite perspective the situation got a bit worse due to this, as now in existing testcases stuff can get propagated where the testcase used noinline+noclone to suppress propagation. This means that some testcases may get weaker and no longer test what they were supposed to. And writing new testcases gets less convenient too. However, this actually demonstrates how the noinline+noclone was not future-proof, and in a way backfired now. Should there be, ideally, a single 'noipa' attribute encompassing noinline, noclone, -fno-ipa-vrp, -fno-ipa-ra and all future transforms using inter-procedural knowledge? Alexander
Re: How to avoid constant propagation into functions?
On Wed, 7 Dec 2016, Richard Biener wrote: > >Agreed, that's what I've been using in the past for glibc test cases. > > > >If that doesn't work, we'll need something else. Separate compilation > >of test cases just to thwart compiler optimizations is a significant > >burden, and will stop working once we have LTO anyway. > > > >What about making the function definitions weak? Would that be more > >reliable? > > Adding attribute((used)) should do the trick. It introduces unknown callers > and thus without cloning disables IPA. Hm, depending on the case I think this may be not enough: it thwarts IPA on the callee side, but still allows the compiler to optimize the caller: for example, deduce that callee is pure/const (in which case optimizations in the caller may cause it to be called fewer times than intended or never at all), apply IPA-RA, or (perhaps in future) deduce that the callee always returns non-NULL and optimize the caller accordingly. I think attribute-weak works to suppress IPA on the caller side, but that is not a good solution because it also has an effect on linking semantics, may be not available on non-ELF platforms, etc. Alexander
Re: How to avoid constant propagation into functions?
On Fri, 9 Dec 2016, Richard Biener wrote: > Right, 'used' thwarts IPA on the callee side only. noclone and noinline are > attributes affecting the caller side but we indeed miss attributes for the > properties you mention above. I suppose adding a catch-all attribute for > caller side effects (like we have 'used' for the callee side) would be a good > idea. For general uses, i.e. for testcases that ought to be portable across different compilers, I believe making a call through a volatile pointer already places a sufficient compiler barrier to prevent both caller- and callee-side analysis. That is, where you have int foo(int); foo(arg); you could transform it to int foo(int); int (* volatile vpfoo)(int) = foo; vpfoo(arg); While this also has an effect of forcing the call to be indirect, I think usually that should be acceptable. But for uses in the gcc testsuite, I believe an attribute is still needed. Alexander
Re: [RFC] noipa attribute (was Re: How to avoid constant propagation into functions?)
On Thu, 15 Dec 2016, Jakub Jelinek wrote: > So here is a proof of concept of an attribute that disables inlining, > cloning, ICF, IPA VRP, IPA bit CCP, IPA RA, pure/const/throw discovery. > Does it look reasonable? Anything still missing? I'd like to suggest some additions to the extend.texi entry: > --- gcc/doc/extend.texi.jj2016-12-15 11:26:07.0 +0100 > +++ gcc/doc/extend.texi 2016-12-15 12:19:32.738996533 +0100 > @@ -2955,6 +2955,15 @@ asm (""); > (@pxref{Extended Asm}) in the called function, to serve as a special > side-effect. > > +@item noipa > +@cindex @code{noipa} function attribute > +Disable interprocedural optimizations between the function with this > +attribute and its callers, as if the body of the function is not available > +when optimizing callers and the callers are unavailable when optimizing > +the body. This attribute implies @code{noinline}, @code{noclone}, > +@code{no_icf} and @code{used} attributes and in the future might > +imply further newly added attributes. 1. I believe the last sentence should call out that the effect of this attribute is not reducible to just the existing attributes, because suppression of IPA-RA and pure/const discovery is not expressible that way, and that is actually intended. Can this be added to clarify the intent: However, this attribute is not equivalent to a combination of other attributes, because its purpose is to suppress existing and future optimizations employing interprocedural analysis, including those that do not have an attribute suitable for disabling them individually. (and perhaps remove ' ... and in the future might imply ...' from the quoted snippet, because the clarification makes it partially redundant) 2. Can we gently suggest to readers of documentation that this was invented for use in the GCC testsuite, and encourage them to seek proper alternatives, e.g.: This attribute is exposed for the purpose of testing the compiler. In general it should be preferable to properly constrain code generation using the language facilities: for example, using separate compilation or calling through a volatile pointer achieves a similar effect in a portable way [ except in case of a sufficiently advanced compiler indistinguishable from an adversary ;) ] Thanks. Alexander
LTO remapping/deduction of machine modes of types/decls
Hello, Richard, Jakub, community, May I join/restart the old discussion about machine mode remapping at LTO stream-in time. To recap, when offloading to NVPTX was introduced, there was a problem due to differences in the set of supported modes (e.g. there was no 'XFmode' on NVPTX that would correspond to 'long double' tree type node in GIMPLE LTO streams produced by x86 host compiler). The current solution in GCC is to additionally stream a 'mode table' and use it to remap numeric mode identifiers during LTO stream-in in all trees that have modes. This is the solution initially outlined by Jakub in the message https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00226.html . In response to that, Richard said, > I think (also communicated that on IRC) we should instead try not streaming > machine-modes at all but generating them at stream-in time via layout_type > or layout_decl. and later in the thread also: > I'm just looking for a way to make this less of a hack (and the LTO IL > less target dependent). Not for GCC 5 for which something like your > patch is probably ok, but for the future. Now that we're in the future, I've asked Vlad Ivanishin (Cc'ed) to try and implement Richard's approach. The motivation is enhancing LTO for offloaded code, in particular to expose library code for inlining. In that scenario, the current scheme has a problem that WPA can arbitrarily mix LTO sections coming from libraries (where the modes don't need remapping) and LTO sections produced by the host compiler. Thus, mode_table would need to be only selectively applied during stream-in, based on the origin of the section. And, we'd need to ensure that WPA duplicates mode tables across all ltrans units. In light of that, I felt that trying Richard's approach would be proper. Actually, I don't know why gimple/tree representation carries machine modes in the first place; it seems to be redundant information deducible from type information. Vlad's current patch is adding mode deduction for types and decls, matches the deduced mode against the streamed-in mode, and ICEs in case of mismatch. To be clear, he's checking this for native LTO via lto-bootstrap, but nevertheless it's a nice way of giving confidence that mode inference works as intended. This seems to be fine for C, but in C++ we are seeing some hard-to-explain cases where the deduced BLKmode for 7-byte-sized/4-byte-aligned base-class decl is mismatching against deduced DImode. The DImode would be obviously correct for 8-byte-sized decl of the corresponding type, but the base class decl does not have 1 byte of padding in the tail. What's worse, the issue is just for the mode of the decl: the mode of the type is BLKmode, as we'd expect. Unfortunately, just adjusting the C++ frontend to place BLKmode on the decl too doesn't lead to immediate success, because decl modes have implications for debug info generation, and the compiler starts ICE'ing there instead. So we're hitting under-documented places in the compiler here, and personally I don't have the confidence to judge how they're intended to work. Basically for now my questions are: 1. Is there an intended invariant that decl modes should match type modes? It appears that if there was, the above situation with C++ base objects would be a violation. 2. Do you think we should continue digging in this direction? I'm not sure how much it'd help a discussion, but for completeness Vlad's current patchset is provided as attachments. Patch 1/3 adds mode inference for types (only), patch 2 just reverts Jakub's additions of mode_table handling, and finally patch 3 adds mode inference for decls, adds checking against streamed-in modes, and shows where the attempted adjustments in the C++ frontend and debug info generation were. There are a few coding style violations; sorry; I hope they are not too distracting. Thanks. AlexanderFrom 58ad9d4d75cbc057c003c701ff3f0e6b8fa35e39 Mon Sep 17 00:00:00 2001 From: Vladislav Ivanishin Date: Tue, 13 Dec 2016 14:58:26 +0300 Subject: [PATCH 1/3] Infer modes from types after LTO streaming-in * gcc/lto/lto.c: New function lto_infer_mode () which calls ... * gcc/stor-layout.c: ... the new function set_mode_for_type (). * gcc/stor-layout.h: Declare set_mode_for_type (). --- gcc/lto/lto.c | 20 + gcc/stor-layout.c | 127 ++ gcc/stor-layout.h | 2 + 3 files changed, 149 insertions(+) diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c index 6718fbbe..cec54e3 100644 --- a/gcc/lto/lto.c +++ b/gcc/lto/lto.c @@ -1656,6 +1656,25 @@ unify_scc (struct data_in *data_in, unsigned from, return unified_p; } +static void +lto_infer_mode (tree type) +{ + if (!TYPE_P (type)) +return; + + if (!COMPLETE_TYPE_P (type) && TYPE_MODE (type) == VOIDmode) +return; + + /* C++ FE has complex logic for laying out classes. We don't have + the information here to reproduce the decision process (nor do we + w
Re: LTO remapping/deduction of machine modes of types/decls
On Mon, 2 Jan 2017, Jakub Jelinek wrote: > In my view mode is essential part of the type system. It (sadly, but still) > participates in many ABI decisions, but more importantly especially for > floating point types it is the main source of information of what the type > actually is, as just size and precision are nowhere near enough. > The precision/size isn't able to carry information like whether the type is > decimal or binary floating, what padding it has and where, what NaN etc. > conventions it uses. So trying to throw away modes and reconstruct them > looks conceptually wrong to me. I wonder if it's possible to have a small tag in tree types to distinguish between binary/decimal/fixed-point types. For prototyping, I was thinking about just looking at the type name, because non-ieee-binary types have an easily recognizable prefix. For padding and NaN variability, can you point me on which targets the modes affect that? The "Machine Modes" chapter in the documentation doesn't give a hint (IFmode/KFmode are not documented there either). Alternatively, is reconstructing all modes necessary in the first place? On tree level GCC has explicit trees for all fundamental types like float_type_node. Is it possible to remap just those trees? Modes of composite types should be deducible, and modes of decls may be deducible from their types (not sure; why do decls have modes separately from their types, anyway?) > One can also just use > float __attribute__((mode (XFmode))) or float __attribute__((mode (TFmode))) > or float __attribute__((mode (KFmode))) or IFmode etc., how do you want to > differentiate between those? And I don't see how this can help with the > long double stuff for NVPTX offloading. If user uses 80-bit long double > (or mode(XFmode) floats/doubles) in his source, then as PTX only has SFmode > and DFmode (perhaps also HFmode?), the only way to get it working is through > emulation (whether soft-fp, or writing some emulation using double, > whatever). Pretending long double on the host is DFmode on the PTX side > just won't work, they have different representation. (yes, PTX spec has half floats, but GCC doesn't implement those on PTX today, and thus doesn't have HFmode now) For attribute-mode, I wasn't aware of KFmode/IFmode stuff; wherever the modes affect semantics without leaving any other trace in the type, leaving out the mode loses information. So either one keeps the modes, or adds sufficient tagging in the type tree. For long double, I think offloading to PTX should have the following semantics: size/alignment of long double matches those on host. Otherwise, storage layout of composite types won't match, and that's really bad. But otherwise long double is the same as double on PTX (so for offloading from x86-64 it has 64 bits of padding). This means that long double data is not transferable between accelerator and host, but otherwise gives the most sane semantics I can imagine. I think this implies that XFmode/TFmode don't need to exist on NVPTX to back long_double_type_node. Thanks. Alexander
Re: LTO remapping/deduction of machine modes of types/decls
On Mon, 2 Jan 2017, Jakub Jelinek wrote: > If the host has long double the same as double, sure, PTX can use its native > DFmode even for long double. But otherwise, the storage must be > transferable between accelerator and host. Hm, sorry, the 'must' is not obvious to me: is it known that the OpenMP ARB would find only this implementation behavior acceptable? Apart from floating-point types, are there other situations where modes carry information not deducible from the rest of the tree node? Thanks. Alexander
Re: LTO remapping/deduction of machine modes of types/decls
On Mon, 2 Jan 2017, Jakub Jelinek wrote: > On Mon, Jan 02, 2017 at 09:49:55PM +0300, Alexander Monakov wrote: > > On Mon, 2 Jan 2017, Jakub Jelinek wrote: > > > If the host has long double the same as double, sure, PTX can use its > > > native > > > DFmode even for long double. But otherwise, the storage must be > > > transferable between accelerator and host. > > > > Hm, sorry, the 'must' is not obvious to me: is it known that the OpenMP ARB > > would find only this implementation behavior acceptable? > > long double is not non-mappable type in the spec, so it is supposed to work. > The implementation may choose not to offload whenever it sees long > double/__float128/_Float128/_Float128x etc. But this is not something the implementation can properly enforce; consider long double v; char buf[sizeof v]; #pragma omp target map(from:buf) sscanf ("1.0", "%Lf", buf); memcpy(&v, buf, sizeof v); The offloading compiler wouldn't see a 'long double' anywhere, it gets brought in at linking stage. So the implementation would have to tag individual translation units and see only in the end of linking if the offloaded image touches a long double datum anywhere. And as the example shows, it would prevent using printf-like functions. Alexander
Re: TARGET_MACRO_FUSION_PAIR for something besides compare-and-branch ?
On Wed, 28 May 2014, Kyrill Tkachov wrote: > Hi all, > > The documentation for TARGET_MACRO_FUSION_PAIR says that it can be used to > tell the scheduler that two insns should not be scheduled apart. It doesn't > specify what kinds of insns those can be. > > Yet from what I can see in sched-deps.c it can only be used on compares and > conditional branches, as implemented in i386. Please note that it's not only restricted to conditional branches, but also to keeping the instructions together if they were consecutive in the first place (i.e. it does not try to move a compare insn closer to the branch). Doing it that way allowed to solve the issue at hand at that time without a separate scan of the whole RTL instruction stream. > Say I want to specify two other types of instruction that I want to force > together, would it be worth generalising the TARGET_MACRO_FUSION_PAIR usage > to achieve that? I'd say yes, but that would be the least of the problems; the more important question is how to trigger the hook (you probably want to integrate it into the existing scheduler dependencies evaluation loop rather than adding a new loop just to discover macro-fusable pairs). You'll also have to invent something new if you want to move non-consecutive fusable insns together if they are apart. HTH. Alexander
clang 3.4.1 (and 3.3) compilation failed with gcc 4.7.4
and/components/clang/build/i86/tools/clang/tools/clang-check/Release+Asserts/ClangCheck.o ld: fatal: symbol referencing errors. No output written to /export/home/alp/srcs/tests/oi-userland/components/clang/build/i86/Release+Asserts/bin/clang-check Undefined first referenced symbol in file Can someone help me to debug this issue? -- Best regards, Alexander Pyhalov, system administrator of Computer Center of Southern Federal University
Question about sysroot and fixincludes
Hi, I have a question about sysroot and fixincludes. On Android there are different API levels (like android-9, android-10 etc) that match different versions of OS. Gcc from NDK is configured using sysroot for android-9 and the convenient way for compiling for, say, android-19 was by providing the sysroot to android-19 as a command line option (--sysroot). However, the header from the sysroot with which gcc was configured could be "fixincluded", and, when I provide a different sysroot as a command line option, "fixincluded" header could replace the actual header from the specified sysroot - that is the root-cause of certain problems. Should search in 'include-fixed' be disabled when sysroot command line option is specified? --Alexander
locale support on illumos (Solaris)
Hello. Is there any way to make std::locale work on illumos? I see the following bug report https://gcc.gnu.org/bugzilla/show_bug.cgi?id=15992 and it is still actual for gcc 4.8.3. (Don't know why it's marked "INVALID"). -- System Administrator of Southern Federal University Computer Center
Trying to fix #61880, what characters are valid in assembler/symbol names
Hi everyone, I am trying to fix #61880 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61880 but will need some guidance as I am a complete newbie. The problem is concerns gccgo and the way the binaries it generates link with the rest of the objects. I have given a really tiny test case in the bug, but the relevant part is this: We try to use a function written in Go from C. Thus we have something written in Go that is compiled, something written in C that is compiled and them gluing them together with the layer provided with the cgo tool. Symbol names from the different parts need to be the same in order for the linker to match them. The names do not match. I am not sure which side of the naming the problem stands - i.e. if A!=B, should we change A, B or both to match. The symbol/assembler name (I do not know what is the right word) that is generated in the .h header file is: extern int Cgoexp_Dummy (int p0) __asm__("cgo_problem_example_com_demo.Cgoexp_Dummy"); -^ The symbol that exists in the object file is: cgo_problem_example.com_demo.Cgoexp_Dummy ---^ Thus we can either fix __asm__("cgo_problem_example_com_demo.Cgoexp_Dummy") to be __asm__("cgo_problem_example_com.demo.Cgoexp_Dummy") or fix the object name to be cgo_problem_example_com_demo.Cgoexp_Dummy This probably depends on which characters are valid in symbol names. If having two or more '.' is OK then I could patch the cgo tool that generates the __asm__ part If dots have special meaning or there is a custom mangling that has to be considered - I would need to find the part of gcc that issues the object names (which I still cannot identify, any guides welcome) Kind regards: al_shopov
Re: ASAN test failures make compare_tests useless
Not sure I understand what the problem is. Responded inline. On Mon, Aug 18, 2014 at 9:43 AM, Yury Gribov wrote: > On 08/18/2014 09:42 AM, Yury Gribov wrote: >> >> On 08/16/2014 04:37 AM, Manuel López-Ibáñez wrote: >>> >>> On the compile farm, ASAN tests seem to fail a lot like: >>> >>> FAIL: c-c++-common/asan/global-overflow-1.c -O0 output pattern >>> test, is ==31166==ERROR: AddressSanitizer failed to allocate >>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: >>> 12) >>> ==31166==ReserveShadowMemoryRange failed while trying to map >>> 0xdfff0001000 bytes. Perhaps you're using ulimit -v >>> , should match READ of size 1 at 0x[0-9a-f]+ thread T0.*( Sounds like the tests do not even start up properly. No mmap failures should be reported. >>> The problem is that those addresses and sizes are very random, The output pattern that must be printed has these addresses masked out (note "0x[0-9a-f]+" in your report). No other lines with varying addresses should be printed. >>> so when >>> I compare the test results of a pristine trunk with a patched one, I >>> get: >>> >>> New tests that FAIL: >>> >>> unix//-m64: c-c++-common/asan/global-overflow-1.c -O0 output >>> pattern test, is ==12875==ERROR: AddressSanitizer failed to allocate >>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: >>> 12) >>> unix//-m64: c-c++-common/asan/global-overflow-1.c -O0 output >>> pattern test, is ==18428==ERROR: AddressSanitizer failed to allocate >>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: >>> 12) >>> [... hundreds of ASAN tests that failed...] >>> >>> Old tests that failed, that have disappeared: (Eeek!) >>> >>> unix//-m64: c-c++-common/asan/global-overflow-1.c -O0 output >>> pattern test, is ==30142==ERROR: AddressSanitizer failed to allocate >>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: >>> 12) >>> unix//-m64: c-c++-common/asan/global-overflow-1.c -O0 output >>> pattern test, is ==31166==ERROR: AddressSanitizer failed to allocate >>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: >>> 12) >>> [... the same hundreds of tests that already failed before...] >>> >>> The above makes very difficult to identify failures caused by my patch. >>> >>> Can we remove the "==" part of the error? This way compare_tests >>> will ignore the failures. Am I understanding correctly that "==" in the test stdout has some special meaning for compare_tests (whatever they are, I'm not really familiar with GCC testing infrastructure)? If so, this is quite a questionable choice (e.g. Valgrind also prefixes the report lines with "==12345=="), and I don't see the point in removing PIDs/addresses to please this script. >>> Alternatively, I could patch compare_tests to sed out that part before >>> comparing. Would that be acceptable? >>> >>> Cheers, >>> >>> Manuel. >>> >> >> Added Sanitizer folks. Frankly it'd be cool if dumping PIDs and >> addresses could be turned off. >> > > Ok, this time actually added them. > -- Alexander Potapenko Software Engineer Google Moscow
Re: ASAN test failures make compare_tests useless
On Mon, Aug 18, 2014 at 9:42 AM, Yury Gribov wrote: > On 08/16/2014 04:37 AM, Manuel López-Ibáñez wrote: >> >> On the compile farm, ASAN tests seem to fail a lot like: >> >> FAIL: c-c++-common/asan/global-overflow-1.c -O0 output pattern >> test, is ==31166==ERROR: AddressSanitizer failed to allocate >> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: >> 12) >> ==31166==ReserveShadowMemoryRange failed while trying to map >> 0xdfff0001000 bytes. Perhaps you're using ulimit -v >> , should match READ of size 1 at 0x[0-9a-f]+ thread T0.*( >> >> The problem is that those addresses and sizes are very random, so when >> I compare the test results of a pristine trunk with a patched one, I >> get: >> >> New tests that FAIL: >> >> unix//-m64: c-c++-common/asan/global-overflow-1.c -O0 output >> pattern test, is ==12875==ERROR: AddressSanitizer failed to allocate >> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: >> 12) >> unix//-m64: c-c++-common/asan/global-overflow-1.c -O0 output >> pattern test, is ==18428==ERROR: AddressSanitizer failed to allocate >> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: >> 12) >> [... hundreds of ASAN tests that failed...] >> >> Old tests that failed, that have disappeared: (Eeek!) >> >> unix//-m64: c-c++-common/asan/global-overflow-1.c -O0 output >> pattern test, is ==30142==ERROR: AddressSanitizer failed to allocate >> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: >> 12) >> unix//-m64: c-c++-common/asan/global-overflow-1.c -O0 output >> pattern test, is ==31166==ERROR: AddressSanitizer failed to allocate >> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: >> 12) >> [... the same hundreds of tests that already failed before...] >> >> The above makes very difficult to identify failures caused by my patch. >> >> Can we remove the "==" part of the error? This way compare_tests >> will ignore the failures. >> >> Alternatively, I could patch compare_tests to sed out that part before >> comparing. Would that be acceptable? >> >> Cheers, >> >> Manuel. >> > > Added Sanitizer folks. Frankly it'd be cool if dumping PIDs and addresses > could be turned off. > Could you please name a reason for that? Doing so complicates the debugging of multi-process applications but doesn't bring any obvious advantages. -- Alexander Potapenko Software Engineer Google Moscow
Re: Trying to fix #61880, what characters are valid in assembler/symbol names
>> __asm__("cgo_problem_example_com_demo.Cgoexp_Dummy"); >> cgo_problem_example.com_demo.Cgoexp_Dummy > Normally the first name looks more right. I suppose that the reason is that the dot character ('.') while allowed is a bit more special than the rest, right? OK I will keep this in mind. > The go tool will be passing a -fgo-pkgpath option to gccgo and a -gccgopkgpath > option to cgo. You can use "go build -x" to see the exact commands being run. Yes, I knew that but many thanks for reminding - it helped me locate code in gcc right away. > First make sure that those options are the same. I checked and they are the same. > They need to do the same thing. They do not do the same thing. The difference stems from these two chunks of code: https://code.google.com/p/go/source/browse/src/cmd/cgo/out.go?name=go1.3.1#985 The definition of the 'clean' function in (p *Package) gccgoSymbolPrefix() string and https://github.com/gcc-mirror/gcc/blob/gcc-4_9_0-release/gcc/go/gofrontend/gogo.cc#L250 The definition of std::string Gogo::pkgpath_for_symbol(const std::string& pkgpath) The second is allowing two more characters to appear as themselves in the symbol names in addition to the a-z, A-Z, 0-9 ranges. These are '.' and '$'. '$' should be a rare case for the name of a directory/file, but '.' is frequent. Additionally this https://gcc.gnu.org/onlinedocs/gcc-4.9.1/gcc/Dollar-Signs.html cautions against '$' - so if they are trouble it will be gcc again to fix, not cgo. I am sending a quick one liner patch via the code review: https://codereview.appspot.com/125470043 Sorry for spamming with several change sets, it was only several attempts that I found the sentence "A file may only belong to a single active CL at a time." in http://golang.org/doc/contribute.html#tmp_11 I wanted to remove '$' and '_' from special treatment in an follow up patch. '$' should be replaced by '_' as cgo does and '_' could be removed to make the two functions: cgo's clear and Gogo's pkgpath_for_symbol sintactically more similar for ease of maintenance. But you can review and point it out there. Kind regards: al_shopov
Re: Branch taken rate of Linux kernel compiled with GCC 4.9
On Tue, 13 Jan 2015, Pengfei Yuan wrote: > I use perf with rbf88:k,rff88:k events (Haswell specific) to profile > the taken rate of conditional branches in the kernel. Here are the > results: [...] > > The results are very strange because all the taken rates are greater > than 50%. Why not reverse the basic block reordering heuristics to > make them under 50%? Is there anything wrong with GCC? Your measurement includes the conditional branches at the end of loop bodies. When loops iterate, those branches are taken, and it doesn't make sense to reverse them. HTH Alexander
colors in `-fsanitize=bounds' and UBSAN_OPTIONS
I compile and run my program inside Emacs (M-x compile) with gcc 5.0.0 20150112 using `-fsanitize=bounds'. Although this Emacs buffer is technically a tty, but it is `dumb' and does not handle colors properly. The gcc info concerning `-fsanitize=address' hints that there is a parameter ASAN_OPTIONS and the referenced wiki of address-sanitizer says that this environment variable can be used to disable the colors. Unfortunately and very confusingly, this hint does not help with `-fsanitize=bounds', since it uses UBSAN_OPTIONS instead. I guess the gcc info should be updated to mention UBSAN_OPTIONS. It would be nice to port the terminal color capability detection code from gcc to sanitizer, since gcc itself colorizes the error messages properly: by default the colors are on in `xterm', and off in `dumb'. -- Regards, ASK
Contributing to GCC and question about PR64744
Dear GCC team, I would like to contribute to the project. I have a background in embedded systems programming, but few experience in compiler development. I'd like to try with fixing PR64744. Would some one help me to understand what should be correct compilers behaviour with an example below: __attribute__((naked)) void foo() { char [2] = {0}; }; Now gcc (trunk for aarch64 target) goes to ICE while compiling this code: cc1 -O0 test.c But I think that it should report something like: "local frame unavailable (naked function?)" Thanks in advance -- Alexander
Re: Failure to dlopen libgomp due to static TLS data
There's a pending patch for glibc that addresses this issue among others: https://sourceware.org/ml/libc-alpha/2014-11/msg00469.html ([BZ#17090/17620/17621]: fix DTV race, assert, and DTV_SURPLUS Static TLS limit) Alexander
Re: Android native build of GCC
> Given that info...and in spite of my aforementioned limited knowledge I > went back to take another look at the source and found this in > libfakechroot.c > > /bld/fakechrt/fakechroot-2.16 $ grep -C 4 dlsym src/libfakechroot.c > /* Lazily load function */ > LOCAL fakechroot_wrapperfn_t fakechroot_loadfunc (struct fakechroot_wrapper * > w) > { > char *msg; > if (!(w->nextfunc = dlsym(RTLD_NEXT, w->name))) {; > msg = dlerror(); > fprintf(stderr, "%s: %s: %s\n", PACKAGE, w->name, msg != NULL ? msg : > "unresolved symbol"); > exit(EXIT_FAILURE); > } > > I'm fairly certain I remember reading something about Android and lazy > function loadinghow it doesn't handle it well or does so differently > from standard Linux builds. At any rate, I believe the above code is > responsible for those annoying 'fakechroot: undefined reference to dlopen' > errors, so I'll see if I can fix that. In Android's Bionic libc, the implementation of dlopen() resides in the dynamic loader, and not present in libdl.so. So to obtain the pointer to dlopen the code like above can use dlsym(RTLD_DEFAULT, "dlopen"), but not RTLD_NEXT (the loader precedes the fakeroot library in the lookup chain). The preceding discussion seems to have libc and libdl switched. Normally the implementation of dlopen is found in libdl.so, but not in libc.so. Hope that helps, Alexander
Re: Android native build of GCC
On Sun, 15 Feb 2015, Cyd Haselton wrote: > On Sun, Feb 15, 2015 at 12:41 PM, Cyd Haselton wrote: > > *snip* > > > >> So to obtain the pointer to > >> dlopen the code like above can use dlsym(RTLD_DEFAULT, "dlopen"), but not > >> RTLD_NEXT (the loader precedes the fakeroot library in the lookup chain). > >> > *snip* > > Just a quick update: RTLD_DEFAULT is definitely not the solution here > as it results in a segfault when libfakechroot loads. Perhaps a > different RTLD_FLAG was meant? I think you need to use RTLD_DEFAULT only when resolving "dlopen", and keep RTLD_NEXT for all other symbol names (unless later on you run into other symbols with similar behavior). Something like ... = dlsym(strcmp(name, "dlopen") ? RTLD_NEXT : RTLD_DEFAULT, name); Alexander
Re: Android native build of GCC
On Sun, 15 Feb 2015, Cyd Haselton wrote: > On Sun, Feb 15, 2015 at 11:53 AM, Alexander Monakov > wrote: > >> Given that info...and in spite of my aforementioned limited knowledge I > >> went back to take another look at the source and found this in > >> libfakechroot.c > >> > >> /bld/fakechrt/fakechroot-2.16 $ grep -C 4 dlsym src/libfakechroot.c > >> /* Lazily load function */ > >> LOCAL fakechroot_wrapperfn_t fakechroot_loadfunc (struct > >> fakechroot_wrapper * w) > >> { > >> char *msg; > >> if (!(w->nextfunc = dlsym(RTLD_NEXT, w->name))) {; > >> msg = dlerror(); > >> fprintf(stderr, "%s: %s: %s\n", PACKAGE, w->name, msg != NULL ? > >> msg : "unresolved symbol"); > >> exit(EXIT_FAILURE); > >> } > >> > >> I'm fairly certain I remember reading something about Android and lazy > >> function loadinghow it doesn't handle it well or does so differently > >> from standard Linux builds. At any rate, I believe the above code is > >> responsible for those annoying 'fakechroot: undefined reference to dlopen' > >> errors, so I'll see if I can fix that. > > > > In Android's Bionic libc, the implementation of dlopen() resides in the > > dynamic loader, and not present in libdl.so. > > Yet in Android's NDK documentation, they state that in order to use > dlopen() functionality in native code you must link against libdl and > include dlfcn.h. Why would this be the case if the dlopen() > implementation is not in libdl? > (documentation link: http://www.kandroid.org/ndk/docs/STABLE-APIS.html) That's the standard way of using dlopen, i.e. same as you would do it on Linux with glibc for example. So the link merely says that you can get dlopen the same way as usual. The difference is that Android's libdl only contains stub symbols for dlopen&co, and the real symbols can be looked up in the dynamic loader. That RTLD_NEXT does not work for obtaining a pointer for dlopen, as it works on glibc, is quite unfortunate, and probably a bug in Bionic. Alexander
Broken test gcc.target/i386/sibcall-2.c
Hello, Last year's x86 sibcall improvements added a currently xfailed test: /* { dg-do compile { target ia32 } } */ /* { dg-options "-O2" } */ extern int doo1 (int); extern int doo2 (int); extern void bar (char *); int foo (int a) { char s[256]; bar (s); return (a < 0 ? doo1 : doo2) (a); } /* { dg-final { scan-assembler-not "call\[ \t\]*.%eax" { xfail *-*-* } } } */ It was xfailed by https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00016.html Can you tell me what the test is supposed to test? A tail call is impossible here, because 'bar' might save the address of 's' in a global variable, and therefore 's' must be live when 'doo1' or 'doo2' are invoked. Should we remove or unbreak this test? Thanks. Alexander
Re: Broken test gcc.target/i386/sibcall-2.c
Ah. I realize it's most likely for testing sibcall_[value]_pop_memory peepholes, right? In which case the testcase might look like this: /* { dg-do compile } */ /* { dg-options "-O2" } */ void foo (int a, void (**doo1) (void), void (**doo2) (void)) { char s[16] = {0}; do s[a] = 1; while (a &= a-1); (*(s[8] ? doo1 : doo2)) (); } /* { dg-final { scan-assembler-not "call" } } */ However on the above testcase memory-indirect jump is currently generated only for 64-bit x86. With -mx32 it's impossible, but with -m32 the peephole doesn't match. Is that expected? Can you also tell me why ..._pop call and sibcall instructions are predicated on !TARGET_64BIT? Thanks. Alexander
Re: May 2015 Toolchain Update
Hello, A couple of comments below. On Mon, 18 May 2015, Nick Clifton wrote: > val |= ~0 << loaded;// Generates warning > val |= (unsigned) ~0 << loaded; // Does not warn To reduce verbosity, '~0u' can be used here instead of a cast. > * GCC supports a new option: -fno-plt > > When compiling position independent code this tells the compiler > not to use PLT for external function calls. Instead the address > is loaded from the GOT and then branched to directly. This > leads to more efficient code by eliminating PLT stubs and > exposing GOT load to optimizations. > > Not all architectures support this option, and some other > optimization features, such as lazy binding, may disable it. The last paragraph looks confusing to be on both points. '-fno-plt' is implemented as a transformation during TreeSSA-to-RTL expansion, so it works in a machine-independent manner; it's a no-op only if the target has no way to turn on '-fPIC'. Is that what you meant? Second, lazy binding is not an optimization feature of GCC (it's implemented as part of (e.g. glibc's) dynamic linker), so it's not quite right to say that -fno-plt would be disabled by it. Text I've added to the documentation says: Lazy binding requires PLT: with -fno-plt all external symbols are resolved at load time. Thus, for code compiled with -fno-plt the dynamic linker would not be able to perform lazy binding (even if it was otherwise possible, e.g. -z now -z relro weren't in effect, and profitable, i.e. the library was not already prelinked). Alexander
Re: RFC: Creating a more efficient sincos interface
On Thu, 13 Sep 2018, Wilco Dijkstra wrote: > What do people think? Ideally I'd like to support this in a generic way so > all targets can > benefit, but it's also feasible to enable it on a per-target basis. Also > since not all libraries > will support the new interface, there would have to be a flag or configure > option to switch > the new interface off if not supported (maybe automatically based on the > math.h header). GCC already has __builtin_cexpi for this, so I think you can introduce cexpi implementation in libc, and then adjust expand_builtin_cexpi appropriately. I wonder if it would be possible to add a fallback cexpi implementation to libgcc.a that would be picked by the linker if there's no such symbol in libm? Alexander
libgcov as shared library and other issues
Hello, Here's the promised "libgcov summary"; sorry about the delay. So libgcov has a bit unusual design where: - on the one hand, the library is static-only, has PIC code and may be linked into shared libraries, - almost all gcov symbols have "hidden" visibility so they don't participate in dynamic linking - on the other hand, the __gcov_master symbol deliberately has default visibility, presumably with the intention that a running program has exactly one instance of this symbol, although the exact motivation is unclear to me. This latter point does not reliably work as intended though: there are scenarios where a dynamically linked program will have multiple __gcov_masters anyway: - via repeated dlopen(RTLD_LOCAL) with main executable not linked against libgcov or not exporting libgcov symbols (as in PR 83879) - when shared libraries have version scripts that hide their __gcov_master - when -Bsymbolic is in effect Additionally, indirect call profiling symbols are not hidden either, and that leads to extra complications. Since there are multiple symbols, during dynamic linking they may be partially interposed. PR 84107 demonstrates how this leads to libgcov segfaulting in a fairly simple and legitimate program. Bottom line: static linking code with default-visibility symbols into shared libraries is problematic. So one strategy is to ensure all gcov symbols have hidden visibility. That would isolate gcov instances in each shared library loaded in the program, and each library would have the responsibility to write out its counters when unloaded. Also, __gcov_dump would dump only the counters specific to the current library. I may be missing something here so it might be nice to unearth why exactly __gcov_master is intended to be global. Another strategy is to introduce libgcov.so and have it host either all libgcov symbols or just those that by design are required to exist once in the program. When talking to Richi at the Cauldron I got the impression he'd question if shared libgcov is worth the cost, e.g. would it make any easier for users to mix two libraries, one linked against older libgcov, and another with a newer (something that doesn't work at all now, but would be nice to support if I understood Richard correctly). Alexander
Re: Backporting gcc_qsort
On Mon, 1 Oct 2018, Jeff Law wrote: > To add a bit more context for Cory. > > Generally backports are limited to fixing regressions and serious code > generation bugs. While we do make some exceptions, those are good > general guidelines. > > I don't think the qsort changes warrant an exception. Personally I think in this case there isn't a strong reason to backport, the patch is fairly isolated, so individuals or companies that need it should have no problem backporting it on their own. Previously, Franz Sirl reported back in June they've used the patch to achieve matching output on their Linux-hosted vs Cygwin-hosted cross-compilers based on GCC 8: https://gcc.gnu.org/ml/gcc-patches/2018-06/msg00751.html Alexander
Re: movmem pattern and missed alignment
On Mon, 8 Oct 2018, Michael Matz wrote: > > Ok, but why is that not a bug? The whole point of passing alignment to > > the movmem pattern is to let it generate code that takes advantage of > > the alignment. So we get a missed optimization. > > Only if you somewhere visibly add accesses to *i and *j. Without them you > only have the "accesses" via memcpy, and as Richi says, those don't imply > any alignment requirements. The i and j pointers might validly be char* > pointers in disguise and hence be in fact only 1-aligned. I.e. there's > nothing in your small example program from which GCC can infer that those > two global pointers are in fact 2-aligned. Well, it's not that simple. C11 6.3.2.3 p7 makes it undefined to form an 'int *' value that is not suitably aligned: A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. So in addition to what you said, we should probably say that GCC decides not to exploit this UB in order to allow code to round-trip pointer values via arbitrary pointer types? To put Michael's explanation in different words: This is not obviously a bug, because static pointer type does not imply the dynamic pointed-to type. The caller of 'f1' could look like void call_f1(void) { short ibuf[20] = {0}, jbuf[20] = {0}; i = (void *) ibuf; j = (void *) jbuf; f1(); } and it's valid to memcpy from jbuf to ibuf, memcpy does not "see" the static pointer type, and works as if by dereferencing 'char *' pointers. (although as mentioned above it's more subtly invalid when assigning to i and j). If 'f1' dereferences 'i', GCC may deduce that dynamic type of '*i' is 'int' and therefore 'i' must be suitably aligned. But in absence of dereferences GCC does not make assumptions about dynamic type and alignment. Alexander
Re: movmem pattern and missed alignment
On Tue, 9 Oct 2018, Richard Biener wrote: > >This had worked as Paul expects until GCC 4.4 IIRC and this was perfectly OK > >for every language on strict-alignment platforms. This was changed only > >because of SSE on x86. > > And because we ended up ignoring all pointer casts. It's not quite obvious what SSE has to do with this - any hint please? (according to my quick check this changed between gcc-4.5 and gcc-4.6) Alexander
Re: movmem pattern and missed alignment
On Tue, 9 Oct 2018, Richard Biener wrote: > > then we cannot set the alignment of i_1 at/after k = *i_1 because doing so > would > affect the alignment test which we'd then optimize away. We'd need to > introduce > a SSA copy to get a new SSA name but that would be optimized away quickly. We preserve __builtin_assume_aligned up to pass-fold-all-builtins, so would it work to emit it just before the memcpy i_2 = __builtin_assume_aligned(i_1, 4); __builtin_memcpy(j, i_2, 32); in theory? Alexander
Re: avoidance of lea after 5 operations?
On Thu, 11 Oct 2018, Jason A. Donenfeld wrote: > > I realize this is probably a fairly trivial matter, but I am very > curious if somebody knows which heuristic gcc is applying here, and > why exactly. It's not something done by any other compiler I could > find, and it only started happening with gcc 6. It's a change in register allocation, gcc selects eax instead of esi for the shifts. Doesn't appear to be obviously intentional, could be a bug or bad luck. Alexander
[wwwdocs] Typo in description of __builtin_expect_with_probability
Hello, I'd like to report a typo in description of «__builtin_expect_with_probability»: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#Other-Builtins The description starts with "The built-in has same semantics as *__builtin_expect_with_probability*", but it seems like *__builtin_expect* should be there. Thanks!
Re: [PATCH 1/1] stackleak: Register the 'stackleak_cleanup' pass before the 'mach' pass
On 30.11.2018 20:12, Kees Cook wrote: > On Fri, Nov 30, 2018 at 9:09 AM Kees Cook wrote: >> >> On Fri, Nov 30, 2018 at 5:20 AM Alexander Popov wrote: >>> >>> Currently the 'stackleak_cleanup' pass deleting a CALL insn is executed >>> after the 'reload' pass. That allows gcc to do some weird optimization in >>> function prologues and epilogues, which are generated later [1]. >>> >>> Let's avoid that by registering the 'stackleak_cleanup' pass before >>> the 'mach' pass, which performs the machine dependent code transformations. >>> It's the moment when the stack frame size is final and function prologues >>> and epilogues are already generated. >>> >>> [1] https://www.openwall.com/lists/kernel-hardening/2018/11/23/2 >>> >>> Reported-by: kbuild test robot >>> Signed-off-by: Alexander Popov >> >> Thanks, applied! > > Eek, no, this is breaking my build badly: > > *** WARNING *** there are active plugins, do not report this as a bug > unless you can reproduce it without enabling any plugins. > Event| Plugins > PLUGIN_START_UNIT| stackleak_plugin > kernel/exit.c: In function ‘release_task’: > kernel/exit.c:228:1: internal compiler error: Segmentation fault > } > > Failing with: > > gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0 I've done debugging of gcc with gdb and now understand my mistake. It turned out that I register the 'stackleak_cleanup' pass deleting CALL insn for that particular moment when the control flow graph is inconsistent. That's what the machine-specific reorg passes do on various architectures: /* We are freeing block_for_insn in the toplev to keep compatibility with old MDEP_REORGS that are not CFG based. Recompute it now. */ compute_bb_for_insn (); So recomputing basic block info for insns before calling delete_insn_and_edges() fixes the issue. But I think it's better to register the 'stackleak_cleanup' pass just one pass earlier -- before the '*free_cfg' pass. I'll double check it for different versions of gcc on all supported architectures and return with a new patch. Best regards, Alexander
Re: [PATCH 1/1] stackleak: Register the 'stackleak_cleanup' pass before the 'mach' pass
On 03.12.2018 21:25, Alexander Popov wrote: > But I think it's better to register the 'stackleak_cleanup' pass just one pass > earlier -- before the '*free_cfg' pass. I'll double check it for different > versions of gcc on all supported architectures and return with a new patch. I've tested this idea for gcc-5,6,7,8 on x86_64, x86_32, and arm64. I'll send the patch soon. Best regards, Alexander