from:"Alexander"

getting spam

2007-02-26 Thread Alexander



Hi.
These days I began to get spam in my mail box.
I found, that my mail address ([EMAIL PROTECTED]) is published on :
http://gcc.gnu.org/ml/gcc/2006-08/msg00227.html
Please, remove it from there, thanks.

How implemented "typeof"

2005-11-28 Thread Alexander

Hello!

   How I can know more about implementation at 'tree' level such
   extension as 'typeof'? I am not want to explore and change sources
   now, maybe someone cam help?

-- 
Best regards,
 Alexander  mailto:[EMAIL PROTECTED]

How can I get access to tree representation

2005-12-11 Thread Alexander

I started exploring code base of cc1plus, and now I have little
question - how I can get access to tree representation of program (I
should do it after gcc/cp/parser.c:cp_parser_translation unit(...), isnt it?)
If I wasnt mistaken, RTL began build only if parser says that syntax
OK?

-- 
Best regards,
 Alexander  mailto:[EMAIL PROTECTED]

Re[2]: How can I get access to tree representation

2005-12-12 Thread Alexander

Hello Gabriel,

Monday, December 12, 2005, 12:47:17 PM, you wrote:

> Alexander <[EMAIL PROTECTED]> writes:

> | I started exploring code base of cc1plus, and now I have little
> | question - how I can get access to tree representation of program (I
> | should do it after gcc/cp/parser.c:cp_parser_translation unit(...), isnt 
> it?)
> | If I wasnt mistaken, RTL began build only if parser says that syntax
> | OK?

> In an ideal world, that is what we would like to have.  But currently,
> things are a bit intertwined -- and historically, it has been worse :-).
> In the current state, the gimplifiers run even if -fsyntax-only.
> Worse, they run even when we know the translation unit contains an error.

> However, you might get a starting point from the codes that call hooks for
> tree dumping.

> -- Gaby

Uhh... sad :-)

When cc1plus make tree dumping can we be sure, that tree correctly
represented program structure, i.e. symbolic information unharmed, and we could 
make
back-transformation - from tree to source-without-comments level?

-- 
Best regards,
 Alexandermailto:[EMAIL PROTECTED]

Re: Fwd: Re: gcc 4.1.1 for mcore

2007-01-08 Thread Alexander Grobman

Hello Nick,

Thanks for your support.

this is the error message I'm getting:
"
...
make[4]: Leaving directory
`/proj/tec/alpeca_lite/users/alexgr/gcc/objdir1/gcc'
/proj/tec/alpeca_lite/users/alexgr/gcc/objdir1/./gcc/xgcc
-B/proj/tec/alpeca_lite/users/alexgr/gcc/objdir1/./gcc/
-B/usr/local/mcore-elf/bin/
-B/usr/local/mcore-elf/lib/ -isystem
/usr/local/mcore-elf/include -isystem
/usr/local/mcore-elf/sys-include -O2  -O2 -g -O2 
-DIN_GCC -DCROSS_COMPILE  -DDONT_HAVE_STDIO
-DDONT_HAVE_SETJMP  -Dinhibit_libc  -W -Wall
-Wwrite-strings -Wstrict-prototypes
-Wmissing-prototypes -Wold-style-definition  -isystem
./include  -O3 -DNO_FLOATLIB_FIXUNSDFSI  -g 
-DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -Dinhibit_libc
-I. -I. -I../../gcc-4.1.1/gcc -I../../gcc-4.1.1/gcc/.
-I../../gcc-4.1.1/gcc/../include
-I../../gcc-4.1.1/gcc/../libcpp/include  -DL_floatdisf
-c ../../gcc-4.1.1/gcc/libgcc2.c -o
libgcc/./_floatdisf.o
/tmp/ccvk5vjH.s: Assembler messages:
/tmp/ccvk5vjH.s:38: Error: operand must be absolute in
range 1..32, not 53
make[3]: *** [libgcc/./_floatdisf.o] Error 1
make[3]: Leaving directory
`/proj/tec/alpeca_lite/users/alexgr/gcc/objdir1/gcc'
make[2]: *** [stmp-multilib] Error 2
make[2]: Leaving directory
`/proj/tec/alpeca_lite/users/alexgr/gcc/objdir1/gcc'
make[1]: *** [all-gcc] Error 2
make[1]: Leaving directory
`/proj/tec/alpeca_lite/users/alexgr/gcc/objdir1'
make: *** [all] Error 2

attached are the library file compiled with -S option
with the 


   bmaski  r4,53

wrong instruction form.

and my configuration file.

I run on a Linux machine with AMD CPU (x86_64).

It runs 3.1.3-6.10 RedHat KDE release 2.4.21-32.EL


Thanks again
Alex.
--- Nick Clifton <[EMAIL PROTECTED]> wrote:

> Hi Alex,
> 
> > I have some troubles with the MCORE GCC target,
> (see
> > below). Do have any idea what it can be and how to
> > solve the problem?
> 
> Yes - First unless you are constrained to only use
> the 4.1.1 sources 
> please try using the current mainline gcc
> development sources.  It is 
> possible, but unlikely, that the bug has already
> been fixed.
> 
> Assuming that the problem persists, please could you
> put together a 
> small test case to demonstrate it, including details
> of how you 
> configured the toolchain and what sort of host
> machine you are using. 
> If you could send this to me, but, *important*, CC
> the email to the gcc 
> mailing list as well, then I can investigate and try
> to fix the problem.
> 
> >>> The compiler generates illigal form of bmaski
> >>> instruction with operand out of range.
> 
> Interesting.  I have looked at the code locations
> where this instruction 
> is generated and it appears to be correct.  I will
> be intrigued to see 
> how this bug is produced.
> 
> Cheers
>Nick
> 
> 
> 

libgcc2.s
Description: 846145001-libgcc2.s


config.status
Description: 881791177-config.status

Pointer addition/subtraction tree node

2007-03-18 Thread Alexander Lamaison

As part of adding a new pass to GCC I am intercepting addition to and
subtraction from pointers.  These are represented by PLUS_EXPR and
MINUS_EXPR tree nodes.  I need to be able to find out which of the node's
two operands is the actual pointer and which is the integer that has been
added to it.

Subtraction is not a problem.  The real pointer always seems to be the first
operand (makes sense).

If the integer is a constant (e.g. p = p + 4) I can catch it by checking the
type: 
  TREE_CODE (TREE_OPERAND (t, 0)) == INTEGER_CST

But in other cases (e.g. p = p + i) I can't do this by looking at the type
as they both have the same type as the result pointer.  

Is there another way to find out which is which?

Thanks.
Alex

RE: Pointer addition/subtraction tree node

2007-03-19 Thread Alexander Lamaison

Code of the form

int[10] a;
int* p = a; int* q = a;
int i = 3;

p = q + i;

is transformed into

int * D.900;
unsigned int D.899;
unsigned int i.0;

:
i = 3;
p = &a;
q = &a;
i.0 = (unsigned int) i;
D.899 = i.0 * 4;
D.900 = (int *) D.899;
p = D.900 + q;

by the time it reaches the fixupcfg pass.

It has been suggested to me that a solution might be to trace back through the 
tree to find which of the operands is derived from a non-pointer variable.  I 
am new to GCC development.  How might I go about doing this?

Another approach I tried was to detect which of the operands was a compiler 
intermediate (my theory being that this would always be the non-pointer 
operand) by using DECL_ARTIFICIAL (TREE_OPERAND (t, 0)) but this breaks if 
tried on an operand that is not a VAR_DECL.  I don't think my theory is sounds 
but if it is, is there a way to get this to work?

Thanks.
Alex.

> -Original Message-
> From: Andrew Pinski [mailto:[EMAIL PROTECTED]
> Sent: 19 March 2007 00:47
> To: Alexander Lamaison
> Cc: gcc@gcc.gnu.org
> Subject: Re: Pointer addition/subtraction tree node
> 
> On 3/18/07, Alexander Lamaison <[EMAIL PROTECTED]> wrote:
> > As part of adding a new pass to GCC I am intercepting addition to and
> > subtraction from pointers.  These are represented by PLUS_EXPR and
> > MINUS_EXPR tree nodes.  I need to be able to find out which of the
> node's
> > two operands is the actual pointer and which is the integer that has
> been
> > added to it.
> >
> > Is there another way to find out which is which?
> 
> Not right now, I have been working on a new representation of pointer
> arithmetic for the tree level.  The basic implementation is already
> done, see the pointer_plus branch.
> 
> Thanks,
> Andrew Pinski

RE: Pointer addition/subtraction tree node

2007-03-19 Thread Alexander Lamaison

> -Original Message-
> From: Andrew Pinski [mailto:[EMAIL PROTECTED]
> Sent: 19 March 2007 00:47
> To: Alexander Lamaison
> Cc: gcc@gcc.gnu.org
> Subject: Re: Pointer addition/subtraction tree node
> 
> On 3/18/07, Alexander Lamaison <[EMAIL PROTECTED]> wrote:
> > As part of adding a new pass to GCC I am intercepting addition to and
> > subtraction from pointers.  These are represented by PLUS_EXPR and
> > MINUS_EXPR tree nodes.  I need to be able to find out which of the
> node's
> > two operands is the actual pointer and which is the integer that has
> been
> > added to it.
> >
> > Is there another way to find out which is which?
> 
> Not right now, I have been working on a new representation of pointer
> arithmetic for the tree level.  The basic implementation is already
> done, see the pointer_plus branch.
> 
> Thanks,
> Andrew Pinski

Apologies for top-posting before:

Code of the form

int[10] a;
int* p = a; int* q = a;
int i = 3;

p = q + i;

is transformed into

int * D.900;
unsigned int D.899;
unsigned int i.0;

:
i = 3;
p = &a;
q = &a;
i.0 = (unsigned int) i;
D.899 = i.0 * 4;
D.900 = (int *) D.899;
p = D.900 + q;

by the time it reaches the fixupcfg pass.

It has been suggested to me that a solution might be to trace back through the 
tree to find which of the operands is derived from a non-pointer variable.  I 
am new to GCC development.  How might I go about doing this?

Another approach I tried was to detect which of the operands was a compiler 
intermediate (my theory being that this would always be the non-pointer 
operand) by using DECL_ARTIFICIAL (TREE_OPERAND (t, 0)) but this breaks if 
tried on an operand that is not a VAR_DECL.  I don't think my theory is sounds 
but if it is, is there a way to get this to work?

Thanks.
Alex.

Using SSA

2007-03-21 Thread Alexander Lamaison

I am adding a new optimisation pass to GCC and I have found that I probably
need to make use of SSA's definition-finding.  The problem I am having is
that the trees I am working on don't seem to be in SSA form (i.e. not
SSA_NAME nodes).

I have looked endlessly and can't find any documentation on the basics of
getting set up to use SSA. I've tried looking at existing optimisation
passes but I can't find anything that they do in common in the way of
set-up.

The tree_opt_pass for my pass has PROP_ssa set in the properties_required
field.  Is this all I need to do?

Any help is greatly appreciated.
Alex

RE: Using SSA

2007-03-22 Thread Alexander Lamaison

> > The tree_opt_pass for my pass has PROP_ssa set in the
> properties_required
> > field.  Is this all I need to do?
> 
> You need to put your pass after pass_build_ssa.  Setting PROP_ssa does
> not build SSA itself, but it will cause an assertion failure if the
> pass is run while SSA is (not yet) available.
> 
> Paolo

I think (if I'm correctly interpreting the list in passes.c) it is.  It's
right after pass_warn_function_noreturn, just before pass_mudflap_2.  Is
this right?  I don't get any assertion about SSA not being available.

Thanks.
Alex

RE: Using SSA

2007-03-22 Thread Alexander Lamaison

> > I think (if I'm correctly interpreting the list in passes.c) it is.
> It's
> > right after pass_warn_function_noreturn, just before pass_mudflap_2.
> Is
> > this right?  I don't get any assertion about SSA not being available.
> 
> In this case, it is also after pass_del_ssa, which means SSA has
> already
> been destroyed.

Oh, ok. Thanks! I had assumed the mudflap passes would have SSA enabled as
the 'Tree-SSA passes' section of the GCC internal manual listed them:
http://gcc.gnu.org/onlinedocs/gccint/Tree_002dSSA-passes.html#Tree_002dSSA-p
asses. 

Thanks.
Alex.

SoC Project: Propagating array data dependencies from Tree-SSA to RTL

2007-03-23 Thread Alexander Monakov

 Melnik:  
http://gcc.gnu.org/ml/gcc-patches/2005-11/msg01518.html


--
Alexander Monakov

Re: SoC Project: Propagating array data dependencies from Tree-SSA to RTL

2007-03-26 Thread Alexander Monakov


On Sun, 25 Mar 2007, Daniel Berlin wrote:


Ayal has not signed up to be a mentor (as of yet). If he doesn't, i'd
be happy to mentor you here, since i wrote part of tree-data-ref.c


  Thanks, I'll be very glad to work with you.

On Mon, 26 Mar 2007, Ayal Zaks wrote:

Sorry, I fear I may have too little time to devote to this; plus, it  
would

be very useful to start with a good understanding of tree-data-ref from
which to start propagating the dep info. Vladimir Yanovsky and I will be
able to help when it comes to what/how to feed the modulo scheduler.


  Thank you for your attention. I hope I will have a chance to ask you for  
help in the frame of GSoC project.


--
Alexander Monakov

libcc_s.so and libc.so curcular dependency on FreeBSD

2007-04-02 Thread Alexander Kabaev

Hi,

I am working on integrating GCC 4.1.x series into FreeBSD src/ tree.
I've been running with the new compiler on FreeBSD 7.0 for quite a while
now, but was hesitant to commit my changes because of a couple of
unsolved issues. I would really appreciate your input on the way to
overcome them.

One of the main goals for the upcoming compiler refresh in FreeBSD was
to start using shared libgcc_s.so.1, something we did not do before.
Ideally I want libgcc_s.so.1 built from FreeBSD src/ tree to be 100%
binary compatible with the library build from stock GCC sources as
checked out from FSF SVN repository. This is where the first problem
lies.

libgcc_s.so.1 depends on libc.so.1 (and libpthread.so) for symbols like
malloc, free, pthread_once, pthread_mutex_lock, etc.

libc in turn depends on libgcc_s.so.1 due to the default way exception
frame info information registration is implemented in FreeBSD.  Each
shared object is expected to issue calls to __register_frame_info and
__deregister_frame_info in its startup/shutdown code in order for
exception handlers to work across shared library boundaries.

This creates a dependency cycle that I need to break. The simplest way
to go appears to follow Linux's lead and eliminate the need for shared
modules to have explicit frame into registration calls at startup and
allow exception handling code to locate necessary info with the help of
dynamic loader using dl_iterate_phdr call. I went ahead and implemented
necessary code in FreeBSD's ld-elf.so.1 and with little changes in gcc
crtstuff.c and unwind-dw2-fde-glibc.c I was able to get things working.
I did not find any breakage yet and everything seems to run smoothly so
far.

I wonder if you could confirm that this is a good resolution for the
circular dependency issue or is there a better way. Since both libc and
libgcc_s will both have symbol version support turned on on them, I
cannot import the new compiler into FreeBSD until this dilemma is
solved in one way or another.

I also noticed that on Linux glibc implements some of the libgcc
symbols, namely _Unwind_Find_FDE and __register_frame_info_bases family
of functions. I wonder why it is done and if I missed something obvious
here.

I wonder if GCC team will be willing to accept gcc part of my changes
into mainstream sources if I submit a patch. What are chances of
something like this to be committed into not only trunk, but also GCC
4.1, 4.2 and 4.3 branches? FreeBSD 7.0 wants to ship with GCC 4.1, but
newer version is likely to be imported into trunk shortly after CVS is
branched for 7.0 release.

All I had to do is to extend checks for GLIBC 2.2.4 in above mentioned
GCC files to also check for suitable __FreeBSD_version__ and they
compile just fine. I also had to add Linux-compatible definition for
struct dl_phdr_info along with dl_iterate_phdr function prototype into
FreeBSD's link.h header file.

-- 
Alexander Kabaev


signature.asc
Description: PGP signature

Re: Questions about trampolines

2005-03-14 Thread Alexander Strange

On Mar 14, 2005, at 8:11 AM, Marc Espie wrote:
In article <[EMAIL PROTECTED]> you write:
Well as I said above, trampolines or an equivalent are currently 
critically
needed by some front ends (and of course by anyone using the (very 
useful IMO)
extension of nested functions in C).
This is your opinion, but I've yet to find an actual piece of code in a
real project that uses that extension.
I use it; it involves writing less code than a userdata pointer would. 
I admit that I've never encountered any other code that takes the 
address of local functions, and I could easily use inlining to stop it 
using trampolines.

Re: GCC 4.0, Fast Math, and Acovea

2005-05-03 Thread Alexander Strange

On May 3, 2005, at 4:54 PM, Diego Novillo wrote:
On Tue, May 03, 2005 at 04:45:55PM -0400, Scott Robert Ladd wrote:

If you have a suggestion for better benchmarks, I'm listening. Is  
your
ray tracer available?


I recently heard of Openbench, a project to create an open
version of the SPEC benchmarks http://www.exactcode.de/oss/openbench/
There's also this benchmark project, although it's nowhere near  
complete yet: http://arsware.org/cms/showpage.php?cid=104

Re: [PATCH, v3] wwwdocs: e-mail subject lines for contributions

2020-02-03 Thread Alexander Monakov

On Mon, 3 Feb 2020, Richard Earnshaw (lists) wrote:

> I've not seen any follow-up to this version.  Should we go ahead and adopt
> this?

Can we please go with 'committed' (lowercase) rather than all-caps COMMITTED?
Spelling this with all-caps seems like a recent thing on gcc-patches, before
everyone used the lowercase version, which makes more sense (no need to shout
about the thing that didn't need any discussion before applying the patch).

Also, while tools like 'git format-patch' will automatically put [PATCH] in
the subject, for '[COMMITTED]' it will be the human typing that out, and
it makes little sense to require people to meticulously type that out in caps.
Especially when the previous practice was opposite.

Thanks.
Alexander

Re: [PATCH, v3] wwwdocs: e-mail subject lines for contributions

2020-02-03 Thread Alexander Monakov

On Mon, 3 Feb 2020, Richard Earnshaw (lists) wrote:

> Upper case is what glibc has, though it appears that it's a rule that is not
> strictly followed.  If we change it, then it becomes another friction point
> between developer groups.  Personally, I'd leave it as is, then turn a blind
> eye to such minor non-conformance.

In that case can we simply say that both 'committed' and 'COMMITTED' are okay,
if we know glibc doesn't follow that rule and anticipate we will not follow
it either?

Thanks.
Alexander

Re: Missed optimization with endian and alignment independent memory access on x64

2020-02-06 Thread Alexander Monakov

On Thu, 6 Feb 2020, Moritz Strübe wrote:
> Why is this so hard optimize? As it's quite a common pattern I'd expect that
> there would be at least some hand-coded special case optimizer. (This isn't
> criticism - I'm honestly curious.) Or is there a reason gcc shouldn't optimize
> this / Why it doesn't matter that this is missed?

The compiler would need to exploit the fact that signed overflow is undefined,
or deduce it cannot happen.
Imagine what happens in a more general case if i is INT_MAX
(so without undefined overflow i+1 would be INT_MIN):

int f(unsigned char *ptr, int i)
{
return ptr[i] | ptr[i+1] << 8;
}

With 64-bit address space this might access two bytes 4GB apart.

But you're right that it's a missed optimization in GCC, so you can file it
to the GCC Bugzilla.

> Is there a way to write such code that gcc optimizes?

Simply write a function that accepts one pointer:

int load_16be(unsigned char *ptr)
{
return ptr[0] << 8 | ptr[1];
}

and use it as load_16be(data+i) or load_16be(&data[i]).

> From a performance point of view: If I actually need two consecutive bytes,
> wouldn't it be better to load them as word and split them at the register
> level?

The question is not entirely clear to me, but usually the answer is that
it depends on the microarchitecture and details of the computations that
need to be done with loaded values. Often you'd need more than one instruction
to "split" the wide load, so it wouldn't be profitable.

Alexander

Re: Branch instructions that depend on target distance

2020-02-24 Thread Alexander Monakov

On Mon, 24 Feb 2020, Andreas Schwab wrote:

> On Feb 24 2020, Petr Tesarik wrote:
> 
> > On Mon, 24 Feb 2020 12:29:40 +0100
> > Andreas Schwab  wrote:
> >
> >> On Feb 24 2020, Petr Tesarik wrote:
> >> 
> >> > This works great ... until there's some inline asm() statement, for
> >> > which gcc cannot keep track of the length attribute, so it is probably
> >> > taken as zero.  
> >> 
> >> GCC computes it by counting the number of asm insns.  You can use
> >> ADJUST_INSN_LENGTH to adjust this as needed.
> >
> > Hmm, that's interesting, but does it work for inline asm() statements?
> 
> Yes, for a suitable definition of work.
> 
> > The argument is essentially a free-form string (with some
> > substitution), and the compiler cannot know how many bytes they occupy.
> 
> That's why ADJUST_INSN_LENGTH can adjust it.

I think Petr might be unaware of the fact that GCC counts the **number of
instructions in an inline asm statement** by counting separators in the
asm string. This may overcount when a separator appears in a string literal
for example, but triggering under-counting is trickier.

Petr, please see https://gcc.gnu.org/onlinedocs/gcc/Size-of-an-asm.html
for some more discussion.

Alexander

GSoC topic: precise lifetimes in GIMPLE

2020-03-02 Thread Alexander Monakov

Hi,

following the conversation in PR 90348, I wonder if it would make sense
to suggest the idea presented there as a potential GSoC topic? Like this:

**Enhance GIMPLE IR to represent lifetimes explicitly**  At the moment,
GCC internal representation GIMPLE lacks precise lifetime information
for addressable variables: GIMPLE marks the end of the lifetime by the
so-called "GIMPLE clobber" statement, corresponding to the point where
the variable goes out of scope in the original program. However, the
event of the "birth" of a variable (where it appears in scope) is lost,
making the IR ambiguous and opening the door for invalid optimizations,
as shown in bug #90348. The project would entail inventing a way to
represent "lifetime start" in GIMPLE, adjusting front-ends to emit it,
implementing a verifier to check that optimizations do not move
references outside of the variable's lifetime, and potentially
enhancing optimizations to move lifetime markers, expanding the
lifetime, where necessary.

I know we already have good project ideas, and I suspect this idea may
be too complicated for GSoC, but on the other hand it sounds useful,
and gives an "experimental" topic that may be interesting for students.

What do you think?

Thanks.
Alexander

Re: GSoC topic: precise lifetimes in GIMPLE

2020-03-02 Thread Alexander Monakov

On Mon, 2 Mar 2020, Richard Biener wrote:

> PR90348 is certainly entertaining.  But I guess for a GSoC project
> we need a more elaborate implementation plan.  The above suggesting
> of a "lifetime start" is IMHO a no-go btw.  Instead I think the
> only feasible way is to make all references indirect and thus
> make both "allocation" and "deallocation" points explicit.  Then
> there's a data dependence on the "allocation" statement which
> provides upward safety and the "deallocation" statement would
> need to act as a barrier in some to be determined way.  That is,
> how to make optimizers preserve the lifetime end is still unsolved.

I think a verifier that ensures that all references are dominated by
"lifetime start" and post-dominated by clobbers/lifetime-end would
be a substantial step towards that.

Agreed that data dependence on allocation would automatically ensure
part of that verification, but then the problem with deallocation
remains, as you say.

> IMHO whatever we do should combine with the CLOBBERs we have now,
> not be yet another mechanism.

This seems contradictory with the ideas in your previous paragraph.
I agree though, CLOBBER-as-lifetime-end makes sense and does not
call for a replacement.

Thanks.
Alexander

[PATCH 1/5] gcc-plugins/stackleak: Exclude alloca() from the instrumentation logic

2020-06-04 Thread Alexander Popov

Some time ago Variable Length Arrays (VLA) were removed from the kernel.
The kernel is built with '-Wvla'. Let's exclude alloca() from the
instrumentation logic and make it simpler. The build-time assertion
against alloca() is added instead.

Unfortunately, for that assertion we can't simply check cfun->calls_alloca
during RTL phase. It turned out that gcc before version 7 called
allocate_dynamic_stack_space() from expand_stack_vars() for runtime
alignment of constant-sized stack variables. That caused cfun->calls_alloca
to be set for functions that don't use alloca().

Signed-off-by: Alexander Popov 
---
 scripts/gcc-plugins/stackleak_plugin.c | 51 +++---
 1 file changed, 21 insertions(+), 30 deletions(-)

diff --git a/scripts/gcc-plugins/stackleak_plugin.c 
b/scripts/gcc-plugins/stackleak_plugin.c
index cc75eeba0be1..1ecfe50d0bf5 100644
--- a/scripts/gcc-plugins/stackleak_plugin.c
+++ b/scripts/gcc-plugins/stackleak_plugin.c
@@ -9,10 +9,9 @@
  * any of the gcc libraries
  *
  * This gcc plugin is needed for tracking the lowest border of the kernel 
stack.
- * It instruments the kernel code inserting stackleak_track_stack() calls:
- *  - after alloca();
- *  - for the functions with a stack frame size greater than or equal
- * to the "track-min-size" plugin parameter.
+ * It instruments the kernel code inserting stackleak_track_stack() calls
+ * for the functions with a stack frame size greater than or equal to
+ * the "track-min-size" plugin parameter.
  *
  * This plugin is ported from grsecurity/PaX. For more information see:
  *   https://grsecurity.net/
@@ -46,7 +45,7 @@ static struct plugin_info stackleak_plugin_info = {
"disable\t\tdo not activate the plugin\n"
 };
 
-static void stackleak_add_track_stack(gimple_stmt_iterator *gsi, bool after)
+static void stackleak_add_track_stack(gimple_stmt_iterator *gsi)
 {
gimple stmt;
gcall *stackleak_track_stack;
@@ -56,12 +55,7 @@ static void stackleak_add_track_stack(gimple_stmt_iterator 
*gsi, bool after)
/* Insert call to void stackleak_track_stack(void) */
stmt = gimple_build_call(track_function_decl, 0);
stackleak_track_stack = as_a_gcall(stmt);
-   if (after) {
-   gsi_insert_after(gsi, stackleak_track_stack,
-   GSI_CONTINUE_LINKING);
-   } else {
-   gsi_insert_before(gsi, stackleak_track_stack, GSI_SAME_STMT);
-   }
+   gsi_insert_before(gsi, stackleak_track_stack, GSI_SAME_STMT);
 
/* Update the cgraph */
bb = gimple_bb(stackleak_track_stack);
@@ -87,14 +81,13 @@ static bool is_alloca(gimple stmt)
 
 /*
  * Work with the GIMPLE representation of the code. Insert the
- * stackleak_track_stack() call after alloca() and into the beginning
- * of the function if it is not instrumented.
+ * stackleak_track_stack() call into the beginning of the function.
  */
 static unsigned int stackleak_instrument_execute(void)
 {
basic_block bb, entry_bb;
-   bool prologue_instrumented = false, is_leaf = true;
-   gimple_stmt_iterator gsi;
+   bool is_leaf = true;
+   gimple_stmt_iterator gsi = { 0 };
 
/*
 * ENTRY_BLOCK_PTR is a basic block which represents possible entry
@@ -111,27 +104,17 @@ static unsigned int stackleak_instrument_execute(void)
 */
FOR_EACH_BB_FN(bb, cfun) {
for (gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
-   gimple stmt;
-
-   stmt = gsi_stmt(gsi);
+   gimple stmt = gsi_stmt(gsi);
 
/* Leaf function is a function which makes no calls */
if (is_gimple_call(stmt))
is_leaf = false;
 
-   if (!is_alloca(stmt))
-   continue;
-
-   /* Insert stackleak_track_stack() call after alloca() */
-   stackleak_add_track_stack(&gsi, true);
-   if (bb == entry_bb)
-   prologue_instrumented = true;
+   /* Variable Length Arrays are forbidden in the kernel */
+   gcc_assert(!is_alloca(stmt));
}
}
 
-   if (prologue_instrumented)
-   return 0;
-
/*
 * Special cases to skip the instrumentation.
 *
@@ -168,7 +151,7 @@ static unsigned int stackleak_instrument_execute(void)
bb = single_succ(ENTRY_BLOCK_PTR_FOR_FN(cfun));
}
gsi = gsi_after_labels(bb);
-   stackleak_add_track_stack(&gsi, false);
+   stackleak_add_track_stack(&gsi);
 
return 0;
 }
@@ -185,12 +168,20 @@ static bool large_stack_frame(void)
 /*
  * Work with the RTL representation of the code.
  * Remove the unneeded stackleak_track_stack()

[PATCH 0/5] Improvements of the stackleak gcc plugin

2020-06-04 Thread Alexander Popov

In this patch series I collected various improvements of the stackleak
gcc plugin.

The first patch excludes alloca() from the stackleak instrumentation logic
to make it simpler.

The second patch is the main improvement. It eliminates an unwanted
side-effect of kernel code instrumentation. This patch is a deep
reengineering of the idea described on grsecurity blog:
  https://grsecurity.net/resolving_an_unfortunate_stackleak_interaction

The third patch adds 'verbose' plugin parameter for printing additional
info about the kernel code instrumentation.

Two other patches disable unneeded stackleak instrumentation for some
files.

I would like to thank Alexander Monakov  for his
advisory on gcc internals.

This patch series was tested for gcc version 4.8, 5, 6, 7, 8, 9, and 10
on x86_64, i386 and arm64.
That was done using the project 'kernel-build-containers':
  https://github.com/a13xp0p0v/kernel-build-containers


Alexander Popov (5):
  gcc-plugins/stackleak: Exclude alloca() from the instrumentation logic
  gcc-plugins/stackleak: Use asm instrumentation to avoid useless
register saving
  gcc-plugins/stackleak: Add 'verbose' plugin parameter
  gcc-plugins/stackleak: Don't instrument itself
  gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO

 arch/arm64/kernel/vdso/Makefile|   3 +-
 include/linux/compiler_attributes.h|  13 ++
 kernel/Makefile|   1 +
 kernel/stackleak.c |  16 +-
 scripts/Makefile.gcc-plugins   |   2 +
 scripts/gcc-plugins/stackleak_plugin.c | 260 -
 6 files changed, 232 insertions(+), 63 deletions(-)

-- 
2.25.2

[PATCH 2/5] gcc-plugins/stackleak: Use asm instrumentation to avoid useless register saving

2020-06-04 Thread Alexander Popov

The kernel code instrumentation in stackleak gcc plugin works in two stages.
At first, stack tracking is added to GIMPLE representation of every function
(except some special cases). And later, when stack frame size info is
available, stack tracking is removed from the RTL representation of the
functions with small stack frame. There is an unwanted side-effect for these
functions: some of them do useless work with caller-saved registers.

As an example of such case, proc_sys_write without instrumentation:
55  push   %rbp
41 b8 01 00 00 00   mov$0x1,%r8d
48 89 e5mov%rsp,%rbp
e8 11 ff ff ff  callq  81284610 
5d  pop%rbp
c3  retq
0f 1f 44 00 00  nopl   0x0(%rax,%rax,1)
66 2e 0f 1f 84 00 00nopw   %cs:0x0(%rax,%rax,1)
00 00 00

proc_sys_write with instrumentation:
55  push   %rbp
48 89 e5mov%rsp,%rbp
41 56   push   %r14
41 55   push   %r13
41 54   push   %r12
53  push   %rbx
49 89 f4mov%rsi,%r12
48 89 fbmov%rdi,%rbx
49 89 d5mov%rdx,%r13
49 89 cemov%rcx,%r14
4c 89 f1mov%r14,%rcx
4c 89 eamov%r13,%rdx
4c 89 e6mov%r12,%rsi
48 89 dfmov%rbx,%rdi
41 b8 01 00 00 00   mov$0x1,%r8d
e8 f2 fe ff ff  callq  81298e80 
5b  pop%rbx
41 5c   pop%r12
41 5d   pop%r13
41 5e   pop%r14
5d  pop%rbp
c3  retq
66 0f 1f 84 00 00 00nopw   0x0(%rax,%rax,1)
00 00

Let's improve the instrumentation to avoid this:

1. Make stackleak_track_stack() save all register that it works with.
Use no_caller_saved_registers attribute for that function. This attribute
is available for x86_64 and i386 starting from gcc-7.

2. Insert calling stackleak_track_stack() in asm:
  asm volatile("call stackleak_track_stack" :: "r" (current_stack_pointer))
Here we use ASM_CALL_CONSTRAINT trick from arch/x86/include/asm/asm.h.
The input constraint is taken into account during gcc shrink-wrapping
optimization. It is needed to be sure that stackleak_track_stack() call is
inserted after the prologue of the containing function, when the stack
frame is prepared.

This work is a deep reengineering of the idea described on grsecurity blog
  https://grsecurity.net/resolving_an_unfortunate_stackleak_interaction

Signed-off-by: Alexander Popov 
---
 include/linux/compiler_attributes.h|  13 ++
 kernel/stackleak.c |  16 +-
 scripts/Makefile.gcc-plugins   |   2 +
 scripts/gcc-plugins/stackleak_plugin.c | 206 +
 4 files changed, 196 insertions(+), 41 deletions(-)

diff --git a/include/linux/compiler_attributes.h 
b/include/linux/compiler_attributes.h
index cdf016596659..522d57ae8532 100644
--- a/include/linux/compiler_attributes.h
+++ b/include/linux/compiler_attributes.h
@@ -41,6 +41,7 @@
 # define __GCC4_has_attribute___nonstring__   0
 # define __GCC4_has_attribute___no_sanitize_address__ (__GNUC_MINOR__ >= 8)
 # define __GCC4_has_attribute___fallthrough__ 0
+# define __GCC4_has_attribute___no_caller_saved_registers__ 0
 #endif
 
 /*
@@ -175,6 +176,18 @@
  */
 #define __mode(x)   __attribute__((__mode__(x)))
 
+/*
+ * Optional: only supported since gcc >= 7
+ *
+ *   gcc: 
https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#index-no_005fcaller_005fsaved_005fregisters-function-attribute_002c-x86
+ * clang: 
https://clang.llvm.org/docs/AttributeReference.html#no-caller-saved-registers
+ */
+#if __has_attribute(__no_caller_saved_registers__)
+# define __no_caller_saved_registers   
__attribute__((__no_caller_saved_registers__))
+#else
+# define __no_caller_saved_registers
+#endif
+
 /*
  * Optional: not supported by clang
  *
diff --git a/kernel/stackleak.c b/kernel/stackleak.c
index b193a59fc05b..a8fc9ae1d03d 100644
--- a/kernel/stackleak.c
+++ b/kernel/stackleak.c
@@ -104,19 +104,9 @@ asmlinkage void notrace stackleak_erase(void)
 }
 NOKPROBE_SYMBOL(stackleak_erase);
 
-void __used notrace stackleak_track_stack(void)
+void __used __no_caller_saved_registers notrace stackleak_track_stack(void)
 {
-   /*
-* N.B. stackleak_erase() fills the kernel stack with the poison value,
-* which has the register width. That code assumes that the value
-* of 'lowest_stack' is aligned on the register width boundary.
-*
-* That is true for x86 and x86_64 because of the kernel stack
-* alignment on these platforms (for details, see 'c

[PATCH 4/5] gcc-plugins/stackleak: Don't instrument itself

2020-06-04 Thread Alexander Popov

There is no need to try instrumenting functions in kernel/stackleak.c.
Otherwise that can cause issues if the cleanup pass of stackleak gcc plugin
is disabled.

Signed-off-by: Alexander Popov 
---
 kernel/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/Makefile b/kernel/Makefile
index 4cb4130ced32..d372134ac9ec 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -118,6 +118,7 @@ obj-$(CONFIG_RSEQ) += rseq.o
 
 obj-$(CONFIG_SYSCTL_KUNIT_TEST) += sysctl-test.o
 
+CFLAGS_stackleak.o += $(DISABLE_STACKLEAK_PLUGIN)
 obj-$(CONFIG_GCC_PLUGIN_STACKLEAK) += stackleak.o
 KASAN_SANITIZE_stackleak.o := n
 KCOV_INSTRUMENT_stackleak.o := n
-- 
2.25.2

[PATCH 5/5] gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO

2020-06-04 Thread Alexander Popov

Don't try instrumenting functions in arch/arm64/kernel/vdso/vgettimeofday.c.
Otherwise that can cause issues if the cleanup pass of stackleak gcc plugin
is disabled.

Signed-off-by: Alexander Popov 
---
 arch/arm64/kernel/vdso/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
index 3862cad2410c..9b84cafbd2da 100644
--- a/arch/arm64/kernel/vdso/Makefile
+++ b/arch/arm64/kernel/vdso/Makefile
@@ -32,7 +32,8 @@ UBSAN_SANITIZE:= n
 OBJECT_FILES_NON_STANDARD  := y
 KCOV_INSTRUMENT:= n
 
-CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables
+CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables \
+   $(DISABLE_STACKLEAK_PLUGIN)
 
 ifneq ($(c-gettimeofday-y),)
   CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
-- 
2.25.2

[PATCH 3/5] gcc-plugins/stackleak: Add 'verbose' plugin parameter

2020-06-04 Thread Alexander Popov

Add 'verbose' plugin parameter for stackleak gcc plugin.
It can be used for printing additional info about the kernel code
instrumentation.

For using it add the following to scripts/Makefile.gcc-plugins:
  gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_STACKLEAK) \
+= -fplugin-arg-stackleak_plugin-verbose

Signed-off-by: Alexander Popov 
---
 scripts/gcc-plugins/stackleak_plugin.c | 31 +-
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/scripts/gcc-plugins/stackleak_plugin.c 
b/scripts/gcc-plugins/stackleak_plugin.c
index 0769c5b9156d..19358712d4ed 100644
--- a/scripts/gcc-plugins/stackleak_plugin.c
+++ b/scripts/gcc-plugins/stackleak_plugin.c
@@ -33,6 +33,8 @@ __visible int plugin_is_GPL_compatible;
 static int track_frame_size = -1;
 static bool build_for_x86 = false;
 static const char track_function[] = "stackleak_track_stack";
+static bool disable = false;
+static bool verbose = false;
 
 /*
  * Mark these global variables (roots) for gcc garbage collector since
@@ -45,6 +47,7 @@ static struct plugin_info stackleak_plugin_info = {
.help = "track-min-size=nn\ttrack stack for functions with a stack 
frame size >= nn bytes\n"
"arch=target_arch\tspecify target build arch\n"
"disable\t\tdo not activate the plugin\n"
+   "verbose\t\tprint info about the instrumentation\n"
 };
 
 static void add_stack_tracking_gcall(gimple_stmt_iterator *gsi)
@@ -98,6 +101,10 @@ static tree get_current_stack_pointer_decl(void)
return var;
}
 
+   if (verbose) {
+   fprintf(stderr, "stackleak: missing current_stack_pointer in 
%s()\n",
+   DECL_NAME_POINTER(current_function_decl));
+   }
return NULL_TREE;
 }
 
@@ -366,6 +373,7 @@ static bool remove_stack_tracking_gasm(void)
  */
 static unsigned int stackleak_cleanup_execute(void)
 {
+   const char *fn = DECL_NAME_POINTER(current_function_decl);
bool removed = false;
 
/*
@@ -376,11 +384,17 @@ static unsigned int stackleak_cleanup_execute(void)
 * For more info see gcc commit 7072df0aae0c59ae437e.
 * Let's leave such functions instrumented.
 */
-   if (cfun->calls_alloca)
+   if (cfun->calls_alloca) {
+   if (verbose)
+   fprintf(stderr, "stackleak: instrument %s() old\n", fn);
return 0;
+   }
 
-   if (large_stack_frame())
+   if (large_stack_frame()) {
+   if (verbose)
+   fprintf(stderr, "stackleak: instrument %s()\n", fn);
return 0;
+   }
 
if (lookup_attribute_spec(get_identifier("no_caller_saved_registers")))
removed = remove_stack_tracking_gasm();
@@ -506,9 +520,6 @@ __visible int plugin_init(struct plugin_name_args 
*plugin_info,
 
/* Parse the plugin arguments */
for (i = 0; i < argc; i++) {
-   if (!strcmp(argv[i].key, "disable"))
-   return 0;
-
if (!strcmp(argv[i].key, "track-min-size")) {
if (!argv[i].value) {
error(G_("no value supplied for option 
'-fplugin-arg-%s-%s'"),
@@ -531,6 +542,10 @@ __visible int plugin_init(struct plugin_name_args 
*plugin_info,
 
if (!strcmp(argv[i].value, "x86"))
build_for_x86 = true;
+   } else if (!strcmp(argv[i].key, "disable")) {
+   disable = true;
+   } else if (!strcmp(argv[i].key, "verbose")) {
+   verbose = true;
} else {
error(G_("unknown option '-fplugin-arg-%s-%s'"),
plugin_name, argv[i].key);
@@ -538,6 +553,12 @@ __visible int plugin_init(struct plugin_name_args 
*plugin_info,
}
}
 
+   if (disable) {
+   if (verbose)
+   fprintf(stderr, "stackleak: disabled for this 
translation unit\n");
+   return 0;
+   }
+
/* Give the information about the plugin */
register_callback(plugin_name, PLUGIN_INFO, NULL,
&stackleak_plugin_info);
-- 
2.25.2

Re: [PATCH 5/5] gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO

2020-06-04 Thread Alexander Popov

On 04.06.2020 17:14, Jann Horn wrote:
> On Thu, Jun 4, 2020 at 3:58 PM Will Deacon  wrote:
>> On Thu, Jun 04, 2020 at 04:49:57PM +0300, Alexander Popov wrote:
>>> Don't try instrumenting functions in arch/arm64/kernel/vdso/vgettimeofday.c.
>>> Otherwise that can cause issues if the cleanup pass of stackleak gcc plugin
>>> is disabled.
>>>
>>> Signed-off-by: Alexander Popov 
>>> ---
>>>  arch/arm64/kernel/vdso/Makefile | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm64/kernel/vdso/Makefile 
>>> b/arch/arm64/kernel/vdso/Makefile
>>> index 3862cad2410c..9b84cafbd2da 100644
>>> --- a/arch/arm64/kernel/vdso/Makefile
>>> +++ b/arch/arm64/kernel/vdso/Makefile
>>> @@ -32,7 +32,8 @@ UBSAN_SANITIZE  := n
>>>  OBJECT_FILES_NON_STANDARD:= y
>>>  KCOV_INSTRUMENT  := n
>>>
>>> -CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables
>>> +CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables \
>>> + $(DISABLE_STACKLEAK_PLUGIN)
>>
>> I can pick this one up via arm64, thanks. Are there any other plugins we
>> should be wary of? 

I can't tell exactly. I'm sure Kees has the whole picture.

>> It looks like x86 filters out $(GCC_PLUGINS_CFLAGS)
>> when building the vDSO.

Yes, that's why building x86 vDSO doesn't need $(DISABLE_STACKLEAK_PLUGIN).

> Maybe at some point we should replace exclusions based on
> GCC_PLUGINS_CFLAGS and KASAN_SANITIZE and UBSAN_SANITIZE and
> OBJECT_FILES_NON_STANDARD and so on with something more generic...
> something that says "this file will not be built into the normal
> kernel, it contains code that runs in realmode / userspace / some
> similarly weird context, and none of our instrumentation
> infrastructure is available there"...

Good idea. I would also add 'notrace' to that list.

Best regards,
Alexander

Re: [PATCH 5/5] gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO

2020-06-04 Thread Alexander Popov

On 04.06.2020 17:25, Jann Horn wrote:
> On Thu, Jun 4, 2020 at 4:21 PM Alexander Popov  wrote:
>> On 04.06.2020 17:14, Jann Horn wrote:
>>> Maybe at some point we should replace exclusions based on
>>> GCC_PLUGINS_CFLAGS and KASAN_SANITIZE and UBSAN_SANITIZE and
>>> OBJECT_FILES_NON_STANDARD and so on with something more generic...
>>> something that says "this file will not be built into the normal
>>> kernel, it contains code that runs in realmode / userspace / some
>>> similarly weird context, and none of our instrumentation
>>> infrastructure is available there"...
>>
>> Good idea. I would also add 'notrace' to that list.
> 
> Hm? notrace code should definitely still be subject to sanitizer
> instrumentation.

I mean ftrace is sometimes disabled for functions that are executed in those
weird contexts. As well as kcov instrumentation.

It would be nice if that generic mechanism could help with choosing which kernel
code instrumentation technologies should be disabled in the given context.

Best regards,
Alexander

Re: [PATCH 1/5] gcc-plugins/stackleak: Exclude alloca() from the instrumentation logic

2020-06-04 Thread Alexander Popov

On 04.06.2020 17:01, Jann Horn wrote:
> On Thu, Jun 4, 2020 at 3:51 PM Alexander Popov  wrote:
>> Some time ago Variable Length Arrays (VLA) were removed from the kernel.
>> The kernel is built with '-Wvla'. Let's exclude alloca() from the
>> instrumentation logic and make it simpler. The build-time assertion
>> against alloca() is added instead.
> [...]
>> +   /* Variable Length Arrays are forbidden in the 
>> kernel */
>> +   gcc_assert(!is_alloca(stmt));
> 
> There is a patch series from Elena and Kees on the kernel-hardening
> list that deliberately uses __builtin_alloca() in the syscall entry
> path to randomize the stack pointer per-syscall - see
> <https://lore.kernel.org/kernel-hardening/20200406231606.37619-4-keesc...@chromium.org/>.

Thanks, Jann.

At first glance, leaving alloca() handling in stackleak instrumentation logic
would allow to integrate stackleak and this version of random_kstack_offset.

Kees, Elena, did you try random_kstack_offset with upstream stackleak?

It looks to me that without stackleak erasing random_kstack_offset can be
weaker. I mean, if next syscall has a bigger stack randomization gap, the data
on thread stack from the previous syscall is not overwritten and can be used. Am
I right?

Another aspect: CONFIG_STACKLEAK_METRICS can be used for guessing kernel stack
offset, which is bad. It should be disabled if random_kstack_offset is on.

Best regards,
Alexander

Re: [PATCH 0/5] Improvements of the stackleak gcc plugin

2020-06-10 Thread Alexander Popov

On 09.06.2020 22:15, Kees Cook wrote:
> On Thu, Jun 04, 2020 at 04:49:52PM +0300, Alexander Popov wrote:
>> In this patch series I collected various improvements of the stackleak
>> gcc plugin.
> 
> Thanks!
> 
>> Alexander Popov (5):
>>   gcc-plugins/stackleak: Exclude alloca() from the instrumentation logic
>>   gcc-plugins/stackleak: Use asm instrumentation to avoid useless
>> register saving
> 
> These look like they might need tweaks (noted in their separate
> replies).

Thanks for the review, Kees.

>>   gcc-plugins/stackleak: Add 'verbose' plugin parameter
>>   gcc-plugins/stackleak: Don't instrument itself
> 
> If you wanted to reorder the series and move these first, I could take
> these into my tree right away (they're logically separate from the other
> fixes).

Ok, I will put "don't instrument itself" at the beginning of v2.

The patch adding 'verbose' plugin parameter depends on the previous patches, so
I will not move it.

>>   gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO
> 
> This seems good -- though I'm curious about 32-bit ARM and the other
> HAVE_GCC_PLUGINS architectures with vDSOs (which appears to be all of
> them except um).

(going to reply in a separate email)

Best regards,
Alexander

Re: [PATCH 5/5] gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO

2020-06-10 Thread Alexander Popov

On 10.06.2020 10:30, Will Deacon wrote:
> On Tue, Jun 09, 2020 at 12:09:27PM -0700, Kees Cook wrote:
>> On Thu, Jun 04, 2020 at 02:58:06PM +0100, Will Deacon wrote:
>>> On Thu, Jun 04, 2020 at 04:49:57PM +0300, Alexander Popov wrote:
>>>> Don't try instrumenting functions in 
>>>> arch/arm64/kernel/vdso/vgettimeofday.c.
>>>> Otherwise that can cause issues if the cleanup pass of stackleak gcc plugin
>>>> is disabled.
>>>>
>>>> Signed-off-by: Alexander Popov 
>>>> ---
>>>>  arch/arm64/kernel/vdso/Makefile | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm64/kernel/vdso/Makefile 
>>>> b/arch/arm64/kernel/vdso/Makefile
>>>> index 3862cad2410c..9b84cafbd2da 100644
>>>> --- a/arch/arm64/kernel/vdso/Makefile
>>>> +++ b/arch/arm64/kernel/vdso/Makefile
>>>> @@ -32,7 +32,8 @@ UBSAN_SANITIZE   := n
>>>>  OBJECT_FILES_NON_STANDARD := y
>>>>  KCOV_INSTRUMENT   := n
>>>>  
>>>> -CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables
>>>> +CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables \
>>>> +  $(DISABLE_STACKLEAK_PLUGIN)
>>>
>>> I can pick this one up via arm64, thanks. Are there any other plugins we
>>> should be wary of? It looks like x86 filters out $(GCC_PLUGINS_CFLAGS)
>>> when building the vDSO.
>>
>> I didn't realize/remember that arm64 retained the kernel build flags for
>> vDSO builds. (I'm used to x86 throwing all its flags away for its vDSO.)
>>
>> How does 32-bit ARM do its vDSO?
>>
>> My quick run-through on plugins:
>>
>> arm_ssp_per_task_plugin.c
>>  32-bit ARM only (but likely needs disabling for 32-bit ARM vDSO?)
> 
> On arm64, the 32-bit toolchain is picked up via CC_COMPAT -- does that still
> get the plugins?
> 
>> cyc_complexity_plugin.c
>>  compile-time reporting only
>>
>> latent_entropy_plugin.c
>>  this shouldn't get triggered for the vDSO (no __latent_entropy
>>  nor __init attributes in vDSO), but perhaps explicitly disabling
>>  it would be a sensible thing to do, just for robustness?
>>
>> randomize_layout_plugin.c
>>  this shouldn't get triggered (again, lacking attributes), but
>>  should likely be disabled too.
>>
>> sancov_plugin.c
>>  This should be tracking the KCOV directly (see
>>  scripts/Makefile.kcov), which is already disabled here.
>>
>> structleak_plugin.c
>>  This should be fine in the vDSO, but there's not security
>>  boundary here, so it wouldn't be important to KEEP it enabled.
> 
> Thanks for going through these. In general though, it seems like an
> opt-in strategy would make more sense, as it doesn't make an awful lot
> of sense to me for the plugins to be used to build the vDSO.
> 
> So I would prefer that this patch filters out $(GCC_PLUGINS_CFLAGS).

Ok, I will do that in the v2 of the patch series.

Best regards,
Alexander

Re: [PATCH 1/5] gcc-plugins/stackleak: Exclude alloca() from the instrumentation logic

2020-06-10 Thread Alexander Popov

On 09.06.2020 21:39, Kees Cook wrote:
> On Thu, Jun 04, 2020 at 06:23:38PM +0300, Alexander Popov wrote:
>> On 04.06.2020 17:01, Jann Horn wrote:
>>> On Thu, Jun 4, 2020 at 3:51 PM Alexander Popov  wrote:
>>>> Some time ago Variable Length Arrays (VLA) were removed from the kernel.
>>>> The kernel is built with '-Wvla'. Let's exclude alloca() from the
>>>> instrumentation logic and make it simpler. The build-time assertion
>>>> against alloca() is added instead.
>>> [...]
>>>> +   /* Variable Length Arrays are forbidden in the 
>>>> kernel */
>>>> +   gcc_assert(!is_alloca(stmt));
>>>
>>> There is a patch series from Elena and Kees on the kernel-hardening
>>> list that deliberately uses __builtin_alloca() in the syscall entry
>>> path to randomize the stack pointer per-syscall - see
>>> <https://lore.kernel.org/kernel-hardening/20200406231606.37619-4-keesc...@chromium.org/>.
>>
>> Thanks, Jann.
>>
>> At first glance, leaving alloca() handling in stackleak instrumentation logic
>> would allow to integrate stackleak and this version of random_kstack_offset.
> 
> Right, it seems there would be a need for this coverage to remain,
> otherwise the depth of stack erasure might be incorrect.
> 
> It doesn't seem like the other patches strictly depend on alloca()
> support being removed, though?

Ok, I will leave alloca() support, reorganize the patch series and send v2.

>> Kees, Elena, did you try random_kstack_offset with upstream stackleak?
> 
> I didn't try that combination yet, no. It seemed there would likely
> still be further discussion about the offset series first (though the
> thread has been silent -- I'll rebase and resend it after rc2).

Ok, please add me to CC list.

Best regards,
Alexander

>> It looks to me that without stackleak erasing random_kstack_offset can be
>> weaker. I mean, if next syscall has a bigger stack randomization gap, the 
>> data
>> on thread stack from the previous syscall is not overwritten and can be 
>> used. Am
>> I right?
> 
> That's correct. I think the combination is needed, but I don't think
> they need to be strictly tied together.
> 
>> Another aspect: CONFIG_STACKLEAK_METRICS can be used for guessing kernel 
>> stack
>> offset, which is bad. It should be disabled if random_kstack_offset is on.
> 
> Agreed.

Re: [PATCH 2/5] gcc-plugins/stackleak: Use asm instrumentation to avoid useless register saving

2020-06-10 Thread Alexander Popov

On 09.06.2020 21:46, Kees Cook wrote:
> On Thu, Jun 04, 2020 at 04:49:54PM +0300, Alexander Popov wrote:
>> Let's improve the instrumentation to avoid this:
>>
>> 1. Make stackleak_track_stack() save all register that it works with.
>> Use no_caller_saved_registers attribute for that function. This attribute
>> is available for x86_64 and i386 starting from gcc-7.
>>
>> 2. Insert calling stackleak_track_stack() in asm:
>>   asm volatile("call stackleak_track_stack" :: "r" (current_stack_pointer))
>> Here we use ASM_CALL_CONSTRAINT trick from arch/x86/include/asm/asm.h.
>> The input constraint is taken into account during gcc shrink-wrapping
>> optimization. It is needed to be sure that stackleak_track_stack() call is
>> inserted after the prologue of the containing function, when the stack
>> frame is prepared.
> 
> Very cool; nice work!
> 
>> +static void add_stack_tracking(gimple_stmt_iterator *gsi)
>> +{
>> +/*
>> + * The 'no_caller_saved_registers' attribute is used for
>> + * stackleak_track_stack(). If the compiler supports this attribute for
>> + * the target arch, we can add calling stackleak_track_stack() in asm.
>> + * That improves performance: we avoid useless operations with the
>> + * caller-saved registers in the functions from which we will remove
>> + * stackleak_track_stack() call during the stackleak_cleanup pass.
>> + */
>> +if (lookup_attribute_spec(get_identifier("no_caller_saved_registers")))
>> +add_stack_tracking_gasm(gsi);
>> +else
>> +add_stack_tracking_gcall(gsi);
>> +}
> 
> The build_for_x86 flag is only ever used as an assert() test against
> no_caller_saved_registers, but we're able to test for that separately.
> Why does the architecture need to be tested? (i.e. when this flag
> becomes supported o other architectures, why must it still be x86-only?)

The inline asm statement that is used for instrumentation is arch-specific.
Trying to add
  asm volatile("call stackleak_track_stack")
in gcc plugin on aarch64 makes gcc break spectacularly.
I pass the target arch name to the plugin and check it explicitly to avoid that.

Moreover, I'm going to create a gcc enhancement request for supporting
no_caller_saved_registers attribute on aarch64.

Best regards,
Alexander

Re: [PATCH 3/5] gcc-plugins/stackleak: Add 'verbose' plugin parameter

2020-06-10 Thread Alexander Popov

On 09.06.2020 21:47, Kees Cook wrote:
> On Thu, Jun 04, 2020 at 04:49:55PM +0300, Alexander Popov wrote:
>> Add 'verbose' plugin parameter for stackleak gcc plugin.
>> It can be used for printing additional info about the kernel code
>> instrumentation.
>>
>> For using it add the following to scripts/Makefile.gcc-plugins:
>>   gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_STACKLEAK) \
>> += -fplugin-arg-stackleak_plugin-verbose
>>
>> Signed-off-by: Alexander Popov 
> 
> Acked-by: Kees Cook 

I see that I will change this patch after leaving alloca() support.
I'm going to add debug printing about functions that call alloca().
I have to omit your 'acked-by' for the changed patch, right?

Best regards,
Alexander

Re: [PATCH 2/5] gcc-plugins/stackleak: Use asm instrumentation to avoid useless register saving

2020-06-11 Thread Alexander Popov

On 10.06.2020 23:03, Kees Cook wrote:
> On Wed, Jun 10, 2020 at 06:47:14PM +0300, Alexander Popov wrote:
>> On 09.06.2020 21:46, Kees Cook wrote:
>> The inline asm statement that is used for instrumentation is arch-specific.
>> Trying to add
>>   asm volatile("call stackleak_track_stack")
>> in gcc plugin on aarch64 makes gcc break spectacularly.
> 
> Ah! Thank you, that eluded my eyes. :)
> 
>> I pass the target arch name to the plugin and check it explicitly to avoid 
>> that.
>>
>> Moreover, I'm going to create a gcc enhancement request for supporting
>> no_caller_saved_registers attribute on aarch64.
> 
> For arm64 right now it looks like the plugin will just remain
> "inefficient" in these cleanup, as before, yes?

Yes, for arm64 the instrumentation didn't change in this patch series.
I checked the disasm and see the similar issue with useless register saving.

I'm going to add asm instrumentation for arm64 when (I hope) the
no_caller_saved_registers attribute becomes available for that platform.

Best regards,
Alexander

Re: [PATCH 5/5] gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO

2020-06-23 Thread Alexander Popov

On 10.06.2020 10:30, Will Deacon wrote:
> On Tue, Jun 09, 2020 at 12:09:27PM -0700, Kees Cook wrote:
>> arm_ssp_per_task_plugin.c
>>  32-bit ARM only (but likely needs disabling for 32-bit ARM vDSO?)

I tested: on 32-bit arm vDSO is built with plugin flags. I will filter them out
in a separate patch in v2 of the series.

> On arm64, the 32-bit toolchain is picked up via CC_COMPAT -- does that still
> get the plugins?
I tested it with this command:
  make  ARCH=arm64 CROSS_COMPILE_COMPAT=arm-linux-gnueabi-
CROSS_COMPILE=aarch64-linux-gnu- V=1

I see that COMPAT_VDSO is built without plugin flags. So it's ok.

Best regards,
Alexander

[PATCH v2 1/5] gcc-plugins/stackleak: Don't instrument itself

2020-06-24 Thread Alexander Popov

There is no need to try instrumenting functions in kernel/stackleak.c.
Otherwise that can cause issues if the cleanup pass of stackleak gcc plugin
is disabled.

Signed-off-by: Alexander Popov 
Acked-by: Kees Cook 
---
 kernel/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/Makefile b/kernel/Makefile
index f3218bc5ec69..155b5380500a 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -125,6 +125,7 @@ obj-$(CONFIG_WATCH_QUEUE) += watch_queue.o
 
 obj-$(CONFIG_SYSCTL_KUNIT_TEST) += sysctl-test.o
 
+CFLAGS_stackleak.o += $(DISABLE_STACKLEAK_PLUGIN)
 obj-$(CONFIG_GCC_PLUGIN_STACKLEAK) += stackleak.o
 KASAN_SANITIZE_stackleak.o := n
 KCSAN_SANITIZE_stackleak.o := n
-- 
2.25.4

[PATCH v2 0/5] Improvements of the stackleak gcc plugin

2020-06-24 Thread Alexander Popov

This is the v2 of the patch series with various improvements of the
stackleak gcc plugin.

The first three patches disable unneeded gcc plugin instrumentation for
some files.

The fourth patch is the main improvement. It eliminates an unwanted
side-effect of kernel code instrumentation performed by stackleak gcc
plugin. This patch is a deep reengineering of the idea described on
grsecurity blog:
  https://grsecurity.net/resolving_an_unfortunate_stackleak_interaction

The final patch adds 'verbose' stackleak parameter for printing additional
info about the kernel code instrumentation during kernel building.

I would like to thank Alexander Monakov  for his
advisory on gcc internals.

This patch series was tested for gcc version 4.8, 5, 6, 7, 8, 9, and 10
on x86_64, i386 and arm64.
That was done using the project 'kernel-build-containers':
  https://github.com/a13xp0p0v/kernel-build-containers

Changes from v1:
 - rebase onto 5.8.0-rc2;
 - don't exclude alloca() from the instrumentation logic, because it
   will be used in kernel stack offset randomization;
 - reorder patches in the series;
 - don't use gcc plugins for building vgettimeofday.c in arm and
   arm64 vDSO;
 - follow alphabetic order in include/linux/compiler_attributes.h.

Link to v1:
 https://lore.kernel.org/lkml/20200604134957.505389-1-alex.po...@linux.com/


Alexander Popov (5):
  gcc-plugins/stackleak: Don't instrument itself
  ARM: vdso: Don't use gcc plugins for building vgettimeofday.c
  arm64: vdso: Don't use gcc plugins for building vgettimeofday.c
  gcc-plugins/stackleak: Use asm instrumentation to avoid useless
register saving
  gcc-plugins/stackleak: Add 'verbose' plugin parameter

 arch/arm/vdso/Makefile |   2 +-
 arch/arm64/kernel/vdso/Makefile|   2 +-
 include/linux/compiler_attributes.h|  13 ++
 kernel/Makefile|   1 +
 kernel/stackleak.c |  16 +-
 scripts/Makefile.gcc-plugins   |   2 +
 scripts/gcc-plugins/stackleak_plugin.c | 248 +
 7 files changed, 239 insertions(+), 45 deletions(-)

-- 
2.25.4

[PATCH v2 3/5] arm64: vdso: Don't use gcc plugins for building vgettimeofday.c

2020-06-24 Thread Alexander Popov

Don't use gcc plugins for building arch/arm64/kernel/vdso/vgettimeofday.c
to avoid unneeded instrumentation.

Signed-off-by: Alexander Popov 
---
 arch/arm64/kernel/vdso/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
index 556d424c6f52..0f1ad63b3326 100644
--- a/arch/arm64/kernel/vdso/Makefile
+++ b/arch/arm64/kernel/vdso/Makefile
@@ -29,7 +29,7 @@ ldflags-y := -shared -nostdlib -soname=linux-vdso.so.1 
--hash-style=sysv \
 ccflags-y := -fno-common -fno-builtin -fno-stack-protector -ffixed-x18
 ccflags-y += -DDISABLE_BRANCH_PROFILING
 
-CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS)
+CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) 
$(GCC_PLUGINS_CFLAGS)
 KBUILD_CFLAGS  += $(DISABLE_LTO)
 KASAN_SANITIZE := n
 UBSAN_SANITIZE := n
-- 
2.25.4

[PATCH v2 2/5] ARM: vdso: Don't use gcc plugins for building vgettimeofday.c

2020-06-24 Thread Alexander Popov

Don't use gcc plugins for building arch/arm/vdso/vgettimeofday.c to
avoid unneeded instrumentation.

Signed-off-by: Alexander Popov 
---
 arch/arm/vdso/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
index d3c9f03e7e79..a54f70731d9f 100644
--- a/arch/arm/vdso/Makefile
+++ b/arch/arm/vdso/Makefile
@@ -29,7 +29,7 @@ CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
 CFLAGS_REMOVE_vdso.o = -pg
 
 # Force -O2 to avoid libgcc dependencies
-CFLAGS_REMOVE_vgettimeofday.o = -pg -Os
+CFLAGS_REMOVE_vgettimeofday.o = -pg -Os $(GCC_PLUGINS_CFLAGS)
 ifeq ($(c-gettimeofday-y),)
 CFLAGS_vgettimeofday.o = -O2
 else
-- 
2.25.4

[PATCH v2 5/5] gcc-plugins/stackleak: Add 'verbose' plugin parameter

2020-06-24 Thread Alexander Popov

Add 'verbose' plugin parameter for stackleak gcc plugin.
It can be used for printing additional info about the kernel code
instrumentation.

For using it add the following to scripts/Makefile.gcc-plugins:
  gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_STACKLEAK) \
+= -fplugin-arg-stackleak_plugin-verbose

Signed-off-by: Alexander Popov 
---
 scripts/gcc-plugins/stackleak_plugin.c | 47 +++---
 1 file changed, 42 insertions(+), 5 deletions(-)

diff --git a/scripts/gcc-plugins/stackleak_plugin.c 
b/scripts/gcc-plugins/stackleak_plugin.c
index a18b0d4af456..48e141e07956 100644
--- a/scripts/gcc-plugins/stackleak_plugin.c
+++ b/scripts/gcc-plugins/stackleak_plugin.c
@@ -34,6 +34,8 @@ __visible int plugin_is_GPL_compatible;
 static int track_frame_size = -1;
 static bool build_for_x86 = false;
 static const char track_function[] = "stackleak_track_stack";
+static bool disable = false;
+static bool verbose = false;
 
 /*
  * Mark these global variables (roots) for gcc garbage collector since
@@ -46,6 +48,7 @@ static struct plugin_info stackleak_plugin_info = {
.help = "track-min-size=nn\ttrack stack for functions with a stack 
frame size >= nn bytes\n"
"arch=target_arch\tspecify target build arch\n"
"disable\t\tdo not activate the plugin\n"
+   "verbose\t\tprint info about the instrumentation\n"
 };
 
 static void add_stack_tracking_gcall(gimple_stmt_iterator *gsi, bool after)
@@ -102,6 +105,10 @@ static tree get_current_stack_pointer_decl(void)
return var;
}
 
+   if (verbose) {
+   fprintf(stderr, "stackleak: missing current_stack_pointer in 
%s()\n",
+   DECL_NAME_POINTER(current_function_decl));
+   }
return NULL_TREE;
 }
 
@@ -195,6 +202,11 @@ static unsigned int stackleak_instrument_execute(void)
if (!is_alloca(stmt))
continue;
 
+   if (verbose) {
+   fprintf(stderr, "stackleak: be careful, 
alloca() in %s()\n",
+   
DECL_NAME_POINTER(current_function_decl));
+   }
+
/* Insert stackleak_track_stack() call after alloca() */
add_stack_tracking(&gsi, true);
if (bb == entry_bb)
@@ -384,13 +396,31 @@ static bool remove_stack_tracking_gasm(void)
  */
 static unsigned int stackleak_cleanup_execute(void)
 {
+   const char *fn = DECL_NAME_POINTER(current_function_decl);
bool removed = false;
 
-   if (cfun->calls_alloca)
+   /*
+* Leave stack tracking in functions that call alloca().
+* Additional case:
+*   gcc before version 7 called allocate_dynamic_stack_space() from
+*   expand_stack_vars() for runtime alignment of constant-sized stack
+*   variables. That caused cfun->calls_alloca to be set for functions
+*   that in fact don't use alloca().
+*   For more info see gcc commit 7072df0aae0c59ae437e.
+*   Let's leave such functions instrumented as well.
+*/
+   if (cfun->calls_alloca) {
+   if (verbose)
+   fprintf(stderr, "stackleak: instrument %s(): 
calls_alloca\n", fn);
return 0;
+   }
 
-   if (large_stack_frame())
+   /* Leave stack tracking in functions with large stack frame */
+   if (large_stack_frame()) {
+   if (verbose)
+   fprintf(stderr, "stackleak: instrument %s()\n", fn);
return 0;
+   }
 
if (lookup_attribute_spec(get_identifier("no_caller_saved_registers")))
removed = remove_stack_tracking_gasm();
@@ -516,9 +546,6 @@ __visible int plugin_init(struct plugin_name_args 
*plugin_info,
 
/* Parse the plugin arguments */
for (i = 0; i < argc; i++) {
-   if (!strcmp(argv[i].key, "disable"))
-   return 0;
-
if (!strcmp(argv[i].key, "track-min-size")) {
if (!argv[i].value) {
error(G_("no value supplied for option 
'-fplugin-arg-%s-%s'"),
@@ -541,6 +568,10 @@ __visible int plugin_init(struct plugin_name_args 
*plugin_info,
 
if (!strcmp(argv[i].value, "x86"))
build_for_x86 = true;
+   } else if (!strcmp(argv[i].key, "disable")) {
+   disable = true;
+   } else if (!strcmp(argv[i].key, "verbose")) {
+   verbose = true;
} else {
error(G_("unknown option '-fplugin-arg-%s-%s'"),

[PATCH v2 4/5] gcc-plugins/stackleak: Use asm instrumentation to avoid useless register saving

2020-06-24 Thread Alexander Popov

The kernel code instrumentation in stackleak gcc plugin works in two stages.
At first, stack tracking is added to GIMPLE representation of every function
(except some special cases). And later, when stack frame size info is
available, stack tracking is removed from the RTL representation of the
functions with small stack frame. There is an unwanted side-effect for these
functions: some of them do useless work with caller-saved registers.

As an example of such case, proc_sys_write without() instrumentation:
55  push   %rbp
41 b8 01 00 00 00   mov$0x1,%r8d
48 89 e5mov%rsp,%rbp
e8 11 ff ff ff  callq  81284610 
5d  pop%rbp
c3  retq
0f 1f 44 00 00  nopl   0x0(%rax,%rax,1)
66 2e 0f 1f 84 00 00nopw   %cs:0x0(%rax,%rax,1)
00 00 00

proc_sys_write() with instrumentation:
55  push   %rbp
48 89 e5mov%rsp,%rbp
41 56   push   %r14
41 55   push   %r13
41 54   push   %r12
53  push   %rbx
49 89 f4mov%rsi,%r12
48 89 fbmov%rdi,%rbx
49 89 d5mov%rdx,%r13
49 89 cemov%rcx,%r14
4c 89 f1mov%r14,%rcx
4c 89 eamov%r13,%rdx
4c 89 e6mov%r12,%rsi
48 89 dfmov%rbx,%rdi
41 b8 01 00 00 00   mov$0x1,%r8d
e8 f2 fe ff ff  callq  81298e80 
5b  pop%rbx
41 5c   pop%r12
41 5d   pop%r13
41 5e   pop%r14
5d  pop%rbp
c3  retq
66 0f 1f 84 00 00 00nopw   0x0(%rax,%rax,1)
00 00

Let's improve the instrumentation to avoid this:

1. Make stackleak_track_stack() save all register that it works with.
Use no_caller_saved_registers attribute for that function. This attribute
is available for x86_64 and i386 starting from gcc-7.

2. Insert calling stackleak_track_stack() in asm:
  asm volatile("call stackleak_track_stack" :: "r" (current_stack_pointer))
Here we use ASM_CALL_CONSTRAINT trick from arch/x86/include/asm/asm.h.
The input constraint is taken into account during gcc shrink-wrapping
optimization. It is needed to be sure that stackleak_track_stack() call is
inserted after the prologue of the containing function, when the stack
frame is prepared.

This work is a deep reengineering of the idea described on grsecurity blog
  https://grsecurity.net/resolving_an_unfortunate_stackleak_interaction

Signed-off-by: Alexander Popov 
Acked-by: Miguel Ojeda 
---
 include/linux/compiler_attributes.h|  13 ++
 kernel/stackleak.c |  16 +-
 scripts/Makefile.gcc-plugins   |   2 +
 scripts/gcc-plugins/stackleak_plugin.c | 205 +
 4 files changed, 196 insertions(+), 40 deletions(-)

diff --git a/include/linux/compiler_attributes.h 
b/include/linux/compiler_attributes.h
index cdf016596659..551ea8cb70b1 100644
--- a/include/linux/compiler_attributes.h
+++ b/include/linux/compiler_attributes.h
@@ -37,6 +37,7 @@
 # define __GCC4_has_attribute___copy__0
 # define __GCC4_has_attribute___designated_init__ 0
 # define __GCC4_has_attribute___externally_visible__  1
+# define __GCC4_has_attribute___no_caller_saved_registers__ 0
 # define __GCC4_has_attribute___noclone__ 1
 # define __GCC4_has_attribute___nonstring__   0
 # define __GCC4_has_attribute___no_sanitize_address__ (__GNUC_MINOR__ >= 8)
@@ -175,6 +176,18 @@
  */
 #define __mode(x)   __attribute__((__mode__(x)))
 
+/*
+ * Optional: only supported since gcc >= 7
+ *
+ *   gcc: 
https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#index-no_005fcaller_005fsaved_005fregisters-function-attribute_002c-x86
+ * clang: 
https://clang.llvm.org/docs/AttributeReference.html#no-caller-saved-registers
+ */
+#if __has_attribute(__no_caller_saved_registers__)
+# define __no_caller_saved_registers   
__attribute__((__no_caller_saved_registers__))
+#else
+# define __no_caller_saved_registers
+#endif
+
 /*
  * Optional: not supported by clang
  *
diff --git a/kernel/stackleak.c b/kernel/stackleak.c
index b193a59fc05b..a8fc9ae1d03d 100644
--- a/kernel/stackleak.c
+++ b/kernel/stackleak.c
@@ -104,19 +104,9 @@ asmlinkage void notrace stackleak_erase(void)
 }
 NOKPROBE_SYMBOL(stackleak_erase);
 
-void __used notrace stackleak_track_stack(void)
+void __used __no_caller_saved_registers notrace stackleak_track_stack(void)
 {
-   /*
-* N.B. stackleak_erase() fills the kernel stack with the poison value,
-* which has the register width. That code assumes that the value
-* of 'lowest_stack'

Re: [PATCH v2 2/5] ARM: vdso: Don't use gcc plugins for building vgettimeofday.c

2020-06-24 Thread Alexander Popov

On 24.06.2020 15:52, Luis Chamberlain wrote:
> On Wed, Jun 24, 2020 at 03:33:27PM +0300, Alexander Popov wrote:
>> Don't use gcc plugins for building arch/arm/vdso/vgettimeofday.c to
>> avoid unneeded instrumentation.
>>
>> Signed-off-by: Alexander Popov 
> 
> But why is skipping it safe?

Hello Luis,

Kees and Will discussed that in detail in v1 of the series:
https://lore.kernel.org/lkml/20200610073046.GA15939@willie-the-truck/

Best regards,
Alexander

Re: [PATCH v2 5/5] gcc-plugins/stackleak: Add 'verbose' plugin parameter

2020-06-24 Thread Alexander Popov

On 24.06.2020 15:53, Luis Chamberlain wrote:
> On Wed, Jun 24, 2020 at 03:33:30PM +0300, Alexander Popov wrote:
>> Add 'verbose' plugin parameter for stackleak gcc plugin.
>> It can be used for printing additional info about the kernel code
>> instrumentation.
>>
>> For using it add the following to scripts/Makefile.gcc-plugins:
>>   gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_STACKLEAK) \
>> += -fplugin-arg-stackleak_plugin-verbose
> 
> Would be nice if we instead could pass an argument to make which lets
> us enable this.

This feature is useful only for debugging stackleak gcc plugin.

The cflag that enables it is similar to -fplugin-arg-structleak_plugin-verbose,
which is used for debugging the structleak plugin.

This debugging feature clutters the kernel build output, I don't think that many
people will use it. So IMO creating a separate argument for make is not really
needed.

Thanks!

Best regards,
Alexander

Clarifying attribute-const

2015-09-25 Thread Alexander Monakov

Hello,

I'd like to ask for community input regarding __attribute__((const)) (and
"pure", where applicable).  My main goal is to clarify unclear cases and
improve existing documentation, if possible.

First, a belated follow-up to https://gcc.gnu.org/PR66512 . The bug is asking
why attribute-const appears to have a weaker effect in C++, compared to C.
The answer in that bug is that GCC assumes that attribute-const function can
terminate by throwing an exception.

That doesn't actually seem reasonable.  Consider that C counterpart to
throwing is longjmp; it seems to me that GCC should behave consistently:
either assume that attribute-const may both longjmp and throw (I guess nobody
wants that), or that it may not longjmp nor throw.  Intuitively, if "const"
means "free of side effects so that calls can be moved speculatively or
duplicated", then non-local control flow transfer via throwing should be
disallowed as well.

In any case, it would be nice the intended compiler behavior could be
explicitely stated in the manual.


Second, there is an interesting mismatch between documentation and existing
usage.  Among most prominent users of the attribute there are two glibc
functions: __errno_location(void) and pthread_self(void).  Both return a
pointer to thread-local storage, so the functions are not "const" globally in
a multi-threaded process.  A sufficiently advanced compiler can cause the
following testcase to abort:

#include 
#include 

static void *errno_pointer;

static void *thr(void *unused)
{
  errno_pointer = &errno;
  return 0;
}

int main()
{
  errno_pointer = &errno;
  pthread_t t;
  pthread_create(&t, 0, thr, 0);
  pthread_join(t, 0);
  if (errno_pointer == &errno)
abort();
}

(errno_pointer is static, so the compiler can observe that it does not escape
the translation unit, and all stores in the TU assign the same "const" value)

Does GCC need to be concerned about eventually "miscompiling" such cases?  If
not, can we document an explicit promise that attribute-const may include
pointers-to-TLS?

Thanks.
Alexander

Re: Clarifying attribute-const

2015-10-01 Thread Alexander Monakov

On Fri, 25 Sep 2015, Eric Botcazou wrote:

> > First, a belated follow-up to https://gcc.gnu.org/PR66512 . The bug is
> > asking why attribute-const appears to have a weaker effect in C++, compared
> > to C. The answer in that bug is that GCC assumes that attribute-const
> > function can terminate by throwing an exception.
> 
> FWIW there is an equivalent semantics in Ada: the "const" functions can throw 
> and the language explicitly allows them to be CSEd in this case, etc.

Can you expand on the "etc." a bit, i.e., may the compiler ...

  - move a call to a "const" function above a conditional branch, 
causing a conditional throw to happen unconditionally?

  - move a call to a "const" function below a conditional branch, 
causing an unconditional throw to happen only conditionally?

  - reorder calls to "const" functions  w.r.t. code with side effects, or
other throwing functions?

(all of the above in the context of Ada)

Thanks.
Alexander

GCC-Bridge: A Gimple Compiler targeting the JVM

2016-02-01 Thread Bertram, Alexander

I wanted to share a project we've been working on for sometime within
the context of Renjin,
a new interpreter for the R language running on the JVM.

We basically needed a way to compile C and Fortran code to JVM
classes, and for the last year or two we've been working on tool chain
that's composed of a GCC plugin which dumps Gimple trees out to a JSON
file, and a Java program which reads the JSON and compiles it to Java
classfiles.

I've written a bit more about it today here:
http://www.renjin.org/blog/2016-01-31-introducing-gcc-bridge.html

And you can find the whole project here:
https://github.com/bedatadriven/renjin/tree/master/tools/gcc-bridge

The compiler is part of the Renjin project, but can also be used in a
standalone way to compile arbitrary C/Fortran code to Java classfiles,
though the focus has been on pure scientific code, so we haven't
bothered with some rather obvious things like fopen().

Anyway, using the GCC plugin interface has been terrific, and the
gimple trees have been great to work with!

-Alex

Re: GCC-Bridge: A Gimple Compiler targeting the JVM

2016-02-01 Thread Bertram, Alexander

Thanks Mikhail and Manuel for the reactions!

Mikhail, thanks for the tip on xmalloc, will take a look if that can
help clean up the plugin code.

Manuel,
0) Yes, we hope to make it faster!

1) Initially coding within GCC would have been too intimidating, but I
think i've started to get a feel for the system and porting the
compiler to some sort of C++ module might be a good long term to
strategy to avoid drift as GCC internal evolve.

I'm not sure exactly where it would fit in however- I don't think it
could be described with the machine description language. There is
alot of complexity involved in handling things like addressable local
variables, which have to be allocated as unit length arrays so that we
can pass around a reference to them. Would it be possible to write a
backend that generates code from Gimple and not RTL?

2) I'm hoping we can get within 10-20% slowdown. However, the ultimate
goal is to be include "native" code in Renjin's auto parallelization
feature, which operates more on the level of aSQL query planner than
at the instruction level, which would lead to a net speedup.

That's the working hypothesis at least. I'm planning on doing a round
of benchmarking in the next 1-2 months along with a comparison of the
assembly generated by GCC on one hand for a given source, and the
assembly ultimately generated by the JVM's JIT compiler.

3) Good to know! Is this right the mailing list to ask questions about
some of the internal structure? The GCC Internals manual is very
useful, and the source provides a lot of answers, but sometimes I run
into questions, for example, on how exactly UNORDERED_EXPR is defined,
or how to access the byte offset for COMPONENT_REF expression when
field names vary.

Best,
Alex

On Mon, Feb 1, 2016 at 10:25 PM, Manuel López-Ibáñez
 wrote:
> On 01/02/16 12:34, Bertram, Alexander wrote:
>>
>> I wanted to share a project we've been working on for sometime within
>> the context of Renjin,
>> a new interpreter for the R language running on the JVM.
>>
>> We basically needed a way to compile C and Fortran code to JVM
>> classes, and for the last year or two we've been working on tool chain
>> that's composed of a GCC plugin which dumps Gimple trees out to a JSON
>> file, and a Java program which reads the JSON and compiles it to Java
>> classfiles.
>
>
> This sounds interesting! (R is so slow)
>
> 1) Wouldn't it be better to directly compile to Java bytecode? I think GJC
> was able to do that in the past, but I'm not sure how bit-rotted that code
> is by now. https://gcc.gnu.org/ml/gcc/2000-02/msg00161.html It may be a more
> robust solution in the long run.
>
> 2) The reason that R uses so much C and Fortran code is because when
> compiled, that code is much faster than R. How much is lost by compiling and
> running in JVM?
>
> 3) Gimple is not completely target independent, but I think we would like to
> move towards that direction, so reporting bugs about that with specific
> testcases may be helpful. Of course, a JVM target would make this concern
> irrelevant.
>
>> Anyway, using the GCC plugin interface has been terrific, and the
>> gimple trees have been great to work with!
>
>
> Please, note that the plugin API is mainly driven by its users. If you want
> to see improvements, don't hesitate to propose patches.
>
> Cheers,
>
> Manuel.
>

-- 
Alex Bertram
Technical Director
bedatadriven

Web: http://bedatadriven.com
Email: a...@bedatadriven.com
Tel. Nederlands: +31(0)647205388
Skype: akbertram

Re: nonnull, -Wnonnull, and do/while

2016-02-16 Thread Alexander Monakov

On Tue, 16 Feb 2016, Marek Polacek wrote:
> Well, it's just that "s" has the nonnull attribute so the compiler thinks it
> should never be null in which case comparing it to null should be redundant.
> Doesn't seem like a false positive to me, but maybe someone else feels
> otherwise.

Please look at the posted code again:

static void f(const char *s)
{
  do {
printf("%s\n",s);
s = NULL;
  } while (s != NULL);
}

Since 's' is assigned to, the constraint from 'printf' is no longer useful for
warning at the point of comparison.  It clearly looks like a false positive to
me.

Alexander

Re: who owns stack args?

2016-02-24 Thread Alexander Monakov

On Wed, 24 Feb 2016, DJ Delorie wrote:
> The real question is: are stack arguments call-clobbered or
> call-preserved?  Does the answer depend on the "pure" attribute?

Stack area holding stack arguments should belong to the callee for tail-calls
to work (the callee will trash that area when laying out arguments for the
tail call; thanks to Rich Felker for pointing that out to me).

Thus it cannot depend on attribute-pure.

Alexander

GCC Bugzilla whines broken?

2016-02-29 Thread Alexander Monakov

Hello,

Can anyone quickly confirm whether "whining" feature in the GCC Bugzilla is
supposed to be functioning at the moment?

The lastest thread I could find indicates that it is actually supposed to be
working: https://gcc.gnu.org/ml/gcc/2010-09/msg00569.html .

However I've tried to setup a whine for myself a week ago, and it never
produced the emails.

Actually, I want a different feature than whining: notifications for bugs
matching a certain predicate, e.g. for a specific target; ideally being
automatically Cc'ed to such bugs, with an option to un-cc myself if needed.
I can somewhat emulate that with whine searches restricted to "last N days".
Is anybody doing something like that?

Thanks.
Alexander

Re: out of bounds access in insn-automata.c

2016-03-24 Thread Alexander Monakov

Hi,

On Thu, 24 Mar 2016, Bernd Schmidt wrote:
> On 03/24/2016 11:17 AM, Aldy Hernandez wrote:
> > On 03/23/2016 10:25 AM, Bernd Schmidt wrote:
> > > It looks like this block of code is written by a helper function that is
> > > really intended for other purposes than for maximal_insn_latency. Might
> > > be worth changing to
> > >   int insn_code = dfa_insn_code (as_a  (insn));
> > >   gcc_assert (insn_code <= DFA__ADVANCE_CYCLE);
> >
> > dfa_insn_code_* and friends can return > DFA__ADVANCE_CYCLE so I can't
> > put that assert on the helper function.
> 
> So don't use the helper function? Just emit the block above directly.

Let me chime in :)  The function under scrutiny, maximal_insn_latency, was
added as part of selective scheduling merge; at the same time,
output_default_latencies was factored out of
output_internal_insn_latency_func, and the pair of new functions
output_internal_maximal_insn_latency_func/output_maximal_insn_latency_func
tried to mirror existing pair of
output_internal_insn_latency_func/output_insn_latency_func.

In particular, output_insn_latency_func also invokes
output_internal_insn_code_evaluation (twice, for each argument).  This means
that generated 'insn_latency' can also call 'internal_insn_latency' with
DFA__ADVANCE_CYCLE in arguments.  However, 'internal_insn_latency' then has a
specially emitted 'if' statement that checks if either of the arguments is 
' >= DFA__ADVANCE_CYCLE', and returns 0 in that case.

So ultimately pre-existing code was checking ' > DFA__ADVANCE_CYCLE' first and
' >= DFA_ADVANCE_CYCLE' second (for no good reason as far as I can see), and
when the new '_maximal_' functions were introduced, the second check was not
duplicated in the new copy.

So as long we are not looking for hacking it up further, I'd like to clean up
both functions at the same time.  If calling the 'internal_' variants with
DFA__ADVANCE_CYCLE is rare, extending 'default_insn_latencies' by 1 zero
element corresponding to DFA__ADVANCE_CYCLE is a simple suitable fix. If
either DFA__ADVANCE_CYCLE is not guaranteed to be rare, or extending the table
in that style is undesired, I suggest creating a variant of
'output_internal_insn_code_evaluation' that performs a '>=' rather than '>'
test in the first place, and use it in both output_insn_latency_func and
output_maximal_insn_latency_func.  If acknowledged, I volunteer to regstrap on
x86_64 and submit that in stage1.

Thoughts?

Thanks.
Alexander

[PATCH] clean up insn-automata.c (was: Re: out of bounds access in insn-automata.c)

2016-05-10 Thread Alexander Monakov

On Wed, 30 Mar 2016, Bernd Schmidt wrote:
> On 03/25/2016 04:43 AM, Aldy Hernandez wrote:
> > If Bernd is fine with this, I'm happy to retract my patch and any
> > possible followups.  I'm just interested in having no path causing a
> > possible out of bounds access.  If your patch will do that, I'm cool.
> 
> I'll need to see that patch first to comment :-)

Here's the proposed patch.  I've found that there's only one user of the
current fancy logic in output_internal_insn_code_evaluation: handling of
NULL_RTX and const0_rtx is only useful for 'state_transition' (generated
by output_trans_func), so it's possible to inline the extended handling
there, and handle only plain non-null rtx_insns in
output_internal_insn_code_evaluation.

This change allows to remove extra checks and casting in
output_internal_insn_latency_func, as done by the patch.

As a nice bonus, it constrains prototypes of three automata functions to
accept insn_rtx rather than just rtx.

Bootstrapped and regtested on x86_64, OK?

Thanks.
Alexander

* genattr.c (main): Change 'rtx' to 'rtx_insn *' in prototypes of
'insn_latency', 'maximal_insn_latency', 'min_insn_conflict_delay'.
* genautomata.c (output_internal_insn_code_evaluation): Simplify.
Move handling of non-insn arguments inline into the sole user:
(output_trans_func): ...here.
(output_min_insn_conflict_delay_func): Change 'rtx' to 'rtx_insn *' in
emitted function prototype.
(output_internal_insn_latency_func): Ditto.  Simplify.
(output_internal_maximal_insn_latency_func): Ditto.  Delete
always-unused argument.
(output_insn_latency_func): Ditto.
(output_maximal_insn_latency_func): Ditto.

diff --git a/gcc/genattr.c b/gcc/genattr.c
index 656a9a7..77380e7 100644
--- a/gcc/genattr.c
+++ b/gcc/genattr.c
@@ -240,11 +240,11 @@ main (int argc, const char **argv)
   printf ("/* Insn latency time on data consumed by the 2nd insn.\n");
   printf ("   Use the function if bypass_p returns nonzero for\n");
   printf ("   the 1st insn. */\n");
-  printf ("extern int insn_latency (rtx, rtx);\n\n");
+  printf ("extern int insn_latency (rtx_insn *, rtx_insn *);\n\n");
   printf ("/* Maximal insn latency time possible of all bypasses for this 
insn.\n");
   printf ("   Use the function if bypass_p returns nonzero for\n");
   printf ("   the 1st insn. */\n");
-  printf ("extern int maximal_insn_latency (rtx);\n\n");
+  printf ("extern int maximal_insn_latency (rtx_insn *);\n\n");
   printf ("\n#if AUTOMATON_ALTS\n");
   printf ("/* The following function returns number of alternative\n");
   printf ("   reservations of given insn.  It may be used for better\n");
@@ -290,8 +290,8 @@ main (int argc, const char **argv)
   printf ("state_transition should return negative value for\n");
   printf ("the insn and the state).  Data dependencies between\n");
   printf ("the insns are ignored by the function.  */\n");
-  printf
-   ("extern int min_insn_conflict_delay (state_t, rtx, rtx);\n");
+  printf ("extern int "
+  "min_insn_conflict_delay (state_t, rtx_insn *, rtx_insn *);\n");
   printf ("/* The following function outputs reservations for given\n");
   printf ("   insn as they are described in the corresponding\n");
   printf ("   define_insn_reservation.  */\n");
diff --git a/gcc/genautomata.c b/gcc/genautomata.c
index dcde604..92c8b5c 100644
--- a/gcc/genautomata.c
+++ b/gcc/genautomata.c
@@ -8113,14 +8113,10 @@ output_internal_trans_func (void)
 
 /* Output code
 
-  if (insn != 0)
-{
-  insn_code = dfa_insn_code (insn);
-  if (insn_code > DFA__ADVANCE_CYCLE)
-return code;
-}
-  else
-insn_code = DFA__ADVANCE_CYCLE;
+  gcc_checking_assert (insn != 0);
+  insn_code = dfa_insn_code (insn);
+  if (insn_code >= DFA__ADVANCE_CYCLE)
+return code;
 
   where insn denotes INSN_NAME, insn_code denotes INSN_CODE_NAME, and
   code denotes CODE.  */
@@ -8129,21 +8125,12 @@ output_internal_insn_code_evaluation (const char 
*insn_name,
  const char *insn_code_name,
  int code)
 {
-  fprintf (output_file, "\n  if (%s == 0)\n", insn_name);
-  fprintf (output_file, "%s = %s;\n\n",
-  insn_code_name, ADVANCE_CYCLE_VALUE_NAME);
-  if (collapse_flag)
-{
-  fprintf (output_file, "\n  else if (%s == const0_rtx)\n", insn_name);
-  fprintf (output_file, "%s = %s;\n\

_Bool and trap representations

2016-06-07 Thread Alexander Cherepanov


Hi!

If a variable of type _Bool contains something different from 0 and 1 
its use amounts to UB in gcc and clang. There is a couple of examples in 
[1] ([2] is also interesting).


[1] https://github.com/TrustInSoft/tis-interpreter/issues/39
[2] https://github.com/TrustInSoft/tis-interpreter/issues/100

But my question is about the following example:

--
#include 

int main()
{
  _Bool b;
  *(char *)&b = 123;
  printf("%d\n", *(char *)&b);
}
--

Results:

--
$ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
123

$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
1
--

gcc version: gcc (GCC) 7.0.0 20160604 (experimental)

It seems that padding in _Bool is treated as permanently unspecified. Is 
this behavior intentional? What's the theory behind it?


One possible explanations is C11, 6.2.6.2p1, which reads: "The values of 
any padding bits are unspecified." But it's somewhat a stretch to 
conclude from it that the values of padding bits cannot be specified 
even with explicit assignment.


Another possible approach is to refer to Committee Response for Question 
1 in DR 260 which reads: "Values may have any bit-pattern that validly 
represents them and the implementation is free to move between alternate 
representations (for example, it may normalize pointers, floating-point 
representations etc.). [...] the actual bit-pattern may change without 
direct action of the program."


Is similar behavior expected from other types of padding (padding in 
long double, padding bytes/bits in structs/unions) in the future? Maybe 
even normalization of pointers (randomly aligning misaligned pointers)?


--
Alexander Cherepanov

Re: _Bool and trap representations

2016-06-08 Thread Alexander Cherepanov


On 2016-06-08 10:29, Richard Biener wrote:

On Wed, Jun 8, 2016 at 8:36 AM, Alexander Cherepanov
 wrote:

[skip]

But my question is about the following example:

--
#include 

int main()
{
   _Bool b;
   *(char *)&b = 123;
   printf("%d\n", *(char *)&b);
}
--

Results:

--
$ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
123

$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
1
--

[skip]

Another explanation is that this is a bug.  It manifests itself at the time
we re-write 'b' into SSA form, disregarding the fact that we access it
via a type that while matching in size does not match in precision.


Oh, that's much more boring outcome:-)


Can you open a bugreport?


Sure, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71452 .

--
Alexander Cherepanov

Re: _Bool and trap representations

2016-06-13 Thread Alexander Cherepanov


On 2016-06-08 17:37, Martin Sebor wrote:

On 06/08/2016 12:36 AM, Alexander Cherepanov wrote:

Hi!

If a variable of type _Bool contains something different from 0 and 1
its use amounts to UB in gcc and clang. There is a couple of examples in
[1] ([2] is also interesting).

[1] https://github.com/TrustInSoft/tis-interpreter/issues/39
[2] https://github.com/TrustInSoft/tis-interpreter/issues/100

But my question is about the following example:

--
#include 

int main()
{
   _Bool b;
   *(char *)&b = 123;
   printf("%d\n", *(char *)&b);
}
--

Results:

--
$ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
123

$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
1
--

gcc version: gcc (GCC) 7.0.0 20160604 (experimental)


Similar example with long double:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71522


It seems that padding in _Bool is treated as permanently unspecified. Is
this behavior intentional? What's the theory behind it?

One possible explanations is C11, 6.2.6.2p1, which reads: "The values of
any padding bits are unspecified." But it's somewhat a stretch to
conclude from it that the values of padding bits cannot be specified
even with explicit assignment.

Another possible approach is to refer to Committee Response for Question
1 in DR 260 which reads: "Values may have any bit-pattern that validly
represents them and the implementation is free to move between alternate
representations (for example, it may normalize pointers, floating-point
representations etc.). [...] the actual bit-pattern may change without
direct action of the program."


There has been quite a bit of discussion among the committee on
this subject lately (the last part is the subject of DR #451,
though it's discussed in the context of uninitialized objects
with indeterminate values).


Are there notes from these discussions or something?


I would hesitate to call it
consensus but I think it would be fair to say that the opinion
of the vocal majority is that implementations aren't intended
to spontaneously change valid (i.e., determinate) representations
of objects in the absence of an access to the value of the object.


Thanks for the info. IMHO this part of DR 260 has even more serious 
consequences than the part about pointer provenance. It effectively 
prohibits manual byte-by-byte (or any non-atomic) copying of objects for 
types like long double. If an implementation decides to normalize a 
value in a variable during copying it will see an inconsistent 
representation, e.g. a trap representation. It's a sure way to get total 
garbage. I don't know if allowing implementations to normalize values is 
useful but the current language in DR 260 allows too much.


As for valid/determinate representation this is another place where 
distinction between a value and a representation is worth stressing. 
Uninitialized variables are a clear case -- both its value and 
representation are indeterminate. But what if we set some part of 
representation of a variable -- it doesn't yet have a determinate value 
but we want the part that we have set to be preserved. Another 
interesting example is a pointer after free() -- its representation is 
kinda determinate but its value is indeterminate.


--
Alexander Cherepanov

Re: _Bool and trap representations

2016-06-13 Thread Alexander Cherepanov


On 2016-06-13 22:51, Joseph Myers wrote:

On Mon, 13 Jun 2016, Alexander Cherepanov wrote:


Thanks for the info. IMHO this part of DR 260 has even more serious
consequences than the part about pointer provenance. It effectively prohibits
manual byte-by-byte (or any non-atomic) copying of objects for types like long
double. If an implementation decides to normalize a value in a variable during
copying it will see an inconsistent representation, e.g. a trap
representation. It's a sure way to get total garbage. I don't know if allowing


No, that's not the case; even if representations can change during
byte-by-byte copying, such copying of long double values is *still* safe.
All long double values for x86 long double have exactly one valid
representation in the value bits, and if the padding bits change during
copying it doesn't matter; it's only representations that are already trap
representations (unnormals, pseudo-* etc.) that might be interpreted
inconsistently.


The problem is that parts of representations of two different ordinary 
values can form a trap representation.


Suppose x = 1.0 and y = 0.0, i.e. they have the following 
representations (from high bytes to low bytes):


padding  signint & frac
 & exp
   |---| |---| |-|
x: 00 00 00 00 00 00 3f ff 80 00 00 00 00 00 00 00
y: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Suppose that we copy from x to y byte-by-byte starting from high bytes. 
And suppose the normalization kicks in after copying 8 bytes. We have 
already copied the sign and the exponent but haven't yet overwritten the 
'Integer' bit of Significand so we have the following representation:


z: 00 00 00 00 00 00 3f ff 00 00 00 00 00 00 00 00

This is an unnormal and current gcc normalization converts it into 0.0 
throwing the exponent away. Copying the remaining 8 bytes leads to a 
pseudo-denormal:


w: 00 00 00 00 00 00 00 00 80 00 00 00 00 00 00 00

But this is already a minor detail.

The code to see how gcc normalizes 'z':

--
#include 
#include 

int main()
{
  long double d0, d;

  memcpy(&d0, 
"\x00\x00\x00\x00\x00\x00\x00\x00\xff\x3f\x00\x00\x00\x00\x00\x00", 
sizeof d0);

  d = d0;

  printf("d = %Lf\n", d);
  for (unsigned char *p = (unsigned char *)&d + sizeof d; p > (unsigned 
char *)&d;)

printf("%02x ", *--p);
  printf("\n");
}
--

Results:

--
$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
d = 0.00
00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00
--

gcc version: gcc (GCC) 7.0.0 20160613 (experimental)

--
Alexander Cherepanov

Re: _Bool and trap representations

2016-06-13 Thread Alexander Cherepanov


On 2016-06-14 00:13, Joseph Myers wrote:

On Tue, 14 Jun 2016, Alexander Cherepanov wrote:


The problem is that parts of representations of two different ordinary values
can form a trap representation.


Oh, you're talking about normalizing the destination rather than the
source of the copy?


Yes.

I don't see this problem with a current gcc so the problem is 
hypothetical AFAICT.


--
Alexander Cherepanov

Re: _Bool and trap representations

2016-06-17 Thread Alexander Cherepanov


On 2016-06-15 17:15, Martin Sebor wrote:

There has been quite a bit of discussion among the committee on
this subject lately (the last part is the subject of DR #451,
though it's discussed in the context of uninitialized objects
with indeterminate values).


Are there notes from these discussions or something?


Notes from discussions during committee meetings are in the meeting
minutes that are posted along with other committee documents on the
public site.   Those that relate to aspects of defect reports are
usually captured in the Committee Discussion and Committee Response
to the DR.  Other than that, committee discussions that take place
on the committee mailing list (such as the recent ones on this topic)
are archived for reference of committee members (unlike C++, the C
archives are not intended to be open to the public).


So it seems the discussion you referred to is not public, that's 
unfortunate.


And to clarify what you wrote about stability of valid representations, 
is padding expected to be stable when it's not specifically set? I.e. is 
the following optimization supposed to be conforming or not?


Source code:

--
#include 

int main(int argc, char **argv)
{
  (void)argv;

  struct { char c; int i; } s = {0, 0};

  printf("%d\n", argc ? ((unsigned char *)&s)[1] : 5);
  printf("%d\n", argc ? ((unsigned char *)&s)[1] : 7);
}
--

Results:

--
$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
5
7
--

gcc version: gcc (GCC) 7.0.0 20160616 (experimental)

Of course, clang does essentially the same but the testcase is a bit 
more involved (I can post it if somebody is interested). OTOH clang is 
more predictable in this area because rules for dealing with undefined 
values in llvm are more-or-less documented -- 
http://llvm.org/docs/LangRef.html#undefined-values .


I don't see gcc treating padding in long double as indeterminate in the 
same way as in structs but clang seems to treat them equally.


--
Alexander Cherepanov

Re: [RFD] Extremely large alignment of read-only strings

2016-07-28 Thread Alexander Monakov

On Wed, 27 Jul 2016, Thorsten Glaser wrote:

First of all, I think option -malign-data=abi (new in GCC 5) addresses your
need: it can be used to reduce the default (excessive) alignment to just the
psABI-dictated value (you can play with this at https://gcc.godbolt.org even if
you can't install GCC-5 locally).

Note that like with other ABI-affecting options you need to consider
implications for linking with code you're not building yourself: if the other
code expects bigger alignment, you'll have a bug.

One comment to your email below.

> After some (well, lots) more debugging, I eventually
> discovered -fdump-translation-unit (which, in the version
> I was using, also worked for C, not just C++), which showed
> me that the alignment was 256 even (only later reduced to
> 32 as that’s the maximum alignment for i386).

Most likely the quoted figures from GCC dumps are in bits, not bytes.

HTH
Alexander

Re: [libgomp] No references to env.c -> no libgomp construction

2016-11-29 Thread Alexander Monakov

Hello,

On Tue, 29 Nov 2016, Sebastian Huber wrote:
> * env.c: Split out ICV definitions into...
> * icv.c: ...here (new file) and...
> * icv-device.c: ...here. New file.
> 
> the env.c contains now only local symbols (at least for target *-rtems*-*):
> 
[...]
> 
> Thus the libgomp constructor is not linked in into executables.

Thanks for the report.  This issue affects only static libgomp.a (and not on
NVPTX where env.c is deliberately empty).

I think the minimal solution here is to #include  from icv.c instead of
compiling it separately (using <> inclusion rather than "" so in case of NVPTX
we pick up the empty config/nvptx/env.c from toplevel icv.c).

A slightly more involved but perhaps a preferable approach is to remove
config/nvptx/env.c, introduce LIBGOMP_OFFLOADED_ONLY macro, and use it to
guard inclusion of env.c from icv.c (which then can use the #include "env.c"
form).

Thanks.
Alexander

Aliasing of arrays

2016-11-30 Thread Alexander Cherepanov


Hi!

Pascal Cuoq communicated to me the following example:

int ar1(int (*p)[3], int (*q)[3])
{
  (*p)[0] = 1;
  (*q)[1] = 2;
  return (*p)[0];
}

gcc of versions 4.9.2 and 7.0.0 20161129 optimize it with -O2 on the 
premise that elements with different indices don't alias:


 :
   0:   c7 47 0c 01 00 00 00movl   $0x1,0xc(%rdi)
   7:   b8 01 00 00 00  mov$0x1,%eax
   c:   c7 46 10 02 00 00 00movl   $0x2,0x10(%rsi)
  13:   c3  retq

That's fine. But then I expect that gcc will also assume that arrays of 
different known lengths don't alias, i.e. that gcc will optimize this 
example:


int ar2(int (*p)[8], int (*q)[7]) {
  (*p)[3] = 1;
  (*q)[3] = 2;
  return (*p)[3];
}

But this is not optimized:

0020 :
  20:   c7 47 0c 01 00 00 00movl   $0x1,0xc(%rdi)
  27:   c7 46 0c 02 00 00 00movl   $0x2,0xc(%rsi)
  2e:   8b 47 0cmov0xc(%rdi),%eax

Is this behavior fully intentional, is the first example optimized too 
aggressively, is an optimization missed in the second example, or is the 
situation more complex?


--
Alexander Cherepanov

Re: How to avoid constant propagation into functions?

2016-12-07 Thread Alexander Monakov

[adding gcc@ for the compiler-testsuite-related discussion, please drop either
gcc@ or gcc-help@ from Cc: as appropriate in replies]

On Wed, 7 Dec 2016, Segher Boessenkool wrote:
> > For example, this might have impact on writing test for GCC:
> > 
> > When I am writing a test with noinline + noclone then my
> > expectation is that no such propagation happens, because
> > otherwise a test might turn trivial...
> 
> The usual ways to prevent that are to add some volatile, or an
> asm("" : "+g"(some_var));   etc.

No, that doesn't sound right.  As far as I can tell from looking that the GCC
testsuite, the prevailing way is actually the noinline+noclone combo, not the
per-argument asms or volatiles.

This behavior is new in gcc-7 due to new IPA-VRP functionality. So -fno-ipa-vrp
gets the old behavior.  I think from the testsuite perspective the situation got
a bit worse due to this, as now in existing testcases stuff can get propagated
where the testcase used noinline+noclone to suppress propagation. This means
that some testcases may get weaker and no longer test what they were supposed
to.  And writing new testcases gets less convenient too.

However, this actually demonstrates how the noinline+noclone was not
future-proof, and in a way backfired now.  Should there be, ideally, a single
'noipa' attribute encompassing noinline, noclone, -fno-ipa-vrp, -fno-ipa-ra and
all future transforms using inter-procedural knowledge?

Alexander

Re: How to avoid constant propagation into functions?

2016-12-07 Thread Alexander Monakov

On Wed, 7 Dec 2016, Richard Biener wrote:
> >Agreed, that's what I've been using in the past for glibc test cases.
> >
> >If that doesn't work, we'll need something else.  Separate compilation
> >of test cases just to thwart compiler optimizations is a significant
> >burden, and will stop working once we have LTO anyway.
> >
> >What about making the function definitions weak?  Would that be more
> >reliable?
> 
> Adding attribute((used)) should do the trick.  It introduces unknown callers 
> and thus without cloning disables IPA.

Hm, depending on the case I think this may be not enough: it thwarts IPA on the
callee side, but still allows the compiler to optimize the caller: for example,
deduce that callee is pure/const (in which case optimizations in the caller may
cause it to be called fewer times than intended or never at all), apply IPA-RA, 
or
(perhaps in future) deduce that the callee always returns non-NULL and optimize
the caller accordingly.  I think attribute-weak works to suppress IPA on the
caller side, but that is not a good solution because it also has an effect on
linking semantics, may be not available on non-ELF platforms, etc.

Alexander

Re: How to avoid constant propagation into functions?

2016-12-09 Thread Alexander Monakov

On Fri, 9 Dec 2016, Richard Biener wrote:
> Right, 'used' thwarts IPA on the callee side only.  noclone and noinline are
> attributes affecting the caller side but we indeed miss attributes for the
> properties you mention above.  I suppose adding a catch-all attribute for
> caller side effects (like we have 'used' for the callee side) would be a good
> idea.

For general uses, i.e. for testcases that ought to be portable across different
compilers, I believe making a call through a volatile pointer already places a
sufficient compiler barrier to prevent both caller- and callee-side analysis.
That is, where you have

  int foo(int);
  foo(arg);

you could transform it to

  int foo(int);
  int (* volatile vpfoo)(int) = foo;
  vpfoo(arg);

While this also has an effect of forcing the call to be indirect, I think
usually that should be acceptable.

But for uses in the gcc testsuite, I believe an attribute is still needed.

Alexander

Re: [RFC] noipa attribute (was Re: How to avoid constant propagation into functions?)

2016-12-15 Thread Alexander Monakov

On Thu, 15 Dec 2016, Jakub Jelinek wrote:
> So here is a proof of concept of an attribute that disables inlining,
> cloning, ICF, IPA VRP, IPA bit CCP, IPA RA, pure/const/throw discovery.
> Does it look reasonable?  Anything still missing?

I'd like to suggest some additions to the extend.texi entry:

> --- gcc/doc/extend.texi.jj2016-12-15 11:26:07.0 +0100
> +++ gcc/doc/extend.texi   2016-12-15 12:19:32.738996533 +0100
> @@ -2955,6 +2955,15 @@ asm ("");
>  (@pxref{Extended Asm}) in the called function, to serve as a special
>  side-effect.
>  
> +@item noipa
> +@cindex @code{noipa} function attribute
> +Disable interprocedural optimizations between the function with this
> +attribute and its callers, as if the body of the function is not available
> +when optimizing callers and the callers are unavailable when optimizing
> +the body.  This attribute implies @code{noinline}, @code{noclone},
> +@code{no_icf} and @code{used} attributes and in the future might
> +imply further newly added attributes.

1. I believe the last sentence should call out that the effect of this attribute
is not reducible to just the existing attributes, because suppression of IPA-RA
and pure/const discovery is not expressible that way, and that is actually
intended.  Can this be added to clarify the intent:

  However, this attribute is not equivalent to a combination of other
  attributes, because its purpose is to suppress existing and future
  optimizations employing interprocedural analysis, including those that
  do not have an attribute suitable for disabling them individually.

(and perhaps remove ' ... and in the future might imply ...' from the quoted
snippet, because the clarification makes it partially redundant)

2. Can we gently suggest to readers of documentation that this was invented for
use in the GCC testsuite, and encourage them to seek proper alternatives, e.g.:

  This attribute is exposed for the purpose of testing the compiler.  In general
  it should be preferable to properly constrain code generation using the
  language facilities: for example, using separate compilation or calling
  through a volatile pointer achieves a similar effect in a portable way
  [ except in case of a sufficiently advanced compiler indistinguishable from an
  adversary ;) ]

Thanks.
Alexander

LTO remapping/deduction of machine modes of types/decls

2016-12-30 Thread Alexander Monakov

Hello, Richard, Jakub, community,

May I join/restart the old discussion about machine mode remapping at LTO
stream-in time.  To recap, when offloading to NVPTX was introduced, there
was a problem due to differences in the set of supported modes (e.g. there
was no 'XFmode' on NVPTX that would correspond to 'long double' tree type
node in GIMPLE LTO streams produced by x86 host compiler).

The current solution in GCC is to additionally stream a 'mode table' and use it
to remap numeric mode identifiers during LTO stream-in in all trees that have
modes.  This is the solution initially outlined by Jakub in the message
https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00226.html .  In response to that,
Richard said,

> I think (also communicated that on IRC) we should instead try not streaming
> machine-modes at all but generating them at stream-in time via layout_type
> or layout_decl.

and later in the thread also:

> I'm just looking for a way to make this less of a hack (and the LTO IL
> less target dependent).  Not for GCC 5 for which something like your
> patch is probably ok, but for the future.

Now that we're in the future, I've asked Vlad Ivanishin (Cc'ed) to try and
implement Richard's approach.  The motivation is enhancing LTO for offloaded
code, in particular to expose library code for inlining.  In that scenario,
the current scheme has a problem that WPA can arbitrarily mix LTO sections
coming from libraries (where the modes don't need remapping) and LTO sections
produced by the host compiler.  Thus, mode_table would need to be only
selectively applied during stream-in, based on the origin of the section.  And,
we'd need to ensure that WPA duplicates mode tables across all ltrans units.

In light of that, I felt that trying Richard's approach would be proper.
Actually, I don't know why gimple/tree representation carries machine modes in
the first place; it seems to be redundant information deducible from type
information.

Vlad's current patch is adding mode deduction for types and decls, matches the
deduced mode against the streamed-in mode, and ICEs in case of mismatch. To be
clear, he's checking this for native LTO via lto-bootstrap, but nevertheless
it's a nice way of giving confidence that mode inference works as intended.

This seems to be fine for C, but in C++ we are seeing some hard-to-explain cases
where the deduced BLKmode for 7-byte-sized/4-byte-aligned base-class decl is
mismatching against deduced DImode.  The DImode would be obviously correct for
8-byte-sized decl of the corresponding type, but the base class decl does not
have 1 byte of padding in the tail.  What's worse, the issue is just for the
mode of the decl: the mode of the type is BLKmode, as we'd expect.

Unfortunately, just adjusting the C++ frontend to place BLKmode on the decl too
doesn't lead to immediate success, because decl modes have implications for
debug info generation, and the compiler starts ICE'ing there instead.

So we're hitting under-documented places in the compiler here, and personally
I don't have the confidence to judge how they're intended to work.

Basically for now my questions are:

1. Is there an intended invariant that decl modes should match type modes? It
appears that if there was, the above situation with C++ base objects would be a
violation.

2. Do you think we should continue digging in this direction?

I'm not sure how much it'd help a discussion, but for completeness Vlad's
current patchset is provided as attachments. Patch 1/3 adds mode inference for
types (only), patch 2 just reverts Jakub's additions of mode_table handling,
and finally patch 3 adds mode inference for decls, adds checking against
streamed-in modes, and shows where the attempted adjustments in the C++ frontend
and debug info generation were.  There are a few coding style violations; sorry;
I hope they are not too distracting.

Thanks.
AlexanderFrom 58ad9d4d75cbc057c003c701ff3f0e6b8fa35e39 Mon Sep 17 00:00:00 2001
From: Vladislav Ivanishin 
Date: Tue, 13 Dec 2016 14:58:26 +0300
Subject: [PATCH 1/3] Infer modes from types after LTO streaming-in

* gcc/lto/lto.c: New function lto_infer_mode () which calls ...
* gcc/stor-layout.c: ... the new function set_mode_for_type ().
* gcc/stor-layout.h: Declare set_mode_for_type ().
---
 gcc/lto/lto.c |  20 +
 gcc/stor-layout.c | 127 ++
 gcc/stor-layout.h |   2 +
 3 files changed, 149 insertions(+)

diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 6718fbbe..cec54e3 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -1656,6 +1656,25 @@ unify_scc (struct data_in *data_in, unsigned from,
   return unified_p;
 }
 
+static void
+lto_infer_mode (tree type)
+{
+  if (!TYPE_P (type))
+return;
+
+  if (!COMPLETE_TYPE_P (type) && TYPE_MODE (type) == VOIDmode)
+return;
+
+  /* C++ FE has complex logic for laying out classes.  We don't have
+ the information here to reproduce the decision process (nor do we
+ w

Re: LTO remapping/deduction of machine modes of types/decls

2017-01-02 Thread Alexander Monakov

On Mon, 2 Jan 2017, Jakub Jelinek wrote:
> In my view mode is essential part of the type system.  It (sadly, but still)
> participates in many ABI decisions, but more importantly especially for
> floating point types it is the main source of information of what the type
> actually is, as just size and precision are nowhere near enough.
> The precision/size isn't able to carry information like whether the type is
> decimal or binary floating, what padding it has and where, what NaN etc.
> conventions it uses.  So trying to throw away modes and reconstruct them
> looks conceptually wrong to me.

I wonder if it's possible to have a small tag in tree types to distinguish
between binary/decimal/fixed-point types.  For prototyping, I was thinking
about just looking at the type name, because non-ieee-binary types have an
easily recognizable prefix.

For padding and NaN variability, can you point me on which targets the modes
affect that? The "Machine Modes" chapter in the documentation doesn't give a
hint (IFmode/KFmode are not documented there either).

Alternatively, is reconstructing all modes necessary in the first place? On
tree level GCC has explicit trees for all fundamental types like
float_type_node. Is it possible to remap just those trees? Modes of composite
types should be deducible, and modes of decls may be deducible from their types
(not sure; why do decls have modes separately from their types, anyway?)

> One can also just use
> float __attribute__((mode (XFmode))) or float __attribute__((mode (TFmode)))
> or float __attribute__((mode (KFmode))) or IFmode etc., how do you want to
> differentiate between those?  And I don't see how this can help with the
> long double stuff for NVPTX offloading.  If user uses 80-bit long double
> (or mode(XFmode) floats/doubles) in his source, then as PTX only has SFmode
> and DFmode (perhaps also HFmode?), the only way to get it working is through
> emulation (whether soft-fp, or writing some emulation using double,
> whatever).  Pretending long double on the host is DFmode on the PTX side
> just won't work, they have different representation.

(yes, PTX spec has half floats, but GCC doesn't implement those on PTX today,
and thus doesn't have HFmode now)

For attribute-mode, I wasn't aware of KFmode/IFmode stuff; wherever the modes
affect semantics without leaving any other trace in the type, leaving out the
mode loses information. So either one keeps the modes, or adds sufficient
tagging in the type tree.

For long double, I think offloading to PTX should have the following semantics:
size/alignment of long double matches those on host. Otherwise, storage layout
of composite types won't match, and that's really bad. But otherwise long double
is the same as double on PTX (so for offloading from x86-64 it has 64 bits of
padding). This means that long double data is not transferable between
accelerator and host, but otherwise gives the most sane semantics I can imagine.
I think this implies that XFmode/TFmode don't need to exist on NVPTX to back
long_double_type_node.

Thanks.
Alexander

Re: LTO remapping/deduction of machine modes of types/decls

2017-01-02 Thread Alexander Monakov

On Mon, 2 Jan 2017, Jakub Jelinek wrote:
> If the host has long double the same as double, sure, PTX can use its native
> DFmode even for long double.  But otherwise, the storage must be
> transferable between accelerator and host.

Hm, sorry, the 'must' is not obvious to me: is it known that the OpenMP ARB
would find only this implementation behavior acceptable?

Apart from floating-point types, are there other situations where modes carry
information not deducible from the rest of the tree node?

Thanks.
Alexander

Re: LTO remapping/deduction of machine modes of types/decls

2017-01-02 Thread Alexander Monakov

On Mon, 2 Jan 2017, Jakub Jelinek wrote:
> On Mon, Jan 02, 2017 at 09:49:55PM +0300, Alexander Monakov wrote:
> > On Mon, 2 Jan 2017, Jakub Jelinek wrote:
> > > If the host has long double the same as double, sure, PTX can use its 
> > > native
> > > DFmode even for long double.  But otherwise, the storage must be
> > > transferable between accelerator and host.
> > 
> > Hm, sorry, the 'must' is not obvious to me: is it known that the OpenMP ARB
> > would find only this implementation behavior acceptable?
> 
> long double is not non-mappable type in the spec, so it is supposed to work.
> The implementation may choose not to offload whenever it sees long
> double/__float128/_Float128/_Float128x etc.

But this is not something the implementation can properly enforce; consider

  long double v;
  char buf[sizeof v];
  #pragma omp target map(from:buf)
sscanf ("1.0", "%Lf", buf);
  memcpy(&v, buf, sizeof v);

The offloading compiler wouldn't see a 'long double' anywhere, it gets brought
in at linking stage. So the implementation would have to tag individual
translation units and see only in the end of linking if the offloaded image
touches a long double datum anywhere. And as the example shows, it would prevent
using printf-like functions.

Alexander

Re: TARGET_MACRO_FUSION_PAIR for something besides compare-and-branch ?

2014-05-28 Thread Alexander Monakov

On Wed, 28 May 2014, Kyrill Tkachov wrote:

> Hi all,
> 
> The documentation for TARGET_MACRO_FUSION_PAIR says that it can be used to
> tell the scheduler that two insns should not be scheduled apart. It doesn't
> specify what kinds of insns those can be.
> 
> Yet from what I can see in sched-deps.c it can only be used on compares and
> conditional branches, as implemented in i386.

Please note that it's not only restricted to conditional branches, but also to
keeping the instructions together if they were consecutive in the first place
(i.e. it does not try to move a compare insn closer to the branch).

Doing it that way allowed to solve the issue at hand at that time without a
separate scan of the whole RTL instruction stream.

> Say I want to specify two other types of instruction that I want to force
> together, would it be worth generalising the TARGET_MACRO_FUSION_PAIR usage
> to achieve that?

I'd say yes, but that would be the least of the problems; the more important
question is how to trigger the hook (you probably want to integrate it into
the existing scheduler dependencies evaluation loop rather than adding a new
loop just to discover macro-fusable pairs).  You'll also have to invent
something new if you want to move non-consecutive fusable insns together if
they are apart.

HTH.
Alexander

clang 3.4.1 (and 3.3) compilation failed with gcc 4.7.4

2014-06-27 Thread Alexander Pyhalov

and/components/clang/build/i86/tools/clang/tools/clang-check/Release+Asserts/ClangCheck.o
ld: fatal: symbol referencing errors. No output written to 
/export/home/alp/srcs/tests/oi-userland/components/clang/build/i86/Release+Asserts/bin/clang-check


Undefined   first referenced
 symbol in file

Can someone help me to debug this issue?
--
Best regards,
Alexander Pyhalov,
system administrator of Computer Center of Southern Federal University

Question about sysroot and fixincludes

2014-07-16 Thread Alexander Ivchenko

Hi, I have a question about sysroot and fixincludes.

On Android there are different API levels (like android-9, android-10
etc) that match different versions of OS. Gcc from NDK is configured
using sysroot for android-9 and the convenient way for compiling for,
say, android-19 was by providing the sysroot to android-19 as a
command line option (--sysroot).

However, the header from the sysroot with which gcc was configured
could be "fixincluded", and, when I provide a different sysroot as a
command line option, "fixincluded" header could replace the actual
header from the specified sysroot - that is the root-cause of certain
problems.

Should search in 'include-fixed' be disabled when sysroot command line
option is specified?

--Alexander

locale support on illumos (Solaris)

2014-08-14 Thread Alexander Pyhalov


Hello.
Is there any way to make std::locale work on illumos?
I see the following bug report 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=15992 and it is

still actual for gcc 4.8.3. (Don't know why it's marked "INVALID").

--
System Administrator of Southern Federal University Computer Center

Trying to fix #61880, what characters are valid in assembler/symbol names

2014-08-15 Thread Alexander Shopov

Hi everyone,

I am trying to fix #61880
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61880 but will need some
guidance as I am a complete newbie.

The problem is concerns gccgo and the way the binaries it generates
link with the rest of the objects.

I have given a really tiny test case in the bug, but the relevant part is this:

We try to use a function written in Go from C. Thus we have something
written in Go that is compiled, something written in C that is
compiled and them gluing them together with the layer provided with
the cgo tool.

Symbol names from the different parts need to be the same in order for
the linker to match them. The names do not match.

I am not sure which side of the naming the problem stands - i.e. if
A!=B, should we change A, B or both to match.

The symbol/assembler name (I do not know what is the right word) that
is generated in the .h header file is:
extern int Cgoexp_Dummy (int p0)
__asm__("cgo_problem_example_com_demo.Cgoexp_Dummy");
-^

The symbol that exists in the object file is:
cgo_problem_example.com_demo.Cgoexp_Dummy
---^

Thus we can either fix
__asm__("cgo_problem_example_com_demo.Cgoexp_Dummy") to be
__asm__("cgo_problem_example_com.demo.Cgoexp_Dummy")
or fix the object name to be cgo_problem_example_com_demo.Cgoexp_Dummy

This probably depends on which characters are valid in symbol names.
If having two or more '.' is OK then I could patch the cgo tool that
generates the __asm__ part
If dots have special meaning or there is a custom mangling that has to
be considered - I would need to find the part of gcc that issues the
object names (which I still cannot identify, any guides welcome)

Kind regards:
al_shopov

Re: ASAN test failures make compare_tests useless

2014-08-18 Thread Alexander Potapenko

Not sure I understand what the problem is. Responded inline.

On Mon, Aug 18, 2014 at 9:43 AM, Yury Gribov  wrote:
> On 08/18/2014 09:42 AM, Yury Gribov wrote:
>>
>> On 08/16/2014 04:37 AM, Manuel López-Ibáñez wrote:
>>>
>>> On the compile farm, ASAN tests seem to fail a lot like:
>>>
>>> FAIL: c-c++-common/asan/global-overflow-1.c   -O0  output pattern
>>> test, is ==31166==ERROR: AddressSanitizer failed to allocate
>>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
>>> 12)
>>> ==31166==ReserveShadowMemoryRange failed while trying to map
>>> 0xdfff0001000 bytes. Perhaps you're using ulimit -v
>>> , should match READ of size 1 at 0x[0-9a-f]+ thread T0.*(
Sounds like the tests do not even start up properly. No mmap failures
should be reported.

>>> The problem is that those addresses and sizes are very random,
The output pattern that must be printed has these addresses masked out
(note "0x[0-9a-f]+" in your report).
No other lines with varying addresses should be printed.

>>> so when
>>> I compare the test results of a pristine trunk with a patched one, I
>>> get:
>>>
>>> New tests that FAIL:
>>>
>>> unix//-m64: c-c++-common/asan/global-overflow-1.c   -O0  output
>>> pattern test, is ==12875==ERROR: AddressSanitizer failed to allocate
>>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
>>> 12)
>>> unix//-m64: c-c++-common/asan/global-overflow-1.c   -O0  output
>>> pattern test, is ==18428==ERROR: AddressSanitizer failed to allocate
>>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
>>> 12)
>>> [... hundreds of ASAN tests that failed...]
>>>
>>> Old tests that failed, that have disappeared: (Eeek!)
>>>
>>> unix//-m64: c-c++-common/asan/global-overflow-1.c   -O0  output
>>> pattern test, is ==30142==ERROR: AddressSanitizer failed to allocate
>>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
>>> 12)
>>> unix//-m64: c-c++-common/asan/global-overflow-1.c   -O0  output
>>> pattern test, is ==31166==ERROR: AddressSanitizer failed to allocate
>>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
>>> 12)
>>> [... the same hundreds of tests that already failed before...]
>>>
>>> The above makes very difficult to identify failures caused by my patch.
>>>
>>> Can we remove the "==" part of the error? This way compare_tests
>>> will ignore the failures.
Am I understanding correctly that "==" in the test stdout has some
special meaning for compare_tests (whatever they are, I'm not really
familiar with GCC testing infrastructure)?
If so, this is quite a questionable choice (e.g. Valgrind also
prefixes the report lines with "==12345=="), and I don't see the point
in removing PIDs/addresses to please this script.


>>> Alternatively, I could patch compare_tests to sed out that part before
>>> comparing. Would that be acceptable?
>>>
>>> Cheers,
>>>
>>> Manuel.
>>>
>>
>> Added Sanitizer folks. Frankly it'd be cool if dumping PIDs and
>> addresses could be turned off.
>>
>
> Ok, this time actually added them.
>



-- 
Alexander Potapenko
Software Engineer
Google Moscow

Re: ASAN test failures make compare_tests useless

2014-08-18 Thread Alexander Potapenko

On Mon, Aug 18, 2014 at 9:42 AM, Yury Gribov  wrote:
> On 08/16/2014 04:37 AM, Manuel López-Ibáñez wrote:
>>
>> On the compile farm, ASAN tests seem to fail a lot like:
>>
>> FAIL: c-c++-common/asan/global-overflow-1.c   -O0  output pattern
>> test, is ==31166==ERROR: AddressSanitizer failed to allocate
>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
>> 12)
>> ==31166==ReserveShadowMemoryRange failed while trying to map
>> 0xdfff0001000 bytes. Perhaps you're using ulimit -v
>> , should match READ of size 1 at 0x[0-9a-f]+ thread T0.*(
>>
>> The problem is that those addresses and sizes are very random, so when
>> I compare the test results of a pristine trunk with a patched one, I
>> get:
>>
>> New tests that FAIL:
>>
>> unix//-m64: c-c++-common/asan/global-overflow-1.c   -O0  output
>> pattern test, is ==12875==ERROR: AddressSanitizer failed to allocate
>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
>> 12)
>> unix//-m64: c-c++-common/asan/global-overflow-1.c   -O0  output
>> pattern test, is ==18428==ERROR: AddressSanitizer failed to allocate
>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
>> 12)
>> [... hundreds of ASAN tests that failed...]
>>
>> Old tests that failed, that have disappeared: (Eeek!)
>>
>> unix//-m64: c-c++-common/asan/global-overflow-1.c   -O0  output
>> pattern test, is ==30142==ERROR: AddressSanitizer failed to allocate
>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
>> 12)
>> unix//-m64: c-c++-common/asan/global-overflow-1.c   -O0  output
>> pattern test, is ==31166==ERROR: AddressSanitizer failed to allocate
>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
>> 12)
>> [... the same hundreds of tests that already failed before...]
>>
>> The above makes very difficult to identify failures caused by my patch.
>>
>> Can we remove the "==" part of the error? This way compare_tests
>> will ignore the failures.
>>
>> Alternatively, I could patch compare_tests to sed out that part before
>> comparing. Would that be acceptable?
>>
>> Cheers,
>>
>> Manuel.
>>
>
> Added Sanitizer folks. Frankly it'd be cool if dumping PIDs and addresses
> could be turned off.
>

Could you please name a reason for that?
Doing so complicates the debugging of multi-process applications but
doesn't bring any obvious advantages.

-- 
Alexander Potapenko
Software Engineer
Google Moscow

Re: Trying to fix #61880, what characters are valid in assembler/symbol names

2014-08-19 Thread Alexander Shopov

>> __asm__("cgo_problem_example_com_demo.Cgoexp_Dummy");
>> cgo_problem_example.com_demo.Cgoexp_Dummy

> Normally the first name looks more right.
I suppose that the reason is that the dot character ('.') while
allowed is a bit more special than the rest, right?
OK I will keep this in mind.

> The go tool will be passing a -fgo-pkgpath option to gccgo and a -gccgopkgpath
> option to cgo.  You can use "go build -x" to see the exact commands being run.
Yes, I knew that but many thanks for reminding - it helped me locate
code in gcc right away.

>  First make sure that those options are the same.
I checked and they are the same.

>  They need to do the same thing.
They do not do the same thing.

The difference stems from these two chunks of code:

https://code.google.com/p/go/source/browse/src/cmd/cgo/out.go?name=go1.3.1#985
The definition of the 'clean' function in (p *Package)
gccgoSymbolPrefix() string

and

https://github.com/gcc-mirror/gcc/blob/gcc-4_9_0-release/gcc/go/gofrontend/gogo.cc#L250
The definition of std::string Gogo::pkgpath_for_symbol(const
std::string& pkgpath)

The second is allowing two more characters to appear as themselves in
the symbol names in addition to the a-z, A-Z, 0-9 ranges.
These are '.' and '$'.

'$' should be a rare case for the name of a directory/file, but '.' is
frequent. Additionally this
https://gcc.gnu.org/onlinedocs/gcc-4.9.1/gcc/Dollar-Signs.html
cautions against '$' - so if they are trouble it will be gcc again to
fix, not cgo.

I am sending a quick one liner patch via the code review:
https://codereview.appspot.com/125470043

Sorry for spamming with several change sets, it was only several
attempts that I found the sentence "A file may only belong to a single
active CL at a time." in http://golang.org/doc/contribute.html#tmp_11

I wanted to remove '$' and '_' from special treatment in an follow up patch.
'$' should be replaced by '_' as cgo does and '_' could be removed to
make the two functions: cgo's clear and Gogo's pkgpath_for_symbol
sintactically more similar for ease of maintenance.
But you can review and point it out there.

Kind regards:
al_shopov

Re: Branch taken rate of Linux kernel compiled with GCC 4.9

2015-01-13 Thread Alexander Monakov

On Tue, 13 Jan 2015, Pengfei Yuan wrote:
> I use perf with rbf88:k,rff88:k events (Haswell specific) to profile
> the taken rate of conditional branches in the kernel. Here are the
> results:
[...]
> 
> The results are very strange because all the taken rates are greater
> than 50%. Why not reverse the basic block reordering heuristics to
> make them under 50%? Is there anything wrong with GCC?

Your measurement includes the conditional branches at the end of loop bodies.
When loops iterate, those branches are taken, and it doesn't make sense to
reverse them.

HTH
Alexander

colors in `-fsanitize=bounds' and UBSAN_OPTIONS

2015-01-19 Thread Alexander Klimov

I compile and run my program inside Emacs (M-x compile) with 
gcc 5.0.0 20150112 using `-fsanitize=bounds'. Although 
this Emacs buffer is technically a tty, but it is `dumb' and does not 
handle colors properly.

The gcc info concerning `-fsanitize=address' hints that there is a 
parameter ASAN_OPTIONS and the referenced wiki of address-sanitizer 
says that this environment variable can be used to disable the colors. 
Unfortunately and very confusingly, this hint does not help with 
`-fsanitize=bounds', since it uses UBSAN_OPTIONS instead.

I guess the gcc info should be updated to mention UBSAN_OPTIONS.

It would be nice to port the terminal color capability detection code 
from gcc to sanitizer, since gcc itself colorizes the error messages 
properly: by default the colors are on in `xterm', and off in `dumb'.

-- 
Regards,
ASK

Contributing to GCC and question about PR64744

2015-01-23 Thread Alexander Basov

Dear GCC team,

I would like to contribute to the project.

I have a background in embedded systems programming, but few
experience in compiler development.

I'd like to try with fixing PR64744.
Would some one help me to understand what should be correct compilers
behaviour with an example below:

__attribute__((naked))
void foo()
{
  char [2] = {0};
};

Now gcc (trunk for aarch64 target) goes to ICE while compiling this code:
cc1 -O0 test.c

But I think that it should report something like: "local frame
unavailable (naked function?)"


Thanks in advance

-- 
Alexander

Re: Failure to dlopen libgomp due to static TLS data

2015-02-12 Thread Alexander Monakov

There's a pending patch for glibc that addresses this issue among others:
https://sourceware.org/ml/libc-alpha/2014-11/msg00469.html

([BZ#17090/17620/17621]: fix DTV race, assert, and DTV_SURPLUS Static TLS
limit)

Alexander

Re: Android native build of GCC

2015-02-15 Thread Alexander Monakov

> Given that info...and in spite of my aforementioned limited knowledge  I
> went back to take another look at the source and found this in
> libfakechroot.c
> 
> /bld/fakechrt/fakechroot-2.16 $ grep -C 4 dlsym src/libfakechroot.c
> /* Lazily load function */
> LOCAL fakechroot_wrapperfn_t fakechroot_loadfunc (struct fakechroot_wrapper * 
> w)
> {
> char *msg;
> if (!(w->nextfunc = dlsym(RTLD_NEXT, w->name))) {;
> msg = dlerror();
> fprintf(stderr, "%s: %s: %s\n", PACKAGE, w->name, msg != NULL ? msg : 
> "unresolved symbol");
> exit(EXIT_FAILURE);
> }
> 
> I'm fairly certain I remember reading something about Android and lazy
> function loadinghow it doesn't handle it well or does so differently
> from standard Linux builds.  At any rate,  I believe the above code is
> responsible for those annoying 'fakechroot: undefined reference to dlopen'
> errors, so I'll see if I can fix that.

In Android's Bionic libc, the implementation of dlopen() resides in the
dynamic loader, and not present in libdl.so.  So to obtain the pointer to
dlopen the code like above can use dlsym(RTLD_DEFAULT, "dlopen"), but not
RTLD_NEXT (the loader precedes the fakeroot library in the lookup chain).

The preceding discussion seems to have libc and libdl switched.  Normally the
implementation of dlopen is found in libdl.so, but not in libc.so.

Hope that helps,
Alexander

Re: Android native build of GCC

2015-02-15 Thread Alexander Monakov

On Sun, 15 Feb 2015, Cyd Haselton wrote:

> On Sun, Feb 15, 2015 at 12:41 PM, Cyd Haselton  wrote:
> 
> *snip*
> >
> >> So to obtain the pointer to
> >> dlopen the code like above can use dlsym(RTLD_DEFAULT, "dlopen"), but not
> >> RTLD_NEXT (the loader precedes the fakeroot library in the lookup chain).
> >>
> *snip*
> 
> Just a quick update:  RTLD_DEFAULT is definitely not the solution here
> as it results in a segfault when libfakechroot loads.  Perhaps a
> different RTLD_FLAG was meant?

I think you need to use RTLD_DEFAULT only when resolving "dlopen", and keep
RTLD_NEXT for all other symbol names (unless later on you run into other
symbols with similar behavior).  Something like

... = dlsym(strcmp(name, "dlopen") ? RTLD_NEXT : RTLD_DEFAULT, name);

Alexander

Re: Android native build of GCC

2015-02-15 Thread Alexander Monakov



On Sun, 15 Feb 2015, Cyd Haselton wrote:

> On Sun, Feb 15, 2015 at 11:53 AM, Alexander Monakov  
> wrote:
> >> Given that info...and in spite of my aforementioned limited knowledge  I
> >> went back to take another look at the source and found this in
> >> libfakechroot.c
> >>
> >> /bld/fakechrt/fakechroot-2.16 $ grep -C 4 dlsym src/libfakechroot.c
> >> /* Lazily load function */
> >> LOCAL fakechroot_wrapperfn_t fakechroot_loadfunc (struct 
> >> fakechroot_wrapper * w)
> >> {
> >> char *msg;
> >> if (!(w->nextfunc = dlsym(RTLD_NEXT, w->name))) {;
> >> msg = dlerror();
> >> fprintf(stderr, "%s: %s: %s\n", PACKAGE, w->name, msg != NULL ? 
> >> msg : "unresolved symbol");
> >> exit(EXIT_FAILURE);
> >> }
> >>
> >> I'm fairly certain I remember reading something about Android and lazy
> >> function loadinghow it doesn't handle it well or does so differently
> >> from standard Linux builds.  At any rate,  I believe the above code is
> >> responsible for those annoying 'fakechroot: undefined reference to dlopen'
> >> errors, so I'll see if I can fix that.
> >
> > In Android's Bionic libc, the implementation of dlopen() resides in the
> > dynamic loader, and not present in libdl.so.
> 
> Yet in Android's NDK documentation, they state that in order to use
> dlopen() functionality in native code you must link against libdl and
> include dlfcn.h.  Why would this be the case if the dlopen()
> implementation is not in libdl?
> (documentation link: http://www.kandroid.org/ndk/docs/STABLE-APIS.html)

That's the standard way of using dlopen, i.e. same as you would do it on Linux
with glibc for example.  So the link merely says that you can get dlopen the
same way as usual.

The difference is that Android's libdl only contains stub symbols for
dlopen&co, and the real symbols can be looked up in the dynamic loader.  That
RTLD_NEXT does not work for obtaining a pointer for dlopen, as it works on
glibc, is quite unfortunate, and probably a bug in Bionic.

Alexander

Broken test gcc.target/i386/sibcall-2.c

2015-05-13 Thread Alexander Monakov

Hello,

Last year's x86 sibcall improvements added a currently xfailed test:

  /* { dg-do compile { target ia32 } } */
  /* { dg-options "-O2" } */

  extern int doo1 (int);
  extern int doo2 (int);
  extern void bar (char *);

  int foo (int a)
  {
char s[256];
bar (s);
return (a < 0 ? doo1 : doo2) (a);
  }

  /* { dg-final { scan-assembler-not "call\[ \t\]*.%eax" { xfail *-*-* } } } */

It was xfailed by https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00016.html

Can you tell me what the test is supposed to test?  A tail call is impossible
here, because 'bar' might save the address of 's' in a global variable, and
therefore 's' must be live when 'doo1' or 'doo2' are invoked.

Should we remove or unbreak this test?

Thanks.
Alexander

Re: Broken test gcc.target/i386/sibcall-2.c

2015-05-13 Thread Alexander Monakov

Ah.  I realize it's most likely for testing sibcall_[value]_pop_memory
peepholes, right?  In which case the testcase might look like this:

  /* { dg-do compile } */
  /* { dg-options "-O2" } */

  void foo (int a, void (**doo1) (void), void (**doo2) (void))
  {
char s[16] = {0};
do s[a] = 1; while (a &= a-1);
(*(s[8] ? doo1 : doo2)) ();
  }

  /* { dg-final { scan-assembler-not "call" } } */

However on the above testcase memory-indirect jump is currently generated only
for 64-bit x86.  With -mx32 it's impossible, but with -m32 the peephole
doesn't match.  Is that expected?

Can you also tell me why ..._pop call and sibcall instructions are predicated
on !TARGET_64BIT?

Thanks.

Alexander

Re: May 2015 Toolchain Update

2015-05-18 Thread Alexander Monakov

Hello,

A couple of comments below.

On Mon, 18 May 2015, Nick Clifton wrote:
>   val |= ~0 << loaded;// Generates warning
>   val |= (unsigned) ~0 << loaded; // Does not warn

To reduce verbosity, '~0u' can be used here instead of a cast.

> * GCC supports a new option: -fno-plt
> 
>   When compiling position independent code this tells the compiler
>   not to use PLT for external function calls.  Instead the address
>   is loaded from the GOT and then branched to directly.  This
>   leads to more efficient code by eliminating PLT stubs and
>   exposing GOT load to optimizations.
> 
>   Not all architectures support this option, and some other
>   optimization features, such as lazy binding, may disable it.

The last paragraph looks confusing to be on both points.  '-fno-plt' is
implemented as a transformation during TreeSSA-to-RTL expansion, so it works
in a machine-independent manner; it's a no-op only if the target has no way to
turn on '-fPIC'.  Is that what you meant?

Second, lazy binding is not an optimization feature of GCC (it's implemented
as part of (e.g. glibc's) dynamic linker), so it's not quite right to say that
-fno-plt would be disabled by it.  Text I've added to the documentation says:

Lazy binding requires PLT: with -fno-plt all external symbols are resolved
at load time.

Thus, for code compiled with -fno-plt the dynamic linker would not be able to
perform lazy binding (even if it was otherwise possible, e.g. -z now -z relro
weren't in effect, and profitable, i.e. the library was not already prelinked).

Alexander

Re: RFC: Creating a more efficient sincos interface

2018-09-13 Thread Alexander Monakov

On Thu, 13 Sep 2018, Wilco Dijkstra wrote:
> What do people think? Ideally I'd like to support this in a generic way so 
> all targets can
> benefit, but it's also feasible to enable it on a per-target basis. Also 
> since not all libraries
> will support the new interface, there would have to be a flag or configure 
> option to switch
> the new interface off if not supported (maybe automatically based on the 
> math.h header).

GCC already has __builtin_cexpi for this, so I think you can introduce cexpi
implementation in libc, and then adjust expand_builtin_cexpi appropriately.

I wonder if it would be possible to add a fallback cexpi implementation to
libgcc.a that would be picked by the linker if there's no such symbol in libm?

Alexander

libgcov as shared library and other issues

2018-09-24 Thread Alexander Monakov

Hello,

Here's the promised "libgcov summary"; sorry about the delay.

So libgcov has a bit unusual design where:

  - on the one hand, the library is static-only, has PIC code and may be linked
into shared libraries,

  - almost all gcov symbols have "hidden" visibility so they don't participate
in dynamic linking

  - on the other hand, the __gcov_master symbol deliberately has default
visibility, presumably with the intention that a running program has exactly
one instance of this symbol, although the exact motivation is unclear to me.

This latter point does not reliably work as intended though: there are scenarios
where a dynamically linked program will have multiple __gcov_masters anyway:

  - via repeated dlopen(RTLD_LOCAL) with main executable not linked against 
libgcov
or not exporting libgcov symbols (as in PR 83879)
  - when shared libraries have version scripts that hide their __gcov_master
  - when -Bsymbolic is in effect


Additionally, indirect call profiling symbols are not hidden either, and that
leads to extra complications. Since there are multiple symbols, during dynamic
linking they may be partially interposed. PR 84107 demonstrates how this leads
to libgcov segfaulting in a fairly simple and legitimate program.

Bottom line: static linking code with default-visibility symbols
into shared libraries is problematic.

So one strategy is to ensure all gcov symbols have hidden visibility. That would
isolate gcov instances in each shared library loaded in the program, and each
library would have the responsibility to write out its counters when unloaded.
Also, __gcov_dump would dump only the counters specific to the current library.

I may be missing something here so it might be nice to unearth why exactly
__gcov_master is intended to be global.

Another strategy is to introduce libgcov.so and have it host either all libgcov
symbols or just those that by design are required to exist once in the program.

When talking to Richi at the Cauldron I got the impression he'd question if
shared libgcov is worth the cost, e.g. would it make any easier for users to
mix two libraries, one linked against older libgcov, and another with a newer
(something that doesn't work at all now, but would be nice to support if I
understood Richard correctly).

Alexander

Re: Backporting gcc_qsort

2018-10-01 Thread Alexander Monakov

On Mon, 1 Oct 2018, Jeff Law wrote:
> To add a bit more context for Cory.
> 
> Generally backports are limited to fixing regressions and serious code
> generation bugs.  While we do make some exceptions, those are good
> general guidelines.
> 
> I don't think the qsort changes warrant an exception.

Personally I think in this case there isn't a strong reason to backport, the
patch is fairly isolated, so individuals or companies that need it should have
no problem backporting it on their own.  Previously, Franz Sirl reported back
in June they've used the patch to achieve matching output on their Linux-hosted
vs Cygwin-hosted cross-compilers based on GCC 8:
https://gcc.gnu.org/ml/gcc-patches/2018-06/msg00751.html

Alexander

Re: movmem pattern and missed alignment

2018-10-08 Thread Alexander Monakov

On Mon, 8 Oct 2018, Michael Matz wrote:
> > Ok, but why is that not a bug?  The whole point of passing alignment to 
> > the movmem pattern is to let it generate code that takes advantage of 
> > the alignment.  So we get a missed optimization.
> 
> Only if you somewhere visibly add accesses to *i and *j.  Without them you 
> only have the "accesses" via memcpy, and as Richi says, those don't imply 
> any alignment requirements.  The i and j pointers might validly be char* 
> pointers in disguise and hence be in fact only 1-aligned.  I.e. there's 
> nothing in your small example program from which GCC can infer that those 
> two global pointers are in fact 2-aligned.

Well, it's not that simple. C11 6.3.2.3 p7 makes it undefined to form an
'int *' value that is not suitably aligned:

  A pointer to an object type may be converted to a pointer to a different
  object type. If the resulting pointer is not correctly aligned for the
  referenced type, the behavior is undefined.

So in addition to what you said, we should probably say that GCC decides
not to exploit this UB in order to allow code to round-trip pointer values
via arbitrary pointer types?

To put Michael's explanation in different words:

This is not obviously a bug, because static pointer type does not imply the
dynamic pointed-to type. The caller of 'f1' could look like

void call_f1(void)
{
  short ibuf[20] = {0}, jbuf[20] = {0};
  i = (void *) ibuf;
  j = (void *) jbuf;
  f1();
}

and it's valid to memcpy from jbuf to ibuf, memcpy does not "see" the
static pointer type, and works as if by dereferencing 'char *' pointers.
(although as mentioned above it's more subtly invalid when assigning to
i and j).

If 'f1' dereferences 'i', GCC may deduce that dynamic type of '*i' is 'int' and
therefore 'i' must be suitably aligned. But in absence of dereferences GCC
does not make assumptions about dynamic type and alignment.

Alexander

Re: movmem pattern and missed alignment

2018-10-08 Thread Alexander Monakov

On Tue, 9 Oct 2018, Richard Biener wrote:
> >This had worked as Paul expects until GCC 4.4 IIRC and this was perfectly OK
> >for every language on strict-alignment platforms.  This was changed only
> >because of SSE on x86.
> 
> And because we ended up ignoring all pointer casts. 

It's not quite obvious what SSE has to do with this - any hint please?

(according to my quick check this changed between gcc-4.5 and gcc-4.6)

Alexander

Re: movmem pattern and missed alignment

2018-10-09 Thread Alexander Monakov

On Tue, 9 Oct 2018, Richard Biener wrote:
> 
> then we cannot set the alignment of i_1 at/after k = *i_1 because doing so 
> would
> affect the alignment test which we'd then optimize away.  We'd need to 
> introduce
> a SSA copy to get a new SSA name but that would be optimized away quickly.

We preserve __builtin_assume_aligned up to pass-fold-all-builtins, so would it
work to emit it just before the memcpy

  i_2 = __builtin_assume_aligned(i_1, 4);
  __builtin_memcpy(j, i_2, 32);

in theory?

Alexander

Re: avoidance of lea after 5 operations?

2018-10-11 Thread Alexander Monakov

On Thu, 11 Oct 2018, Jason A. Donenfeld wrote:
> 
> I realize this is probably a fairly trivial matter, but I am very
> curious if somebody knows which heuristic gcc is applying here, and
> why exactly. It's not something done by any other compiler I could
> find, and it only started happening with gcc 6.

It's a change in register allocation, gcc selects eax instead of esi
for the shifts. Doesn't appear to be obviously intentional, could be
a bug or bad luck.

Alexander

[wwwdocs] Typo in description of __builtin_expect_with_probability

2018-10-29 Thread Alexander Oblovatniy

Hello,

I'd like to report a typo in description of
«__builtin_expect_with_probability»:
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#Other-Builtins

The description starts with "The built-in has same semantics as
*__builtin_expect_with_probability*", but it seems like *__builtin_expect*
should be there.

Thanks!

Re: [PATCH 1/1] stackleak: Register the 'stackleak_cleanup' pass before the 'mach' pass

2018-12-03 Thread Alexander Popov

On 30.11.2018 20:12, Kees Cook wrote:
> On Fri, Nov 30, 2018 at 9:09 AM Kees Cook  wrote:
>>
>> On Fri, Nov 30, 2018 at 5:20 AM Alexander Popov  wrote:
>>>
>>> Currently the 'stackleak_cleanup' pass deleting a CALL insn is executed
>>> after the 'reload' pass. That allows gcc to do some weird optimization in
>>> function prologues and epilogues, which are generated later [1].
>>>
>>> Let's avoid that by registering the 'stackleak_cleanup' pass before
>>> the 'mach' pass, which performs the machine dependent code transformations.
>>> It's the moment when the stack frame size is final and function prologues
>>> and epilogues are already generated.
>>>
>>> [1] https://www.openwall.com/lists/kernel-hardening/2018/11/23/2
>>>
>>> Reported-by: kbuild test robot 
>>> Signed-off-by: Alexander Popov 
>>
>> Thanks, applied!
> 
> Eek, no, this is breaking my build badly:
> 
> *** WARNING *** there are active plugins, do not report this as a bug
> unless you can reproduce it without enabling any plugins.
> Event| Plugins
> PLUGIN_START_UNIT| stackleak_plugin
> kernel/exit.c: In function ‘release_task’:
> kernel/exit.c:228:1: internal compiler error: Segmentation fault
>  }
> 
> Failing with:
> 
> gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0

I've done debugging of gcc with gdb and now understand my mistake.

It turned out that I register the 'stackleak_cleanup' pass deleting CALL insn
for that particular moment when the control flow graph is inconsistent.

That's what the machine-specific reorg passes do on various architectures:

  /* We are freeing block_for_insn in the toplev to keep compatibility
 with old MDEP_REORGS that are not CFG based.  Recompute it now.  */
  compute_bb_for_insn ();

So recomputing basic block info for insns before calling delete_insn_and_edges()
fixes the issue.

But I think it's better to register the 'stackleak_cleanup' pass just one pass
earlier -- before the '*free_cfg' pass. I'll double check it for different
versions of gcc on all supported architectures and return with a new patch.

Best regards,
Alexander

Re: [PATCH 1/1] stackleak: Register the 'stackleak_cleanup' pass before the 'mach' pass

2018-12-06 Thread Alexander Popov

On 03.12.2018 21:25, Alexander Popov wrote:
> But I think it's better to register the 'stackleak_cleanup' pass just one pass
> earlier -- before the '*free_cfg' pass. I'll double check it for different
> versions of gcc on all supported architectures and return with a new patch.

I've tested this idea for gcc-5,6,7,8 on x86_64, x86_32, and arm64.
I'll send the patch soon.

Best regards,
Alexander

1 2 3 >

1 - 100 of 286 matches

Mail list logo