Best way to compute cost of a sequence of gimple stmt

2014-06-10 Thread Thomas Preud'homme
Hi there,

With recent changes to it, the bswap pass can now replace a series of
(probably aligned) load + bitwise operation (AND, OR and shifts) + casts
by a (potentially unaligned) load and a bswap. I was rightfully pointed
out that this might be more expensive than the original sequence of
gimple statements. Therefore I am trying to compute the cost of the
sequence with and without the transformation to make an informed
decision.

So far I proceeded by reusing the computation_cost function from
ivopts and various functions from expmed (shift_cost, convert_cost
and some new ones: rot_cost for instance). However, this doesn't
allow me to compute the cost of a function call (the call to the bswap
builtin) and I am lurking towards exposing expand_gimple_stmt () in
a new function gimple_stmt_cost (). I am wondering though if it is a
correct thing to do as I am not familiar with how expansion operates.
I am also wondering if I should use gimple_stmt_cost as seldomly as
possible or on the contrary make use of it for all statements so as to
get rid of the modifications in ivopts and expmed.

I'd appreciate any advices on how to compute the cost of a sequence
of gimple statements.

Best regards,

Thomas Preud'homme




Re: Best way to compute cost of a sequence of gimple stmt

2014-06-10 Thread Richard Biener
On Tue, Jun 10, 2014 at 10:32 AM, Thomas Preud'homme
 wrote:
> Hi there,
>
> With recent changes to it, the bswap pass can now replace a series of
> (probably aligned) load + bitwise operation (AND, OR and shifts) + casts
> by a (potentially unaligned) load and a bswap. I was rightfully pointed
> out that this might be more expensive than the original sequence of
> gimple statements. Therefore I am trying to compute the cost of the
> sequence with and without the transformation to make an informed
> decision.
>
> So far I proceeded by reusing the computation_cost function from
> ivopts and various functions from expmed (shift_cost, convert_cost
> and some new ones: rot_cost for instance). However, this doesn't
> allow me to compute the cost of a function call (the call to the bswap
> builtin) and I am lurking towards exposing expand_gimple_stmt () in
> a new function gimple_stmt_cost (). I am wondering though if it is a
> correct thing to do as I am not familiar with how expansion operates.
> I am also wondering if I should use gimple_stmt_cost as seldomly as
> possible or on the contrary make use of it for all statements so as to
> get rid of the modifications in ivopts and expmed.
>
> I'd appreciate any advices on how to compute the cost of a sequence
> of gimple statements.

In general this is impossible to do.  I don't have a good answer on
how to determine whether (unaligned) load + bswap is faster than
doing sth else - but there is a very good chance that the original
code is even worse.  For the unaligned load you can expect
an optimal code sequence to be generated - likewise for the bswap.
Now - if you want to do the best for the combination of both I'd
say you add support to the expr.c bitfield extraction code to do
the bswap on-the-fly and use TER to see that you are doing the
bswap on a memory source.

Anyway, what you'd really need to do is compare the original
code against the transform where on GIMPLE it's very-many-stmts
vs. two-stmts, and thus "obviously faster".

There is only two choices - disable unaligned-load + bswap on
SLOW_UNALIGNED_ACCESS targets or not.  Doing sth more
fancy won't do the trick and isn't worth the trouble IMHO.

Richard.

> Best regards,
>
> Thomas Preud'homme
>
>


How can I get started as a GCC developer

2014-06-10 Thread Anonymous User

Hello everyone,

I'd like to help with the modularization of GCC.

I've visited the getting started page at the official wiki but its 
contents seem too old. I'm fairly new to GCC and and I'm bewildered by 
its huge code base with lengthy and complicated makefiles and 
sophisticated internal mechanisms. I don't know how to figure out the 
exact process GCC is built and feel incompetent to make changes to 
existing code for fear of breaking the underlying mechanisms. Also, it 
seems to me that the documentation on GCC internals at the official wiki 
and various resources the wiki points to are not enough to help a 
newcomer to make significant contribution, since they are largely 
incomplete and outdated.


However, I do have rudimentary knowledge about the structure of the code 
base and have successfully built a frontend that merely emits a main 
function that returns 0 for GCC 4.9.0. But I have no idea how I can make 
further progress other than by aimlessly browsing through the source code.


So how can I gain a systematic understanding of the internals of GCC in 
order to get started with some serious work?




implicit gnat_malloc seen as vararg function

2014-06-10 Thread BELBACHIR Selim
Hi,

I'm working on a private port of GCC 4.7.3/GNAT 7.1.2.

Calls to ADA 'new' operator generates implicit gnat_malloc(size) calls (which 
has to be provided by user program or runtime).

In my macro INIT_CUMULATIVE_ARGS I noticed that gnat_malloc(size) calls are 
seen as vararg function because the tree.c function stdarg_p(fntype) returns 
true. This creates bad code because caller put the argument in stack (this is 
normal behaviour for my vararg functions) whereas the callee expected argument 
in register (gnat_malloc signature should only contains the 'size'  argument of 
type size_type).

microblaze and iq2000 backend should have the same problem because inside 
INIT_CUMULATIVE_ARGS macro, variable argument function are identified by 
browsing fntype tree chain (like stdarg_p does)

Do I have to fix the GNAT frontend or did I missed an option dealing with 
gnat_malloc behaviour ?

Regards,

   Selim Belbachir


Re: [GSoC] decision tree first steps

2014-06-10 Thread Richard Biener
On Tue, Jun 10, 2014 at 1:06 PM, Prathamesh Kulkarni
 wrote:
> On Fri, Jun 6, 2014 at 7:18 PM, Richard Biener
>  wrote:
>> On Fri, Jun 6, 2014 at 12:02 PM, Richard Biener
>>  wrote:
>>> On Fri, Jun 6, 2014 at 11:02 AM, Prathamesh Kulkarni
>>>  wrote:
 On Mon, Jun 2, 2014 at 6:14 PM, Richard Biener
  wrote:
> On Mon, Jun 2, 2014 at 1:16 PM, Prathamesh Kulkarni
>  wrote:
>> I have few questions regarding genmatch:
>>
>> a) Why is 4 hard-coded here: ?
>> in write_nary_simplifiers:
>>  fprintf (f, "  tree captures[4] = {};\n");
>
> Magic number (this must be big enough for all cases ...).  Honestly
> this should be improved (but requires another scan over the matcher IL
> to figure out the max N used in @N).
>
>> b) Should we add syntax for a symbol to denote multiple operators ?
>> For exampleim in simplify_rotate:
>> (X << CNT1) OP (X >> CNT2) with OP being +, |, ^  (CNT1 + CNT2 ==
>> bitsize of type of X).
>
> Something to enhance the IL with, yes.  I'd say we support
>
> (define_op additive PLUS_EXPR MINUS_EXPR POINTER_PLUS_EXPR)
>
> thus,
>
> (define_op  op...)
>
>> c) Remove for parsing capture in parse_expr since we reject outermost
>> captured expressions ?
>
> but parse_expr is also used for inner expressions, no?
>
> (plus (minus@2 @0 @1) @3)
>
> should still work
>
>> d) I am not able to follow this comment in match.pd:
>> /* Patterns required to avoid SCCVN testsuite regressions.  */
>>
>> /* (x >> 31) & 1 -> (x >> 31).  Folding in fold-const is more
>>complicated here, it does
>>  Fold (X << C1) & C2 into (X << C1) & (C2 | ((1 << C1) - 1))
>>  (X >> C1) & C2 into (X >> C1) & (C2 | ~((type) -1 >> C1))
>>  if the new mask might be further optimized.  */
>> (match_and_simplify
>>   (bit_and (rshift@0 @1 INTEGER_CST_P@2) integer_onep)
>>   if (compare_tree_int (@2, TYPE_PRECISION (TREE_TYPE (@1)) - 1) == 0)
>>   @0)
>
> The comment is literally copied from the case I extracted the
> (simplified) variant from fold-const.c.  See lines 11961-12056 in 
> fold-const.c.
> It'll be a challenge to implement the equivalent in a pattern ;)
>
>>
>> Decision Tree.
>> I have tried to come up with a prototype for decision tree (patch 
>> attached).
>> For simplicity, it handles patterns involving only unary operators and
>> no predicates
>> and returns false when the pattern fails to match (no goto to match
>> another pattern).
>> I meant to post it only for illustration, and I have not really paid
>> attention to code quality (bad formatting, memory leaks, etc.).
>>
>> * Basic Idea
>> A pattern consists of following parts: match, ifexpr and result.
>> Let's call  as "simplification" operand.
>> The common prefix between different match operands would be represented
>> by same nodes in the decision tree.
>>
>> Example:
>> (negate (bit_not @0))
>> S1
>>
>> (negate (negate @0))
>> S2
>>
>> S1, S2 denote simplifications for the above patterns respectively.
>>
>> The decision tree would look something like
>> (that's the way it gets constructed with the patch):
>>
>> dummy/root
>> |
>>NEGATE_EXPR
>>  /  \
>>  BIT_NOT   NEGATE_EXPR
>>| |
>>  @0 @0
>>| |
>>  S1  S2
>>
>> a) The children of an internal node are number of decisions that
>> can be taken at that node. In the above case it's 2 for outer 
>> NEGATE_EXPR.
>> b) Simplification operand represents leaves of the decision tree
>> c) Instead of having list of heads, I have added one dummy node,
>> and heads become children of these dummy root node.
>> d) Code-gen for non-simplification operands involves generating,
>> "matching" code and for simplification operands involves generating
>> "transform" code
>>
>> * Overall Flow:
>> I guess we would build the decision tree from the AST.
>> So the flow would be like:
>> source -> struct simplify (match, ifexpr, result) -> decision tree -> c 
>> code.
>>
>> Something like (in main):
>> decision_tree dt;
>> while (there is another pattern)
>> {
>>   simplify *s = parse_match_and_simplify ();
>>   insert s into decision tree;
>> };
>> So parsing routines are concerned with parsing and building the AST 
>> (operand),
>> and not with the decision tree. Is that fine ?
>
> Yes, that's good.
>
>> * Representation of decision tree.
>> A decision tree would need a way to represent language constructs
>

[PATCH] tell gcc optimizer to never introduce new data races

2014-06-10 Thread Jiri Kosina
We have been chasing a memory corruption bug, which turned out to be 
caused by very old gcc (4.3.4), which happily turned conditional load into 
a non-conditional one, and that broke correctness (the condition was met 
only if lock was held) and corrupted memory.

This particular problem with that particular code did not happen when 
never gccs were used. I've brought this up with our gcc folks, as I wanted 
to make sure that this can't really happen again, and it turns out it 
actually can.

Quoting Martin Jambor :


More current GCCs are more careful when it comes to replacing a
conditional load with a non-conditional one, most notably they check
that a store happens in each iteration of _a_ loop but they assume
loops are executed.  They also perform a simple check whether the
store cannot trap which currently passes only for non-const
variables.  A simple testcase demonstrating it on an x86_64 is for
example the following:

$ cat cond_store.c

int g_1 = 1;

int g_2[1024] __attribute__((section ("safe_section"), aligned (4096)));

int c = 4;

int __attribute__ ((noinline))
foo (void)
{
  int l;
  for (l = 0; (l != 4); l++) {
if (g_1)
  return l;
for (g_2[0] = 0; (g_2[0] >= 26); ++g_2[0])
  ;
  }
  return 2;
}

int main (int argc, char* argv[])
{
  if (mprotect (g_2, sizeof(g_2), PROT_READ) == -1)
{
  int e = errno;
  error (e, e, "mprotect error %i", e);
}
  foo ();
  __builtin_printf("OK\n");
  return 0;
}
/* EOF */
$ ~/gcc/trunk/inst/bin/gcc cond_store.c -O2 --param allow-store-data-races=0
$ ./a.out
OK
$ ~/gcc/trunk/inst/bin/gcc cond_store.c -O2 --param allow-store-data-races=1
$ ./a.out
Segmentation fault

The testcase fails the same at least with 4.9, 4.8 and 4.7.  Therefore
I would suggest building kernels with this parameter set to zero. I
also agree with Jikos that the default should be changed for -O2.  I
have run most of the SPEC 2k6 CPU benchmarks (gamess and dealII
failed, at -O2, not sure why) compiled with and without this option
and did not see any real difference between respective run-times.


Hopefully the default will be changed in newer gccs, but let's force
it for kernel builds so that we are on a safe side even when older
gcc are used.

Cc: Martin Jambor 
Cc: Petr Mladek 
Cc: Linus Torvalds 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Signed-off-by: Jiri Kosina 
---
 Makefile | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Makefile b/Makefile
index 00a933b..613367f 100644
--- a/Makefile
+++ b/Makefile
@@ -585,6 +585,9 @@ else
 KBUILD_CFLAGS  += -O2
 endif
 
+# Tell gcc to never replace conditional load with a non-conditional one
+KBUILD_CFLAGS  += $(call cc-option,--param allow-store-data-races=0)
+
 include $(srctree)/arch/$(SRCARCH)/Makefile
 
 ifdef CONFIG_READABLE_ASM

-- 
Jiri Kosina
SUSE Labs


Re: How can I get started as a GCC developer

2014-06-10 Thread Ian Lance Taylor
On Tue, Jun 10, 2014 at 3:30 AM, Anonymous User
 wrote:
>
> I'd like to help with the modularization of GCC.
>
> I've visited the getting started page at the official wiki but its contents
> seem too old. I'm fairly new to GCC and and I'm bewildered by its huge code
> base with lengthy and complicated makefiles and sophisticated internal
> mechanisms. I don't know how to figure out the exact process GCC is built
> and feel incompetent to make changes to existing code for fear of breaking
> the underlying mechanisms. Also, it seems to me that the documentation on
> GCC internals at the official wiki and various resources the wiki points to
> are not enough to help a newcomer to make significant contribution, since
> they are largely incomplete and outdated.
>
> However, I do have rudimentary knowledge about the structure of the code
> base and have successfully built a frontend that merely emits a main
> function that returns 0 for GCC 4.9.0. But I have no idea how I can make
> further progress other than by aimlessly browsing through the source code.
>
> So how can I gain a systematic understanding of the internals of GCC in
> order to get started with some serious work?

It's a good question.  You're quite right that the documentation tends
to be outdated and incomplete.  That suggests one easy way to
contribute: make the documentation better.  Unfortunately, apart from
that, it's hard to know how to answer your question.  It's possible to
answer a specific questions ("how does X work?").  It's hard to answer
a general question ("how does everything work?").

Ian


Re: How can I get started as a GCC developer

2014-06-10 Thread Basile Starynkevitch
On Tue, Jun 10, 2014 at 06:30:54PM +0800, Anonymous User wrote:

I'm not sure that being anonymous is helpful on GCC 
(and it might even be frowned upon, but I don't want to start a flamewar) I 
think that you need and you want 
to be identified. Besides, working on GCC is difficult; you'll soon be proud of 
being able
to work on it, and that also wants you to be identified. 

You surely need to start with legal stuff. They take time (perhaps months, and 
surely weeks!).
Read carefully http://gcc.gnu.org/contribute.html#legal

Once all is done and signed (both by FSF and you or your employer), add your 
real name in MAINTAINERS file.

> 
> So how can I gain a systematic understanding of the internals of GCC in
> order to get started with some serious work?


I'm definitely biaised, but I suggest first to be able to write some GCC 
plugins 
(or some MELT extensions, see http://gcc-melt.org/ for more).

Read in particular my latest slides 
http://gcc-melt.org/gcc-plugin-MELT-LinuxCollabSummit2014.pdf
(GCC plugins thru the MELT examples). They give a lot of pointers.

Don't forget to read http://www.cse.iitb.ac.in/grc/

Regards.

-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


Re: [PATCH] tell gcc optimizer to never introduce new data races

2014-06-10 Thread Peter Zijlstra
On Tue, Jun 10, 2014 at 03:23:36PM +0200, Jiri Kosina wrote:
> +# Tell gcc to never replace conditional load with a non-conditional one
> +KBUILD_CFLAGS+= $(call cc-option,--param allow-store-data-races=0)
> +

Why do we not want: -fmemory-model=safe? And should we not at the very
least also disable packed-store-data-races?


pgpDwTn3j17ts.pgp
Description: PGP signature


Re: Weird startup issue with -fsplit-stack

2014-06-10 Thread Dmitry Antipov

On 05/21/2014 06:10 PM, Ian Lance Taylor wrote:


I'm sorry, I have nothing useful to suggest.  I agree that that sounds
like a stack overflow, which should in general be impossible with
-fsplit-stack when using the gold linker.  I don't know what is
happening here.  I've tested with massive recursion so I don't think
that is the problem by itself.


Hm...did you test with a lot of longjmps? I'm just curious about this
comment in libgcc/generic-morestack.c:

/* The stack segment that we think we are currently using.  This will
   be correct in normal usage, but will be incorrect if an exception
   unwinds into a different stack segment or if longjmp jumps to a
   different stack segment.  */

So, what happens if longjmp jumps to a different segment? Is the result
undefined? Is it possible to detect such a jump?

Dmitry



Re: [PATCH] tell gcc optimizer to never introduce new data races

2014-06-10 Thread Marek Polacek
On Tue, Jun 10, 2014 at 04:53:27PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 10, 2014 at 03:23:36PM +0200, Jiri Kosina wrote:
> > +# Tell gcc to never replace conditional load with a non-conditional one
> > +KBUILD_CFLAGS  += $(call cc-option,--param allow-store-data-races=0)
> > +
> 
> Why do we not want: -fmemory-model=safe? And should we not at the very
> least also disable packed-store-data-races?

Note that the option does not exist, even though it is mentioned in the
documentation.

Marek


Re: [PATCH] tell gcc optimizer to never introduce new data races

2014-06-10 Thread Peter Zijlstra
On Tue, Jun 10, 2014 at 05:04:55PM +0200, Marek Polacek wrote:
> On Tue, Jun 10, 2014 at 04:53:27PM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 10, 2014 at 03:23:36PM +0200, Jiri Kosina wrote:
> > > +# Tell gcc to never replace conditional load with a non-conditional one
> > > +KBUILD_CFLAGS+= $(call cc-option,--param allow-store-data-races=0)
> > > +
> > 
> > Why do we not want: -fmemory-model=safe? And should we not at the very
> > least also disable packed-store-data-races?
> 
> Note that the option does not exist, even though it is mentioned in the
> documentation.

Urgh.. ok. Any word on the packed-store-data thing?


pgpMWhQCfAGsj.pgp
Description: PGP signature


Re: [PATCH] tell gcc optimizer to never introduce new data races

2014-06-10 Thread Jakub Jelinek
On Tue, Jun 10, 2014 at 05:13:29PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 10, 2014 at 05:04:55PM +0200, Marek Polacek wrote:
> > On Tue, Jun 10, 2014 at 04:53:27PM +0200, Peter Zijlstra wrote:
> > > On Tue, Jun 10, 2014 at 03:23:36PM +0200, Jiri Kosina wrote:
> > > > +# Tell gcc to never replace conditional load with a non-conditional one
> > > > +KBUILD_CFLAGS  += $(call cc-option,--param allow-store-data-races=0)
> > > > +
> > > 
> > > Why do we not want: -fmemory-model=safe? And should we not at the very
> > > least also disable packed-store-data-races?
> > 
> > Note that the option does not exist, even though it is mentioned in the
> > documentation.
> 
> Urgh.. ok. Any word on the packed-store-data thing?

That is recognized, undocumented and never used in the compiler (not in 4.7
or any later release till now).  Most of the spots in the compiler that
could introduce data races were actually just changed, there is (already
since 4.7) just a single conditional on the --param allow-store-data-races=X
value.

Jakub


Re: How can I get started as a GCC developer

2014-06-10 Thread Jonathan Wakely
On 10 June 2014 15:00, Basile Starynkevitch wrote:
> Once all is done and signed (both by FSF and you or your employer), add your 
> real name in MAINTAINERS file.

That's not needed until you get write access to the repository, but
you can submit patches and contribute without write access.


Re: How can I get started as a GCC developer

2014-06-10 Thread Jonathan Wakely
On 10 June 2014 16:38, Jonathan Wakely wrote:
> On 10 June 2014 15:00, Basile Starynkevitch wrote:
>> Once all is done and signed (both by FSF and you or your employer), add your 
>> real name in MAINTAINERS file.
>
> That's not needed until you get write access to the repository, but
> you can submit patches and contribute without write access.

To clarify, I meant adding yourself to the MAINTAINERS file is not
required to contribute (the signed copyright paperwork is needed).


Re: Weird startup issue with -fsplit-stack

2014-06-10 Thread Ian Lance Taylor
On Tue, Jun 10, 2014 at 7:54 AM, Dmitry Antipov  wrote:
> On 05/21/2014 06:10 PM, Ian Lance Taylor wrote:
>
>> I'm sorry, I have nothing useful to suggest.  I agree that that sounds
>> like a stack overflow, which should in general be impossible with
>> -fsplit-stack when using the gold linker.  I don't know what is
>> happening here.  I've tested with massive recursion so I don't think
>> that is the problem by itself.
>
>
> Hm...did you test with a lot of longjmps? I'm just curious about this
> comment in libgcc/generic-morestack.c:
>
> /* The stack segment that we think we are currently using.  This will
>be correct in normal usage, but will be incorrect if an exception
>unwinds into a different stack segment or if longjmp jumps to a
>different stack segment.  */
>
> So, what happens if longjmp jumps to a different segment? Is the result
> undefined? Is it possible to detect such a jump?

I have not tested with longjmps.  The Go library uses a similar
function (setcontext) but it updates the split stack context as it
goes.

You may be right: a program that splits the stack, then calls setjmp,
then splits the stack, then calls longjmp, may mess up after the
longjmp when returning from the stack split in the function that calls
setjmp.

This case can be detected in __generic_releasestack.  If the current
stack pointer is not in __morestack_current_segment, the code can call
__generic_findstack to set __morestack_current_segment to the correct
value.

Ian


broken links?

2014-06-10 Thread Hebenstreit, Michael
Either something is broken on my web-access or the links on 
https://gcc.gnu.org/install/prerequisites.html pointing to 
ftp://gcc.gnu.org/pub/gcc/infrastructure/ are gone - can't find the files 
anywhere else :(

Thanks for all your efforts
Michael


Michael Hebenstreit Senior Cluster Architect
Intel Corporation, MS: RR1-105/H14  Software and Services Group/DCE
4100 Sara Road  Tel.:   +1 505-794-3144 
Rio Rancho, NM 87124
UNITED STATES   E-mail: michael.hebenstr...@intel.com



Re: broken links?

2014-06-10 Thread Jonathan Wakely
On 10 June 2014 17:20, Hebenstreit, Michael wrote:
> Either something is broken on my web-access or the links on 
> https://gcc.gnu.org/install/prerequisites.html pointing to 
> ftp://gcc.gnu.org/pub/gcc/infrastructure/ are gone - can't find the files 
> anywhere else :(

The FTP site and files are still there (I've just checked with lftp on
the command-line) but clicking on the links in Firefox doesn't work
(but does work in Webkit-based browsers). Pasting the ftp:// link
directly into Firefox's location bar works.

Maybe related to the recent move to HTTPS on gcc.gnu.org but it seems
to be a Firefox problem, not gcc.gnu.org one.


Re: broken links?

2014-06-10 Thread Jonathan Wakely
On 10 June 2014 17:41, Jonathan Wakely wrote:
> On 10 June 2014 17:20, Hebenstreit, Michael wrote:
>> Either something is broken on my web-access or the links on 
>> https://gcc.gnu.org/install/prerequisites.html pointing to 
>> ftp://gcc.gnu.org/pub/gcc/infrastructure/ are gone - can't find the files 
>> anywhere else :(
>
> The FTP site and files are still there (I've just checked with lftp on
> the command-line) but clicking on the links in Firefox doesn't work
> (but does work in Webkit-based browsers). Pasting the ftp:// link
> directly into Firefox's location bar works.

Hmm, and now clicking the links works. I wonder if someone just fixed
something, or if Firefox just needed a kick.


RE: broken links?

2014-06-10 Thread Hebenstreit, Michael
you are right  - must be a Firefox problem; I had no problem using wget, IE8 
works as well 
Firefox is still redirected to https://gcc.gnu.org/pub/gcc/infrastructure/ 
though

sorry for alarm 
thanks for reply
Michael

-Original Message-
From: Jonathan Wakely [mailto:jwakely@gmail.com] 
Sent: Tuesday, June 10, 2014 10:43 AM
To: Hebenstreit, Michael
Cc: gcc@gcc.gnu.org
Subject: Re: broken links?

On 10 June 2014 17:41, Jonathan Wakely wrote:
> On 10 June 2014 17:20, Hebenstreit, Michael wrote:
>> Either something is broken on my web-access or the links on 
>> https://gcc.gnu.org/install/prerequisites.html pointing to 
>> ftp://gcc.gnu.org/pub/gcc/infrastructure/ are gone - can't find the 
>> files anywhere else :(
>
> The FTP site and files are still there (I've just checked with lftp on 
> the command-line) but clicking on the links in Firefox doesn't work 
> (but does work in Webkit-based browsers). Pasting the ftp:// link 
> directly into Firefox's location bar works.

Hmm, and now clicking the links works. I wonder if someone just fixed 
something, or if Firefox just needed a kick.


Issue with sub-object __builtin_object_size

2014-06-10 Thread Ulrich Weigand
Hello,

the following C++ test case:

struct pollfd
  {
int fd;
short int events;
short int revents;
  };

struct Pollfd : public pollfd { };

struct Pollfd myfd[10];

int test (void)
{
  return __builtin_object_size ((struct pollfd *)myfd, 1);
}

ends up returning 8 from the "test" routine, not 80.


In the real-world application this test case was extracted from,
this causes a call:

  poll(myfd, count, 0);  // 1 < count < 10

to fail with a "Buffer overflow detected" message at run-time
when building with _FORTIFY_SOURCE = 2 against glibc.  [ Here,
there is no explicit cast, but it is implied by the prototype
of the "poll" routine. ]

(Note that in the real-world application, the derived struct Pollfd
has some member functions to construct and pretty-print the structure,
but has no additional data members.)


>From the __builtin_object_size documentation, it's not immediately
clear to me whether this is supposed to work or not:

   If the least significant
   bit is clear, objects are whole variables, if it is set, a closest
   surrounding subobject is considered the object a pointer points to.

Is the presence of the above cast (explicit or implicit) supposed to
modify the notion of "closest surrounding subobject"?


Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] tell gcc optimizer to never introduce new data races

2014-06-10 Thread Linus Torvalds
On Tue, Jun 10, 2014 at 6:23 AM, Jiri Kosina  wrote:
> We have been chasing a memory corruption bug, which turned out to be
> caused by very old gcc (4.3.4), which happily turned conditional load into
> a non-conditional one, and that broke correctness (the condition was met
> only if lock was held) and corrupted memory.

Just out of interest, can you point to the particular kernel code that
caused this? I think that's more interesting than the example program
you show - which I'm sure is really nice for gcc developers as an
example, but from a kernel standpoint I think it's more important to
show the particular problems this caused for the kernel?

Linus


Re: Issue with sub-object __builtin_object_size

2014-06-10 Thread Jakub Jelinek
On Tue, Jun 10, 2014 at 07:37:50PM +0200, Ulrich Weigand wrote:
> the following C++ test case:
> 
> struct pollfd
>   {
> int fd;
> short int events;
> short int revents;
>   };
> 
> struct Pollfd : public pollfd { };
> 
> struct Pollfd myfd[10];
> 
> int test (void)
> {
>   return __builtin_object_size ((struct pollfd *)myfd, 1);
> }
> 
> ends up returning 8 from the "test" routine, not 80.

The thing is that the C++ FE transforms this kind of cast to
&((struct Pollfd *) &myfd)->D.2233
so for middle-end where __builtin_object_size is evaluated this
is like:
struct Pollfd { struct pollfd something; };
struct Pollfd myfd[10];
int test (void)
{
  return __builtin_object_size (&myfd[0].something, 1);
}
and returning 8 in that case is completely correct.
So the question is why instead of a simple cast it created
ADDR_EXPR of a FIELD_DECL instead.

Jakub


Re: [PATCH] tell gcc optimizer to never introduce new data races

2014-06-10 Thread Steven Noonan
On Tue, Jun 10, 2014 at 10:46 AM, Linus Torvalds
 wrote:
> On Tue, Jun 10, 2014 at 6:23 AM, Jiri Kosina  wrote:
>> We have been chasing a memory corruption bug, which turned out to be
>> caused by very old gcc (4.3.4), which happily turned conditional load into
>> a non-conditional one, and that broke correctness (the condition was met
>> only if lock was held) and corrupted memory.
>
> Just out of interest, can you point to the particular kernel code that
> caused this? I think that's more interesting than the example program
> you show - which I'm sure is really nice for gcc developers as an
> example, but from a kernel standpoint I think it's more important to
> show the particular problems this caused for the kernel?
>

Jiri, is there a workaround for compilers that don't support '--param
allow-store-data-races=0'? For example:

$ gcc-4.5 -O2 -o cond_store cond_store.c && ./cond_store
Segmentation fault (core dumped)

$ gcc-4.5 -O2 --param allow-store-data-races=0 -o cond_store
cond_store.c && ./cond_store
cc1: error: invalid parameter ‘allow-store-data-races’

$ gcc-4.5 -v
Using built-in specs.
COLLECT_GCC=gcc-4.5
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.4/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --libdir=/usr/lib --libexecdir=/usr/lib
--program-suffix=-4.5 --enable-shared
--enable-languages=c,c++,fortran,objc,obj-c++ --enable-__cxa_atexit
--disable-libstdcxx-pch --disable-multilib --disable-libgomp
--disable-libmudflap --disable-libssp --enable-clocale=gnu
--with-tune=generic --with-cloog --with-ppl --with-system-zlib
Thread model: posix
gcc version 4.5.4 (GCC)


Re: [PATCH] tell gcc optimizer to never introduce new data races

2014-06-10 Thread Jiri Kosina
On Tue, 10 Jun 2014, Linus Torvalds wrote:

> > We have been chasing a memory corruption bug, which turned out to be
> > caused by very old gcc (4.3.4), which happily turned conditional load into
> > a non-conditional one, and that broke correctness (the condition was met
> > only if lock was held) and corrupted memory.
> 
> Just out of interest, can you point to the particular kernel code that
> caused this? I think that's more interesting than the example program
> you show - which I'm sure is really nice for gcc developers as an
> example, but from a kernel standpoint I think it's more important to
> show the particular problems this caused for the kernel?

Well, as I said, that was with gcc 4.3.4, and GCC people expressed 
themselves that that was a slightly different optimization. It made me 
nervous enough though to ask whether it's absolutely positively not going 
to happen with newer gccs, and the code snippet quoted in the original 
mail came back as a response.

The code in question was out-of-tree printk-in-NMI (yeah, surprise 
suprise, once again) patch written by Petr Mladek, let me quote his 
comment from our internal bugzilla:

===
I have spent few days investigating inconsistent state of kernel ring buffer.
It went out that it was caused by speculative store generated by 
gcc-4.3.4.

The problem is in assembly generated for make_free_space(). The functions is
called the following way:

+ vprintk_emit();
+ log = MAIN_LOG; // with logbuf_lock
   or
   log = NMI_LOG; // with nmi_logbuf_lock
   cont_add(log, ...);
+ cont_flush(log, ...);
+ log_store(log, ...);
  + log_make_free_space(log, ...);

If called with log = NMI_LOG then only nmi_log_* global variables are safe to
modify but the generated code does store also into (main_)log_* global 
variables:


:
   55  push   %rbp
   89 f6   mov%esi,%esi

   48 8b 05 03 99 51 01mov0x1519903(%rip),%rax   # 
82620868 
   44 8b 1d ec 98 51 01mov0x15198ec(%rip),%r11d  # 
82620858 
   8b 35 36 60 14 01   mov0x1146036(%rip),%esi   # 
8224cfa8 
   44 8b 35 33 60 14 01mov0x1146033(%rip),%r14d  # 
8224cfac 
   4c 8b 2d d0 98 51 01mov0x15198d0(%rip),%r13   # 
82620850 
   4c 8b 25 11 61 14 01mov0x1146111(%rip),%r12   # 
8224d098 
   49 89 c2mov%rax,%r10
   48 21 c2and%rax,%rdx
   48 8b 1d 0c 99 55 01mov0x155990c(%rip),%rbx   # 
826608a0 
   49 c1 ea 20 shr$0x20,%r10
   48 89 55 d0 mov%rdx,-0x30(%rbp)
   44 29 desub%r11d,%esi
   45 29 d6sub%r10d,%r14d
   4c 8b 0d 97 98 51 01mov0x1519897(%rip),%r9   # 
82620840 
   eb 7e   jmp81107029  

[...]
   85 ff   test   %edi,%edi  # edi = 1 for 
NMI_LOG
   4c 89 e8mov%r13,%rax
   4c 89 camov%r9,%rdx
   74 0a   je 8110703d  

   8b 15 27 98 51 01   mov0x1519827(%rip),%edx   # 
82620860 
   48 8b 45 d0 mov-0x30(%rbp),%rax
   48 39 c2cmp%rax,%rdx  # end of loop 
   0f 84 da 00 00 00   je 81107120 

[...]
   85 ff   test   %edi,%edi  # edi = 1 for 
NMI_LOG
   4c 89 0d 17 97 51 01mov%r9,0x1519717(%rip)# 
82620840 
   ^^
   KABOOOM
   74 35   je 81107160   



It stores log_first_seq when edi == NMI_LOG. This instructions are used also
when edi == MAIN_LOG but the store is done speculatively before the condition
is decided.  It is unsafe because we do not have "logbuf_lock" in NMI context
and some other process migh modify "log_first_seq" in parallel.
===

I believe that the best course of action is both

- building kernel (and anything multi-threaded, I guess) with that 
  optimization turned off
- persuade gcc folks to change the default for future releases

Thanks,

-- 
Jiri Kosina
SUSE Labs


Re: [PATCH] tell gcc optimizer to never introduce new data races

2014-06-10 Thread Richard Biener
On June 10, 2014 8:04:13 PM CEST, Steven Noonan  wrote:
>On Tue, Jun 10, 2014 at 10:46 AM, Linus Torvalds
> wrote:
>> On Tue, Jun 10, 2014 at 6:23 AM, Jiri Kosina  wrote:
>>> We have been chasing a memory corruption bug, which turned out to be
>>> caused by very old gcc (4.3.4), which happily turned conditional
>load into
>>> a non-conditional one, and that broke correctness (the condition was
>met
>>> only if lock was held) and corrupted memory.
>>
>> Just out of interest, can you point to the particular kernel code
>that
>> caused this? I think that's more interesting than the example program
>> you show - which I'm sure is really nice for gcc developers as an
>> example, but from a kernel standpoint I think it's more important to
>> show the particular problems this caused for the kernel?
>>
>
>Jiri, is there a workaround for compilers that don't support '--param
>allow-store-data-races=0'? For example:

The optimization that purposely performs the undesired transform is loop store 
motion which is part of the tree loop invariant motion optimization. You can 
disable that with -fno-tree-loop-im.

That the bug didn't appear with newer compilers was due to lucky decisions to 
not inline a particular function.

Richard.

>$ gcc-4.5 -O2 -o cond_store cond_store.c && ./cond_store
>Segmentation fault (core dumped)
>
>$ gcc-4.5 -O2 --param allow-store-data-races=0 -o cond_store
>cond_store.c && ./cond_store
>cc1: error: invalid parameter ‘allow-store-data-races’
>
>$ gcc-4.5 -v
>Using built-in specs.
>COLLECT_GCC=gcc-4.5
>COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.4/lto-wrapper
>Target: x86_64-unknown-linux-gnu
>Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
>--infodir=/usr/share/info --libdir=/usr/lib --libexecdir=/usr/lib
>--program-suffix=-4.5 --enable-shared
>--enable-languages=c,c++,fortran,objc,obj-c++ --enable-__cxa_atexit
>--disable-libstdcxx-pch --disable-multilib --disable-libgomp
>--disable-libmudflap --disable-libssp --enable-clocale=gnu
>--with-tune=generic --with-cloog --with-ppl --with-system-zlib
>Thread model: posix
>gcc version 4.5.4 (GCC)




RE: Best way to compute cost of a sequence of gimple stmt

2014-06-10 Thread Thomas Preud'homme
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Tuesday, June 10, 2014 5:16 PM
> 

> In general this is impossible to do.  I don't have a good answer on
> how to determine whether (unaligned) load + bswap is faster than
> doing sth else - but there is a very good chance that the original
> code is even worse.  For the unaligned load you can expect
> an optimal code sequence to be generated - likewise for the bswap.
> Now - if you want to do the best for the combination of both I'd
> say you add support to the expr.c bitfield extraction code to do
> the bswap on-the-fly and use TER to see that you are doing the
> bswap on a memory source.

Oh I see. Doing it there would mean instead of two independent
operations you'd do the best combination possible, is that right?

> 
> There is only two choices - disable unaligned-load + bswap on
> SLOW_UNALIGNED_ACCESS targets or not.  Doing sth more
> fancy won't do the trick and isn't worth the trouble IMHO.

There is some other reason to compute the cost that I didn't
mention. For instance, you suggested to recognize partial
load (+bswap). Quoting you:

> unsigned foo (unsigned char *x)
> {
>   return x[0] << 24 | x[2] << 8 | x[3];
> }
> 
> ?  We could do an unsigned int load from x and zero byte 3
> with an AND.  

Even with aligned access, the above might be slower if x[0] was
already loaded previously and sits in a register.

I'm tempted to use a simple heuristic such as comparing the
number of loads before and after, adding one if the load is
unaligned. So in the above example, supposing that there is
some computation done around x[0] before the return line,
we'd have 2 loads before Vs 2 x is unaligned and we would
cancel the optimization. If x is aligned the optimization would
proceed.

Do you thing this approach is also too much trouble or would
not work?

Best regards,

Thomas