Best way to compute cost of a sequence of gimple stmt
Hi there, With recent changes to it, the bswap pass can now replace a series of (probably aligned) load + bitwise operation (AND, OR and shifts) + casts by a (potentially unaligned) load and a bswap. I was rightfully pointed out that this might be more expensive than the original sequence of gimple statements. Therefore I am trying to compute the cost of the sequence with and without the transformation to make an informed decision. So far I proceeded by reusing the computation_cost function from ivopts and various functions from expmed (shift_cost, convert_cost and some new ones: rot_cost for instance). However, this doesn't allow me to compute the cost of a function call (the call to the bswap builtin) and I am lurking towards exposing expand_gimple_stmt () in a new function gimple_stmt_cost (). I am wondering though if it is a correct thing to do as I am not familiar with how expansion operates. I am also wondering if I should use gimple_stmt_cost as seldomly as possible or on the contrary make use of it for all statements so as to get rid of the modifications in ivopts and expmed. I'd appreciate any advices on how to compute the cost of a sequence of gimple statements. Best regards, Thomas Preud'homme
Re: Best way to compute cost of a sequence of gimple stmt
On Tue, Jun 10, 2014 at 10:32 AM, Thomas Preud'homme wrote: > Hi there, > > With recent changes to it, the bswap pass can now replace a series of > (probably aligned) load + bitwise operation (AND, OR and shifts) + casts > by a (potentially unaligned) load and a bswap. I was rightfully pointed > out that this might be more expensive than the original sequence of > gimple statements. Therefore I am trying to compute the cost of the > sequence with and without the transformation to make an informed > decision. > > So far I proceeded by reusing the computation_cost function from > ivopts and various functions from expmed (shift_cost, convert_cost > and some new ones: rot_cost for instance). However, this doesn't > allow me to compute the cost of a function call (the call to the bswap > builtin) and I am lurking towards exposing expand_gimple_stmt () in > a new function gimple_stmt_cost (). I am wondering though if it is a > correct thing to do as I am not familiar with how expansion operates. > I am also wondering if I should use gimple_stmt_cost as seldomly as > possible or on the contrary make use of it for all statements so as to > get rid of the modifications in ivopts and expmed. > > I'd appreciate any advices on how to compute the cost of a sequence > of gimple statements. In general this is impossible to do. I don't have a good answer on how to determine whether (unaligned) load + bswap is faster than doing sth else - but there is a very good chance that the original code is even worse. For the unaligned load you can expect an optimal code sequence to be generated - likewise for the bswap. Now - if you want to do the best for the combination of both I'd say you add support to the expr.c bitfield extraction code to do the bswap on-the-fly and use TER to see that you are doing the bswap on a memory source. Anyway, what you'd really need to do is compare the original code against the transform where on GIMPLE it's very-many-stmts vs. two-stmts, and thus "obviously faster". There is only two choices - disable unaligned-load + bswap on SLOW_UNALIGNED_ACCESS targets or not. Doing sth more fancy won't do the trick and isn't worth the trouble IMHO. Richard. > Best regards, > > Thomas Preud'homme > >
How can I get started as a GCC developer
Hello everyone, I'd like to help with the modularization of GCC. I've visited the getting started page at the official wiki but its contents seem too old. I'm fairly new to GCC and and I'm bewildered by its huge code base with lengthy and complicated makefiles and sophisticated internal mechanisms. I don't know how to figure out the exact process GCC is built and feel incompetent to make changes to existing code for fear of breaking the underlying mechanisms. Also, it seems to me that the documentation on GCC internals at the official wiki and various resources the wiki points to are not enough to help a newcomer to make significant contribution, since they are largely incomplete and outdated. However, I do have rudimentary knowledge about the structure of the code base and have successfully built a frontend that merely emits a main function that returns 0 for GCC 4.9.0. But I have no idea how I can make further progress other than by aimlessly browsing through the source code. So how can I gain a systematic understanding of the internals of GCC in order to get started with some serious work?
implicit gnat_malloc seen as vararg function
Hi, I'm working on a private port of GCC 4.7.3/GNAT 7.1.2. Calls to ADA 'new' operator generates implicit gnat_malloc(size) calls (which has to be provided by user program or runtime). In my macro INIT_CUMULATIVE_ARGS I noticed that gnat_malloc(size) calls are seen as vararg function because the tree.c function stdarg_p(fntype) returns true. This creates bad code because caller put the argument in stack (this is normal behaviour for my vararg functions) whereas the callee expected argument in register (gnat_malloc signature should only contains the 'size' argument of type size_type). microblaze and iq2000 backend should have the same problem because inside INIT_CUMULATIVE_ARGS macro, variable argument function are identified by browsing fntype tree chain (like stdarg_p does) Do I have to fix the GNAT frontend or did I missed an option dealing with gnat_malloc behaviour ? Regards, Selim Belbachir
Re: [GSoC] decision tree first steps
On Tue, Jun 10, 2014 at 1:06 PM, Prathamesh Kulkarni wrote: > On Fri, Jun 6, 2014 at 7:18 PM, Richard Biener > wrote: >> On Fri, Jun 6, 2014 at 12:02 PM, Richard Biener >> wrote: >>> On Fri, Jun 6, 2014 at 11:02 AM, Prathamesh Kulkarni >>> wrote: On Mon, Jun 2, 2014 at 6:14 PM, Richard Biener wrote: > On Mon, Jun 2, 2014 at 1:16 PM, Prathamesh Kulkarni > wrote: >> I have few questions regarding genmatch: >> >> a) Why is 4 hard-coded here: ? >> in write_nary_simplifiers: >> fprintf (f, " tree captures[4] = {};\n"); > > Magic number (this must be big enough for all cases ...). Honestly > this should be improved (but requires another scan over the matcher IL > to figure out the max N used in @N). > >> b) Should we add syntax for a symbol to denote multiple operators ? >> For exampleim in simplify_rotate: >> (X << CNT1) OP (X >> CNT2) with OP being +, |, ^ (CNT1 + CNT2 == >> bitsize of type of X). > > Something to enhance the IL with, yes. I'd say we support > > (define_op additive PLUS_EXPR MINUS_EXPR POINTER_PLUS_EXPR) > > thus, > > (define_op op...) > >> c) Remove for parsing capture in parse_expr since we reject outermost >> captured expressions ? > > but parse_expr is also used for inner expressions, no? > > (plus (minus@2 @0 @1) @3) > > should still work > >> d) I am not able to follow this comment in match.pd: >> /* Patterns required to avoid SCCVN testsuite regressions. */ >> >> /* (x >> 31) & 1 -> (x >> 31). Folding in fold-const is more >>complicated here, it does >> Fold (X << C1) & C2 into (X << C1) & (C2 | ((1 << C1) - 1)) >> (X >> C1) & C2 into (X >> C1) & (C2 | ~((type) -1 >> C1)) >> if the new mask might be further optimized. */ >> (match_and_simplify >> (bit_and (rshift@0 @1 INTEGER_CST_P@2) integer_onep) >> if (compare_tree_int (@2, TYPE_PRECISION (TREE_TYPE (@1)) - 1) == 0) >> @0) > > The comment is literally copied from the case I extracted the > (simplified) variant from fold-const.c. See lines 11961-12056 in > fold-const.c. > It'll be a challenge to implement the equivalent in a pattern ;) > >> >> Decision Tree. >> I have tried to come up with a prototype for decision tree (patch >> attached). >> For simplicity, it handles patterns involving only unary operators and >> no predicates >> and returns false when the pattern fails to match (no goto to match >> another pattern). >> I meant to post it only for illustration, and I have not really paid >> attention to code quality (bad formatting, memory leaks, etc.). >> >> * Basic Idea >> A pattern consists of following parts: match, ifexpr and result. >> Let's call as "simplification" operand. >> The common prefix between different match operands would be represented >> by same nodes in the decision tree. >> >> Example: >> (negate (bit_not @0)) >> S1 >> >> (negate (negate @0)) >> S2 >> >> S1, S2 denote simplifications for the above patterns respectively. >> >> The decision tree would look something like >> (that's the way it gets constructed with the patch): >> >> dummy/root >> | >>NEGATE_EXPR >> / \ >> BIT_NOT NEGATE_EXPR >>| | >> @0 @0 >>| | >> S1 S2 >> >> a) The children of an internal node are number of decisions that >> can be taken at that node. In the above case it's 2 for outer >> NEGATE_EXPR. >> b) Simplification operand represents leaves of the decision tree >> c) Instead of having list of heads, I have added one dummy node, >> and heads become children of these dummy root node. >> d) Code-gen for non-simplification operands involves generating, >> "matching" code and for simplification operands involves generating >> "transform" code >> >> * Overall Flow: >> I guess we would build the decision tree from the AST. >> So the flow would be like: >> source -> struct simplify (match, ifexpr, result) -> decision tree -> c >> code. >> >> Something like (in main): >> decision_tree dt; >> while (there is another pattern) >> { >> simplify *s = parse_match_and_simplify (); >> insert s into decision tree; >> }; >> So parsing routines are concerned with parsing and building the AST >> (operand), >> and not with the decision tree. Is that fine ? > > Yes, that's good. > >> * Representation of decision tree. >> A decision tree would need a way to represent language constructs >
[PATCH] tell gcc optimizer to never introduce new data races
We have been chasing a memory corruption bug, which turned out to be caused by very old gcc (4.3.4), which happily turned conditional load into a non-conditional one, and that broke correctness (the condition was met only if lock was held) and corrupted memory. This particular problem with that particular code did not happen when never gccs were used. I've brought this up with our gcc folks, as I wanted to make sure that this can't really happen again, and it turns out it actually can. Quoting Martin Jambor : More current GCCs are more careful when it comes to replacing a conditional load with a non-conditional one, most notably they check that a store happens in each iteration of _a_ loop but they assume loops are executed. They also perform a simple check whether the store cannot trap which currently passes only for non-const variables. A simple testcase demonstrating it on an x86_64 is for example the following: $ cat cond_store.c int g_1 = 1; int g_2[1024] __attribute__((section ("safe_section"), aligned (4096))); int c = 4; int __attribute__ ((noinline)) foo (void) { int l; for (l = 0; (l != 4); l++) { if (g_1) return l; for (g_2[0] = 0; (g_2[0] >= 26); ++g_2[0]) ; } return 2; } int main (int argc, char* argv[]) { if (mprotect (g_2, sizeof(g_2), PROT_READ) == -1) { int e = errno; error (e, e, "mprotect error %i", e); } foo (); __builtin_printf("OK\n"); return 0; } /* EOF */ $ ~/gcc/trunk/inst/bin/gcc cond_store.c -O2 --param allow-store-data-races=0 $ ./a.out OK $ ~/gcc/trunk/inst/bin/gcc cond_store.c -O2 --param allow-store-data-races=1 $ ./a.out Segmentation fault The testcase fails the same at least with 4.9, 4.8 and 4.7. Therefore I would suggest building kernels with this parameter set to zero. I also agree with Jikos that the default should be changed for -O2. I have run most of the SPEC 2k6 CPU benchmarks (gamess and dealII failed, at -O2, not sure why) compiled with and without this option and did not see any real difference between respective run-times. Hopefully the default will be changed in newer gccs, but let's force it for kernel builds so that we are on a safe side even when older gcc are used. Cc: Martin Jambor Cc: Petr Mladek Cc: Linus Torvalds Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Andrew Morton Signed-off-by: Jiri Kosina --- Makefile | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Makefile b/Makefile index 00a933b..613367f 100644 --- a/Makefile +++ b/Makefile @@ -585,6 +585,9 @@ else KBUILD_CFLAGS += -O2 endif +# Tell gcc to never replace conditional load with a non-conditional one +KBUILD_CFLAGS += $(call cc-option,--param allow-store-data-races=0) + include $(srctree)/arch/$(SRCARCH)/Makefile ifdef CONFIG_READABLE_ASM -- Jiri Kosina SUSE Labs
Re: How can I get started as a GCC developer
On Tue, Jun 10, 2014 at 3:30 AM, Anonymous User wrote: > > I'd like to help with the modularization of GCC. > > I've visited the getting started page at the official wiki but its contents > seem too old. I'm fairly new to GCC and and I'm bewildered by its huge code > base with lengthy and complicated makefiles and sophisticated internal > mechanisms. I don't know how to figure out the exact process GCC is built > and feel incompetent to make changes to existing code for fear of breaking > the underlying mechanisms. Also, it seems to me that the documentation on > GCC internals at the official wiki and various resources the wiki points to > are not enough to help a newcomer to make significant contribution, since > they are largely incomplete and outdated. > > However, I do have rudimentary knowledge about the structure of the code > base and have successfully built a frontend that merely emits a main > function that returns 0 for GCC 4.9.0. But I have no idea how I can make > further progress other than by aimlessly browsing through the source code. > > So how can I gain a systematic understanding of the internals of GCC in > order to get started with some serious work? It's a good question. You're quite right that the documentation tends to be outdated and incomplete. That suggests one easy way to contribute: make the documentation better. Unfortunately, apart from that, it's hard to know how to answer your question. It's possible to answer a specific questions ("how does X work?"). It's hard to answer a general question ("how does everything work?"). Ian
Re: How can I get started as a GCC developer
On Tue, Jun 10, 2014 at 06:30:54PM +0800, Anonymous User wrote: I'm not sure that being anonymous is helpful on GCC (and it might even be frowned upon, but I don't want to start a flamewar) I think that you need and you want to be identified. Besides, working on GCC is difficult; you'll soon be proud of being able to work on it, and that also wants you to be identified. You surely need to start with legal stuff. They take time (perhaps months, and surely weeks!). Read carefully http://gcc.gnu.org/contribute.html#legal Once all is done and signed (both by FSF and you or your employer), add your real name in MAINTAINERS file. > > So how can I gain a systematic understanding of the internals of GCC in > order to get started with some serious work? I'm definitely biaised, but I suggest first to be able to write some GCC plugins (or some MELT extensions, see http://gcc-melt.org/ for more). Read in particular my latest slides http://gcc-melt.org/gcc-plugin-MELT-LinuxCollabSummit2014.pdf (GCC plugins thru the MELT examples). They give a lot of pointers. Don't forget to read http://www.cse.iitb.ac.in/grc/ Regards. -- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basilestarynkevitchnet mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mines, sont seulement les miennes} ***
Re: [PATCH] tell gcc optimizer to never introduce new data races
On Tue, Jun 10, 2014 at 03:23:36PM +0200, Jiri Kosina wrote: > +# Tell gcc to never replace conditional load with a non-conditional one > +KBUILD_CFLAGS+= $(call cc-option,--param allow-store-data-races=0) > + Why do we not want: -fmemory-model=safe? And should we not at the very least also disable packed-store-data-races? pgpDwTn3j17ts.pgp Description: PGP signature
Re: Weird startup issue with -fsplit-stack
On 05/21/2014 06:10 PM, Ian Lance Taylor wrote: I'm sorry, I have nothing useful to suggest. I agree that that sounds like a stack overflow, which should in general be impossible with -fsplit-stack when using the gold linker. I don't know what is happening here. I've tested with massive recursion so I don't think that is the problem by itself. Hm...did you test with a lot of longjmps? I'm just curious about this comment in libgcc/generic-morestack.c: /* The stack segment that we think we are currently using. This will be correct in normal usage, but will be incorrect if an exception unwinds into a different stack segment or if longjmp jumps to a different stack segment. */ So, what happens if longjmp jumps to a different segment? Is the result undefined? Is it possible to detect such a jump? Dmitry
Re: [PATCH] tell gcc optimizer to never introduce new data races
On Tue, Jun 10, 2014 at 04:53:27PM +0200, Peter Zijlstra wrote: > On Tue, Jun 10, 2014 at 03:23:36PM +0200, Jiri Kosina wrote: > > +# Tell gcc to never replace conditional load with a non-conditional one > > +KBUILD_CFLAGS += $(call cc-option,--param allow-store-data-races=0) > > + > > Why do we not want: -fmemory-model=safe? And should we not at the very > least also disable packed-store-data-races? Note that the option does not exist, even though it is mentioned in the documentation. Marek
Re: [PATCH] tell gcc optimizer to never introduce new data races
On Tue, Jun 10, 2014 at 05:04:55PM +0200, Marek Polacek wrote: > On Tue, Jun 10, 2014 at 04:53:27PM +0200, Peter Zijlstra wrote: > > On Tue, Jun 10, 2014 at 03:23:36PM +0200, Jiri Kosina wrote: > > > +# Tell gcc to never replace conditional load with a non-conditional one > > > +KBUILD_CFLAGS+= $(call cc-option,--param allow-store-data-races=0) > > > + > > > > Why do we not want: -fmemory-model=safe? And should we not at the very > > least also disable packed-store-data-races? > > Note that the option does not exist, even though it is mentioned in the > documentation. Urgh.. ok. Any word on the packed-store-data thing? pgpMWhQCfAGsj.pgp Description: PGP signature
Re: [PATCH] tell gcc optimizer to never introduce new data races
On Tue, Jun 10, 2014 at 05:13:29PM +0200, Peter Zijlstra wrote: > On Tue, Jun 10, 2014 at 05:04:55PM +0200, Marek Polacek wrote: > > On Tue, Jun 10, 2014 at 04:53:27PM +0200, Peter Zijlstra wrote: > > > On Tue, Jun 10, 2014 at 03:23:36PM +0200, Jiri Kosina wrote: > > > > +# Tell gcc to never replace conditional load with a non-conditional one > > > > +KBUILD_CFLAGS += $(call cc-option,--param allow-store-data-races=0) > > > > + > > > > > > Why do we not want: -fmemory-model=safe? And should we not at the very > > > least also disable packed-store-data-races? > > > > Note that the option does not exist, even though it is mentioned in the > > documentation. > > Urgh.. ok. Any word on the packed-store-data thing? That is recognized, undocumented and never used in the compiler (not in 4.7 or any later release till now). Most of the spots in the compiler that could introduce data races were actually just changed, there is (already since 4.7) just a single conditional on the --param allow-store-data-races=X value. Jakub
Re: How can I get started as a GCC developer
On 10 June 2014 15:00, Basile Starynkevitch wrote: > Once all is done and signed (both by FSF and you or your employer), add your > real name in MAINTAINERS file. That's not needed until you get write access to the repository, but you can submit patches and contribute without write access.
Re: How can I get started as a GCC developer
On 10 June 2014 16:38, Jonathan Wakely wrote: > On 10 June 2014 15:00, Basile Starynkevitch wrote: >> Once all is done and signed (both by FSF and you or your employer), add your >> real name in MAINTAINERS file. > > That's not needed until you get write access to the repository, but > you can submit patches and contribute without write access. To clarify, I meant adding yourself to the MAINTAINERS file is not required to contribute (the signed copyright paperwork is needed).
Re: Weird startup issue with -fsplit-stack
On Tue, Jun 10, 2014 at 7:54 AM, Dmitry Antipov wrote: > On 05/21/2014 06:10 PM, Ian Lance Taylor wrote: > >> I'm sorry, I have nothing useful to suggest. I agree that that sounds >> like a stack overflow, which should in general be impossible with >> -fsplit-stack when using the gold linker. I don't know what is >> happening here. I've tested with massive recursion so I don't think >> that is the problem by itself. > > > Hm...did you test with a lot of longjmps? I'm just curious about this > comment in libgcc/generic-morestack.c: > > /* The stack segment that we think we are currently using. This will >be correct in normal usage, but will be incorrect if an exception >unwinds into a different stack segment or if longjmp jumps to a >different stack segment. */ > > So, what happens if longjmp jumps to a different segment? Is the result > undefined? Is it possible to detect such a jump? I have not tested with longjmps. The Go library uses a similar function (setcontext) but it updates the split stack context as it goes. You may be right: a program that splits the stack, then calls setjmp, then splits the stack, then calls longjmp, may mess up after the longjmp when returning from the stack split in the function that calls setjmp. This case can be detected in __generic_releasestack. If the current stack pointer is not in __morestack_current_segment, the code can call __generic_findstack to set __morestack_current_segment to the correct value. Ian
broken links?
Either something is broken on my web-access or the links on https://gcc.gnu.org/install/prerequisites.html pointing to ftp://gcc.gnu.org/pub/gcc/infrastructure/ are gone - can't find the files anywhere else :( Thanks for all your efforts Michael Michael Hebenstreit Senior Cluster Architect Intel Corporation, MS: RR1-105/H14 Software and Services Group/DCE 4100 Sara Road Tel.: +1 505-794-3144 Rio Rancho, NM 87124 UNITED STATES E-mail: michael.hebenstr...@intel.com
Re: broken links?
On 10 June 2014 17:20, Hebenstreit, Michael wrote: > Either something is broken on my web-access or the links on > https://gcc.gnu.org/install/prerequisites.html pointing to > ftp://gcc.gnu.org/pub/gcc/infrastructure/ are gone - can't find the files > anywhere else :( The FTP site and files are still there (I've just checked with lftp on the command-line) but clicking on the links in Firefox doesn't work (but does work in Webkit-based browsers). Pasting the ftp:// link directly into Firefox's location bar works. Maybe related to the recent move to HTTPS on gcc.gnu.org but it seems to be a Firefox problem, not gcc.gnu.org one.
Re: broken links?
On 10 June 2014 17:41, Jonathan Wakely wrote: > On 10 June 2014 17:20, Hebenstreit, Michael wrote: >> Either something is broken on my web-access or the links on >> https://gcc.gnu.org/install/prerequisites.html pointing to >> ftp://gcc.gnu.org/pub/gcc/infrastructure/ are gone - can't find the files >> anywhere else :( > > The FTP site and files are still there (I've just checked with lftp on > the command-line) but clicking on the links in Firefox doesn't work > (but does work in Webkit-based browsers). Pasting the ftp:// link > directly into Firefox's location bar works. Hmm, and now clicking the links works. I wonder if someone just fixed something, or if Firefox just needed a kick.
RE: broken links?
you are right - must be a Firefox problem; I had no problem using wget, IE8 works as well Firefox is still redirected to https://gcc.gnu.org/pub/gcc/infrastructure/ though sorry for alarm thanks for reply Michael -Original Message- From: Jonathan Wakely [mailto:jwakely@gmail.com] Sent: Tuesday, June 10, 2014 10:43 AM To: Hebenstreit, Michael Cc: gcc@gcc.gnu.org Subject: Re: broken links? On 10 June 2014 17:41, Jonathan Wakely wrote: > On 10 June 2014 17:20, Hebenstreit, Michael wrote: >> Either something is broken on my web-access or the links on >> https://gcc.gnu.org/install/prerequisites.html pointing to >> ftp://gcc.gnu.org/pub/gcc/infrastructure/ are gone - can't find the >> files anywhere else :( > > The FTP site and files are still there (I've just checked with lftp on > the command-line) but clicking on the links in Firefox doesn't work > (but does work in Webkit-based browsers). Pasting the ftp:// link > directly into Firefox's location bar works. Hmm, and now clicking the links works. I wonder if someone just fixed something, or if Firefox just needed a kick.
Issue with sub-object __builtin_object_size
Hello, the following C++ test case: struct pollfd { int fd; short int events; short int revents; }; struct Pollfd : public pollfd { }; struct Pollfd myfd[10]; int test (void) { return __builtin_object_size ((struct pollfd *)myfd, 1); } ends up returning 8 from the "test" routine, not 80. In the real-world application this test case was extracted from, this causes a call: poll(myfd, count, 0); // 1 < count < 10 to fail with a "Buffer overflow detected" message at run-time when building with _FORTIFY_SOURCE = 2 against glibc. [ Here, there is no explicit cast, but it is implied by the prototype of the "poll" routine. ] (Note that in the real-world application, the derived struct Pollfd has some member functions to construct and pretty-print the structure, but has no additional data members.) >From the __builtin_object_size documentation, it's not immediately clear to me whether this is supposed to work or not: If the least significant bit is clear, objects are whole variables, if it is set, a closest surrounding subobject is considered the object a pointer points to. Is the presence of the above cast (explicit or implicit) supposed to modify the notion of "closest surrounding subobject"? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] tell gcc optimizer to never introduce new data races
On Tue, Jun 10, 2014 at 6:23 AM, Jiri Kosina wrote: > We have been chasing a memory corruption bug, which turned out to be > caused by very old gcc (4.3.4), which happily turned conditional load into > a non-conditional one, and that broke correctness (the condition was met > only if lock was held) and corrupted memory. Just out of interest, can you point to the particular kernel code that caused this? I think that's more interesting than the example program you show - which I'm sure is really nice for gcc developers as an example, but from a kernel standpoint I think it's more important to show the particular problems this caused for the kernel? Linus
Re: Issue with sub-object __builtin_object_size
On Tue, Jun 10, 2014 at 07:37:50PM +0200, Ulrich Weigand wrote: > the following C++ test case: > > struct pollfd > { > int fd; > short int events; > short int revents; > }; > > struct Pollfd : public pollfd { }; > > struct Pollfd myfd[10]; > > int test (void) > { > return __builtin_object_size ((struct pollfd *)myfd, 1); > } > > ends up returning 8 from the "test" routine, not 80. The thing is that the C++ FE transforms this kind of cast to &((struct Pollfd *) &myfd)->D.2233 so for middle-end where __builtin_object_size is evaluated this is like: struct Pollfd { struct pollfd something; }; struct Pollfd myfd[10]; int test (void) { return __builtin_object_size (&myfd[0].something, 1); } and returning 8 in that case is completely correct. So the question is why instead of a simple cast it created ADDR_EXPR of a FIELD_DECL instead. Jakub
Re: [PATCH] tell gcc optimizer to never introduce new data races
On Tue, Jun 10, 2014 at 10:46 AM, Linus Torvalds wrote: > On Tue, Jun 10, 2014 at 6:23 AM, Jiri Kosina wrote: >> We have been chasing a memory corruption bug, which turned out to be >> caused by very old gcc (4.3.4), which happily turned conditional load into >> a non-conditional one, and that broke correctness (the condition was met >> only if lock was held) and corrupted memory. > > Just out of interest, can you point to the particular kernel code that > caused this? I think that's more interesting than the example program > you show - which I'm sure is really nice for gcc developers as an > example, but from a kernel standpoint I think it's more important to > show the particular problems this caused for the kernel? > Jiri, is there a workaround for compilers that don't support '--param allow-store-data-races=0'? For example: $ gcc-4.5 -O2 -o cond_store cond_store.c && ./cond_store Segmentation fault (core dumped) $ gcc-4.5 -O2 --param allow-store-data-races=0 -o cond_store cond_store.c && ./cond_store cc1: error: invalid parameter ‘allow-store-data-races’ $ gcc-4.5 -v Using built-in specs. COLLECT_GCC=gcc-4.5 COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.4/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --libdir=/usr/lib --libexecdir=/usr/lib --program-suffix=-4.5 --enable-shared --enable-languages=c,c++,fortran,objc,obj-c++ --enable-__cxa_atexit --disable-libstdcxx-pch --disable-multilib --disable-libgomp --disable-libmudflap --disable-libssp --enable-clocale=gnu --with-tune=generic --with-cloog --with-ppl --with-system-zlib Thread model: posix gcc version 4.5.4 (GCC)
Re: [PATCH] tell gcc optimizer to never introduce new data races
On Tue, 10 Jun 2014, Linus Torvalds wrote: > > We have been chasing a memory corruption bug, which turned out to be > > caused by very old gcc (4.3.4), which happily turned conditional load into > > a non-conditional one, and that broke correctness (the condition was met > > only if lock was held) and corrupted memory. > > Just out of interest, can you point to the particular kernel code that > caused this? I think that's more interesting than the example program > you show - which I'm sure is really nice for gcc developers as an > example, but from a kernel standpoint I think it's more important to > show the particular problems this caused for the kernel? Well, as I said, that was with gcc 4.3.4, and GCC people expressed themselves that that was a slightly different optimization. It made me nervous enough though to ask whether it's absolutely positively not going to happen with newer gccs, and the code snippet quoted in the original mail came back as a response. The code in question was out-of-tree printk-in-NMI (yeah, surprise suprise, once again) patch written by Petr Mladek, let me quote his comment from our internal bugzilla: === I have spent few days investigating inconsistent state of kernel ring buffer. It went out that it was caused by speculative store generated by gcc-4.3.4. The problem is in assembly generated for make_free_space(). The functions is called the following way: + vprintk_emit(); + log = MAIN_LOG; // with logbuf_lock or log = NMI_LOG; // with nmi_logbuf_lock cont_add(log, ...); + cont_flush(log, ...); + log_store(log, ...); + log_make_free_space(log, ...); If called with log = NMI_LOG then only nmi_log_* global variables are safe to modify but the generated code does store also into (main_)log_* global variables: : 55 push %rbp 89 f6 mov%esi,%esi 48 8b 05 03 99 51 01mov0x1519903(%rip),%rax # 82620868 44 8b 1d ec 98 51 01mov0x15198ec(%rip),%r11d # 82620858 8b 35 36 60 14 01 mov0x1146036(%rip),%esi # 8224cfa8 44 8b 35 33 60 14 01mov0x1146033(%rip),%r14d # 8224cfac 4c 8b 2d d0 98 51 01mov0x15198d0(%rip),%r13 # 82620850 4c 8b 25 11 61 14 01mov0x1146111(%rip),%r12 # 8224d098 49 89 c2mov%rax,%r10 48 21 c2and%rax,%rdx 48 8b 1d 0c 99 55 01mov0x155990c(%rip),%rbx # 826608a0 49 c1 ea 20 shr$0x20,%r10 48 89 55 d0 mov%rdx,-0x30(%rbp) 44 29 desub%r11d,%esi 45 29 d6sub%r10d,%r14d 4c 8b 0d 97 98 51 01mov0x1519897(%rip),%r9 # 82620840 eb 7e jmp81107029 [...] 85 ff test %edi,%edi # edi = 1 for NMI_LOG 4c 89 e8mov%r13,%rax 4c 89 camov%r9,%rdx 74 0a je 8110703d 8b 15 27 98 51 01 mov0x1519827(%rip),%edx # 82620860 48 8b 45 d0 mov-0x30(%rbp),%rax 48 39 c2cmp%rax,%rdx # end of loop 0f 84 da 00 00 00 je 81107120 [...] 85 ff test %edi,%edi # edi = 1 for NMI_LOG 4c 89 0d 17 97 51 01mov%r9,0x1519717(%rip)# 82620840 ^^ KABOOOM 74 35 je 81107160 It stores log_first_seq when edi == NMI_LOG. This instructions are used also when edi == MAIN_LOG but the store is done speculatively before the condition is decided. It is unsafe because we do not have "logbuf_lock" in NMI context and some other process migh modify "log_first_seq" in parallel. === I believe that the best course of action is both - building kernel (and anything multi-threaded, I guess) with that optimization turned off - persuade gcc folks to change the default for future releases Thanks, -- Jiri Kosina SUSE Labs
Re: [PATCH] tell gcc optimizer to never introduce new data races
On June 10, 2014 8:04:13 PM CEST, Steven Noonan wrote: >On Tue, Jun 10, 2014 at 10:46 AM, Linus Torvalds > wrote: >> On Tue, Jun 10, 2014 at 6:23 AM, Jiri Kosina wrote: >>> We have been chasing a memory corruption bug, which turned out to be >>> caused by very old gcc (4.3.4), which happily turned conditional >load into >>> a non-conditional one, and that broke correctness (the condition was >met >>> only if lock was held) and corrupted memory. >> >> Just out of interest, can you point to the particular kernel code >that >> caused this? I think that's more interesting than the example program >> you show - which I'm sure is really nice for gcc developers as an >> example, but from a kernel standpoint I think it's more important to >> show the particular problems this caused for the kernel? >> > >Jiri, is there a workaround for compilers that don't support '--param >allow-store-data-races=0'? For example: The optimization that purposely performs the undesired transform is loop store motion which is part of the tree loop invariant motion optimization. You can disable that with -fno-tree-loop-im. That the bug didn't appear with newer compilers was due to lucky decisions to not inline a particular function. Richard. >$ gcc-4.5 -O2 -o cond_store cond_store.c && ./cond_store >Segmentation fault (core dumped) > >$ gcc-4.5 -O2 --param allow-store-data-races=0 -o cond_store >cond_store.c && ./cond_store >cc1: error: invalid parameter ‘allow-store-data-races’ > >$ gcc-4.5 -v >Using built-in specs. >COLLECT_GCC=gcc-4.5 >COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.4/lto-wrapper >Target: x86_64-unknown-linux-gnu >Configured with: ../configure --prefix=/usr --mandir=/usr/share/man >--infodir=/usr/share/info --libdir=/usr/lib --libexecdir=/usr/lib >--program-suffix=-4.5 --enable-shared >--enable-languages=c,c++,fortran,objc,obj-c++ --enable-__cxa_atexit >--disable-libstdcxx-pch --disable-multilib --disable-libgomp >--disable-libmudflap --disable-libssp --enable-clocale=gnu >--with-tune=generic --with-cloog --with-ppl --with-system-zlib >Thread model: posix >gcc version 4.5.4 (GCC)
RE: Best way to compute cost of a sequence of gimple stmt
> From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: Tuesday, June 10, 2014 5:16 PM > > In general this is impossible to do. I don't have a good answer on > how to determine whether (unaligned) load + bswap is faster than > doing sth else - but there is a very good chance that the original > code is even worse. For the unaligned load you can expect > an optimal code sequence to be generated - likewise for the bswap. > Now - if you want to do the best for the combination of both I'd > say you add support to the expr.c bitfield extraction code to do > the bswap on-the-fly and use TER to see that you are doing the > bswap on a memory source. Oh I see. Doing it there would mean instead of two independent operations you'd do the best combination possible, is that right? > > There is only two choices - disable unaligned-load + bswap on > SLOW_UNALIGNED_ACCESS targets or not. Doing sth more > fancy won't do the trick and isn't worth the trouble IMHO. There is some other reason to compute the cost that I didn't mention. For instance, you suggested to recognize partial load (+bswap). Quoting you: > unsigned foo (unsigned char *x) > { > return x[0] << 24 | x[2] << 8 | x[3]; > } > > ? We could do an unsigned int load from x and zero byte 3 > with an AND. Even with aligned access, the above might be slower if x[0] was already loaded previously and sits in a register. I'm tempted to use a simple heuristic such as comparing the number of loads before and after, adding one if the load is unaligned. So in the above example, supposing that there is some computation done around x[0] before the return line, we'd have 2 loads before Vs 2 x is unaligned and we would cancel the optimization. If x is aligned the optimization would proceed. Do you thing this approach is also too much trouble or would not work? Best regards, Thomas