Re: suspect code in fold-const.c
On Fri, 15 Nov 2013, Kenneth Zadeck wrote:

> This patch fixes a number of places where the mode bitsize had been used but
> the mode precision should have been used. The tree level is somewhat sloppy
> about this - some places use the mode precision and some use the mode bitsize.
> It seems that the mode precision is the proper choice since it does the
> correct thing if the underlying mode is a partial int mode.
>
> This code has been tested on x86-64 with no regressions. Ok to commit?

Ok.

Thanks,
Richard.

> 2013-11-15 Kenneth Zadeck
>
> * tree.c (int_fits_type_p): Change GET_MODE_BITSIZE to
> GET_MODE_PRECISION.
> * fold-const.c (fold_single_bit_test_into_sign_test,
> fold_binary_loc): Change GET_MODE_BITSIZE to
> GET_MODE_PRECISION.
>
> Kenny
>
> On 11/15/2013 08:32 AM, Kenneth Zadeck wrote:
> > On 11/15/2013 04:07 AM, Eric Botcazou wrote:
> > > > this code from fold-const.c starts on line 13811.
> > > >
> > > > else if (TREE_INT_CST_HIGH (arg1) == signed_max_hi
> > > >          && TREE_INT_CST_LOW (arg1) == signed_max_lo
> > > >          && TYPE_UNSIGNED (arg1_type)
> > > >          /* We will flip the signedness of the comparison operator
> > > >             associated with the mode of arg1, so the sign bit is
> > > >             specified by this mode. Check that arg1 is the signed
> > > >             max associated with this sign bit. */
> > > >          && width == GET_MODE_BITSIZE (TYPE_MODE (arg1_type))
> > > >          /* signed_type does not work on pointer types. */
> > > >          && INTEGRAL_TYPE_P (arg1_type))
> > > with width defined as:
> > >
> > > unsigned int width = TYPE_PRECISION (arg1_type);
> > >
> > > > it seems that the check on bitsize should really be a check on the
> > > > precision of the variable. If this seems right, i will correct this on
> > > > the trunk and make the appropriate changes to the wide-int branch.
> > > Do you mean
> > >
> > >          && width == GET_MODE_PRECISION (TYPE_MODE (arg1_type))
> > >
> > > instead? If so, that would probably make sense, but there are a few other
> > > places with the same TYPE_PRECISION/GET_MODE_BITSIZE check, in particular
> > > the very similar transformation done in
> > > fold_single_bit_test_into_sign_test.
> > >
> > yes. I understand the need to do this check on the mode rather than the
> > precision of the type itself.
> > The point is that if the mode under this type happens to be a partial int
> > mode, then that sign bit may not even be where the bitsize points to.
> >
> > However, having just done a few greps, it looks like this case was just the
> > one that i found while doing the wide-int work, there may be several more of
> > these cases. Just in fold-const, there are a couple in fold_binary_loc.
> > The one in tree.c:int_fits_type_p looks particularly wrong.
> >
> > I think that there are also several in tree-vect-patterns.c.
> >
> > Kenny

--
Richard Biener
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
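
A minimal sketch of the bitsize/precision distinction discussed in this thread. It does not use GCC's real data structures: `struct mode_info`, both helper functions, and the 24-bits-in-32 partial mode are hypothetical and chosen only for illustration. It shows why the sign bit has to be located from the precision rather than from the storage size when a mode is partial.

```c
#include <stdint.h>

/* Hypothetical stand-in for a machine mode; not GCC's representation.  */
struct mode_info {
  unsigned bitsize;    /* bits of storage the mode occupies */
  unsigned precision;  /* bits that actually carry the value */
};

/* Sign bit of a value held in MODE: the top bit of the precision.  */
static uint64_t
sign_bit_mask (struct mode_info mode)
{
  return (uint64_t) 1 << (mode.precision - 1);
}

/* What a bitsize-based check effectively assumes: the top bit of
   the storage.  */
static uint64_t
storage_top_bit_mask (struct mode_info mode)
{
  return (uint64_t) 1 << (mode.bitsize - 1);
}

/* Illustrative values only: a full 32-bit mode, and a partial mode
   with 24 value bits kept in 32 bits of storage.  */
static const struct mode_info full_mode = { 32, 32 };
static const struct mode_info partial_mode = { 32, 24 };
```

For the full mode both masks agree, so either check gives the same answer. For the partial mode they select different bits, which is the situation the patch is guarding against.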
Re: memset zero bytes at NULL - isolate-erroneous-paths
On Mon, Nov 18, 2013 at 8:11 AM, Florian Weimer wrote:
> * Jeff Law:
>
>>> Is this new in C11? Does it apply to functions such as strnlen as well?
>
>> No, it's C99 I think. There was a clarification which came in after
>> C99 which clarified that even if the length is zero, the pointers must
>> still be valid.
>
> Okay, I found the language in sections 7.1.4 and 7.21.1 (thanks Marc).
>
> This is a bit unfortunate because it interoperates poorly with
> std::vector::data(), which can return a null pointer if the vector
> is empty.

I'd say that turning memset (0, '\0', 0) into a trap is bad from a QOI
perspective. Jeff, is there an easy way to avoid this? Testcase:

void fn (void *addr, int a)
{
  if (a == 0)
    addr = (void *)0;
  __builtin_memset (addr, '\0', a);
}

I wonder where in isolate-paths you check for builtins at all? ah,
it's probably from the nonnull attribute on memset. Which also
means that trying to catch this case reliably isn't going to work
(you cannot prove the program has len == 0 in this case and
conditionally not trapping would somewhat defeat the purpose
of isolating this path)

Richard.
Re: memset zero bytes at NULL - isolate-erroneous-paths
On Mon, Nov 18, 2013 at 12:08:27PM +0100, Richard Biener wrote:
> I'd say that turning memset (0, '\0', 0) into a trap is bad from a QOI
> perspective. Jeff, is there an easy way to avoid this? Testcase:
>
> void fn (void *addr, int a)
> {
>   if (a == 0)
>     addr = (void *)0;
>   __builtin_memset (addr, '\0', a);
> }
>
> I wonder where in isolate-paths you check for builtins at all? ah,
> it's probably from the nonnull attribute on memset. Which also
> means that trying to catch this case reliably isn't going to work
> (you cannot prove the program has len == 0 in this case and
> conditionally not trapping would somewhat defeat the purpose
> of isolating this path)

Well, if some function has nonnull attribute on some argument, then that
argument shouldn't have NULL value even if some length argument is 0.
In the case of memset (and various other functions) C99 clearly says that
memset (NULL, 0, 0); is invalid, if there are some functions that have
a pointer, length argument pair and for length 0 pointer is allowed to be
NULL, then those functions shouldn't have nonnull attribute.

Jakub
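
For code that can legitimately end up with a null pointer and a zero length (the std::vector::data() case mentioned earlier in the thread), one caller-side way to stay within the rule Jakub describes is to skip the call when there is nothing to clear. This is only a sketch; `zero_bytes` is a hypothetical helper, not something proposed in the thread or provided by any library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero N bytes at DST, but never pass a pointer
   to memset when the length is zero, so a NULL DST with N == 0 never
   reaches a function whose pointer argument must be valid.  */
static void *
zero_bytes (void *dst, size_t n)
{
  if (n == 0)
    return dst;               /* DST may be NULL here; memset is not called.  */
  return memset (dst, 0, n);  /* N > 0: DST must point to at least N bytes.  */
}
```

The cost is one extra comparison per call; whether that is acceptable is the quality-of-implementation trade-off being argued about above.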
Re: PLUGIN_HEADER_FILE event for tracing of header inclusions.
On Sun, 17 Nov 2013, Basile Starynkevitch wrote:

> What would be the good way to add such a plugin event to GCC 4.9?

See the cpp_callbacks structure, used to make diagnostics go through GCC's
diagnostics machinery, for example. I'm not clear why the existing
callbacks (in particular the file_change one) wouldn't be enough.

--
Joseph S. Myers
jos...@codesourcery.com
Re: memset zero bytes at NULL - isolate-erroneous-paths
On 11/18/13 04:08, Richard Biener wrote:
> I'd say that turning memset (0, '\0', 0) into a trap is bad from a QOI
> perspective. Jeff, is there an easy way to avoid this? Testcase:
>
> void fn (void *addr, int a)
> {
>   if (a == 0)
>     addr = (void *)0;
>   __builtin_memset (addr, '\0', a);
> }
>
> I wonder where in isolate-paths you check for builtins at all? ah,
> it's probably from the nonnull attribute on memset. Which also
> means that trying to catch this case reliably isn't going to work
> (you cannot prove the program has len == 0 in this case and
> conditionally not trapping would somewhat defeat the purpose
> of isolating this path)

It's the nonnull attribute on memset. One thought would be to split the
optimization into two parts: one which transforms *0 and the other
which transforms calls/returns. Have the former enabled by -O2 and the
latter off for now. For the next release, both enabled by default at
-O2.

Add distinct warnings for both cases, possibly enabled by -Wall
(depends on the noise).

That gets most of the benefit now and gives a way for users to
identify brokenness in their code.

Sadly, this feels a lot like -fstrict-aliasing did eons ago.
Aggressive TBAA exposed all kinds of problems and it took a lot of user
(re)education to get them fixed.

jeff
Re: memset zero bytes at NULL - isolate-erroneous-paths
On Mon, Nov 18, 2013 at 07:24:46AM -0700, Jeff Law wrote:
> On 11/18/13 04:08, Richard Biener wrote:
> >I'd say that turning memset (0, '\0', 0) into a trap is bad from a QOI
> >perspective. Jeff, is there an easy way to avoid this? Testcase:
> >
> >void fn (void *addr, int a)
> >{
> >  if (a == 0)
> >    addr = (void *)0;
> >  __builtin_memset (addr, '\0', a);
> >}
> >
> >I wonder where in isolate-paths you check for builtins at all? ah,
> >it's probably from the nonnull attribute on memset. Which also
> >means that trying to catch this case reliably isn't going to work
> >(you cannot prove the program has len == 0 in this case and
> >conditionally not trapping would somewhat defeat the purpose
> >of isolating this path)
> It's the nonnull attribute on memset. One thought would split the
> optimization into two parts. One which transforms *0 and the other
> which transforms calls/returns. Have the former enabled by -O2 the
> latter off for now. For the next release, both enabled by default at
> -O2.
>
> Add distinct warnings for both cases, possibly enabled by -Wall
> (depends on the noise).
>
> That gets most of the benefit now and gives a way for users to
> identify brokenness in their code.
>
> Sadly, this feels a lot like -fstrict-aliasing did eons ago.
> Aggressive TBAA exposed all kinds problems and it took a lot of user
> (re)education to get them fixed.
>
The risk is that a user who tries isolate-paths and only finds spurious
errors like these will not use it even in cases where it helps.

One way would be to remove the nonnull attribute from the mem* functions.

Note that the C standard also disallows:

char *m = malloc (32);
if (!m)
  return 0;
...
int pos = 32;
return memchr (m, 42, 32 - pos);

On the other hand, if we could break invalid programs with impunity, one
could make memchr/memcmp a cycle faster by dropping an initial n == 0 check.
OpenACC or OpenMP 4.0 target directives
Hello,

I'm doing a master's at the Polytechnic University of Catalonia,
BarcelonaTech, and I have started my master's thesis. My topic is code
generation for hardware accelerators in OmpSs. OmpSs is being developed by
the Barcelona Supercomputing Center, and it has a runtime for GPUs. It can
manage kernel invocation, multi-GPU, data transfer, asynchronous kernel
invocation and so on. That's why I'm using OmpSs: I want to focus only on
code generation and optimizations. But I'm quite new to this work. So far I
support the "target", "teams", "distribute" and "distribute parallel for"
directives. However, of course, I can only generate a very naive kernel :(
I'm looking for optimization techniques.

I came across news that gcc will support the OpenACC/OpenMP target
directives. How can I download this version?

I would also like to ask about optimization. Which optimization techniques
have you applied? Do you have any suggestions for this thesis (papers,
algorithms and so on)?

Regards,
Güray Özen
~grypp
Re: PLUGIN_HEADER_FILE event for tracing of header inclusions.
On Mon, 2013-11-18 at 13:17 +, Joseph S. Myers wrote:
> On Sun, 17 Nov 2013, Basile Starynkevitch wrote:
>
> > What would be the good way to add such a plugin event to GCC 4.9?
>
> See the cpp_callbacks structure, used to make diagnostics go through GCC's
> diagnostics machinery, for example. I'm not clear why the existing
> callbacks (in particular the file_change one) wouldn't be enough.

Thanks for your reply (and your interest in my suggestion).

I am not sure I understand what you suggest (because I see several ways
to understand it).

The first would be to add, inside file libcpp/directives.c in its function
_cpp_do_file_change (e.g. after line 1044), the statement

  /* Signal to plugins that a header file is included. */
  invoke_plugin_callbacks (PLUGIN_HEADER_FILE,
                           ORDINARY_MAP_FILE_NAME (map));

The second would be to add a new way to invoke plugin callbacks, which
would be to add the file libcpp/internals.h to the list of plugin exported
headers. At the very least, this means adding to the PLUGIN_HEADERS
variable of gcc/Makefile.in several files from libcpp/includes/ and
possibly even libcpp/internals.h

I find that the second way introduces a policy change w.r.t. plugins. Up to
now, we tried hard to define the way plugins interact with GCC thru the
plugins.h and plugins.def files, but it looks like you want yet another way.

I strongly prefer adding a new plugin event (PLUGIN_HEADER_FILE) and
just using it (and documenting it) over adding a new way of having plugins
modify the behavior of GCC (thru our various hooks, in that case the
file_change callback).

What do you practically suggest? Don't you feel that adding a new plugin
event (PLUGIN_HEADER_FILE) to plugins.def and adding a single call to
invoke_plugin_callbacks is much lighter and simpler than having the plugin
need several additional files (in the PLUGIN_HEADERS make variable) etc...?

Regards.
--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***
Re: PLUGIN_HEADER_FILE event for tracing of header inclusions.
On Mon, 18 Nov 2013, Basile Starynkevitch wrote:

> On Mon, 2013-11-18 at 13:17 +, Joseph S. Myers wrote:
> > On Sun, 17 Nov 2013, Basile Starynkevitch wrote:
> >
> > > What would be the good way to add such a plugin event to GCC 4.9?
> >
> > See the cpp_callbacks structure, used to make diagnostics go through GCC's
> > diagnostics machinery, for example. I'm not clear why the existing
> > callbacks (in particular the file_change one) wouldn't be enough.
>
>
> Thanks for your reply (and your interest to my suggestion).
>
> I am not sure to understand what you suggest (because I see several ways
> to understand it).

I'm suggesting:

* You probably don't need to change libcpp at all. Instead, insert your
call to invoke_plugin_callbacks inside c-opts.c:cb_file_change.

* But if for some reason cb_file_change isn't called at the right time,
then create a new function, still inside the c-family code, which calls
invoke_plugin_callbacks, and a corresponding cpp_callbacks entry for it,
and make one of the c-opts.c functions that sets callbacks fill in the new
entry.

The key point is that both of those keep libcpp self-contained - you don't
need to include plugin headers inside libcpp, because the libcpp client is
responsible for registering callbacks with libcpp's callback mechanism,
and it's the responsibility of such a callback to call plugins if the
libcpp client (GCC in this case) has a plugin mechanism such that a plugin
should be called from the callback.

> What do you practically suggest? Don't you feel that adding a new plugin
> event (PLUGIN_HEADER_FILE) to plugins.def and adding a single call to
> invoke_plugin_callbacks much lighter and simpler than having the plugin

The point is that this call needs to be in GCC, the client of the libcpp
library, not directly in libcpp, the library.

--
Joseph S. Myers
jos...@codesourcery.com
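
The layering Joseph describes can be sketched generically as follows. None of these names are GCC's or libcpp's: `reader_callbacks`, `reader_enter_file`, `client_file_change` and `notify_plugins_header_file` are all hypothetical stand-ins. The point is only that the library side calls back through a function pointer and never sees the plugin machinery, while the client owns both the callback and the plugin notification.

```c
#include <stddef.h>

/* "Library" side: knows only about its own callback table.  */
struct reader_callbacks {
  void (*file_change) (const char *filename);
};

static struct reader_callbacks reader_cb;

/* Called by the library whenever it starts reading a new file.  */
static void
reader_enter_file (const char *filename)
{
  if (reader_cb.file_change)
    reader_cb.file_change (filename);
}

/* "Client" side: owns the plugin mechanism.  */
static int plugin_events_seen;
static const char *last_header;

/* Stand-in for the client's plugin dispatch; here it only records
   the event so the effect can be observed.  */
static void
notify_plugins_header_file (const char *filename)
{
  plugin_events_seen++;
  last_header = filename;
}

/* The client's callback: forwards the library event to plugins.  */
static void
client_file_change (const char *filename)
{
  notify_plugins_header_file (filename);
}

/* The client registers its callback with the library at start-up.  */
static void
client_init (void)
{
  reader_cb.file_change = client_file_change;
}
```

After `client_init ()`, every `reader_enter_file (name)` reaches the plugin dispatch, yet the library code includes no plugin header. If no client registers a callback, the library simply does nothing extra.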
Re: Frame pointer, bug or feature? (x86)
What's the difference in the C vs. the C++ spec that makes it a VLA in GNU-C?

On Fri, Nov 15, 2013 at 10:07 AM, Andrew Pinski wrote:
> On Fri, Nov 15, 2013 at 9:31 AM, Hendrik Greving wrote:
>> In the below test case, "CASE_A" actually uses a frame pointer, while
>> !CASE_A doesn't. I can't imagine this is a feature, this is a bug,
>> isn't it? Is there any reason the compiler couldn't know that
>> loop_blocks never needs a dynamic stack size?
>
>
> Both a feature and a bug. In the CASE_A case (with GNU C) it is a VLA
> while in the !CASE_A case (or in either case with C++), it is a normal
> array definition. The compiler could have converted the VLA to a
> normal array but does not depending on the size of the array.
>
> Thanks,
> Andrew Pinski
>
>>
>> #include
>> #include
>>
>> #define MY_DEFINE 100
>> #define CASE_A 1
>>
>> extern init(int (*a)[]);
>>
>> int
>> foo()
>> {
>> #if CASE_A
>>   const int max = MY_DEFINE * 2;
>>   int loop_blocks[max];
>> #else
>>   int loop_blocks[MY_DEFINE * 2];
>> #endif
>>   init(&loop_blocks);
>>   return loop_blocks[5];
>> }
>>
>> int
>> main()
>> {
>>   int i = foo();
>>   printf("is is %d\n", i);
>> }
>>
>> Thanks,
>> Hendrik Greving
Re: Frame pointer, bug or feature? (x86)
On Mon, Nov 18, 2013 at 10:47 AM, Hendrik Greving wrote:
> What's the difference in the C vs. the C++ spec that makes it a VLA in GNU-C?

max in C++ is considered an integer constant expression while in C it
is just an expression.

Thanks,
Andrew Pinski

>
> On Fri, Nov 15, 2013 at 10:07 AM, Andrew Pinski wrote:
>> On Fri, Nov 15, 2013 at 9:31 AM, Hendrik Greving wrote:
>>> In the below test case, "CASE_A" actually uses a frame pointer, while
>>> !CASE_A doesn't. I can't imagine this is a feature, this is a bug,
>>> isn't it? Is there any reason the compiler couldn't know that
>>> loop_blocks never needs a dynamic stack size?
>>
>>
>> Both a feature and a bug. In the CASE_A case (with GNU C) it is a VLA
>> while in the !CASE_A case (or in either case with C++), it is a normal
>> array definition. The compiler could have converted the VLA to a
>> normal array but does not depending on the size of the array.
>>
>> Thanks,
>> Andrew Pinski
>>
>>>
>>> #include
>>> #include
>>>
>>> #define MY_DEFINE 100
>>> #define CASE_A 1
>>>
>>> extern init(int (*a)[]);
>>>
>>> int
>>> foo()
>>> {
>>> #if CASE_A
>>>   const int max = MY_DEFINE * 2;
>>>   int loop_blocks[max];
>>> #else
>>>   int loop_blocks[MY_DEFINE * 2];
>>> #endif
>>>   init(&loop_blocks);
>>>   return loop_blocks[5];
>>> }
>>>
>>> int
>>> main()
>>> {
>>>   int i = foo();
>>>   printf("is is %d\n", i);
>>> }
>>>
>>> Thanks,
>>> Hendrik Greving
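
A small sketch of the distinction Andrew points out, assuming a C99 or later compiler. `MAX_BLOCKS`, `fixed_array_bytes` and `vla_bytes` are illustrative names, not part of the test case in the thread. An enumeration constant is an integer constant expression in C, so it gives an ordinary array; a const-qualified int variable is not, so the same size written through it declares a VLA.

```c
#include <stddef.h>

#define MY_DEFINE 100

/* An enumeration constant is an integer constant expression in C,
   unlike a const-qualified int variable.  */
enum { MAX_BLOCKS = MY_DEFINE * 2 };

static size_t
fixed_array_bytes (void)
{
  int loop_blocks[MAX_BLOCKS];   /* ordinary array, size fixed at compile time */
  return sizeof loop_blocks;
}

static size_t
vla_bytes (void)
{
  const int max = MY_DEFINE * 2;
  int loop_blocks[max];          /* a VLA in C: max is not a constant expression */
  return sizeof loop_blocks;     /* sizeof of a VLA is computed at run time */
}
```

Both functions report the same size; what differs is how the language classifies the array, which is what drives the frame pointer difference asked about in the thread.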
RFC: Use 32-byte PLT to preserve bound registers
Here is a proposal to use 32-byte PLT to preserve bound registers.
Any comments?

BTW, we are working on another proposal to use a second PLT
section with 8 byte or 16 byte memory overhead, instead of
24 byte overhead.

--
H.J.
---
Intel MPX:

http://software.intel.com/sites/default/files/319433-015.pdf

introduces 4 bound registers, which will be used for parameter passing
in x86-64. Bound registers are cleared by branch instructions. Branch
instructions with BND prefix will keep bound register contents. This leads
to 2 requirements to 64-bit MPX run-time:

1. Dynamic linker (ld.so) should save and restore bound registers during
symbol lookup.
2. Change the current 16-byte PLT0:

  ff 35 08 00 00 00       pushq GOT+8(%rip)
  ff 25 00 10 00          jmpq *GOT+16(%rip)
  0f 1f 40 00             nopl 0x0(%rax)

and 16-byte PLT1:

  ff 25 00 00 00 00       jmpq *name@GOTPCREL(%rip)
  68 00 00 00 00          pushq $index
  e9 00 00 00 00          jmpq PLT0

which clear bound registers, to preserve bound registers.

We use 2 new relocations:

#define R_X86_64_PC32_BND 39 /* PC relative 32 bit signed with BND prefix */
#define R_X86_64_PLT32_BND 40 /* 32 bit PLT address with BND prefix */

to mark branch instructions with BND prefix.

When linker sees any R_X86_64_PC32_BND or R_X86_64_PLT32_BND relocations,
it switches to a different PLT0:

  ff 35 08 00 00 00       pushq GOT+8(%rip)
  f2 ff 25 00 10 00       bnd jmpq *GOT+16(%rip)
  0f 1f 00                nopl (%rax)

to preserve bound registers for symbol lookup. For a symbol with
R_X86_64_PC32_BND or R_X86_64_PLT32_BND relocations, linker will use
a 32-byte PLT1:

  f2 ff 25 00 00 00 00    bnd jmpq *name@GOTPCREL(%rip)
  68 00 00 00 00          pushq $index
  f2 e9 00 00 00 00       bnd jmpq PLT0
  0f 1f 80 00 00 00 00    nopl 0(%rax)
  0f 1f 80 00 00 00 00    nopl 0(%rax)

Prelink stores the offset of pushq of PLT1 (plt_base + 0x16) in GOT[1] and
GOT[1] is stored in GOT[3]. We can undo prelink in GOT by computing
the corresponding pushq offset with

  GOT[1] + (GOT offset - &GOT[3]) * 2

This depends on each pushq being 16 bytes apart and each GOT entry being
8 bytes. To support prelink, each 16-byte block in PLT must have an 8-byte
entry in GOT. Linker allocates 2 8-byte entries in GOT for each 32-byte
PLT1. Then we can undo prelink by computing the corresponding pushq offset
with

  pushq_offset = GOT[1] + (GOT offset - &GOT[3]) * 2
  pushq_offset += ((unsigned char *) pushq_offset)[6] == 0xf2 ? 1 : 0

For each symbol with R_X86_64_PC32_BND or R_X86_64_PLT32_BND
relocations, this approach increases PLT size by 16 bytes and
GOT size by 8 bytes. That is 24 bytes in total.

Pros: No additional sections are needed.
Cons: 24-byte memory overhead for each symbol with BND relocation.
Re: RFC: Use 32-byte PLT to preserve bound registers
There is a typo in pushq offset computation. It should be

  pushq_offset += ((unsigned char *) pushq_offset)[-6] == 0xf2 ? 1 : 0

instead of

  pushq_offset += ((unsigned char *) pushq_offset)[6] == 0xf2 ? 1 : 0

H.J.

On Mon, Nov 18, 2013 at 11:03 AM, H.J. Lu wrote:
> Here is a proposal to use 32-byte PLT to preserve bound registers.
> Any comments?
>
> BTW, we are working on another proposal to use a second PLT
> section with 8 byte or 16 byte memory overhead, instead of
> 24 byte overhead.
>
> --
> H.J.
> ---
> Intel MPX:
>
> http://software.intel.com/sites/default/files/319433-015.pdf
>
> introduces 4 bound registers, which will be used for parameter passing
> in x86-64. Bound registers are cleared by branch instructions. Branch
> instructions with BND prefix will keep bound register contents. This leads
> to 2 requirements to 64-bit MPX run-time:
>
> 1. Dynamic linker (ld.so) should save and restore bound registers during
> symbol lookup.
> 2. Change the current 16-byte PLT0:
>
>   ff 35 08 00 00 00       pushq GOT+8(%rip)
>   ff 25 00 10 00          jmpq *GOT+16(%rip)
>   0f 1f 40 00             nopl 0x0(%rax)
>
> and 16-byte PLT1:
>
>   ff 25 00 00 00 00       jmpq *name@GOTPCREL(%rip)
>   68 00 00 00 00          pushq $index
>   e9 00 00 00 00          jmpq PLT0
>
> which clear bound registers, to preserve bound registers.
>
> We use 2 new relocations:
>
> #define R_X86_64_PC32_BND 39 /* PC relative 32 bit signed with BND prefix */
> #define R_X86_64_PLT32_BND 40 /* 32 bit PLT address with BND prefix */
>
> to mark branch instructions with BND prefix.
>
> When linker sees any R_X86_64_PC32_BND or R_X86_64_PLT32_BND relocations,
> it switches to a different PLT0:
>
>   ff 35 08 00 00 00       pushq GOT+8(%rip)
>   f2 ff 25 00 10 00       bnd jmpq *GOT+16(%rip)
>   0f 1f 00                nopl (%rax)
>
> to preserve bound registers for symbol lookup. For a symbol with
> R_X86_64_PC32_BND or R_X86_64_PLT32_BND relocations, linker will use
> a 32-byte PLT1:
>
>   f2 ff 25 00 00 00 00    bnd jmpq *name@GOTPCREL(%rip)
>   68 00 00 00 00          pushq $index
>   f2 e9 00 00 00 00       bnd jmpq PLT0
>   0f 1f 80 00 00 00 00    nopl 0(%rax)
>   0f 1f 80 00 00 00 00    nopl 0(%rax)
>
> Prelink stores the offset of pushq of PLT1 (plt_base + 0x16) in GOT[1] and
> GOT[1] is stored in GOT[3]. We can undo prelink in GOT by computing
> the corresponding the pushq offset with
>
>   GOT[1] + (GOT offset - &GOT[3]) * 2
>
> It depends on that each pushq is 16-byte apart and GOT entry is 8 byte.
> To support prelink, each 16-byte block in PLT must have an 8-byte entry
> in GOT. Linker allocates 2 8-byte entries in GOT for each 32-byte PLT1.
> Then we can undo prelink by computing the corresponding the pushq offset
> with
>
>   pushq_offset = GOT[1] + (GOT offset - &GOT[3]) * 2
>   pushq_offset += ((unsigned char *) pushq_offset)[6] == 0xf2 ? 1 : 0
>
> For each symbol with R_X86_64_PC32_BND or R_X86_64_PLT32_BND
> relocations, this approach increases PLT size by 16 bytes and
> GOT size by 8 bytes. That is 24 bytes in total.
>
> Pros: No additional sections are needed.
> Cons: 24-byte memory overhead for each symbol with BND relocation.

--
H.J.
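
The corrected pushq offset computation can be checked against a toy PLT image. This is only a sketch under stated assumptions: it works with offsets into a local byte array rather than real addresses, models only the opcode bytes that matter, and `plt`, `build_toy_plt` and `pushq_offset_for_slot` are hypothetical names, not linker or prelink code.

```c
#include <stddef.h>
#include <string.h>

enum { GOT_ENTRY = 8, FIRST_PLT_GOT_SLOT = 3 };

/* Toy PLT image: PLT0 (16 bytes), a 16-byte PLT1, a 32-byte BND PLT1,
   then another 16-byte PLT1.  */
static unsigned char plt[16 + 16 + 32 + 16];

static void
build_toy_plt (void)
{
  memset (plt, 0, sizeof plt);
  /* 16-byte PLT1 at offset 16: jmpq is 6 bytes, so pushq (0x68) is at +6.  */
  plt[16] = 0xff; plt[17] = 0x25; plt[16 + 6] = 0x68;
  /* 32-byte PLT1 at offset 32: bnd jmpq is 7 bytes, so pushq is at +7.  */
  plt[32] = 0xf2; plt[33] = 0xff; plt[34] = 0x25; plt[32 + 7] = 0x68;
  /* 16-byte PLT1 at offset 64.  */
  plt[64] = 0xff; plt[65] = 0x25; plt[64 + 6] = 0x68;
}

/* GOT1 plays the role of the value kept in GOT[1] (the pushq of the
   first PLT1, i.e. plt_base + 0x16); SLOT is the index of the GOT
   entry being restored.  Each 8-byte GOT entry corresponds to one
   16-byte PLT block, hence the factor of 2.  */
static size_t
pushq_offset_for_slot (size_t got1, size_t slot)
{
  size_t got_delta = (slot - FIRST_PLT_GOT_SLOT) * GOT_ENTRY;
  size_t pushq_offset = got1 + got_delta * 2;

  /* The byte 6 before the provisional offset is the first byte of the
     PLT entry; 0xf2 there means a BND entry, whose pushq is one byte
     further on.  */
  if (plt[pushq_offset - 6] == 0xf2)
    pushq_offset += 1;
  return pushq_offset;
}
```

In this layout GOT slots 3, 4 and 6 belong to the three PLT1 entries, and slot 5 is the second entry allocated for the 32-byte PLT1. Using `[-6]` reads the entry's first byte, which is why the sign matters in the correction above.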
Re: Frame pointer, bug or feature? (x86)
Interesting, I just read up on it and I didn't know that. Thanks. Is
it correct to say though that it is a missing optimization and
frame_pointer_needed shouldn't evaluate to true?

On Mon, Nov 18, 2013 at 10:55 AM, Andrew Pinski wrote:
> On Mon, Nov 18, 2013 at 10:47 AM, Hendrik Greving wrote:
>> What's the difference in the C vs. the C++ spec that makes it a VLA in GNU-C?
>
>
> max in C++ is considered an integer constant expression while in C it
> is just an expression.
>
> Thanks,
> Andrew Pinski
>
>>
>> On Fri, Nov 15, 2013 at 10:07 AM, Andrew Pinski wrote:
>>> On Fri, Nov 15, 2013 at 9:31 AM, Hendrik Greving wrote:
>>>> In the below test case, "CASE_A" actually uses a frame pointer, while
>>>> !CASE_A doesn't. I can't imagine this is a feature, this is a bug,
>>>> isn't it? Is there any reason the compiler couldn't know that
>>>> loop_blocks never needs a dynamic stack size?
>>>
>>>
>>> Both a feature and a bug. In the CASE_A case (with GNU C) it is a VLA
>>> while in the !CASE_A case (or in either case with C++), it is a normal
>>> array definition. The compiler could have converted the VLA to a
>>> normal array but does not depending on the size of the array.
>>>
>>> Thanks,
>>> Andrew Pinski
>>>
>>>>
>>>> #include
>>>> #include
>>>>
>>>> #define MY_DEFINE 100
>>>> #define CASE_A 1
>>>>
>>>> extern init(int (*a)[]);
>>>>
>>>> int
>>>> foo()
>>>> {
>>>> #if CASE_A
>>>>   const int max = MY_DEFINE * 2;
>>>>   int loop_blocks[max];
>>>> #else
>>>>   int loop_blocks[MY_DEFINE * 2];
>>>> #endif
>>>>   init(&loop_blocks);
>>>>   return loop_blocks[5];
>>>> }
>>>>
>>>> int
>>>> main()
>>>> {
>>>>   int i = foo();
>>>>   printf("is is %d\n", i);
>>>> }
>>>>
>>>> Thanks,
>>>> Hendrik Greving
Re: OpenACC or OpenMP 4.0 target directives
Güray Özen wrote:
> I came across news that gcc will support the OpenACC/OpenMP target
> directives. How can I download this version?

Well, the support is at an early stage, targeting several different
backends. The work is done by several teams and, hence, not always very
well coordinated. I think over the next months, it will improve as bits
get merged into a common branch.

Some first steps to OpenACC support can be found in the GOMP-4_0-branch
and in the openacc-1_0-branch branch. The GOMP-4_0-branch bits aren't
sufficient for offloading, yet. To my knowledge, the only publicly
available implementation which allows offloading is the
openacc-1_0-branch, cf. http://gcc.gnu.org/ml/gcc/2013-10/msg9.html

To try it, download either of the two branches and build GCC yourself;
see bottom of http://gcc.gnu.org/wiki/GFortranBinaries#FromSource

> I would also like to ask about optimization.

I think the first step is to actually get it working. Otherwise, the
normal compiler optimizations are applied also to the target sections. I
could imagine that there will also be some specific optimizations, e.g.
with regards to copy/copyin/-out by avoiding unnecessary data transfers;
however, I don't know whether such an optimization is already done in
any of the branches.

Tobias
Re: Frame pointer, bug or feature? (x86)
On Mon, Nov 18, 2013 at 11:22:22AM -0800, Hendrik Greving wrote:
> Interesting, I just read up on it and I didn't know that. Thanks. Is
> it correct to say though that it is a missing optimization and
> frame_pointer_needed shouldn't evaluate to true?

Certainly not unconditionally. It depends on the size and in which scope
it is declared. If user meant to use a VLA and compiler optimizes it into
non-VLA, then it isn't deallocated at the end of its scope, so if it e.g.
is very large or there are many of those, the optimization can break valid
programs (especially if its scope isn't the function scope but some smaller
scope).

Jakub
Re: Frame pointer, bug or feature? (x86)
Hmm don't VLA's obey the same lifetime rules as regular automatic
arrays on the stack?

On Mon, Nov 18, 2013 at 11:48 AM, Jakub Jelinek wrote:
> On Mon, Nov 18, 2013 at 11:22:22AM -0800, Hendrik Greving wrote:
>> Interesting, I just read up on it and I didn't know that. Thanks. Is
>> it correct to say though that it is a missing optimization and
>> frame_pointer_needed shouldn't evaluate to true?
>
> Certainly not unconditionally. It depends on the size and in which scope
> it is declared. If user meant to use a VLA and compiler optimizes it into
> non-VLA, then it isn't deallocated at the end of it's scope, so if it e.g.
> is very large or there are many of those, the optimization can break valid
> programs (especially if it's scope isn't the function scope but some smaller
> scope).
>
> Jakub
Re: Frame pointer, bug or feature? (x86)
On Mon, Nov 18, 2013 at 12:43:50PM -0800, Hendrik Greving wrote:
> Hmm don't VLA's obey the same lifetime rules as regular automatic
> arrays on the stack?

In the languages yes, in GCC no. There is code to determine possibilities
of sharing some stack space between variables that can't be used at the same
time, but all the stack space for addressable automatic variables is
typically allocated in function prologue and deallocated in the epilogue.

So, if you have say:

extern void baz (char *);

__attribute__((noinline)) void
bar (void)
{
  char buf[7 * 1024 * 1024];
  baz (buf);
}

void
foo (void)
{
  bar ();
  {
    const int length = 5 * 1024 * 1024;
    char buf[length];
    baz (buf);
  }
  bar ();
}

and say typical Linux stack limit of 8-10MB, then if baz function (nor
anything it calls) doesn't need much stack space, nor foo callers, then
if buf[length] is a VLA, it will probably work just fine, if GCC decided
to optimize it into char buf[5 * 1024 * 1024]; instead, it would likely
fail.

Jakub
Re: Frame pointer, bug or feature? (x86)
I see what you're saying. You mean because the VLA stack space can be
dynamically "free'd" right away, as opposed to being there until the
epilogue. That is true :( It still seems odd when just looking at it.
It's hard to imagine somebody would actually code
myarray[const_thousand_var] as opposed to myarray[1000] with the
intention to control stack allocation... thanks though

On Mon, Nov 18, 2013 at 12:54 PM, Jakub Jelinek wrote:
> On Mon, Nov 18, 2013 at 12:43:50PM -0800, Hendrik Greving wrote:
>> Hmm don't VLA's obey the same lifetime rules as regular automatic
>> arrays on the stack?
>
> In the languages yes, in GCC no. There is code to determine possibilities
> of sharing some stack space between variables that can't be used at the same
> time, but all the stack space for addressable automatic variables is
> typically allocated in function prologue and deallocated in the epilogue.
>
> So, if you have say:
> extern void baz (char *);
>
> __attribute__((noinline)) void
> bar (void)
> {
>   char buf[7 * 1024 * 1024];
>   baz (buf);
> }
>
> void
> foo (void)
> {
>   bar ();
>   {
>     const int length = 5 * 1024 * 1024;
>     char buf[length];
>     baz (buf);
>   }
>   bar ();
> }
>
> and say typical Linux stack limit of 8-10MB, then if baz function (nor
> anything it calls) doesn't need much stack space, nor foo callers, then
> if buf[length] is a VLA, it will probably work just fine, if GCC decided
> to optimize it into char buf[5 * 1024 * 1024]; instead, it would likely
> fail.
>
> Jakub
Re: suspect code in fold-const.c
committed as revision 204987.

thanks

kenny

On 11/18/2013 05:38 AM, Richard Biener wrote:
> On Fri, 15 Nov 2013, Kenneth Zadeck wrote:
>
>> This patch fixes a number of places where the mode bitsize had been used but
>> the mode precision should have been used. The tree level is somewhat sloppy
>> about this - some places use the mode precision and some use the mode bitsize.
>> It seems that the mode precision is the proper choice since it does the
>> correct thing if the underlying mode is a partial int mode.
>>
>> This code has been tested on x86-64 with no regressions. Ok to commit?
> Ok.
>
> Thanks,
> Richard.
>
>> 2013-11-15 Kenneth Zadeck
>>
>> * tree.c (int_fits_type_p): Change GET_MODE_BITSIZE to
>> GET_MODE_PRECISION.
>> * fold-const.c (fold_single_bit_test_into_sign_test,
>> fold_binary_loc): Change GET_MODE_BITSIZE to
>> GET_MODE_PRECISION.
>>
>> Kenny
>>
>> On 11/15/2013 08:32 AM, Kenneth Zadeck wrote:
>>> On 11/15/2013 04:07 AM, Eric Botcazou wrote:
>>>>> this code from fold-const.c starts on line 13811.
>>>>>
>>>>> else if (TREE_INT_CST_HIGH (arg1) == signed_max_hi
>>>>>          && TREE_INT_CST_LOW (arg1) == signed_max_lo
>>>>>          && TYPE_UNSIGNED (arg1_type)
>>>>>          /* We will flip the signedness of the comparison operator
>>>>>             associated with the mode of arg1, so the sign bit is
>>>>>             specified by this mode. Check that arg1 is the signed
>>>>>             max associated with this sign bit. */
>>>>>          && width == GET_MODE_BITSIZE (TYPE_MODE (arg1_type))
>>>>>          /* signed_type does not work on pointer types. */
>>>>>          && INTEGRAL_TYPE_P (arg1_type))
>>>> with width defined as:
>>>>
>>>> unsigned int width = TYPE_PRECISION (arg1_type);
>>>>
>>>>> it seems that the check on bitsize should really be a check on the
>>>>> precision of the variable. If this seems right, i will correct this on
>>>>> the trunk and make the appropriate changes to the wide-int branch.
>>>> Do you mean
>>>>
>>>>          && width == GET_MODE_PRECISION (TYPE_MODE (arg1_type))
>>>>
>>>> instead? If so, that would probably make sense, but there are a few other
>>>> places with the same TYPE_PRECISION/GET_MODE_BITSIZE check, in particular
>>>> the very similar transformation done in
>>>> fold_single_bit_test_into_sign_test.
>>>>
>>> yes. I understand the need to do this check on the mode rather than the
>>> precision of the type itself.
>>> The point is that if the mode under this type happens to be a partial int
>>> mode, then that sign bit may not even be where the bitsize points to.
>>>
>>> However, having just done a few greps, it looks like this case was just the
>>> one that i found while doing the wide-int work, there may be several more of
>>> these cases. Just in fold-const, there are a couple in fold_binary_loc.
>>> The one in tree.c:int_fits_type_p looks particularly wrong.
>>>
>>> I think that there are also several in tree-vect-patterns.c.
>>>
>>> Kenny

Index: gcc/tree.c
===================================================================
--- gcc/tree.c (revision 204986)
+++ gcc/tree.c (working copy)
@@ -8629,7 +8629,7 @@ retry:
   /* Third, unsigned integers with top bit set never fit signed types. */
   if (! TYPE_UNSIGNED (type) && unsc)
     {
-      int prec = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (c))) - 1;
+      int prec = GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (c))) - 1;
       if (prec < HOST_BITS_PER_WIDE_INT)
         {
           if (((((unsigned HOST_WIDE_INT) 1) << prec) & dc.low) != 0)
Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog (revision 204986)
+++ gcc/ChangeLog (working copy)
@@ -1,3 +1,11 @@
+2013-11-18 Kenneth Zadeck
+
+        * tree.c (int_fits_type_p): Change GET_MODE_BITSIZE to
+        GET_MODE_PRECISION.
+        * fold-const.c (fold_single_bit_test_into_sign_test)
+        (fold_binary_loc): Change GET_MODE_BITSIZE to
+        GET_MODE_PRECISION.
+
 2013-11-18 Teresa Johnson
 
         * gcc/cfgrtl.c (cfg_layout_initialize): Assert if we
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c (revision 204986)
+++ gcc/fold-const.c (working copy)
@@ -6593,7 +6593,7 @@ fold_single_bit_test_into_sign_test (loc
           /* This is only a win if casting to a signed type is cheap,
              i.e. when arg00's type is not a partial mode. */
           && TYPE_PRECISION (TREE_TYPE (arg00))
-             == GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (arg00))))
+             == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg00))))
         {
           tree stype = signed_type_for (TREE_TYPE (arg00));
           return fold_build2_loc (loc, code == EQ_EXPR ? GE_EXPR : LT_EXPR,
@@ -12049,7 +12049,7 @@ fold_binary_loc (location_t loc,
                 zerobits = ((((unsigned HOST_WIDE_INT) 1) << shiftc) - 1);
               else if (TREE_CODE (arg0) == RSHIFT_EXPR
                        && TYPE_PRECISION (TREE_TYPE (arg0))
-                          == GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (arg0))))
+                          == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg0))))
                 {
                   prec = TYPE_PRECISION (TREE_TYPE (arg0));
                   tree arg00 = TREE_OPERAND (arg0, 0);
@@ -12060,7 +12060,7 @@ fold_binary_loc (location_t loc