Re: [PATCH] Fix up cmove expansion (PR target/58864)
Jakub Jelinek wrote:
>Hi!
>
>The following testcase ICEs because expand_cond_expr_using_cmove
>calls emit_conditional_move (which calls do_pending_stack_adjust
>under some circumstances), but when that fails, just removes all the
>insns generated by emit_conditional_move (and perhaps some earlier ones
>too), thus it removes also the stack adjustment.
>
>Apparently 2 similar places were fixing it by just calling
>do_pending_stack_adjust () first just in case, some other places
>had (most likely) the same bug as this function.
>
>Rather than adding do_pending_stack_adjust () in all the places,
>especially when it isn't clear whether emit_conditional_move will be
>called at all and whether it will actually do do_pending_stack_adjust (),
>I chose to add two new functions to save/restore the pending stack
>adjustment state, so that when instruction sequence is thrown away
>(either by doing start_sequence/end_sequence around it and not emitting
>it, or delete_insns_since) the state can be restored, and have changed
>all the places that IMHO need it for emit_conditional_move.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for
>trunk/4.8?

The idea is good but I'd like to see a struct rather than an array for the storage.

Thanks,
Richard.

>2013-11-29  Jakub Jelinek
>
>    PR target/58864
>    * dojump.c (save_pending_stack_adjust, restore_pending_stack_adjust):
>    New functions.
>    * expr.h (save_pending_stack_adjust, restore_pending_stack_adjust):
>    New prototypes.
>    * expr.c (expand_cond_expr_using_cmove): Use it.
>    (expand_expr_real_2): Use it instead of unconditional
>    do_pending_stack_adjust.
>    * optabs.c (expand_doubleword_shift): Use it.
>    * expmed.c (expand_sdiv_pow2): Use it instead of unconditional
>    do_pending_stack_adjust.
>    (emit_store_flag): Use it.
>
>    * g++.dg/opt/pr58864.C: New test.
> >--- gcc/expr.c.jj 2013-11-27 18:02:46.0 +0100 >+++ gcc/expr.c 2013-11-29 14:35:12.234808484 +0100 >@@ -7951,6 +7951,9 @@ expand_cond_expr_using_cmove (tree treeo > else > temp = assign_temp (type, 0, 1); > >+ int save[2]; >+ save_pending_stack_adjust (save); >+ > start_sequence (); > expand_operands (treeop1, treeop2, > temp, &op1, &op2, EXPAND_NORMAL); >@@ -8009,6 +8012,7 @@ expand_cond_expr_using_cmove (tree treeo > /* Otherwise discard the sequence and fall back to code with > branches. */ > end_sequence (); >+ restore_pending_stack_adjust (save); > #endif > return NULL_RTX; > } >@@ -8789,12 +8793,9 @@ expand_expr_real_2 (sepops ops, rtx targ > if (can_conditionally_move_p (mode)) > { > rtx insn; >+ int save[2]; > >- /* ??? Same problem as in expmed.c: emit_conditional_move >- forces a stack adjustment via compare_from_rtx, and we >- lose the stack adjustment if the sequence we are about >- to create is discarded. */ >- do_pending_stack_adjust (); >+ save_pending_stack_adjust (save); > > start_sequence (); > >@@ -8817,6 +8818,7 @@ expand_expr_real_2 (sepops ops, rtx targ > /* Otherwise discard the sequence and fall back to code with > branches. */ > end_sequence (); >+ restore_pending_stack_adjust (save); > } > #endif > if (target != op0) >--- gcc/optabs.c.jj2013-11-19 21:56:22.0 +0100 >+++ gcc/optabs.c 2013-11-29 14:39:15.963513835 +0100 >@@ -1079,17 +1079,20 @@ expand_doubleword_shift (enum machine_mo > > #ifdef HAVE_conditional_move > /* Try using conditional moves to generate straight-line code. 
*/ >- { >-rtx start = get_last_insn (); >-if (expand_doubleword_shift_condmove (op1_mode, binoptab, >-cmp_code, cmp1, cmp2, >-outof_input, into_input, >-op1, superword_op1, >-outof_target, into_target, >-unsignedp, methods, shift_mask)) >- return true; >-delete_insns_since (start); >- } >+ int save[2]; >+ >+ save_pending_stack_adjust (save); >+ >+ rtx start = get_last_insn (); >+ if (expand_doubleword_shift_condmove (op1_mode, binoptab, >+ cmp_code, cmp1, cmp2, >+ outof_input, into_input, >+ op1, superword_op1, >+ outof_target, into_target, >+ unsignedp, methods, shift_mask)) >+return true; >+ delete_insns_since (start); >+ restore_pending_stack_adjust (save); > #endif > >/* As a last resort, use branches to select the correct alternative. >*/ >--- gcc/dojump.c.jj2013-11-19 21:56:27.0 +0100 >+++ gcc/dojump.c 2013-11-29 14:35:35.088685749 +0100 >@@ -96,6 +96,29
*ping* Re: wwwdocs: Broken links due to the preprocess script
On October 25, 2013 22:32, Tobias Burnus wrote:
Tobias Burnus wrote:

Thanks for looking at the patch. However, the patch has a link problem. The documentation is at http://gcc.gnu.org/onlinedocs/gcc/Loop_002dSpecific-Pragmas.html That's also the link I use in the changes.html file. However, some script changes the link to: http://gcc.gnu.org/onlinedocs/gcc/Loop-Specific-Pragmas.html which won't work. Try yourself at http://gcc.gnu.org/gcc-4.9/changes.html

Actually, a similar issue was reported at http://gcc.gnu.org/ml/gcc-help/2013-10/msg00132.html

The reason for the broken links is the following lines in the /www/bin/preprocess script: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/wwwdocs/bin/preprocess.diff?r1=1.38&r2=1.39&f=h

Gerald, do you still know why you added it 9 years ago? The commit comment is "Use sed to work around makeinfo 4.7 brokenness." I think "makeinfo" is still broken, but those pages do not seem to go through the preprocess script, which means that only links to that page will change to a hyphen, breaking the links.

Do you think it would be sensible to remove those lines again, or, alternatively, to run a similar script (e.g. "perl -pi -e 's/_002d/-/g' `find onlinedocs -name \*.html`") on onlinedocs/? I think the impact of the former on links is smaller. (One still needs to re-run the script on those files to restore the links.)

Tobias
*ping* Re: gcc/invoke.texi: Add missing @opindex
Tobias Burnus wrote:
Tobias Burnus wrote:

While looking at the index for -fsanitize=, I found out that it, like many other options, lacks the @opindex. Attached is an attempt to add the missing ones.

Updated patch: I also observed some odd "*<-fsanitize=null>" output in the man page; Manuel suggested a fix which indeed works (using @gcctabopt), which I now also include.

OK for the trunk?

Tobias
Add TREE_INT_CST_OFFSET_NUNITS
So maybe two INTEGER_CST lengths weren't enough. Because bitsizetype can be offset_int-sized, wi::to_offset had a TYPE_PRECISION condition to pick the array length: template inline unsigned int wi::extended_tree ::get_len () const { if (N == MAX_BITSIZE_MODE_ANY_INT || N > TYPE_PRECISION (TREE_TYPE (m_t))) return TREE_INT_CST_EXT_NUNITS (m_t); else return TREE_INT_CST_NUNITS (m_t); } and this TYPE_PRECISION condition was relatively hot in get_ref_base_and_extent when compiling insn-recog.ii. Adding a third length for offset_int does seem to reduce the cost of the offset_int + to_offset addition. Tested on x86_64-linux-gnu. OK to install? Thanks, Richard Index: gcc/ChangeLog.wide-int === --- gcc/ChangeLog.wide-int 2013-11-30 09:31:16.359198395 + +++ gcc/ChangeLog.wide-int 2013-11-30 09:41:50.987741444 + @@ -616,6 +616,7 @@ (TREE_INT_CST_HIGH): Delete. (TREE_INT_CST_NUNITS): New. (TREE_INT_CST_EXT_NUNITS): Likewise. + (TREE_INT_CST_OFFSET_NUNITS): Likewise. (TREE_INT_CST_ELT): Likewise. (INT_CST_LT): Use wide-int interfaces. (INT_CST_LE): New. Index: gcc/tree-core.h === --- gcc/tree-core.h 2013-11-30 09:31:16.359198395 + +++ gcc/tree-core.h 2013-11-30 09:41:12.011470169 + @@ -764,11 +764,16 @@ struct GTY(()) tree_base { struct { /* The number of HOST_WIDE_INTs if the INTEGER_CST is accessed in its native precision. */ - unsigned short unextended; + unsigned char unextended; /* The number of HOST_WIDE_INTs if the INTEGER_CST is extended to wider precisions based on its TYPE_SIGN. */ - unsigned short extended; + unsigned char extended; + + /* The number of HOST_WIDE_INTs if the INTEGER_CST is accessed in +offset_int precision, with smaller integers being extended +according to their TYPE_SIGN. */ + unsigned char offset; } int_length; /* VEC length. This field is only used with TREE_VEC. 
*/ Index: gcc/tree.c === --- gcc/tree.c 2013-11-30 09:31:16.359198395 + +++ gcc/tree.c 2013-11-30 09:41:42.965685621 + @@ -1285,6 +1285,7 @@ wide_int_to_tree (tree type, const wide_ /* Make sure no one is clobbering the shared constant. */ gcc_checking_assert (TREE_TYPE (t) == type && TREE_INT_CST_NUNITS (t) == 1 +&& TREE_INT_CST_OFFSET_NUNITS (t) == 1 && TREE_INT_CST_EXT_NUNITS (t) == 1 && TREE_INT_CST_ELT (t, 0) == hwi); else @@ -1964,6 +1965,7 @@ make_int_cst_stat (int len, int ext_len TREE_SET_CODE (t, INTEGER_CST); TREE_INT_CST_NUNITS (t) = len; TREE_INT_CST_EXT_NUNITS (t) = ext_len; + TREE_INT_CST_OFFSET_NUNITS (t) = MIN (ext_len, OFFSET_INT_ELTS); TREE_CONSTANT (t) = 1; Index: gcc/tree.h === --- gcc/tree.h 2013-11-30 09:31:16.359198395 + +++ gcc/tree.h 2013-11-30 09:41:29.418591391 + @@ -907,6 +907,8 @@ #define TREE_INT_CST_NUNITS(NODE) \ (INTEGER_CST_CHECK (NODE)->base.u.int_length.unextended) #define TREE_INT_CST_EXT_NUNITS(NODE) \ (INTEGER_CST_CHECK (NODE)->base.u.int_length.extended) +#define TREE_INT_CST_OFFSET_NUNITS(NODE) \ + (INTEGER_CST_CHECK (NODE)->base.u.int_length.offset) #define TREE_INT_CST_ELT(NODE, I) TREE_INT_CST_ELT_CHECK (NODE, I) #define TREE_INT_CST_LOW(NODE) \ ((unsigned HOST_WIDE_INT) TREE_INT_CST_ELT (NODE, 0)) @@ -4623,8 +4625,10 @@ wi::extended_tree ::get_val () const inline unsigned int wi::extended_tree ::get_len () const { - if (N == MAX_BITSIZE_MODE_ANY_INT - || N > TYPE_PRECISION (TREE_TYPE (m_t))) + if (N == ADDR_MAX_PRECISION) +return TREE_INT_CST_OFFSET_NUNITS (m_t); + else if (N == MAX_BITSIZE_MODE_ANY_INT + || N > TYPE_PRECISION (TREE_TYPE (m_t))) return TREE_INT_CST_EXT_NUNITS (m_t); else return TREE_INT_CST_NUNITS (m_t); Index: gcc/wide-int.h === --- gcc/wide-int.h 2013-11-30 09:31:16.359198395 + +++ gcc/wide-int.h 2013-11-30 09:40:32.710196218 + @@ -256,6 +256,9 @@ #define ADDR_MAX_BITSIZE 64 #define ADDR_MAX_PRECISION \ ((ADDR_MAX_BITSIZE + 4 + HOST_BITS_PER_WIDE_INT - 1) & ~(HOST_BITS_PER_WIDE_INT - 1)) +/* The 
number of HWIs needed to store an offset_int. */ +#define OFFSET_INT_ELTS (ADDR_MAX_PRECISION / HOST_BITS_PER_WIDE_INT) + /* The type of result produced by a binary operation on types T1 and T2. Defined purely for brevity. */ #define WI_BINARY_RESULT(T1, T2) \
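
To make the arithmetic behind OFFSET_INT_ELTS concrete, here is a small standalone sketch. It assumes a host where HOST_BITS_PER_WIDE_INT is 64; the `k`-prefixed constants, the `IntCstLengths` struct and the `make_lengths`/`pick_len` helpers are illustrative stand-ins rather than GCC code, and MAX_BITSIZE_MODE_ANY_INT is passed in as a parameter because its value is target-dependent.

```cpp
#include <cassert>

// Stand-ins for the wide-int.h macros; a 64-bit HOST_WIDE_INT is assumed.
constexpr unsigned kHostBitsPerWideInt = 64;
constexpr unsigned kAddrMaxBitsize = 64;

// Round (ADDR_MAX_BITSIZE + 4) up to a whole number of HWIs, as the
// ADDR_MAX_PRECISION macro does.
constexpr unsigned kAddrMaxPrecision =
    (kAddrMaxBitsize + 4 + kHostBitsPerWideInt - 1)
    & ~(kHostBitsPerWideInt - 1);

// Number of HWIs needed to store an offset_int.
constexpr unsigned kOffsetIntElts = kAddrMaxPrecision / kHostBitsPerWideInt;

static_assert(kAddrMaxPrecision == 128, "68 bits round up to two 64-bit HWIs");
static_assert(kOffsetIntElts == 2, "an offset_int occupies two HWIs here");

// Hypothetical model of the three cached lengths of an INTEGER_CST.
struct IntCstLengths {
  unsigned char unextended;  // length in the constant's own precision
  unsigned char extended;    // length when extended to a wider precision
  unsigned char offset;      // length when read at offset_int precision
};

// Models the make_int_cst_stat change: the offset length is the extended
// length capped at the size of an offset_int.
inline IntCstLengths make_lengths(unsigned len, unsigned ext_len) {
  unsigned off = ext_len < kOffsetIntElts ? ext_len : kOffsetIntElts;
  return {static_cast<unsigned char>(len),
          static_cast<unsigned char>(ext_len),
          static_cast<unsigned char>(off)};
}

// Models the patched get_len: the offset_int case is decided by N alone,
// so the type's precision is only consulted for other precisions.
inline unsigned pick_len(const IntCstLengths &l, unsigned n,
                         unsigned type_precision, unsigned max_bitsize) {
  if (n == kAddrMaxPrecision)
    return l.offset;
  if (n == max_bitsize || n > type_precision)
    return l.extended;
  return l.unextended;
}
```

The point of the third field is visible in `pick_len`: when N is the offset_int precision, the answer comes from a single comparison against a compile-time constant, which is the check the patch moves in front of the TYPE_PRECISION test.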
[wide-int] Avoid some temporaries and use shifts more often
This started out as another attempt to find places where we had things like:

  offset_int x = wi::to_offset (...);
  x = ...x...;

and change them to:

  offset_int x = ...wi::to_offset (...)...;

with the get_ref_base_and_extent case being the main one. But it turned out that some of them were also multiplying or dividing by BITS_PER_UNIT, so it ended up also being a patch to convert those to shifts.

I didn't want to cut-&-paste the 3 : log2 (BITS_PER_UNIT) conditional yet more times, so I added a LOG2_BITS_PER_UNIT to defaults.h. I can retrofit it to the existing code if that's OK at this stage.

For insn-recog.ii this reduces the number of divmod_internal calls from 7884858 to 369746.

Thanks,
Richard

Index: gcc/ChangeLog.wide-int === --- gcc/ChangeLog.wide-int 2013-11-29 15:09:59.623293132 + +++ gcc/ChangeLog.wide-int 2013-11-29 15:11:48.611155898 + @@ -111,6 +111,7 @@ (stabstr_U): Use wide-int interfaces. (dbxout_type): Update to use cst_fits_shwi_p. * defaults.h + (LOG2_BITS_PER_UNIT): Define. (TARGET_SUPPORTS_WIDE_INT): Add default. * dfp.c: Include wide-int.h. 
(decimal_real_to_integer2): Use wide-int interfaces and rename to Index: gcc/alias.c === --- gcc/alias.c 2013-11-29 15:04:41.136142237 + +++ gcc/alias.c 2013-11-29 15:11:48.606155857 + @@ -2355,8 +2355,8 @@ adjust_offset_for_component_ref (tree x, offset_int woffset = (wi::to_offset (xoffset) - + wi::udiv_trunc (wi::to_offset (DECL_FIELD_BIT_OFFSET (field)), -BITS_PER_UNIT)); + + wi::lrshift (wi::to_offset (DECL_FIELD_BIT_OFFSET (field)), + LOG2_BITS_PER_UNIT)); if (!wi::fits_uhwi_p (woffset)) { *known_p = false; Index: gcc/defaults.h === --- gcc/defaults.h 2013-11-29 15:04:41.136142237 + +++ gcc/defaults.h 2013-11-29 15:11:48.606155857 + @@ -475,6 +475,14 @@ #define DWARF_TYPE_SIGNATURE_SIZE 8 #define BITS_PER_UNIT 8 #endif +#if BITS_PER_UNIT == 8 +#define LOG2_BITS_PER_UNIT 3 +#elif BITS_PER_UNIT == 16 +#define LOG2_BITS_PER_UNIT 4 +#else +#error Unknown BITS_PER_UNIT +#endif + #ifndef BITS_PER_WORD #define BITS_PER_WORD (BITS_PER_UNIT * UNITS_PER_WORD) #endif Index: gcc/dwarf2out.c === --- gcc/dwarf2out.c 2013-11-29 15:04:41.136142237 + +++ gcc/dwarf2out.c 2013-11-29 15:40:56.188806688 + @@ -14930,7 +14930,7 @@ field_byte_offset (const_tree decl) object_offset_in_bits = bitpos_int; object_offset_in_bytes -= wi::udiv_trunc (object_offset_in_bits, BITS_PER_UNIT); += wi::lrshift (object_offset_in_bits, LOG2_BITS_PER_UNIT); return object_offset_in_bytes.to_shwi (); } Index: gcc/gimple-fold.c === --- gcc/gimple-fold.c 2013-11-29 15:04:41.136142237 + +++ gcc/gimple-fold.c 2013-11-29 15:41:17.425983303 + @@ -2926,7 +2926,6 @@ fold_nonarray_ctor_reference (tree type, tree field_offset = DECL_FIELD_BIT_OFFSET (cfield); tree field_size = DECL_SIZE (cfield); offset_int bitoffset; - offset_int byte_offset_cst = wi::to_offset (byte_offset); offset_int bitoffset_end, access_end; /* Variable sized objects in static constructors makes no sense, @@ -2939,7 +2938,8 @@ fold_nonarray_ctor_reference (tree type, /* Compute bit offset of the field. 
*/ bitoffset = (wi::to_offset (field_offset) - + byte_offset_cst * BITS_PER_UNIT); + + wi::lshift (wi::to_offset (byte_offset), +LOG2_BITS_PER_UNIT)); /* Compute bit offset where the field ends. */ if (field_size != NULL_TREE) bitoffset_end = bitoffset + wi::to_offset (field_size); Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c 2013-11-29 15:04:41.136142237 + +++ gcc/gimple-ssa-strength-reduction.c 2013-11-29 15:40:56.188806688 + @@ -897,7 +897,7 @@ restructure_reference (tree *pbase, tree c2 = 0; } - c4 = wi::udiv_floor (index, BITS_PER_UNIT); + c4 = wi::lrshift (index, LOG2_BITS_PER_UNIT); c5 = backtrace_base_for_ref (&t2); *pbase = t1; Index: gcc/tree-dfa.c === --- gcc/tree-dfa.c 2013-11-29 15:04:41.136142237 + +++ gcc/tree-dfa.c 2013-11-29 15:41:39.513166464 + @@ -437,10 +437,8 @@ get_ref_base_and_extent (tree exp, HOST_ if (this_offset && TREE_CODE (this_offset) == INTEGER_CST) { - offset_int woffset = wi::to_offset (this_o
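
The division-to-shift conversion in this patch can be illustrated with plain 64-bit integers. This is only a sketch: it assumes BITS_PER_UNIT is 8 (the defaults.h default), uses `std::uint64_t` in place of offset_int, and the function names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values, matching the BITS_PER_UNIT == 8 branch in defaults.h.
constexpr unsigned kBitsPerUnit = 8;
constexpr unsigned kLog2BitsPerUnit = 3;
static_assert((1u << kLog2BitsPerUnit) == kBitsPerUnit,
              "the shift count must be log2 of the unit size");

// Bit offset to byte offset using a truncating unsigned division,
// like the wi::udiv_trunc calls being replaced.
inline std::uint64_t bits_to_bytes_div(std::uint64_t bit_offset) {
  return bit_offset / kBitsPerUnit;
}

// The same conversion as a logical right shift, like wi::lrshift.
inline std::uint64_t bits_to_bytes_shift(std::uint64_t bit_offset) {
  return bit_offset >> kLog2BitsPerUnit;
}

// Byte offset to bit offset as a left shift instead of a multiplication.
inline std::uint64_t bytes_to_bits_shift(std::uint64_t byte_offset) {
  return byte_offset << kLog2BitsPerUnit;
}
```

The two byte conversions agree for every unsigned input because the divisor is a power of two. A logical shift is not a drop-in replacement for a signed division of a negative value, which is worth keeping in mind when retrofitting other call sites.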
[wide-int] Use __builtin_expect for length checks
Without profiling information, GCC tends to assume "x == 1" and "x + y == 2" are likely false, so this patch adds some __builtin_expects. (system.h has a dummy definition for compilers that don't support __builtin_expect.) Tested on x86_64-linux-gnu. OK to install? Thanks, Richard Index: gcc/wide-int.h === --- gcc/wide-int.h 2013-11-30 09:40:32.710196218 + +++ gcc/wide-int.h 2013-11-30 10:07:06.567433289 + @@ -1675,7 +1675,7 @@ wi::eq_p (const T1 &x, const T2 &y) while (++i != xi.len); return true; } - if (yi.len == 1) + if (__builtin_expect (yi.len == 1, true)) { /* XI is only equal to YI if it too has a single HWI. */ if (xi.len != 1) @@ -1751,7 +1751,7 @@ wi::ltu_p (const T1 &x, const T2 &y) /* Optimize the case of two HWIs. The HWIs are implicitly sign-extended for precisions greater than HOST_BITS_WIDE_INT, but sign-extending both values does not change the result. */ - if (xi.len + yi.len == 2) + if (__builtin_expect (xi.len + yi.len == 2, true)) { unsigned HOST_WIDE_INT xl = xi.to_uhwi (); unsigned HOST_WIDE_INT yl = yi.to_uhwi (); @@ -1922,7 +1922,7 @@ wi::cmpu (const T1 &x, const T2 &y) /* Optimize the case of two HWIs. The HWIs are implicitly sign-extended for precisions greater than HOST_BITS_WIDE_INT, but sign-extending both values does not change the result. 
*/ - if (xi.len + yi.len == 2) + if (__builtin_expect (xi.len + yi.len == 2, true)) { unsigned HOST_WIDE_INT xl = xi.to_uhwi (); unsigned HOST_WIDE_INT yl = yi.to_uhwi (); @@ -2128,7 +2128,7 @@ wi::bit_and (const T1 &x, const T2 &y) WIDE_INT_REF_FOR (T1) xi (x, precision); WIDE_INT_REF_FOR (T2) yi (y, precision); bool is_sign_extended = xi.is_sign_extended && yi.is_sign_extended; - if (xi.len + yi.len == 2) + if (__builtin_expect (xi.len + yi.len == 2, true)) { val[0] = xi.ulow () & yi.ulow (); result.set_len (1, is_sign_extended); @@ -2149,7 +2149,7 @@ wi::bit_and_not (const T1 &x, const T2 & WIDE_INT_REF_FOR (T1) xi (x, precision); WIDE_INT_REF_FOR (T2) yi (y, precision); bool is_sign_extended = xi.is_sign_extended && yi.is_sign_extended; - if (xi.len + yi.len == 2) + if (__builtin_expect (xi.len + yi.len == 2, true)) { val[0] = xi.ulow () & ~yi.ulow (); result.set_len (1, is_sign_extended); @@ -2170,7 +2170,7 @@ wi::bit_or (const T1 &x, const T2 &y) WIDE_INT_REF_FOR (T1) xi (x, precision); WIDE_INT_REF_FOR (T2) yi (y, precision); bool is_sign_extended = xi.is_sign_extended && yi.is_sign_extended; - if (xi.len + yi.len == 2) + if (__builtin_expect (xi.len + yi.len == 2, true)) { val[0] = xi.ulow () | yi.ulow (); result.set_len (1, is_sign_extended); @@ -2191,7 +2191,7 @@ wi::bit_or_not (const T1 &x, const T2 &y WIDE_INT_REF_FOR (T1) xi (x, precision); WIDE_INT_REF_FOR (T2) yi (y, precision); bool is_sign_extended = xi.is_sign_extended && yi.is_sign_extended; - if (xi.len + yi.len == 2) + if (__builtin_expect (xi.len + yi.len == 2, true)) { val[0] = xi.ulow () | ~yi.ulow (); result.set_len (1, is_sign_extended); @@ -2212,7 +2212,7 @@ wi::bit_xor (const T1 &x, const T2 &y) WIDE_INT_REF_FOR (T1) xi (x, precision); WIDE_INT_REF_FOR (T2) yi (y, precision); bool is_sign_extended = xi.is_sign_extended && yi.is_sign_extended; - if (xi.len + yi.len == 2) + if (__builtin_expect (xi.len + yi.len == 2, true)) { val[0] = xi.ulow () ^ yi.ulow (); result.set_len (1, 
is_sign_extended); @@ -2248,7 +2248,7 @@ wi::add (const T1 &x, const T2 &y) HOST_BITS_PER_WIDE_INT are relatively rare and there's not much point handling them inline. */ else if (STATIC_CONSTANT_P (precision > HOST_BITS_PER_WIDE_INT) - && xi.len + yi.len == 2) + && __builtin_expect (xi.len + yi.len == 2, true)) { unsigned HOST_WIDE_INT xl = xi.ulow (); unsigned HOST_WIDE_INT yl = yi.ulow (); @@ -2323,7 +2323,7 @@ wi::sub (const T1 &x, const T2 &y) HOST_BITS_PER_WIDE_INT are relatively rare and there's not much point handling them inline. */ else if (STATIC_CONSTANT_P (precision > HOST_BITS_PER_WIDE_INT) - && xi.len + yi.len == 2) + && __builtin_expect (xi.len + yi.len == 2, true)) { unsigned HOST_WIDE_INT xl = xi.ulow (); unsigned HOST_WIDE_INT yl = yi.ulow ();
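
For readers unfamiliar with the idiom, the following is a minimal sketch of a single-limb fast path guarded by __builtin_expect. The `Limbs` struct, the `EXAMPLE_LIKELY` macro and `limbs_eq` are hypothetical; the sketch assumes each value has a length of 1 or 2 and is stored in canonical (shortest) form, so values of different lengths are never equal. The fallback definition plays the role that the patch description attributes to system.h.

```cpp
#include <cassert>

// Use the builtin where it exists, otherwise fall back to the bare
// condition.
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_LIKELY(cond) __builtin_expect(!!(cond), 1)
#else
#define EXAMPLE_LIKELY(cond) (cond)
#endif

// Hypothetical compressed integer: LEN significant limbs out of two.
struct Limbs {
  unsigned long long val[2];
  unsigned len;  // 1 or 2 in this toy model
};

inline bool limbs_eq(const Limbs &x, const Limbs &y) {
  // Both lengths are at least 1, so a sum of 2 means each value is a
  // single limb.  Hint to the compiler that this is the common case.
  if (EXAMPLE_LIKELY(x.len + y.len == 2))
    return x.val[0] == y.val[0];

  // Slow path: canonical values with different lengths cannot be equal.
  if (x.len != y.len)
    return false;
  for (unsigned i = 0; i < x.len; ++i)
    if (x.val[i] != y.val[i])
      return false;
  return true;
}
```

The hint changes only code layout and static branch prediction, not the result, so it is safe even when the guess turns out to be wrong for a particular input.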
[C,C++] integer constants in attribute arguments
Hello,

we currently reject:

constexpr int s = 32;
typedef double VEC __attribute__ ((__vector_size__ (s)));

and similarly for other attributes, while we accept s+0 or (int)s, etc. The code is basically copied from the constructor attribute. The C front-end is much less forgiving than the C++ one, so we need to protect the call to default_conversion (as in PR c/59280). For some reason one of the attributes can see a FUNCTION_DECL where others see an IDENTIFIER_NODE; I didn't try to understand why and just added that check to the code.

Bootstrapped and ran the testsuite on x86_64-linux-gnu.

2013-11-30  Marc Glisse

    PR c++/53017
    PR c++/59211
gcc/c-family/
    * c-common.c (handle_aligned_attribute, handle_alloc_size_attribute,
    handle_vector_size_attribute, handle_nonnull_attribute): Call
    default_conversion on the attribute argument.
gcc/cp/
    * tree.c (handle_init_priority_attribute): Likewise.
gcc/
    * doc/extend.texi (Function Attributes): Typo.
gcc/testsuite/
    * c-c++-common/attributes-1.c: New testcase.
    * g++.dg/cpp0x/constexpr-attribute2.C: Likewise.

-- 
Marc Glisse

Index: gcc/c-family/c-common.c === --- gcc/c-family/c-common.c (revision 205548) +++ gcc/c-family/c-common.c (working copy) @@ -7504,24 +7504,32 @@ check_cxx_fundamental_alignment_constrai /* Handle a "aligned" attribute; arguments as in struct attribute_spec.handler. */ static tree handle_aligned_attribute (tree *node, tree ARG_UNUSED (name), tree args, int flags, bool *no_add_attrs) { tree decl = NULL_TREE; tree *type = NULL; int is_type = 0; - tree align_expr = (args ? 
TREE_VALUE (args) -: size_int (ATTRIBUTE_ALIGNED_VALUE / BITS_PER_UNIT)); + tree align_expr; int i; + if (args) +{ + align_expr = TREE_VALUE (args); + if (align_expr && TREE_CODE (align_expr) != IDENTIFIER_NODE) + align_expr = default_conversion (align_expr); +} + else +align_expr = size_int (ATTRIBUTE_ALIGNED_VALUE / BITS_PER_UNIT); + if (DECL_P (*node)) { decl = *node; type = &TREE_TYPE (decl); is_type = TREE_CODE (*node) == TYPE_DECL; } else if (TYPE_P (*node)) type = node, is_type = 1; if ((i = check_user_alignment (align_expr, false)) == -1 @@ -8007,20 +8015,23 @@ handle_malloc_attribute (tree *node, tre struct attribute_spec.handler. */ static tree handle_alloc_size_attribute (tree *node, tree ARG_UNUSED (name), tree args, int ARG_UNUSED (flags), bool *no_add_attrs) { unsigned arg_count = type_num_arguments (*node); for (; args; args = TREE_CHAIN (args)) { tree position = TREE_VALUE (args); + if (position && TREE_CODE (position) != IDENTIFIER_NODE + && TREE_CODE (position) != FUNCTION_DECL) + position = default_conversion (position); if (TREE_CODE (position) != INTEGER_CST || TREE_INT_CST_HIGH (position) || TREE_INT_CST_LOW (position) < 1 || TREE_INT_CST_LOW (position) > arg_count ) { warning (OPT_Wattributes, "alloc_size parameter outside range"); *no_add_attrs = true; return NULL_TREE; @@ -8451,20 +8462,22 @@ handle_vector_size_attribute (tree *node int ARG_UNUSED (flags), bool *no_add_attrs) { unsigned HOST_WIDE_INT vecsize, nunits; enum machine_mode orig_mode; tree type = *node, new_type, size; *no_add_attrs = true; size = TREE_VALUE (args); + if (size && size != error_mark_node && TREE_CODE (size) != IDENTIFIER_NODE) +size = default_conversion (size); if (!tree_fits_uhwi_p (size)) { warning (OPT_Wattributes, "%qE attribute ignored", name); return NULL_TREE; } /* Get the vector size (in bytes). */ vecsize = tree_to_uhwi (size); @@ -8548,21 +8561,25 @@ handle_nonnull_attribute (tree *node, tr } return NULL_TREE; } /* Argument list specified. 
Verify that each argument number references a pointer argument. */ for (attr_arg_num = 1; args; args = TREE_CHAIN (args)) { unsigned HOST_WIDE_INT arg_num = 0, ck_num; - if (!get_nonnull_operand (TREE_VALUE (args), &arg_num)) + tree arg = TREE_VALUE (args); + if (arg && TREE_CODE (arg) != IDENTIFIER_NODE) + arg = default_conversion (arg); + + if (!get_nonnull_operand (arg, &arg_num)) { error ("nonnull argument has invalid operand number (argument %lu)", (unsigned long) attr_arg_num); *no_add_attrs = true; return NULL_TREE; } if (prototype_p (type)) { function_args_iterator iter; Index: gcc/cp/tree.c === --- gcc/cp/tree.c
Re: [PATCH] fix combine.c:reg_nonzero_bits_for_combine where last_set_mode is narrower than mode
> 2013-11-29  Paulo Matos
>             Eric Botcazou
>
>    * combine.c (reg_nonzero_bits_for_combine): Apply mask transformation
>    as applied to nonzero_sign_valid fixing bug when last_set_mode has
>    less precision than mode.

Applied, thanks.

-- 
Eric Botcazou
Re: [PATCH] Fix up cmove expansion (PR target/58864)
> Rather than adding do_pending_stack_adjust () in all the places, especially
> when it isn't clear whether emit_conditional_move will be called at all and
> whether it will actually do do_pending_stack_adjust (), I chose to add
> two new functions to save/restore the pending stack adjustment state,
> so that when instruction sequence is thrown away (either by doing
> start_sequence/end_sequence around it and not emitting it, or
> delete_insns_since) the state can be restored, and have changed all the
> places that IMHO need it for emit_conditional_move.

Why not do it in emit_conditional_move directly then? The code thinks it's clever to do:

  do_pending_stack_adjust ();
  last = get_last_insn ();

  prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
                    GET_CODE (comparison), NULL_RTX, unsignedp, OPTAB_WIDEN,
                    &comparison, &cmode);

  [...]

  delete_insns_since (last);
  return NULL_RTX;

but apparently not, so why not delete the stack adjustment as well and restore the state afterwards?

-- 
Eric Botcazou
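
The save/restore idea discussed in this thread can be sketched with a toy model. Nothing below is GCC code: `toy_pending_adjust`, `toy_sp_delta`, `SavedStackAdjust` and `try_cmove` are hypothetical names, the instruction stream is reduced to a running total of emitted adjustment bytes, and the choice of two saved fields is an assumption suggested by the `int save[2]` in the patch. A struct is used for the storage, as requested in review.

```cpp
#include <cassert>

// Toy stand-ins for the expander's global bookkeeping (hypothetical).
static int toy_pending_adjust = 0;  // adjustment not yet emitted
static int toy_sp_delta = 0;        // running stack pointer delta

// Saved state kept in a struct rather than a bare int[2].
struct SavedStackAdjust {
  int pending;
  int delta;
};

inline void save_adjust(SavedStackAdjust *s) {
  s->pending = toy_pending_adjust;
  s->delta = toy_sp_delta;
}

inline void restore_adjust(const SavedStackAdjust *s) {
  toy_pending_adjust = s->pending;
  toy_sp_delta = s->delta;
}

// Models do_pending_stack_adjust: "emit" the adjustment and clear it.
inline void flush_pending(int *emitted) {
  *emitted += toy_pending_adjust;
  toy_pending_adjust = 0;
}

// Models a speculative expansion that may be thrown away.
inline bool try_cmove(bool succeeds, int *emitted) {
  SavedStackAdjust saved;
  save_adjust(&saved);
  int before = *emitted;   // plays the role of get_last_insn
  flush_pending(emitted);  // the expansion flushes the adjustment
  if (succeeds)
    return true;
  *emitted = before;       // plays the role of delete_insns_since
  restore_adjust(&saved);  // without this the adjustment would be lost
  return false;
}
```

If the `restore_adjust` call is removed, a failed attempt leaves `toy_pending_adjust` at zero while the emitted total has been rolled back, which is the toy equivalent of the lost stack adjustment behind the ICE.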
Re: [wide-int] Use __builtin_expect for length checks
Richard Sandiford wrote: >Without profiling information, GCC tends to assume "x == 1" and >"x + y == 2" are likely false, so this patch adds some >__builtin_expects. >(system.h has a dummy definition for compilers that don't support >__builtin_expect.) > >Tested on x86_64-linux-gnu. OK to install? Ok. Thanks, Richard. >Thanks, >Richard > > >Index: gcc/wide-int.h >=== >--- gcc/wide-int.h 2013-11-30 09:40:32.710196218 + >+++ gcc/wide-int.h 2013-11-30 10:07:06.567433289 + >@@ -1675,7 +1675,7 @@ wi::eq_p (const T1 &x, const T2 &y) > while (++i != xi.len); > return true; > } >- if (yi.len == 1) >+ if (__builtin_expect (yi.len == 1, true)) > { > /* XI is only equal to YI if it too has a single HWI. */ > if (xi.len != 1) >@@ -1751,7 +1751,7 @@ wi::ltu_p (const T1 &x, const T2 &y) >/* Optimize the case of two HWIs. The HWIs are implicitly >sign-extended >for precisions greater than HOST_BITS_WIDE_INT, but sign-extending both > values does not change the result. */ >- if (xi.len + yi.len == 2) >+ if (__builtin_expect (xi.len + yi.len == 2, true)) > { > unsigned HOST_WIDE_INT xl = xi.to_uhwi (); > unsigned HOST_WIDE_INT yl = yi.to_uhwi (); >@@ -1922,7 +1922,7 @@ wi::cmpu (const T1 &x, const T2 &y) >/* Optimize the case of two HWIs. The HWIs are implicitly >sign-extended >for precisions greater than HOST_BITS_WIDE_INT, but sign-extending both > values does not change the result. 
*/ >- if (xi.len + yi.len == 2) >+ if (__builtin_expect (xi.len + yi.len == 2, true)) > { > unsigned HOST_WIDE_INT xl = xi.to_uhwi (); > unsigned HOST_WIDE_INT yl = yi.to_uhwi (); >@@ -2128,7 +2128,7 @@ wi::bit_and (const T1 &x, const T2 &y) > WIDE_INT_REF_FOR (T1) xi (x, precision); > WIDE_INT_REF_FOR (T2) yi (y, precision); > bool is_sign_extended = xi.is_sign_extended && yi.is_sign_extended; >- if (xi.len + yi.len == 2) >+ if (__builtin_expect (xi.len + yi.len == 2, true)) > { > val[0] = xi.ulow () & yi.ulow (); > result.set_len (1, is_sign_extended); >@@ -2149,7 +2149,7 @@ wi::bit_and_not (const T1 &x, const T2 & > WIDE_INT_REF_FOR (T1) xi (x, precision); > WIDE_INT_REF_FOR (T2) yi (y, precision); > bool is_sign_extended = xi.is_sign_extended && yi.is_sign_extended; >- if (xi.len + yi.len == 2) >+ if (__builtin_expect (xi.len + yi.len == 2, true)) > { > val[0] = xi.ulow () & ~yi.ulow (); > result.set_len (1, is_sign_extended); >@@ -2170,7 +2170,7 @@ wi::bit_or (const T1 &x, const T2 &y) > WIDE_INT_REF_FOR (T1) xi (x, precision); > WIDE_INT_REF_FOR (T2) yi (y, precision); > bool is_sign_extended = xi.is_sign_extended && yi.is_sign_extended; >- if (xi.len + yi.len == 2) >+ if (__builtin_expect (xi.len + yi.len == 2, true)) > { > val[0] = xi.ulow () | yi.ulow (); > result.set_len (1, is_sign_extended); >@@ -2191,7 +2191,7 @@ wi::bit_or_not (const T1 &x, const T2 &y > WIDE_INT_REF_FOR (T1) xi (x, precision); > WIDE_INT_REF_FOR (T2) yi (y, precision); > bool is_sign_extended = xi.is_sign_extended && yi.is_sign_extended; >- if (xi.len + yi.len == 2) >+ if (__builtin_expect (xi.len + yi.len == 2, true)) > { > val[0] = xi.ulow () | ~yi.ulow (); > result.set_len (1, is_sign_extended); >@@ -2212,7 +2212,7 @@ wi::bit_xor (const T1 &x, const T2 &y) > WIDE_INT_REF_FOR (T1) xi (x, precision); > WIDE_INT_REF_FOR (T2) yi (y, precision); > bool is_sign_extended = xi.is_sign_extended && yi.is_sign_extended; >- if (xi.len + yi.len == 2) >+ if (__builtin_expect (xi.len + yi.len == 
2, true)) > { > val[0] = xi.ulow () ^ yi.ulow (); > result.set_len (1, is_sign_extended); >@@ -2248,7 +2248,7 @@ wi::add (const T1 &x, const T2 &y) > HOST_BITS_PER_WIDE_INT are relatively rare and there's not much > point handling them inline. */ > else if (STATIC_CONSTANT_P (precision > HOST_BITS_PER_WIDE_INT) >- && xi.len + yi.len == 2) >+ && __builtin_expect (xi.len + yi.len == 2, true)) > { > unsigned HOST_WIDE_INT xl = xi.ulow (); > unsigned HOST_WIDE_INT yl = yi.ulow (); >@@ -2323,7 +2323,7 @@ wi::sub (const T1 &x, const T2 &y) > HOST_BITS_PER_WIDE_INT are relatively rare and there's not much > point handling them inline. */ > else if (STATIC_CONSTANT_P (precision > HOST_BITS_PER_WIDE_INT) >- && xi.len + yi.len == 2) >+ && __builtin_expect (xi.len + yi.len == 2, true)) > { > unsigned HOST_WIDE_INT xl = xi.ulow (); > unsigned HOST_WIDE_INT yl = yi.ulow ();
Re: [wide-int] Avoid some temporaries and use shifts more often
Richard Sandiford wrote: >This started out as an another attempt to find places where we had >things like: > > offset_int x = wi::to_offset (...); > x = ...x...; > >and change them to: > > offset_int x = ...wi::to_offset (...)...; > >with the get_ref_base_and_extent case being the main one. >But it turned out that some of them were also multiplying or >dividing by BITS_PER_UNIT, so it ended up also being a patch to >convert those to shifts. Ok and yes please. Thanks, Richard. >I didn't want to cut-&-paste the 3 : log2 (BITS_PER_UNIT) conditional >yet more times, so I added a LOG2_BITS_PER_UNIT to defaults.h. I can >retrofit it to the existing code if that's OK at this stage. > >For insn-recog.ii this reduces the number of divmod_internal calls from >7884858 to 369746. > >Thanks, >Richard > > >Index: gcc/ChangeLog.wide-int >=== >--- gcc/ChangeLog.wide-int 2013-11-29 15:09:59.623293132 + >+++ gcc/ChangeLog.wide-int 2013-11-29 15:11:48.611155898 + >@@ -111,6 +111,7 @@ > (stabstr_U): Use wide-int interfaces. > (dbxout_type): Update to use cst_fits_shwi_p. > * defaults.h >+ (LOG2_BITS_PER_UNIT): Define. > (TARGET_SUPPORTS_WIDE_INT): Add default. > * dfp.c: Include wide-int.h. 
> (decimal_real_to_integer2): Use wide-int interfaces and rename to >Index: gcc/alias.c >=== >--- gcc/alias.c2013-11-29 15:04:41.136142237 + >+++ gcc/alias.c2013-11-29 15:11:48.606155857 + >@@ -2355,8 +2355,8 @@ adjust_offset_for_component_ref (tree x, > > offset_int woffset > = (wi::to_offset (xoffset) >- + wi::udiv_trunc (wi::to_offset (DECL_FIELD_BIT_OFFSET (field)), >- BITS_PER_UNIT)); >+ + wi::lrshift (wi::to_offset (DECL_FIELD_BIT_OFFSET (field)), >+LOG2_BITS_PER_UNIT)); > if (!wi::fits_uhwi_p (woffset)) > { > *known_p = false; >Index: gcc/defaults.h >=== >--- gcc/defaults.h 2013-11-29 15:04:41.136142237 + >+++ gcc/defaults.h 2013-11-29 15:11:48.606155857 + >@@ -475,6 +475,14 @@ #define DWARF_TYPE_SIGNATURE_SIZE 8 > #define BITS_PER_UNIT 8 > #endif > >+#if BITS_PER_UNIT == 8 >+#define LOG2_BITS_PER_UNIT 3 >+#elif BITS_PER_UNIT == 16 >+#define LOG2_BITS_PER_UNIT 4 >+#else >+#error Unknown BITS_PER_UNIT >+#endif >+ > #ifndef BITS_PER_WORD > #define BITS_PER_WORD (BITS_PER_UNIT * UNITS_PER_WORD) > #endif >Index: gcc/dwarf2out.c >=== >--- gcc/dwarf2out.c2013-11-29 15:04:41.136142237 + >+++ gcc/dwarf2out.c2013-11-29 15:40:56.188806688 + >@@ -14930,7 +14930,7 @@ field_byte_offset (const_tree decl) > object_offset_in_bits = bitpos_int; > > object_offset_in_bytes >-= wi::udiv_trunc (object_offset_in_bits, BITS_PER_UNIT); >+= wi::lrshift (object_offset_in_bits, LOG2_BITS_PER_UNIT); > return object_offset_in_bytes.to_shwi (); > } > > >Index: gcc/gimple-fold.c >=== >--- gcc/gimple-fold.c 2013-11-29 15:04:41.136142237 + >+++ gcc/gimple-fold.c 2013-11-29 15:41:17.425983303 + >@@ -2926,7 +2926,6 @@ fold_nonarray_ctor_reference (tree type, > tree field_offset = DECL_FIELD_BIT_OFFSET (cfield); > tree field_size = DECL_SIZE (cfield); > offset_int bitoffset; >- offset_int byte_offset_cst = wi::to_offset (byte_offset); > offset_int bitoffset_end, access_end; > > /* Variable sized objects in static constructors makes no sense, >@@ -2939,7 +2938,8 @@ fold_nonarray_ctor_reference 
(tree type, > > /* Compute bit offset of the field. */ > bitoffset = (wi::to_offset (field_offset) >- + byte_offset_cst * BITS_PER_UNIT); >+ + wi::lshift (wi::to_offset (byte_offset), >+ LOG2_BITS_PER_UNIT)); > /* Compute bit offset where the field ends. */ > if (field_size != NULL_TREE) > bitoffset_end = bitoffset + wi::to_offset (field_size); >Index: gcc/gimple-ssa-strength-reduction.c >=== >--- gcc/gimple-ssa-strength-reduction.c2013-11-29 15:04:41.136142237 >+ >+++ gcc/gimple-ssa-strength-reduction.c2013-11-29 15:40:56.188806688 >+ >@@ -897,7 +897,7 @@ restructure_reference (tree *pbase, tree > c2 = 0; > } > >- c4 = wi::udiv_floor (index, BITS_PER_UNIT); >+ c4 = wi::lrshift (index, LOG2_BITS_PER_UNIT); > c5 = backtrace_base_for_ref (&t2); > > *pbase = t1; >Index: gcc/tree-dfa.c >=== >--- gcc/tree-dfa.c 2013-11-29 15:04:41.136142237 + >+++ gcc/tree-dfa.c 2013-11-29 15:41:39.513166464 + >@@ -437,10 +437,8
Re: [wide-int] Add a fast path for multiplication by 0
Richard Sandiford wrote:
>Richard Biener writes:
>> On Fri, Nov 29, 2013 at 12:14 PM, Richard Sandiford wrote:
>>> In the fold-const.ii testcase, well over half of the mul_internal calls
>>> were for multiplication by 0 (106038 out of 169355).  This patch adds
>>> an early-out for that.
>>>
>>> Tested on x86_64-linux-gnu.  OK to install?
>>
>> Ok.  Did you check how many of the remaining are multiplies by 1?
>
>Turns out to be 9685, which is probably enough to justify a special
>case.
>
>Tested on x86_64-linux-gnu.  OK to install?

Ok.  I assume we already have a special case for division by 1?

Thanks,
Richard.

>Thanks,
>Richard
>
>
>Index: gcc/wide-int.cc
>===
>--- gcc/wide-int.cc 2013-11-29 15:04:41.177142418 +
>+++ gcc/wide-int.cc 2013-11-29 15:05:36.482424592 +
>@@ -1296,6 +1296,20 @@ wi::mul_internal (HOST_WIDE_INT *val, co
>       return 1;
>     }
>
>+  /* Handle multiplications by 1.  */
>+  if (op1len == 1 && op1[0] == 1)
>+    {
>+      for (i = 0; i < op2len; i++)
>+        val[i] = op2[i];
>+      return op2len;
>+    }
>+  if (op2len == 1 && op2[0] == 1)
>+    {
>+      for (i = 0; i < op1len; i++)
>+        val[i] = op1[i];
>+      return op1len;
>+    }
>+
>   /* If we need to check for overflow, we can only do half wide
>      multiplies quickly because we need to look at the top bits to
>      check for the overflow.  */
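For readers unfamiliar with the early-out idea discussed in this thread, here is a small sketch of a multi-limb multiply with fast paths for 0 and 1. It is only an illustration: it uses unsigned 32-bit limbs in little-endian order, whereas wide-int works on HOST_WIDE_INT blocks with an implicit sign extension, and the function name mul_limbs is invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: multiply the little-endian limb arrays A (ALEN
   limbs) and B (BLEN limbs) into VAL, which must have room for
   ALEN + BLEN limbs.  Returns the number of limbs written.  */
static size_t
mul_limbs (uint32_t *val, const uint32_t *a, size_t alen,
           const uint32_t *b, size_t blen)
{
  assert (alen > 0 && blen > 0);

  /* Fast path: multiplication by 0.  */
  if ((alen == 1 && a[0] == 0) || (blen == 1 && b[0] == 0))
    {
      val[0] = 0;
      return 1;
    }

  /* Fast path: multiplication by 1 just copies the other operand.  */
  if (alen == 1 && a[0] == 1)
    {
      for (size_t i = 0; i < blen; i++)
        val[i] = b[i];
      return blen;
    }
  if (blen == 1 && b[0] == 1)
    {
      for (size_t i = 0; i < alen; i++)
        val[i] = a[i];
      return alen;
    }

  /* General case: schoolbook multiplication with 64-bit partial
     products.  */
  for (size_t i = 0; i < alen + blen; i++)
    val[i] = 0;
  for (size_t i = 0; i < alen; i++)
    {
      uint64_t carry = 0;
      for (size_t j = 0; j < blen; j++)
        {
          uint64_t t = (uint64_t) a[i] * b[j] + val[i + j] + carry;
          val[i + j] = (uint32_t) t;
          carry = t >> 32;
        }
      val[i + blen] = (uint32_t) carry;
    }
  return alen + blen;
}
```

The fast paths avoid both the zeroing pass and the quadratic loop, which is where the saving comes from when most calls have a trivial operand.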
Re: [PING^2] [PATCH] PR59063
Yury Gribov writes:

> diff --git a/gcc/testsuite/lib/asan-dg.exp b/gcc/testsuite/lib/asan-dg.exp
> index e0bf2da..06122e2 100644
> --- a/gcc/testsuite/lib/asan-dg.exp
> +++ b/gcc/testsuite/lib/asan-dg.exp
> @@ -39,9 +39,9 @@ proc asan_link_flags { paths } {
>      set shlib_ext [get_shlib_extension]
>
>      if { $gccpath != "" } {
> +      append flags " -B${gccpath}/libsanitizer/asan/ "
>        if { [file exists "${gccpath}/libsanitizer/asan/.libs/libasan.a"]
>             || [file exists "${gccpath}/libsanitizer/asan/.libs/libasan.${shlib_ext}"] } {
> -          append flags " -B${gccpath}/libsanitizer/asan/ "
>            append flags " -L${gccpath}/libsanitizer/asan/.libs "
>            append ld_library_path ":${gccpath}/libsanitizer/asan/.libs"
>        }
> diff --git a/gcc/testsuite/lib/ubsan-dg.exp b/gcc/testsuite/lib/ubsan-dg.exp
> index 4ec5fdf..b7f2b17 100644
> --- a/gcc/testsuite/lib/ubsan-dg.exp
> +++ b/gcc/testsuite/lib/ubsan-dg.exp
> @@ -30,9 +30,9 @@ proc ubsan_link_flags { paths } {
>      set shlib_ext [get_shlib_extension]
>
>      if { $gccpath != "" } {
> +      append flags " -B${gccpath}/libsanitizer/ubsan/ "
>        if { [file exists "${gccpath}/libsanitizer/ubsan/.libs/libubsan.a"]
>             || [file exists "${gccpath}/libsanitizer/ubsan/.libs/libubsan.${shlib_ext}"] } {
> -          append flags " -B${gccpath}/libsanitizer/ubsan/ "
>            append flags " -L${gccpath}/libsanitizer/ubsan/.libs"
>            append ld_library_path ":${gccpath}/libsanitizer/ubsan/.libs"
>        }

This causes all the tests to be run on all targets, even where
libsanitizer is not supported, and most of them fail due to link errors.

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: [WWWDOCS] Document IPA/LTO/FDO/i386 changes in GCC-4.9
On Thu, 28 Nov 2013, Jan Hubicka wrote:
> We previously renamed every static function foo into foo.1234 (just as a
> precaution because other compilation unit may have also function foo).
> This confuses many things, so now we do renaming only when we see a
> conflict.

Ah, I see.  Thanks.

>
> +     Because -fno-fat-lto-objects is now by default,

I assume you mean "now on by default" or "now enabled by default"?

> +     gcc-ar and gcc-nm wrappers needs

The...wrappers needs -> need

Fine with those tweaks.

Gerald

PS: I applied the following fix on top of your last commit.

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
retrieving revision 1.42
diff -u -3 -p -r1.42 changes.html
--- changes.html 29 Nov 2013 00:46:55 - 1.42
+++ changes.html 30 Nov 2013 16:36:44 -
@@ -46,7 +46,7 @@
 Link-time optimization (LTO) improvements:
 Type merging was rewritten. The new implementation is significantly faster
- and uses less memory.
+ and uses less memory.
 Better partitioning algorithm resulting in less streaming during
 link time.
 Early removal of virtual methods reduces the size of object files and
@@ -70,7 +70,7 @@
 Local aliases are introduced for symbols that are known to be
 semantically equivalent across shared libraries improving dynamic
 linking times.
-
+
 Feedback directed optimization improvements:
 Profiling of programs using C++ inline functions is now more reliable.
Re: [ping] [patch] contrib/config-list.mk: Allow to build all targets individually
On 11/26/13 17:43, Jan-Benedict Glaw wrote:
> On Sun, 2013-11-24 20:02:43 +0100, Jan-Benedict Glaw wrote:
>> 2013-11-24  Jan-Benedict Glaw
>>
>>      * config-list.mk (host_options): Allow to override it.
>>      (LIST): Change "=" to "EQUAL".
>>      (list): New target listing all configurations.
>>      ($(LIST)): Substitute "EQUAL" back to "=".
>
> Ping: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03121.html
>
> Additional to that, I'd suggest to also add microblazeel-elf and
> microblaze-rtems (cf. http://gcc.gnu.org/ml/gcc/2013-11/msg00547.html
> and http://gcc.gnu.org/ml/gcc/2013-11/msg00545.html), though Joern
> isn't fond of the idea (cf.
> http://gcc.gnu.org/ml/gcc/2013-11/msg00528.html).  So I'd quite like
> to see a discussion about this.

I have no objections to adding the two targets.

I think that microblaze-rtems will duplicate microblaze-elf.

--
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077
libgo patch committed: Fix 386 MakeFunc when returning struct
On 386 when a function returns a struct the pointer to the return
value is passed as a hidden first parameter, and the function is
supposed to "ret 4" to pop the hidden parameter when returning to the
caller.  The implementation of reflect.MakeFunc in libgo was not doing
that.  This patch fixes the problem.  Bootstrapped and ran Go
testsuite on x86_64-unknown-linux-gnu, in both 32-bit and 64-bit mode.
Committed to mainline and 4.8 branch.

Ian

diff -r fa6c22b293e8 libgo/go/reflect/makefunc_386.S
--- a/libgo/go/reflect/makefunc_386.S Tue Nov 26 16:49:31 2013 -0800
+++ b/libgo/go/reflect/makefunc_386.S Sat Nov 30 09:05:42 2013 -0800
@@ -26,8 +26,11 @@
        esp uint32              // 0x0
        eax uint32              // 0x4
        st0 uint64              // 0x8
+       rs  int32               // 0x10
      }
-  */
+   The rs field is set by the function to a non-zero value if
+   the function takes a struct hidden pointer that must be
+   popped off the stack.  */

        pushl   %ebp
 .LCFI0:
@@ -73,12 +76,19 @@
        movsd   -16(%ebp), %xmm0
 #endif

+       movl    -8(%ebp), %edx
+
        addl    $36, %esp
        popl    %ebx
 .LCFI3:
        popl    %ebp
 .LCFI4:
+
+       testl   %edx,%edx
+       jne     1f
        ret
+1:
+       ret     $4
 .LFE1:
 #ifdef __ELF__
        .size   reflect.makeFuncStub, . - reflect.makeFuncStub
diff -r fa6c22b293e8 libgo/go/reflect/makefuncgo_386.go
--- a/libgo/go/reflect/makefuncgo_386.go Tue Nov 26 16:49:31 2013 -0800
+++ b/libgo/go/reflect/makefuncgo_386.go Sat Nov 30 09:05:42 2013 -0800
@@ -16,6 +16,7 @@
        esp uint32
        eax uint32 // Value to return in %eax.
        st0 uint64 // Value to return in %st(0).
+       sr  int32  // Set to non-zero if hidden struct pointer.
 }

 // MakeFuncStubGo implements the 386 calling convention for MakeFunc.
@@ -56,10 +57,12 @@
        in := make([]Value, 0, len(ftyp.in))
        ap := uintptr(regs.esp)

+       regs.sr = 0
        var retPtr unsafe.Pointer
        if retStruct {
                retPtr = *(*unsafe.Pointer)(unsafe.Pointer(ap))
                ap += ptrSize
+               regs.sr = 1
        }

        for _, rt := range ftyp.in {
[patch] Fix failure of ACATS c52102c
Hi,

this test started to fail very recently on 32-bit platforms with 64-bit HWI.
Not sure exactly why, but the issue is straightforward and was latent.

For the following reference, a call to ao_ref_init_from_ptr_and_size yields:

(gdb) p debug_generic_expr((tree_node *) 0x76e01200)
&a[0 ...]{lb: 4294967292 sz: 4}

(gdb) p debug_generic_expr(size)
20

(gdb) p dref
$36 = {ref = 0x0, base = 0x76dfd260, offset = -137438953344, size = 160,
  max_size = 160, ref_alias_set = 0, base_alias_set = 0, volatile_p = false}

The offset is bogus.  'a' is an array with lower bound -4 so
{lb: 4294967292 sz: 4} is actually {lb: -4 sz: 4}.  The computation of the
offset goes wrong in get_addr_base_and_unit_offset_1 because it is not done
in sizetype.  Fixed by copying the relevant bits from
get_ref_base_and_extent, where the computation is correctly done in
sizetype.

Tested on x86_64-suse-linux, OK for the mainline?


2013-11-30  Eric Botcazou

        * tree-dfa.h (get_addr_base_and_unit_offset_1) : Do the offset
        computation using the precision of the index type.


2013-11-30  Eric Botcazou

        * gnat.dg/opt30.adb: New test.


--
Eric Botcazou

Index: tree-dfa.h
===
--- tree-dfa.h (revision 205547)
+++ tree-dfa.h (working copy)
@@ -102,11 +102,11 @@ get_addr_base_and_unit_offset_1 (tree ex
                  && (unit_size = array_ref_element_size (exp),
                      TREE_CODE (unit_size) == INTEGER_CST))
                {
-                 HOST_WIDE_INT hindex = TREE_INT_CST_LOW (index);
-
-                 hindex -= TREE_INT_CST_LOW (low_bound);
-                 hindex *= TREE_INT_CST_LOW (unit_size);
-                 byte_offset += hindex;
+                 double_int doffset
+                   = (TREE_INT_CST (index) - TREE_INT_CST (low_bound))
+                     .sext (TYPE_PRECISION (TREE_TYPE (index)));
+                 doffset *= tree_to_double_int (unit_size);
+                 byte_offset += doffset.to_shwi ();
                }
              else
                return NULL_TREE;

-- { dg-do run }
-- { dg-options "-O" }

procedure Opt30 is

   function Id_I (I : Integer) return Integer is
   begin
      return I;
   end;

   A : array (Integer range -4..4) of Integer;

begin
   A := (-ID_I(4), -ID_I(3), -ID_I(2), -ID_I(1), ID_I(100),
         ID_I(1), ID_I(2), ID_I(3), ID_I(4));
   A(-4..0) := A(0..4);
   if A /= (100, 1, 2, 3, 4, 1, 2, 3, 4) then
      raise Program_Error;
   end if;
end;
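The numbers in the gdb session can be reproduced outside the compiler. The sketch below is an assumption-laden illustration, not GCC code: it models a 64-bit HWI with uint64_t/int64_t, and the helper names (sext, byte_offset_unextended, byte_offset_extended) are invented. With index 0, lower bound 4294967292 and element size 4, the unextended computation gives -17179869168 bytes, i.e. -137438953344 bits as printed above, while sign-extending the difference at the 32-bit precision of the index type gives 16 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low PRECISION bits of VALUE.  */
static int64_t
sext (uint64_t value, unsigned precision)
{
  assert (precision >= 1 && precision <= 64);
  if (precision == 64)
    return (int64_t) value;

  uint64_t sign_bit = (uint64_t) 1 << (precision - 1);
  uint64_t mask = ((uint64_t) 1 << precision) - 1;
  value &= mask;
  /* Flipping the sign bit and subtracting it maps the upper half of
     the range to negative values.  */
  return (int64_t) (value ^ sign_bit) - (int64_t) sign_bit;
}

/* Model of the old computation: the low words are combined in 64 bits
   without regard to the precision of the index type.  The final
   conversion to int64_t assumes the usual two's-complement wraparound
   (implementation-defined in C, modular in GCC).  */
static int64_t
byte_offset_unextended (uint64_t index, uint64_t low_bound,
                        uint64_t unit_size)
{
  uint64_t hindex = index;
  hindex -= low_bound;
  hindex *= unit_size;
  return (int64_t) hindex;
}

/* Model of the fixed computation: sign-extend (index - low_bound) at
   the precision of the index type before scaling by the element size.  */
static int64_t
byte_offset_extended (uint64_t index, uint64_t low_bound,
                      uint64_t unit_size, unsigned precision)
{
  int64_t diff = sext (index - low_bound, precision);
  return diff * (int64_t) unit_size;
}
```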
Re: libgo patch committed: Fix 386 MakeFunc when returning struct
Ian Lance Taylor writes:

> diff -r fa6c22b293e8 libgo/go/reflect/makefunc_386.S
> --- a/libgo/go/reflect/makefunc_386.S Tue Nov 26 16:49:31 2013 -0800
> +++ b/libgo/go/reflect/makefunc_386.S Sat Nov 30 09:05:42 2013 -0800
> @@ -26,8 +26,11 @@
>         esp uint32              // 0x0
>         eax uint32              // 0x4
>         st0 uint64              // 0x8
> +       rs  int32               // 0x10

rs ...

> diff -r fa6c22b293e8 libgo/go/reflect/makefuncgo_386.go
> --- a/libgo/go/reflect/makefuncgo_386.go Tue Nov 26 16:49:31 2013 -0800
> +++ b/libgo/go/reflect/makefuncgo_386.go Sat Nov 30 09:05:42 2013 -0800
> @@ -16,6 +16,7 @@
>         esp uint32
>         eax uint32 // Value to return in %eax.
>         st0 uint64 // Value to return in %st(0).
> +       sr  int32  // Set to non-zero if hidden struct pointer.

... vs. sr.

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Backport libbacktrace fix to GCC 4.8 branch
I backported the libbacktrace fix in
http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01445.html to the GCC 4.8
branch.  Bootstrapped and ran libbacktrace testsuite on
x86_64-unknown-linux-gnu.  Committed to 4.8 branch.

Ian


2013-11-30  Ian Lance Taylor

        Backport from mainline:
        2013-10-17  Ian Lance Taylor

        * elf.c (elf_add): Don't get the wrong offsets if a debug section
        is missing.

Index: elf.c
===
--- elf.c (revision 205552)
+++ elf.c (working copy)
@@ -725,6 +725,8 @@ elf_add (struct backtrace_state *state,
        {
          off_t end;

+         if (sections[i].size == 0)
+           continue;
          if (min_offset == 0 || sections[i].offset < min_offset)
            min_offset = sections[i].offset;
          end = sections[i].offset + sections[i].size;
@@ -751,8 +753,13 @@ elf_add (struct backtrace_state *state,
   descriptor = -1;

   for (i = 0; i < (int) DEBUG_MAX; ++i)
-    sections[i].data = ((const unsigned char *) debug_view.data
-                       + (sections[i].offset - min_offset));
+    {
+      if (sections[i].size == 0)
+        sections[i].data = NULL;
+      else
+        sections[i].data = ((const unsigned char *) debug_view.data
+                            + (sections[i].offset - min_offset));
+    }

   if (!backtrace_dwarf_add (state, base_address,
                            sections[DEBUG_INFO].data,
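To see why a missing section matters: a section that was never found keeps offset 0 and size 0, so without the new check it drags min_offset down to 0 and every data pointer is then computed relative to the wrong base. A standalone sketch of the same logic follows; the struct and function names are invented for illustration and are not the libbacktrace definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a debug section record.  */
struct section
{
  size_t offset;               /* File offset, 0 if the section is absent.  */
  size_t size;                 /* Size in bytes, 0 if the section is absent.  */
  const unsigned char *data;   /* Filled in by assign_section_data.  */
};

/* Compute the file range covering all present sections.  Stores the
   start of the range in *MIN_OUT and returns its length.  */
static size_t
section_span (const struct section *s, size_t n, size_t *min_out)
{
  size_t min_offset = 0;
  size_t max_offset = 0;

  assert (s != NULL && min_out != NULL);

  for (size_t i = 0; i < n; i++)
    {
      /* Skip missing sections so they do not pull min_offset to 0.  */
      if (s[i].size == 0)
        continue;
      if (min_offset == 0 || s[i].offset < min_offset)
        min_offset = s[i].offset;
      size_t end = s[i].offset + s[i].size;
      if (end > max_offset)
        max_offset = end;
    }

  *min_out = min_offset;
  return max_offset - min_offset;
}

/* Point each present section into VIEW, which maps the file starting
   at MIN_OFFSET.  Missing sections get a NULL data pointer.  */
static void
assign_section_data (struct section *s, size_t n,
                     const unsigned char *view, size_t min_offset)
{
  for (size_t i = 0; i < n; i++)
    s[i].data = (s[i].size == 0
                 ? NULL
                 : view + (s[i].offset - min_offset));
}
```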
Re: libgo patch committed: Fix 386 MakeFunc when returning struct
On Sat, Nov 30, 2013 at 9:54 AM, Andreas Schwab wrote:
> Ian Lance Taylor writes:
>
>> diff -r fa6c22b293e8 libgo/go/reflect/makefunc_386.S
>> --- a/libgo/go/reflect/makefunc_386.S Tue Nov 26 16:49:31 2013 -0800
>> +++ b/libgo/go/reflect/makefunc_386.S Sat Nov 30 09:05:42 2013 -0800
>> @@ -26,8 +26,11 @@
>>         esp uint32              // 0x0
>>         eax uint32              // 0x4
>>         st0 uint64              // 0x8
>> +       rs  int32               // 0x10
>
> rs ...
>
>> diff -r fa6c22b293e8 libgo/go/reflect/makefuncgo_386.go
>> --- a/libgo/go/reflect/makefuncgo_386.go Tue Nov 26 16:49:31 2013 -0800
>> +++ b/libgo/go/reflect/makefuncgo_386.go Sat Nov 30 09:05:42 2013 -0800
>> @@ -16,6 +16,7 @@
>>         esp uint32
>>         eax uint32 // Value to return in %eax.
>>         st0 uint64 // Value to return in %st(0).
>> +       sr  int32  // Set to non-zero if hidden struct pointer.
>
> ... vs. sr.

Thanks.  Fixed.

Ian

diff -r b9fc602e9b17 libgo/go/reflect/makefunc_386.S
--- a/libgo/go/reflect/makefunc_386.S Sat Nov 30 09:13:14 2013 -0800
+++ b/libgo/go/reflect/makefunc_386.S Sat Nov 30 10:07:25 2013 -0800
@@ -26,9 +26,9 @@
        esp uint32              // 0x0
        eax uint32              // 0x4
        st0 uint64              // 0x8
-       rs  int32               // 0x10
+       sr  int32               // 0x10
      }
-   The rs field is set by the function to a non-zero value if
+   The sr field is set by the function to a non-zero value if
    the function takes a struct hidden pointer that must be
    popped off the stack.  */
[GOMP4] SIMD enabled function for C/C++
Hello Jakub,
        I was looking at my elemental function for C patch that I fixed up
and sent as requested by Aldy, and I saw two changes there that were used
for C and C++ and they were pretty obvious.  Here are the changes.  Can I
just commit them?

Thanks,

Balaji V. Iyer.

Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c (revision 205457)
+++ gcc/config/i386/i386.c (working copy)
@@ -43701,7 +43701,7 @@
          || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
     {
       warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-                 "unsupported simdlen %d\n", clonei->simdlen);
+                 "unsupported simdlen %d", clonei->simdlen);
       return 0;
     }

Index: gcc/omp-low.c
===
--- gcc/omp-low.c (revision 205457)
+++ gcc/omp-low.c (working copy)
@@ -12248,6 +12248,9 @@

   tree attr = lookup_attribute ("omp declare simd",
                                DECL_ATTRIBUTES (node->decl));
+  if (!attr)
+    attr = lookup_attribute ("cilk plus elemental",
+                            DECL_ATTRIBUTES (node->decl));
   if (!attr || targetm.simd_clone.compute_vecsize_and_simdlen == NULL)
     return;
   /* Ignore


Here are the ChangeLogs:

2013-11-30  Balaji V. Iyer

        * config/i386/i386.c (ix86_simd_clone_compute_vecsize_and_simdlen):
        Removed a carriage return from the warning string.
        * omp-low.c (simd_clone_clauses_extract): Added a check for cilk plus
        SIMD-enabled function attributes.
[Patch, fortran] PR58410 - [4.8/4.9 Regression] Bogus uninitialized variable warning for allocatable derived type array function result
Dear All, This turned out to be a valid uninitialized variable warning. However, it was unlikely ever to cause problems at run-time. Nonetheless, here is the fix. I am disinclined to load the testsuite with a fix that is so specific and localized that it simply will not break. However, if reviewers think otherwise, I can easily add the original testcase. Bootstrapped and regtested on FC17/x86_64 - OK from trunk and 4.8? Cheers Paul 2013-11-30 Paul Thomas PR fortran/58410 * trans-array.c (gfc_alloc_allocatable_for_assignment): Do not use the array bounds of an unallocated array but set its size to zero instead. Index: gcc/fortran/trans-array.c === *** gcc/fortran/trans-array.c (revision 205031) --- gcc/fortran/trans-array.c (working copy) *** gfc_alloc_allocatable_for_assignment (gf *** 8068,8073 --- 8076,8082 tree size1; tree size2; tree array1; + tree cond_null; tree cond; tree tmp; tree tmp2; *** gfc_alloc_allocatable_for_assignment (gf *** 8143,8151 jump_label2 = gfc_build_label_decl (NULL_TREE); /* Allocate if data is NULL. */ ! cond = fold_build2_loc (input_location, EQ_EXPR, boolean_type_node, array1, build_int_cst (TREE_TYPE (array1), 0)); ! tmp = build3_v (COND_EXPR, cond, build1_v (GOTO_EXPR, jump_label1), build_empty_stmt (input_location)); gfc_add_expr_to_block (&fblock, tmp); --- 8152,8160 jump_label2 = gfc_build_label_decl (NULL_TREE); /* Allocate if data is NULL. */ ! cond_null = fold_build2_loc (input_location, EQ_EXPR, boolean_type_node, array1, build_int_cst (TREE_TYPE (array1), 0)); ! tmp = build3_v (COND_EXPR, cond_null, build1_v (GOTO_EXPR, jump_label1), build_empty_stmt (input_location)); gfc_add_expr_to_block (&fblock, tmp); *** gfc_alloc_allocatable_for_assignment (gf *** 8197,8209 tmp = build1_v (LABEL_EXPR, jump_label1); gfc_add_expr_to_block (&fblock, tmp); ! size1 = gfc_conv_descriptor_size (desc, expr1->rank); ! /* Get the rhs size. Fix both sizes. 
*/ if (expr2) desc2 = rss->info->data.array.descriptor; else desc2 = NULL_TREE; size2 = gfc_index_one_node; for (n = 0; n < expr2->rank; n++) { --- 8206,8230 tmp = build1_v (LABEL_EXPR, jump_label1); gfc_add_expr_to_block (&fblock, tmp); ! /* If the lhs has not been allocated, its bounds will not have been ! initialized and so its size is set to zero. */ ! size1 = gfc_create_var (gfc_array_index_type, NULL); ! gfc_init_block (&alloc_block); ! gfc_add_modify (&alloc_block, size1, gfc_index_zero_node); ! gfc_init_block (&realloc_block); ! gfc_add_modify (&realloc_block, size1, ! gfc_conv_descriptor_size (desc, expr1->rank)); ! tmp = build3_v (COND_EXPR, cond_null, ! gfc_finish_block (&alloc_block), ! gfc_finish_block (&realloc_block)); ! gfc_add_expr_to_block (&fblock, tmp); ! /* Get the rhs size and fix it. */ if (expr2) desc2 = rss->info->data.array.descriptor; else desc2 = NULL_TREE; + size2 = gfc_index_one_node; for (n = 0; n < expr2->rank; n++) { *** gfc_alloc_allocatable_for_assignment (gf *** 8217,8224 gfc_array_index_type, tmp, size2); } - - size1 = gfc_evaluate_now (size1, &fblock); size2 = gfc_evaluate_now (size2, &fblock); cond = fold_build2_loc (input_location, NE_EXPR, boolean_type_node, --- 8238,8243
[Patch, fortran] PR34547 - [4.8/4.9 regression] NULL(): Fortran 2003 changes, accepts invalid, ICE on invalid
Dear All, This one is trivial. NULL(...) is simply out of context in a transfer statement. Bootstrapped and regtested on FC17/x86_64. OK for trunk and 4.8? Cheers Paul 2013-11-30 Paul Thomas PR fortran/34547 * resolve.c (resolve_transfer): EXPR_NULL is always in an invalid context in a transfer statement. 2013-11-30 Paul Thomas PR fortran/34547 * gfortran.dg/null_5.f90 : Include new error. * gfortran.dg/null_6.f90 : Include new error. Index: gcc/fortran/resolve.c === *** gcc/fortran/resolve.c (revision 205031) --- gcc/fortran/resolve.c (working copy) *** resolve_transfer (gfc_code *code) *** 8247,8256 && exp->value.op.op == INTRINSIC_PARENTHESES) exp = exp->value.op.op1; ! if (exp && exp->expr_type == EXPR_NULL && exp->ts.type == BT_UNKNOWN) { ! gfc_error ("NULL intrinsic at %L in data transfer statement requires " !"MOLD=", &exp->where); return; } --- 8247,8257 && exp->value.op.op == INTRINSIC_PARENTHESES) exp = exp->value.op.op1; ! if (exp && exp->expr_type == EXPR_NULL ! && code->ext.dt) { ! gfc_error ("Invalid context for NULL () intrinsic at %L", !&exp->where); return; } Index: gcc/testsuite/gfortran.dg/null_5.f90 === *** gcc/testsuite/gfortran.dg/null_5.f90(revision 205031) --- gcc/testsuite/gfortran.dg/null_5.f90(working copy) *** subroutine test_PR34547_1 () *** 34,40 end subroutine test_PR34547_1 subroutine test_PR34547_2 () ! print *, null () ! { dg-error "in data transfer statement requires MOLD" } end subroutine test_PR34547_2 subroutine test_PR34547_3 () --- 34,40 end subroutine test_PR34547_1 subroutine test_PR34547_2 () ! print *, null () ! { dg-error "Invalid context" } end subroutine test_PR34547_2 subroutine test_PR34547_3 () Index: gcc/testsuite/gfortran.dg/null_6.f90 === *** gcc/testsuite/gfortran.dg/null_6.f90(revision 205031) --- gcc/testsuite/gfortran.dg/null_6.f90(working copy) *** end subroutine test_PR50375_2 *** 30,34 subroutine test_PR34547_3 () integer, allocatable :: i(:) ! 
print *, NULL(i) end subroutine test_PR34547_3 --- 30,34 subroutine test_PR34547_3 () integer, allocatable :: i(:) ! print *, NULL(i)! { dg-error "Invalid context for NULL" } end subroutine test_PR34547_3
[Patch, fortran] PR57354 - Wrong run-time assignment of allocatable array of derived type with allocatable component
Dear All, This is a partial fix for this problem in that it generates a temporary to provide a correct assignment but then goes on to do an unnecessary reallocation of the lhs. That is to say, the temporary could be taken over by the array descriptor. At the moment, I could not see a good way to do this. I propose to change the PR to reflect this. I will retain the PR and will have another go at suppressing the reallocation in a few weeks time. Bootstrapped and regtested on Fc17/x86_64 - OK for trunk? Cheers Paul 2013-11-30 Paul Thomas PR fortran/57354 * trans-array.c (gfc_conv_resolve_dependencies): For other than SS_SECTION, do a dependency check if the lhs is liable to be reallocated. 2013-11-30 Paul Thomas PR fortran/57354 * gfortran.dg/realloc_on_assign_23.f90 : New test Index: gcc/fortran/trans-array.c === *** gcc/fortran/trans-array.c (revision 205031) --- gcc/fortran/trans-array.c (working copy) *** gfc_conv_resolve_dependencies (gfc_loopi *** 4335,4344 for (ss = rss; ss != gfc_ss_terminator; ss = ss->next) { if (ss->info->type != GFC_SS_SECTION) ! continue; ! ss_expr = ss->info->expr; if (dest_expr->symtree->n.sym != ss_expr->symtree->n.sym) { --- 4335,4352 for (ss = rss; ss != gfc_ss_terminator; ss = ss->next) { + ss_expr = ss->info->expr; + if (ss->info->type != GFC_SS_SECTION) ! { ! if (gfc_option.flag_realloc_lhs ! && dest_expr != ss_expr ! && gfc_is_reallocatable_lhs (dest_expr) ! && ss_expr->rank) ! nDepend = gfc_check_dependency (dest_expr, ss_expr, true); ! continue; ! } if (dest_expr->symtree->n.sym != ss_expr->symtree->n.sym) { Index: gcc/testsuite/gfortran.dg/realloc_on_assign_23.f90 === *** gcc/testsuite/gfortran.dg/realloc_on_assign_23.f90 (revision 0) --- gcc/testsuite/gfortran.dg/realloc_on_assign_23.f90 (working copy) *** *** 0 --- 1,30 + ! { dg-do run } + ! + ! PR fortran/57354 + ! + ! Contributed by Vladimir Fuka + ! 
+ type t + integer,allocatable :: i + end type + + type(t) :: e + type(t), allocatable :: a(:) + integer :: chksum = 0 + + do i=1,3 ! Was 100 in original + e%i = i + chksum = chksum + i + if (.not.allocated(a)) then + a = [e] + else + call foo + end if + end do + + if (sum ([(a(i)%i, i=1,size(a))]) .ne. chksum) call abort + contains + subroutine foo + a = [a, e] + end subroutine + end
RE: [GOMP4] SIMD enabled function for C/C++
Hi Jakub,
        Well, it turns out that I need to make a couple more changes than
that one change in omp-low.c.  So, please ignore that.  I will check in the
change in i386.c as obvious, since all it involves is removing a '\n' from
the warning string.

Thanks,

Balaji V. Iyer.

> -Original Message-
> From: Iyer, Balaji V
> Sent: Saturday, November 30, 2013 1:16 PM
> To: Jakub Jelinek
> Cc: Aldy Hernandez (al...@redhat.com); 'gcc-patches@gcc.gnu.org'
> Subject: [GOMP4] SIMD enabled function for C/C++
>
> Hello Jakub,
>       I was looking at my elemental function for C patch that I fixed up and
> sent as requested by Aldy, and I saw two changes there that were used for C
> and C++ and they were pretty obvious. Here are the changes. Can I just
> commit them?
>
> Thanks,
>
> Balaji V. Iyer.
>
> Index: gcc/config/i386/i386.c
> ===
> --- gcc/config/i386/i386.c (revision 205457)
> +++ gcc/config/i386/i386.c (working copy)
> @@ -43701,7 +43701,7 @@
>           || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
>      {
>        warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> -                 "unsupported simdlen %d\n", clonei->simdlen);
> +                 "unsupported simdlen %d", clonei->simdlen);
>        return 0;
>      }
>
> Index: gcc/omp-low.c
> ===
> --- gcc/omp-low.c (revision 205457)
> +++ gcc/omp-low.c (working copy)
> @@ -12248,6 +12248,9 @@
>
>    tree attr = lookup_attribute ("omp declare simd",
>                                 DECL_ATTRIBUTES (node->decl));
> +  if (!attr)
> +    attr = lookup_attribute ("cilk plus elemental",
> +                            DECL_ATTRIBUTES (node->decl));
>    if (!attr || targetm.simd_clone.compute_vecsize_and_simdlen == NULL)
>      return;
>    /* Ignore
>
>
> Here are the ChangeLogs:
>
> 2013-11-30  Balaji V. Iyer
>
>       * config/i386/i386.c (ix86_simd_clone_compute_vecsize_and_simdlen):
>       Removed a carriage return from the warning string.
>       * omp-low.c (simd_clone_clauses_extract): Added a check for cilk plus
>       SIMD-enabled function attributes.
>
RE: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C
Hello Aldy, Some of the middle end changes I made in the previous patch was not flying for the C++. Here is a fixed patch where the middle-end changes will work for both C and C++. With this email, I am attaching the patch for C along with the middle end changes. Is this Ok for the branch? Here are the ChangeLog entries: gcc/ChangeLog 2013-11-30 Balaji V. Iyer * omp-low.c (expand_simd_clones): Added a new parameter called "type." (ipa_omp_simd_clone): Added a call to expand_simd_clones when Cilk Plus is enabled. gcc/c-family/ChangeLog 2013-11-30 Balaji V. Iyer * c-common.c (c_common_attribute_table): Added "cilk plus elemental" attribute. gcc/c/ChangeLog 2013-11-30 Balaji V. Iyer * c-parser.c (struct c_parser::elem_fn_tokens): Added new field. (c_parser_declaration_or_fndef): Added a check if elem_fn_tokens field in parser is not empty. If not-empty, call the function c_parser_finish_omp_declare_simd. (c_parser_elem_fn_vectorlength): New function. (c_parser_elem_fn_expr_list): Likewise. (c_finish_elem_fn_tokens): Likewise. (c_parser_attributes): Added a elem_fn_tokens parameter. Added a check for vector attribute and if so call c_parser_elem_fn_expr_list. Also, called c_finish_elem_fn_tokens when Cilk Plus is enabled. (c_finish_omp_declare_simd): Added a check if elem_fn_tokens in parser field is non-empty. If so, parse them as you would parse the omp declare simd pragma. gcc/testsuite/ChangeLog 2013-11-30 Balaji V. Iyer * c-c++-common/cilk-plus/EF/ef_test.c: New test. * c-c++-common/cilk-plus/EF/ef_test2.c: Likewise. * c-c++-common/cilk-plus/EF/vlength_errors.c: Likewise. * c-c++-common/cilk-plus/EF/ef_error.c: Likewise. * c-c++-common/cilk-plus/EF/ef_error2.c: Likewise. * gcc.dg/cilk-plus/cilk-plus.exp: Added calls for the above tests. Thanks, Balaji V. Iyer. 
> -Original Message- > From: Iyer, Balaji V > Sent: Wednesday, November 27, 2013 1:15 PM > To: al...@redhat.com > Cc: Jakub Jelinek; gcc-patches@gcc.gnu.org > Subject: RE: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly > Elemental functions) for C > > HI Aldy and Jakub, > Attached, please find a fixed patch. I have fixed all the changes you > have mentioned below. Is this OK to install? > > Here are the ChangeLog entries: > gcc/ChangeLog > 2013-11-27 Balaji V. Iyer > > * config/i386/i386.c (ix86_simd_clone_compute_vecsize_and_simdlen): > Removed a carriage return from the warning string. > * omp-low.c (simd_clone_clauses_extract): Added a check for cilk plus > SIMD-enabled function attributes. > > gcc/c/ChangeLog > 2013-11-27 Balaji V. Iyer > > * c-parser.c (struct c_parser::elem_fn_tokens): Added new field. > (c_parser_declaration_or_fndef): Added a check if elem_fn_tokens > field in parser is not empty. If not-empty, call the function > c_parser_finish_omp_declare_simd. > (c_parser_elem_fn_vectorlength): New function. > (c_parser_elem_fn_expr_list): Likewise. > (c_finish_elem_fn_tokens): Likewise. > (c_parser_attributes): Added a elem_fn_tokens parameter. Added a > check for vector attribute and if so call c_parser_elem_fn_expr_list. > Also, called c_finish_elem_fn_tokens when Cilk Plus is enabled. > (c_finish_omp_declare_simd): Added a check if elem_fn_tokens in > parser field is non-empty. If so, parse them as you would parse > the omp declare simd pragma. > > gcc/testsuite/ChangeLog > 2013-11-27 Balaji V. Iyer > > * c-c++-common/cilk-plus/EF/ef_test.c: New test. > * c-c++-common/cilk-plus/EF/ef_test2.c: Likewise. > * c-c++-common/cilk-plus/EF/vlength_errors.c: Likewise. > * c-c++-common/cilk-plus/EF/ef_error.c: Likewise. > * c-c++-common/cilk-plus/EF/ef_error2.c: Likewise. > * gcc.dg/cilk-plus/cilk-plus.exp: Added calls for the above tests. > > > Thanks, > > Balaji V. Iyer. 
> > > -Original Message- > > From: Aldy Hernandez [mailto:al...@redhat.com] > > Sent: Wednesday, November 27, 2013 10:52 AM > > To: Iyer, Balaji V > > Cc: Jakub Jelinek; gcc-patches@gcc.gnu.org > > Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly > > Elemental functions) for C > > > > "Iyer, Balaji V" writes: > > > > > c_finish_omp_declare_simd (c_parser *parser, tree fndecl, tree parms, > > > vec clauses) > > > { > > > + > > > + if (flag_enable_cilkplus > > > + && clauses.exists () && !vec_safe_is_empty (parser- > > >elem_fn_tokens)) > > > +{ > > > + error ("%<#pragma omp declare simd%> cannot be used in the > same" > > > + "function marked as a SIMD-enabled function"); > > > + vec_free (parser->elem_fn_tokens); > > > + return; > >
RE: [GOMP4][PATCH] SIMD-enabled functions (formerly Elemental functions) for C++
Hello Everyone,
	The changes mentioned in
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03506.html are also applicable
to my C++ patch. With this email, I am attaching a fixed patch.

Here are the ChangeLog entries:

gcc/cp/ChangeLog
2013-11-30  Balaji V. Iyer

        * decl2.c (is_late_template_attribute): Added a check for SIMD-enabled
        functions attribute.  If found, return true.
        * parser.c (cp_parser_direct_declarator): When Cilk Plus is enabled
        see if there is an attribute after function decl.  If so, then
        parse them now.
        (cp_parser_late_return_type_opt): Handle parsing of Cilk Plus SIMD
        enabled function late parsing.
        (cp_parser_gnu_attribute_list): Parse all the tokens for the vector
        attribute for a SIMD-enabled function.
        (cp_parser_omp_all_clauses): Skip parsing to the end of pragma when
        the function is used by SIMD-enabled function (indicated by NULL
        pragma token).
        (cp_parser_elem_fn_vectorlength): New function.
        (cp_parser_elem_fn_expr_list): Likewise.
        (cp_parser_late_parsing_elem_fn_info): Likewise.
        * parser.h (cp_parser::elem_fn_info): New field.
        * decl.c (grokfndecl): Added a check if Cilk Plus is enabled and
        if so, adjust the Cilk Plus SIMD-enabled function attributes.

gcc/testsuite/ChangeLog
2013-11-30  Balaji V. Iyer

        * g++.dg/cilk-plus/cilk-plus.exp: Called the C/C++ common tests for
        SIMD enabled function.
        * g++.dg/cilk-plus/ef_test.C: New test.

Is this OK for branch?

Thanks,

Balaji V. Iyer.

> -----Original Message-----
> From: Iyer, Balaji V
> Sent: Wednesday, November 20, 2013 6:19 PM
> To: Jakub Jelinek
> Cc: Aldy Hernandez (al...@redhat.com); Jeff Law; gcc-patches@gcc.gnu.org
> Subject: [GOMP4][PATCH] SIMD-enabled functions (formerly Elemental
> functions) for C++
>
> Hello Everyone,
>   Attached, please find a patch that will implement SIMD-enabled
> functions for C++ targeting the gomp-4_0-branch. Here are the Changelog
> entries. Is this OK to install?
>
> gcc/cp/ChangeLog
> 2013-11-20  Balaji V. Iyer
>
>         * parser.c (cp_parser_direct_declarator): When Cilk Plus is enabled
>         see if there is an attribute after function decl.  If so, then
>         parse them now.
>         (cp_parser_late_return_type_opt): Handle parsing of Cilk Plus SIMD
>         enabled function late parsing.
>         (cp_parser_gnu_attribute_list): Parse all the tokens for the vector
>         attribute for a SIMD-enabled function.
>         (cp_parser_omp_all_clauses): Skip parsing to the end of pragma when
>         the function is used by SIMD-enabled function (indicated by NULL
>         pragma token).
>         (cp_parser_elem_fn_vectorlength): New function.
>         (cp_parser_elem_fn_expr_list): Likewise.
>         (cp_parser_late_parsing_elem_fn_info): Likewise.
>         * parser.h (cp_parser::elem_fn_info): New field.
>
> gcc/testsuite/ChangeLog
> 2013-11-20  Balaji V. Iyer
>
>         * g++.dg/cilk-plus/cilk-plus.exp: Called the C/C++ common tests for
>         SIMD enabled function.
>         * g++.dg/cilk-plus/ef_test.C: New test.
>
>
> Thanking You,
>
> Yours Sincerely,
>
> Balaji V. Iyer.

Index: gcc/cp/decl.c
===
--- gcc/cp/decl.c (revision 205562)
+++ gcc/cp/decl.c (working copy)
@@ -7669,6 +7669,34 @@
         }
     }
 
+  if (flag_enable_cilkplus)
+    {
+      /* Adjust "cilk plus elemental attribute" attributes.  */
+      tree ods = lookup_attribute ("cilk plus elemental", *attrlist);
+      if (ods)
+        {
+          tree attr;
+          for (attr = ods; attr;
+               attr = lookup_attribute ("cilk plus elemental",
+                                        TREE_CHAIN (attr)))
+            {
+              if (TREE_CODE (type) == METHOD_TYPE)
+                walk_tree (&TREE_VALUE (attr), declare_simd_adjust_this,
+                           DECL_ARGUMENTS (decl), NULL);
+              if (TREE_VALUE (attr) != NULL_TREE)
+                {
+                  tree cl = TREE_VALUE (TREE_VALUE (attr));
+                  cl = c_omp_declare_simd_clauses_to_numbers
+                         (DECL_ARGUMENTS (decl), cl);
+                  if (cl)
+                    TREE_VALUE (TREE_VALUE (attr)) = cl;
+                  else
+                    TREE_VALUE (attr) = NULL_TREE;
+                }
+            }
+        }
+    }
+
   /* Caller will do the rest of this.  */
   if (check < 0)
     return decl;
Index: gcc/cp/pt.c
===
--- gcc/cp/pt.c (revision 205562)
+++ gcc/cp/pt.c (working copy)
@@ -8603,9 +8603,12 @@
             {
               *p = TREE_CHAIN (t);
               TREE_CHAIN (t) = NULL_TREE;
-              if (flag_openmp
-                  && is_attribute_p ("omp declare simd",
-                                     get_attrib
Re: LRA vs reload on powerpc: 2 extra FAILs that are actually improvements?
> On Sat, Nov 2, 2013 at 6:48 PM, Steven Bosscher wrote:
> > The failure of pr53199.c is because of different instruction selection
> > for bswap. Test case is reduced to just one function:
[snip]
> > Is this an improvement or a regression? If it's an improvement then
> > these two test cases should be adjusted :-)

As David said, going through memory is bad, we get a load-hit-store
flush.  Definitely a regression on power7.

Does anyone know why the bswapdi2_64bit r,r alternative is disparaged?
Seems like it has been that way since the original mainline commit.

int main (void)
{
  int i;
  long ret = 0;
  long tmp1, tmp2, tmp3;
  for (i = 0; i < 10; i++)
#if MEM == 1
    /* From pr53199.c reg_reverse, -mlra -mcpu=power6 -mtune=power7.  */
    __asm__ __volatile__ ("\
        addi %1,1,-16\n\
        srdi %3,%0,32\n\
        li %2,4\n\
        stwbrx %0,0,%1\n\
        stwbrx %3,%2,%1\n\
        ld %0,-16(1)"
        : "+r" (ret), "=&b" (tmp1), "=&r" (tmp2), "=&r" (tmp3));
#elif MEM == 2
    /* From pr53199.c reg_reverse, -mlra -mcpu=power6.  */
    __asm__ __volatile__ ("\
        addi %1,1,-16\n\
        srdi %3,%0,32\n\
        addi %2,%1,4\n\
        stwbrx %0,0,%1\n\
        stwbrx %3,0,%2\n\
        ld %0,-16(1)"
        : "+r" (ret), "=&b" (tmp1), "=&b" (tmp2), "=&r" (tmp3));
#elif MEM == 3
    /* From pr53199.c reg_reverse, -mlra -mcpu=power7.  */
    __asm__ __volatile__ ("\
        std %0,-16(1)\n\
        addi %1,1,-16\n\
        ldbrx %0,0,%1\n"
        : "+r" (ret), "=&b" (tmp1));
#else
    __asm__ __volatile__ ("\
        srdi %1,%0,32\n\
        rlwinm %2,%0,8,0x\n\
        rlwinm %3,%1,8,0x\n\
        rlwimi %2,%0,24,0,7\n\
        rlwimi %2,%0,24,16,23\n\
        rlwimi %3,%1,24,0,7\n\
        rlwimi %3,%1,24,16,23\n\
        sldi %2,%2,32\n\
        or %2,%2,%3\n\
        mr %0,%2"
        : "+r" (ret), "=&r" (tmp1), "=&r" (tmp2), "=&r" (tmp3));
#endif
  return ret;
}

/*
amodra@bns:~> gcc -O2 bswap_mem.c
amodra@bns:~> time ./a.out

real    0m3.096s
user    0m3.089s
sys     0m0.001s
amodra@bns:~> time ./a.out

real    0m3.096s
user    0m3.094s
sys     0m0.002s
amodra@bns:~> gcc -O2 -DMEM=1 bswap_mem.c
amodra@bns:~> time ./a.out

real    0m12.661s
user    0m12.657s
sys     0m0.003s
amodra@bns:~> time ./a.out

real    0m12.660s
user    0m12.657s
sys     0m0.003s
amodra@bns:~> gcc -O2 -DMEM=2 bswap_mem.c
amodra@bns:~> time ./a.out

real    0m12.660s
user    0m12.657s
sys     0m0.003s
amodra@bns:~> time ./a.out

real    0m12.660s
user    0m12.657s
sys     0m0.004s
amodra@bns:~> gcc -O2 -DMEM=3 bswap_mem.c
amodra@bns:~> time ./a.out

real    0m10.279s
user    0m10.276s
sys     0m0.003s
amodra@bns:~> time ./a.out

real    0m10.279s
user    0m10.276s
sys     0m0.003s

I also looked at the register version and -DMEM=1 case with power7
simulators finding that the register version had a delay of 12 cycles
from completion of the first instruction to completion of the last.
The -DMEM=1 case had a corresponding delay of 49 cycles, which matches
the loop timing above quite well.
*/

-- 
Alan Modra
Australia Development Lab, IBM
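
For readers who do not read PowerPC assembly, here is a minimal portable C
sketch of what the register-only (#else) sequence above computes: split the
64-bit value into two 32-bit halves, byte-reverse each half with shifts and
masks, and recombine the halves in exchanged order.  The function names
bswap32_reg and bswap64_reg are hypothetical, chosen for illustration only;
they are not taken from GCC or from pr53199.c, and this is not a claim about
what code GCC emits for it.

```c
#include <stdint.h>

/* Byte-reverse a 32-bit word using only shifts and masks, so the value
   never has to leave a register.  (Hypothetical helper name.)  */
static uint32_t
bswap32_reg (uint32_t x)
{
  return (x >> 24)
         | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u)
         | (x << 24);
}

/* Byte-reverse a 64-bit value: reverse each 32-bit half, then exchange
   the halves.  No store followed by a dependent load is involved, which
   is the property the timings above are about.  (Hypothetical name.)  */
static uint64_t
bswap64_reg (uint64_t x)
{
  uint32_t hi = (uint32_t) (x >> 32);
  uint32_t lo = (uint32_t) x;
  return ((uint64_t) bswap32_reg (lo) << 32) | bswap32_reg (hi);
}
```

As a usage example, bswap64_reg (0x0102030405060708) yields
0x0807060504030201.  The -DMEM variants produce the same value but store it
to the stack and reload it, which is where the load-hit-store penalty comes
from.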