Re: [RFC] add push/pop pragma to control the scope of "using"
On Wed, 15 Jan 2020, 马江 wrote: Hello, After some googling, I find there is no way to control the scope of "using" at the moment. This seems strange, as we definitely need this feature, especially when writing inline member functions in C++ headers. Currently I am trying to build a simple class in a C++ header file as follows:

  #include <string>
  using namespace std;
  class mytest {
    string test_name;
    int test_val;
  public:
    inline string & get_name () { return test_name; }
  };

Why is mytest in the global namespace? As an experienced C coder, I know that inline functions must be put into headers or else users can only rely on LTO. And I know that using "using" in a header file is a bad idea as it might silently change the meaning of other code. However, after I put all my inline functions into the header file, I found I must write many "std::string" instead of "string", which is a torture. Can we add something like "#pragma push_using" (just like #pragma pop_macro)? I believe it's feasible and probably not hard to implement. We try to avoid extensions in gcc, you may want to propose this to the C++ standard committee first. However, you should first check whether modules (C++20) affect the issue. -- Marc Glisse
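For what it's worth, a standard-conforming way to keep the shorthand inside the header without a header-wide using-directive is a class-scope alias; a minimal sketch:

  #include <string>

  class mytest {
    // A class-scope alias confines the shortcut to this class, unlike
    // "using namespace std;".  (C++11 syntax; in C++03 use
    // "typedef std::string string;".)
    using string = std::string;

    string test_name;
    int test_val;
  public:
    string & get_name () { return test_name; }  // in-class definitions are implicitly inline
  };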
Re: How to get the data dependency of GIMPLE variables?
On Mon, 15 Jun 2020, Shuai Wang via Gcc wrote: I am trying to analyze the following gimple statements, where the data dependency of _23 is a tree whose leaf nodes are three constant values {13, 4, 14}:

  _13 = 13;
  _14 = _13 + 4;
  _15 = 14;
  _22 = (unsigned long) _15;
  _23 = _22 + _14;

Could anyone shed some light on how such a backward traversal can be implemented? Given _22 used in the last assignment, I have no idea how to trace back to its definition in the fourth statement... Thank you very much! SSA_NAME_DEF_STMT -- Marc Glisse
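A minimal sketch of such a backward walk inside a GIMPLE pass, built on SSA_NAME_DEF_STMT as suggested (GCC internal API; the helper name and traversal logic are illustrative, and PHIs, calls and cycles are not handled):

  /* Walk backwards from NAME, collecting the constants that feed it.
     Illustrative only: only plain assignments are followed.  */
  static void
  collect_leaf_constants (tree name, vec<tree> &constants)
  {
    if (TREE_CODE (name) != SSA_NAME)
      {
        if (CONSTANT_CLASS_P (name))
          constants.safe_push (name);   /* a leaf like 13, 4 or 14 */
        return;
      }
    gimple *def = SSA_NAME_DEF_STMT (name);  /* the unique defining stmt */
    if (!is_gimple_assign (def))
      return;                                /* PHI, call, ... not handled here */
    /* Operand 0 is the lhs of an assignment; recurse into the rhs.  */
    for (unsigned i = 1; i < gimple_num_ops (def); ++i)
      if (tree op = gimple_op (def, i))
        collect_leaf_constants (op, constants);
  }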
Re: How to get the data dependency of GIMPLE variables?
On Mon, 15 Jun 2020, Shuai Wang via Gcc wrote: Dear Marc, Thank you very much! Just another quick question.. Can I iterate over the operands of a GIMPLE statement, like how I iterate over an LLVM instruction in the following way?

  Instruction *instr;
  for (size_t i = 0; i < instr->getNumOperands(); i++)
    instr->getOperand(i);

Sorry for such naive questions.. I actually searched the documents and GIMPLE pretty print for a while but couldn't find such a way of accessing arbitrary numbers of operands... https://gcc.gnu.org/onlinedocs/gccint/GIMPLE_005fASSIGN.html or for lower level https://gcc.gnu.org/onlinedocs/gccint/Logical-Operators.html#Operand-vector-allocation But really you need to look at the code of gcc. Search for places that use SSA_NAME_DEF_STMT and see what they do with the result. -- Marc Glisse
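For the SSA uses specifically there is also an iterator macro in the GCC internal API (tree-ssa-operands.h); a sketch:

  static void
  visit_uses (gimple *stmt)
  {
    ssa_op_iter iter;
    tree use;
    /* Visit every SSA name read by STMT, whatever kind of stmt it is.  */
    FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
      {
        gimple *def = SSA_NAME_DEF_STMT (use);
        /* ... follow the def chain as desired ...  */
        (void) def;
      }
  }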
Re: Local optimization options
On Sun, 5 Jul 2020, Thomas König wrote: On 04.07.2020 at 19:11, Richard Biener wrote: On July 4, 2020 11:30:05 AM GMT+02:00, "Thomas König" wrote: What could be a preferred way to achieve that? Could optimization options like -ffast-math be applied to blocks instead of functions? Could we set flags on the TREE codes to allow certain optimizations? Other things? The middle end can handle those things on function granularity only. Richard. OK, so that will not work (or not without a disproportionate amount of effort). Would it be possible to set something like a TREE_FAST_MATH flag on TREEs? An operation could then be optimized according to these rules iff both operands had that flag, and would also have it then. In order to support various semantics on floating point operations, I was planning to replace some trees with internal functions, with an extra operand to specify various behaviors (rounding, exceptions, etc). Although at least in the beginning, I was thinking of only using those functions in safe mode, to avoid performance regressions. https://gcc.gnu.org/pipermail/gcc-patches/2019-August/527040.html This may never happen now, but it sounds similar to setting flags like TREE_FAST_MATH as you are suggesting. I was going with functions for more flexibility, and to avoid all the existing assumptions about trees. While I guess for fast-math, the worst the assumptions could do is clear the flag, which would make us optimize less than possible, not so bad. -- Marc Glisse
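For reference, the function-granularity control that already exists is the optimize attribute (or #pragma GCC optimize); a minimal sketch of what works today:

  /* Fast-math semantics for this one function only; the rest of the
     translation unit keeps the command-line semantics.  */
  __attribute__ ((optimize ("fast-math")))
  double norm2 (const double *v, int n)
  {
    double s = 0.0;
    for (int i = 0; i < n; i++)
      s += v[i] * v[i];   /* may be reassociated/vectorized here */
    return s;
  }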
Re: [RFC] Add new flag to specify output constraint in match.pd
On Fri, 21 Aug 2020, Feng Xue OS via Gcc wrote: There is a match-folding issue derived from pr94234. A piece of code like:

  int foo (int n)
  {
      int t1 = 8 * n;
      int t2 = 8 * (n - 1);
      return t1 - t2;
  }

It can be perfectly caught by the rule "(A * C) +- (B * C) -> (A +- B) * C" and folded to the constant "8". But this folding will fail if both t1 and t2 have multiple uses, as in the following code:

  int foo (int n)
  {
      int t1 = 8 * n;
      int t2 = 8 * (n - 1);
      use_fn (t1, t2);
      return t1 - t2;
  }

Given an expression with non-single-use operands, folding it will introduce duplicated computation in most situations, and is deemed to be unprofitable. But it is always beneficial if the final result is a constant or an existing SSA value. And the rule is:

  (simplify
   (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
   (if ((!ANY_INTEGRAL_TYPE_P (type)
         || TYPE_OVERFLOW_WRAPS (type)
         || (INTEGRAL_TYPE_P (type)
             && tree_expr_nonzero_p (@0)
             && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
        /* If @1 +- @2 is constant require a hard single-use on either
           original operand (but not on both). */
        && (single_use (@3) || single_use (@4)))   <- control whether match or not
    (mult (plusminus @1 @2) @0)))

The current matcher only provides a way to check something before folding, but no mechanism to affect the decision after folding. If it had one, for the above case, we could let the fold go ahead when we find the result is a constant. :s already has a counter-measure where it still folds if the output is at most one operation. So this transformation has a counter-counter-measure of checking single_use explicitly. And now we want a counter^3-measure... Like the way to describe an input operand using flags, we could also add a new flag to specify this kind of constraint on the output: that we expect it to be a simple gimple value. Proposed syntax is

  (opcode:v{ condition } )

The char "v" stands for gimple value; if another char is more descriptive, that is fine too. "condition", enclosed by { }, is an optional C-syntax condition expression. If present, only when "condition" is met will the matcher check whether the folding result is a gimple value, using gimple_simplified_result_is_gimple_val (). Since there is no SSA concept in GENERIC, this is only for GIMPLE-match, not GENERIC-match. With this syntax, the rule is changed to

#Form 1:

  (simplify
   (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
   (if (!ANY_INTEGRAL_TYPE_P (type)
        || TYPE_OVERFLOW_WRAPS (type)
        || (INTEGRAL_TYPE_P (type)
            && tree_expr_nonzero_p (@0)
            && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
    (if (!single_use (@3) && !single_use (@4))
     (mult:v (plusminus @1 @2) @0)
     (mult (plusminus @1 @2) @0))))

That seems to match what you can do with '!' now (that's very recent).

#Form 2:

  (simplify
   (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
   (if (!ANY_INTEGRAL_TYPE_P (type)
        || TYPE_OVERFLOW_WRAPS (type)
        || (INTEGRAL_TYPE_P (type)
            && tree_expr_nonzero_p (@0)
            && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
    (mult:v{ !single_use (@3) && !single_use (@4) } (plusminus @1 @2) @0)))

Indeed, something more flexible than '!' would be nice, but I am not so sure about this version. If we are going to allow inserting code after resimplification and before validation, maybe we should go even further and let people insert arbitrary code there... -- Marc Glisse
Re: [RFC] Add new flag to specify output constraint in match.pd
On Wed, 2 Sep 2020, Richard Biener via Gcc wrote: On Mon, Aug 24, 2020 at 8:20 AM Feng Xue OS via Gcc wrote: There is a match-folding issue derived from pr94234. A piece of code like:

  int foo (int n)
  {
      int t1 = 8 * n;
      int t2 = 8 * (n - 1);
      return t1 - t2;
  }

It can be perfectly caught by the rule "(A * C) +- (B * C) -> (A +- B) * C" and folded to the constant "8". But this folding will fail if both t1 and t2 have multiple uses, as in the following code:

  int foo (int n)
  {
      int t1 = 8 * n;
      int t2 = 8 * (n - 1);
      use_fn (t1, t2);
      return t1 - t2;
  }

Given an expression with non-single-use operands, folding it will introduce duplicated computation in most situations, and is deemed to be unprofitable. But it is always beneficial if the final result is a constant or an existing SSA value. And the rule is:

  (simplify
   (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
   (if ((!ANY_INTEGRAL_TYPE_P (type)
         || TYPE_OVERFLOW_WRAPS (type)
         || (INTEGRAL_TYPE_P (type)
             && tree_expr_nonzero_p (@0)
             && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
        /* If @1 +- @2 is constant require a hard single-use on either
           original operand (but not on both). */
        && (single_use (@3) || single_use (@4)))   <- control whether match or not
    (mult (plusminus @1 @2) @0)))

The current matcher only provides a way to check something before folding, but no mechanism to affect the decision after folding. If it had one, for the above case, we could let the fold go ahead when we find the result is a constant. :s already has a counter-measure where it still folds if the output is at most one operation. So this transformation has a counter-counter-measure of checking single_use explicitly. And now we want a counter^3-measure... Counter-measures are a key factor in matching cost. ":s" seems to be somewhat coarse-grained, and here we do need more control over it. But ideally, we could decouple these counter-measures from the definitions of match rules, and let the gimple matcher make a more reasonable match-or-not decision based on these counters. Anyway, it is another story. Like the way to describe an input operand using flags, we could also add a new flag to specify this kind of constraint on the output: that we expect it to be a simple gimple value. Proposed syntax is

  (opcode:v{ condition } )

The char "v" stands for gimple value; if another char is more descriptive, that is fine too. "condition", enclosed by { }, is an optional C-syntax condition expression. If present, only when "condition" is met will the matcher check whether the folding result is a gimple value, using gimple_simplified_result_is_gimple_val (). Since there is no SSA concept in GENERIC, this is only for GIMPLE-match, not GENERIC-match. With this syntax, the rule is changed to

#Form 1:

  (simplify
   (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
   (if (!ANY_INTEGRAL_TYPE_P (type)
        || TYPE_OVERFLOW_WRAPS (type)
        || (INTEGRAL_TYPE_P (type)
            && tree_expr_nonzero_p (@0)
            && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
    (if (!single_use (@3) && !single_use (@4))
     (mult:v (plusminus @1 @2) @0)
     (mult (plusminus @1 @2) @0))))

That seems to match what you can do with '!' now (that's very recent). It's also what :s does, but a slight bit more "local". When any operand is marked :s and it has more than a single use we only allow simplifications that do not require insertion of extra stmts. So basically the above pattern doesn't behave any differently than if you omit your :v. Only if you'd place :v on an inner expression would there be a difference. Correlating the inner expression we'd not want to insert new expressions for with a specific :s (or multiple ones) would be a more natural extension of what :s provides. Thus, for the above case (Form 1), you do not need :v at all and :s works. Let's consider that multiplication is expensive. We have code like 5*X-3*X, which can be simplified to 2*X. However, if both 5*X and 3*X have other uses, that would increase the number of multiplications. :s would not block a simplification to 2*X, which is a single stmt. So the existing transformation has extra explicit checks for single_use. And those extra checks block the transformation even for 5*X-4*X -> X, which does not increase the number of multiplications. Which is where '!' (or :v here) comes in. Or we could decide that the extra multiplication is not that bad if it saves an addition, simplifies the expression, possibly gains more insn parallelism, etc., in which case we could just drop the existing hard single_use check... -- Marc Glisse
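The cost argument in C, for illustration (hypothetical function, with an assumed external use_fn forcing the extra uses):

  extern void use_fn (int, int);

  int f (int x)
  {
    int t1 = 5 * x, t2 = 3 * x;
    use_fn (t1, t2);          /* both products have other uses */
    /* Folding t1 - t2 to 2*x would add a third multiplication here,
       while 5*x - 4*x -> x would be a pure win.  */
    return t1 - t2;
  }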
Re: A couple GIMPLE questions
On Sat, 5 Sep 2020, Gary Oblock via Gcc wrote: First off one of the questions is just me being curious, but the second is quite serious. Note, this is GIMPLE coming into my optimization and not something I've modified. Here's the C code:

  type_t *
  do_comp (type_t *data, size_t len)
  {
    type_t *res;
    type_t *x = min_of_x (data, len);
    type_t *y = max_of_y (data, len);
    res = y;
    if (x < y)
      res = 0;
    return res;
  }

And here's the resulting GIMPLE:

  ;; Function do_comp.constprop (do_comp.constprop.0, funcdef_no=5,
  ;; decl_uid=4392, cgraph_uid=3, symbol_order=68) (executed once)

  do_comp.constprop (struct type_t * data)
  {
    struct type_t * res;
    struct type_t * x;
    struct type_t * y;
    size_t len;

    <bb 2> [local count: 1073741824]:
    x_2 = min_of_x (data_1(D), 1);
    y_3 = max_of_y (data_1(D), 1);
    if (x_2 < y_3)
      goto <bb 3>; [29.00%]
    else
      goto <bb 4>; [71.00%]

    <bb 3> [local count: 311385128]:

    <bb 4> [local count: 1073741824]:
    # res_4 = PHI <y_3(2), 0B(3)>
    return res_4;
  }

The silly question first. In the "if" stmt how does GCC get those probabilities? Which it shows as 29.00% and 71.00%. I believe they should both be 50.00%. See the profile_estimate pass dump. One branch makes the function return NULL, which makes gcc guess that it may be a bit less likely than the other. Those are heuristics, which are tuned to help on average, but of course they are sometimes wrong. The serious question is what is going on with this phi? res_4 = PHI <y_3(2), 0B(3)> This makes zero sense practicality wise to me and how is it supposed to be recognized and used? Note, I really do need to transform the "0B" into something else for my structure reorganization optimization. That's not a question? Are you asking why PHIs exist at all? They are the standard way to represent merging in SSA representations. You can iterate on the PHIs of a basic block, etc. CONFIDENTIALITY NOTICE: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and contains information that is confidential and proprietary to Ampere Computing or its subsidiaries. It is to be used solely for the purpose of furthering the parties' business relationship. Any unauthorized review, copying, or distribution of this email (or any attachments thereto) is strictly prohibited. If you are not the intended recipient, please contact the sender immediately and permanently delete the original and any copies of this email and any attachments thereto. Could you please get rid of this when posting on public mailing lists? -- Marc Glisse
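For reference, the PHIs of a block can be visited and rewritten like any other statement; a minimal sketch with the GCC-internal iterator API (the surrounding pass context is assumed):

  static void
  visit_phis (basic_block bb)
  {
    for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
         gsi_next (&gsi))
      {
        gphi *phi = gsi.phi ();
        tree res = gimple_phi_result (phi);
        /* One argument per incoming edge; 0B above is simply the null
           pointer constant flowing in from the "res = 0" block.  */
        for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
          {
            tree arg = gimple_phi_arg_def (phi, i);
            edge e = gimple_phi_arg_edge (phi, i);
            /* inspect or replace ARG here */
            (void) arg; (void) e;
          }
        (void) res;
      }
  }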
Re: Installing a generated header file
On Thu, 12 Nov 2020, Bill Schmidt via Gcc wrote: Hi! I'm working on a project where it's desirable to generate a target-specific header file while building GCC, and install it with the rest of the target-specific headers (i.e., in lib/gcc/<target>/11.0.0/include). Today it appears that only those headers listed in "extra_headers" in config.gcc will be placed there, and those are assumed to be found in gcc/config/<cpu>/. In my case, the header file will end up in my build directory instead. Questions: * Has anyone tried something like this before? I didn't find anything. * If so, can you please point me to an example? * Otherwise, I'd be interested in advice about providing new infrastructure to support this. I'm a relative noob with respect to the configury code, and I'm sure my initial instincts will be wrong. :) Does the i386 mm_malloc.h file match your scenario? -- Marc Glisse
Re: Reassociation and trapping operations
On Wed, 25 Nov 2020, Ilya Leoshkevich via Gcc wrote: I have a C floating point comparison (a <= b && a >= b), which test_for_singularity turns into (a <= b && a == b) and vectorizer turns into ((a <= b) & (a == b)). So far so good. eliminate_redundant_comparison, however, turns it into just (a == b). I don't think this is correct, because (a <= b) traps and (a == b) doesn't. Hello, let me just mention the old https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53805 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53806 There has been some debate about the exact meaning of -ftrapping-math, but don't let that stop you. -- Marc Glisse
Re: The conditions when convert from double to float is permitted?
On Thu, 10 Dec 2020, Xionghu Luo via Gcc wrote: I have a maybe silly question about whether there is any *standard* or *option* (like -ffast-math) for GCC that allows double-to-float demotion optimization? For example, 1) from PR22326:

  #include <math.h>
  float foo (float f, float x, float y)
  {
    return (fabs (f) * x + y);
  }

The fabs will return a double result, but it could be demoted to float actually, since the function returns float in the end. With fp-contract, this is (float)fma((double)f,(double)x,(double)y). This could almost be transformed into fmaf(f,x,y), except that the double rounding may not be strictly equivalent. Still, that seems like it would be no problem with -funsafe-math-optimizations, just like turning (float)((double)x*(double)y) into x*y, as long as it is a single operation with casts on all inputs and output. Whether there are cases that can be optimized without -funsafe-math-optimizations is harder to tell. -- Marc Glisse
Re: Integer division on x86 -m32
On Thu, 10 Dec 2020, Lucas de Almeida via Gcc wrote: when performing (int64_t) foo / (int32_t) bar in gcc under x86, a call to __divdi3 is always output, even though it seems the use of the idiv instruction could be faster. IIRC, idiv requires that the quotient fit in 32 bits, while your C code doesn't guarantee that. (1LL << 60) / 3 would cause an error with idiv. It would be possible to use idiv in some cases, if the compiler can prove that the variables are in the right range, but that's not so easy. You can use inline asm to force the use of idiv if you know it is safe for your case, the most common being modular arithmetic: if you know that uint32_t a, b, c, d are smaller than m (and m != 0), you can compute a*b+c+d in uint64_t, then use div to compute that modulo m. -- Marc Glisse
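A sketch of that modular-arithmetic idiom for x86 (hypothetical helper name; the precondition a, b, c, d < m with m != 0 guarantees the intermediate value is below m*m, so the quotient fits in 32 bits and div cannot fault):

  #include <cstdint>

  static inline uint32_t
  mul_add_mod (uint32_t a, uint32_t b, uint32_t c, uint32_t d, uint32_t m)
  {
    uint64_t t = (uint64_t) a * b + c + d;  /* <= m*m - 1, so t/m < 2^32 */
    uint32_t q, r;
    /* divl divides edx:eax by the operand; quotient -> eax, remainder -> edx.  */
    __asm__ ("divl %4"
             : "=a" (q), "=d" (r)
             : "a" ((uint32_t) t), "d" ((uint32_t) (t >> 32)), "rm" (m));
    return r;
  }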
Re: What is the type of vector signed + vector unsigned?
On Tue, 29 Dec 2020, Richard Sandiford via Gcc wrote: Any thoughts on what f should return in the following testcase, given the usual GNU behaviour of treating signed >> as arithmetic shift right?

  typedef int vs4 __attribute__((vector_size(16)));
  typedef unsigned int vu4 __attribute__((vector_size(16)));

  int f (void)
  {
    vs4 x = { -1, -1, -1, -1 };
    vu4 y = { 0, 0, 0, 0 };
    return ((x + y) >> 1)[0];
  }

The C frontend takes the type of x+y from the first operand, so x+y is signed and f returns -1. Symmetry is an important property of addition in C/C++. The C++ frontend applies similar rules to x+y as it would to scalars, with unsigned T having a higher rank than signed T, so x+y is unsigned and f returns 0x7fffffff. That looks like the most natural choice. FWIW, Clang treats x+y as signed, so f returns -1 for both C and C++. I think clang follows gcc and uses the type of the first operand. -- Marc Glisse
Re: bug in DSE?
On Fri, 12 Feb 2021, Andrew MacLeod via Gcc wrote: I don't want to immediately open a PR, so I'll just ask about testsuite/gcc.dg/pr83609.c. The compilation string is

  -O2 -fno-tree-forwprop -fno-tree-ccp -fno-tree-fre -fno-tree-pre -fno-code-hoisting

which passes as is. If I however add -fno-tree-vrp as well, then it looks like dead store elimination maybe does something wrong... with EVRP running, we translate function foo() from

  complex float foo ()
  {
    complex float c;
    complex float * c.0_1;
    complex float _4;

    <bb 2> :
    c.0_1 = &c;
    MEM[(long long unsigned int *)c.0_1] = 1311768467463790320;
    _4 = c;

Isn't that a clear violation of strict aliasing? -- Marc Glisse
Re: Possible issue with ARC gcc 4.8
On Mon, 6 Jul 2015, Vineet Gupta wrote: It is the C language standard that says that shifts like this invoke undefined behavior. Right, but the compiler is a program nevertheless and it knows what to do when it sees 1 << 62 It's not like there is an uninitialized variable or something which will provide unexpected behaviour. More importantly, the question is can ports define a specific behaviour for such cases and whether that would be sufficient to guarantee the semantics. The point being ARC ISA provides a neat feature where core only considers lower 5 bits of bitpos operands. Thus we can make such behaviour not only deterministic in the context of ARC, but also optimal, eliding the need for doing specific masking/clamping to 5 bits. IMO, writing a << (b & 31) instead of a << b has only advantages. It documents the behavior you are expecting. It makes the code standard-conformant and portable. And the back-ends can provide patterns for exactly this so they generate a single insn (the same as for a << b). When I see x << 1024, 0 is the only value that makes sense to me, and I'd much rather get undefined behavior (detected by sanitizers) than silently get 'x' back. -- Marc Glisse
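The recommendation as a compilable example; on targets whose shift instruction already masks the count (x86, and the ARC behaviour described above), the "& 31" folds into the instruction and a single shift remains:

  unsigned
  shl_mod32 (unsigned a, unsigned b)
  {
    /* Well defined for any b; documents the intended mod-32 semantics
       instead of relying on undefined behavior.  */
    return a << (b & 31);
  }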
Re: [RFH] Move some flag_unsafe_math_optimizations using simplify and match
On Fri, 7 Aug 2015, Hurugalawadi, Naveen wrote: Please find attached the patch "simplify-1.patch" that moves some "flag_unsafe_math_optimizations" from fold-const.c to simplify and match. Some random comments (not a review). First, patches go to gcc-patc...@gcc.gnu.org.

 /* fold_builtin_logarithm */
 (if (flag_unsafe_math_optimizations)

Please indent everything below by one space.

 +
 +/* Simplify sqrt(x) * sqrt(x) -> x. */
 +(simplify
 + (mult:c (SQRT @0) (SQRT @0))

(mult (SQRT@1 @0) @1)

 + (if (!HONOR_SNANS (element_mode (type)))

You don't need element_mode here, HONOR_SNANS (type) should do the right thing.

 + @0))
 +
 +/* Simplify root(x) * root(y) -> root(x*y). */
 +/* FIXME : cbrt ICE's with AArch64. */
 +(for root (SQRT CBRT)

Indent below.

 +(simplify
 + (mult:c (root @0) (root @1))

No need to commute, it yields the same pattern. On the other hand, you may want root:s since if the roots are going to be computed anyway, a multiplication is cheaper than computing yet another root (I didn't check what the existing code does). (this applies to several other patterns)

 + (root (mult @0 @1))))
 +
 +/* Simplify expN(x) * expN(y) -> expN(x+y). */
 +(for exps (EXP EXP2)
 +/* FIXME : exp2 ICE's with AArch64. */
 +(simplify
 + (mult:c (exps @0) (exps @1))
 + (exps (plus @0 @1))))

I am wondering if we should handle mixed operations (say expf(x)*exp2(y)), for this pattern and others, but that's not a prerequisite.

 +
 +/* Simplify pow(x,y) * pow(x,z) -> pow(x,y+z). */
 +(simplify
 + (mult:c (POW @0 @1) (POW @0 @2))
 + (POW @0 (plus @1 @2)))
 +
 +/* Simplify pow(x,y) * pow(z,y) -> pow(x*z,y). */
 +(simplify
 + (mult:c (POW @0 @1) (POW @2 @1))
 + (POW (mult @0 @2) @1))
 +
 +/* Simplify tan(x) * cos(x) -> sin(x). */
 +(simplify
 + (mult:c (TAN @0) (COS @0))
 + (SIN @0))

Since this will only trigger for the same version of cos and tan (say cosl with tanl or cosf with tanf), I am wondering if we get smaller code with a linear 'for' or with a quadratic 'for' which shares the same tail (I assume the above is quadratic, I did not check). This may depend on Richard's latest patches.

 +
 +/* Simplify x * pow(x,c) -> pow(x,c+1). */
 +(simplify
 + (mult:c @0 (POW @0 @1))
 + (if (TREE_CODE (@1) == REAL_CST
 +      && !TREE_OVERFLOW (@1))
 +  (POW @0 (plus @1 { build_one_cst (type); }))))
 +
 +/* Simplify sin(x) / cos(x) -> tan(x). */
 +(simplify
 + (rdiv (SIN @0) (COS @0))
 + (TAN @0))
 +
 +/* Simplify cos(x) / sin(x) -> 1 / tan(x). */
 +(simplify
 + (rdiv (COS @0) (SIN @0))
 + (rdiv { build_one_cst (type); } (TAN @0)))
 +
 +/* Simplify sin(x) / tan(x) -> cos(x). */
 +(simplify
 + (rdiv (SIN @0) (TAN @0))
 + (if (! HONOR_NANS (@0)
 +      && ! HONOR_INFINITIES (element_mode (@0)))
 +  (cos @0)))
 +
 +/* Simplify tan(x) / sin(x) -> 1.0 / cos(x). */
 +(simplify
 + (rdiv (TAN @0) (SIN @0))
 + (if (! HONOR_NANS (@0)
 +      && ! HONOR_INFINITIES (element_mode (@0)))
 +  (rdiv { build_one_cst (type); } (COS @0))))
 +
 +/* Simplify pow(x,c) / x -> pow(x,c-1). */
 +(simplify
 + (rdiv (POW @0 @1) @0)
 + (if (TREE_CODE (@1) == REAL_CST
 +      && !TREE_OVERFLOW (@1))
 +  (POW @0 (minus @1 { build_one_cst (type); }))))
 +
 +/* Simplify a/root(b/c) into a*root(c/b). */
 +/* FIXME : cbrt ICE's with AArch64. */
 +(for root (SQRT CBRT)
 +(simplify
 + (rdiv @0 (root (rdiv @1 @2)))
 + (mult @0 (root (rdiv @2 @1)))))
 +
 +/* Simplify x / expN(y) into x*expN(-y). */
 +/* FIXME : exp2 ICE's with AArch64. */
 +(for exps (EXP EXP2)
 +(simplify
 + (rdiv @0 (exps @1))
 + (mult @0 (exps (negate @1)))))
 +
 +/* Simplify x / pow (y,z) -> x * pow(y,-z). */
 +(simplify
 + (rdiv @0 (POW @1 @2))
 + (mult @0 (POW @1 (negate @2))))
 +
  /* Special case, optimize logN(expN(x)) = x. */
  (for logs (LOG LOG2 LOG10)
       exps (EXP EXP2 EXP10)

-- Marc Glisse
Re: Replacing malloc with alloca.
On Sun, 13 Sep 2015, Ajit Kumar Agarwal wrote: The replacement of malloc with alloca can be done based on the following analysis: if the lifetime of an object does not stretch beyond its immediate scope, the malloc can be replaced with alloca. This increases performance to a great extent. Inlining helps a great deal here, since after inlining the lifetime of an object often doesn't stretch beyond the immediate scope, and the opportunities for replacing malloc with alloca can be identified. I am wondering in what phases of our optimization pipeline malloc is replaced with alloca, and what analysis is done to transform the malloc into alloca. Does this greatly increase the performance of the benchmarks? Is the analysis done through Escape Analysis? If yes, then what data structure is used for the abstract execution interpretation? Did you try it? I don't think gcc ever replaces malloc with alloca. The only optimization we do with malloc/free is removing it when it is obviously unused. There are several PRs open about possible optimizations (19831 for instance). I posted a WIP patch a couple years ago to replace some malloc+free with local arrays (fixed length) but never had time to finish it. https://gcc.gnu.org/ml/gcc-patches/2013-11/msg03108.html -- Marc Glisse
Re: Multiprecision Arithmetic Builtins
On Mon, 21 Sep 2015, Florian Weimer wrote: On 09/21/2015 08:09 AM, Oleg Endo wrote: Hi all, I was thinking of adding some SH specific builtin functions for the addc, subc and negc instructions. Are there any plans to add clang's target independent multiprecision arithmetic builtins (http://clang.llvm.org/docs/LanguageExtensions.html) to GCC? Do you mean these? <https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html> Is there something else that is missing? http://clang.llvm.org/docs/LanguageExtensions.html#multiprecision-arithmetic-builtins Those that take a carryin argument. -- Marc Glisse
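The missing carry-in variants can be built from GCC's overflow builtins; a minimal sketch (hypothetical helper name):

  #include <cstdint>

  static inline uint32_t
  addc32 (uint32_t a, uint32_t b, uint32_t carry_in, uint32_t *carry_out)
  {
    uint32_t s;
    bool c1 = __builtin_add_overflow (a, b, &s);
    bool c2 = __builtin_add_overflow (s, carry_in, &s);
    /* With carry_in in {0, 1}, at most one of the two adds can wrap.  */
    *carry_out = c1 | c2;
    return s;
  }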
Re: avoiding recursive calls of calloc due to optimization
On Mon, 21 Sep 2015, Daniel Gutson wrote: This is derived from https://gcc.gnu.org/ml/gcc-help/2015-03/msg00091.html Currently, gcc provides an optimization that transforms a call to malloc and a call to memset into a call to calloc. This is fine except when it takes place within the calloc() function implementation itself, causing a recursive call. Two alternatives have been proposed: -fno-malloc-builtin and disabling optimizations in calloc(). I think the former is suboptimal since it affects all the code just because of the implementation of one function (calloc()), whereas the latter is suboptimal too since it disables all optimizations in the whole function (calloc too). I think of two alternatives: either make -fno-calloc-builtin disable the optimization, or make the optimization aware of the function context where it is operating and prevent it from doing the transformation if the function is calloc(). Please help me to find the best alternative so we can implement it. You may want to read this PR for more context https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888#c27 -- Marc Glisse
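The problematic pattern, as a sketch of a simplified libc-style implementation (overflow checking on n * sz omitted):

  #include <stdlib.h>
  #include <string.h>

  void *
  calloc (size_t n, size_t sz)
  {
    void *p = malloc (n * sz);
    if (p)
      /* gcc recognizes malloc-then-memset and may fold it back into a
         call to calloc -- i.e. into this very function: recursion.  */
      memset (p, 0, n * sz);
    return p;
  }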
Re: complex support when using -std=c++11
On Thu, 12 Nov 2015, D Haley wrote: I am currently trying to understand an issue to do with complex number support in gcc. Consider the following code:

  #include <complex.h>
  int main()
  {
    float _Complex a = _Complex_I;
  }

Attempting to compile this with these commands is fine:

  $ g++ tmp.cpp -std=gnu++11
  $ g++ tmp.cpp

Clang is also fine:

  $ clang tmp.cpp -std=c++11

Not here, I am getting the same error with clang (or "use of undeclared identifier '_Complex_I'" with libc++). This probably depends more on your libc. Attempting to compile with c++11 is not:

  $ g++ tmp.cpp -std=c++11
  In file included from /usr/include/c++/5/complex.h:36:0,
                   from tmp.cpp:2:
  tmp.cpp: In function ‘int main()’:
  tmp.cpp:5:29: error: unable to find numeric literal operator ‘operator""iF’
     float _Complex a = _Complex_I;
                                 ^
  tmp.cpp:5:29: note: use -std=gnu++11 or -fext-numeric-literals to enable more built-in suffixes

I'm using debian testing's gcc:

  $ gcc --version
  gcc (Debian 5.2.1-17) 5.2.1 20150911 ...

I discussed this on #gcc, and it was suggested (or I misunderstood) that this is intentional, and the library should not support c-type C++ primitives - however I can find no deprecation notice for this, nor does it appear that the c++11 standard (as far as I can see from a quick skim) has changed the behaviour in this regard. Is this intended behaviour, or is this a bug? This behaviour was noticed when troubleshooting compilation behaviours in mathgl. https://groups.google.com/forum/?_escaped_fragment_=topic/mathgl/cl4uYygPmOU#!topic/mathgl/cl4uYygPmOU C++11, for some unknown reason, decided to hijack the C header <complex.h> and make it equivalent to the C++ header <complex>. The fact that you are still getting _Complex_I defined is already a gcc extension, as is providing _Complex in C++. The C++ standard introduced User Defined Literals, which prevents the compiler from recognizing extra suffixes like iF in standard mode (why are so many people using c++11 and not gnu++11?). Our support for complex.h in C++11 in gcc is kind of best-effort. In this case, I can think of a couple ways we could improve this * _Complex_I is defined as (__extension__ 1.0iF). Maybe __extension__ could imply -fext-numeric-literals? * glibc could define _Complex_I some other way, or libstdc++ could redefine it to some other safer form (for some reason __builtin_complex is currently C-only). -- Marc Glisse
Re: GCC 5.4 Status report (2015-12-04)
On Fri, 4 Dec 2015, NightStrike wrote: Will there be another 4.9 release, too? I'm really hoping that branch can stay open a bit, since I can't upgrade to the new std::string implementation yet. Uh? The new ABI in libstdc++ is supposed to be optional, you can still use the old std::string in gcc-5, can't you? -- Marc Glisse
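For reference, the gcc-5 libstdc++ dual ABI is selected per translation unit with a macro, so the old std::string stays available; a minimal sketch:

  // Request the old (pre-C++11) ABI for this translation unit,
  // or pass -D_GLIBCXX_USE_CXX11_ABI=0 on the command line.
  #define _GLIBCXX_USE_CXX11_ABI 0
  #include <string>

  // s uses the old copy-on-write std::string layout even with gcc-5.
  std::string s = "old ABI";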
RE: GCC Front-End Questions
On Tue, 8 Dec 2015, Jodi A. Miller wrote: One algebraic simplification we are seeing is particularly interesting. Given the following code snippet intended to check for buffer overflow, which is actually undefined behavior in C++, we expected to maybe see the if check optimized away entirely.

  char buffer[100];
  int length;  // value received through argument or command line
  ...
  if (buffer + length < buffer) {
      cout << "Overflow" << endl;
  }

Instead, our assembly code showed that the conditional was changed to length < 0, which is not what was intended at all. Again, this showed up in the first IR file generated with g++, so we are thinking it happened in the compiler front-end, which is surprising. Any thoughts on this? In addition, when the above conditional expression is not used as part of an if check (e.g., assigned to a Boolean), it is not simplified. Those optimizations during parsing exist mostly for historical reasons, and we are slowly moving away from them. You can look for any function call including "fold" in its name in the front-end. They work on expressions and mostly consist of matching patterns (described in fold-const.c and match.pd), like p + n < p in this case. -- Marc Glisse
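One well-defined way to write the intended check is to compare the index instead of forming an out-of-bounds pointer; a sketch (assuming length counts bytes):

  #include <cstddef>
  #include <iostream>

  char buffer[100];

  void check (int length)
  {
    /* No pointer arithmetic, hence no undefined behavior to fold away.  */
    if (length < 0 || static_cast<std::size_t> (length) > sizeof buffer)
      std::cout << "Overflow" << std::endl;
  }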
Re: Strange C++ function pointer test
On Thu, 31 Dec 2015, Dominik Vogt wrote: This snippet is from the Plumhall 2014 xvs test suite:

  #if CXX03 || CXX11 || CXX14
  static float (*p1_)(float) = abs;
  ...
  checkthat(__LINE__, p1_ != 0);
  #endif

(With the testsuite specific macros doing the obvious). abs() is declared as: int abs(int j) Am I missing some odd C++ feature or is that part of the test just plain wrong? I don't know where to look in the C++ standard; is this supposed to compile (with or without a warning?) or generate an error or is it just undefined?

  error: invalid conversion from 'int (*)(int) throw ()' to 'float (*)(float)' [-fpermissive]

(Of course even with -fpermissive this won't work because (at least on my platform) ints are passed in different registers than floats.) There are other overloads of 'abs' declared in math.h / cmath (only in namespace std in the second case, and there are bugs (or standard issues) about having them in the global namespace for the first one). -- Marc Glisse
Re: Strange C++ function pointer test
On Thu, 31 Dec 2015, Jonathan Wakely wrote: There are other overloads of 'abs' declared in math.h / cmath (only in namespace std in the second case, and there are bugs (or standard issues) about having them in the global namespace for the first one). That's not quite accurate, C++11 was altered slightly to reflect reality. <cmath> is required to declare std::abs and it's unspecified whether it also declares it as ::abs. <math.h> is required to declare ::abs and it's unspecified whether it also declares it as std::abs.

  $ cat a.cc
  #include <math.h>
  int main(){
    abs(3.5);
  }
  $ g++-snapshot a.cc -c -Wall -W
  a.cc: In function 'int main()':
  a.cc:3:10: error: 'abs' was not declared in this scope
     abs(3.5);
            ^

That's what I called "bug" in my message (there are a few bugzilla PRs for this). It would probably work on Solaris. And I seem to remember there are at least 2 open LWG issues on the topic, one saying that the C++11 change didn't go far enough to match reality, since it still documents C headers differently from the C standard, and one saying that all overloads of abs should be declared as soon as one is (yes, they contradict each other). -- Marc Glisse
Re: Strange C++ function pointer test
On Thu, 31 Dec 2015, Dominik Vogt wrote: The minimal failing program is

  -- abs.C --
  #include <stdlib.h>
  static float (*p1_)(float) = abs;
  -- abs.C --

This is allowed to fail. If you include math.h (in addition to or instead of stdlib.h), it has to work (gcc bug if it doesn't). See also http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#2294 -- Marc Glisse
Re: getting bugzilla access for my account
On Sat, 2 Jan 2016, Mike Frysinger wrote: seeing as how i have commit access to the gcc tree, could i have my bugzilla privs extended as well ? atm i only have normal ones which means i only get to edit my own bugs ... can't dupe/update other ones people have filed. couldn't seem to find docs for how to request this, so spamming this list. my account on gcc.gnu.org/bugzilla is "vap...@gentoo.org". Permissions are automatic for @gcc.gnu.org addresses, you should create a new account with that one (you can make it follow the old account, etc). -- Marc Glisse
Re: RFC: Update Intel386, x86-64 and IA MCU psABIs for passing/returning empty struct
On Sat, 20 Feb 2016, H.J. Lu wrote: On Fri, Feb 19, 2016 at 1:07 PM, Richard Smith wrote: On Fri, Feb 19, 2016 at 5:35 AM, Michael Matz wrote: Hi, On Thu, 18 Feb 2016, Richard Smith wrote: An empty type is a type where it and all of its subobjects (recursively) are of class, structure, union, or array type. No memory slot nor register should be used to pass or return an object of empty type. The trivially copyable is gone again. Why is it not necessary? The C++ ABI doesn't defer to the C psABI for types that aren't trivially-copyable. See http://mentorembedded.github.io/cxx-abi/abi.html#normal-call Hmm, yes, but we don't want to define something for only C and C++, but language independend (so far as possible). And given only the above language I think this type: struct S { S() {something();} }; would be an empty type, and that's not what we want. Yes it is. Did you mean to give S a copy constructor, copy assignment operator, or destructor instead? "Trivially copyable" is a reasonably common abstraction (if in doubt we could even define it in the ABI), and captures the idea that we need well (namely that a bit-copy is enough). In this case: struct dummy0 { }; struct dummy { dummy0 d[20]; dummy0 * foo (int i); }; dummy0 * dummy::foo (int i) { return &d[i]; } dummy0 * bar (dummy d, int i) { return d.foo (i); } dummy shouldn't be passed as empty type. Why not? We need to have a clear definition for what kinds of member functions are allowed in an empty type. -- Marc Glisse
Re: Subtyping support in GCC?
On Wed, 23 Mar 2016, Jason Chagas wrote: The ARM compiler (armcc) provides a subtyping ($Sub/$Super) mechanism useful as a patching technique (see links below for details). Can someone tell me if GCC has similar support? If so, where can I learn more about it? FYI, before posting this question here, I researched the web extensively on this topic. There seems to be some GNU support for subtyping in C++, but I had no luck finding any information specifically for C. Thanks, Jason How to use $Super$$ and $Sub$$ for patching data?: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15416.html Using $Super$$ and $Sub$$ to patch symbol definitions: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0474c/Chdefdce.html (the best list would have been gcc-h...@gcc.gnu.org) GNU ld has an option --wrap=symbol. Does that roughly match your need? -- Marc Glisse
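A sketch of the --wrap idiom, here wrapping malloc (GNU ld resolves calls to malloc to __wrap_malloc, and __real_malloc to the original definition):

  #include <stddef.h>

  extern "C" void *__real_malloc (size_t);   /* bound by ld to the real malloc */

  extern "C" void *
  __wrap_malloc (size_t n)
  {
    /* patched behaviour goes here, then forward to the original */
    return __real_malloc (n);
  }

  /* link with: g++ main.o patch.o -Wl,--wrap=malloc */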
Re: Constexpr in intrinsics?
On Sun, 27 Mar 2016, Allan Sandfeld Jensen wrote: Would it be possible to add constexpr to the intrinsics headers? For instance the _mm_set_XX and _mm_setzero intrinsics. Already suggested here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65197 A patch would be welcome (I started doing it at some point, I don't remember if it was functional, the patch is attached). Ideally it could also be added to all intrinsics that can be evaluated at compile time, but it is harder to tell which those are. Does gcc have a C extension we can use to set constexpr? What for? -- Marc Glisse

[Attachment: a work-in-progress patch against gcc/config/i386/avx2intrin.h, prefixing a number of AVX2 intrinsics (_mm256_add_epi8/16/32/64, _mm256_and_si256, the _mm256_cmpeq_* and _mm256_cmpgt_* families, ...) with a __GCC_X86_CONSTEXPR11 macro; the patch is truncated in the archive.]
Re: Constexpr in intrinsics?
On Mon, 28 Mar 2016, Allan Sandfeld Jensen wrote: On Sunday 27 March 2016, Marc Glisse wrote: On Sun, 27 Mar 2016, Allan Sandfeld Jensen wrote: Would it be possible to add constexpr to the intrinsics headers? For instance the _mm_set_XX and _mm_setzero intrinsics. Already suggested here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65197 A patch would be welcome (I started doing it at some point, I don't remember if it was functional, the patch is attached). That looks very similar to the patch I experimented with, and that at least works for using them in C++11 constexpr functions. Ideally it could also be added to all intrinsics that can be evaluated at compile time, but it is harder to tell which those are. Does gcc have a C extension we can use to set constexpr? What for? To have similar functionality in C. For instance to explicitly allow those functions to be evaluated at compile time, and values with similar attributes be optimized completely out. Those intrinsics that are implemented without builtins can already be evaluated at compile time.

  #include <emmintrin.h>
  __m128d f ()
  {
    __m128d a = _mm_set_pd (1, 2);
    __m128d b = _mm_setr_pd (4, 3);
    return _mm_add_pd (a, b);
  }

The generated asm is just

  movapd .LC0(%rip), %xmm0
  ret

For the more esoteric intrinsics, what is missing is not in the parser, it is a folder that understands the behavior of each particular intrinsic. And of course avoid using preprocessor noise, in shared C/C++ headers like these are. -- Marc Glisse
Re: Updating the GCC 6 release notes
On Tue, 3 May 2016, Damian Rouson wrote: Could someone please tell me how to edit or submit edits for the GCC 6 release notes at https://gcc.gnu.org/gcc-6/changes.html? Specifically, the listed Fortran improvements are missing several significant items. I signed the copyright assignment in case that helps. https://gcc.gnu.org/about.html#cvs You can send a diff to gcc-patc...@gcc.gnu.org to propose a patch (possibly Cc: the fortran mailing-list if your patch is related), same as code changes. -- Marc Glisse
Re: Implicit conversion to a generic vector type
On Thu, 26 May 2016, martin krastev wrote: Hello, I've been scratching my head over an implicit conversion issue, depicted in the following code:

  typedef __attribute__ ((vector_size(4 * sizeof(int)))) int generic_int32x4;

  struct Foo {
      Foo() { }
      Foo(const generic_int32x4& src) { }
      operator generic_int32x4() const { return (generic_int32x4){ 42 }; }
  };

  struct Bar {
      Bar() { }
      Bar(const int src) { }
      operator int() const { return 42; }
  };

  int main(int, char**) {
      const Bar b = Bar() + Bar();
      const generic_int32x4 v = (generic_int32x4){ 42 } + (generic_int32x4){ 42 };
      const Foo e = generic_int32x4(Foo()) + generic_int32x4(Foo());
      const Foo f = Foo() + Foo();
      const Foo g = (generic_int32x4){ 42 } + Foo();
      const Foo h = Foo() + (generic_int32x4){ 42 };
      return 0;
  }

In the above, the initialization expression for local 'b' compiles as expected, and so do the expressions for locals 'v' and 'e'. The initializations of locals 'f', 'g' and 'h', though, fail to compile (under g++-6.1.1, likewise under 5.x and 4.x) with:

  $ g++-6 xxx.cpp
  xxx.cpp: In function ‘int main(int, char**)’:
  xxx.cpp:28:22: error: no match for ‘operator+’ (operand types are ‘Foo’ and ‘Foo’)
     const Foo f = Foo() + Foo();
                   ~~^~~
  xxx.cpp:29:40: error: no match for ‘operator+’ (operand types are ‘generic_int32x4 {aka __vector(4) int}’ and ‘Foo’)
     const Foo g = (generic_int32x4){ 42 } + Foo();
                   ~~~^~~
  xxx.cpp:30:22: error: no match for ‘operator+’ (operand types are ‘Foo’ and ‘generic_int32x4 {aka __vector(4) int}’)
     const Foo h = Foo() + (generic_int32x4){ 42 };
                   ~~^

Apparently there is some implicit conversion rule that stops g++ from doing the expected implicit conversions, but I can't figure out which rule that is. The fact that clang handles the code without an issue does not help either. Any help will be appreciated. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57572 -- Marc Glisse
Re: Implicit conversion to a generic vector type
On Thu, 26 May 2016, martin krastev wrote: Thank you for the reply. So it's a known g++ issue with a candidate patch. Looking at the patch, I was wondering, what precludes the generic vector types form being proper arithmetic types? In some cases vectors act like arithmetic types (operator+, etc), and in others they don't (conversions in general). We have scalarish_type_p for things that are scalars or vectors, we could add arithmeticish_type_p ;-) (I think the name arithmetic comes directly from the standard, so we don't want to change its meaning) -- Marc Glisse
Re: Implicit conversion to a generic vector type
On Fri, 27 May 2016, martin krastev wrote: A new arithmeticish type would take more effort, I understand. Marc, are there plans to incorporate your patch, perhaps in an extended form, in a release any time soon? There is no plan either way. When someone is motivated enough (I am not, currently), they will submit a patch to gcc-patc...@gcc.gnu.org, which will be reviewed. Note that a patch needs to include testcases (see the files in gcc/testsuite/g++.dg for examples). If you are interested, you could give it a try... -- Marc Glisse
Re: An issue with GCC 6.1.0's make install?
On Sat, 4 Jun 2016, Ethin Probst wrote: Yesterday I managed to successfully build GCC and all of the accompanying languages that it supports by default (Ada, C, C++, Fortran, Go, Java, Objective-C, Objective-C++, and Link-time Optimization (LTO)). I did not build JIT support because I have not heard whether it is stable or not. Anyways, seeing as I didn't (and still do not) want to wait another 12 hours for that to build, I compressed it into a .tar.bz2 archive, Did you use "make -j 8" (where 8 is roughly how many CPUs you have in your server)? 12 hours seems excessive. copied it over to another server, decompressed it, and here's when the Did you copy it to exactly the same path as on the original server, preserving time stamps, and do both servers have identical systems? problems start. Keep in mind that I did ensure that all files were compressed and extracted. When I go into my build subdirectory build tree, and type "make install -s", it installs gnat, gcc (and g++), gfortran, gccgo, and gcj, but it errors out (and, subsequently, bails out) and says the following:

  Making install in tools
  make[3]: *** [install-recursive] Error 1
  make[2]: *** [install-recursive] Error 1
  make[1]: *** [install-target-libjava] Error 2
  make: *** [install] Error 2

And then:

  $ gcj
  gcj: error: libgcj.spec: No such file or directory

A more common approach would be to run "make install DESTDIR=/some/where", tar that directory, copy this archive to other servers, and untar it in the right location. That's roughly what linux distributions do. I'm considering the test suite, but until it installs, I'm not sure if executing the test suite would be very wise at this point. To get it to say that no input file was specified, I have to manually run the following commands:

  $ cd x86_64-pc-linux-gnu/libjava
  $ cp libgcj.spec /usr/bin

That seems like a strange location for this file. Has the transportation of the source code caused the build tree to be messed up? I know that it works perfectly fine on my other server. Running make install without the -s command line parameter yields nothing. Have I done something wrong? "nothing" is not very helpful... Surely it gave some error message. -- Marc Glisse
Re: [RFC][Draft patch] Introduce IntegerSanitizer in GCC.
On Mon, 4 Jul 2016, Maxim Ostapenko wrote: Is community interested in such a tool? On the one hand, it is clearly useful since you found bugs thanks to it. On the other hand: 1) I hope we never reach the situation caused by Microsoft's infamous warning C4146 (which is even an error if you enable "secure" mode), where projects writing perfectly legal bignum code keep getting misguided reports by users who see those warnings. 2) This kind of encourages people to keep using unsigned types for non-negative integers, whereas they would be better reserved to bignum and bitfields (sadly, the standards make it hard to avoid unsigned types...). -- Marc Glisse
Vector unaligned load/store x86 intrinsics
Hello, I was considering changing the implementation of _mm_loadu_pd in x86's emmintrin.h to avoid a builtin. Here are 3 versions:

  typedef double __m128d __attribute__ ((__vector_size__ (16), __may_alias__));
  typedef double __m128d_u __attribute__ ((__vector_size__ (16), __may_alias__, aligned(1)));

  __m128d f (double const *__P) { return __builtin_ia32_loadupd (__P); }
  __m128d g (double const *__P) { return *(__m128d_u*)(__P); }
  __m128d h (double const *__P) { __m128d __r; __builtin_memcpy (&__r, __P, 16); return __r; }

f is what we have currently. f and g generate the same code. h also generates the same code except at -O0 where it is slightly longer. (note that I haven't regtested either version yet) 1) I don't have any strong preference between g and h, is there a reason to pick one over the other? I may have a slight preference for g, which expands to

  __m128d _3;
  _3 = MEM[(__m128d_u * {ref-all})__P_2(D)];

while h yields

  __int128 unsigned _3;
  _3 = MEM[(char * {ref-all})__P_2(D)];
  _4 = VIEW_CONVERT_EXPR<__m128d>(_3);

2) Reading Intel's doc for movupd, it says: "If alignment checking is enabled (CR0.AM = 1, RFLAGS.AC = 1, and CPL = 3), an alignment-check exception (#AC) may or may not be generated (depending on processor implementation) when the operand is not aligned on an 8-byte boundary." Since we generate movupd for memcpy even when the alignment is presumably only 1 byte, I assume that this alignment-check stuff is not supported by gcc? -- Marc Glisse
Re: Vector unaligned load/store x86 intrinsics
On Fri, 26 Aug 2016, Richard Biener wrote: On Thu, Aug 25, 2016 at 9:40 PM, Marc Glisse wrote: Hello, I was considering changing the implementation of _mm_loadu_pd in x86's emmintrin.h to avoid a builtin. Here are 3 versions: typedef double __m128d __attribute__ ((__vector_size__ (16), __may_alias__)); typedef double __m128d_u __attribute__ ((__vector_size__ (16), __may_alias__, aligned(1))); __m128d f (double const *__P) { return __builtin_ia32_loadupd (__P); } __m128d g (double const *__P) { return *(__m128d_u*)(__P); } __m128d h (double const *__P) { __m128d __r; __builtin_memcpy (&__r, __P, 16); return __r; } f is what we have currently. f and g generate the same code. h also generates the same code except at -O0 where it is slightly longer. (note that I haven't regtested either version yet) 1) I don't have any strong preference between g and h, is there a reason to pick one over the other? I may have a slight preference for g, which expands to __m128d _3; _3 = MEM[(__m128d_u * {ref-all})__P_2(D)]; while h yields __int128 unsigned _3; _3 = MEM[(char * {ref-all})__P_2(D)]; _4 = VIEW_CONVERT_EXPR(_3); I prefer 'g' which is just more natural. Ok, thanks. Note that the C language requires that __P be aligned to alignof (double) (not sure what the Intel intrinsic specs say here), and thus it doesn't allow arbitrary misalignment. This means that you could use a slightly better aligned type with aligned(alignof(double)). I had thought about it, but since we already generate movupd with aligned(1), it didn't really seem worth the trouble for this prototype. Or to be conforming the parameter should not be double const * but a double type variant with alignment 1 ... Yeah, those intrinsics have issues: __m128i _mm_loadu_si128 (__m128i const* mem_addr) "mem_addr does not need to be aligned on any particular boundary." that doesn't really make sense. I may try to experiment with your suggestion, see if it breaks anything. Gcc seems happy to ignore those alignment differences when casting function pointers, so it should be fine. 2) Reading Intel's doc for movupd, it says: "If alignment checking is enabled (CR0.AM = 1, RFLAGS.AC = 1, and CPL = 3), an alignment-check exception (#AC) may or may not be generated (depending on processor implementation) when the operand is not aligned on an 8-byte boundary." Since we generate movupd for memcpy even when the alignment is presumably only 1 byte, I assume that this alignment-check stuff is not supported by gcc? Huh, never heard of this. Does this mean that mov_u_XX do alignment-check exceptions? I believe this would break almost all code (glibc memcpy, GCC generated code, etc). Thus it would require kernel support, emulating the unaligned ops to still work (but record them somehow). Elsewhere ( https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_loadu_pd&expand=3106,3115,3106,3124,3106&techs=SSE2 ) Intel doesn't mention this at all, it just says: "mem_addr does not need to be aligned on any particular boundary." So it might be a provision in the spec that was added just in case, but never implemented... -- Marc Glisse
Re: Is this FE bug or am I missing something?
On Sun, 11 Sep 2016, Igor Shevlyakov wrote: Small sample below fails (at least on 6.1) for multiple targets. The difference between two functions start at the very first tree pass... You are missing -fsanitize=undefined (and #include ). Please use the mailing list gcc-h...@gcc.gnu.org next time. -- Marc Glisse
Re: Is this FE bug or am I missing something?
On Mon, 12 Sep 2016, Igor Shevlyakov wrote: Well, my concern is not what happens with overflow (which in the second case -fsanitize=undefined will address), but rather the consistency of those 2 cases. p[x+1] generates RTL which leads to better generated code at the expense of leading to overflow, while p[1+x] never overflows but leads to worse code. It would be beneficial to make the behaviour consistent between those 2 cases. True. Your example with undefined behavior confused me as to what your point was. For

  int* f1(int* p, int x) { return &p[x + 1]; }
  int* f2(int* p, int x) { return &p[1 + x]; }

we get in the gimple dump

  _1 = (sizetype) x;
  _2 = _1 + 1;

vs

  _1 = x + 1;
  _2 = (long unsigned int) _1;

The second one is a better starting point (it has more information about potential overflow), but the first one has the advantage that all numbers have the same size, which saves an instruction in the end

  movslq %esi, %rsi
  leaq   4(%rdi,%rsi,4), %rax

vs

  addl   $1, %esi
  movslq %esi, %rsi
  leaq   (%rdi,%rsi,4), %rax

We regularly discuss the potential benefits of a pass that would try to uniformize integer sizes... In the mean time, I agree that gimplifying x+1 and 1+x differently makes little sense, you could file a PR about that. -- Marc Glisse
Re: how to check if target supports andnot instruction ?
On Wed, 12 Oct 2016, Prathamesh Kulkarni wrote: I was having a look at PR71636 and added the following pattern to match.pd: x & ((1U << b) - 1) -> x & ~(~0U << b) However the transform is useful only if the target supports "andnot" instruction. rth was selling the transformation as a canonicalization, which is beneficial when there is an andnot instruction, and neutral otherwise, so it could be done always. As pointed out by Marc in PR for -march=core2, lhs generates worse code than rhs, so we shouldn't do the transform if target doesn't support andnot insn. (perhaps we could do the reverse transform for target not supporting andnot?) Rereading my comment in the PR, I pointed out that instead of being neutral, the transformation was very slightly detrimental in one case (one extra mov) because of a RA issue. That doesn't mean we should avoid the transformation, just that we should fix the RA issue (by the way, if you have time to file a separate PR for the RA issue, that would be great, otherwise I'll try to do it at some point...). However it seems andnot isn't a standard pattern name, so am not sure how to check if target supports andnot insn ? -- Marc Glisse
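The two forms of the mask in C, for reference; on a target with an andnot instruction (e.g. x86 with BMI) the second form maps to shift + andn, while e.g. core2 lacks it:

  unsigned
  mask_lo (unsigned x, unsigned b)
  {
    return x & ((1u << b) - 1);   /* mask built by set-bit then subtract */
  }

  unsigned
  mask_lo2 (unsigned x, unsigned b)
  {
    return x & ~(~0u << b);       /* canonical form: shift then andnot */
  }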
Re: how to check if target supports andnot instruction ?
On Thu, 13 Oct 2016, Prathamesh Kulkarni wrote: On 12 October 2016 at 14:43, Richard Biener wrote: On Wed, 12 Oct 2016, Marc Glisse wrote: On Wed, 12 Oct 2016, Prathamesh Kulkarni wrote: I was having a look at PR71636 and added the following pattern to match.pd: x & ((1U << b) - 1) -> x & ~(~0U << b) However the transform is useful only if the target supports an "andnot" instruction. rth was selling the transformation as a canonicalization, which is beneficial when there is an andnot instruction, and neutral otherwise, so it could be done always. Well, it's three instructions to three instructions and a more expensive constant(?). ~0U might not be available as immediate for the shift instruction and 1U << b might be available as a bit-set instruction ... (vs. the andnot). True, I hadn't thought of bit-set. So yes, we might decide to canonicalize to andnot (and decide that three binary to two binary and one unary op is "better"). So no excuse to explore the target specific .pd fragment idea ... :/ Hi, I have attached a patch that adds the transform. Does that look OK ? Why bit_not of build_zero_cst instead of build_all_ones_cst, as suggested in the PR? If we only do the transformation when (1U << b) - 1 only feeds the bit_and, then we probably want to require that it has a single use (maybe even the shift). I am not sure how to write test-cases for it though. For the test-case: unsigned f(unsigned x, unsigned b) { unsigned t1 = 1U << b; unsigned t2 = t1 - 1; unsigned t3 = x & t2; return t3; } forwprop dump shows: Applying pattern match.pd:523, gimple-match.c:47419 gimple_simplified to _6 = 4294967295 << b_1(D); _8 = ~_6; t3_5 = x_4(D) & _8; I could scan for "_6 = 4294967295 << b_1(D);" however I suppose ~0 would depend on width of int and not always be 4294967295 ? Or should I scan for "_6 = 4294967295 << b_1(D);" and add /* { dg-require-effective-target int32 } */ to the test-case ? You could check that you have ~, or that you don't have " 1 << ". -- Marc Glisse
Re: GCC 6.2.0 : What does the undocumented -r option ?
On Mon, 7 Nov 2016, Emmanuel Charpentier wrote: The Sage project (http://www.sagemath.org) has recently hit an interesting snag : its developers using Debian testing began to encounter difficulties compiling the flint package (http://groups.google.co.uk/group/flint-devel) with gcc 6.2.0. One of us found (see https://groups.google.com/d/msg/sage-devel/TduebNoZuBE/sEULolL0BQAJ) that this was bound to a conflict between the -pie option (now default) and an undocumented -r option. We would like to know what is this -r option, what it does and why it is undocumented. (the mailing list you are looking for is gcc-help@gcc.gnu.org) As can be seen in the first message of the conversation you link to "/usr/bin/ld: -r and -pie may not be used together" The option -r is passed to ld, so you have to look for it in ld's manual where it is clearly documented. (that hardening stuff is such a pain...) -- Marc Glisse
Re: Need some help with a possible bug
(should have been gcc-help@gcc.gnu.org, please send any follow-ups there) On Wed, 23 Apr 2014, George R Goffe wrote: I'm trying to build the latest gcc Do you really need gcj? If not, please disable java. and am getting a message from the process "collect2: error: ld returned 1 exit status" for this library /usr/lsd/Linux/lib/libgmp.so. Here's the full msg: "/usr/lsd/Linux/lib/libgmp.so: could not read symbols: File in wrong format" You are doing a multilib build (--disable-multilib if you don't want that), so it tries to build both 64-bit and 32-bit versions of libjavamath.so, both of which want to link to GMP. So you need both versions of GMP installed as well. I thought the configure script in classpath would detect your missing 32-bit GMP and disable use of GMP in that case, but apparently not... You may want to file a PR in bugzilla about that if there isn't one already. But you'll need to provide more info there: your configure command line, the file config.log in the 32-bit version of classpath, etc. -- Marc Glisse
Re: RTL representation of i386 shrdl instruction is incorrect?
On Thu, 5 Jun 2014, Niranjan Hasabnis wrote: Thanks for your reply. I looked into some of the details of how that particular RTL template is used. It seems to me that the particular RTL template is used only when shifting a 64-bit data type on a 32-bit machine. This is the underlying assumption encoded in the i386.c file, which generates that particular RTL only when the instruction mode is DImode. If that is the case, then it won't matter whether one uses an arithmetic shift or a logical shift to right-shift the lower 4 bytes of an 8-byte value. In other words, the mapping between the RTL template and shrdl is incorrect, but the underlying assumption in i386.c guards the bug. This is still a bug, please file a PR. The use of (match_dup 0) apparently prevents combine from matching the insn (that's just a guess from my notes in PR 55583, I don't have access to my gcc machine right now to check), but that doesn't mean we shouldn't fix things. -- Marc Glisse
Re: What is "fnspec function type attribute"?
On Fri, 6 Jun 2014, FX wrote: In fortran/trans-decl.c, we have a comment above the code building function decls, saying: The SPEC parameter specifies the function argument and return type specification according to the fnspec function type attribute. */ I was away from GCC development for some time, so this is news to me. The syntax is not immediately clear, and neither a Google search nor a grep of the trunk’s numerous .texi files reveals any information. I’m creating new decls, what I am to do with it? You can look at the 2 functions in gimple.c that use gimple_call_fnspec, and refer to tree-core.h for the meaning of EAF_*, etc. A string like "2x." means: '2': the first letter is about the return, here we are returning the second argument 'x': the first argument is ignored '.': not saying anything about the second argument. -- Marc Glisse
Re: Comparison of GCC-4.9 and LLVM-3.4 performance on SPECInt2000 for x86-64 and ARM
On Wed, 25 Jun 2014, Vladimir Makarov wrote: Maybe. But in this case LLVM did a right thing. The variable addressing was through a restrict pointer. Ah, gcc implements (on purpose?) a weak version of restrict, where it only considers that 2 restrict pointers don't alias, whereas all other compilers assume that restrict pointers don't alias other non-derived pointers (see several PRs in bugzilla). I believe Richard recently added code that would make implementing the strong version of restrict easier. Maybe that's what is missing here? -- Marc Glisse
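To make the weak/strong distinction concrete, a small sketch (the function is made up): under the strong reading, dst is restrict-qualified and *dst is written, so no access through the plain pointer src may overlap it, and src[0] can be hoisted out of the loop; the weak version only disambiguates two restrict pointers from each other, which does not help here.

void scale (double *__restrict dst, const double *src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[0] * i;  /* strong restrict allows loading src[0] once */
}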
Re: combination of read/write and earlyclobber constraint modifier
On Tue, 1 Jul 2014, Jeff Law wrote: On 07/01/14 13:27, Tom de Vries wrote: Vladimir, There are a few patterns which use both the read/write constraint modifier (+) and the earlyclobber constraint modifier (&): ... $ grep -c 'match_operand.*+.*&' gcc/config/*/* | grep -v :0 gcc/config/aarch64/aarch64-simd.md:1 gcc/config/arc/arc.md:1 gcc/config/arm/ldmstm.md:30 gcc/config/rs6000/spe.md:8 ... F.i., this one in gcc/config/aarch64/aarch64-simd.md: ... (define_insn "vec_pack_trunc_" [(set (match_operand: 0 "register_operand" "+&w") (vec_concat: (truncate: (match_operand:VQN 1 "register_operand" "w")) (truncate: (match_operand:VQN 2 "register_operand" "w"] ... The documentation ( https://gcc.gnu.org/onlinedocs/gccint/Modifiers.html#Modifiers ) states: ... '‘&’ does not obviate the need to write ‘=’. ... which seems to state that '&' implies '='. An earlyclobber operand is defined as 'modified before the instruction is finished using the input operands'. AFAIU that would indeed exclude the possibility that the earlyclobber operand is an input/output operand it self, but perhaps I misunderstand. So my question is: is the combination of '&' and '+' supported ? If so, what is the exact semantics ? If not, should we warn or give an error ? I don't think we can define any reasonable semantics for &+. My recommendation would be for this to be considered a hard error. Uh? The doc explicitly says "An input operand can be tied to an earlyclobber operand" and goes on to explain why that is useful. It avoids using the same register for other input when they are identical. -- Marc Glisse
Re: combination of read/write and earlyclobber constraint modifier
On Tue, 1 Jul 2014, Tom de Vries wrote: On 01-07-14 21:58, Marc Glisse wrote: So my question is: is the combination of '&' and '+' supported ? If so, what is the exact semantics ? If not, should we warn or give an error ? I don't think we can define any reasonable semantics for &+. My recommendation would be for this to be considered a hard error. Uh? The doc explicitly says "An input operand can be tied to an earlyclobber operand" and goes on to explain why that is useful. It avoids using the same register for other input when they are identical. Hi Marc, That part of the doc refers to the mulsi3 insn for ARM as example: ... ;; Use `&' and then `0' to prevent the operands 0 and 1 being the same (define_insn "*arm_mulsi3" [(set (match_operand:SI 0 "s_register_operand" "=&r,&r") (mult:SI (match_operand:SI 2 "s_register_operand" "r,r") (match_operand:SI 1 "s_register_operand" "%0,r")))] "TARGET_32BIT && !arm_arch6" "mul%?\\t%0, %2, %1" [(set_attr "type" "mul") (set_attr "predicable" "yes")] ) ... Note that there's no combination of & and + here. I think it could have used (match_dup 0) instead of operand 1, if there had been only the first alternative. And then the constraint would have been +&. AFAIU, the 'tie' established here is from input operand 1 to an earlyclobber output operand 0 using the '0' matching constraint. Having said that, I don't understand the comment, AFAIU it should be: 'Use '0' to make sure operands 0 and 1 are the same, and use '&' to make sure operands 0 and 2 are not the same. Well, yeah, the comment doesn't seem completely in sync with the code. In the first example you gave, looking at the pattern (no match_dup, setting the full register), it seems that it may have wanted "=&" instead of "+&". (by the way, in the same aarch64-simd.md file, I noticed some define_expand with constraints, that looks strange) -- Marc Glisse
Re: combination of read/write and earlyclobber constraint modifier
On Wed, 2 Jul 2014, Tom de Vries wrote: On 02-07-14 08:23, Marc Glisse wrote: I think it could have used (match_dup 0) instead of operand 1, if there had been only the first alternative. And then the constraint would have been +&. isn't that explicitly listed as unsupported here ( https://gcc.gnu.org/onlinedocs/gccint/RTL-Template.html#index-match_005fdup-3244 ): ... Note that match_dup should not be used to tell the compiler that a particular register is being used for two operands (example: add that adds one register to another; the second register is both an input operand and the output operand). Use a matching constraint (see Simple Constraints) for those. match_dup is for the cases where one operand is used in two places in the template, such as an instruction that computes both a quotient and a remainder, where the opcode takes two input operands but the RTL template has to refer to each of those twice; once for the quotient pattern and once for the remainder pattern. ... ? Well, looking for instance at x86_shrd... Ok, I didn't know it wasn't supported (though I did suggest using match_operand and "0" at some point). Still, the meaning of +&, in inline asm for instance, seems relatively clear, no? -- Marc Glisse
Re: combination of read/write and earlyclobber constraint modifier
On Wed, 2 Jul 2014, Tom de Vries wrote: On 02-07-14 09:02, Marc Glisse wrote: Still, the meaning of +&, in inline asm for instance, seems relatively clear, no? I can't find any testsuite examples using this construct. Furthermore, I'd expect the same semantics and restrictions for constraints in rtl templates and inline asm. So I'm not sure what you mean. Coming back to your original question: An earlyclobber operand is defined as 'modified before the instruction is finished using the input operands'. AFAIU that would indeed exclude the possibility that the earlyclobber operand is an input/output operand it self, but perhaps I misunderstand. So my question is: is the combination of '&' and '+' supported ? If so, what is the exact semantics ? If not, should we warn or give an error ? An earlyclobber operand X prevents *other* input operands from using the same register, but that does not include X itself (if it is using +) or operands explicitly using a matching constraint for X. At least that's how I understand it. -- Marc Glisse
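In inline asm terms, a sketch (the asm template is just an assembler comment, so the example stays harmless): with "+&r", x may keep its register even though the output is earlyclobber, because it is the same operand, while y is forced into a register different from x's.

int f (int x, int y)
{
  asm ("# pretend %0 is written early while %1 is still needed"
       : "+&r" (x) : "r" (y));
  return x;
}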
Re: GCC version bikeshedding
On Wed, 6 Aug 2014, Jakub Jelinek wrote: - libstdc++ ABI changes It seems unlikely to be in the next release, it is too late in the cycle. Chances to break the ABI don't come often, and rushing one at the end of stage1 would be wasting a good opportunity. -- Marc Glisse
Re: GCC version bikeshedding
On Wed, 6 Aug 2014, Richard Biener wrote: It's an ABI change for all modes (but not a SONAME change because the old and new definitions will both be present in the .so). Ugh. That's going to be a nightmare to support. Yes. And IMO a waste of effort compared to a clean .so.7 break, but well... Is there a configure switch to change the default ABI used? That is, on a legacy system can I upgrate to 5.0 and get code that interoperates fine with code built with 4.8? (including ABI boundaries using the affected classes? I suspect APIs with std::string passing are _very_ common, not sure about std::list) What's the failure mode the user will see when linking against a 4.8 compiled library with a std::string interface using 5.0? In good cases, a linker error about a missing symbol (different mangling). In less good cases, a warning at compile-time about using a class marked with abi_tag in a class not marked with it. In worse cases (passing through void* for instance), a runtime crash. And how do libraries with such an API avoid silently changing their ABI dependent on the compiler used to compile them? That is, I suppose those need to change their SONAME dependent on the compiler version used?! Yes, just like a move to .so.7 would entail. -- Marc Glisse
Re: GCC version bikeshedding
On Wed, 6 Aug 2014, Jakub Jelinek wrote: On Wed, Aug 06, 2014 at 12:31:57PM +0200, Richard Biener wrote: Ok, so the problematical case is struct X { std::string s; }; void foo (X&); Yeah. then. OTOH I remember that then mangling of X changes as well? Only if you add abi_tag attribute to X. Note that -Wabi-tag can tell you where it is needed. struct __attribute__((abi_tag("marc"))) X {}; struct Y { X x; }; a.cc:2:8: warning: 'Y' does not have the "marc" abi tag that 'X' (used in the type of 'Y::x') has [-Wabi-tag] struct Y { X x; }; ^ a.cc:2:14: note: 'Y::x' declared here struct Y { X x; }; ^ a.cc:1:41: note: 'X' declared here struct __attribute__((abi_tag("marc"))) X {}; ^ I hope the libstdc++ folks will add some macro which will include the right abi_tag attribute for the std::list/std::string cases, so you'd in the end just add #ifndef _GLIBCXX_ABI_TAG_SOMETHING #define _GLIBCXX_ABI_TAG_SOMETHING #endif ... struct X _GLIBCXX_ABI_TAG_SOMETHING { std::string s; }; void foo (X&); or similar. So we only need to patch every project out there... A clean .so.7 break would be significantly worse nightmare. We've been there many years ago, e.g. 3.2/3.3 vs. 3.4, there has been significantly fewer C++ plugins etc. in packages and it still it was unsolvable. With the abi_tag stuff, you have the option to make stuff interoperable when mixing compiler, either with no effort at all, or some limited effort. With .so.7, you have no option, nothing will be interoperable. I disagree that it is worse, but you have more experience, I guess we will see the results in a few years... -- Marc Glisse
Re: Where does GCC pick passes for different opt. levels
On Mon, 11 Aug 2014, Steve Ellcey wrote: I have a basic question about optimization selection in GCC. There used to be some code in GCC (passes.c?) that would set various optimize pass flags depending on if the 'optimize' flag was > 0, > 1, or > 2; later I think there may have been a table. There is still a table in opts.c, with entries that look like: { OPT_LEVELS_2_PLUS, OPT_ftree_vrp, NULL, 1 }, This code seems gone now and I can't figure out how GCC is selecting what optimization passes to run at what optimization levels (-O1 vs. -O2 vs. -O3). How is this handled in the top-of-tree GCC code? I see passes.def but there doesn't seem to be anything in there to tie specific passes to specific optimization levels. Likewise in common.opt I see flags for various optimization passes but nothing to tie them to -O1 or -O2, etc. I'm probably missing something obvious, but a pointer would be much appreciated. -- Marc Glisse
Re: Conditional negation elimination in tree-ssa-phiopt.c
On Mon, 11 Aug 2014, Kyrill Tkachov wrote: The aarch64 target has a conditional negation instruction CSNEG Rd, Rs1, Rs2, cond with semantics Rd = if cond then Rs1 else -Rs2. This, however, doesn't end up getting matched for code such as: int foo2 (unsigned a, unsigned b) { int r = 0; r = a & b; if (a & b) return -r; return r; } Note that in this particular case, we should just return -(a&b) like llvm does. -- Marc Glisse
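That is, when a & b is 0, negating it is a no-op, so the conditional is redundant and the whole function reduces to a branch-free form (foo3 is just an illustrative name):

int foo3 (unsigned a, unsigned b)
{
  int r = a & b;
  return -r;  /* same result as foo2 for every input */
}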
Re: gcc parallel make check
On Wed, 3 Sep 2014, VandeVondele Joost wrote: I've noticed that make -j -k check-fortran results in a serialized checking, while make -j32 -k check-fortran goes parallel. Somehow the explicit 'N' in -jN seems to be needed for the check target, while the other targets seem to do just fine. Is that a feature, or should I file a PR for that... ? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53155 -- Marc Glisse
Re: Fwd: Building gcc-4.9 on OpenBSD
On Wed, 17 Sep 2014, Ian Grant wrote: And is there any way to disable the Intel library? --disable-libcilkrts (same as the other libs) If it explicitly doesn't support your system, I am a bit surprised it isn't disabled automatically, that seems like a bug. Please don't call it "the Intel library", that doesn't mean anything. -- Marc Glisse
Re: Fwd: Building gcc-4.9 on OpenBSD
On Wed, 17 Sep 2014, Ian Grant wrote: On Wed, Sep 17, 2014 at 1:36 PM, Marc Glisse wrote: On Wed, 17 Sep 2014, Ian Grant wrote: And is there any way to disable the Intel library? --disable-libcilkrts (same as the other libs) If it explicitly doesn't support your system, I am a bit surprised it isn't disabled automatically, that seems like a bug. Not necessarily a bug, but it would have been good if the --help option had mentioned it. I looked, really. Perhaps I missed it though. So many options for disabling one thing or another https://gcc.gnu.org/install/configure.html lists a number of others but not this one, maybe it should be added. Please don't call it "the Intel library", that doesn't mean anything. Doesn't it? How did you know what 'it' was then? Or is that a stupid question? This identity concept is much slipperier than it seems at first, isn't it? You included error messages... How about my question about the size of the binaries? Is that 60+MB what other systems show? I still see <20M here, but I don't know if there are reasons for what you are seeing. Are you maybe using different options? (debug information, optimization, lto, etc) -- Marc Glisse
Re: How to identify the type of the object being created using the new operator?
On Mon, 6 Oct 2014, Swati Rathi wrote: Statement : A *a = new B; gets translated in GIMPLE as 1. void * D.2805; 2. struct A * a; 3. D.2805 = operator new (20); 4. a = D.2805; A is the base class and B is the derived class. In statement 3, new operator is creating an object of derived class B. By analyzing the RHS of the assignment statement 3, how can we identify the type (in this case B) of the object being created? I strongly doubt you can. It is calling B's constructor that will turn this memory region into a B, operator new is the same as malloc, it only returns raw memory. (If A and B don't have the same size, the argument 20 can be a hint) -- Marc Glisse
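In other words, the front end conceptually splits "new B" into a raw allocation plus a constructor call; a hand-written equivalent, sketched with placement new for the constructor step:

#include <new>

struct A { virtual ~A () {} };
struct B : A { int extra; };

A *make ()
{
  void *raw = operator new (sizeof (B));  /* raw memory, no dynamic type yet */
  B *b = new (raw) B;                     /* only now does it become a B */
  return b;
}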
Re: volatile access optimization (C++ / x86_64)
On Fri, 26 Dec 2014, Matt Godbolt wrote: I'm investigating ways to have single-threaded writers write to memory areas which are then (very infrequently) read from another thread for monitoring purposes. Things like "number of units of work done". I initially modeled this with relaxed atomic operations. This generates a "lock xadd" style instruction, as I can't convey that there are no other writers. As best I can tell, there's no memory order I can use to explain my usage characteristics. Giving up on the atomics, I tried volatiles. These are less than ideal as their power is less expressive, but in my instance I am not trying to fight the ISA's reordering; just prevent the compiler from eliding updates to my shared metrics. GCC's code generation uses a "load; add; store" for volatiles, instead of a single "add 1, [metric]". https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50677 -- Marc Glisse
Re: C++ Standard Question
On Thu, 22 Jan 2015, Joel Sherrill wrote: I think this is a glibc issue but since this method is defined in the C++ standards, I thought there were plenty of language lawyers here. :) s/glibc/libstdc++/ and they have their own ML. That's deprecated, isn't it? class strstreambuf : public basic_streambuf<char, char_traits<char> > <= ISSUE int pcount() const; <= ISSUE My reading of the C++03 and draft C++14 says that the int pcount() method in this class is not const. glibc has it const in the glibc shipped with Fedora 20 and CentOS 6. This is a simple test case: #include <strstream> int main() { int (std::strstreambuf::*dummy)() = &std::strstreambuf::pcount; /*-- pcount is conformant --*/ return 0; } What's the consensus? The exact signature of member functions is not mandated by the standard, implementations are allowed to make the function const if that works (or provide both a const and a non-const version). Your code is not guaranteed to work. Lambdas usually provide a fine workaround. -- Marc Glisse
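For instance, a sketch of the lambda workaround; it compiles whether or not pcount is const-qualified, because no exact signature is spelled out:

#include <strstream>

int main ()
{
  auto pcount = [] (std::strstreambuf &b) { return b.pcount (); };
  std::strstreambuf buf;
  return pcount (buf) != 0;
}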
Re: unfused fma question
On Mon, 23 Feb 2015, Jeff Law wrote: On 02/23/15 11:38, Joseph Myers wrote: (I wonder if convert_mult_to_fma is something that should move to match-and-simplify infrastructure.) Yea, it probably should. Currently, it happens in a pass that is quite late. If it moves to match-and-simplify, I am afraid it might inhibit some other optimizations (we can turn plus+mult to fma but not the reverse), unless we use some way to inhibit some patterns until a certain pass (possibly a simple "if", if that's not too costly). Such "time-restricted" patterns might be useful for other purposes: don't introduce complicated vector/complex operations after the corresponding lowering passes, do narrowing until a certain point but then prefer fast integer sizes, etc (I haven't thought about those particular examples, they are only an illustration). -- Marc Glisse
Re: A bug (?) with inline functions at O0: undefined reference
On Fri, 6 Mar 2015, Ilya Verbin wrote: I've discovered a strange behaviour on trunk gcc, here is the reproducer: inline int foo () { return 0; } int main () { return foo (); } $ gcc main.c /tmp/ccD1LeXo.o: In function `main': main.c:(.text+0xa): undefined reference to `foo' collect2: error: ld returned 1 exit status Is this a bug? If yes, is it known? GCC 4.8.3 works fine though. Not a bug, that's what inline means in C99 and later. -- Marc Glisse
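In C99 and later, a function declared only with "inline" provides an inline definition but no externally visible symbol, so a call that is not inlined (e.g. at -O0) has nothing to link against. A sketch of the simplest fix; the other classic option is to keep the plain inline definition in a header and put the declaration "extern int foo (void);" in exactly one .c file to force an external definition there:

/* static inline gives foo internal linkage, so any TU that ends up
   needing an out-of-line copy emits its own.  */
static inline int foo (void) { return 0; }

int main (void) { return foo (); }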
Re: Named parameters
On Mon, 16 Mar 2015, David Brown wrote: In a discussion on comp.lang.c, the subject of "named parameters" (or "designated parameters") has come up again. This is a feature that some of us feel would be very useful in C (and in C++). I think it would be possible to include it in the language without leading to any conflicts with existing code - it is therefore something that could be made as a gcc extension, with a hope of adding it to the standards for a later C standards revision. I wanted to ask opinions on the mailing list as to the feasibility of the idea - there is little point in my cluttering up bugzilla with an enhancement request if the gcc developers can spot obvious flaws in the idea. Filing a report in bugzilla would be quite useless: language extensions are now almost automatically rejected unless they come with a proposal that has already been favorably seen by the standardization committee. On the other hand, implementing the feature (in your own fork) is almost a requirement if you intend to propose this for standardization. And it should not be too hard. Basically, the idea is this: int foo(int a, int b, int c); void bar(void) { foo(1, 2, 3); // Normal call foo(.a = 1, .b = 2, .c = 3) // Same as foo(1, 2, 3) foo(.c = 3, .b = 2, .a = 1) // Same as foo(1, 2, 3) } struct foo_args { int a, b, c; }; void foo(struct foo_args); #define foo(...) foo((struct foo_args){__VA_ARGS__}) void g(){ foo(1,2,3); foo(.c=3,.b=2); } In C++ you could almost get away without the macro, calling f({1,2,3}), but f({.c=3}) currently gives "sorry, unimplemented". Maybe you would like to work on that? If only the first variant is allowed (with the named parameters in the order declared in the prototype), then this would not affect code generation at all - the designators could only be used for static error checking. If the second variant is allowed, then the parameters could be re-ordered. The aim of this is to make it easier and safer to call functions with a large number of parameters. The syntax is chosen to match that of designated initialisers - that should be clearer to the programmer, and hopefully also make implementation easier. If there is more than one declaration of the function, then the designators used should follow the most recent in-scope declaration. An error may be safer, you would at least want a warning. This feature could be particularly useful when combined with default arguments in C++, as it would allow the programmer to override later default arguments without specifying all earlier arguments. C++ is always more complicated (so many features can interact in strange ways), I suggest you start with C. At the moment, I am not asking for an implementation, or even /how/ it might be implemented (perhaps a MELT plugin?) - I would merely like opinions on whether it would be a useful and practical enhancement. This is not such a good list for that, comp.lang.c is better suited. This will be a good list if you have technical issues implementing the feature. -- Marc Glisse
Re: -Wno-c++11-extensions addition
On Wed, 25 Mar 2015, Jack Howarth wrote: On Wed, Mar 25, 2015 at 12:41 PM, Jonathan Wakely wrote: On 25 March 2015 at 16:16, Jack Howarth wrote: Does anyone remember which FSF gcc release first added the -Wno-c++11-extensions option for g++? I know it exists in 4.6.3 Are you sure? It doesn't exist for 4.6.4 or anything later. Are you thinking of -Wc++0x-compat ? On x86_64 Fedora 15... $ /usr/bin/g++ --version g++ (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2) Copyright (C) 2011 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ /usr/bin/g++ -Wno-c++11-extensions hello.cc $ So gcc 4.6.3 appears to at least tolerate that warning without claiming that it is unknown. https://gcc.gnu.org/wiki/FAQ#The_warning_.22unrecognized_command-line_option.22_is_not_given_for_-Wno-foo -- Marc Glisse
Re: [i386] Scalar DImode instructions on XMM registers
On Fri, 24 Apr 2015, Uros Bizjak wrote: Please try to generate paradoxical subreg (V2DImode subreg of V1DImode pseudo). IIRC, there is some functionality in the compiler that is able to tell if the highpart of the paradoxical register is zeroed. Those are not currently legal (I tried to change that) https://gcc.gnu.org/ml/gcc-patches/2013-03/msg00745.html https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00769.html In this case, a subreg:V2DI of DImode should work. -- Marc Glisse
Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?
On Fri, 12 Oct 2018, Thomas Schwinge wrote: Hmm, and without any OpenACC/OpenMP etc., actually the same problem is also present when running the following code through the vectorizer: for (int tmp = 0; tmp < N_J * N_I; ++tmp) { int j = tmp / N_I; int i = tmp % N_I; a[j][i] = 0; } ... whereas the following variant (obviously) does vectorize: int a[N_J * N_I]; for (int tmp = 0; tmp < N_J * N_I; ++tmp) a[tmp] = 0; I had a quick look at the difference, and a[j][i] remains in this form throughout optimization. If I write instead *((*(a+j))+i) = 0; I get j_10 = tmp_17 / 1025; i_11 = tmp_17 % 1025; _1 = (long unsigned int) j_10; _2 = _1 * 1025; _3 = (sizetype) i_11; _4 = _2 + _3; or for a power of 2 j_10 = tmp_17 >> 10; i_11 = tmp_17 & 1023; _1 = (long unsigned int) j_10; _2 = _1 * 1024; _3 = (sizetype) i_11; _4 = _2 + _3; and in both cases we fail to notice that _4 = (sizetype) tmp_17; (at least I think that's true). So there are missing match.pd transformations in addition to whatever scev/ivdep/other work is needed. -- Marc Glisse
Re: "match.pd" (was: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?)
(resent because of mail issues on my end) On Mon, 22 Oct 2018, Thomas Schwinge wrote: I had a quick look at the difference, and a[j][i] remains in this form throughout optimization. If I write instead *((*(a+j))+i) = 0; I get j_10 = tmp_17 / 1025; i_11 = tmp_17 % 1025; _1 = (long unsigned int) j_10; _2 = _1 * 1025; _3 = (sizetype) i_11; _4 = _2 + _3; or for a power of 2 j_10 = tmp_17 >> 10; i_11 = tmp_17 & 1023; _1 = (long unsigned int) j_10; _2 = _1 * 1024; _3 = (sizetype) i_11; _4 = _2 + _3; and in both cases we fail to notice that _4 = (sizetype) tmp_17; (at least I think that's true). So there are missing match.pd transformations in addition to whatever scev/ivdep/other work is needed. With a very simplistic "match.pd" rule (not yet any special cases checking etc.): diff --git gcc/match.pd gcc/match.pd index b36d7ccb5dc3..4c23116308da 100644 --- gcc/match.pd +++ gcc/match.pd @@ -5126,3 +5126,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) { wide_int_to_tree (sizetype, off); }) { swap_p ? @0 : @2; })) { rhs_tree; }) + +/* Given: + + j = in / N_I + i = in % N_I + + ..., fold: + + out = j * N_I + i + + ..., into: + + out = in +*/ + +/* As long as only considering N_I being INTEGER_CST (which are always second + argument?), probably don't need ":c" variants? */ + +(simplify + (plus:c + (mult:c + (trunc_div @0 INTEGER_CST@1) + INTEGER_CST@1) + (trunc_mod @0 INTEGER_CST@1)) + (convert @0)) You should only specify INTEGER_CST@1 on the first occurrence, the others can be just @1. (you may be interested in @@1 at some point, but that gets tricky) ..., the original code: int f1(int in) { int j = in / N_I; int i = in % N_I; int out = j * N_I + i; return out; } ... gets simplified from ("div-mod-0.c.027t.objsz1"): f1 (int in) { int out; int i; int j; int _1; int _6; : gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_return <_6> } ... to ("div-mod-0.c.028t.ccp1"): f1 (int in) { int out; int i; int j; int _1; : gimple_assign gimple_assign gimple_assign gimple_return } (The three dead "gimple_assign"s get eliminated later on.) So, that works. However, it doesn't work yet for the original construct that I'd run into, which looks like this: [...] int i; int j; [...] signed int .offset.5_2; [...] unsigned int .offset.7_23; unsigned int .iter.0_24; unsigned int _25; unsigned int _26; [...] unsigned int .iter.0_32; [...] : # gimple_phi <.offset.5_2, .offset.5_21(8), .offset.5_30(9)> gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign [...] Resolving the "a[j][i] = 123" we'll need to look into later. As Marc noted above, with that changed into "*(*(a + j) + i) = 123", we get: [...] int i; int j; long unsigned int _1; long unsigned int _2; sizetype _3; sizetype _4; sizetype _5; int * _6; [...] signed int .offset.5_8; [...] unsigned int .offset.7_29; unsigned int .iter.0_30; unsigned int _31; unsigned int _32; [...] : # gimple_phi <.offset.5_8, .offset.5_27(8), .offset.5_36(9)> gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign [...] Here, unless I'm confused, "_4" is supposed to be equal to ".iter.0_30", but "match.pd" doesn't agree yet. Note the many "nop_expr"s here, which I have not yet figured out how to handle, I suppose? I tried some things but couldn't get it to work.
Apparently the existing instances of "(match (nop_convert @0)" and "Basic strip-useless-type-conversions / strip_nops" rule also don't handle these; should they? Or, are in fact here the types mixed up too much? "(match (nop_convert @0)" defines a shortcut so some transformations can use nop_convert to detect some specific conversions, but it doesn't do anything by itself. "Basic strip-useless-type-conversions" strips conversions that are *useless*, essentially from a type to the same type. If you want to handle true conversions, you need to do that explicitly, see the many transformations that use convert? convert1? convert2? and specify for which particular conversions the transformation is valid. Finding out the right conditions to detect these conversions is often the most painful part of writing a match.pd transformation. I hope to get some time again soon to continue looking into this, but if anybody got any ideas, I'm all ears. -- Marc Glisse
Re: [RFC] -Weverything
On Tue, 22 Jan 2019, Thomas Koenig wrote: Hi, What would people think about a -Weverything option which turns on every warning there is? I think that could be quite useful in some circumstances, especially to find potential bugs with warnings that people, for some reason or other, found too noisy for -Wextra. The name could be something else, of course. In the best GNU tradition, -Wkitchen-sink could be another option :-) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31573 and duplicates already list quite a few arguments. Basically, it could be useful for debugging gcc or to discover warnings, but gcc devs fear that users will actually use it for real. -- Marc Glisse
Re: [RFC] -Weverything
On Wed, 23 Jan 2019, Jakub Jelinek wrote: We have that, gcc -Q --help=warning Of course, for warnings which do require arguments (numerical, or enumeration/string), one still needs to pick up his choices of those arguments; no idea what -Weverything would do here, while some warnings have different levels where a higher (or lower) level is a superset of another level, what numbers would you pick for e.g. warnings where the argument is bytes? For most of them, there is a value that maximizes the number of warnings, so the same superset argument applies. -Wframe-larger-than=0 so it shows the estimated frame size on every function, -Walloca-larger-than=0 so it is equivalent to -Walloca, etc. -- Marc Glisse
Re: On-Demand range technology [2/5] - Major Components : How it works
On Tue, 4 Jun 2019, Martin Sebor wrote: On 5/31/19 9:40 AM, Andrew MacLeod wrote: On 5/29/19 7:15 AM, Richard Biener wrote: On Tue, May 28, 2019 at 4:17 PM Andrew MacLeod wrote: On 5/27/19 9:02 AM, Richard Biener wrote: On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod wrote: The above suggests that iff this is done at all it is not in GORI because those are not conditional stmts or ranges from feeding those. The machinery doing the use-def walking from stmt context also cannot come along these so I have the suspicion that Ranger cannot handle telling us that for the stmt following above, for example if (_5 != 0) that _5 is not zero? Can you clarify? So there are 2 aspects to this. The range-ops code for DIV_EXPR, if asked for the range of op2 () would return ~[0,0] for _5. But you are also correct in that the walk backwards would not find this. This is similar functionality to how null_derefs are currently handled, and in fact could probably be done simultaneously using the same code base. I didn't bring null derefs up, but this is a good time :-) There is a separate class used by the gori-cache which tracks the non-nullness property at the block level. It has a single API: non_null_deref_p (name, bb) which determines whether there is a dereference in any BB for NAME, which indicates whether the range has an implicit ~[0,0] range in that basic block or not. So when we then have _1 = *_2; // after this _2 is non-NULL _3 = _1 + 1; // _3 is non-NULL _4 = *_3; ... when an on-demand user asks whether _3 is non-NULL at the point of _4 = *_3 we don't have this information? Since the per-BB caching will only say _1 is non-NULL after the BB. I'm also not sure whether _3 ever gets non-NULL during non-NULL processing of the block since walking immediate uses doesn't really help here? presumably _3 is globally non-null due to the definition being (pointer + x) ... ie, _3 has a global range of ~[0,0] ? No, _3 is ~[0, 0] because it is derived from _1 which is ~[0, 0] and you cannot arrive at NULL by pointer arithmetic from a non-NULL pointer. I'm confused. _1 was loaded from _2 (thus asserting _2 is non-NULL), but we have no idea what the range of _1 is, so how do you assert _1 is ~[0,0] ? The only way I see to determine _3 is non-NULL is through the _4 = *_3 statement. In the first two statements from the above (where _1 is a pointer): _1 = *_2; _3 = _1 + 1; _1 must be non-null because C/C++ define pointer addition only for non-null pointers, and therefore so must _3. (int*)0+0 is well-defined, so this uses the fact that 1 is non-null. This is all well done in extract_range_from_binary_expr already, although it seems to miss the (dangerous) optimization NULL + unknown == NULL. Just in case, a quote: "When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P. (4.1) — If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value. (4.2) — Otherwise, if P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n and the expression P - J points to the (possibly-hypothetical) element x[i − j] if 0 ≤ i − j ≤ n. (4.3) — Otherwise, the behavior is undefined" Or does the middle-end allow arithmetic on null pointers?
When people use -fno-delete-null-pointer-checks because their (embedded) platform has important stuff at address 0, they also want to be able to do arithmetic there. -- Marc Glisse
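A sketch of that use case; the register block at address 0 is hypothetical, and the code is only meaningful with -fno-delete-null-pointer-checks, since otherwise GCC may assume the accesses cannot happen:

#define REGS ((volatile unsigned int *) 0)

unsigned int read_reg (int i)
{
  return REGS[i];  /* arithmetic on, and dereference of, address 0 */
}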
Re: Testsuite not passing and problem with xgcc executable
On Sat, 8 Jun 2019, Jonathan Wakely wrote: You can see which tests failed by looking in the .log files in the testsuite directories, There are .sum files for a quick summary. or by running the contrib/test_summary script. There is also contrib/compare_tests, although running it globally has been failing for a long time now, and running it for individual .sum files fails for jit and libphobos. Other scripts in contrib/ may be relevant. -- Marc Glisse
Re: Disappeared flag: -maes on -march=ivybridge, present in -march=native
On Mon, 29 Jul 2019, Kevin Weidemann wrote: I have recently randomly discovered the fact, that building with `-march=ivybridge` does not necessarily produce the same output as `-march=native` on an Intel Core i7 3770K (Ivy Bridge). Nothing so surprising there. Not all Ivy Bridge processors are equivalent, and -march=ivybridge has to conservatively target those with less features. 71c8e4e2f720bc7155ba2da7c0ee9136a9ab3283 is the first bad commit commit 71c8e4e2f720bc7155ba2da7c0ee9136a9ab3283 Author: hjl Date: Fri Feb 22 12:49:21 2019 + x86: (Reapply) Move AESNI generation to Skylake and Goldmont This is a repeat of commit r263989, which commit r264052 accidentally reverted. 2019-02-22 Thiago Macieira PR target/89444 * config/i386/i386.h (PTA_WESTMERE): Remove PTA_AES. (PTA_SKYLAKE): Add PTA_AES. (PTA_GOLDMONT): Likewise. As you can see, this is very much on purpose. See https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01940.html for the explanation that came with the patch. -- Marc Glisse
Re: [ARM] LLVM's -arm-assume-misaligned-load-store equivalent in GCC?
On Tue, 7 Jan 2020, Christophe Lyon wrote: I've received a support request where GCC generates strd/ldrd which require aligned memory addresses, while the user code actually provides sub-aligned pointers. The sample code is derived from CMSIS: #define __SIMD32_TYPE int #define __SIMD32(addr) (*(__SIMD32_TYPE **) & (addr)) void foo(short *pDst, int in1, int in2) { *__SIMD32(pDst)++ = in1; *__SIMD32(pDst)++ = in2; } compiled with arm-none-eabi-gcc -mcpu=cortex-m7 CMSIS.c -S -O2 generates: foo: strd r1, r2, [r0] bx lr Using -mno-unaligned-access of course makes no change, since the code is lying to the compiler by casting short* to int*. If the issue is as well isolated as this, can't they just edit the code? typedef int __SIMD32_TYPE __attribute__((aligned(1))); gets str r1, [r0] @ unaligned str r2, [r0, #4] @ unaligned instead of strd r1, r2, [r0] -- Marc Glisse
Re: Deprecating arithmetic on std::atomic<void*>
On Thu, 20 Apr 2017, Florian Weimer wrote: On 04/19/2017 07:07 PM, Jonathan Wakely wrote: I know it's a bit late, but I'd like to propose deprecating the libstdc++ extension that allows arithmetic on std::atomic<void*>. Currently we make it behave like arithmetic on void*, which is also a GNU extension (https://gcc.gnu.org/onlinedocs/gcc/Pointer-Arith.html). We also allow arithmetic on types such as std::atomic<bool> which is probably not useful (PR 69769). Why is it acceptable to have the extension for built-in types, but not for library types wrapping them? Why be inconsistent about this? I thought the extension was there for legacy code, to avoid breaking old programs, and we could deprecate it eventually. At least the manual is missing an example of where this extension is actually useful. For atomic<void*>, I don't see why we should encourage people to write new code that violates the standard... -- Marc Glisse
Re: Support Library Requirements for GCC 7.1
On Tue, 2 May 2017, Joel Sherrill wrote: I am trying to update the gcc version for rtems to 7.1 and running into trouble finding the correct versions of mpc, mpfr, and gmp. We build those as part of building gcc so we have configuration control over the set. With gcc 6.3.0, we have this in our build recipe: %define mpfr_version 2.4.2 %define mpc_version 0.8.1 %define gmp_version 4.3.2 I tried that with gcc 7.1.0 but the build failed complaining mpfr was too old. Could you be more precise about how the build failed? AFAIK mpfr-2.4.2 is still supposed to work. -- Marc Glisse
Re: Bug in GCC 7.1?
(I think you are looking for gcc-help@gcc.gnu.org, or gcc's bugzilla, rather than this mailing list) On Fri, 5 May 2017, Helmut Zeisel wrote: The following program gives a warning under GCC 7.1 (built on cygwin, 64 bit) #include <vector> int main() { std::vector<int> c {1,2,3,0}; while(c.size() > 0 && c.back() == 0) { auto sz = c.size() -1; c.resize(sz); } return 0; } $ c++7.1 -O3 tt.cxx Please use $ LC_ALL=C c++7.1 -O3 tt.cxx when you want to post the result, unless you are sending to a German forum. In Funktion »int main()«: cc1plus: Warnung: »void* __builtin_memset(void*, int, long unsigned int)«: angegebene Größe 18446744073709551612 überschreitet maximale Objektgröße 9223372036854775807 [-Wstringop-overflow=] (i.e., "specified size 18446744073709551612 exceeds maximum object size 9223372036854775807") Compiling with GCC 6.1 (c++6.1 -O3 tt.cxx) works fine. Is this a problem of my program or a problem of GCC 7.1? Sounds like a problem with gcc, maybe optimization creates a path that corresponds to size==0 and fails to notice that it cannot be taken. -- Marc Glisse
Re: Infering that the condition of a for loop is initially true?
On Thu, 14 Sep 2017, Niels Möller wrote: This is more of a question than a bug report, so I'm trying to send it to the list rather than filing a bugzilla issue. I think it's quite common to write for- and while-loops where the condition is always initially true. A simple example might be double average (const double *a, size_t n) { double sum; size_t i; assert (n > 0); for (i = 0, sum = 0; i < n; i++) sum += a[i]; return sum / n; } The programmer could do the micro-optimization to rewrite it as a do-while-loop instead. It would be nice if gcc could infer that the condition is initially true, and convert to a do-while loop automatically. Converting to a do-while-loop should produce slightly better code, omitting the typical jump to enter the loop at the end where the condition is checked. It would also make analysis of where variables are written more accurate, which is my main concern at the moment. Hello, assert is not what you want, since it completely disappears with -DNDEBUG. clang has __builtin_assume, with gcc you want a test and __builtin_unreachable. Replacing your assert with if(n==0)__builtin_unreachable(); gcc does skip the first test of the loop, as can be seen in the dump produced with -fdump-tree-optimized. -- Marc Glisse
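Concretely, applying that suggestion to the function above gives the following; the test compiles away and additionally tells the optimizer that n > 0:

#include <stddef.h>

double average (const double *a, size_t n)
{
  double sum;
  size_t i;
  if (n == 0)
    __builtin_unreachable ();  /* replaces assert (n > 0) */
  for (i = 0, sum = 0; i < n; i++)
    sum += a[i];
  return sum / n;
}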
Re: -pie option in ARM64 environment
On Fri, 29 Sep 2017, jacob navia wrote: I am getting this error: GNU ld (GNU Binutils for Debian) 2.28 /usr/bin/ld: error.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against external symbol `stderr@@GLIBC_2.17' can not be used when making a shared object; recompile with -fPIC The problem is, I do NOT want to make a shared object! Just a plain executable. The verbose linker options are as follows: collect2 version 6.3.0 20170516 /usr/bin/ld -plugin /usr/lib/gcc/aarch64-linux-gnu/6/liblto_plugin.so -plugin-opt=/usr/lib/gcc/aarch64-linux-gnu/6/lto-wrapper -plugin-opt=-fresolution=/tmp/cc9I00ft.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --sysroot=/ --build-id --eh-frame-hdr --hash-style=gnu -dynamic-linker /lib/ld-linux-aarch64.so.1 -X -EL -maarch64linux --fix-cortex-a53-843419 -pie -o lcc /usr/lib/gcc/aarch64-linux-gnu/6/../../../aarch64-linux-gnu/Scrt1.o /usr/lib/gcc/aarch64-linux-gnu/6/../../../aarch64-linux-gnu/crti.o /usr/lib/gcc/aarch64-linux-gnu/6/crtbeginS.o -L/usr/lib/gcc/aarch64-linux-gnu/6 -L/usr/lib/gcc/aarch64-linux-gnu/6/../../../aarch64-linux-gnu -L/usr/lib/gcc/aarch64-linux-gnu/6/../../../../lib -L/lib/aarch64-linux-gnu -L/lib/../lib -L/usr/lib/aarch64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/aarch64-linux-gnu/6/../../.. alloc.o bind.o dag.o decl.o enode.o error.o backend-arm.o intrin.o event.o expr.o gen.o init.o input.o lex.o arm64.o list.o operators.o main.o ncpp.o output.o simp.o msg.o callwin64.o bitmasktable.o table.o stmt.o string.o stab.o sym.o Tree.o types.o analysis.o asm.o inline.o -lm ../lcclib.a ../bfd/libbfd.a ../asm/libopcodes.a -Map=lcc.map -v -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/aarch64-linux-gnu/6/crtendS.o /usr/lib/gcc/aarch64-linux-gnu/6/../../../aarch64-linux-gnu/crtn.o I think the problem lies in this mysterious "pie" option: ... --fix-cortex-a53-843419 -pie -o lcc... "PIE" could stand for Position Independent Executable. How could I get rid of that? -no-pie probably. Which text file is responsible for adding this "pie" option to the ld command line? I am not so well versed in gcc's internals to figure this out without your help. Does it show when you run "gcc -dumpspecs"? If so you could provide a different specs file. Otherwise, you could check the patches that your distribution applies to gcc, one of them likely has "pie" in its name. Easiest is likely to build gcc from the official sources, which shouldn't use pie by default. -- Marc Glisse
Re: GCC Buildbot Update - Definition of regression
On Wed, 11 Oct 2017, David Malcolm wrote: On Wed, 2017-10-11 at 11:18 +0200, Paulo Matos wrote: On 11/10/17 11:15, Christophe Lyon wrote: You can have a look at https://git.linaro.org/toolchain/gcc-compare-results.git/ where compare_tests is a patched version of the contrib/ script, it calls the main perl script (which is not the prettiest thing :-) Thanks, that's useful. I will take a look. You may also want to look at this script I wrote: https://github.com/davidmalcolm/jamais-vu (it has Python classes for working with DejaGnu output) By the way, David, how do you handle comparisons for the jit testsuite? jv gives Tests that went away in build/gcc/testsuite/jit/jit.sum: 81 --- PASS: t PASS: test- PASS: test-arith-overflow.c PASS: test-arith-overflow.c.exe iteration 1 of 5: verify_uint_over PASS: test-arith-overflow.c.exe iteration 2 of 5: verify_uint_o PASS: test-arith-overflow.c.exe iteration 3 of 5: verify [...] Tests appeared in build/gcc/testsuite/jit/jit.sum: 78 - PASS: test-arith-overflow.c.exe iteration 1 PASS: test-arith-overflow.c.exe iteration 2 of PASS: test-arith-overflow.c.exe iteration 4 of 5: verify_u PASS: test-combination. PASS: test-combination.c.exe it [...] The issue is more likely in the testsuite, but I assume you have a workflow that allows working around the issue? -- Marc Glisse
Re: gcc Bugzilla corrupt again?
On Thu, 23 Nov 2017, Jeffrey Walton wrote: On Thu, Nov 23, 2017 at 1:51 AM, Andrew Roberts wrote: I was adding a comment to bug: 81616 - Update -mtune=generic for the current Intel and AMD processors After clicking add comment it took me an an entirely different bug. I tried to add the comment again, and got a message about a "Mid Air Collision" The comment ended up the system twice (Comment 4/5). But I've never seen it take me to a different bug after adding a comment before. The "take me to a different bug" after submitting been happening for a while. In preferences, you get to choose the behavior "After changing a bug". Default is "Show next bug in my list". -- Marc Glisse
Re: gcc 7.3: Replacing global operator new/delete in shared libraries
On Tue, 6 Feb 2018, Paul Smith wrote: My environment has been using GCC 6.2 (locally compiled) on GNU/Linux systems. We use a separate heap management library (jemalloc) rather than the libc allocator. The way we did this in the past was to declare operator new/delete (all forms) as inline functions in a header Are you sure you still have all forms? The aligned versions were added in gcc-7 IIRC. and ensure that this header was always the very first thing in every source file, before even any standard header files. I know that inline operator new/delete isn't OK in the C++ standard, but in fact it has worked for us on the systems we care about. Inline usually works, but violating the ODR is harder... I would at least use the always_inline attribute to improve chances (I assume that static (or anonymous namespace) versions wouldn't work), since the optimizer may decide not to inline otherwise. Something based on visibility should be somewhat safer. But it still seems dangerous, some global libstdc++ object might be initialized using one allocator then used with another one... I'm attempting a toolchain upgrade which is switching to GCC 7.3 / binutils 2.30 (along with many other updates). Now when I run our code, I get a core on exit. It appears an STL container delete is invoking libc free() with a pointer to memory allocated by jemalloc. An example would help the discussion. My question is, what do I need to do to ensure this behavior persists if I create a global operator new/delete? Is it sufficient to ensure that the symbol for our shared library global new/delete symbols are hidden and not global, using a linker map or -fvisibility=hidden? I think so (hidden implies not-interposable, so locally bound), but I don't have much experience there. -- Marc Glisse
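A sketch of the header trick with that extra safety; je_malloc/je_free are assumed here to be the jemalloc entry points (the je_ prefix is a build-time choice, so this is an assumption), and a real header would treat all the other forms (array, nothrow, sized, and the aligned ones added in gcc-7) the same way:

#include <cstddef>

extern "C" void *je_malloc (std::size_t);  /* assumed jemalloc API */
extern "C" void je_free (void *);

__attribute__ ((always_inline)) inline void *
operator new (std::size_t n)
{ return je_malloc (n); }  /* error handling omitted; non-conforming, as discussed */

__attribute__ ((always_inline)) inline void
operator delete (void *p) noexcept
{ je_free (p); }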
Re: gdb 8.x - g++ 7.x compatibility
On Wed, 7 Feb 2018, Simon Marchi wrote: On 2018-02-07 12:08, Jonathan Wakely wrote: Why would they not have a mangled name? Interesting. What do they look like, and in what context do they appear? Anywhere you need a name for linkage purposes, such as in a function signature, or as a template argument of another type, or in the std::type_info::name() for the type etc. etc. $ g++ -o test.o -c -x c++ - <<< 'struct X {}; void f(X) {} template<typename T> struct Y { }; void g(Y<X>) {}' && nm --defined-only test.o T _Z1f1X 0007 T _Z1g1YI1XE The mangled name for X is "X" and the mangled name for Y<X> is "YI1XE" which includes the name "X". This isn't really on-topic for solving the GDB type lookup problem though. Ah ok, the class name appears mangled in other entities' mangled name. But from what I understand there's no mangled name for the class such that echo <mangled name> | c++filt outputs the class name (e.g. "Foo<10>"). That wouldn't make sense, since there's no symbol for the class itself. $ echo _Z1YI1XE | c++filt Y<X> -- Marc Glisse
Re: gcc 7.3: Replacing global operator new/delete in shared libraries
On Wed, 7 Feb 2018, Paul Smith wrote: My question is, what do I need to do to ensure this behavior persists if I create a global operator new/delete? Is it sufficient to ensure that the symbol for our shared library global new/delete symbols are hidden and not global, using a linker map or -fvisibility=hidden? I think so (hidden implies not-interposable, so locally bound), but I don't have much experience there. OK I'll pursue this for now. I answered too fast. It isn't just new/delete that need to be hidden. It is also anything that uses them and might be used in both contexts. For instance, std::allocator<T>::allocate is an inline function that calls operator new. You get one version that calls new1, and one version that calls new2. If you don't do anything special, the linker keeps only one (more or less arbitrarily). So I believe you need -fvisibility=hidden to hide everything but a few carefully chosen interfaces. -- Marc Glisse
Re: why C++ cannot alias an inline function, C can ?
(should have been on gcc-help I believe) On Sun, 1 Apr 2018, Max Filippov wrote: On Sun, Apr 1, 2018 at 5:33 AM, Jason Vas Dias wrote: Aha! But how to determine the mangled name beforehand ? Even if I compile the object without the alias, then inspect the object with objdump, there is no mangled symbol _ZL3foov defined in the object file . So I must run some name mangler / generator as a pre-processing step to get the correct mangled name string ? I guess so. Or you could define foo with C linkage: extern "C" { static inline __attribute__((always_inline)) void foo(void){} }; static inline __attribute__((always_inline,alias("foo"))) void bar(void); Or you can use an asm label to specify some arbitrary name. -- Marc Glisse
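An untested sketch of the asm-label route: foo gets a fixed assembler name up front, so the alias string is known without running a name mangler ("foo_impl" is an arbitrary choice):

static inline __attribute__ ((always_inline)) void foo (void) __asm__ ("foo_impl");
static inline __attribute__ ((always_inline)) void foo (void) {}
static inline __attribute__ ((always_inline, alias ("foo_impl"))) void bar (void);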
Re: libstdc++: ODR violation when using std::regex with and without -D_GLIBCXX_DEBUG
On Tue, 8 May 2018, Jonathan Wakely wrote: On 8 May 2018 at 14:00, Jonathan Wakely wrote: On 8 May 2018 at 13:44, Stephan Bergmann wrote: I was recently bitten by the following issue (Linux, libstdc++ 8.0.1): A process loads two dynamic libraries A and B both using std::regex, and A is compiled without -D_GLIBCXX_DEBUG while B is compiled with -D_GLIBCXX_DEBUG. This is only supported in very restricted cases. B creates an instance of std::regex, which internally creates a std::shared_ptr<_NFA<...>>, where _NFA has various members of std::__debug::vector type (but which isn't reflected in the mangled name of that _NFA instantiation itself). Now, when that instance of std::regex is destroyed again in library B, the std::shared_ptr<_NFA<...>>::~shared_ptr destructor (and functions it in turn calls) that happens to get picked is the (inlined, and exported due to default visibility) instance from library A. And that assumes that that _NFA instantiation has members of non-debug std::vector type, which causes a crash. Should it be considered a bug that such mixture of debug and non-debug std::regex usage causes ODR violations? Yes, but my frank response is "don't do that". The right fix here might be to ensure that _NFA always uses the non-debug vector even in Debug Mode, but I'm fairly certain there are other similar problems lurking. N.B. I think this discussion belongs on the libstdc++ list. Would it make sense to use the abi_tag attribute to help with that? (I didn't really think about it, maybe it doesn't) "don't do that" remains the most sensible answer. -- Marc Glisse
Re: About Bug 52485
On Wed, 9 May 2018, SHIH YEN-TE wrote: Want to comment on "Bug 52485 - [c++11] add an option to disable c++11 user-defined literals" It's a pity GCC doesn't support this, which forces me to give up introducing newer C++ standard into my project. I know it is ridiculous, but we must know the real world is somehow ridiculous as well as nothing is perfect. You have the wrong approach. Apparently, you are using an unmaintained library (if it was maintained, it would be compatible with C++11 by now), so there is no problem modifying it, especially just to add a few spaces. A single run of clang-tidy would likely fix all of them for you. -- Marc Glisse
Re: Unused __builtin_ia32_* builtins
On Thu, 10 May 2018, Jakub Jelinek wrote:

  for i in `grep __builtin_ia32 config/i386/i386-builtin.def \
            | sed 's/^.*__builtin_ia32_/__builtin_ia32_/;s/".*$//' | sort -u`; do
    grep -q -w $i config/i386/*.h || echo $i
  done

shows many builtins not used in any of the intrinsic headers. I believe for the __builtin_ia32_* builtins we only support the intrinsics and not the builtins directly. Can we remove some of these (not necessarily all of them), after checking when and why they were added and if they were added for the intrinsic headers which now e.g. use generic vector arith instead? When I removed their use in the intrinsic headers, I tried to remove them, but Ada people asked us to keep them https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00843.html -- Marc Glisse
Re: Generating gimple assign stmt that changes sign
On Tue, 22 May 2018, Kugan Vivekanandarajah wrote: Hi, I am looking to introduce ABSU_EXPR and that would create: unsigned short res = ABSU_EXPR (short); Note that the argument is signed and the result is unsigned. As per the review, I have a match.pd entry to generate this as:

  (simplify
   (abs (convert @0))
   (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0)))
    (convert (absu @0))))

Not sure, but we may want a few more restrictions on this transformation. Now when gimplifying the converted tree, how do we tell that ABSU_EXPR will take a signed arg and return unsigned? I will have other match.pd entries, so this will be generated in gimple passes too. Should I add new functions in gimple.[h|c] for this? Are there any examples I can refer to? Conversion expressions seem to be the only place where the sign can change in a gimple assignment, but they are very specific. You'll probably want to patch genmatch.c (near get_operand_type maybe?) so it doesn't try to guess that the type of absu is the same as its argument. You can also specify a type in transformations; look for :utype or :etype in match.pd. -- Marc Glisse
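A hedged sketch of the :type syntax mentioned above, modelled on existing match.pd patterns (the exact guards ABSU needs may differ):

  /* Bind the unsigned variant of the argument's type with "with", then
     force the result type of absu via the :utype annotation.  */
  (simplify
   (abs (convert @0))
   (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
        && !TYPE_UNSIGNED (TREE_TYPE (@0)))
    (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
     (convert (absu:utype @0)))))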
Re: How to get GCC on par with ICC?
On Fri, 8 Jun 2018, Steve Ellcey wrote: On Thu, 2018-06-07 at 12:01 +0200, Richard Biener wrote: When we do our own comparisons of GCC vs. ICC on benchmarks like SPEC CPU 2006/2017, ICC doesn't have a big lead over GCC (in fact it even trails in some benchmarks) unless you get to "SPEC tricks" like data structure re-organization optimizations that probably never apply in practice on real-world code (and people should fix such things at the source level when pointed at them by actually profiling their code). Richard, I was wondering if you have any more details about these comparisons you have done that you can share? Compiler versions, options used, hardware, etc.? Also, were there any tests that stood out in terms of icc outperforming GCC? I did a compare of SPEC 2017 rate using GCC 8.* (pre-release) and a recent ICC (2018.0.128?) on my desktop (Xeon CPU E5-1650 v4). I used '-xHost -O3' for icc and '-march=native -mtune=native -O3' for gcc. You should use -Ofast for gcc. As mentioned earlier in the discussion, ICC has some equivalent of -ffast-math by default. The int rate numbers (running 1 copy only) were not too bad, GCC was only about 2% slower and only 525.x264_r seemed way slower with GCC. The fp rate numbers (again only 1 copy) showed a larger difference, around 20%. 521.wrf_r was more than twice as slow when compiled with GCC instead of ICC, and 503.bwaves_r and 510.parest_r also showed significant slowdowns when compiled with GCC vs. ICC. -- Marc Glisse
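In other words, a closer-to-apples pair of command lines would be the following sketch (-Ofast is -O3 plus -ffast-math and a few related options, roughly matching ICC's fast-math default):

  icc -xHost -O3 benchmark.c
  gcc -march=native -mtune=native -Ofast benchmark.c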
Re: -Wclass-memaccess warning should be in -Wextra, not -Wall
On Fri, 6 Jul 2018, Martin Sebor wrote: On 07/05/2018 05:14 PM, Soul Studios wrote: Simply because a struct has a constructor does not mean it isn't a viable target/source for use with memcpy/memmove/memset. As the documentation that Segher quoted explains, it does mean exactly that. Some classes have user-defined copy and default ctors with the same effect as memcpy/memset. In modern C++ those ctors should be defaulted (= default) and GCC should emit optimal code for them. What if I want to memcpy a std::pair? Some classes may have several states, some that are memcpy-safe, and some that are not. A user may know that at some point in their program, all the objects in a given array are safe, and want to memcpy the whole array somewhere. memcpy can also be used to work around the lack of a destructive move in C++. For instance, vector<vector<T>>::resize could safely use memcpy (and skip destroy before deallocate). In this particular case, we could imagine at some point in the future that the compiler would notice it is equivalent to memcpy+bzero, and then that the bzero is dead, but there are more complicated use cases for destructive move. In fact, in loops they can result in more efficient code than the equivalent memset/memcpy calls. In any case, "native" operations lend themselves more readily to code analysis than raw memory accesses, and as a result allow all compilers (not just GCC) to do a better job of detecting bugs or performing interesting transformations that they may not be able to do otherwise. Having benchmarked the alternatives, memcpy/memmove/memset definitely make a difference in various scenarios. Please open bugs with small test cases showing the inefficiencies so the optimizers can be improved. Some already exist (PR 86024 seems related, there are probably some closer matches), but indeed more would be helpful. -- Marc Glisse
Re: -Wclass-memaccess warning should be in -Wextra, not -Wall
On Sun, 8 Jul 2018, Jason Merrill wrote: On Sun, Jul 8, 2018 at 6:40 PM, Marc Glisse wrote: On Fri, 6 Jul 2018, Martin Sebor wrote: On 07/05/2018 05:14 PM, Soul Studios wrote: Simply because a struct has a constructor does not mean it isn't a viable target/source for use with memcpy/memmove/memset. As the documentation that Segher quoted explains, it does mean exactly that. Some classes have user-defined copy and default ctors with the same effect as memcpy/memset. In modern C++ those ctors should be defaulted (= default) and GCC should emit optimal code for them. What if I want to memcpy a std::pair? That's fine, since the pair copy constructor is defaulted, and trivial for pair<int, int>. G++ does currently warn for

  #include <cstring>
  #include <utility>
  typedef std::pair<int, int> P;
  void f(P *d, P const *s) { std::memcpy(d, s, sizeof(P)); }

because copy-assignment is not trivial. IIRC std::pair and std::tuple are not as trivial as they could be for ABI reasons. Boost.Container chose to disable the warning ( https://github.com/boostorg/container/commit/62a8beb0f12242fb1e99daa98533ce74e735 ) instead of making their version of pair trivial. I don't know why, but maybe that was to avoid a mess of #ifdef to maintain a C++03 version of the code. Some classes may have several states, some that are memcpy-safe, and some that are not. A user may know that at some point in their program, all the objects in a given array are safe, and want to memcpy the whole array somewhere. The user may know that, but the language only defines the semantics of memcpy for trivially copyable classes. If you want to assume that the compiler will do what you expect with this instance of undefined behavior, you can turn off the warning. You may well be right, but I don't think it follows that putting this warning about undefined behavior in -Wall is wrong. (Note that I am not the original reporter, I am only trying to help find examples.) I don't mind the warning so much, I am more scared of the optimizations that may follow. memcpy can also be used to work around the lack of a destructive move in C++. I wonder what you mean by "the lack of a destructive move in C++", given that much of C++11 was about supporting destructive move semantics. There is a misunderstanding here. C++11 added move semantics that one might call "conservative", i.e. the moved-from object is still alive and one should eventually run its destructor. "Destructive move" is used in some papers / blogs to refer to a move that also destructs the original object. For some types that are not trivially default constructible (like libstdc++'s std::deque IIRC), a conservative move is still expensive while a destructive move is trivial (memcpy). Libstdc++'s std::string is one of the rare types that are not trivially destructively movable (it can contain a pointer to itself). Most operations on std::vector<V> could use a destructive move of V very naturally. The terminology conservative/destructive is certainly not canonical; I don't know if there are better words to describe it. For instance, vector<vector<T>>::resize could safely use memcpy (and skip destroy before deallocate). In this particular case, we could imagine at some point in the future that the compiler would notice it is equivalent to memcpy+bzero, and then that the bzero is dead, but there are more complicated use cases for destructive move.
Indeed, resizing a vector<vector<T>> will loop over the outer vector calling the move constructor for each inner vector, which will copy the pointer and zero out the moved-from object, which the optimizer could then coalesce into memcpy/bzero. This sort of pattern is common enough in C++11 containers that this seems like an attractive optimization, if we don't already perform it. What more complicated uses don't reduce to memcpy/bzero, but you would still want to use memcpy for somehow? Noticing that it reduces to memcpy can be hard. For std::deque, you have to cancel a new/delete pair (which we still do not handle), and for that you may first need some loop fusion to put the new and delete next to each other. For GMP's mpz_class, the allocation is hidden in opaque mpz_init / mpz_clear functions, so the compiler cannot simplify move+destruct into memcpy. I would certainly welcome optimizer improvements that make it less useful to specialize the library, but some things are easier to do at the level of the library. -- Marc Glisse
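A minimal sketch of the distinction discussed above, with a made-up vector-like type (layout and names are for illustration only):

  struct Buf {                      // stand-in for a heap-owning handle
    int *p;
    Buf(Buf &&o) noexcept : p(o.p) { o.p = nullptr; }  // conservative move:
    ~Buf() { delete[] p; }          // o stays alive, destroyed later
  };
  // Relocating an array of Buf with move+destroy is, bytewise, memcpy of
  // the array followed by bzero of the sources; every later delete[] then
  // sees a null pointer and is dead -- the coalescing opportunity
  // described above.  A destructive move would skip the zeroing and the
  // destructor entirely, which is what the memcpy workaround expresses
  // directly.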
Re: -Wclass-memaccess warning should be in -Wextra, not -Wall
On Mon, 9 Jul 2018, Martin Sebor wrote: My point to all of this (and I'm annoyed that I'm having to repeat it again, as if my first post wasn't clear enough - which it was) was that any programmer using memcpy/memmove/memset is going to know what they're getting into. No, programmers don't always know that. In fact, it's easy even for an expert programmer to mistakenly assume that what looks like a POD struct can safely be cleared by memset or copied by memcpy, when doing so is undefined because one of the struct members is of a non-trivial type (such as a container like string). Indeed, especially since some other compilers have implemented string in a way that is safe (even if theoretically UB) to memset/memcpy. Therefore it makes no sense to penalize them by getting them to write ugly, needless code - regardless of the surrounding politics/codewars. Quite a lot of thought and discussion went into the design and implementation of the warning, so venting your frustrations or insulting those of us involved in the process is unlikely to help you effect a change. To make a compelling argument you need to provide convincing evidence that we have missed an important use case. The best way to do that in this forum is with test cases and/or real-world designs that are hampered by our choice. That's a high bar to meet for warnings whose documented purpose is to diagnose "constructions that some users consider questionable, and that are easy to avoid (or modify to prevent the warning)." I guess the phrasing is a bit weak: "some users" obviously has to refer to a significant proportion of users, "easy to avoid" cannot have too many drawbacks (in particular, generated code should be of equivalent quality), etc. -Wclass-memaccess fits "easy to avoid" quite well, since a simple cast disables it. -Wmaybe-uninitialized is much worse: it produces many false positives that change with every release and are super hard to avoid. And even in the "easy to avoid" category, where we don't want to litter the code with casts to quiet the warnings, I find -Wsign-compare way worse in practice than -Wclass-memaccess. -- Marc Glisse
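The "simple cast" in question, as a sketch (the type S is made up; suppression via an intervening void* cast is the warning's documented escape hatch):

  #include <cstring>

  struct S { S(); int x; };   // non-trivial, so raw memset on it warns

  void clear(S &s)
  {
    // std::memset(&s, 0, sizeof s);                   // -Wclass-memaccess
    std::memset(static_cast<void *>(&s), 0, sizeof s); // no warning
  }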
Re: r227907 and AIX 5.[23]
On Wed, 25 Jul 2018, David Edelsohn wrote: AIX 5.3 is no longer supported or maintained. If gcc-5+ fails to build on AIX 5.3 and patches to make it compile are not welcome, maybe some cleanup removing aix43.h, aix5*.h and whatever configure bits could help clarify things? Only when someone has the time, of course. -- Marc Glisse
Re: Can offsetting a non-null pointer result in a null one?
On Mon, 20 Aug 2018, Richard Biener wrote: On Mon, Aug 20, 2018 at 10:53 AM Andreas Schwab wrote: On Aug 20 2018, Richard Biener wrote: Btw, I can't find wording in the standards that nullptr + 1 is invoking undefined behavior, that is, that pointer arithmetic is only allowed on pointers pointing to a valid object. Any specific pointers? All of 5.7 talks about pointers pointing to objects (except when adding 0). Thanks all for the response. Working on a patch introducing infrastructure for this right now, but implementing this we'd need to make sure to not hoist pointer arithmetic into blocks that might otherwise not be executed. Like

  if (p != 0)
    {
      q = p + 1;
      foo (q);
    }

may not be optimized to

  q = p + 1;
  if (p != 0)
    foo (q);

because then we'd elide the p != 0 check. I'm implementing the infrastructure to assume y != 0 after a stmt like z = x / y; where we'd already avoid such hoisting because it may trap at runtime. Similar "issues" would be exposed when hoisting undefined overflow stmts and we'd derive ranges for their operands. So I'm not entirely sure it's worth the likely trouble. The opposite direction may be both easier and safer, even if it won't handle everything: P p+ N is nonnull if P or N is known to be nonnull (and something similar for &p->field and others) -- Marc Glisse
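A small sketch of the division-based inference used as the model here (GIMPLE-style pseudocode; integer division is undefined, and typically traps, for a zero divisor):

  z = x / y;    /* on any path past this statement, y != 0 must hold */
  if (y == 0)   /* ... so this test can be folded to false ...       */
    bar ();
  /* ... while z = x / y itself must never be hoisted above a guard
     that currently keeps it from executing with y == 0.  */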
Re: Can offsetting a non-null pointer result in a null one?
On Mon, 20 Aug 2018, Richard Biener wrote: P p+ N is nonnull if P or N is known to be nonnull (and something similar for &p->field and others) But we already do that. Oops... I never noticed, I should have checked.

  else if (code == POINTER_PLUS_EXPR)
    {
      /* For pointer types, we are really only interested in asserting
         whether the expression evaluates to non-NULL.  */
      if (range_is_nonnull (&vr0) || range_is_nonnull (&vr1))
        set_value_range_to_nonnull (vr, expr_type);
      else if (range_is_null (&vr0) && range_is_null (&vr1))
        set_value_range_to_null (vr, expr_type);
      else
        set_value_range_to_varying (vr);
    }

Ah, range_is_nonnull (&vr1) is only matching ~[0,0]. We'd probably want VR_RANGE && !range_includes_zero_p here. That range_is_nonnull is probably never true due to canonicalization. That explains it. Yes please. I am surprised there isn't a helper like range_includes_zero_p or value_inside_range that takes a value_range* as argument, so we don't have to worry about the type of range (the closest seems to be value_ranges_intersect_p with a singleton range, but that function seems dead and broken). When POINTER_PLUS_EXPR is changed to take a signed argument, your suggested test will need updating :-( -- Marc Glisse
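A sketch of the suggested tightening (a range_includes_zero_p helper taking a value_range* is hypothetical here; as the message notes, no such helper existed at the time):

  /* Treat any range that provably excludes zero as nonnull, instead of
     only matching the canonical ~[0,0] anti-range.  */
  if (!range_includes_zero_p (&vr0) || !range_includes_zero_p (&vr1))
    set_value_range_to_nonnull (vr, expr_type);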
__builtin_clzll and uintmax_t
Hello, the following question came up for a libstdc++ patch. We have a variable of type uintmax_t and want to count the leading zeros. Can we just call __builtin_clzll on it? In particular, can uintmax_t be larger than unsigned long long in gcc? Is __builtin_clzll available on all platforms? Is there a good reason to use __builtin_clzl instead on platforms where long and long long have the same size? In case it matters, this is strictly for compile-time computations (templates, constexpr). -- Marc Glisse
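The sort of compile-time use in question, as a sketch (the 63 in the assertion assumes a 64-bit unsigned long long, and __builtin_clzll is undefined for a zero argument):

  constexpr int leading_zeros(unsigned long long x)
  {
    return __builtin_clzll(x);  // usable in constant expressions in GCC
  }
  static_assert(leading_zeros(1ULL) == 63,
                "assumes unsigned long long is 64 bits");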
Re: __builtin_clzll and uintmax_t
Hi FX, On Sat, 5 Mar 2011, FX wrote: uintmax_t is the largest of the standard unsigned C types, so it cannot be larger than unsigned long long. That's a gcc property then. The C99 standard only guarantees that uintmax_t is at least as large as unsigned long long, but it is allowed to be some other larger type: "The following type designates an unsigned integer type capable of representing any value of any unsigned integer type: uintmax_t" On x86_64, for example:

  #include <stdio.h>
  #include <stdint.h>
  int main (void)
  {
    printf ("%lu ", sizeof (uintmax_t));
    printf ("%lu ", sizeof (int));
    printf ("%lu ", sizeof (long int));
    printf ("%lu ", sizeof (long long int));
    printf ("%lu\n", sizeof (__int128));
  }

gives: 8 4 8 8 16. I am not sure how legal that is. __int128 is an extended signed integer type, and thus the statement about intmax_t should apply to it as well. So gcc is just pretending that __int128 is not really there. Is __builtin_clzll available on all platforms? Yes, we emit calls to this built-in unconditionally in the Fortran front-end, and it has caused no trouble. Thank you, that's the best guarantee I could ask for about the existence of __builtin_clzll. -- Marc Glisse
Re: Environment setting LDFLAGS ineffective after installation stage 1. Any workaround?
(gcc-help ?) On Tue, 31 May 2011, Thierry Moreau wrote: But with the gcc (latest 4.6.1 snapshot), -rpath (requested through LDFLAGS as indicated above) is effective only for executables built in stage 1 (and fixincl), but not for the installed gcc executables. Is it intentional that the LDFLAGS environment setting is partially effective during the gcc build? Yes. For further stages, there is BOOT_LDFLAGS. There are also configure options with similar names:

  --with-stage1-ldflags=
  --with-boot-ldflags=

see: http://gcc.gnu.org/install/configure.html -- Marc Glisse
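Concretely, either form below should carry an rpath past stage 1 (a sketch; the path is a placeholder):

  # at configure time:
  $SRCDIR/configure --with-boot-ldflags='-Wl,-rpath,/some/lib/dir' ...
  # or at make time:
  make BOOT_LDFLAGS='-Wl,-rpath,/some/lib/dir'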
Re: badly broken?!?
On Mon, 6 Jun 2011, Paolo Carlini wrote: I just built Rev 174696 and if I run the following snippet on an x86_64-linux machine, today I don't get any meaningful output; in particular I don't get 'ok' but instead '|+000|', which I have no idea what it means:

  #include <iostream>
  int main() { std::cout << "ok\n"; }

Can anybody else see this crazy breakage? May be a few days old, AFAICS. 4_6-branch is perfectly fine. 174683 here on linux x64 and everything is fine. -- Marc Glisse