Re: [RFC] add push/pop pragma to control the scope of "using"

2020-01-15 Thread Marc Glisse

On Wed, 15 Jan 2020, 马江 wrote:


Hello,
 After some googling, I found there is currently no way to control the
scope of "using". This seems strange, as we definitely need this feature,
especially when writing inline member functions in c++
headers.

 Currently I am trying to build a simple class in a c++ header file
as follows:

#include <string>
using namespace std;
class mytest
{
 string test_name;
 int test_val;
public:
 inline string & get_name () {return test_name;}
};


Why is mytest in the global namespace?


 As an experienced C coder, I know that inline functions must be put
into headers or else users can only rely on LTO. And I know that using
"using" in a header file is a bad idea, as it might silently change the
meaning of other code. However, after I put all my inline functions
into the header file, I found I had to write many "std::string" instead
of "string", which is a real torture.
 Can we add something like "#pragma push_using" (just like #pragma
push_macro/pop_macro)? I believe it's feasible and probably not hard to
implement.


We try to avoid extensions in gcc; you may want to propose this to the C++ 
standard committee first. Before doing so, you should check whether modules 
(C++20) affect the issue.


--
Marc Glisse


Re: How to get the data dependency of GIMPLE variables?

2020-06-14 Thread Marc Glisse

On Mon, 15 Jun 2020, Shuai Wang via Gcc wrote:


I am trying to analyze the following gimple statements, where the data
dependency of _23 is a tree whose leaf nodes are three constant values
{13, 4, 14}:

_13 = 13;
_14 = _13 + 4;
_15 = 14;
_22 = (unsigned long) _15;
_23 = _22 + _14;

Could anyone shed some light on how such a backward traversal can be
implemented? Given _22 used in the last assignment, I have no idea how
to trace back to its definition in the fourth statement... Thank you
very much!


SSA_NAME_DEF_STMT
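
As a minimal sketch of how that is used with GCC's internal API
(illustration only, untested; the function names are the real internals,
error handling omitted):

  tree name = ...;  /* e.g. the SSA name _22 */
  gimple *def = SSA_NAME_DEF_STMT (name);
  if (is_gimple_assign (def))
    for (unsigned i = 1; i < gimple_num_ops (def); ++i)
      {
        tree op = gimple_op (def, i);   /* rhs operands start at index 1 */
        if (TREE_CODE (op) == SSA_NAME)
          ;  /* recurse via SSA_NAME_DEF_STMT (op) */
        else if (TREE_CODE (op) == INTEGER_CST)
          ;  /* a leaf of the dependency tree, e.g. 13, 4, 14 */
      }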


--
Marc Glisse


Re: How to get the data dependency of GIMPLE variables?

2020-06-15 Thread Marc Glisse

On Mon, 15 Jun 2020, Shuai Wang via Gcc wrote:


Dear Marc,

Thank you very much! Just another quick question... Can I iterate over the
operands of a GIMPLE statement, the way I iterate over an LLVM instruction,
as in the following?

   Instruction *instr;
   for (size_t i = 0; i < instr->getNumOperands(); i++) {
     instr->getOperand(i);
   }

Sorry for such naive questions.. I actually searched the documents and
GIMPLE pretty print for a while but couldn't find such a way of accessing
arbitrary numbers of operands...


https://gcc.gnu.org/onlinedocs/gccint/GIMPLE_005fASSIGN.html
or for lower level
https://gcc.gnu.org/onlinedocs/gccint/Logical-Operators.html#Operand-vector-allocation

But really you need to look at the code of gcc. Search for places that use 
SSA_NAME_DEF_STMT and see what they do with the result.
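
For completeness, a hedged sketch of both levels (untested):

  /* High level: GIMPLE_ASSIGN accessors.  */
  tree rhs1 = gimple_assign_rhs1 (stmt);
  tree rhs2 = gimple_assign_rhs2 (stmt);   /* NULL_TREE for unary ops */

  /* Lower level: iterate over all SSA uses of a statement.  */
  ssa_op_iter iter;
  tree use;
  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
    {
      gimple *def = SSA_NAME_DEF_STMT (use);
      /* ... */
    }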


--
Marc Glisse


Re: Local optimization options

2020-07-05 Thread Marc Glisse

On Sun, 5 Jul 2020, Thomas König wrote:




On 04.07.2020 at 19:11, Richard Biener wrote:

On July 4, 2020 11:30:05 AM GMT+02:00, "Thomas König"  wrote:


What could be a preferred way to achieve that? Could optimization
options like -ffast-math be applied to blocks instead of functions?
Could we set flags on the TREE codes to allow certain optimizations?
Other things?


The middle end can handle those things on function granularity only.

Richard.


OK, so that will not work (or not without a disproportionate
amount of effort).  Would it be possible to set something like a
TREE_FAST_MATH flag on TREEs? An operation could then be
optimized according to these rules iff both operands
had that flag, and would also have it then.


In order to support various semantics on floating point operations, I was 
planning to replace some trees with internal functions, with an extra 
operand to specify various behaviors (rounding, exception, etc). Although 
at least in the beginning, I was thinking of only using those functions in 
safe mode, to avoid perf regressions.


https://gcc.gnu.org/pipermail/gcc-patches/2019-August/527040.html

This may never happen now, but it sounds similar to setting flags like 
TREE_FAST_MATH as you are suggesting. I was going with functions for 
more flexibility, and to avoid all the existing assumptions about trees. 
Though I guess for fast-math, the worst the assumptions could do is clear 
the flag, which would make us optimize less than we could; not so bad.


--
Marc Glisse


Re: [RFC] Add new flag to specify output constraint in match.pd

2020-08-23 Thread Marc Glisse

On Fri, 21 Aug 2020, Feng Xue OS via Gcc wrote:


 There is a match-folding issue derived from pr94234.  A piece of code like:

 int foo (int n)
 {
int t1 = 8 * n;
int t2 = 8 * (n - 1);

return t1 - t2;
 }

It can be perfectly caught by the rule "(A * C) +- (B * C) -> (A +- B) * C", and
be folded to the constant "8". But this folding will fail if both t1 and t2 have
multiple uses, as in the following code.

 int foo (int n)
 {
int t1 = 8 * n;
int t2 = 8 * (n - 1);

use_fn (t1, t2);
return t1 - t2;
 }

Given an expression with non-single-use operands, folding it will introduce
duplicated computation in most situations, and is deemed unprofitable.
But it is always beneficial if the final result is a constant or an existing SSA value.

And the rule is:
 (simplify
  (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
  (if ((!ANY_INTEGRAL_TYPE_P (type)
        || TYPE_OVERFLOW_WRAPS (type)
        || (INTEGRAL_TYPE_P (type)
            && tree_expr_nonzero_p (@0)
            && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
       /* If @1 +- @2 is constant require a hard single-use on either
          original operand (but not on both).  */
       && (single_use (@3) || single_use (@4)))   <- control whether match or not
   (mult (plusminus @1 @2) @0)))

The current matcher only provides a way to check something before folding,
but no mechanism to affect the decision after folding. If it had one, for the
above case, we could let the fold proceed when we find the result is a constant.


:s already has a counter-measure where it still folds if the output is at 
most one operation. So this transformation has a counter-counter-measure 
of checking single_use explicitly. And now we want a counter^3-measure...



Like the flags used to describe input operands, we could also add
a new flag to specify this kind of constraint on the output: that we
expect it to be a simple gimple value.

Proposed syntax is

 (opcode:v{ condition } )

The char "v" stands for gimple value; if a more descriptive character
exists, that is preferred. "condition", enclosed by { }, is an optional
C-syntax condition expression. If present, the matcher will check whether
the folding result is a gimple value, using
gimple_simplified_result_is_gimple_val (), only when "condition" is met.

Since there is no SSA concept in GENERIC, this is only for GIMPLE-match,
not GENERIC-match.

With this syntax, the rule is changed to

#Form 1:
 (simplify
  (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
  (if (!ANY_INTEGRAL_TYPE_P (type)
       || TYPE_OVERFLOW_WRAPS (type)
       || (INTEGRAL_TYPE_P (type)
           && tree_expr_nonzero_p (@0)
           && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
   (if (!single_use (@3) && !single_use (@4))
    (mult:v (plusminus @1 @2) @0)
    (mult (plusminus @1 @2) @0))))


That seems to match what you can do with '!' now (that's very recent).
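
For reference, a sketch of what the '!' version might look like (not the
actual match.pd change; '!' tells genmatch to accept the result only if it
simplifies down to a single gimple value):

 (simplify
  (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
  (if (...)
   (mult! (plusminus @1 @2) @0)))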


#Form 2:
 (simplify
  (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
  (if (!ANY_INTEGRAL_TYPE_P (type)
       || TYPE_OVERFLOW_WRAPS (type)
       || (INTEGRAL_TYPE_P (type)
           && tree_expr_nonzero_p (@0)
           && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
   (mult:v{ !single_use (@3) && !single_use (@4) } (plusminus @1 @2) @0)))


Indeed, something more flexible than '!' would be nice, but I am not so 
sure about this version. If we are going to allow inserting code after 
resimplification and before validation, maybe we should go even further 
and let people insert arbitrary code there...


--
Marc Glisse


Re: [RFC] Add new flag to specify output constraint in match.pd

2020-09-02 Thread Marc Glisse

On Wed, 2 Sep 2020, Richard Biener via Gcc wrote:


On Mon, Aug 24, 2020 at 8:20 AM Feng Xue OS via Gcc  wrote:



  There is a match-folding issue derived from pr94234.  A piece of code like:

  int foo (int n)
  {
 int t1 = 8 * n;
 int t2 = 8 * (n - 1);

 return t1 - t2;
  }

 It can be perfectly caught by the rule "(A * C) +- (B * C) -> (A +- B) * C", and
 be folded to the constant "8". But this folding will fail if both t1 and t2 have
 multiple uses, as in the following code.

  int foo (int n)
  {
 int t1 = 8 * n;
 int t2 = 8 * (n - 1);

 use_fn (t1, t2);
 return t1 - t2;
  }

 Given an expression with non-single-use operands, folding it will introduce
 duplicated computation in most situations, and is deemed unprofitable.
 But it is always beneficial if the final result is a constant or an existing
 SSA value.

 And the rule is:
  (simplify
   (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
   (if ((!ANY_INTEGRAL_TYPE_P (type)
         || TYPE_OVERFLOW_WRAPS (type)
         || (INTEGRAL_TYPE_P (type)
             && tree_expr_nonzero_p (@0)
             && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
        /* If @1 +- @2 is constant require a hard single-use on either
           original operand (but not on both).  */
        && (single_use (@3) || single_use (@4)))   <- control whether match or not
    (mult (plusminus @1 @2) @0)))

 The current matcher only provides a way to check something before folding,
 but no mechanism to affect the decision after folding. If it had one, for the
 above case, we could let the fold proceed when we find the result is a constant.


:s already has a counter-measure where it still folds if the output is at
most one operation. So this transformation has a counter-counter-measure
of checking single_use explicitly. And now we want a counter^3-measure...


The counter-measure is a key factor in the matching cost.  ":s" seems
somewhat coarse-grained, and here we do need more control over it.

But ideally, we could decouple these counter-measures from the definitions
of match rules, and let the gimple matcher make a more reasonable
match-or-not decision based on these counters. Anyway, that is another story.


 Like the flags used to describe input operands, we could also add
 a new flag to specify this kind of constraint on the output: that we
 expect it to be a simple gimple value.

 Proposed syntax is

  (opcode:v{ condition } )

 The char "v" stands for gimple value; if a more descriptive character
 exists, that is preferred. "condition", enclosed by { }, is an optional
 C-syntax condition expression. If present, the matcher will check whether
 the folding result is a gimple value, using
 gimple_simplified_result_is_gimple_val (), only when "condition" is met.

 Since there is no SSA concept in GENERIC, this is only for GIMPLE-match,
 not GENERIC-match.

 With this syntax, the rule is changed to

 #Form 1:
  (simplify
   (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
   (if (!ANY_INTEGRAL_TYPE_P (type)
        || TYPE_OVERFLOW_WRAPS (type)
        || (INTEGRAL_TYPE_P (type)
            && tree_expr_nonzero_p (@0)
            && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
    (if (!single_use (@3) && !single_use (@4))
     (mult:v (plusminus @1 @2) @0)
     (mult (plusminus @1 @2) @0))))


That seems to match what you can do with '!' now (that's very recent).


It's also what :s does, but a slight bit more "local".  When any operand is
marked :s and has more than a single use, we only allow simplifications
that do not require insertion of extra stmts.  So basically the above pattern
doesn't behave any different than if you omit your :v.  Only if you'd
place :v on an inner expression there would be a difference.  Correlating
the inner expression we'd not want to insert new expressions for with
a specific :s (or multiple ones) would be a more natural extension of what
:s provides.

Thus, for the above case (Form 1), you do not need :v at all and :s works.


Let's consider that multiplication is expensive. We have code like 
5*X-3*X, which can be simplified to 2*X. However, if both 5*X and 3*X have 
other uses, that would increase the number of multiplications. :s would 
not block a simplification to 2*X, which is a single stmt. So the existing 
transformation has extra explicit checks for single_use. And those extra 
checks block the transformation even for 5*X-4*X -> X which does not 
increase the number of multiplications. Which is where '!' (or :v here) 
comes in.
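
In code, the case being discussed looks like this (illustration only):

int f (int x, int *p)
{
  int a = 5 * x;
  int b = 3 * x;
  p[0] = a;        /* extra uses keep both multiplications alive */
  p[1] = b;
  return a - b;    /* folding to 2*x would add a third multiplication */
}

With 4 * x instead of 3 * x, a - b simplifies to plain x and no
multiplication is added, yet the single_use checks still block it.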


Or we could decide that the extra multiplication is not that bad if it 
saves an addition, simplifies the expression, possibly gains more insn 
parallelism, etc, in which case we could just drop the existing hard 
single_use check...


--
Marc Glisse


Re: A couple GIMPLE questions

2020-09-05 Thread Marc Glisse

On Sat, 5 Sep 2020, Gary Oblock via Gcc wrote:


First off, one of the questions is just me being curious, but the
second is quite serious. Note, this is GIMPLE coming
into my optimization, not something I've modified.

Here's the C code:

type_t *
do_comp( type_t *data, size_t len)
{
 type_t *res;
 type_t *x = min_of_x( data, len);
 type_t *y = max_of_y( data, len);

 res = y;
 if ( x < y ) res = 0;
 return res;
}

And here's the resulting GIMPLE:

;; Function do_comp.constprop (do_comp.constprop.0, funcdef_no=5, 
decl_uid=4392, cgraph_uid=3, symbol_order=68) (executed once)

do_comp.constprop (struct type_t * data)
{
 struct type_t * res;
 struct type_t * x;
 struct type_t * y;
 size_t len;

  <bb 2> [local count: 1073741824]:

  <bb 3> [local count: 1073741824]:
  x_2 = min_of_x (data_1(D), 1);
  y_3 = max_of_y (data_1(D), 1);
  if (x_2 < y_3)
    goto <bb 4>; [29.00%]
  else
    goto <bb 5>; [71.00%]

  <bb 4> [local count: 311385128]:

  <bb 5> [local count: 1073741824]:
  # res_4 = PHI <y_3(3), 0B(4)>
  return res_4;

}

The silly question first: in the "if" stmt, how does GCC
get those probabilities, which it shows as 29.00% and
71.00%? I believe they should both be 50.00%.


See the profile_estimate pass dump. One branch makes the function return 
NULL, which makes gcc guess that it may be a bit less likely than the 
other. Those are heuristics, which are tuned to help on average, but of 
course they are sometimes wrong.
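
For example, the heuristics can be inspected with the usual pass dump
options (flag spelling per the standard -fdump-tree-<pass> scheme):

gcc -O2 -fdump-tree-profile_estimate-details test.c

and looking for the lines naming the predictor (something like "null
return") in the generated .profile_estimate dump file.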



The serious question is what is going on with this phi?
   res_4 = PHI <y_3(3), 0B(4)>

This makes zero sense practicality wise to me and how is
it supposed to be recognized and used? Note, I really do
need to transform the "0B" into something else for my
structure reorganization optimization.


That's not a question? Are you asking why PHIs exist at all? They are the 
standard way to represent merging in SSA representations. You can iterate 
on the PHIs of a basic block, etc.
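
A minimal sketch with the internal API (untested):

  for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
       gsi_next (&gsi))
    {
      gphi *phi = gsi.phi ();
      for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
        {
          tree arg = gimple_phi_arg_def (phi, i);    /* y_3 or 0B here */
          edge e = gimple_phi_arg_edge (phi, i);     /* incoming edge */
          /* ... */
        }
    }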



CONFIDENTIALITY NOTICE: This e-mail message, including any attachments, is for 
the sole use of the intended recipient(s) and contains information that is 
confidential and proprietary to Ampere Computing or its subsidiaries. It is to 
be used solely for the purpose of furthering the parties' business 
relationship. Any unauthorized review, copying, or distribution of this email 
(or any attachments thereto) is strictly prohibited. If you are not the 
intended recipient, please contact the sender immediately and permanently 
delete the original and any copies of this email and any attachments thereto.


Could you please get rid of this when posting on public mailing lists?

--
Marc Glisse


Re: Installing a generated header file

2020-11-12 Thread Marc Glisse

On Thu, 12 Nov 2020, Bill Schmidt via Gcc wrote:

Hi!  I'm working on a project where it's desirable to generate a 
target-specific header file while building GCC, and install it with the 
rest of the target-specific headers (i.e., in 
lib/gcc//11.0.0/include).  Today it appears that only those 
headers listed in "extra_headers" in config.gcc will be placed there, 
and those are assumed to be found in gcc/config/<target>/.  In my case, 
the header file will end up in my build directory instead.


Questions:

* Has anyone tried something like this before?  I didn't find anything.
* If so, can you please point me to an example?
* Otherwise, I'd be interested in advice about providing new infrastructure to 
support
 this.  I'm a relative noob with respect to the configury code, and I'm sure my
 initial instincts will be wrong. :)


Does the i386 mm_malloc.h file match your scenario?

--
Marc Glisse


Re: Reassociation and trapping operations

2020-11-24 Thread Marc Glisse

On Wed, 25 Nov 2020, Ilya Leoshkevich via Gcc wrote:


I have a C floating point comparison (a <= b && a >= b), which
test_for_singularity turns into (a <= b && a == b) and vectorizer turns
into ((a <= b) & (a == b)).  So far so good.

eliminate_redundant_comparison, however, turns it into just (a == b).
I don't think this is correct, because (a <= b) traps and (a == b)
doesn't.



Hello,

let me just mention the old
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53805
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53806

There has been some debate about the exact meaning of -ftrapping-math, but 
don't let that stop you.
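
To illustrate the difference being discussed (a sketch; strictly speaking
FENV_ACCESS is needed, and the behavior depends on -ftrapping-math and the
target):

#include <fenv.h>

int compare_kinds (double a, double b)
{
  feclearexcept (FE_ALL_EXCEPT);
  int le = (a <= b);   /* signaling comparison: raises FE_INVALID on NaN */
  int eq = (a == b);   /* quiet comparison: no exception for quiet NaNs */
  return le + eq + !!fetestexcept (FE_INVALID);
}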


--
Marc Glisse


Re: The conditions when convert from double to float is permitted?

2020-12-10 Thread Marc Glisse

On Thu, 10 Dec 2020, Xionghu Luo via Gcc wrote:


I have a maybe silly question about whether there is any *standard*
or *option* (like -ffast-math) for GCC that allows double-to-float
demotion optimization?  For example,

1) from PR22326:

#include <math.h>

float foo(float f, float x, float y) {
return (fabs(f)*x+y);
}

fabs returns a double result, but it could be demoted to float
since the function returns float in the end.


With fp-contract, this is (float)fma((double)f,(double)x,(double)y). This 
could almost be transformed into fmaf(f,x,y), except that the double 
rounding may not be strictly equivalent. Still, that seems like it would 
be no problem with -funsafe-math-optimizations, just like turning 
(float)((double)x*(double)y) into x*y, as long as it is a single operation 
with casts on all inputs and output. Whether there are cases that can be 
optimized without -funsafe-math-optimizations is harder to tell.
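
For concreteness, the two forms being compared (a sketch; the difference is
exactly the double rounding mentioned above):

#include <math.h>

float f1 (float f, float x, float y)
{
  /* fp-contract form: exact product-sum rounded to double, then to float. */
  return (float) fma ((double) fabsf (f), (double) x, (double) y);
}

float f2 (float f, float x, float y)
{
  /* single rounding, directly to float */
  return fmaf (fabsf (f), x, y);
}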


--
Marc Glisse


Re: Integer division on x86 -m32

2020-12-10 Thread Marc Glisse

On Thu, 10 Dec 2020, Lucas de Almeida via Gcc wrote:


when performing (int64_t) foo / (int32_t) bar in gcc under x86, a call to
__divdi3 is always output, even though it seems the use of the idiv
instruction could be faster.


IIRC, idiv requires that the quotient fit in 32 bits, while your C code 
doesn't guarantee that. (1LL << 60) / 3 would cause an error with idiv.


It would be possible to use idiv in some cases, if the compiler can prove 
that variables are in the right range, but that's not so easy. You can use 
inline asm to force the use of idiv if you know it is safe for your case, 
the most common being modular arithmetic: if you know that uint32_t a, b, 
c, d are smaller than m (and m!=0), you can compute a*b+c+d in uint64_t, 
then use div to compute that modulo m.
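
A hedged sketch of that trick on x86 (GNU C inline asm; it assumes a, b, c,
d < m and m != 0, so the quotient fits in 32 bits and divl cannot fault):

#include <stdint.h>

static inline uint32_t
madd_mod (uint32_t a, uint32_t b, uint32_t c, uint32_t d, uint32_t m)
{
  uint64_t t = (uint64_t) a * b + c + d;   /* <= (m-1)^2 + 2*(m-1) < m*m */
  uint32_t q, r;
  __asm__ ("divl %4"
           : "=a" (q), "=d" (r)
           : "a" ((uint32_t) t), "d" ((uint32_t) (t >> 32)), "rm" (m)
           : "cc");
  return r;   /* (a*b + c + d) mod m */
}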


--
Marc Glisse


Re: What is the type of vector signed + vector unsigned?

2020-12-29 Thread Marc Glisse

On Tue, 29 Dec 2020, Richard Sandiford via Gcc wrote:


Any thoughts on what f should return in the following testcase, given the
usual GNU behaviour of treating signed >> as arithmetic shift right?

   typedef int vs4 __attribute__((vector_size(16)));
   typedef unsigned int vu4 __attribute__((vector_size(16)));
   int
   f (void)
   {
 vs4 x = { -1, -1, -1, -1 };
 vu4 y = { 0, 0, 0, 0 };
 return ((x + y) >> 1)[0];
   }

The C frontend takes the type of x+y from the first operand, so x+y
is signed and f returns -1.


Symmetry is an important property of addition in C/C++.


The C++ frontend applies similar rules to x+y as it would to scalars,
with unsigned T having a higher rank than signed T, so x+y is unsigned
and f returns 0x7fffffff.


That looks like the most natural choice.


FWIW, Clang treats x+y as signed, so f returns -1 for both C and C++.


I think clang follows gcc and uses the type of the first operand.

--
Marc Glisse


Re: bug in DSE?

2021-02-12 Thread Marc Glisse

On Fri, 12 Feb 2021, Andrew MacLeod via Gcc wrote:

I don't want to immediately open a PR, so I'll just ask about 
testsuite/gcc.dg/pr83609.c.


the compilation string  is
  -O2 -fno-tree-forwprop -fno-tree-ccp -fno-tree-fre -fno-tree-pre 
-fno-code-hoisting


Which passes as is.

if I however add -fno-tree-vrp as well, then it looks like dead store 
elimination maybe does something wrong...


with EVRP running, we translate function foo() from


complex float foo ()
{
  complex float c;
  complex float * c.0_1;
  complex float _4;

   :
  c.0_1 = &c;
  MEM[(long long unsigned int *)c.0_1] = 1311768467463790320;
  _4 = c;


Isn't that a clear violation of strict aliasing?

--
Marc Glisse


Re: Possible issue with ARC gcc 4.8

2015-07-05 Thread Marc Glisse

On Mon, 6 Jul 2015, Vineet Gupta wrote:


It is the C language standard that says that shifts like this invoke
undefined behavior.


Right, but the compiler is a program nevertheless, and it knows what to do
when it sees 1 << 62.
It's not like there is an uninitialized variable or something which would
provide unexpected behaviour.
More importantly, the question is whether ports can define a specific
behaviour for such cases, and whether that would be sufficient to guarantee
the semantics.

The point is that the ARC ISA provides a neat feature where the core only
considers the lower 5 bits of bit-position operands. Thus we can make such
behaviour not only deterministic in the context of ARC, but also optimal,
eliding the need for specific masking/clamping to 5 bits.


IMO, writing a << (b & 31) instead of a << b has only advantages. It 
documents the behavior you are expecting. It makes the code 
standard-conformant and portable. And the back-ends can provide patterns 
for exactly this so they generate a single insn (the same as for a << b).
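
For instance (on targets whose shifter masks the count, the & 31 is folded
into the shift instruction itself):

unsigned lsl (unsigned a, unsigned b)
{
  return a << (b & 31);   /* well defined for any b, still one shift insn */
}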


When I see x << 1024, 0 is the only value that makes sense to me, and I'd 
much rather get undefined behavior (detected by sanitizers) than silently 
get 'x' back.


--
Marc Glisse


Re: [RFH] Move some flag_unsafe_math_optimizations using simplify and match

2015-08-11 Thread Marc Glisse

On Fri, 7 Aug 2015, Hurugalawadi, Naveen wrote:


Please find attached the patch "simplify-1.patch" that moves some
"flag_unsafe_math_optimizations" from fold-const.c to simplify and match.


Some random comments (not a review).

First, patches go to gcc-patches@gcc.gnu.org.


 /* fold_builtin_logarithm */
 (if (flag_unsafe_math_optimizations)


Please indent everything below by one space.


+
+/* Simplify sqrt(x) * sqrt(x) -> x.  */
+(simplify
+ (mult:c (SQRT @0) (SQRT @0))


(mult (SQRT@1 @0) @1)


+ (if (!HONOR_SNANS (element_mode (type)))


You don't need element_mode here, HONOR_SNANS (type) should do the right
thing.


+  @0))
+
+/* Simplify root(x) * root(y) -> root(x*y).  */
+/* FIXME : cbrt ICE's with AArch64.  */
+(for root (SQRT CBRT)


Indent below.


+(simplify
+ (mult:c (root @0) (root @1))


No need to commute, it yields the same pattern. On the other hand, you
may want root:s since if the roots are going to be computed anyway, a
multiplication is cheaper than computing yet another root (I didn't
check what the existing code does).
(this applies to several other patterns)


+  (root (mult @0 @1))))
+
+/* Simplify expN(x) * expN(y) -> expN(x+y). */
+(for exps (EXP EXP2)
+/* FIXME : exp2 ICE's with AArch64.  */
+(simplify
+ (mult:c (exps @0) (exps @1))
+  (exps (plus @0 @1))))


I am wondering if we should handle mixed operations (say
expf(x)*exp2(y)), for this pattern and others, but that's not a
prerequisite.


+
+/* Simplify pow(x,y) * pow(x,z) -> pow(x,y+z). */
+(simplify
+ (mult:c (POW @0 @1) (POW @0 @2))
+  (POW @0 (plus @1 @2)))
+
+/* Simplify pow(x,y) * pow(z,y) -> pow(x*z,y). */
+(simplify
+ (mult:c (POW @0 @1) (POW @2 @1))
+  (POW (mult @0 @2) @1))
+
+/* Simplify tan(x) * cos(x) -> sin(x). */
+(simplify
+ (mult:c (TAN @0) (COS @0))
+  (SIN @0))


Since this will only trigger for the same version of cos and tan (say cosl 
with tanl or cosf with tanf), I am wondering if we get smaller code with a 
linear 'for' or with a quadratic 'for' which shares the same tail (I 
assume the above is quadratic, I did not check). This may depend on 
Richard's latest patches.



+
+/* Simplify x * pow(x,c) -> pow(x,c+1). */
+(simplify
+ (mult:c @0 (POW @0 @1))
+ (if (TREE_CODE (@1) == REAL_CST
+  && !TREE_OVERFLOW (@1))
+  (POW @0 (plus @1 { build_one_cst (type); }))))
+
+/* Simplify sin(x) / cos(x) -> tan(x). */
+(simplify
+ (rdiv (SIN @0) (COS @0))
+  (TAN @0))
+
+/* Simplify cos(x) / sin(x) -> 1 / tan(x). */
+(simplify
+ (rdiv (COS @0) (SIN @0))
+  (rdiv { build_one_cst (type); } (TAN @0)))
+
+/* Simplify sin(x) / tan(x) -> cos(x). */
+(simplify
+ (rdiv (SIN @0) (TAN @0))
+ (if (! HONOR_NANS (@0)
+  && ! HONOR_INFINITIES (element_mode (@0)))
+  (cos @0)))
+
+/* Simplify tan(x) / sin(x) -> 1.0 / cos(x). */
+(simplify
+ (rdiv (TAN @0) (SIN @0))
+ (if (! HONOR_NANS (@0)
+  && ! HONOR_INFINITIES (element_mode (@0)))
+  (rdiv { build_one_cst (type); } (COS @0))))
+
+/* Simplify pow(x,c) / x -> pow(x,c-1). */
+(simplify
+ (rdiv (POW @0 @1) @0)
+ (if (TREE_CODE (@1) == REAL_CST
+  && !TREE_OVERFLOW (@1))
+  (POW @0 (minus @1 { build_one_cst (type); }))))
+
+/* Simplify a/root(b/c) into a*root(c/b).  */
+/* FIXME : cbrt ICE's with AArch64.  */
+(for root (SQRT CBRT)
+(simplify
+ (rdiv @0 (root (rdiv @1 @2)))
+  (mult @0 (root (rdiv @2 @1)))))
+
+/* Simplify x / expN(y) into x*expN(-y). */
+/* FIXME : exp2 ICE's with AArch64.  */
+(for exps (EXP EXP2)
+(simplify
+ (rdiv @0 (exps @1))
+  (mult @0 (exps (negate @1)))))
+
+/* Simplify x / pow (y,z) -> x * pow(y,-z). */
+(simplify
+ (rdiv @0 (POW @1 @2))
+  (mult @0 (POW @1 (negate @2))))
+
  /* Special case, optimize logN(expN(x)) = x.  */
  (for logs (LOG LOG2 LOG10)
   exps (EXP EXP2 EXP10)


--
Marc Glisse


Re: Replacing malloc with alloca.

2015-09-14 Thread Marc Glisse

On Sun, 13 Sep 2015, Ajit Kumar Agarwal wrote:


The replacement of malloc with alloca can be done based on the following
analysis: if the lifetime of an object does not stretch beyond its immediate
scope, the malloc can be replaced with alloca. This increases performance to
a great extent.

Inlining helps a lot here: after inlining, it is much more common that the
lifetime of an object does not stretch beyond its immediate scope, so more
candidates for replacing malloc with alloca can be identified.

I am wondering in which phase of our optimization pipeline malloc is
replaced with alloca, and what analysis is done for the transformation.
Does this greatly increase the performance of benchmarks? Is the analysis
done through escape analysis?

If yes, then what data structure is used for the abstract interpretation?


Did you try it? I don't think gcc ever replaces malloc with alloca. The 
only optimization we do with malloc/free is removing it when it is 
obviously unused. There are several PRs open about possible optimizations 
(19831 for instance).
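
That removal is easy to observe (small illustration):

#include <stdlib.h>

int f (void)
{
  void *p = malloc (32);   /* gcc deletes this malloc/free pair, since */
  free (p);                /* the allocated memory is obviously unused */
  return 0;
}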


I posted a WIP patch a couple years ago to replace some malloc+free with 
local arrays (fixed length) but never had time to finish it.

https://gcc.gnu.org/ml/gcc-patches/2013-11/msg03108.html

--
Marc Glisse


Re: Multiprecision Arithmetic Builtins

2015-09-21 Thread Marc Glisse

On Mon, 21 Sep 2015, Florian Weimer wrote:


On 09/21/2015 08:09 AM, Oleg Endo wrote:

Hi all,

I was thinking of adding some SH specific builtin functions for the
addc, subc and negc instructions.

Are there any plans to add clang's target independent multiprecision
arithmetic builtins (http://clang.llvm.org/docs/LanguageExtensions.html)
to GCC?


Do you mean these?

<https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html>

Is there something else that is missing?


http://clang.llvm.org/docs/LanguageExtensions.html#multiprecision-arithmetic-builtins

Those that take a carryin argument.
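
For reference, the carry-in variants can be approximated with the builtins
from the page Florian linked (a sketch):

unsigned addc (unsigned a, unsigned b, unsigned carry_in,
               unsigned *carry_out)
{
  unsigned s;
  unsigned c1 = __builtin_add_overflow (a, b, &s);
  unsigned c2 = __builtin_add_overflow (s, carry_in, &s);
  *carry_out = c1 | c2;   /* at most one of the two adds can carry */
  return s;
}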

--
Marc Glisse


Re: avoiding recursive calls of calloc due to optimization

2015-09-21 Thread Marc Glisse

On Mon, 21 Sep 2015, Daniel Gutson wrote:


This is derived from https://gcc.gnu.org/ml/gcc-help/2015-03/msg00091.html

Currently, gcc provides an optimization that transforms a call to
malloc and a call to memset into a call to calloc.
This is fine except when it takes place within the calloc() function
implementation itself, causing a recursive call.
Two alternatives have been proposed: -fno-malloc-builtin and disable
optimizations in calloc().
I think the former is suboptimal since it affects all the code just
because of the implementation of one function (calloc()),
whereas the latter is suboptimal too since it disables the
optimizations in the whole function (calloc too).
I think of two alternatives: either make -fno-calloc-builtin to
disable the optimization, or make the optimization aware of the
function context where it is operating and prevent it to do the
transformation if the function is calloc().

Please help me to find the best alternative so we can implement it.


You may want to read this PR for more context

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888#c27

--
Marc Glisse


Re: complex support when using -std=c++11

2015-11-12 Thread Marc Glisse

On Thu, 12 Nov 2015, D Haley wrote:

I am currently trying to understand an issue to do with complex number 
support in gcc.


Consider the following code:

#include <complex.h>
int main()
{
   float _Complex  a = _Complex_I;

}

Attempting to compile this with  these commands is fine:
$ g++ tmp.cpp -std=gnu++11
$ g++ tmp.cpp

Clang is also fine:
$ clang tmp.cpp -std=c++11


Not here, I am getting the same error with clang (or "use of undeclared 
identifier '_Complex_I'" with libc++). This probably depends more on your 
libc.



Attempting to compile with c++11 is not:
$ g++ tmp.cpp -std=c++11
In file included from /usr/include/c++/5/complex.h:36:0,
from tmp.cpp:2:
tmp.cpp: In function ‘int main()’:
tmp.cpp:5:29: error: unable to find numeric literal operator ‘operator""iF’
float _Complex  a = _Complex_I;
^
tmp.cpp:5:29: note: use -std=gnu++11 or -fext-numeric-literals to enable more 
built-in suffixes


I'm using debian testing's gcc:
$ gcc --version
gcc (Debian 5.2.1-17) 5.2.1 20150911
...


I discussed this on #gcc, and it was suggested (or I misunderstood) that this 
is intentional, and the library should not support c-type C++ primitives - 
however I can find no deprecation notice for this, nor does it appear that 
the c++11 standard (as far as I can see from a quick skim) has changed the 
behaviour in this regard.


Is this intended behaviour, or is this a bug? This behaviour was noticed when 
troubleshooting compilation behaviours in mathgl.


https://groups.google.com/forum/?_escaped_fragment_=topic/mathgl/cl4uYygPmOU#!topic/mathgl/cl4uYygPmOU


C++11, for some unknown reason, decided to hijack the C header complex.h 
and make it equivalent to the C++ header complex. The fact that you are 
still getting _Complex_I defined is already a gcc extension, as is 
providing _Complex in C++.


The C++ standard introduced User Defined Literals, which prevents the 
compiler from recognizing extra suffixes like iF in standard mode (why are 
so many people using c++11 and not gnu++11?).


Our support for complex.h in C++11 in gcc is kind of best-effort. In this 
case, I can think of a couple ways we could improve this


* _Complex_I is defined as (__extension__ 1.0iF). Maybe __extension__ 
could imply -fext-numeric-literals?


* glibc could define _Complex_I some other way, or libstdc++ could 
redefine it to some other safer form (for some reason __builtin_complex is 
currently C-only).
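
In the meantime, a workaround that avoids the literal suffix entirely (GNU
extension; it appears to work in both c++11 and gnu++11 modes):

float _Complex make_i (void)
{
  float _Complex a = 0;
  __imag__ a = 1.0f;   /* same value as _Complex_I, no iF literal needed */
  return a;
}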


--
Marc Glisse


Re: GCC 5.4 Status report (2015-12-04)

2015-12-04 Thread Marc Glisse

On Fri, 4 Dec 2015, NightStrike wrote:


Will there be another 4.9 release, too?  I'm really hoping that branch
can stay open a bit, since I can't upgrade to the new std::string
implementation yet.


Uh? The new ABI in libstdc++ is supposed to be optional, you can still use 
the old std::string in gcc-5, can't you?
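
For instance, with the dual ABI the old layout is selected per translation
unit (assuming a gcc-5 libstdc++):

g++-5 -D_GLIBCXX_USE_CXX11_ABI=0 -c file.cpp   # old std::string layout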


--
Marc Glisse


RE: GCC Front-End Questions

2015-12-08 Thread Marc Glisse

On Tue, 8 Dec 2015, Jodi A. Miller wrote:

One algebraic simplification we are seeing is particularly interesting.  
Given the following code snippet intended to check for buffer overflow, 
which is actually undefined behavior in C++, we expected to maybe see 
the if check optimized away entirely.




char buffer[100];
int length;  //value received through argument or command line
.
.
if (buffer + length < buffer)
{
    cout << "Overflow" << endl;
}


Instead, our assembly code showed that the conditional was changed to 
length < 0, which is not what was intended at all.  Again, this showed 
up in the first IR file generated with g++ so we are thinking it 
happened in the compiler front-end, which is surprising.  Any thoughts 
on this?  In addition, when the above conditional expression is not used 
as part of an if check (e.g., assigned to a Boolean), it is not 
simplified.


Those optimizations during parsing exist mostly for historical reasons, 
and we are slowly moving away from them. You can look for any function 
call including "fold" in its name in the front-end. They work on 
expressions and mostly consist of matching patterns (described in 
fold-const.c and match.pd), like p + n < p in this case.
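
For reference, a form of the check that does not rely on pointer overflow
(illustration only):

if (length < 0 || (size_t) length > sizeof buffer)
{
    cout << "Overflow" << endl;
}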


--
Marc Glisse


Re: Strange C++ function pointer test

2015-12-31 Thread Marc Glisse

On Thu, 31 Dec 2015, Dominik Vogt wrote:


This snippet ist from the Plumhall 2014 xvs test suite:

 #if CXX03 || CXX11 || CXX14
 static float (*p1_)(float) = abs;
 ...
 checkthat(__LINE__, p1_ != 0);
 #endif

(With the testsuite specific macros doing the obvious).  abs() is
declared as:

 int abs(int j)

Am I missing some odd C++ feature or is that part of the test just
plain wrong?  I don't know where to look in the C++ standard; is
this supposed to compile (with or without a warning?) or generate
an error or is it just undefined?

 error: invalid conversion from ‘int (*)(int) throw ()’ to ‘float (*)(float)’ 
[-fpermissive]

(Of course even with -fpermissive this won't work because (at
least on my platform) ints are passed in different registers than
floats.)


There are other overloads of 'abs' declared in math.h / cmath (only in 
namespace std in the second case, and there are bugs (or standard issues) 
about having them in the global namespace for the first one).


--
Marc Glisse


Re: Strange C++ function pointer test

2015-12-31 Thread Marc Glisse

On Thu, 31 Dec 2015, Jonathan Wakely wrote:


There are other overloads of 'abs' declared in math.h / cmath (only in
namespace std in the second case, and there are bugs (or standard issues)
about having them in the global namespace for the first one).


That's not quite accurate, C++11 was altered slightly to reflect reality.

<cmath> is required to declare std::abs and it's unspecified whether
it also declares it as ::abs.

<math.h> is required to declare ::abs and it's unspecified whether it
also declares it as std::abs.


$ cat a.cc
#include <cmath>
int main(){
  abs(3.5);
}

$ g++-snapshot a.cc -c -Wall -W
a.cc: In function 'int main()':
a.cc:3:10: error: 'abs' was not declared in this scope
   abs(3.5);
  ^

That's what I called "bug" in my message (there are a few bugzilla PRs for 
this). It would probably work on Solaris.


And I seem to remember there are at least 2 open LWG issues on the topic, 
one saying that the C++11 change didn't go far enough to match reality, 
since it still documents C headers differently from the C standard, and 
one saying that all overloads of abs should be declared as soon as one is 
(yes, they contradict each other).


--
Marc Glisse


Re: Strange C++ function pointer test

2015-12-31 Thread Marc Glisse

On Thu, 31 Dec 2015, Dominik Vogt wrote:


The minimal failing program is

-- abs.C --
#include <stdlib.h>
static float (*p1_)(float) = abs;
-- abs.C --


This is allowed to fail. If you include math.h (in addition or instead of 
stdlib.h), it has to work (gcc bug if it doesn't).


See also
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#2294

--
Marc Glisse


Re: getting bugzilla access for my account

2016-01-02 Thread Marc Glisse

On Sat, 2 Jan 2016, Mike Frysinger wrote:


seeing as how i have commit access to the gcc tree, could i have
my bugzilla privs extended as well ?  atm i only have normal ones
which means i only get to edit my own bugs ... can't dupe/update
other ones people have filed.  couldn't seem to find docs for how
to request this, so spamming this list.

my account on gcc.gnu.org/bugzilla is "vap...@gentoo.org".


Permissions are automatic for @gcc.gnu.org addresses; you should create a new 
account with that one (you can make it follow the old account, etc).


--
Marc Glisse


Re: RFC: Update Intel386, x86-64 and IA MCU psABIs for passing/returning empty struct

2016-02-20 Thread Marc Glisse

On Sat, 20 Feb 2016, H.J. Lu wrote:


On Fri, Feb 19, 2016 at 1:07 PM, Richard Smith  wrote:

On Fri, Feb 19, 2016 at 5:35 AM, Michael Matz  wrote:

Hi,

On Thu, 18 Feb 2016, Richard Smith wrote:


An empty type is a type where it and all of its subobjects
(recursively) are of class, structure, union, or array type.  No
memory slot nor register should be used to pass or return an object
of empty type.


The trivially copyable is gone again.  Why is it not necessary?


The C++ ABI doesn't defer to the C psABI for types that aren't
trivially-copyable. See
http://mentorembedded.github.io/cxx-abi/abi.html#normal-call


Hmm, yes, but we don't want to define something for only C and C++, but
language independend (so far as possible).  And given only the above
language I think this type:

struct S {
  S() {something();}
};

would be an empty type, and that's not what we want.


Yes it is. Did you mean to give S a copy constructor, copy assignment
operator, or destructor instead?


"Trivially copyable"
is a reasonably common abstraction (if in doubt we could even define it in
the ABI), and captures the idea that we need well (namely that a bit-copy
is enough).


In this case:

struct dummy0
{
};

struct dummy
{
 dummy0 d[20];

 dummy0 * foo (int i);
};

dummy0 *
dummy::foo (int i)
{
 return &d[i];
}

dummy0 *
bar (dummy d, int i)
{
 return d.foo (i);
}

dummy shouldn't be passed as empty type.


Why not?

We need to have a clear definition for what kinds of member functions 
are allowed in an empty type.


--
Marc Glisse


Re: Subtyping support in GCC?

2016-03-23 Thread Marc Glisse

On Wed, 23 Mar 2016, Jason Chagas wrote:


The ARM compiler (armcc) provides a subtyping ($Sub/$Super)
mechanism useful as a patching technique (see links below for
details). Can someone tell me if GCC has similar support? If so, where
can I learn more about it?

FYI, before posting this question here, I researched the web
extensively on this topic. There seems to be some GNU support for
subtyping in C++.  But I had no luck finding any information
specifically for 'C'.

Thanks,

Jason

How to use $Super$$ and $Sub$$ for patching data?:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15416.html

Using $Super$$ and $Sub$$ to patch symbol definitions:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0474c/Chdefdce.html


(the best list would have been gcc-help@gcc.gnu.org)

GNU ld has an option --wrap=symbol. Does that roughly match your need?
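
A minimal sketch of --wrap, in case it fits (symbol names per the ld
documentation):

/* patch.c */
#include <stddef.h>
void *__real_malloc (size_t);
void *__wrap_malloc (size_t n)
{
  /* patched behavior goes here, then fall through to the original */
  return __real_malloc (n);
}

/* link with: gcc main.o patch.o -Wl,--wrap=malloc */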

--
Marc Glisse


Re: Constexpr in intrinsics?

2016-03-27 Thread Marc Glisse

On Sun, 27 Mar 2016, Allan Sandfeld Jensen wrote:


Would it be possible to add constexpr to the intrinsics headers?

For instance _mm_set_XX and _mm_setzero intrinsics.


Already suggested here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65197

A patch would be welcome (I started doing it at some point, I don't 
remember if it was functional, the patch is attached).



Ideally it could also be added to all intrinsics that can be evaluated at compile
time, but it is harder to tell which those are.

Does gcc have a C extension we can use to set constexpr?


What for?

--
Marc Glisse

Index: gcc/config/i386/avx2intrin.h
===
--- gcc/config/i386/avx2intrin.h(revision 223886)
+++ gcc/config/i386/avx2intrin.h(working copy)
@@ -93,41 +93,45 @@ _mm256_packus_epi32 (__m256i __A, __m256
   return (__m256i)__builtin_ia32_packusdw256 ((__v8si)__A, (__v8si)__B);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_packus_epi16 (__m256i __A, __m256i __B)
 {
   return (__m256i)__builtin_ia32_packuswb256 ((__v16hi)__A, (__v16hi)__B);
 }
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_add_epi8 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v32qu)__A + (__v32qu)__B);
 }
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_add_epi16 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v16hu)__A + (__v16hu)__B);
 }
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_add_epi32 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v8su)__A + (__v8su)__B);
 }
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_add_epi64 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v4du)__A + (__v4du)__B);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_adds_epi8 (__m256i __A, __m256i __B)
@@ -167,20 +171,21 @@ _mm256_alignr_epi8 (__m256i __A, __m256i
 }
 #else
 /* In that case (__N*8) will be in vreg, and insn will not be matched. */
 /* Use define instead */
 #define _mm256_alignr_epi8(A, B, N)   \
   ((__m256i) __builtin_ia32_palignr256 ((__v4di)(__m256i)(A), \
(__v4di)(__m256i)(B),  \
(int)(N) * 8))
 #endif
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_and_si256 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v4du)__A & (__v4du)__B);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_andnot_si256 (__m256i __A, __m256i __B)
@@ -219,69 +224,77 @@ _mm256_blend_epi16 (__m256i __X, __m256i
   return (__m256i) __builtin_ia32_pblendw256 ((__v16hi)__X,
  (__v16hi)__Y,
   __M);
 }
 #else
 #define _mm256_blend_epi16(X, Y, M)\
   ((__m256i) __builtin_ia32_pblendw256 ((__v16hi)(__m256i)(X), \
(__v16hi)(__m256i)(Y), (int)(M)))
 #endif
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpeq_epi8 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v32qi)__A == (__v32qi)__B);
 }
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpeq_epi16 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v16hi)__A == (__v16hi)__B);
 }
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpeq_epi32 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v8si)__A == (__v8si)__B);
 }
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpeq_epi64 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v4di)__A == (__v4di)__B);
 }
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpgt_epi8 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v32qi)__A > (__v32qi)__B);
 }
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpgt_epi16 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v16hi)__A > (__v16hi)__B);
 }
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpgt_epi32 (__m256i __A, __m256i __B)
 {
   return (__m256i) (

Re: Constexpr in intrinsics?

2016-03-28 Thread Marc Glisse

On Mon, 28 Mar 2016, Allan Sandfeld Jensen wrote:


On Sunday 27 March 2016, Marc Glisse wrote:

On Sun, 27 Mar 2016, Allan Sandfeld Jensen wrote:

Would it be possible to add constexpr to the intrinsics headers?

For instance _mm_set_XX and _mm_setzero intrinsics.


Already suggested here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65197

A patch would be welcome (I started doing it at some point, I don't
remember if it was functional, the patch is attached).


That looks very similar to the patch I experimented with, and that at least
works for using them in C++11 constexpr functions.


Ideally it could also be added all intrinsics that can be evaluated at
compile time, but it is harder to tell which those are.

Does gcc have a C extension we can use to set constexpr?


What for?


To have similar functionality in C. For instance to explicitly allow those
functions to be evaluated at compile time, and values with similar attributes
be optimized completely out.


Those intrinsics that are implemented without builtins can already be 
evaluated at compile time.


#include <emmintrin.h>

__m128d f(){
  __m128d a=_mm_set_pd(1,2);
  __m128d b=_mm_setr_pd(4,3);
  return _mm_add_pd(a, b);
}

The generated asm is just

movapd  .LC0(%rip), %xmm0
ret

For the more esoteric intrinsics, what is missing is not in the parser, it 
is a folder that understands the behavior of each particular intrinsic.



And of course it avoids preprocessor noise in
shared C/C++ headers like these.


--
Marc Glisse


Re: Updating the GCC 6 release notes

2016-05-03 Thread Marc Glisse

On Tue, 3 May 2016, Damian Rouson wrote:


Could someone please tell me how to edit or submit edits for the GCC 6 release 
notes at https://gcc.gnu.org/gcc-6/changes.html?  Specifically, the listed Fortran 
improvements are missing several significant items.   I signed the copyright 
assignment in case that helps.


https://gcc.gnu.org/about.html#cvs

You can send a diff to gcc-patches@gcc.gnu.org to propose a patch 
(possibly Cc: the fortran mailing list if your patch is related), same as 
for code changes.


--
Marc Glisse


Re: Implicit conversion to a generic vector type

2016-05-25 Thread Marc Glisse

On Thu, 26 May 2016, martin krastev wrote:


Hello,

I've been scratching my head over an implicit conversion issue,
depicted in the following code:


typedef __attribute__ ((vector_size(4 * sizeof(int)))) int generic_int32x4;

struct Foo {
   Foo() {
   }
   Foo(const generic_int32x4& src) {
   }
   operator generic_int32x4() const {
   return (generic_int32x4){ 42 };
   }
};

struct Bar {
   Bar() {
   }
   Bar(const int src) {
   }
   operator int() const {
   return 42;
   }
};

int main(int, char**) {

   const Bar b = Bar() + Bar();
   const generic_int32x4 v = (generic_int32x4){ 42 } + (generic_int32x4){ 42 };
   const Foo e = generic_int32x4(Foo()) + generic_int32x4(Foo());
   const Foo f = Foo() + Foo();
   const Foo g = (generic_int32x4){ 42 } + Foo();
   const Foo h = Foo() + (generic_int32x4){ 42 };
   return 0;
}

In the above, the initialization expression for local 'b' compiles as
expected, and so do the expressions for locals 'v' and 'e'. The
initializations of locals 'f', 'g' and 'h', though, fail to compile
(under g++-6.1.1, likewise under 5.x and 4.x) with:

$ g++-6 xxx.cpp
xxx.cpp: In function ‘int main(int, char**)’:
xxx.cpp:28:22: error: no match for ‘operator+’ (operand types are
‘Foo’ and ‘Foo’)
 const Foo f = Foo() + Foo();
   ~~^~~
xxx.cpp:29:40: error: no match for ‘operator+’ (operand types are
‘generic_int32x4 {aka __vector(4) int}’ and ‘Foo’)
 const Foo g = (generic_int32x4){ 42 } + Foo();
~~~^~~
xxx.cpp:30:22: error: no match for ‘operator+’ (operand types are
‘Foo’ and ‘generic_int32x4 {aka __vector(4) int}’)
 const Foo h = Foo() + (generic_int32x4){ 42 };
   ~~^

Apparently there is some implicit conversion rule that stops g++ from
doing the expected implicit conversions, but I can't figure out which
rule that is. The fact clang handles the code without an issue does
not help either. Any help will be appreciated.


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57572

--
Marc Glisse


Re: Implicit conversion to a generic vector type

2016-05-26 Thread Marc Glisse

On Thu, 26 May 2016, martin krastev wrote:


Thank you for the reply. So it's a known g++ issue with a candidate
patch. Looking at the patch, I was wondering, what precludes the
generic vector types form being proper arithmetic types?


In some cases vectors act like arithmetic types (operator+, etc), and in 
others they don't (conversions in general). We have scalarish_type_p for 
things that are scalars or vectors, we could add arithmeticish_type_p ;-)


(I think the name arithmetic comes directly from the standard, so we don't 
want to change its meaning)


--
Marc Glisse


Re: Implicit conversion to a generic vector type

2016-05-27 Thread Marc Glisse

On Fri, 27 May 2016, martin krastev wrote:


A new arithmeticish type would take more effort, I understand. Marc,
are there plans to incorporate your patch, perhaps in an extended
form, in a release any time soon?


There is no plan either way. When someone is motivated enough (I am not, 
currently), they will submit a patch to gcc-patc...@gcc.gnu.org, which 
will be reviewed. Note that a patch needs to include testcases (see the 
files in gcc/testsuite/g++.dg for examples). If you are interested, you 
could give it a try...


--
Marc Glisse


Re: An issue with GCC 6.1.0's make install?

2016-06-04 Thread Marc Glisse

On Sat, 4 Jun 2016, Ethin Probst wrote:


Yesterday I managed to successfully build GCC and all of the
accompanying languages that it supports by default (Ada, C, C++,
Fortran, Go, Java, Objective-C, Objective-C++, and link-time
optimization (LTO)). I did not build JIT support because I have not
heard whether it is stable or not.
Anyway, seeing as I didn't (and still do not) want to wait another 12
hours for that to build, I compressed it into a .tar.bz2 archive,


Did you use "make -j 8" (where 8 is roughly how many CPUs you have in your 
server)? 12 hours seems excessive.



copied it over to another server, decompressed it, and here's when the


Did you copy it to exactly the same path as on the original server, 
preserving time stamps, and do both servers have identical systems?



problems start. Keep in mind that I did ensure that all files were
compressed and extracted.
When I go into my build subdirectory build tree, and type "make
install -s", it installs gnat, gcc (and g++), gfortran, gccgo, and
gcj, but it errors out (and, subsequently, bales out) and says the
following:
Making install in tools
make[3]: *** [install-recursive] Error 1
make[2]: *** [install-recursive] Error 1
make[1]: *** [install-target-libjava] Error 2
make: *** [install] Error 2
And then:
$ gcj
gcj: error: libgcj.spec: No such file or directory


A more common approach would be to run "make install DESTDIR=/some/where", 
tar that directory, copy this archive to other servers, and untar it in 
the right location. That's roughly what linux distributions do.
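
Concretely, something like (standard GNU make conventions; paths are just
examples):

make -j8
make install DESTDIR=/tmp/gcc-inst
tar -C /tmp/gcc-inst -cjf gcc-bin.tar.bz2 .
# on the other server:
tar -C / -xjf gcc-bin.tar.bz2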



I'm considering the test suite, but until it installs, I'm not sure if
executing the test suite would be very wise at this point. To get it
to say that no input file was specified, I have to manually run the
following commands:
$ cd x86_64-pc-linux-gnu/libjava
$ cp libgcj.spec /usr/bin


That seems like a strange location for this file.


Has the transportation of the source code caused the build tree to be
messed up? I know that it works perfectly fine on my other server.
Running make install without the -s command line parameter yields
nothing. Have I done something wrong?


"nothing" is not very helpful... Surely it gave some error message.

--
Marc Glisse


Re: [RFC][Draft patch] Introduce IntegerSanitizer in GCC.

2016-07-04 Thread Marc Glisse

On Mon, 4 Jul 2016, Maxim Ostapenko wrote:


Is community interested in such a tool?


On the one hand, it is clearly useful since you found bugs thanks to it.

On the other hand:

1) I hope we never reach the situation caused by Microsoft's infamous
warning C4146 (which is even an error if you enable "secure" mode),
where projects writing perfectly legal bignum code keep getting
misguided reports by users who see those warnings.

2) This kind of encourages people to keep using unsigned types for 
non-negative integers, whereas they would be better reserved to bignum and 
bitfields (sadly, the standards make it hard to avoid unsigned types...).


--
Marc Glisse


Vector unaligned load/store x86 intrinsics

2016-08-25 Thread Marc Glisse

Hello,

I was considering changing the implementation of _mm_loadu_pd in x86's 
emmintrin.h to avoid a builtin. Here are 3 versions:


typedef double __m128d __attribute__ ((__vector_size__ (16), __may_alias__));
typedef double __m128d_u __attribute__ ((__vector_size__ (16), __may_alias__, 
aligned(1)));

__m128d f (double const *__P)
{
  return __builtin_ia32_loadupd (__P);
}

__m128d g (double const *__P)
{
  return *(__m128d_u*)(__P);
}

__m128d h (double const *__P)
{
  __m128d __r;
  __builtin_memcpy (&__r, __P, 16);
  return __r;
}


f is what we have currently. f and g generate the same code. h also 
generates the same code except at -O0 where it is slightly longer.


(note that I haven't regtested either version yet)

1) I don't have any strong preference between g and h, is there a reason 
to pick one over the other? I may have a slight preference for g, which 
expands to


  __m128d _3;
  _3 = MEM[(__m128d_u * {ref-all})__P_2(D)];

while h yields

  __int128 unsigned _3;
  _3 = MEM[(char * {ref-all})__P_2(D)];
  _4 = VIEW_CONVERT_EXPR<__m128d>(_3);


2) Reading Intel's doc for movupd, it says: "If alignment checking is 
enabled (CR0.AM = 1, RFLAGS.AC = 1, and CPL = 3), an alignment-check 
exception (#AC) may or may not be generated (depending on processor 
implementation) when the operand is not aligned on an 8-byte boundary." 
Since we generate movupd for memcpy even when the alignment is presumably 
only 1 byte, I assume that this alignment-check stuff is not supported by 
gcc?


--
Marc Glisse


Re: Vector unaligned load/store x86 intrinsics

2016-08-26 Thread Marc Glisse

On Fri, 26 Aug 2016, Richard Biener wrote:


On Thu, Aug 25, 2016 at 9:40 PM, Marc Glisse  wrote:

Hello,

I was considering changing the implementation of _mm_loadu_pd in x86's
emmintrin.h to avoid a builtin. Here are 3 versions:

typedef double __m128d __attribute__ ((__vector_size__ (16),
__may_alias__));
typedef double __m128d_u __attribute__ ((__vector_size__ (16),
__may_alias__, aligned(1)));

__m128d f (double const *__P)
{
  return __builtin_ia32_loadupd (__P);
}

__m128d g (double const *__P)
{
  return *(__m128d_u*)(__P);
}

__m128d h (double const *__P)
{
  __m128d __r;
  __builtin_memcpy (&__r, __P, 16);
  return __r;
}


f is what we have currently. f and g generate the same code. h also
generates the same code except at -O0 where it is slightly longer.

(note that I haven't regtested either version yet)

1) I don't have any strong preference between g and h, is there a reason to
pick one over the other? I may have a slight preference for g, which expands
to

  __m128d _3;
  _3 = MEM[(__m128d_u * {ref-all})__P_2(D)];

while h yields

  __int128 unsigned _3;
  _3 = MEM[(char * {ref-all})__P_2(D)];
  _4 = VIEW_CONVERT_EXPR<__m128d>(_3);


I prefer 'g' which is just more natural.


Ok, thanks.

Note that the C language requires that __P be aligned to alignof 
(double)  (not sure what the Intel intrinsic specs say here), and thus 
it doesn't allow arbitrary misalignment.  This means that you could use 
a slightly better aligned type with aligned(alignof(double)).


I had thought about it, but since we already generate movupd with 
aligned(1), it didn't really seem worth the trouble for this prototype.


Or to be conforming the parameter should not be double const * but a 
double type variant with alignment 1 ...


Yeah, those intrinsics have issues:

__m128i _mm_loadu_si128 (__m128i const* mem_addr)
"mem_addr does not need to be aligned on any particular boundary."

that doesn't really make sense.

I may try to experiment with your suggestion, see if it breaks anything. 
Gcc seems happy to ignore those alignment differences when casting 
function pointers, so it should be fine.



2) Reading Intel's doc for movupd, it says: "If alignment checking is
enabled (CR0.AM = 1, RFLAGS.AC = 1, and CPL = 3), an alignment-check
exception (#AC) may or may not be generated (depending on processor
implementation) when the operand is not aligned on an 8-byte boundary."
Since we generate movupd for memcpy even when the alignment is presumably
only 1 byte, I assume that this alignment-check stuff is not supported by
gcc?


Huh, never heard of this.  Does this mean that mov_u_XX do alignment-check
exceptions?  I believe this would break almost all code (glibc memcpy, GCC
generated code, etc).  Thus it would require kernel support, emulating
the unaligned ops to still work (but record them somehow).


Elsewhere ( 
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_loadu_pd&expand=3106,3115,3106,3124,3106&techs=SSE2 
) Intel doesn't mention this at all, it just says: "mem_addr does not need 
to be aligned on any particular boundary." So it might be a provision in 
the spec that was added just in case, but never implemented...


--
Marc Glisse


Re: Is this FE bug or am I missing something?

2016-09-12 Thread Marc Glisse

On Sun, 11 Sep 2016, Igor Shevlyakov wrote:


Small sample below fails (at least on 6.1) for multiple targets. The
difference between the two functions starts at the very first tree pass...


You are missing -fsanitize=undefined (and #include ).

Please use the mailing list gcc-h...@gcc.gnu.org next time.

--
Marc Glisse


Re: Is this FE bug or am I missing something?

2016-09-13 Thread Marc Glisse

On Mon, 12 Sep 2016, Igor Shevlyakov wrote:


Well, my concern is not what happens with overflow (which in second
case -fsanitize=undefined will address), but rather consistency of
that 2 cases.

p[x+1] generates RTL which leads to better generated code at the
expense of leading to overflow, while p[1+x] never overflows but leads
to worse code.
It would be beneficial to make the behaviour consistent between those 2 cases.


True. Your example with undefined behavior confused me as to what your 
point was.


For

int* f1(int* p, int x) { return &p[x + 1]; }
int* f2(int* p, int x) { return &p[1 + x]; }

we get in the gimple dump

  _1 = (sizetype) x;
  _2 = _1 + 1;
vs
  _1 = x + 1;
  _2 = (long unsigned int) _1;

The second one is a better starting point (it has more information about 
potential overflow), but the first one has the advantage that all numbers 
have the same size, which saves an instruction in the end


movslq  %esi, %rsi
leaq4(%rdi,%rsi,4), %rax
vs
addl$1, %esi
movslq  %esi, %rsi
leaq(%rdi,%rsi,4), %rax

We regularly discuss the potential benefits of a pass that would try to 
uniformize integer sizes...


In the mean time, I agree that gimplifying x+1 and 1+x differently makes 
little sense, you could file a PR about that.


--
Marc Glisse


Re: how to check if target supports andnot instruction ?

2016-10-12 Thread Marc Glisse

On Wed, 12 Oct 2016, Prathamesh Kulkarni wrote:


I was having a look at PR71636 and added the following pattern to match.pd:
x & ((1U << b) - 1) -> x & ~(~0U << b)
However the transform is useful only if the target supports "andnot"
instruction.


rth was selling the transformation as a canonicalization, which is 
beneficial when there is an andnot instruction, and neutral otherwise, so 
it could be done always.



As pointed out by Marc in PR for -march=core2, lhs generates worse
code than rhs,
so we shouldn't do the transform if target doesn't support andnot insn.
(perhaps we could do the reverse transform for target not supporting andnot?)


Rereading my comment in the PR, I pointed out that instead of being 
neutral, the transformation was very slightly detrimental in one case (one 
extra mov) because of a RA issue. That doesn't mean we should avoid the 
transformation, just that we should fix the RA issue (by the way, if you 
have time to file a separate PR for the RA issue, that would be great, 
otherwise I'll try to do it at some point...).


However it seems andnot isn't a standard pattern name, so am not sure 
how to check if target supports andnot insn ?


--
Marc Glisse


Re: how to check if target supports andnot instruction ?

2016-10-13 Thread Marc Glisse

On Thu, 13 Oct 2016, Prathamesh Kulkarni wrote:


On 12 October 2016 at 14:43, Richard Biener  wrote:

On Wed, 12 Oct 2016, Marc Glisse wrote:


On Wed, 12 Oct 2016, Prathamesh Kulkarni wrote:


I was having a look at PR71636 and added the following pattern to match.pd:
x & ((1U << b) - 1) -> x & ~(~0U << b)
However the transform is useful only if the target supports "andnot"
instruction.


rth was selling the transformation as a canonicalization, which is beneficial
when there is an andnot instruction, and neutral otherwise, so it could be
done always.


Well, it's three instructions to three instructions and a more expensive
constant(?).  ~0U might not be available as immediate for the shift
instruction and 1U << b might be available as a bit-set instruction ...
(vs. the andnot).


True, I hadn't thought of bit-set.


So yes, we might decide to canonicalize to andnot (and decide that
three binary to two binary and one unary op is "better").

So no excuse to explore the target specific .pd fragment idea ... :/

Hi,
I have attached patch that adds the transform.
Does that look OK ?


Why bit_not of build_zero_cst instead of build_all_ones_cst, as suggested 
in the PR? If we only do the transformation when (1<<b)-1 is only used in 
the bit_and, then we probably want to require that it has a single use 
(maybe even the shift).



I am not sure how to write test-cases for it though.
For the test-case:
unsigned f(unsigned x, unsigned b)
{
 unsigned t1 = 1U << b;
 unsigned t2 = t1 - 1;
 unsigned t3 = x & t2;
 return t3;
}

forwprop dump shows:
Applying pattern match.pd:523, gimple-match.c:47419
gimple_simplified to _6 = 4294967295 << b_1(D);
_8 = ~_6;
t3_5 = x_4(D) & _8;

I could scan for "_6 = 4294967295 << b_1(D);"  however I suppose
~0 would depend on width of int and not always be 4294967295 ?
Or should I scan for "_6 = 4294967295 << b_1(D);"
and add /* { dg-require-effective int32 } */  to the test-case ?


You could check that you have ~, or that you don't have " 1 << ".
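
For instance, something like this might do (the dump name may need 
adjusting to whichever pass you scan):

/* { dg-options "-O2 -fdump-tree-forwprop1" } */
/* { dg-final { scan-tree-dump "~" "forwprop1" } } */
/* { dg-final { scan-tree-dump-not " 1 << " "forwprop1" } } */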

--
Marc Glisse


Re: GCC 6.2.0 : What does the undocumented -r option ?

2016-11-07 Thread Marc Glisse

On Mon, 7 Nov 2016, Emmanuel Charpentier wrote:


The Sage project (http://www.sagemath.org) has recently hit an
interesting snag : its developers using Debian testing began to
encounter difficulties compiling the flint package
(http://groups.google.co.uk/group/flint-devel) with gcc 6.2.0.

One of us found (see
https://groups.google.com/d/msg/sage-devel/TduebNoZuBE/sEULolL0BQAJ)
that this was bound to a conflict between the -pie option (now
default) and an undocumented -r option.

We would like to know what is this -r option, what it does and why it
is undocumented.


(the mailing list you are looking for is gcc-h...@gcc.gnu.org)

As can be seen in the first message of the conversation you link to
"/usr/bin/ld: -r and -pie may not be used together"

The option -r is passed to ld, so you have to look for it in ld's manual 
where it is clearly documented.


(that hardening stuff is such a pain...)

--
Marc Glisse


Re: Need some help with a possible bug

2014-04-23 Thread Marc Glisse

(should have been gcc-h...@gcc.gnu.org, please send any follow-ups there)

On Wed, 23 Apr 2014, George R Goffe wrote:


I'm trying to build the latest gcc


Do you really need gcj? If not, please disable java.

and am getting a message from the 
process "collect2: error: ld returned 1 exit status" for this library 
/usr/lsd/Linux/lib/libgmp.so. Here's the full msg: 
"/usr/lsd/Linux/lib/libgmp.so: could not read symbols: File in wrong 
format"


You are doing a multilib build (--disable-multilib if you don't want 
that), so it tries to build both a 64 bit and a 32 bit versions of 
libjavamath.so, both of which want to link to GMP. So you need both 
versions of GMP installed as well.


I thought the configure script in classpath would detect your missing 32 
bit GMP and disable use of GMP in that case, but apparently not... You may 
want to file a PR in bugzilla about that if there isn't one already. But 
you'll need to provide more info there: your configure command line, the 
file config.log in the 32 bit version of classpath, etc.


--
Marc Glisse


Re: RTL representation of i386 shrdl instruction is incorrect?

2014-06-05 Thread Marc Glisse

On Thu, 5 Jun 2014, Niranjan Hasabnis wrote:


Thanks for your reply. I looked into some of the details of how that
particular RTL template is used. It seems to me that the particular
RTL template is used only when shifting 64-bit data type on a 32-bit
machine. This is the underlying assumption encoded in i386.c file
which generates that particular RTL only when instruction mode is
DImode. If that is the case, then it won't matter whether one uses
arithmetic shift or logical shift to right shift lower 4-bytes of a 8-byte
value. In other words, the mapping between RTL template and shrdl
is incorrect, but the underlying assumption in i386.c guards the bug.


This is still a bug, please file a PR. The use of (match_dup 0) apparently 
prevents combine from matching the insn (that's just a guess from my notes 
in PR 55583, I don't have access to my gcc machine right now to check), 
but that doesn't mean we shouldn't fix things.


--
Marc Glisse


Re: What is "fnspec function type attribute"?

2014-06-06 Thread Marc Glisse

On Fri, 6 Jun 2014, FX wrote:


In fortran/trans-decl.c, we have a comment above the code building function 
decls, saying:


   The SPEC parameter specifies the function argument and return type
   specification according to the fnspec function type attribute.  */


I was away from GCC development for some time, so this is news to me. The 
syntax is not immediately clear, and neither a Google search nor a grep of the 
trunk’s numerous .texi files reveals any information. I’m creating new decls, 
what I am to do with it?


You can look at the 2 functions in gimple.c that use gimple_call_fnspec, 
and refer to tree-core.h for the meaning of EAF_*, etc. A string like 
"2x." means:
'2': the first letter is about the return, here we are returning the 
second argument

'x': the first argument is ignored
'.': not saying anything about the second argument.
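
So, to picture it with a made-up function (not from gcc):

/* fnspec "2x.": returns its second argument, never reads the first.  */
void *pass_through (void *unused, void *p) { return p; }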

--
Marc Glisse


Re: Comparison of GCC-4.9 and LLVM-3.4 performance on SPECInt2000 for x86-64 and ARM

2014-06-25 Thread Marc Glisse

On Wed, 25 Jun 2014, Vladimir Makarov wrote:

Maybe.  But in this case LLVM did a right thing.  The variable addressing was 
through a restrict pointer.


Ah, gcc implements (on purpose?) a weak version of restrict, where it only 
considers that 2 restrict pointers don't alias, whereas all other 
compilers assume that restrict pointers don't alias other non-derived 
pointers (see several PRs in bugzilla). I believe Richard recently added 
code that would make implementing the strong version of restrict easier. 
Maybe that's what is missing here?


--
Marc Glisse


Re: combination of read/write and earlyclobber constraint modifier

2014-07-01 Thread Marc Glisse

On Tue, 1 Jul 2014, Jeff Law wrote:


On 07/01/14 13:27, Tom de Vries wrote:

Vladimir,

There are a few patterns which use both the read/write constraint
modifier (+) and the earlyclobber constraint modifier (&):
...
$ grep -c 'match_operand.*+.*&' gcc/config/*/* | grep -v :0
gcc/config/aarch64/aarch64-simd.md:1
gcc/config/arc/arc.md:1
gcc/config/arm/ldmstm.md:30
gcc/config/rs6000/spe.md:8
...

F.i., this one in gcc/config/aarch64/aarch64-simd.md:
...
(define_insn "vec_pack_trunc_<mode>"
  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "+&w")
(vec_concat:<VNARROWQ2>
  (truncate:<VNARROWQ> (match_operand:VQN 1 "register_operand"
"w"))
  (truncate:<VNARROWQ> (match_operand:VQN 2 "register_operand"
"w"))))]
...

The documentation (
https://gcc.gnu.org/onlinedocs/gccint/Modifiers.html#Modifiers ) states:
...
'‘&’ does not obviate the need to write ‘=’.
...
which seems to state that '&' implies '='.

An earlyclobber operand is defined as 'modified before the instruction
is finished using the input operands'. AFAIU that would indeed exclude
the possibility that the earlyclobber operand is an input/output operand
it self, but perhaps I misunderstand.

So my question is: is the combination of '&' and '+' supported ? If so,
what is the exact semantics ? If not, should we warn or give an error ?
I don't think we can define any reasonable semantics for &+.  My 
recommendation would be for this to be considered a hard error.


Uh? The doc explicitly says "An input operand can be tied to an 
earlyclobber operand" and goes on to explain why that is useful. It avoids 
using the same register for other input when they are identical.


--
Marc Glisse


Re: combination of read/write and earlyclobber constraint modifier

2014-07-01 Thread Marc Glisse

On Tue, 1 Jul 2014, Tom de Vries wrote:


On 01-07-14 21:58, Marc Glisse wrote:

So my question is: is the combination of '&' and '+' supported ? If so,
what is the exact semantics ? If not, should we warn or give an error ?

I don't think we can define any reasonable semantics for &+.  My
recommendation would be for this to be considered a hard error.


Uh? The doc explicitly says "An input operand can be tied to an 
earlyclobber
operand" and goes on to explain why that is useful. It avoids using the 
same

register for other input when they are identical.


Hi Marc,

That part of the doc refers to the mulsi3 insn for ARM as example:
...
;; Use `&' and then `0' to prevent the operands 0 and 1 being the same
(define_insn "*arm_mulsi3"
 [(set (match_operand:SI  0 "s_register_operand" "=&r,&r")
   (mult:SI (match_operand:SI 2 "s_register_operand" "r,r")
(match_operand:SI 1 "s_register_operand" "%0,r")))]
 "TARGET_32BIT && !arm_arch6"
 "mul%?\\t%0, %2, %1"
 [(set_attr "type" "mul")
  (set_attr "predicable" "yes")]
)
...

Note that there's no combination of & and + here.


I think it could have used (match_dup 0) instead of operand 1, if there 
had been only the first alternative. And then the constraint would have 
been +&.


AFAIU, the 'tie' established here is from input operand 1 to an earlyclobber 
output operand 0 using the '0' matching constraint.


Having said that, I don't understand the comment, AFAIU it should be: 'Use 
'0' to make sure operands 0 and 1 are the same, and use '&' to make sure 
operands 0 and 2 are not the same.'


Well, yeah, the comment doesn't seem completely in sync with the code.

In the first example you gave, looking at the pattern (no match_dup, 
setting the full register), it seems that it may have wanted "=&" instead 
of "+&".


(by the way, in the same aarch64-simd.md file, I noticed some 
define_expand with constraints, that looks strange)


--
Marc Glisse


Re: combination of read/write and earlyclobber constraint modifier

2014-07-02 Thread Marc Glisse

On Wed, 2 Jul 2014, Tom de Vries wrote:


On 02-07-14 08:23, Marc Glisse wrote:
I think it could have used (match_dup 0) instead of operand 1, if there 
had been only the first alternative. And then the constraint would have 
been +&.


isn't that explicitly listed as unsupported here ( 
https://gcc.gnu.org/onlinedocs/gccint/RTL-Template.html#index-match_005fdup-3244 
):

...
Note that match_dup should not be used to tell the compiler that a particular 
register is being used for two operands (example: add that adds one register 
to another; the second register is both an input operand and the output 
operand). Use a matching constraint (see Simple Constraints) for those. 
match_dup is for the cases where one operand is used in two places in the 
template, such as an instruction that computes both a quotient and a 
remainder, where the opcode takes two input operands but the RTL template has 
to refer to each of those twice; once for the quotient pattern and once for 
the remainder pattern.

...
?


Well, looking for instance at x86_shrd... Ok, I didn't know it wasn't 
supported (though I did suggest using match_operand and "0" at some 
point).


Still, the meaning of +&, in inline asm for instance, seems relatively 
clear, no?
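
For instance (the mnemonic is made up, so this won't assemble as-is):

asm ("insn %0, %1" : "+&r" (acc) : "r" (x));

i.e. acc is read and written, and x must be given a different register 
even though the output is written early.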


--
Marc Glisse


Re: combination of read/write and earlyclobber constraint modifier

2014-07-02 Thread Marc Glisse

On Wed, 2 Jul 2014, Tom de Vries wrote:


On 02-07-14 09:02, Marc Glisse wrote:
Still, the meaning of +&, in inline asm for instance, seems relatively 
clear, no?


I can't find any testsuite examples using this construct.

Furthermore, I'd expect the same semantics and restrictions for constraints 
in rtl templates and inline asm.


So I'm not sure what you mean.


Coming back to your original question:

An earlyclobber operand is defined as 'modified before the instruction is 
finished using the input operands'. AFAIU that would indeed exclude the 
possibility that the earlyclobber operand is an input/output operand it 
self, but perhaps I misunderstand.


So my question is: is the combination of '&' and '+' supported ? If so, 
what is the exact semantics ? If not, should we warn or give an error ?


An earlyclobber operand X prevents *other* input operands from using the 
same register, but that does not include X itself (if it is using +) or 
operands explicitly using a matching constraint for X. At least that's how 
I understand it.


--
Marc Glisse


Re: GCC version bikeshedding

2014-08-06 Thread Marc Glisse

On Wed, 6 Aug 2014, Jakub Jelinek wrote:


- libstdc++ ABI changes


It seems unlikely to be in the next release, it is too late in the cycle. 
Chances to break the ABI don't come often, and rushing one at the end of 
stage1 would be wasting a good opportunity.


--
Marc Glisse


Re: GCC version bikeshedding

2014-08-06 Thread Marc Glisse

On Wed, 6 Aug 2014, Richard Biener wrote:


It's an ABI change for all modes (but not a SONAME change because the
old and new definitions will both be present in the .so).


Ugh.  That's going to be a nightmare to support.


Yes. And IMO a waste of effort compared to a clean .so.7 break, but 
well...



 Is there a configure
switch to change the default ABI used?  That is, on a legacy system
can I upgrate to 5.0 and get code that interoperates fine with code
built with 4.8?  (including ABI boundaries using the affected classes?
I suspect APIs with std::string passing are _very_ common, not
sure about std::list)

What's the failure mode the user will see when linking against a
4.8 compiled library with a std::string interface using 5.0?


In good cases, a linker error about a missing symbol (different mangling). 
In less good cases, a warning at compile-time about using a class marked 
with abi_tag in a class not marked with it. In worse cases (passing 
through void* for instance), a runtime crash.



And how do libraries with such an API avoid silently changing their
ABI dependent on the compiler used to compile them?  That is,
I suppose those need to change their SONAME dependent on
the compiler version used?!


Yes, just like a move to .so.7 would entail.

--
Marc Glisse


Re: GCC version bikeshedding

2014-08-06 Thread Marc Glisse

On Wed, 6 Aug 2014, Jakub Jelinek wrote:


On Wed, Aug 06, 2014 at 12:31:57PM +0200, Richard Biener wrote:

Ok, so the problematical case is

struct X { std::string s; };
void foo (X&);


Yeah.


then.  OTOH I remember that then mangling of X changes as well?


Only if you add abi_tag attribute to X.


Note that -Wabi-tag can tell you where it is needed.

struct __attribute__((abi_tag("marc"))) X {};
struct Y { X x; };

a.cc:2:8: warning: 'Y' does not have the "marc" abi tag that 'X' (used in 
the type of 'Y::x') has [-Wabi-tag]

 struct Y { X x; };
^
a.cc:2:14: note: 'Y::x' declared here
 struct Y { X x; };
  ^
a.cc:1:41: note: 'X' declared here
 struct __attribute__((abi_tag("marc"))) X {};
 ^


I hope the libstdc++ folks will add some macro which will
include the right abi_tag attribute for the std::list/std::string
cases, so you'd in the end just add
#ifndef _GLIBCXX_ABI_TAG_SOMETHING
#define _GLIBCXX_ABI_TAG_SOMETHING
#endif
...
struct X _GLIBCXX_ABI_TAG_SOMETHING { std::string s; };
void foo (X&);
or similar.


So we only need to patch every project out there...



A clean .so.7 break would be significantly worse nightmare.  We've been
there many years ago, e.g. 3.2/3.3 vs. 3.4, there has been significantly
fewer C++ plugins etc. in packages and it still it was unsolvable.
With the abi_tag stuff, you have the option to make stuff interoperable
when mixing compiler, either with no effort at all, or some limited
effort.  With .so.7, you have no option, nothing will be interoperable.


I disagree that it is worse, but you have more experience, I guess we
will see the results in a few years...

--
Marc Glisse


Re: Where does GCC pick passes for different opt. levels

2014-08-11 Thread Marc Glisse

On Mon, 11 Aug 2014, Steve Ellcey  wrote:


I have a basic question about optimization selection in GCC.  There used to
be some code in GCC (passes.c?) that would set various optimize pass flags
depending on if the 'optimize' flag was > 0, > 1, or > 2; later I think
there may have been a table.


There is still a table in opts.c, with entries that look like:

{ OPT_LEVELS_2_PLUS, OPT_ftree_vrp, NULL, 1 },
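
Each such line ties one flag to the set of levels where it defaults to 
on; a hypothetical new flag would get an entry of the same shape:

{ OPT_LEVELS_1_PLUS, OPT_fmy_new_pass, NULL, 1 },

(OPT_fmy_new_pass is made up here; the real identifiers are generated 
from the .opt files.)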



This code seems gone now and I can't figure
out how GCC is selecting what optimization passes to run at what optimization
levels (-O1 vs. -O2 vs. -O3).  How is this handled in the top-of-tree GCC code?

I see passes.def but there doesn't seem to be anything in there to tie
specific passes to specific optimization levels.  Likewise in common.opt
I see flags for various optimization passes but nothing to tie them to
-O1 or -O2, etc.

I'm probably missing something obvious, but a pointer would be much
appreciated.


--
Marc Glisse


Re: Conditional negation elimination in tree-ssa-phiopt.c

2014-08-12 Thread Marc Glisse

On Mon, 11 Aug 2014, Kyrill Tkachov wrote:


The aarch64 target has a conditional negation instruction
CSNEG Rd, Rs1, Rs2, cond

with semantics Rd = if cond then Rs1 else -Rs2.

This, however, doesn't end up getting matched for code such as:
int
foo2 (unsigned a, unsigned b)
{
 int r = 0;
 r = a & b;
 if (a & b)
   return -r;
 return r;
}


Note that in this particular case, we should just return -(a&b) like llvm 
does.
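
I.e., fold the whole function to the equivalent (since -0 == 0):

int
foo2 (unsigned a, unsigned b)
{
  return -(int) (a & b);
}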


--
Marc Glisse


Re: gcc parallel make check

2014-09-03 Thread Marc Glisse

On Wed, 3 Sep 2014, VandeVondele  Joost wrote:


I've noticed that

make -j -k check-fortran

results in a serialized checking, while

make -j32 -k check-fortran

goes parallel. Somehow the explicit 'N' in -jN seems to be needed for the check 
target, while the other targets seem to do just fine. Is that a feature, or 
should I file a PR for that... ?


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53155

--
Marc Glisse


Re: Fwd: Building gcc-4.9 on OpenBSD

2014-09-17 Thread Marc Glisse

On Wed, 17 Sep 2014, Ian Grant wrote:


And is there any way to disable the Intel library?


--disable-libcilkrts (same as the other libs)
If it explicitly doesn't support your system, I am a bit surprised it 
isn't disabled automatically, that seems like a bug.


Please don't call it "the Intel library", that doesn't mean anything.

--
Marc Glisse


Re: Fwd: Building gcc-4.9 on OpenBSD

2014-09-17 Thread Marc Glisse

On Wed, 17 Sep 2014, Ian Grant wrote:


On Wed, Sep 17, 2014 at 1:36 PM, Marc Glisse  wrote:

On Wed, 17 Sep 2014, Ian Grant wrote:


And is there any way to disable the Intel library?



--disable-libcilkrts (same as the other libs)
If it explicitly doesn't support your system, I am a bit surprised it isn't
disabled automatically, that seems like a bug.


Not necessarily a bug, but it would have been good if the --help
option had mentioned it. I looked, really. Perhaps I missed it though.
So many options for disabling one thing or another 


https://gcc.gnu.org/install/configure.html
lists a number of others but not this one, maybe it should be added.


Please don't call it "the Intel library", that doesn't mean anything.


Doesn't it? How did you know what 'it' was then? Or is that a stupid
question? This identity concept is much slipperier than it seems at
first, isn't it?


You included error messages...


How about my question about the size of the binaries? Is that 60+MB
what other systems show?


I still see <20M here, but I don't know if there are reasons for what you 
are seeing. Are you maybe using different options? (debug information, 
optimization, lto, etc)


--
Marc Glisse


Re: How to identify the type of the object being created using the new operator?

2014-10-06 Thread Marc Glisse

On Mon, 6 Oct 2014, Swati Rathi wrote:


Statement : A *a = new B;

gets translated in GIMPLE as
1. void * D.2805;
2. struct A * a;
3. D.2805 = operator new (20);
4. a = D.2805;

A is the base class and B is the derived class.
In statement 3, new operator is creating an object of derived class B.
By analyzing the RHS of the assignment statement 3, how can we identify the 
type (in this case B) of the object being created?


I strongly doubt you can. It is calling B's constructor that will turn 
this memory region into a B, operator new is the same as malloc, it 
only returns raw memory.


(If A and B don't have the same size, the argument 20 can be a hint)
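
To illustrate with plain C++ (not compiler output; A/B as in your example):

#include <new>

B *make ()
{
  void *raw = operator new (sizeof (B)); /* raw memory, like malloc */
  return new (raw) B;                    /* only now does a B exist */
}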

--
Marc Glisse


Re: volatile access optimization (C++ / x86_64)

2014-12-26 Thread Marc Glisse

On Fri, 26 Dec 2014, Matt Godbolt wrote:


I'm investigating ways to have single-threaded writers write to memory
areas which are then (very infrequently) read from another thread for
monitoring purposes. Things like "number of units of work done".

I initially modeled this with relaxed atomic operations. This
generates a "lock xadd" style instruction, as I can't convey that
there are no other writers.

As best I can tell, there's no memory order I can use to explain my
usage characteristics. Giving up on the atomics, I tried volatiles.
These are less than ideal as their power is less expressive, but in my
instance I am not trying to fight the ISA's reordering; just prevent
the compiler from eliding updates to my shared metrics.

GCC's code generation uses a "load; add; store" for volatiles, instead
of a single "add 1, [metric]".


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50677

--
Marc Glisse


Re: C++ Standard Question

2015-01-22 Thread Marc Glisse

On Thu, 22 Jan 2015, Joel Sherrill wrote:


I think this is a glibc issue but since this method is defined in the C++
standards, I thought there were plenty of language lawyers here. :)


s/glibc/libstdc++/ and they have their own ML.





That's deprecated, isn't it?


  class strstreambuf : public basic_streambuf<char, char_traits<char> >
  ISSUE > int pcount() const;   <= ISSUE

My reading of the C++03 and draft C++14 says that the int pcount() method
in this class is not const. glibc has it const in the glibc shipped with
Fedora 20
and CentOS 6.

This is a simple test case:

   #include <strstream>

   int main() {
   int (std::strstreambuf::*dummy)() = &std::strstreambuf::pcount;
/*-- pcount is conformant --*/
   return 0;
   }

What's the consensus?


The exact signature of member functions is not mandated by the standard, 
implementations are allowed to make the function const if that works (or 
provide both a const and a non-const version). Your code is not guaranteed 
to work. Lambdas usually provide a fine workaround.
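
For instance:

#include <strstream>
auto get_pcount = [] (std::strstreambuf &buf) { return buf.pcount (); };

compiles whether the implementation declares pcount() const or not.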


--
Marc Glisse


Re: unfused fma question

2015-02-23 Thread Marc Glisse

On Mon, 23 Feb 2015, Jeff Law wrote:


On 02/23/15 11:38, Joseph Myers wrote:


(I wonder if convert_mult_to_fma is something that should move to
match-and-simplify infrastructure.)

Yea, it probably should.


Currently, it happens in a pass that is quite late. If it moves to 
match-and-simplify, I am afraid it might inhibit some other optimizations 
(we can turn plus+mult to fma but not the reverse), unless we use some way 
to inhibit some patterns until a certain pass (possibly a simple "if", if 
that's not too costly). Such "time-restricted" patterns might be useful 
for other purposes: don't introduce complicated vector/complex operations 
after the corresponding lowering passes, do narrowing until a certain 
point but then prefer fast integer sizes, etc (I haven't thought about 
those particular examples, they are only an illustration).


--
Marc Glisse


Re: A bug (?) with inline functions at O0: undefined reference

2015-03-06 Thread Marc Glisse

On Fri, 6 Mar 2015, Ilya Verbin wrote:


I've discovered a strange behaviour on trunk gcc, here is the reproducer:

inline int foo ()
{
 return 0;
}

int main ()
{
 return foo ();
}

$ gcc main.c
/tmp/ccD1LeXo.o: In function `main':
main.c:(.text+0xa): undefined reference to `foo'
collect2: error: ld returned 1 exit status

Is this a bug?  If yes, is it known?
GCC 4.8.3 works fine though.


Not a bug, that's what inline means in C99 and later.

--
Marc Glisse


Re: Named parameters

2015-03-16 Thread Marc Glisse

On Mon, 16 Mar 2015, David Brown wrote:


In a discussion on comp.lang.c, the subject of "named parameters" (or
"designated parameters") has come up again.  This is a feature that some
of us feel would be very useful in C (and in C++).  I think it would be
possible to include it in the language without leading to any conflicts
with existing code - it is therefore something that could be made as a
gcc extension, with a hope of adding it to the standards for a later C
standards revision.

I wanted to ask opinions on the mailing list as to the feasibility of
the idea - there is little point in my cluttering up bugzilla with an
enhancement request if the gcc developers can spot obvious flaws in the
idea.


Filing a report in bugzilla would be quite useless: language extensions 
are now almost automatically rejected unless they come with a proposal 
that has already been favorably seen by the standardization committee.


On the other hand, implementing the feature (in your own fork) is almost a 
requirement if you intend to propose this for standardization. And it 
should not be too hard.



Basically, the idea is this:

int foo(int a, int b, int c);

void bar(void) {
foo(1, 2, 3);   // Normal call
foo(.a = 1, .b = 2, .c = 3) // Same as foo(1, 2, 3)
foo(.c = 3, .b = 2, .a = 1) // Same as foo(1, 2, 3)
}


struct foo_args {
  int a, b, c;
};
void foo(struct foo_args);
#define foo(...) foo((struct foo_args){__VA_ARGS__})
void g(){
  foo(1,2,3);
  foo(.c=3,.b=2);
}

In C++ you could almost get away without the macro, calling f({1,2,3}), 
but f({.c=3}) currently gives "sorry, unimplemented". Maybe you would like 
to work on that?



If only the first variant is allowed (with the named parameters in the
order declared in the prototype), then this would not affect code
generation at all - the designators could only be used for static error
checking.

If the second variant is allowed, then the parameters could be re-ordered.


The aim of this is to make it easier and safer to call functions with a
large number of parameters.  The syntax is chosen to match that of
designated initialisers - that should be clearer to the programmer, and
hopefully also make implementation easier.

If there is more than one declaration of the function, then the
designators used should follow the most recent in-scope declaration.


An error may be safer, you would at least want a warning.


This feature could be particularly useful when combined with default
arguments in C++, as it would allow the programmer to override later
default arguments without specifying all earlier arguments.


C++ is always more complicated (so many features can interact in strange 
ways), I suggest you start with C.



At the moment, I am not asking for an implementation, or even /how/ it
might be implemented (perhaps a MELT plugin?) - I would merely like
opinions on whether it would be a useful and practical enhancement.


This is not such a good list for that, comp.lang.c is better suited. This 
will be a good list if you have technical issues implementing the feature.


--
Marc Glisse


Re: -Wno-c++11-extensions addition

2015-03-25 Thread Marc Glisse

On Wed, 25 Mar 2015, Jack Howarth wrote:


On Wed, Mar 25, 2015 at 12:41 PM, Jonathan Wakely  wrote:

On 25 March 2015 at 16:16, Jack Howarth wrote:

Does anyone remember which FSF gcc release first added the
-Wno-c++11-extensions option for g++? I know it exists in 4.6.3


Are you sure? It doesn't exist for 4.6.4 or anything later.

Are you thinking of -Wc++0x-compat ?


On x86_64 Fedora 15...

$ /usr/bin/g++ --version
g++ (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2)
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ /usr/bin/g++ -Wno-c++11-extensions hello.cc
$

So gcc 4.6.3 appears to at least tolerate that warning without
claiming that it is unknown.


https://gcc.gnu.org/wiki/FAQ#The_warning_.22unrecognized_command-line_option.22_is_not_given_for_-Wno-foo

--
Marc Glisse


Re: [i386] Scalar DImode instructions on XMM registers

2015-04-24 Thread Marc Glisse

On Fri, 24 Apr 2015, Uros Bizjak wrote:


Please try to generate paradoxical subreg (V2DImode subreg of V1DImode
pseudo). IIRC, there is some functionality in the compiler that is
able to tell if the highpart of the paradoxical register is zeroed.


Those are not currently legal (I tried to change that)
https://gcc.gnu.org/ml/gcc-patches/2013-03/msg00745.html
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00769.html

In this case, a subreg:V2DI of DImode should work.

--
Marc Glisse


Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?

2018-10-12 Thread Marc Glisse

On Fri, 12 Oct 2018, Thomas Schwinge wrote:


Hmm, and without any OpenACC/OpenMP etc., actually the same problem is
also present when running the following code through the vectorizer:

   for (int tmp = 0; tmp < N_J * N_I; ++tmp)
 {
   int j = tmp / N_I;
   int i = tmp % N_I;
   a[j][i] = 0;
 }

... whereas the following variant (obviously) does vectorize:

   int a[N_J * N_I];

   for (int tmp = 0; tmp < N_J * N_I; ++tmp)
 a[tmp] = 0;


I had a quick look at the difference, and a[j][i] remains in this form 
throughout optimization. If I write instead *((*(a+j))+i) = 0; I get


  j_10 = tmp_17 / 1025;
  i_11 = tmp_17 % 1025;
  _1 = (long unsigned int) j_10;
  _2 = _1 * 1025;
  _3 = (sizetype) i_11;
  _4 = _2 + _3;

or for a power of 2

  j_10 = tmp_17 >> 10;
  i_11 = tmp_17 & 1023;
  _1 = (long unsigned int) j_10;
  _2 = _1 * 1024;
  _3 = (sizetype) i_11;
  _4 = _2 + _3;

and in both cases we fail to notice that _4 = (sizetype) tmp_17; (at least 
I think that's true).


So there are missing match.pd transformations in addition to whatever 
scev/ivdep/other work is needed.


--
Marc Glisse


Re: "match.pd" (was: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?)

2018-11-04 Thread Marc Glisse

(resent because of mail issues on my end)

On Mon, 22 Oct 2018, Thomas Schwinge wrote:


I had a quick look at the difference, and a[j][i] remains in this form
throughout optimization. If I write instead *((*(a+j))+i) = 0; I get

   j_10 = tmp_17 / 1025;
   i_11 = tmp_17 % 1025;
   _1 = (long unsigned int) j_10;
   _2 = _1 * 1025;
   _3 = (sizetype) i_11;
   _4 = _2 + _3;

or for a power of 2

   j_10 = tmp_17 >> 10;
   i_11 = tmp_17 & 1023;
   _1 = (long unsigned int) j_10;
   _2 = _1 * 1024;
   _3 = (sizetype) i_11;
   _4 = _2 + _3;

and in both cases we fail to notice that _4 = (sizetype) tmp_17; (at least
I think that's true).

So there are missing match.pd transformations in addition to whatever
scev/ivdep/other work is needed.


With a very simplistic "match.pd" rule (not yet any special cases
checking etc.):

diff --git gcc/match.pd gcc/match.pd
index b36d7ccb5dc3..4c23116308da 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -5126,3 +5126,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
{ wide_int_to_tree (sizetype, off); })
  { swap_p ? @0 : @2; }))
{ rhs_tree; })
+
+/* Given:
+
+   j = in / N_I
+   i = in % N_I
+
+   ..., fold:
+
+   out = j * N_I + i
+
+   ..., into:
+
+   out = in
+*/
+
+/* As long as only considering N_I being INTEGER_CST (which are always second
+   argument?), probably don't need ":c" variants?  */
+
+(simplify
+ (plus:c
+  (mult:c
+   (trunc_div @0 INTEGER_CST@1)
+   INTEGER_CST@1)
+  (trunc_mod @0 INTEGER_CST@1))
+ (convert @0))


You should only specify INTEGER_CST@1 on the first occurrence, the others
can be just @1. (you may be interested in @@1 at some point, but that
gets tricky)


..., the original code:

   int f1(int in)
   {
 int j = in / N_I;
 int i = in % N_I;

 int out = j * N_I + i;

 return out;
   }

... gets simplified from ("div-mod-0.c.027t.objsz1"):

   f1 (int in)
   {
 int out;
 int i;
 int j;
 int _1;
 int _6;

  :
 gimple_assign 
 gimple_assign 
 gimple_assign 
 gimple_assign 
 gimple_assign 
 gimple_return <_6>

   }

... to ("div-mod-0.c.028t.ccp1"):

   f1 (int in)
   {
 int out;
 int i;
 int j;
 int _1;

  :
 gimple_assign 
 gimple_assign 
 gimple_assign 
 gimple_return 

   }

(The three dead "gimple_assign"s get eliminated later on.)

So, that works.

However, it doesn't work yet for the original construct that I'd ran
into, which looks like this:

   [...]
   int i;
   int j;
   [...]
   signed int .offset.5_2;
   [...]
   unsigned int .offset.7_23;
   unsigned int .iter.0_24;
   unsigned int _25;
   unsigned int _26;
   [...]
   unsigned int .iter.0_32;
   [...]

:
   # gimple_phi <.offset.5_2, .offset.5_21(8), .offset.5_30(9)>
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   [...]

Resolving the "a[j][i] = 123" we'll need to look into later.

As Marc noted above, with that changed into "*(*(a + j) + i) = 123", we
get:

   [...]
   int i;
   int j;
   long unsigned int _1;
   long unsigned int _2;
   sizetype _3;
   sizetype _4;
   sizetype _5;
   int * _6;
   [...]
   signed int .offset.5_8;
   [...]
   unsigned int .offset.7_29;
   unsigned int .iter.0_30;
   unsigned int _31;
   unsigned int _32;
   [...]

:
   # gimple_phi <.offset.5_8, .offset.5_27(8), .offset.5_36(9)>
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   [...]

Here, unless I'm confused, "_4" is supposed to be equal to ".iter.0_30",
but "match.pd" doesn't agree yet.  Note the many "nop_expr"s here, which
I have not yet figured out how to handle, I suppose?  I tried some things
but couldn't get it to work.  Apparently the existing instances of
"(match (nop_convert @0)" and "Basic strip-useless-type-conversions /
strip_nops" rule also don't handle these; should they?  Or, are in fact
here the types mixed up too much?


"(match (nop_convert @0)" defines a shortcut so some transformations can
use nop_convert to detect some specific conversions, but it doesn't do
anything by itself. "Basic strip-useless-type-conversions" strips
conversions that are *useless*, essentially from a type to the same
type. If you want to handle true conversions, you need to do that
explicitly, see the many transformations that use convert? convert1?
convert2? and specify for which particular conversions the
transformation is valid.  Finding out the right conditions to detect
these conversions is often the most painful part of writing a match.pd
transformation.


I hope to get some time again soon to continue looking into this, but if
anybody got any ideas, I'm all ears.


--
Marc Glisse


Re: [RFC] -Weverything

2019-01-22 Thread Marc Glisse

On Tue, 22 Jan 2019, Thomas Koenig wrote:


Hi,

What would people think about a -Weverything option which turns on
every warning there is?

I think that could be quite useful in some circumstances, especially
to find potential bugs with warnings that people, for some reason
or other, found too noisy for -Wextra.

The name could be something else, of course. In the best GNU tradition,
-Wkitchen-sink could be another option :-)


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31573 and duplicates already 
list quite a few arguments. Basically, it could be useful for debugging 
gcc or to discover warnings, but gcc devs fear that users will actually 
use it for real.


--
Marc Glisse


Re: [RFC] -Weverything

2019-01-23 Thread Marc Glisse

On Wed, 23 Jan 2019, Jakub Jelinek wrote:


We have that, gcc -Q --help=warning
Of course, for warnings which do require arguments (numerical, or
enumeration/string), one still needs to pick up his choices of those
arguments; no idea what -Weverything would do here, while some warnings
have different levels where a higher (or lower) level is a superset of
another level, what numbers would you pick for e.g. warnings where the
argument is bytes?


For most of them, there is a value that maximizes the number of warnings, 
so the same superset argument applies. -Wframe-larger-than=0 so it shows 
the estimated frame size on every function, -Walloca-larger-than=0 so it 
is equivalent to -Walloca, etc.
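
E.g.:

$ gcc -Q --help=warning | grep larger-than

lists them all together with their current defaults.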


--
Marc Glisse


Re: On-Demand range technology [2/5] - Major Components : How it works

2019-06-04 Thread Marc Glisse

On Tue, 4 Jun 2019, Martin Sebor wrote:


On 5/31/19 9:40 AM, Andrew MacLeod wrote:

On 5/29/19 7:15 AM, Richard Biener wrote:
On Tue, May 28, 2019 at 4:17 PM Andrew MacLeod  
wrote:

On 5/27/19 9:02 AM, Richard Biener wrote:
On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod  
wrote:
The above suggests that iff this is done at all it is not in GORI 
because

those are not conditional stmts or ranges from feeding those.  The
machinery doing the use-def walking from stmt context also cannot
come along these so I have the suspicion that Ranger cannot handle
telling us that for the stmt following above, for example

    if (_5 != 0)

that _5 is not zero?

Can you clarify?

So there are 2 aspects to this.    the range-ops code for DIV_EXPR, if
asked for the range of op2 () would return ~[0,0] for _5.
But you are also correct in that the walk backwards would not find 
this.


This is similar functionality to how null_derefs are currently handled,
and in fact could probably be done simultaneously using the same code
base.   I didn't bring null derefs up, but this is a good time :-)

There is a separate class used by the gori-cache which tracks the
non-nullness property at the block level.    It has a single API:
non_null_deref_p (name, bb)    which determines whether there is a
dereference in any BB for NAME, which indicates whether the range has 
an

implicit ~[0,0] range in that basic block or not.

So when we then have

   _1 = *_2; // after this _2 is non-NULL
   _3 = _1 + 1; // _3 is non-NULL
   _4 = *_3;
...

when a on-demand user asks whether _3 is non-NULL at the
point of _4 = *_3 we don't have this information?  Since the
per-BB caching will only say _1 is non-NULL after the BB.
I'm also not sure whether _3 ever gets non-NULL during
non-NULL processing of the block since walking immediate uses
doesn't really help here?

presumably _3 is globally non-null due to the definition being (pointer
+ x)  ... i.e., _3 has a global range of ~[0,0] ?

No, _3 is ~[0, 0] because it is derived from _1 which is ~[0, 0] and
you cannot arrive at NULL by pointer arithmetic from a non-NULL pointer.


I'm confused.

_1 was loaded from _2 (thus asserting _2 is non-NULL), but we have no idea 
what the range of _1 is, so how do you assert _1 is ~[0,0] ?
The only way I see to determine _3 is non-NULL  is through the _4 = *_3 
statement.


In the first two statements from the above (where _1 is a pointer):

 _1 = *_2;
 _3 = _1 + 1;

_1 must be non-null because C/C++ define pointer addition only for
non-null pointers, and therefore so must _3.


(int*)0+0 is well-defined, so this uses the fact that 1 is non-null. This 
is all well done in extract_range_from_binary_expr already, although it 
seems to miss the (dangerous) optimization NULL + unknown == NULL.


Just in case, a quote:

"When an expression J that has integral type is added to or subtracted
from an expression P of pointer type, the result has the type of P.
(4.1) — If P evaluates to a null pointer value and J evaluates to 0, the
result is a null pointer value.
(4.2) — Otherwise, if P points to element x[i] of an array object x with
n elements, 80 the expressions P + J and J + P (where J has the value j)
point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n
and the expression P - J points to the (possibly-hypothetical) element
x[i − j] if 0 ≤ i − j ≤ n.
(4.3) — Otherwise, the behavior is undefined"


Or does the middle-end allow arithmetic on null pointers?


When people use -fno-delete-null-pointer-checks because their (embedded) 
platform has important stuff at address 0, they also want to be able to do 
arithmetic there.


--
Marc Glisse


Re: Testsuite not passing and problem with xgcc executable

2019-06-08 Thread Marc Glisse

On Sat, 8 Jun 2019, Jonathan Wakely wrote:


You can see which tests failed by looking in the .log files in the
testsuite directories,


There are .sum files for a quick summary.


or by running the contrib/test_summary script.


There is also contrib/compare_tests, although running it globally has been 
failing for a long time now, and running it for individual .sum files 
fails for jit and libphobos. Other scripts in contrib/ may be relevant.


--
Marc Glisse


Re: Disappeared flag: -maes on -march=ivybridge, present in -march=native

2019-07-28 Thread Marc Glisse

On Mon, 29 Jul 2019, Kevin Weidemann wrote:

I have recently randomly discovered the fact, that building with 
`-march=ivybridge` does not necessarily produce the same output as 
`-march=native` on an Intel Core i7 3770K (Ivy Bridge).


Nothing so surprising there. Not all Ivy Bridge processors are equivalent, 
and -march=ivybridge has to conservatively target those with less 
features.



71c8e4e2f720bc7155ba2da7c0ee9136a9ab3283 is the first bad commit
commit 71c8e4e2f720bc7155ba2da7c0ee9136a9ab3283
Author: hjl 
Date:   Fri Feb 22 12:49:21 2019 +

    x86: (Reapply) Move AESNI generation to Skylake and Goldmont

    This is a repeat of commit r263989, which commit r264052 accidentally
    reverted.

    2019-02-22  Thiago Macieira  

    PR target/89444
    * config/i386/i386.h (PTA_WESTMERE): Remove PTA_AES.
    (PTA_SKYLAKE): Add PTA_AES.
    (PTA_GOLDMONT): Likewise.


As you can see, this is very much on purpose. See 
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01940.html for the 
explanation that came with the patch.


--
Marc Glisse


Re: [ARM] LLVM's -arm-assume-misaligned-load-store equivalent in GCC?

2020-01-07 Thread Marc Glisse

On Tue, 7 Jan 2020, Christophe Lyon wrote:


I've received a support request where GCC generates strd/ldrd which
require aligned memory addresses, while the user code actually
provides sub-aligned pointers.

The sample code is derived from CMSIS:
#define __SIMD32_TYPE int
#define __SIMD32(addr) (*(__SIMD32_TYPE **) & (addr))

void foo(short *pDst, int in1, int in2) {
  *__SIMD32(pDst)++ = in1;
  *__SIMD32(pDst)++ = in2;
}

compiled with arm-none-eabi-gcc -mcpu=cortex-m7 CMSIS.c -S -O2
generates:
foo:
   strdr1, r2, [r0]
   bx  lr

Using -mno-unaligned-access of course makes no change, since the code
is lying to the compiler by casting short* to int*.


If the issue is as well isolated as this, can't they just edit the code?

typedef int __SIMD32_TYPE __attribute__((aligned(1)));

gets

str r1, [r0]@ unaligned
str r2, [r0, #4]@ unaligned

instead of

strdr1, r2, [r0]

--
Marc Glisse


Re: Deprecating arithmetic on std::atomic

2017-04-20 Thread Marc Glisse

On Thu, 20 Apr 2017, Florian Weimer wrote:


On 04/19/2017 07:07 PM, Jonathan Wakely wrote:

I know it's a bit late, but I'd like to propose deprecating the
libstdc++ extension that allows arithmetic on std::atomic.
Currently we make it behave like arithmetic on void*, which is also a
GNU extension (https://gcc.gnu.org/onlinedocs/gcc/Pointer-Arith.html).
We also allow arithmetic on types such as std::atomic which
is probably not useful (PR 69769).


Why is it acceptable to have the extension for built-in types, but not for 
library types wrapping them?  Why be inconsistent about this?


I thought the extension was there for legacy code, to avoid breaking old 
programs, and we could deprecate it eventually. At least the manual is 
missing an example of where this extension is actually useful. For atomic, 
I don't see why we should encourage people to write new code that violates 
the standard...


--
Marc Glisse


Re: Support Library Requirements for GCC 7.1

2017-05-02 Thread Marc Glisse

On Tue, 2 May 2017, Joel Sherrill wrote:


I am trying to update the gcc version for rtems to 7.1 and
running into trouble finding the correct versions of
mpc, mpfr, and gmp. We build those as part of building
gcc so we have configuration control over the set.

With gcc 6.3.0, we have this in our build recipe:

%define mpfr_version   2.4.2
%define mpc_version0.8.1
%define gmp_version4.3.2

I tried that with gcc 7.1.0 but the build failed complaining
mpfr was too old.


Could you be more precise about how the build failed? AFAIK mpfr-2.4.2 is 
still supposed to work.


--
Marc Glisse


Re: Bug in GCC 7.1?

2017-05-05 Thread Marc Glisse
(I think you are looking for gcc-h...@gcc.gnu.org, or gcc's bugzilla, 
rather than this mailing list)


On Fri, 5 May 2017, Helmut Zeisel wrote:


The following program gives a warning under GCC 7.1 (built on cygwin, 64 bit)

#include <vector>
int main()
{
   std::vector<int> c {1,2,3,0};
   while(c.size() > 0 && c.back() == 0)
   {
   auto sz = c.size() -1;
   c.resize(sz);
   }
   return 0;
}

$ c++7.1 -O3 tt.cxx


Please use
$ LC_ALL=C c++7.1 -O3 tt.cxx
when you want to post the result, unless you are sending to a German 
forum.



In Funktion »int main()«:
cc1plus: Warnung: »void* __builtin_memset(void*, int, long unsigned int)«: 
angegebene Größe 18446744073709551612 überschreitet maximale Objektgröße 
9223372036854775807 [-Wstringop-overflow=]
(in English: warning: specified size 18446744073709551612 exceeds maximum 
object size 9223372036854775807 [-Wstringop-overflow=])

Compiling with GCC 6.1 (c++6.1 -O3 tt.cxx) works fine.

Is this a problem of my program or a problem of GCC 7.1?


Sounds like a problem with gcc, maybe optimization creates a path that 
corresponds to size==0 and fails to notice that it cannot be taken.


--
Marc Glisse


Re: Infering that the condition of a for loop is initially true?

2017-09-14 Thread Marc Glisse

On Thu, 14 Sep 2017, Niels Möller wrote:


This is more of a question than a bug report, so I'm trying to send it
to the list rather than filing a bugzilla issue.

I think it's quite common to write for- and while-loops where the
condition is always initially true. A simple example might be

double average (const double *a, size_t n)
{
 double sum;
 size_t i;

 assert (n > 0);
 for (i = 0, sum = 0; i < n; i++)
   sum += a[i];
 return sum / n;
}

The programmer could do the microptimization to rewrite it as a
do-while-loop instead. It would be nice if gcc could infer that the
condition is initially true, and convert to a do-while loop
automatically.

Converting to a do-while-loop should produce slightly better code,
omitting the typical jump to enter the loop at the end where the
condition is checked. It would also make analysis of where variables are
written more accurate, which is my main concern at the moment.


Hello,

assert is not what you want, since it completely disappears with -DNDEBUG. 
clang has __builtin_assume, with gcc you want a test and 
__builtin_unreachable. Replacing your assert with

if(n==0)__builtin_unreachable();
gcc does skip the first test of the loop, as can be seen in the dump
produced with -fdump-tree-optimized.
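
I.e., with the assert replaced:

double average (const double *a, size_t n)
{
  double sum;
  size_t i;

  if (n == 0) __builtin_unreachable ();  /* promises gcc that n > 0 */
  for (i = 0, sum = 0; i < n; i++)
    sum += a[i];
  return sum / n;
}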

--
Marc Glisse


Re: -pie option in ARM64 environment

2017-09-29 Thread Marc Glisse

On Fri, 29 Sep 2017, jacob navia wrote:


I am getting this error:

GNU ld (GNU Binutils for Debian) 2.28
/usr/bin/ld: error.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against external 
symbol `stderr@@GLIBC_2.17' can not be used when making a shared object; 
recompile with -fPIC


The problem is, I do NOT want to make a shared object! Just a plain 
executable.


The verbose linker options are as follows:

collect2 version 6.3.0 20170516
/usr/bin/ld -plugin /usr/lib/gcc/aarch64-linux-gnu/6/liblto_plugin.so 
-plugin-opt=/usr/lib/gcc/aarch64-linux-gnu/6/lto-wrapper 
-plugin-opt=-fresolution=/tmp/cc9I00ft.res -plugin-opt=-pass-through=-lgcc 
-plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc 
-plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --sysroot=/ 
--build-id --eh-frame-hdr --hash-style=gnu -dynamic-linker 
/lib/ld-linux-aarch64.so.1 -X -EL -maarch64linux --fix-cortex-a53-843419 -pie 
-o lcc /usr/lib/gcc/aarch64-linux-gnu/6/../../../aarch64-linux-gnu/Scrt1.o 
/usr/lib/gcc/aarch64-linux-gnu/6/../../../aarch64-linux-gnu/crti.o 
/usr/lib/gcc/aarch64-linux-gnu/6/crtbeginS.o 
-L/usr/lib/gcc/aarch64-linux-gnu/6 
-L/usr/lib/gcc/aarch64-linux-gnu/6/../../../aarch64-linux-gnu 
-L/usr/lib/gcc/aarch64-linux-gnu/6/../../../../lib -L/lib/aarch64-linux-gnu 
-L/lib/../lib -L/usr/lib/aarch64-linux-gnu -L/usr/lib/../lib 
-L/usr/lib/gcc/aarch64-linux-gnu/6/../../.. alloc.o bind.o dag.o decl.o 
enode.o error.o backend-arm.o intrin.o event.o expr.o gen.o init.o input.o 
lex.o arm64.o list.o operators.o main.o ncpp.o output.o simp.o msg.o 
callwin64.o bitmasktable.o table.o stmt.o string.o stab.o sym.o Tree.o 
types.o analysis.o asm.o inline.o -lm ../lcclib.a ../bfd/libbfd.a 
../asm/libopcodes.a -Map=lcc.map -v -lgcc --as-needed -lgcc_s --no-as-needed 
-lc -lgcc --as-needed -lgcc_s --no-as-needed 
/usr/lib/gcc/aarch64-linux-gnu/6/crtendS.o 
/usr/lib/gcc/aarch64-linux-gnu/6/../../../aarch64-linux-gnu/crtn.o


I think the problems lies in this mysterious "pie" option:

... --fix-cortex-a53-843419 -pie -o lcc...


"PIE" could stand for Position Independent Executable.

How could I get rid of that?


-no-pie probably.

Which text file where is responsible for adding 
this "pie" option to the ld command line?


I am not so well versed in gcc's internals to figure out without your help.


Does it show when you run "gcc -dumpspecs"? If so you could provide a 
different specs file. Otherwise, you could check the patches that your 
distribution applies to gcc, one of them likely has "pie" in its name.
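
E.g.:

$ gcc -dumpspecs | grep pie

should show whether the driver itself injects it.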


Easiest is likely to build gcc from the official sources, which shouldn't 
use pie by default.


--
Marc Glisse


Re: GCC Buildbot Update - Definition of regression

2017-10-11 Thread Marc Glisse

On Wed, 11 Oct 2017, David Malcolm wrote:


On Wed, 2017-10-11 at 11:18 +0200, Paulo Matos wrote:


On 11/10/17 11:15, Christophe Lyon wrote:


You can have a look at
https://git.linaro.org/toolchain/gcc-compare-results.git/
where compare_tests is a patched version of the contrib/ script,
it calls the main perl script (which is not the prettiest thing :-)



Thanks, that's useful. I will take a look.


You may also want to look at this script I wrote:

 https://github.com/davidmalcolm/jamais-vu

(it has Python classes for working with DejaGnu output)


By the way, David, how do you handle comparisons for the jit testsuite? jv 
gives


Tests that went away in build/gcc/testsuite/jit/jit.sum: 81
---

 PASS:  t
 PASS:  test-
 PASS:  test-arith-overflow.c
 PASS:  test-arith-overflow.c.exe iteration 1 of 5: verify_uint_over
 PASS:  test-arith-overflow.c.exe iteration 2 of 5: verify_uint_o
 PASS:  test-arith-overflow.c.exe iteration 3 of 5: verify
[...]

Tests appeared in build/gcc/testsuite/jit/jit.sum: 78
-

 PASS:  test-arith-overflow.c.exe iteration 1
 PASS:  test-arith-overflow.c.exe iteration 2 of
 PASS:  test-arith-overflow.c.exe iteration 4 of 5: verify_u
 PASS:  test-combination.
 PASS:  test-combination.c.exe it
[...]

The issue is more likely in the testsuite, but I assume you have a 
workflow that allows working around the issue?


--
Marc Glisse


Re: gcc Bugzilla corrupt again?

2017-11-22 Thread Marc Glisse

On Thu, 23 Nov 2017, Jeffrey Walton wrote:


On Thu, Nov 23, 2017 at 1:51 AM, Andrew Roberts  wrote:

I was adding a comment to bug:

81616 - Update -mtune=generic for the current Intel and AMD processors

After clicking add comment it took me an an entirely different bug.

I tried to add the comment again, and got a message about a "Mid Air
Collision"

The comment ended up the system twice (Comment 4/5).

But I've never seen it take me to a different bug after adding a comment
before.


The "take me to a different bug" after submitting been happening for a while.


In preferences, you get to choose the behavior "After changing a bug". 
Default is "Show next bug in my list".


--
Marc Glisse


Re: gcc 7.3: Replacing global operator new/delete in shared libraries

2018-02-07 Thread Marc Glisse

On Tue, 6 Feb 2018, Paul Smith wrote:


My environment has been using GCC 6.2 (locally compiled) on GNU/Linux
systems.  We use a separate heap management library (jemalloc) rather
than the libc allocator.  The way we did this in the past was to
declare operator new/delete (all forms) as inline functions in a header


Are you sure you still have all forms? The aligned versions were added in 
gcc-7 IIRC.



and ensure that this header was always the very first thing in every
source file, before even any standard header files.  I know that inline
operator new/delete isn't OK in the C++ standard, but in fact it has
worked for us on the systems we care about.


Inline usually works, but violating the ODR is harder... I would at least 
use the always_inline attribute to improve chances (I assume that static 
(or anonymous namespace) versions wouldn't work), since the optimizer may 
decide not to inline otherwise. Something based on visibility should be 
somewhat safer. But it still seems dangerous, some global libstdc++ object 
might be initialized using one allocator then used with another one...



I'm attempting a toolchain upgrade which is switching to GCC 7.3 /
binutils 2.30 (along with many other updates).

Now when I run our code, I get a core on exit.  It appears an STL
container delete is invoking libc free() with a pointer to memory
allocated by jemalloc.


An example would help the discussion.


My question is, what do I need to do to ensure this behavior persists
if I create a global operator new/delete?

Is it sufficient to ensure that the symbol for our shared library
global new/delete symbols are hidden and not global, using a linker map
or -fvisibility=hidden?


I think so (hidden implies not-interposable, so locally bound), but I 
don't have much experience there.


--
Marc Glisse


Re: gdb 8.x - g++ 7.x compatibility

2018-02-07 Thread Marc Glisse

On Wed, 7 Feb 2018, Simon Marchi wrote:


On 2018-02-07 12:08, Jonathan Wakely wrote:

Why would they not have a mangled name?


Interesting.  What do they look like, and in what context do they appear?


Anywhere you need a name for linkage purposes, such as in a function
signature, or as a template argument of another type, or in the
std::type_info::name() for the type etc. etc.

$ g++ -o test.o -c -x c++ - <<< 'struct X {}; void f(X) {}
template<typename T> struct Y { }; void g(Y<X>) {}' && nm --defined-only test.o
0000000000000000 T _Z1f1X
0000000000000007 T _Z1g1YI1XE

The mangled name for X is "X" and the mangled name for Y<X> is "YI1XE"
which includes the name "X".

This isn't really on-topic for solving the GDB type lookup problem though.


Ah ok, the class name appears mangled in other entities' mangled name.  But 
from what I understand there's no mangled name for the class such that


 echo <mangled name> | c++filt

outputs the class name (e.g. "Foo<10>").  That wouldn't make sense, since 
there's no symbol for the class itself.


$ echo _Z1YI1XE | c++filt
Y<X>
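
The same thing works programmatically with the demangler from 
<cxxabi.h> ("1YI1XE" being the type name from the example above, as 
std::type_info::name() would report it):

#include <cxxabi.h>
#include <cstdio>
#include <cstdlib>

int main()
{
  int status;
  char *s = abi::__cxa_demangle("1YI1XE", NULL, NULL, &status);
  if (status == 0)
    {
      puts(s);   // prints "Y<X>"
      free(s);
    }
}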

--
Marc Glisse


Re: gcc 7.3: Replacing global operator new/delete in shared libraries

2018-02-07 Thread Marc Glisse

On Wed, 7 Feb 2018, Paul Smith wrote:


My question is, what do I need to do to ensure this behavior
persists if I create a global operator new/delete?

Is it sufficient to ensure that the symbol for our shared library
global new/delete symbols are hidden and not global, using a linker
map or -fvisibility=hidden?


I think so (hidden implies not-interposable, so locally bound), but
I don't have much experience there.


OK I'll pursue this for now.


I answered too fast. It isn't just new/delete that need to be hidden. It 
is also anything that uses them and might be used in both contexts. For 
instance, std::allocator<T>::allocate is an inline function that calls 
operator new. You get one version that calls new1, and one version that 
calls new2. If you don't do anything special, the linker keeps only one 
(more or less arbitrarily). So I believe you need -fvisibility=hidden to 
hide everything but a few carefully chosen interfaces.
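
A minimal sketch of the shape this takes (assuming a jemalloc build 
with the "je_" prefix -- adjust the entry points to your configuration, 
and add the array/aligned/nothrow forms similarly):

#include <cstddef>
#include <new>

extern "C" void *je_malloc(std::size_t);
extern "C" void je_free(void *);

__attribute__((visibility("hidden")))
void *operator new(std::size_t n)
{
  if (void *p = je_malloc(n))
    return p;
  throw std::bad_alloc();
}

__attribute__((visibility("hidden")))
void operator delete(void *p) noexcept
{
  je_free(p);
}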


--
Marc Glisse


Re: why C++ cannot alias an inline function, C can ?

2018-04-01 Thread Marc Glisse

(should have been on gcc-help I believe)

On Sun, 1 Apr 2018, Max Filippov wrote:


On Sun, Apr 1, 2018 at 5:33 AM, Jason Vas Dias  wrote:

Aha!  But how to determine the mangled name beforehand ?

Even if I compile the object without the alias, then inspect
the object with objdump, there is no mangled symbol
_ZL3foov defined in the object file.

So I must run some name mangler /  generator as a
pre-processing step to get the correct mangled name
string ?


I guess so. Or you could define foo with C linkage:

extern "C" {
 static inline __attribute__((always_inline))
 void foo(void){}
};

static inline __attribute__((always_inline,alias("foo")))
void bar(void);


Or you can use an asm label to specify some arbitrary name.
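
Something like this (untested sketch, "foo_impl" being an arbitrary 
assembler-level name):

static inline __attribute__((always_inline))
void foo(void) asm("foo_impl");

static inline __attribute__((always_inline))
void foo(void) {}

static inline __attribute__((always_inline, alias("foo_impl")))
void bar(void);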

--
Marc Glisse


Re: libstdc++: ODR violation when using std::regex with and without -D_GLIBCXX_DEBUG

2018-05-08 Thread Marc Glisse

On Tue, 8 May 2018, Jonathan Wakely wrote:


On 8 May 2018 at 14:00, Jonathan Wakely wrote:

On 8 May 2018 at 13:44, Stephan Bergmann wrote:

I was recently bitten by the following issue (Linux, libstdc++ 8.0.1): A
process loads two dynamic libraries A and B both using std::regex, and A is
compiled without -D_GLIBCXX_DEBUG while B is compiled with -D_GLIBCXX_DEBUG.


This is only supported in very restricted cases.


B creates an instance of std::regex, which internally creates a
std::shared_ptr<__detail::_NFA<std::__cxx11::regex_traits<char>>>,
where _NFA has various members of std::__debug::vector type (but which isn't
reflected in the mangled name of that _NFA instantiation itself).

Now, when that instance of std::regex is destroyed again in library B, the
std::shared_ptr<__detail::_NFA<std::__cxx11::regex_traits<char>>>::~shared_ptr
destructor (and functions it in turn calls) that happens to get picked is
the (inlined, and exported due to default visibility) instance from library
A.  And that assumes that that _NFA instantiation has members of non-debug
std::vector type, which causes a crash.

Should it be considered a bug that such mixture of debug and non-debug
std::regex usage causes ODR violations?


Yes, but my frank response is "don't do that".

The right fix here might be to ensure that _NFA always uses the
non-debug vector even in Debug Mode, but I'm fairly certain there are
other similar problems lurking.


N.B. I think this discussion belongs on the libstdc++ list.


Would it make sense to use the abi_tag attribute to help with that? (I 
didn't really think about it, maybe it doesn't)
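
The idea would be something like this (a sketch, not what libstdc++ 
does today): tag the debug-mode type so that every mangled name it 
participates in changes, and the two builds can no longer resolve to 
each other's symbols:

struct __attribute__((abi_tag("dbg"))) _Dbg_NFA { /* ... */ };
// mangles with a "B3dbg" tag appended, distinct from the untagged type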


"don't do that" remains the most sensible answer.

--
Marc Glisse


Re: About Bug 52485

2018-05-09 Thread Marc Glisse

On Wed, 9 May 2018, SHIH YEN-TE wrote:

Want to comment on "Bug 52485 - [c++11] add an option to disable c++11 
user-defined literals"


It's a pity GCC doesn't support this, which forces me to give up 
introducing a newer C++ standard into my project. I know it is ridiculous, 
but the real world is somewhat ridiculous too, and nothing 
is perfect.


You have the wrong approach.

Apparently, you are using an unmaintained library (if it was maintained, 
it would be compatible with C++11 by now), so there is no problem 
modifying it, especially just to add a few spaces. A single run of 
clang-tidy would likely fix all of them for you.
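
The breakage and its fix are typically one space apart:

#include <cinttypes>
#include <cstdio>

void print(std::uint64_t v)
{
  // std::printf("%"PRIu64"\n", v);  // C++03: OK; C++11: user-defined literal
  std::printf("%" PRIu64 "\n", v);   // fine in both
}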


--
Marc Glisse


Re: Unused __builtin_ia32_* builtins

2018-05-10 Thread Marc Glisse

On Thu, 10 May 2018, Jakub Jelinek wrote:


for i in `grep __builtin_ia32 config/i386/i386-builtin.def | sed 
's/^.*__builtin_ia32_/__builtin_ia32_/;s/".*$//' | sort -u`; do grep -q -w $i 
config/i386/*.h || echo $i; done

shows many builtins not used in any of the intrinsic headers.

I believe for the __builtin_ia32_* builtins we only support the intrinsics
and not the builtins directly.  Can we remove some of these (not necessarily
all of them), after checking when and why they were added and if they were
added for the intrinsic headers which now e.g. uses generic vector arith
instead?


When I removed their use in the intrinsic headers, I tried to remove them, 
but Ada people asked us to keep them

https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00843.html

--
Marc Glisse


Re: Generating gimple assign stmt that changes sign

2018-05-21 Thread Marc Glisse

On Tue, 22 May 2018, Kugan Vivekanandarajah wrote:


Hi,

I am looking to introduce ABSU_EXPR and that would create:

unsigned short res = ABSU_EXPR (short);

Note that the argument is signed and result is unsigned. As per the
review, I have a match.pd entry to generate this as:
(simplify (abs (convert @0))
(if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0)))
 (convert (absu @0))))


Not sure, but we may want a few more restrictions on this transformation.


Now when gimplifying the converted tree, how do we tell that ABSU_EXPR
will take a signed arg and return unsigned. I will have other match.pd
entries so this will be generated while in gimple.passes too. Should I
add new functions in gimple.[h|c] for this.

Is there any examples I can refer to. Conversion expressions seems to
be the only place where sign can change in gimple assignment but they
are very specific.


You'll probably want to patch genmatch.c (near get_operand_type maybe?) so 
it doesn't try to guess that the type of absu is the same as its argument. 
You can also specify a type in transformations, look for :utype or :etype 
in match.pd.
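
Roughly the shape it could take with :utype (untested sketch):

(simplify
 (abs (convert @0))
 (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
      && !TYPE_UNSIGNED (TREE_TYPE (@0)))
  (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
   (convert (absu:utype @0)))))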


--
Marc Glisse


Re: How to get GCC on par with ICC?

2018-06-08 Thread Marc Glisse

On Fri, 8 Jun 2018, Steve Ellcey wrote:


On Thu, 2018-06-07 at 12:01 +0200, Richard Biener wrote:

 
When we do our own comparisons of GCC vs. ICC on benchmarks
like SPEC CPU 2006/2017 ICC doesn't have a big lead over GCC
(in fact it even trails in some benchmarks) unless you get to
"SPEC tricks" like data structure re-organization optimizations that
probably never apply in practice on real-world code (and people
should fix such things at the source level being pointed at them
via actually profiling their codes).


Richard,

I was wondering if you have any more details about these comparisions
you have done that you can share?  Compiler versions, options used,
hardware, etc  Also, were there any tests that stood out in terms of
icc outperforming GCC?

I did a compare of SPEC 2017 rate using GCC 8.* (pre release) and
a recent ICC (2018.0.128?) on my desktop (Xeon CPU E5-1650 v4).
I used '-xHost -O3' for icc and '-march=native -mtune=native -O3'
for gcc.


You should use -Ofast for gcc. As mentioned earlier in the discussion, 
ICC has some equivalent of -ffast-math by default.
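
That is, something closer to (illustrative command lines only):

  icc -xHost -O3 bench.c
  gcc -march=native -Ofast bench.c   # -Ofast = -O3 plus -ffast-math and a bit more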



The int rate numbers (running 1 copy only) were not too bad, GCC was
only about 2% slower and only 525.x264_r seemed way slower with GCC.
The fp rate numbers (again only 1 copy) showed a larger difference, 
around 20%.  521.wrf_r was more than twice as slow when compiled with
GCC instead of ICC and 503.bwaves_r and 510.parest_r also showed
significant slowdowns when compiled with GCC vs. ICC.


--
Marc Glisse


Re: -Wclass-memaccess warning should be in -Wextra, not -Wall

2018-07-08 Thread Marc Glisse

On Fri, 6 Jul 2018, Martin Sebor wrote:


On 07/05/2018 05:14 PM, Soul Studios wrote:

Simply because a struct has a constructor does not mean it isn't a
viable target/source for use with memcpy/memmove/memset.


As the documentation that Segher quoted explains, it does
mean exactly that.

Some classes have user-defined copy and default ctors with
the same effect as memcpy/memset.  In modern C++ those ctors
should be defaulted (= default) and GCC should emit optimal
code for them.


What if I want to memcpy a std::pair<int,int>?

Some classes may have several states, some that are memcpy-safe, and some 
that are not. A user may know that at some point in their program, all the 
objects in a given array are safe, and want to memcpy the whole array 
somewhere.


memcpy can also be used to work around the lack of a destructive move in 
C++. For instance, vector<vector<T>>::resize could safely use memcpy (and 
skip destroy before deallocate). In this particular case, we could imagine 
at some point in the future that the compiler would notice it is 
equivalent to memcpy+bzero, and then that the bzero is dead, but there are 
more complicated use cases for destructive move.



In fact, in loops they can result in more
efficient code than the equivalent memset/memcpy calls.  In
any case, "native" operations lend themselves more readily
to code analysis than raw memory accesses and as a result
allow all compilers (not just GCC) do a better a job of
detecting bugs or performing interesting transformations
that they may not be able to do otherwise.


Having benchmarked the alternatives memcpy/memmove/memset definitely
makes a difference in various scenarios.


Please open bugs with small test cases showing
the inefficiencies so the optimizers can be improved.


Some already exist (PR 86024 seems related, there are probably some closer 
matches), but indeed more would be helpful.


--
Marc Glisse


Re: -Wclass-memaccess warning should be in -Wextra, not -Wall

2018-07-08 Thread Marc Glisse

On Sun, 8 Jul 2018, Jason Merrill wrote:


On Sun, Jul 8, 2018 at 6:40 PM, Marc Glisse  wrote:

On Fri, 6 Jul 2018, Martin Sebor wrote:

On 07/05/2018 05:14 PM, Soul Studios wrote:


Simply because a struct has a constructor does not mean it isn't a
viable target/source for use with memcpy/memmove/memset.



As the documentation that Segher quoted explains, it does
mean exactly that.

Some classes have user-defined copy and default ctors with
the same effect as memcpy/memset.  In modern C++ those ctors
should be defaulted (= default) and GCC should emit optimal
code for them.


What if I want to memcpy a std::pair<int,int>?


That's fine, since the pair copy constructor is defaulted, and trivial
for pair<int,int>.


G++ does currently warn for

#include <utility>
#include <cstring>
typedef std::pair<int,int> P;
void f(P*d, P const*s){ std::memcpy(d,s,sizeof(P)); }

because copy-assignment is not trivial.

IIRC std::pair and std::tuple are not as trivial as they could be for ABI 
reasons.


Boost.Container chose to disable the warning ( 
https://github.com/boostorg/container/commit/62a8beb0f12242fb1e99daa98533ce74e735 
) instead of making their version of pair trivial. I don't know why, but 
maybe that was to avoid a mess of #ifdef to maintain a C++03 version of 
the code.



Some classes may have several states, some that are memcpy-safe, and some
that are not. A user may know that at some point in their program, all the
objects in a given array are safe, and want to memcpy the whole array
somewhere.


The user may know that, but the language only defines the semantics of
memcpy for trivially copyable classes.  If you want to assume that the
compiler will do what you expect with this instance of undefined
behavior, you can turn off the warning.  You may well be right, but I
don't think it follows that putting this warning about undefined
behavior in -Wall is wrong.


(note that I am not the original reporter, I am only trying to help find 
examples)
I don't mind the warning so much, I am more scared of the optimizations 
that may follow.



memcpy can also be used to work around the lack of a destructive move in
C++.


I wonder what you mean by "the lack of a destructive move in C++",
given that much of C++11 was about supporting destructive move
semantics.


There is a misunderstanding here. C++11 added move semantics that one 
might call "conservative", i.e. the moved-from object is still alive and 
one should eventually run its destructor. "destructive move" is used in 
some papers / blogs to refer to a move that also destructs the original 
object. For some types that are not trivially default constructible (like 
libstdc++'s std::deque IIRC), a conservative move is still expensive while 
a destructive move is trivial (memcpy). Libstdc++'s std::string is one of 
the rare types that are not trivially destructively movable (it can 
contain a pointer to itself). Most operations in std::vector<V> could use 
a destructive move of V very naturally.


The denomination conservative/destructive is certainly not canonical, I 
don't know if there are better words to describe it.



For instance, vector<vector<T>>::resize could safely use memcpy (and
skip destroy before deallocate). In this particular case, we could imagine
at some point in the future that the compiler would notice it is equivalent
to memcpy+bzero, and then that the bzero is dead, but there are more
complicated use cases for destructive move.


Indeed, resizing a vector<vector<T>> will loop over the outer vector
calling the move constructor for each inner vector, which will copy
the pointer and zero out the moved-from object, which the optimizer
could then coalesce into memcpy/bzero.  This sort of pattern is common
enough in C++11 containers that this seems like an attractive
optimization, if we don't already perform it.

What more complicated uses don't reduce to memcpy/bzero, but you would
still want to use memcpy for somehow?


Noticing that it reduces to memcpy can be hard. For std::deque, you have 
to cancel a new/delete pair (which we still do not handle), and for that 
you may first need some loop fusion to put the new and delete next to each 
other. For GMP's mpz_class, the allocation is hidden in opaque mpz_init / 
mpz_clear functions, so the compiler cannot simplify move+destruct into 
memcpy.
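
For the simple vector<vector<T>> case, what the optimizer would have to 
recognize reduces to something like this (a sketch, with vec standing 
in for the three-pointer layout of a vector):

#include <cstring>
#include <cstddef>

struct vec { int *beg, *end, *cap; };

// Moving n inner vectors: copy the pointers and null the sources so
// their destructors do nothing. A destructive move would skip the memset.
void relocate(vec *dst, vec *src, std::size_t n)
{
  std::memcpy(dst, src, n * sizeof(vec));
  std::memset(src, 0, n * sizeof(vec));
}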


I would certainly welcome optimizer improvements that make it less useful 
to specialize the library, but some things are easier to do at the level 
of the library.


--
Marc Glisse


Re: -Wclass-memaccess warning should be in -Wextra, not -Wall

2018-07-10 Thread Marc Glisse

On Mon, 9 Jul 2018, Martin Sebor wrote:


My point to all of this (and I'm annoyed that I'm having to repeat it
again, as if my first post wasn't clear enough - which it was) was that
any programmer using memcpy/memmove/memset is going to know what they're
getting into.


No, programmers don't always know that.  In fact, it's easy even
for an expert programmer to make the mistake that what looks like
a POD struct can safely be cleared by memset or copied by memcpy
when doing so is undefined because one of the struct members is
of a non-trivial type (such a container like string).


Indeed, especially since some other compilers have implemented string in a 
way that is safe (even if theoretically UB) to memset/memcpy.



Therefore it makes no sense to penalize them by getting them to write
ugly, needless code - regardless of the surrounding politics/codewars.


Quite a lot of thought and discussion went into the design and
implementation of the warning, so venting your frustrations or
insulting those of us involved in the process is unlikely to
help you effect a change.  To make a compelling argument you
need to provide convincing evidence that we have missed
an important use case.  The best way to do that in this forum
is with test cases and/or real world designs that are hampered
by our choice.  That's a high bar to meet for warnings whose
documented purpose is to diagnose "constructions that some
users consider questionable, and that are easy to avoid (or
modify to prevent the warning)."


I guess the phrasing is a bit weak, "some users" obviously has to refer to 
a significant proportion of users, "easy to avoid" cannot have too many 
drawbacks (in particular, generated code should be of equivalent quality), 
etc.


-Wclass-memaccess fits the "easy to avoid" quite well, since a simple cast 
disables it. -Wmaybe-uninitialized is much worse: it produces many false 
positives, that change with every release and are super hard to avoid. And 
even in the "easy to avoid" category where we don't want to litter the 
code with casts to quiet the warnings, I find -Wsign-compare way worse in 
practice than -Wclass-memaccess.
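
For reference, the cast that disables -Wclass-memaccess (whether the 
access itself is valid for the type is a separate question, as 
discussed above):

#include <cstring>
#include <string>

struct S { std::string name; int id; };

void wipe(S *p)
{
  // std::memset(p, 0, sizeof *p);                     // warns
  std::memset(static_cast<void *>(p), 0, sizeof *p);   // no warning
}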


--
Marc Glisse


Re: r227907 and AIX 5.[23]

2018-07-25 Thread Marc Glisse

On Wed, 25 Jul 2018, David Edelsohn wrote:


AIX 5.3 is no longer supported or maintained.


If gcc-5+ fails to build on AIX 5.3 and patches to make it compile are
not welcome, maybe some cleanup removing aix43.h, aix5*.h and whatever
configure bits could help clarify things? Only when someone has the time, 
of course.


--
Marc Glisse


Re: Can offsetting a non-null pointer result in a null one?

2018-08-20 Thread Marc Glisse

On Mon, 20 Aug 2018, Richard Biener wrote:


On Mon, Aug 20, 2018 at 10:53 AM Andreas Schwab  wrote:


On Aug 20 2018, Richard Biener  wrote:


Btw, I can't find wording in the standards that nullptr + 1 is
invoking undefined behavior,
that is, that pointer arithmetic is only allowed on pointers pointing
to a valid object.
Any specific pointers?


All of 5.7 talks about pointers pointing to objects (except when adding
0).


Thanks all for the response.  Working on a patch introducing infrastructure
for this right now but implementing this we'd need to make sure to not
hoist pointer arithmetic into blocks that might otherwise not be executed.
Like

  if (p != 0)
   {
 q = p + 1;
 foo (q);
   }

may not be optimized to

 q = p + 1;
 if (p != 0)
   foo (q);

because then we'd elide the p != 0 check.  I'm implementing the infrastructure
to assume y != 0 after a stmt like z = x / y; where we'd already avoid
such hoisting
because it may trap at runtime.

Similar "issues" would be exposed when hoisting undefined overflow
stmts and we'd
derive ranges for their operands.

So I'm not entirely sure it's worth the likely trouble.


The opposite direction may be both easier and safer, even if it won't
handle everything:

P p+ N is nonnull if P or N is known to be nonnull
(and something similar for &p->field and others)

--
Marc Glisse


Re: Can offsetting a non-null pointer result in a null one?

2018-08-20 Thread Marc Glisse

On Mon, 20 Aug 2018, Richard Biener wrote:


P p+ N is nonnull if P or N is known to be nonnull
(and something similar for &p->field and others)


But we already do that.


Oops... I never noticed, I should have checked.


 else if (code == POINTER_PLUS_EXPR)
   {
 /* For pointer types, we are really only interested in asserting
whether the expression evaluates to non-NULL.  */
 if (range_is_nonnull (&vr0) || range_is_nonnull (&vr1))
   set_value_range_to_nonnull (vr, expr_type);
 else if (range_is_null (&vr0) && range_is_null (&vr1))
   set_value_range_to_null (vr, expr_type);
 else
   set_value_range_to_varying (vr);
   }

Ah, range_is_nonnull (&vr1) is only matching ~[0,0].  We'd
probably want VR_RANGE && !range_includes_zero_p here.  That
range_is_nonnull is probably never true due to canonicalization.


That explains it. Yes please. I am surprised there isn't a helper like 
range_includes_zero_p or value_inside_range that takes a value_range* as 
argument so we don't have to worry about the type of range (the closest 
seems to be value_ranges_intersect_p with a singleton range, but that 
function seems dead and broken). When POINTER_PLUS_EXPR is changed to take 
a signed argument, your suggested test will need updating :-(
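
Something like this, say (a hypothetical sketch against the code quoted 
above, assuming a range_includes_zero_p helper taking a value_range *, 
which does not exist yet):

  if (range_is_nonnull (&vr0)
      || range_is_nonnull (&vr1)
      || (vr1.type == VR_RANGE && !range_includes_zero_p (&vr1)))
    set_value_range_to_nonnull (vr, expr_type);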


--
Marc Glisse


__builtin_clzll and uintmax_t

2011-03-05 Thread Marc Glisse

Hello,

the following question came up for a libstdc++ patch. We have a variable 
of type uintmax_t and want to count the leading zeros. Can we just call 
__builtin_clzll on it?


In particular, can uintmax_t be larger than unsigned long long in gcc? Is 
__builtin_clzll available on all platforms? Is there a good reason to use 
__builtin_clzl instead on platforms where long and long long have the same 
size?


In case it matters, this is strictly for compile-time computations 
(templates, constexpr).
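
A sketch of the intended use (assuming, per the above, that uintmax_t 
is no wider than unsigned long long, and x != 0 since clz of 0 is 
undefined):

#include <cstdint>

constexpr int clz_uintmax(std::uintmax_t x)
{
  return __builtin_clzll(x)
         - int(8 * (sizeof(unsigned long long) - sizeof(std::uintmax_t)));
}

static_assert(clz_uintmax(1) == 8 * sizeof(std::uintmax_t) - 1, "");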


--
Marc Glisse


Re: __builtin_clzll and uintmax_t

2011-03-06 Thread Marc Glisse


Coucou FX,

On Sat, 5 Mar 2011, FX wrote:


uintmax_t is the largest of the standard unsigned C types, so it cannot be 
larger than unsigned long long.


That's a gcc property then. The C99 standard only guarantees that 
uintmax_t is at least as large as unsigned long long, but it is allowed to 
be some other larger type:


"The following type designates an unsigned integer type capable of 
representing any value of any unsigned integer type: uintmax_t"




On x86_64, for example:


#include <stdint.h>
#include <stdio.h>

int main (void)
{
  printf ("%lu ", sizeof (uintmax_t));
  printf ("%lu ", sizeof (int));
  printf ("%lu ", sizeof (long int));
  printf ("%lu ", sizeof (long long int));
  printf ("%lu\n", sizeof (__int128));
}


gives : 8 4 8 8 16


I am not sure how legal that is. __int128 is an extended signed integer 
type, and thus the statement about intmax_t should apply to it as well. So 
gcc is just pretending that __int128 is not really there.



Is __builtin_clzll available on all platforms?


Yes, we emit calls to this built-in unconditionally in the Fortran 
front-end, and it has caused no trouble.


Thank you, that's the best guarantee I could ask for about the existence 
of __builtin_clzll.


--
Marc Glisse


Re: Environment setting LDFLAGS ineffective after installation stage 1. Any workaround?

2011-05-31 Thread Marc Glisse

(gcc-help ?)

On Tue, 31 May 2011, Thierry Moreau wrote:

But with the gcc (latest 4.6.1 snapshot), -rpath (requested through LDFLAGS 
as indicated above) is effective only for executables built in stage 1 (and 
fixincl), but not for the installed gcc executables.


Is it intentional that the LDFLAGS environment setting is partially effective 
during gcc build?


Yes. For further stages, there is BOOT_LDFLAGS. There is also a configure 
option with a similar name.

--with-stage1-ldflags=
--with-boot-ldflags=

see:
http://gcc.gnu.org/install/configure.html
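
For example (the rpath is illustrative):

  .../configure --with-boot-ldflags='-Wl,-rpath,/opt/mylibs/lib' ...

or, without reconfiguring:

  make BOOT_LDFLAGS='-Wl,-rpath,/opt/mylibs/lib'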

--
Marc Glisse


Re: badly broken?!?

2011-06-06 Thread Marc Glisse

On Mon, 6 Jun 2011, Paolo Carlini wrote:

I just built Rev 174696 and if I run the following snippet in the bash shell 
of an x86_64-linux machine, today I don't get any meaningful output, in 
particular I don't get 'ok', instead '|+000|', which I have no idea what it 
means:


#include <iostream>

int main()
{
 std::cout << "ok\n";
}

Can anybody else see this crazy breakage? May be a few days old, AFAICS. 
4_6-branch is perfectly fine.


174683 here on linux x64 and everything is fine.

--
Marc Glisse


  1   2   3   >