[Bug libstdc++/94049] New: For better diagnostics CPOs should not use concepts for operator()

2020-03-05 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94049

Bug ID: 94049
   Summary: For better diagnostics CPOs should not use concepts
for operator()
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the following code:

#include <ranges>
void foo0() {
    int t = 0;
    std::ranges::begin(t);
}


The diagnostics for it are mostly unreadable and point to the internals of
libstdc++: https://godbolt.org/z/c-RwuY .

This could be significantly improved. Right now the `requires` clause on
`std::ranges::__cust_access::_Begin::operator()` duplicates the conditions
already checked in the function body. Instead of this duplication, all the
requirements could simply be asserted in the body:


template <class _Tp>
constexpr auto
operator()(_Tp&& __t) const noexcept(_S_noexcept<_Tp>()) {
  static_assert(__maybe_borrowed_range<_Tp>,
                "Not a borrowed range or lvalue");
  if constexpr (is_array_v<remove_reference_t<_Tp>>) {
    ...
  } else if constexpr (__member_begin<_Tp>)
    return __t.begin();
  else if constexpr (__adl_begin<_Tp>)
    return begin(__t);
  else
    static_assert(!sizeof(_Tp),
                  "_Tp should have either a member begin() or a begin(_Tp&) "
                  "in the namespace of _Tp");
}


This gives much better diagnostics: https://godbolt.org/z/kmLGb7
All the CPOs could be improved in that manner.
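
The same idea can be sketched outside libstdc++. This is a minimal C++17
illustration only, not the library's implementation: the namespace `demo`, the
trait `has_member_begin` and the object `demo::begin` are hypothetical names,
and the ADL branch is left out for brevity.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

namespace demo {  // hypothetical namespace, not part of libstdc++

// Detects whether `t.begin()` is well-formed for an lvalue of type T.
template <class T, class = void>
struct has_member_begin : std::false_type {};

template <class T>
struct has_member_begin<T, std::void_t<decltype(std::declval<T&>().begin())>>
    : std::true_type {};

struct begin_fn {
    // Deliberately unconstrained: the requirements are checked in the body,
    // so a misuse reports the static_assert message below.
    template <class T>
    constexpr auto operator()(T& t) const {
        if constexpr (std::is_array_v<T>) {
            return t + 0;
        } else if constexpr (has_member_begin<T>::value) {
            return t.begin();
        } else {
            static_assert(!sizeof(T),
                          "T should be an array or have a member begin()");
        }
    }
};

inline constexpr begin_fn begin{};

}  // namespace demo
```

Calling `demo::begin` on an `int` lvalue should then fail with the
static_assert text rather than with a list of unsatisfied constraints.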

[Bug middle-end/94146] New: Merging functions with same bodies stopped working

2020-03-11 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94146

Bug ID: 94146
   Summary: Merging functions with same bodies stopped working
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:

extern int x , y;

int ternary(int i) { return i > 0 ? x : y; }
int ternary2(int i) { return i > 0 ? x : y; }


GCC 9 merged the functions at -O2:

ternary2(int):
        jmp     ternary(int)


With GCC 10 the merging no longer happens at -O2, and function bodies are
duplicated even for very big functions: https://godbolt.org/z/2kH8VR

[Bug c++/67302] [C++14] copy elision in return (expression)

2020-06-30 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67302

Antony Polukhin  changed:

   What|Removed |Added

 CC||antoshkka at gmail dot com

--- Comment #3 from Antony Polukhin  ---
Can reproduce with GCC 10.1 https://godbolt.org/z/tYvccG

[Bug c++/96004] New: Copy elision with conditional

2020-06-30 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96004

Bug ID: 96004
   Summary: Copy elision with conditional
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:


struct Struct {
    Struct() = default;
    Struct(Struct&&);
};

Struct question10(bool b) {
    if (b) {
        Struct s{};
        return s;
    } else {
        return {};
    }
}


It is possible to elide the move constructor call, as the lifetimes of the
object `s` and of the object created by `return {}` do not overlap.

(Some other compilers already perform copy elision here:
https://godbolt.org/z/wdpLkT )
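
Whether the move is elided can be observed by counting move-constructor calls.
The following is a minimal sketch; the type `Counted`, its counter `moves` and
the helper `moves_for` are hypothetical names added for illustration, and the
count on the `b == true` path depends on the compiler.

```cpp
struct Counted {
    static int moves;                 // hypothetical instrumentation counter
    Counted() = default;
    Counted(Counted&&) { ++moves; }
};
int Counted::moves = 0;

Counted question10_counted(bool b) {
    if (b) {
        Counted s{};
        return s;   // NRVO candidate: the move constructor may or may not run
    } else {
        return {};  // initializes the returned object directly, no move
    }
}

// Returns the number of move-constructor calls made by one call.
int moves_for(bool b) {
    Counted::moves = 0;
    Counted c = question10_counted(b);
    (void)c;
    return Counted::moves;
}
```

A compiler that performs the elision requested above would report zero moves
on both paths.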

[Bug libstdc++/96088] New: Range insertion into unordered_map is less effective than a loop with insertion

2020-07-06 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96088

Bug ID: 96088
   Summary: Range insertion into unordered_map is less effective
than a loop with insertion
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the function f1:

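#include <initializer_list>
#include <string>
#include <unordered_map>
#include <utility>

static constexpr std::initializer_list<std::pair<const char*, int>> lst = {
    {"long_str_for_dynamic_allocating", 1}};

void f1() {
    std::unordered_map<std::string, int> m(1);
    m.insert(lst.begin(), lst.end());
}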


It creates a temporary and, as a result, makes 4 allocations. Meanwhile f2 does
not create a temporary and makes only 3 allocations:

void f2() {
    std::unordered_map<std::string, int> m(1);
    for (const auto& x : lst) {
        m.insert(x);
    }
}


Godbolt playground: https://godbolt.org/z/VapmBU

[Bug c++/96121] New: Uninitialized variable copying not diagnosed

2020-07-08 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96121

Bug ID: 96121
   Summary: Uninitialized variable copying not diagnosed
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:

struct A { A(); };
struct B { B(A); };

struct composed2 {
B b_;
A a_;
composed2() : b_(a_)  {}
};


GCC does not diagnose the use of the uninitialized variable `a_` with -Wall and
-Wextra. Some other compilers do diagnose it:

warning: field 'a_' is uninitialized when used here [-Wuninitialized]
composed2() : b_(a_)  {}
 ^

Godbolt playground: https://godbolt.org/z/AbqzjR
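
The underlying rule is that non-static data members are initialized in
declaration order, regardless of the order of the mem-initializers. A minimal
sketch of one possible fix is to declare `a_` before `b_`. The names
`composed_fixed`, `init_log` and `construction_order` are hypothetical and only
serve to make the order observable.

```cpp
#include <string>

std::string init_log;  // hypothetical: records the construction order

struct A { A() { init_log += 'A'; } };
struct B { B(A) { init_log += 'B'; } };

struct composed_fixed {
    A a_;  // declared first, so it is constructed first
    B b_;  // now receives a fully constructed a_
    composed_fixed() : a_(), b_(a_) {}
};

// Returns the order in which the members were constructed.
std::string construction_order() {
    init_log.clear();
    composed_fixed c;
    (void)c;
    return init_log;
}
```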

[Bug c++/96121] Uninitialized variable copying not diagnosed

2020-07-08 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96121

--- Comment #4 from Antony Polukhin  ---
Adding members and usage does not make a difference
https://godbolt.org/z/VommHu

struct A {
  A();
  int i;
};
struct B {
  B(A);
  int i;
};

struct composed2 {
  B b_;
  A a_;
  composed2() : b_(a_) {}
};

auto test() {
return composed2{};
}

[Bug c++/96452] New: Narrowing conversion is not rejected

2020-08-04 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96452

Bug ID: 96452
   Summary: Narrowing conversion is not rejected
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: accepts-invalid
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:

float test_main(double d) {
float f2{d};
return f2;
}


The narrowing conversion from double to float in brace-initialization is not
rejected; only a warning is issued.


Godbolt playground: https://godbolt.org/z/fzPT8r
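
If the narrowing is intentional, an explicit cast keeps the brace
initialization well-formed. A minimal sketch, with the hypothetical function
name `to_float`; GCC's -Werror=narrowing can be used to turn the -Wnarrowing
warning into a hard error for the original form:

```cpp
float to_float(double d) {
    // The conversion is explicit, so nothing is narrowed inside the braces.
    float f2{static_cast<float>(d)};
    return f2;
}
```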

[Bug c++/96452] Narrowing conversion is not rejected

2020-08-04 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96452

--- Comment #5 from Antony Polukhin  ---
Hm... My reading of http://eel.is/c++draft/dcl.init.list#3.9 is that the
program is ill-formed for narrowing conversions. And
http://eel.is/c++draft/dcl.init.list#7.2 states that conversion from double to
float is a narrowing one, except where the source is a constant expression.

Am I missing something?

[Bug c++/96452] Narrowing conversion is not rejected

2020-08-04 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96452

--- Comment #7 from Antony Polukhin  ---
(In reply to Jonathan Wakely from comment #6)
> Your understanding of what a compiler needs to do for ill-formed programs is
> wrong.

You're right, thank you!

[Bug c++/92375] New: Warn on suspicious taking of function address instead of calling a function

2019-11-05 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92375

Bug ID: 92375
   Summary: Warn on suspicious taking of function address instead
of calling a function
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the simple example:

bool function();

bool test() {
    bool result = function;
    if (result) {
        return 1;
    } else {
        return 2;
    }
}


Students and scholars often forget to add the parentheses that actually call
the function. Unfortunately, GCC does not give the warning that other
compilers implement:

<source>:4:19: warning: address of function 'function' will always evaluate to
'true' [-Wpointer-bool-conversion]
    bool result = function;
         ~~~~~~   ^~~~~~~~

<source>:4:19: note: prefix with the address-of operator to silence this
warning
    bool result = function;
                  ^
                  &

<source>:4:19: note: suffix with parentheses to turn this into a function call
    bool result = function;
                  ^
                          ()

Please, add a warning.
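
The difference between the two spellings can be sketched as follows. The names
`check`, `from_address` and `from_call` are hypothetical; `check` stands in for
`function` from the example above and is defined here to return false so that
the two results differ.

```cpp
bool check() { return false; }

// The address of a function is never null, so it converts to true.
bool from_address() {
    bool result = &check;
    return result;
}

// The parentheses call the function and use its return value.
bool from_call() {
    bool result = check();
    return result;
}
```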

[Bug middle-end/92455] New: Unnecessary memory read in a loop

2019-11-11 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

Bug ID: 92455
   Summary: Unnecessary memory read in a loop
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:

typedef struct {
    int* ptr_;
} int_ptr;

int_ptr f1(int_ptr* x) {
    int_ptr* max = x;
    for (int i = 0; i < 5; ++i) {
        ++x;
        if (*max->ptr_ < *x->ptr_) {
            max = x;
        }
    }
    return *max;
}

GCC with -O2 generates the following assembly:

f1(int_ptr*):
  lea rsi, [rdi+40]
  mov rax, rdi
.L3:
  mov rcx, QWORD PTR [rax]  ; <== This could be removed from the loop
  mov rdx, QWORD PTR [rdi+8]
  add rdi, 8
  mov edx, DWORD PTR [rdx]
  cmp DWORD PTR [rcx], edx
  cmovl rax, rdi
  cmp rsi, rdi
  jne .L3
  mov rax, QWORD PTR [rax]
  ret


If we rewrite the example to avoid int_ptr:

int* f2(int** x) {
    int** max = x;
    for (int i = 0; i < 5; ++i) {
        ++x;
        if (**max < **x) {
            max = x;
        }
    }
    return *max;
}

Then there are fewer memory accesses in the loop:
f2(int**):
  mov rax, QWORD PTR [rdi] ; <=== Not in a loop any more
  lea rcx, [rdi+40]
.L8:
  mov rdx, QWORD PTR [rdi+8]
  add rdi, 8
  mov esi, DWORD PTR [rdx]
  cmp DWORD PTR [rax], esi
  cmovl rax, rdx
  cmp rcx, rdi
  jne .L8
  ret


Please improve the memory accesses for the first case.

Godbolt playground: https://godbolt.org/z/CaGbT2
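
Until the optimizer handles the first case, the load can be hoisted by hand by
caching `max->ptr_` in a local variable. This is only a sketch under that
assumption: `f1_cached` and `max_ptr` are hypothetical names, and the assembly
generated for it has not been checked here.

```cpp
typedef struct {
    int* ptr_;
} int_ptr;

int_ptr f1_cached(int_ptr* x) {
    int_ptr* max = x;
    int* max_ptr = max->ptr_;  // cached copy of max->ptr_
    for (int i = 0; i < 5; ++i) {
        ++x;
        int* cur = x->ptr_;
        if (*max_ptr < *cur) {
            max = x;
            max_ptr = cur;     // keep the cache in sync with max
        }
    }
    return *max;
}
```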

[Bug middle-end/92455] Unnecessary memory read in a loop

2019-11-11 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

--- Comment #2 from Antony Polukhin  ---
Can the -ftree-partial-pre flag be enabled by default for -O2?

[Bug middle-end/92455] Unnecessary memory read in a loop

2019-11-11 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92455

--- Comment #4 from Antony Polukhin  ---
(In reply to Richard Biener from comment #3)
> But maybe
> you can provide benchmark data (including compile-time/memory-use figures)?

OK. Is there any GCC specific tool or flag for that?

[Bug target/92592] New: Redundant comparison after subtraction on x86

2019-11-19 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92592

Bug ID: 92592
   Summary: Redundant comparison after subtraction on x86
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:

int sample(int a, int b) {
unsigned diff = (unsigned)b - (unsigned)a;
unsigned sign_bit = b < a;
return diff + sign_bit;
}

With -O2 and -O3 GCC produces the assembly:

sample(int, int):
  mov eax, esi  ; <=== not required
  xor edx, edx
  sub eax, edi
  cmp esi, edi  ; <=== not required
  setl dl
  add eax, edx
  ret

However, `sub` already sets the status flags, so there is no need for a separate `cmp`:

sample(int, int):
  xor eax, eax
  sub esi, edi
  setl al
  add eax, esi
  ret


The above sample is a minimized version of std::midpoint.

Godbolt playground: https://godbolt.org/z/j6FGq4

[Bug c++/90647] Warn on returning a lambda with captured local variables

2019-11-30 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90647

--- Comment #2 from Antony Polukhin  ---
-Wreturn-local-addr looks good to me

[Bug c++/66139] destructor not called for members of partially constructed anonymous struct/array

2019-12-13 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66139

Antony Polukhin  changed:

   What|Removed |Added

 CC||antoshkka at gmail dot com

--- Comment #16 from Antony Polukhin  ---
Can we increase the priority of this issue to P1 or P2? It affects the very
basics of C++.

BTW, I've minimized the example. It aborts on every version of GCC with
-std=c++11 and passes on Clang:


int constructed = 0;

class lock_guard_ext {
public:
    lock_guard_ext() { ++constructed; }
    ~lock_guard_ext() { --constructed; }
};

struct Access {
    lock_guard_ext lock;
    int value;
};

int t() {
    throw 0;
}

Access foo1() {
    return { {}, t() };
}

int main () {
    try {
        foo1();
    } catch (int) {}
    if (constructed != 0)
        __builtin_abort();
}

[Bug c++/93413] New: Destructor definition not found during constant evaluation

2020-01-24 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93413

Bug ID: 93413
   Summary: Destructor definition not found during constant
evaluation
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: rejects-valid
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:


struct Base {
constexpr virtual ~Base(){}
};

struct Derived: Base {};

constexpr Derived d;


The destructor for `Derived` should be implicitly defined. However, the above
snippet produces an error message on GCC 10 with the -std=c++2a flag:
`error: 'virtual constexpr Derived::~Derived()' used before its definition`.

[Bug c++/93414] New: Bad diagnostics for dynamic_cast during constant evaluation: implementation details leak out

2020-01-24 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93414

Bug ID: 93414
   Summary: Bad diagnostics for dynamic_cast during constant
evaluation: implementation details leak out
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example that attempts to throw a std::bad_cast:


struct Base {
    constexpr virtual ~Base(){}
};

struct Derived: Base {
    constexpr ~Derived(){}
};

constexpr const Derived& cast(const Base& b) {
    return dynamic_cast<const Derived&>(b); // error!
}

auto test() {
    static constexpr Base b;
    constexpr auto res = cast(b);
    return res;
}


The error message is as follows:

<source>: In function 'constexpr const Derived& cast(const Base&)':
<source>:10:42: error: call to non-'constexpr' function 'void*
__cxa_bad_cast()'
   10 |     return dynamic_cast<const Derived&>(b); // error: call to
non-'constexpr' function 'void* __cxa_bad_cast()'


That's not informative: users usually know nothing about __cxa_bad_cast.

Please change the error message to something more informative, for example
"During constexpr evaluation, an attempt to cast variable `b` with typeid(b) ==
typeid(Base) to `Derived` was detected".

[Bug c++/55249] New: Multiple copy constructors for template class lead to link errors

2012-11-09 Thread antoshkka at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55249

 Bug #: 55249
   Summary: Multiple copy constructors for template class lead to
link errors
Classification: Unclassified
   Product: gcc
   Version: 4.6.3
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: antosh...@gmail.com

Created attachment 28647
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28647
gcc -v -save-temps -std=c++0x -Wall -Wextra main.cpp 1>output.txt 2>&1

The following code leads to linker errors in C++11 mode and in default mode
(the latter requires replacing std::array with boost::array):


#include <vector>
#include <array>

template <class T>
struct inner_type {

    inner_type() {}
    inner_type(inner_type& ) {}
    inner_type(const inner_type& ) {}

    ~inner_type() {}
};

// Uncomment typedef to get undefined reference to
// __uninitialized_copyILb0EE13__uninit_copy
// Can be worked around by marking inner_type copy constructors with noexcept
//typedef std::vector<std::array<inner_type<int>, 3> > type;

// Uncomment typedef to get undefined reference to
// `inner_type<int>::inner_type(inner_type<int> const&)'
//typedef std::array<inner_type<int>, 3> type;

int main()
{
    type t1;
    type t2 = t1;
    return 0;
}


[Bug c++/55249] Multiple copy constructors for template class lead to link errors

2012-11-09 Thread antoshkka at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55249

--- Comment #1 from Antony Polukhin  2012-11-09 10:24:49 UTC ---
Created attachment 28648
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28648
Preprocessed file that triggers the bug


[Bug c++/55249] Multiple copy constructors for template class lead to link errors

2012-11-09 Thread antoshkka at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55249

--- Comment #4 from Antony Polukhin  2012-11-09 12:28:11 UTC ---
(In reply to comment #3)

Yes, thanks.
`output.txt` will be the same.

Also, reproduced this bug on GCC 4.7.2:

[cc@ontos-soa-01 ~]$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/home/cc/dev/gcc-4.7.2/libexec/gcc/x86_64-unknown-linux-gnu/4.7.2/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../downloads/gcc-4.7.2/configure
--prefix=/home/cc/dev/gcc-4.7.2 --disable-multilib --enable-languages=c,c++
Thread model: posix
gcc version 4.7.2 (GCC)


[Bug c++/88445] New: noexcept(expr) should return true with -fno-exceptions

2018-12-11 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88445

Bug ID: 88445
   Summary: noexcept(expr) should return true with -fno-exceptions
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the following example:

#include <type_traits>

struct test {
    test();
    test(test&&);
    test& operator=(test&&);
};

void test_func() {
    static_assert(noexcept(test{test()}), "");
    static_assert(std::is_nothrow_move_constructible<test>::value, "");
    static_assert(std::is_nothrow_move_assignable<test>::value, "");
}


The static assertions fail with the -fno-exceptions flag, even though no
exception can be thrown because all exceptions are disabled.

Please adjust the noexcept(expr) logic for the -fno-exceptions flag.

Such adjustment is essential because the standard library heavily relies on the
type traits and chooses the suboptimal algorithms in -fno-exceptions
environments.
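
The effect on the standard library can be sketched with std::vector, which
relocates elements by moving only when the move constructor is known not to
throw and otherwise falls back to copying. The template `Tracked`, its
counters and the helper `reallocate_once` are hypothetical names added for
illustration.

```cpp
#include <type_traits>
#include <vector>

// Nothrow selects whether the move constructor is declared noexcept.
template <bool Nothrow>
struct Tracked {
    static int copies;
    static int moves;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept(Nothrow) { ++moves; }
};
template <bool Nothrow> int Tracked<Nothrow>::copies = 0;
template <bool Nothrow> int Tracked<Nothrow>::moves = 0;

// Forces one reallocation of a vector that holds a single element.
template <bool Nothrow>
void reallocate_once() {
    std::vector<Tracked<Nothrow>> v;
    v.emplace_back();
    v.reserve(v.capacity() + 1);  // relocates the existing element
}

static_assert(std::is_nothrow_move_constructible<Tracked<true>>::value, "");
static_assert(!std::is_nothrow_move_constructible<Tracked<false>>::value, "");
```

With `Tracked<false>` the element is copied during reallocation, which is the
kind of suboptimal choice described above; marking the move operations
`noexcept` by hand avoids it regardless of -fno-exceptions.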

[Bug c++/88445] noexcept(expr) should return true with -fno-exceptions

2018-12-11 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88445

--- Comment #1 from Antony Polukhin  ---
Hm... This was discussed in Clang, and it looks like such an optimization could
break the ABI and cause ODR violations:
https://bugs.llvm.org/show_bug.cgi?id=27442#c4

If nothing has changed since then, I'm OK with closing this issue as Invalid or
Won't Fix.

[Bug libstdc++/87431] valueless_by_exception() should unconditionally return false if all the constructors are noexcept

2019-01-06 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87431

--- Comment #11 from Antony Polukhin  ---
Looks good.

Note that boost::variant went further: if all the types are nothrow movable,
the variant always uses the trick of moving from a temporary. That way
`valueless_by_exception()`-like states never happen.

Such an approach may not fit libstdc++.

[Bug libstdc++/87431] valueless_by_exception() should unconditionally return false if all the constructors are noexcept

2019-01-06 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87431

--- Comment #13 from Antony Polukhin  ---
Yeah... but some whitelist of types to move could be hardcoded. For example,
std::basic_string, std::vector, std::unique_ptr and std::shared_ptr could be
safely moved, so `valueless_by_exception()` would never happen for them. Those
types cover some of the popular std::variant usages, and the overhead from
`valueless_by_exception()` would be avoided in those cases.

[Bug c++/53294] Optimize out some exception code

2019-01-11 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53294

--- Comment #3 from Antony Polukhin  ---
Any progress?

[Bug c++/89036] ICE if destructor has a requires

2019-01-24 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89036

--- Comment #1 from Antony Polukhin  ---
Compile with flags: -std=c++2a -fconcepts

[Bug c++/89036] New: ICE if destructor has a requires

2019-01-24 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89036

Bug ID: 89036
   Summary: ICE if destructor has a requires
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

The following code:


template <class T>
struct Y {
    ~Y() requires(true) = default;
    ~Y() requires(false) {}
};


causes ICE:


<source>:6:27: internal compiler error: in add_method, at cp/class.c:1137
    6 |     ~Y() requires(false) {}
      |                           ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://gcc.gnu.org/bugs/> for instructions.
Compiler returned: 1

[Bug libstdc++/89120] New: std::minmax_element 2.5 times slower than hand written loop

2019-01-30 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89120

Bug ID: 89120
   Summary: std::minmax_element 2.5 times slower than hand written
loop
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

std::minmax_element is slow when there's a lot of data and it does not fit into
the CPU cache: http://quick-bench.com/Z0iRfbm2_S9KvQ1C92ydh8USF-8

[Bug libstdc++/89121] New: std::min_element (and max_element) 3.6 times slower than hand written loop

2019-01-30 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89121

Bug ID: 89121
   Summary: std::min_element (and max_element) 3.6 times slower
than hand written loop
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

std::min_element is slow when there's a lot of data and it does not fit into
the CPU cache: http://quick-bench.com/tlgxCx9CUMZgOfYbwhFaEI0WNOg

[Bug c++/82899] *this in constructors could not alias with reference input parameters of the same type

2018-05-03 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82899

--- Comment #8 from Antony Polukhin  ---
(In reply to Richard Biener from comment #4)
> (In reply to Antony Polukhin from comment #2)
> > Looks like [class.ctor] paragraph 14 covers this case:
> > 
> > "During the construction of an object, if the value of the object or any of
> > its subobjects is accessed through
> > a glvalue that is not obtained, directly or indirectly, from the
> > constructor’s this pointer, the value of the
> > object or subobject thus obtained is unspecified."
> 
> Yeah, sounds like covering this case.  Thus we can make 'this' restrict in
> constructors (and possibly assignment operators if self-assignment is
> forbidden).

Self assignment is tricky and is OK to alias in most cases. It could be
restricted at some point after the `this != &rhs` check (as proposed in Bug
82918). 

I'd rather start by "restricting this" for copy and move constructors, leaving
assignment as is.

[Bug c++/82899] *this in constructors could not alias with reference input parameters of the same type

2018-05-10 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82899

--- Comment #9 from Antony Polukhin  ---
There's an identical issue for clang:
https://bugs.llvm.org/show_bug.cgi?id=37329

During review of that issue Richard Smith noted that the solution could be made
more generic by adding `__restrict` to `this` for any constructor (not just
copy and move constructors).

Could a violation of noalias in GCC be treated as unspecified behavior, or is
it undefined?

[Bug c++/82899] *this in constructors could not alias with reference input parameters of the same type

2018-05-10 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82899

--- Comment #11 from Antony Polukhin  ---
Seems perfect: https://godbolt.org/g/GX3GQd
The mov is not generated for any constructor, and the following code:

extern struct A a;
struct A {
  int m, n;
  A(const A &v);
};

A::A(const A &v) : m(v.m), n((a.m = 1, v.m)) {}

is not optimized to "A::A(int, const A &v) : m(v.m), n(v.m) { a.m = 1; }"
(which would be a mistake).


Are there some tests to make sure that the `mov` won't appear again?

[Bug c++/82899] *this in constructors could not alias with reference input parameters of the same type

2018-05-10 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82899

--- Comment #12 from Antony Polukhin  ---
(In reply to Marc Glisse from comment #10)
> This seems fixed in 8.1 (at least we don't generate the extra mov anymore),
> can you check?

Actually it still does not work for subobjects. For example
https://godbolt.org/g/zPha3U

Code 

struct array {
    int d[2];
};

struct test {
    array data1;
    array data2;

    test(const array& t);
};

test::test(const array& t)
    : data1{t}
    , data2{t}
{}

produces assembly

test::test(array const&):
  mov rax, QWORD PTR [rsi]
  mov QWORD PTR [rdi], rax
  mov rax, QWORD PTR [rsi]   <== Not required. Could not alias
  mov QWORD PTR [rdi+8], rax
  ret

[class.ctor] paragraph 14 also covers this case:

"During the construction of an object, if the value of the object *or any of
its subobjects* is accessed through a glvalue that is not obtained, directly or
indirectly, from the constructor’s this pointer, the value of the object or
subobject thus obtained is unspecified."

Looks like not only `this` should be marked with __restrict, but also all the
subobjects of the type.

[Bug c++/85747] New: suboptimal code without constexpr

2018-05-11 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85747

Bug ID: 85747
   Summary: suboptimal code without constexpr
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the following code snippet:

// Bubble-like sort. Anything complex enough will work
template <class It>
constexpr void sort(It first, It last) {
    for (; first != last; ++first) {
        auto it = first;
        ++it;
        for (; it != last; ++it) {
            if (*it < *first) {
                auto tmp = *it;
                *it = *first;
                *first = tmp;
            }
        }
    }
}

static int generate() {
    int a[7] = {3, 7, 4, 2, 8, 0, 1};
    sort(a + 0, a + 7);
    return a[0] + a[6];
}

int no_constexpr() {
    return generate();
}



The above code generates ~30 assembly instructions instead of just generating:

no_constexpr():
  mov eax, 8
  ret



But if we change `static` to `constexpr` then the compiler will optimize the
code correctly.

Could the compiler detect that `a[7]` holds values known at compile time and
force the constexpr on `sort(a + 0, a + 7);`? Could the compiler detect that
the function `generate()` is an `__attribute__((const))` function without
arguments and fully evaluate its body?
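
For comparison, the variant that the compiler does fold can be sketched as
follows (C++14 or later). The names `sort_range`, `generate_constexpr` and
`with_constexpr` are hypothetical renames of the functions above, chosen to
avoid clashing with them.

```cpp
// Same bubble-like sort as in the report, under a different name.
template <class It>
constexpr void sort_range(It first, It last) {
    for (; first != last; ++first) {
        auto it = first;
        ++it;
        for (; it != last; ++it) {
            if (*it < *first) {
                auto tmp = *it;
                *it = *first;
                *first = tmp;
            }
        }
    }
}

constexpr int generate_constexpr() {
    int a[7] = {3, 7, 4, 2, 8, 0, 1};
    sort_range(a + 0, a + 7);
    return a[0] + a[6];
}

// Evaluated by the compiler: smallest element (0) plus largest element (8).
static_assert(generate_constexpr() == 8, "expected 0 + 8");

int with_constexpr() {
    constexpr int result = generate_constexpr();  // forces compile-time evaluation
    return result;
}
```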

[Bug c++/85747] suboptimal code without constexpr

2018-05-11 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85747

--- Comment #3 from Antony Polukhin  ---
(In reply to Richard Biener from comment #1)
> What's the reason for writing the code as you pasted it?

I've tried to provide a simplified case. In the real world the `generate()`
function will have some arguments, and depending on those it may or may not be
evaluated as constexpr.

There's plenty of pre-C++14 code that is not well maintained and does not use
constexpr much, but whose functions could be treated and evaluated as constexpr
in C++14.

The main reason for this ticket is to get some out-of-the-box speedup for such
legacy code. A function without arguments seemed to be a good place to start.

[Bug c++/85747] suboptimal code without constexpr

2018-05-11 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85747

--- Comment #4 from Antony Polukhin  ---
(In reply to Marc Glisse from comment #2)
> (In reply to Antony Polukhin from comment #0)
> > Could the compiler detect that `a[7]` holds values known at compile time and
> > force the constexpr on `sort(a + 0, a + 7);`?
> 
> There has to be a limit. If I write a program that computes the trillion's
> decimal of pi, this is a constant, do you expect the compiler to evaluate
> the whole program and compile it to just return cst? We are moving into a
> realm where we would want to mix compilation and execution, sort of JIT.
> For smaller functions, some heuristics could be used to try compile-time
> evaluation, but sorting an array of size 7 already seems large to me.

Would providing some kind of -Oon-the-fly switch solve the issue with JIT
compile times while still allowing more optimizations for the traditional
non-JIT -O2 builds?

[Bug c++/85747] suboptimal code without constexpr

2018-05-14 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85747

--- Comment #7 from Antony Polukhin  ---
(In reply to Jakub Jelinek from comment #6)
> IMHO just use constexpr if you care about compile time evaluation
> guarantees, that is what it has been added for.

Fair point. Overcomplicated logic in the front end does not seem right. But from
my (not experienced) point of view there is some low-hanging fruit here.

I assume that the front end uses some kind of `__builtin_constant_p` to decide
whether to perform constexpr evaluation. Adjusting that function slightly could
produce better code out of the box at some optimization levels:

static int generate() {
    int a[7] = {3, 7, 4, 2, 8, 0, 1};
    static_assert(
        __builtin_constant_p(a + 0),
        "Immediate usage of variable initialized by constant should be a "
        "constant expression"
    );
    sort(a + 0, a + 7); // __builtin_constant_p returns `true` => constexpr call
    static_assert(
        __builtin_constant_p(a + 0),
        "Value after constexpr function call should be a constant"
    );
    return a[0] + a[6];
}

[Bug middle-end/91174] New: Suboptimal code for arithmetic with bool

2019-07-15 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91174

Bug ID: 91174
   Summary: Suboptimal code for arithmetic with bool
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:

int test (bool x) {
return '0' + x;
}


For the above snippet the following suboptimal assembly is generated:


test(bool):
  movzx eax, dil
  add eax, 48
  ret


More efficient assembly would be:

test(bool):
  lea eax, [rdi + 48]
  ret

[Bug target/91174] Suboptimal code for arithmetic with bool and char

2019-07-16 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91174

--- Comment #2 from Antony Polukhin  ---
(In reply to Florian Weimer from comment #1)
> For which ABI do you propose the change?  It's not correct for GNU/Linux:

As far as I understand the proposed change does not touch ABI. `lea eax, [rdi +
48]` is equivalent to `movzx+add`

[Bug target/91174] Suboptimal code for arithmetic with bool and char

2019-07-16 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91174

--- Comment #4 from Antony Polukhin  ---
Sorry, now I understood that the bug is invalid. Please close.

[Bug c++/91329] New: Unnecessary call to __cxa_throw_bad_array_new_length

2019-08-02 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91329

Bug ID: 91329
   Summary: Unnecessary call to __cxa_throw_bad_array_new_length
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

For the code

int* test(int i) {
return new int[i];
}

The following assembly is generated:

test(int):
  movsx rdi, edi
  sub rsp, 8
  movabs rax, 2305843009213693950
  cmp rdi, rax
  ja .L2
  sal rdi, 2
  add rsp, 8
  jmp operator new[](unsigned long)
test(int) [clone .cold]:
.L2:
  call __cxa_throw_bad_array_new_length


However the `i * sizeof(int)` can not be greater than `2305843009213693950`. So
the checks should be skipped.

Optimal assembly should look close to:

test(int):
  movsx rdi, edi
  sal rdi, 2
  jmp operator new[](unsigned long)

[Bug c++/91329] Unnecessary call to __cxa_throw_bad_array_new_length

2019-08-02 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91329

--- Comment #1 from Antony Polukhin  ---
Oops, sorry. This is invalid. `i` could be negative.

Please close as invalid
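
A small sketch of why the check has to stay: a negative `i` is an erroneous
array length, and the new-expression is required to report it. The names
`try_array_new` and `g_sink` are hypothetical; the return codes are an
assumption of this sketch, chosen so that implementations that report the
error as a plain `std::bad_alloc` are also covered:

```cpp
#include <new>

// Volatile sink so that the allocation cannot be optimized away.
int* volatile g_sink = nullptr;

// Returns 0 if the allocation succeeded, 1 if std::bad_array_new_length was
// thrown, 2 if some other std::bad_alloc was thrown.
int try_array_new(int i) {
    try {
        int* p = new int[i];
        g_sink = p;
        delete[] p;
        g_sink = nullptr;
        return 0;
    } catch (const std::bad_array_new_length&) {
        return 1;
    } catch (const std::bad_alloc&) {
        return 2;
    }
}
```

Calling `try_array_new(-1)` is expected to return a non-zero code, while a
small positive length returns 0.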

[Bug middle-end/91358] New: Wrong code with dynamic allocation and optional like class

2019-08-05 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91358

Bug ID: 91358
   Summary: Wrong code with dynamic allocation and optional like
class
   Product: gcc
   Version: 9.1.1
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

The issue is reproduced on GCCs from 5 to 9 with -O2 and -std=c++11. GCC-10
also generates wrong code with -O2 -std=c++11 -fno-allocation-dce.

Source code:

template <class T>
struct optional {
  optional() : m_initialized(false) {}

  ~optional() {
if (m_initialized)
  reinterpret_cast<T&>(m_storage).~T();
  }

  bool m_initialized;
  alignas(T) unsigned char m_storage[sizeof(T)];
};

struct NoPtr1 {
  void *ptr = nullptr;

  ~NoPtr1() {
if (ptr) {
  __builtin_abort();
}
  }
};

static void test(optional<NoPtr1> ) noexcept {
  delete new unsigned;
}

void process(optional<NoPtr1> state) {
  return test(state);
}

int main() {
  process({});
}


The above code generates a conditional jump that depends on uninitialised
value. valgrind complains:
==13823==    at 0x4007B2: ~NoPtr1 (main.cpp:18)
==13823==    by 0x4007B2: ~optional (main.cpp:7)
==13823==    by 0x4007B2: process(optional<NoPtr1>) (main.cpp:29)
==13823==    by 0x40067F: main (main.cpp:33)


Running the example under GDB confirms that the destructor of NoPtr1 is called:

(gdb) break main.cpp:18
Breakpoint 1 at 0x400686: main.cpp:18. (2 locations)
(gdb) r
Breakpoint 1, NoPtr1::~NoPtr1 (this=<optimized out>, __in_chrg=<optimized out>)
at main.cpp:18
18  if (ptr) {
(gdb) bt
#0  NoPtr1::~NoPtr1 (this=<optimized out>, __in_chrg=<optimized out>) at
main.cpp:18
#1  optional<NoPtr1>::~optional (this=<optimized out>, __in_chrg=<optimized out>) at main.cpp:7
#2  process (state=...) at main.cpp:29
#3  0x00400680 in main () at main.cpp:33
(gdb) disassemble 
Dump of assembler code for function process(optional<NoPtr1>):
   0x00400790 <+0>:     push   %rbp
   0x00400791 <+1>:     push   %rbx
   0x00400792 <+2>:     sub    $0x8,%rsp
   0x00400796 <+6>:     mov    0x8(%rdi),%rbx
   0x0040079a <+10>:    movzbl (%rdi),%ebp
   0x0040079d <+13>:    mov    $0x4,%edi
   0x004007a2 <+18>:    callq  0x400600 <_Znwm@plt>
   0x004007a7 <+23>:    mov    %rax,%rdi
   0x004007aa <+26>:    callq  0x4005f0 <_ZdlPv@plt>
=> 0x004007af <+31>:    test   %rbx,%rbx
   0x004007b2 <+34>:    je     0x4007b9 <process(optional<NoPtr1>)+41>
   0x004007b4 <+36>:    test   %bpl,%bpl
   0x004007b7 <+39>:    jne    0x4007c0 <process(optional<NoPtr1>)+48>
   0x004007b9 <+41>:    add    $0x8,%rsp
   0x004007bd <+45>:    pop    %rbx
   0x004007be <+46>:    pop    %rbp
   0x004007bf <+47>:    retq
   0x004007c0 <+48>:    callq  0x4005e0

[Bug middle-end/91358] Wrong code with dynamic allocation and optional like class

2019-08-06 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91358

--- Comment #2 from Antony Polukhin  ---
(In reply to Michael Matz from comment #1)
> So, if you've seen a real problem somewhere (and not just valgrind
> complaining about uninitialized registers in comparisons),
> then you've reduced the testcase too much.

The original test case was not hitting the abort; only valgrind was
complaining. The original test case uses boost::variant, boost::optional and
std::vector, so it's quite hard to analyze. The part of its assembly with the
two checks after the delete looks much the same.

Valgrind complaints are distracting. GDB entering the destructor is misleading.
Is there a simple way to change the GCC codegen to avoid the issue without
affecting performance?

Otherwise, is there some kind of a pattern that valgrind/gdb could detect to
avoid false positives?

[Bug middle-end/91358] Wrong code with dynamic allocation and optional like class

2019-08-08 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91358

--- Comment #6 from Antony Polukhin  ---
(In reply to Michael Matz from comment #3)
> I don't really see any, no good idea here :-/

How about moving all the optimizations based on reading uninitialized values
under a flag like -funinitialized-logic, so that users could build with -O2
-fno-uninitialized-logic ?

[Bug target/91681] New: Missed optimization for 128 bit arithmetic operations

2019-09-06 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91681

Bug ID: 91681
   Summary: Missed optimization for 128 bit arithmetic operations
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the function:

void multiply128x64x2_3 ( 
const unsigned long a, 
const unsigned long b, 
const unsigned long c, 
const unsigned long d, 
__uint128_t o[2]
  ) noexcept
{
__uint128_t B0 = __uint128_t{ b } * c;
__uint128_t B2 = __uint128_t{ a } * c;
__uint128_t B1 = __uint128_t{ b } * d;
__uint128_t B3 = __uint128_t{ a } * d;

o[0] = B2 + (B0 >> 64);
o[1] = B3 + (B1 >> 64);
}


With compilation flags "-O2 -std=c++17 -mavx" the following assembly is
produced:

multiply128x64x2_3(unsigned long, unsigned long, unsigned long, unsigned long,
unsigned __int128*):
  mov rax, rdx
  push rbx
  mov rbx, rdx
  mov r9, rdi
  mul rsi
  mov rax, rdx
  xor edx, edx
  mov r10, rax
  mov rax, rbx
  mov r11, rdx
  pop rbx
  mul rdi
  add rax, r10
  adc rdx, r11
  mov QWORD PTR [r8], rax
  mov rax, rsi
  xor edi, edi
  mov QWORD PTR [r8+8], rdx
  mul rcx
  mov rax, rcx
  mov rsi, rdx
  mul r9
  add rsi, rax
  adc rdi, rdx
  mov QWORD PTR [r8+16], rsi
  mov QWORD PTR [r8+24], rdi
  ret

However, it is sub-optimal. Touching the stack is not necessary and the same
result could be achieved with fewer instructions:

multiply128x64x2_3(unsigned long, unsigned long, unsigned long, unsigned long,
unsigned __int128*):
  mov r9, r8
  mov r8, rdx
  mov rax, rsi
  mul r8
  mov rax, r8
  mov r10, rdx
  mul rdi
  add r10, rax
  mov rax, rsi
  mov QWORD PTR [r9], r10
  adc rdx, 0
  mov QWORD PTR [8+r9], rdx
  mul rcx
  mov rax, rdi
  mov r11, rdx
  mul rcx
  add r11, rax
  mov QWORD PTR [16+r9], r11
  adc rdx, 0
  mov QWORD PTR [24+r9], rdx
  ret
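
A sketch of the arithmetic behind the shorter listing, assuming a compiler
that provides `unsigned __int128` (GCC and Clang do). The shifted product
`B0 >> 64` always fits in 64 bits, so its upper half is a known zero; that is
why adding it only needs an `add` plus an `adc rdx, 0`. The names
`u128_parts`, `mul_add_high`, `reference` and `matches_reference` are
hypothetical:

```cpp
#include <cstdint>

struct u128_parts {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Computes a*c + ((b*c) >> 64) using explicit 64-bit halves.
u128_parts mul_add_high(std::uint64_t a, std::uint64_t b, std::uint64_t c) {
    const unsigned __int128 b0 = static_cast<unsigned __int128>(b) * c;
    const unsigned __int128 b2 = static_cast<unsigned __int128>(a) * c;
    const std::uint64_t addend = static_cast<std::uint64_t>(b0 >> 64);
    const std::uint64_t lo = static_cast<std::uint64_t>(b2) + addend;
    const std::uint64_t carry = lo < addend ? 1u : 0u;  // what `adc ..., 0` adds
    const std::uint64_t hi = static_cast<std::uint64_t>(b2 >> 64) + carry;
    return {lo, hi};
}

// Straightforward 128-bit formulation, as in the report.
unsigned __int128 reference(std::uint64_t a, std::uint64_t b, std::uint64_t c) {
    return static_cast<unsigned __int128>(a) * c +
           ((static_cast<unsigned __int128>(b) * c) >> 64);
}

bool matches_reference(std::uint64_t a, std::uint64_t b, std::uint64_t c) {
    const u128_parts p = mul_add_high(a, b, c);
    const unsigned __int128 r = reference(a, b, c);
    return p.lo == static_cast<std::uint64_t>(r) &&
           p.hi == static_cast<std::uint64_t>(r >> 64);
}
```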

[Bug middle-end/91709] New: Missed optimization for multiplication on 1.5 and 1.25

2019-09-09 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91709

Bug ID: 91709
   Summary: Missed optimization for multiplication on 1.5 and 1.25
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

On x86_32, for any number X of type unsigned, unsigned short or unsigned char,
multiplication by 1.5 followed by a conversion back to unsigned produces, in
any rounding mode, exactly the same result as X + (X >> 1).

The same holds for 1.25:
unsigned(X * 1.25) == unsigned(X + (X >> 2))

The above transformation makes it possible to emit short code without floating
point computations:

test2(unsigned int):
  mov eax, edi
  shr eax
  add eax, edi
  ret


Instead of:
test(unsigned int):
  movl %edi, %edi
  pxor %xmm0, %xmm0
  cvtsi2sdq %rdi, %xmm0
  mulsd .LC0(%rip), %xmm0
  cvttsd2siq %xmm0, %rax
  ret
.LC0:
  .long 0
  .long 1073217536
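
A quick way to check the two identities, as a sketch that assumes a 32-bit
unsigned type and an input small enough that `x * 1.5` still fits in it
(otherwise the conversion back is out of range). `scale_identities_hold` is a
hypothetical helper name:

```cpp
#include <cstdint>

// Precondition: x <= 0xAAAAAAAA, so that x * 1.5 fits in 32 bits.
bool scale_identities_hold(std::uint32_t x) {
    // The products are exact in double; the conversion truncates.
    const bool ok_1_5 = static_cast<std::uint32_t>(x * 1.5) == x + (x >> 1);
    const bool ok_1_25 = static_cast<std::uint32_t>(x * 1.25) == x + (x >> 2);
    return ok_1_5 && ok_1_25;
}
```

The identities hold because the product is exactly representable in a double
and the conversion to an integer truncates, which matches the shift discarding
the low bits.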

[Bug middle-end/91709] Missed optimization for multiplication on 1.5 and 1.25

2019-09-09 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91709

--- Comment #1 from Antony Polukhin  ---
Godbolt playground: https://godbolt.org/z/rHQj2w

[Bug middle-end/91709] Missed optimization for multiplication on 1.5 and 1.25

2019-09-10 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91709

--- Comment #3 from Antony Polukhin  ---
(In reply to jos...@codesourcery.com from comment #2)
> If the result of multiplying by 1.5 is outside the range of the integer 
> type, the version with multiplication is required to raise the FE_INVALID 
> exception for the out-of-range conversion to integer

My reading of the C++ standard is that such a conversion is undefined
behavior: http://eel.is/c++draft/conv.fpint#1

Is it really required to raise FE_INVALID ?

[Bug target/91721] New: Missed optimization for checking nan and comparison

2019-09-10 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91721

Bug ID: 91721
   Summary: Missed optimization for checking nan and comparison
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:

int doubleToString_0(double a) {
if ( __builtin_isnan( a ) )
return 1;
else if ( a == 0. )
return 2;

return 3;
}

A suboptimal assembly with two `ucomisd` is generated for the above sample:

doubleToString_0(double):
  ucomisd xmm0, xmm0
  jp .L4
  ucomisd xmm0, QWORD PTR .LC0[rip]
  jnp .L8
.L5:
  mov eax, 3
  ret
.L8:
  jne .L5
  mov eax, 2
  ret
.L4:
  mov eax, 1
  ret
.LC0:
  .long 0
  .long 0


A more optimal solution would be to do only the second `ucomisd` and check the
flags for a NaN:

doubleToString_0(double):
  pxor xmm1, xmm1
  ucomisd xmm0, xmm1
  jp .L4
  je .L8
.L5:
  mov eax, 3
  ret
.L8:
  mov eax, 2
  ret
.L4:
  mov eax, 1
  ret
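
The reason one comparison is enough can be sketched at the source level:
comparing `a` against zero is unordered exactly when `a` is a NaN, since zero
itself is never a NaN. The function name `classify` is hypothetical, and no
claim is made here about the code GCC emits for this form:

```cpp
#include <cmath>

// Same results as doubleToString_0, phrased around one comparison with zero.
int classify(double a) {
    if (std::isunordered(a, 0.0))  // true only when a is NaN
        return 1;
    if (a == 0.0)                  // also true for -0.0
        return 2;
    return 3;
}
```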

[Bug middle-end/91739] New: Missed optimization for arithmetic operations of integers and floating point constants

2019-09-11 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91739

Bug ID: 91739
   Summary: Missed optimization for arithmetic operations of
integers and floating point constants
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:

double foo(unsigned i, unsigned j) {
  return i * 4.0 + j * 7.0;
}

Right now GCC emits code that converts the integers to floating point and does
the multiplications:

foo(unsigned int, unsigned int): # @foo(unsigned int, unsigned int)
  mov eax, edi
  cvtsi2sd xmm1, rax
  mulsd xmm1, qword ptr [rip + .LCPI0_0]
  mov eax, esi
  cvtsi2sd xmm0, rax
  mulsd xmm0, qword ptr [rip + .LCPI0_1]
  addsd xmm0, xmm1
  ret

However, it is possible to do better. If the maximum value of the integer
multiplied by the floating point constant fits into the mantissa, and there is
an integral type that can also hold that value, then the multiplication can be
done using integers:

double foo2(unsigned i, unsigned j) {
  return i * 4ull + j * 7ull;
}

This results in much better code:

foo2(unsigned int, unsigned int): # @foo2(unsigned int, unsigned int)
  mov eax, edi
  mov ecx, esi
  lea rdx, [8*rcx]
  sub rdx, rcx
  lea rax, [rdx + 4*rax]
  cvtsi2sd xmm0, rax
  ret
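
A sketch that checks the equivalence at the extremes, assuming a 32-bit
`unsigned`. The largest possible sum is 11 * (2^32 - 1), which is below 2^36
and therefore far inside the 53-bit mantissa of a double, so both forms are
exact. The explicit cast in `foo2` only spells out the implicit conversion
from the report:

```cpp
// Floating point form from the report.
double foo(unsigned i, unsigned j) {
    return i * 4.0 + j * 7.0;
}

// Integer form: multiply and add in 64 bits, convert once at the end.
double foo2(unsigned i, unsigned j) {
    return static_cast<double>(i * 4ull + j * 7ull);
}
```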

[Bug middle-end/91866] New: Sign extend of an int is not recognized

2019-09-23 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91866

Bug ID: 91866
   Summary: Sign extend of an int is not recognized
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:


using size_t = unsigned long long;

size_t index0(int i) {
return size_t(i + 1) - 1;
}


GCC generates the following assembly:

index0(int):
  lea eax, [rdi+1]
  cdqe
  sub rax, 1
  ret


However a more optimal assembly is possible:

index0(int): # @index0(int)
  movsxd rax, edi
  ret

Godbolt playground: https://godbolt.org/z/3j7_SE
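
A sketch of the equivalence the optimizer could use. The transformation is
only valid because `i + 1` overflowing for INT_MAX is undefined behavior, so
that input is excluded. The alias is renamed to `size_type` here (an assumption
of this sketch) to avoid clashing with the standard `size_t`, and `index1` is a
hypothetical name:

```cpp
using size_type = unsigned long long;

// Form from the report.
size_type index0(int i) {
    return size_type(i + 1) - 1;
}

// Plain sign extension, which is what the shorter assembly computes.
size_type index1(int i) {
    return size_type(i);
}
```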

[Bug middle-end/91881] New: Value range knowledge of higher bits not used in optimizations

2019-09-24 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91881

Bug ID: 91881
   Summary: Value range knowledge of higher bits not used in
optimizations
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:

unsigned long long sample2(unsigned long long m) {
if (m >= 100) __builtin_unreachable();
m *= 16;
return m >> 3;
}

After the `if` statement we do know that the higher bits are set to 0. So
instead of generating the following assembly:

sample2(unsigned long long):
  mov rax, rdi
  sal rax, 4
  shr rax, 3
  ret

A more optimal assembly could be generated:

sample2(unsigned long long):
  lea rax, [rdi + rdi]
  ret


Godbolt playground: https://godbolt.org/z/1iSpTh

P.S.: that optimization is important for std::to_chars(..., double)-like
functions, where the significand of a double is extracted into an unsigned long
long variable, so its upper bits are always zero.
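
A sketch showing both that the two forms agree inside the stated range and
that the range information is really needed. All function names are
hypothetical:

```cpp
// Form from the report, without the range hint.
unsigned long long shift_form(unsigned long long m) {
    return (m * 16) >> 3;
}

// Form the shorter assembly computes.
unsigned long long doubled_form(unsigned long long m) {
    return m + m;
}

// Checks every m in [0, limit).
bool forms_agree_below(unsigned long long limit) {
    for (unsigned long long m = 0; m < limit; ++m) {
        if (shift_form(m) != doubled_form(m))
            return false;
    }
    return true;
}
```

For `m = 1ull << 61` the multiplication by 16 wraps around, so the two forms
differ; without knowledge of the upper bits the compiler has to keep both
shifts.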

[Bug middle-end/91883] New: Division by a constant could be optimized for known variables value range

2019-09-24 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91883

Bug ID: 91883
   Summary: Division by a constant could be optimized for known
variables value range
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:

unsigned long long kBorder = (1ull<<62);

unsigned long long sample(unsigned long long m) {
if (m >= kBorder) __builtin_unreachable();
return m / 10;
}

It produces the following assembly:

sample(unsigned long long):
  movabs rdx, -3689348814741910323
  mov rax, rdi
  mul rdx
  mov rax, rdx
  shr rax, 3
  ret

However, knowing that the higher bits are always 0, the constant could be
adjusted to avoid the `shr rax, 3`:

sample(unsigned long long):
  movabs rax, 1844674407370955162
  mul rdi
  mov rax, rdx
  ret

Godbolt playground: https://godbolt.org/z/YU2yAC

This issue is probably related to PR 91881

P.S.: that optimization is important for std::to_chars(..., double)-like
functions, where the significand of a double is extracted into an unsigned long
long variable, so its upper bits are always zero.
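
A sketch of why the adjusted constant works, assuming a compiler with
`unsigned __int128` (GCC and Clang). The constant is ceil(2^64 / 10), and
10 * constant = 2^64 + 4, so the high half of the product equals m / 10 as
long as 4 * m < 2^64, i.e. m < 2^62. `div10_mulhi` is a hypothetical name:

```cpp
// Valid for m < 2^62 only.
unsigned long long div10_mulhi(unsigned long long m) {
    const unsigned long long magic = 1844674407370955162ull;  // ceil(2^64 / 10)
    const unsigned __int128 product = static_cast<unsigned __int128>(m) * magic;
    return static_cast<unsigned long long>(product >> 64);  // high 64 bits
}
```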

[Bug middle-end/91899] New: Merge constant literals

2019-09-25 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91899

Bug ID: 91899
   Summary: Merge constant literals
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:

static const char data1[] = {'t','e','s','t'};
static const char data2[] = "test test";

bool index0(const char* cmp) {
return cmp == data1 || cmp == data2;
}

Right now GCC generates suboptimal assembly:

index0(char const*):
  mov eax, offset data1
  cmp rdi, rax
  sete cl
  mov eax, offset data2
  cmp rdi, rax
  sete al
  or al, cl
  ret
data1:
  .ascii "test"

data2:
  .asciz "test test"

A more efficient way to generate the code is to merge `data1` and `data2`:

index0(char const*):
  mov eax, offset data
  cmp rdi, rax
  sete al
  ret
data:
  .ascii "test test"


Constant literals merging significantly reduces binary size and cache misses.

[Bug middle-end/91899] Merge constant literals

2019-09-25 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91899

--- Comment #1 from Antony Polukhin  ---
Godbolt playground: https://godbolt.org/z/UA_Xsm

[Bug middle-end/91899] Merge constant literals

2019-09-25 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91899

--- Comment #4 from Antony Polukhin  ---
(In reply to Alexander Monakov from comment #3)
> unless the compiler somehow proves that overlap is not
> observable?

Oh, now I see. Here's a valid example:

static const char data1[] = "test";
static const char data2[] = "test test";
char lookup1(int i) { return data1[i]; }
char lookup2(int i) { return data2[i]; }


data1/2 are internal linkage symbols and pointers to them or their content are
not returned or passed to any other function. So the overlap is not observable.

The above case could be found in std::to_chars, where different internal
functions have overlapping `static constexpr char __digits[]` arrays.
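
A sketch of the distinction discussed in comment #3. As long as only the
contents are read, overlapping storage would be invisible; as soon as the
addresses are compared, the two arrays must be distinct objects.
`addresses_differ` is a hypothetical helper, and adding it is exactly what
would make the overlap observable:

```cpp
static const char data1[] = "test";
static const char data2[] = "test test";

// Only contents are observed here, so sharing storage would go unnoticed.
char lookup1(int i) { return data1[i]; }
char lookup2(int i) { return data2[i]; }

// Observes the addresses: distinct objects must not compare equal.
bool addresses_differ() {
    return static_cast<const void*>(data1) != static_cast<const void*>(data2);
}
```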

[Bug middle-end/91981] New: Speed degradation because of inlining a register clobbering function

2019-10-03 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91981

Bug ID: 91981
   Summary: Speed degradation because of inlining a register
clobbering function
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example that is a simplified version of
boost::container::small_vector:


#define MAKE_INLINING_BAD 1

struct vector {
int* data_;
int* capacity_;
int* size_;

void push_back(int v) {
if (capacity_ > size_) {
*size_ = v;
++size_;
} else {
reallocate_and_push(v);
}
}

void reallocate_and_push(int v)
#if MAKE_INLINING_BAD
{
// Just some code that clobbers many registers.
// You may skip reading it
const auto old_cap = capacity_ - data_; 
const auto old_size = capacity_ - size_; 
const auto new_cap = old_cap * 2 + 1;

auto new_data_1 = new int[new_cap];
auto new_data = new_data_1;
for (int* old_data = data_; old_data != size_; ++old_data, ++new_data)
{
*new_data = *old_data;
}

delete[] data_;
data_ = new_data_1;
size_ = new_data_1 + old_size;
capacity_ = new_data_1 + new_cap;

*size_ = v;
++size_;
}
#else
;
#endif
};

void bad_inlining(vector& v) {
v.push_back(42);
}


With `#define MAKE_INLINING_BAD 0` the generated code is quite good:

bad_inlining(vector&):
  mov rax, QWORD PTR [rdi+16]
  cmp QWORD PTR [rdi+8], rax
  jbe .L2
  mov DWORD PTR [rax], 42
  add rax, 4
  mov QWORD PTR [rdi+16], rax
  ret
.L2:
  mov esi, 42
  jmp vector::reallocate_and_push(int)

However, with `#define MAKE_INLINING_BAD 1` the compiler decides to inline the
`reallocate_and_push` function that clobbers many registers. So the compiler
stores the values of those registers on the stack before doing the cmp+jbe:

bad_inlining(vector&):
  push r13 ; don't need those for the `(capacity_ > size_)` case
  push r12 ; likewise
  push rbp ; likewise
  push rbx ; likewise
  mov rbx, rdi ; likewise
  sub rsp, 8   ; likewise
  mov rdx, QWORD PTR [rdi+8]
  mov rax, QWORD PTR [rdi+16]
  cmp rdx, rax
  jbe .L2
  mov DWORD PTR [rax], 42
  add rax, 4
  mov QWORD PTR [rdi+16], rax
  add rsp, 8 ; don't need those for the `(capacity_ > size_)` case
  pop rbx ; likewise
  pop rbp ; likewise
  pop r12 ; likewise
  pop r13 ; likewise
  ret
.L2: 
  ; vector::reallocate_and_push(int) implementation goes here

This greatly degrades the performance of the first branch (more than a 3x
degradation in real code).


A possible fix would be to place all the push/pop operations near the inlined
`reallocate_and_push`:

bad_inlining(vector&):
  mov rax, QWORD PTR [rdi+16]
  cmp QWORD PTR [rdi+8], rax
  jbe .L2
  mov DWORD PTR [rax], 42
  add rax, 4
  mov QWORD PTR [rdi+16], rax
  ret
.L2: 
  push r13
  push r12
  push rbp
  push rbx
  mov rbx, rdi
  sub rsp, 8
  ; vector::reallocate_and_push(int) implementation goes here
  add rsp, 8
  pop rbx
  pop rbp
  pop r12
  pop r13
  ret

Godbolt playground: https://godbolt.org/z/oDutOd
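
Until the compiler handles this on its own, one source-level workaround is to
keep the slow path out of line. The sketch below assumes the GNU attributes
`noinline` and `cold` (accepted by GCC and Clang); `small_vec` and
`sum_after_pushes` are hypothetical names, and the bookkeeping is simplified
rather than copied from the report. Whether this restores the short prologue
depends on the compiler version:

```cpp
#include <cstddef>

struct small_vec {
    int* data_ = nullptr;
    int* size_ = nullptr;      // one past the last stored element
    int* capacity_ = nullptr;  // one past the allocated storage

    small_vec() = default;
    small_vec(const small_vec&) = delete;
    small_vec& operator=(const small_vec&) = delete;
    ~small_vec() { delete[] data_; }

    void push_back(int v) {
        if (capacity_ > size_) {  // fast path: no extra registers needed
            *size_ = v;
            ++size_;
        } else {
            reallocate_and_push(v);
        }
    }

    // Kept out of line so that its register pressure stays out of push_back.
    [[gnu::noinline, gnu::cold]] void reallocate_and_push(int v) {
        const std::ptrdiff_t old_size = size_ - data_;
        const std::ptrdiff_t new_cap = (capacity_ - data_) * 2 + 1;
        int* new_data = new int[new_cap];
        for (std::ptrdiff_t i = 0; i < old_size; ++i)
            new_data[i] = data_[i];
        delete[] data_;
        data_ = new_data;
        size_ = new_data + old_size;
        capacity_ = new_data + new_cap;
        *size_ = v;
        ++size_;
    }
};

// Pushes 0..n-1 and returns the sum of the stored elements.
int sum_after_pushes(int n) {
    small_vec v;
    for (int i = 0; i < n; ++i)
        v.push_back(i);
    int sum = 0;
    for (const int* p = v.data_; p != v.size_; ++p)
        sum += *p;
    return sum;
}
```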

[Bug rtl-optimization/91981] Speed degradation because of inlining a register clobbering function

2019-10-04 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91981

--- Comment #4 from Antony Polukhin  ---
It was broken in GCC-9; GCC-8.3 and below do not have this issue.

[Bug c++/92053] New: Compilation fails or succeeds depending on the optimization flags

2019-10-10 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92053

Bug ID: 92053
   Summary: Compilation fails or succeeds depending on the
optimization flags
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: accepts-invalid
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the following code:

#include 
#include 
#include 

struct widget;
using variant_t = std::variant;

struct my_func {
my_func(variant_t&& arg) {
std::make_unique(std::move(arg));
}
};

struct widget {};
my_func f({});


With `-std=c++2a -O0` it compiles. With `-std=c++2a -O2` it fails on a static
assert in instantiation of 'struct std::is_default_constructible'. 

Godbolt playground: https://godbolt.org/z/-d26aG

[Bug c++/92054] New: `final` does not cause devirtualization of nested calls

2019-10-10 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92054

Bug ID: 92054
   Summary: `final` does not cause devirtualization of nested
calls
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:


struct A {
virtual int f() { return 0; }
virtual int g() { return f() + 40; } 
};

struct B2 final : A {
int f() override { return 42; }
};

int test(B2& b) {
return b.g();
}


GCC-10 generates assembly that performs a real virtual call through the vptr.
However, `B2` is final, so any call to a virtual function of `A` ends up
calling the same function in `B2`. So `B2::g()` should inline `A::g()` and get
optimized to:

int test(B2& b) { return B2::f() + 40; }


Which is just 82, because `B2::f()` always returns 42.

Godbolt playground: https://godbolt.org/z/PJ4nL-
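
The expected result can be checked with a sketch that places the original
call next to a manually devirtualized one. `test_devirtualized`, `run_test`
and `run_devirtualized` are hypothetical names added for the comparison:

```cpp
struct A {
    virtual int f() { return 0; }
    virtual int g() { return f() + 40; }
};

struct B2 final : A {
    int f() override { return 42; }
};

// Call from the report: goes through A::g(), which calls f() virtually.
int test(B2& b) {
    return b.g();
}

// What the report expects the optimizer to reduce test() to.
int test_devirtualized(B2& b) {
    return b.B2::f() + 40;  // qualified call, no virtual dispatch
}

int run_test() {
    B2 b;
    return test(b);
}

int run_devirtualized() {
    B2 b;
    return test_devirtualized(b);
}
```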

[Bug c++/92053] Compilation fails or succeeds depending on the optimization flags

2019-10-11 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92053

Antony Polukhin  changed:

   What|Removed |Added

   Keywords|needs-reduction,|accepts-invalid
   |rejects-valid   |

--- Comment #2 from Antony Polukhin  ---
Reduced version. Note that Clang refuses to compile it with any -O, while GCC
is fine with it on -O0 https://godbolt.org/z/yTM0a4 :


template 
struct unique_ptr {
_Tp* pointer_{};

explicit unique_ptr(_Tp* __p) noexcept
: pointer_(__p) {}

~unique_ptr() noexcept {
delete pointer_;
}

unique_ptr(const unique_ptr&) = delete;
unique_ptr& operator=(const unique_ptr&) = delete;
};

namespace my {
template
unique_ptr<_Tp> make_unique(_Args&&... __args)
{ return unique_ptr<_Tp>(new _Tp(static_cast<_Args&&>(__args)...)); }

  template
constexpr Target move(_Tp&& __t) noexcept
{ return static_cast(__t); }
}

struct widget;

template 
struct my_variant_impl  {
T value;
my_variant_impl() = default;
my_variant_impl(T&& val) : value(val) {};
};

template 
struct my_variant
: my_variant_impl 
{};

using variant_t = my_variant;

struct my_func {
my_func(variant_t&& arg) {
my::make_unique(my::move(arg));
}
};

struct widget {};
my_func f({});

[Bug c++/92067] New: __is_constructible(incomplete_type) should make the program ill-formed

2019-10-11 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92067

Bug ID: 92067
   Summary: __is_constructible(incomplete_type) should make the
program ill-formed
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Such change brings consistency with Clang and other built-in traits:


struct incomplete;

// fails on clang, OK on GCC
const bool res = __is_constructible(incomplete);

// GCC: invalid use of incomplete type 'struct incomplete'
const bool res0 = __is_trivial(incomplete);

// GCC: invalid use of incomplete type 'struct incomplete'
const bool res1 = __is_final(incomplete);


Godbolt playground: https://godbolt.org/z/GVX7mK

[Bug c++/82019] [concepts] ICE if concept is not satisfied

2019-10-15 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82019

--- Comment #2 from Antony Polukhin  ---
Updated version of the test (works well on GCC 10):


// { dg-options "-std=c++2a" }

template <class T, class Data>
concept VectorOperations = requires(T& v, const Data& data) {
v += data;
requires __is_same_as(T&, decltype(v += data));
};

template <class Container, class Data>
requires VectorOperations<Container, Data>
void compute_vector_optimal(Container& , const Data& ) {}

int main() {
unsigned v1[] = {1,2,3};
compute_vector_optimal(v1, v1); // { dg-error "cannot call function" }
}

[Bug libstdc++/83754] New: Segmentation fault in regex_search

2018-01-09 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83754

Bug ID: 83754
   Summary: Segmentation fault in regex_search
   Product: gcc
   Version: 7.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

The following code 

#include <regex>
#include <string>

int main() {
  std::regex pattern("\\w+\\.");
  std::string s(100, 'a');
  return std::regex_search(s, pattern);
}


leads to segmentation fault. Backtrace reports the following:

#1  0x004174a2 in std::_Function_handler, false, false>
>::_M_invoke(std::_Any_data const&, char&&) ()
#2  0x00415544 in std::function::operator()(char) const ()
#3  0x00411222 in std::__detail::_State::_M_matches(char) const
()
#4  0x0040cde3 in std::__detail::_Executor::_M_handle_match

#5  0x00409cb0 in std::__detail::_Executor::_M_dfs
#6  0x00411656 in std::__detail::_Executor::_M_rep_once_more
#7  0x0040ca05 in std::__detail::_Executor::_M_handle_repeat
<...>
#11350  0x00409cb0 in std::__detail::_Executor::_M_dfs
#11351  0x00411656 in std::__detail::_Executor::_M_rep_once_more
#11352  0x0040ca05 in std::__detail::_Executor::_M_handle_repeat
<...>


This issue could be related to bug 79539.

[Bug c++/84099] New: Dynamic initialization is performed in case when constant initialization is permitted

2018-01-29 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84099

Bug ID: 84099
   Summary: Dynamic initialization is performed in case when
constant initialization is permitted
   Product: gcc
   Version: 8.0.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

The following code 

struct foo {
const char* data_;
unsigned size_;

foo(const char* data, unsigned size) noexcept
: data_(data)
, size_(size)
{}
};

foo test() {
static const foo v{"Hello", 5};
return v;
}


Produces disassembly with dynamic initialization of the `v` variable. However,
in this case the C++ Standard permits constant initialization:

"An implementation is permitted to perform the initialization of a variable
with static or thread storage duration as a static initialization even if such
initialization is not required to be done statically, provided that

— the dynamic version of the initialization does not change the value of any
other object of static or thread storage duration prior to its initialization,
and
— the static version of the initialization produces the same value in the
initialized variable as would be produced by the dynamic initialization if all
variables not required to be initialized statically were initialized
dynamically.
"

Optimal assembly would look like

.LC0:
  .string "Hello"
test():
  mov eax, OFFSET FLAT:.LC0
  mov edx, 5
  ret
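
Independently of what the optimizer is permitted to do, constant
initialization can be requested explicitly. A minimal sketch, assuming C++11
or later and that the constructor can be made `constexpr` (the only changes
relative to the struct in the report are the two `constexpr` keywords):

```cpp
struct foo {
    const char* data_;
    unsigned size_;

    constexpr foo(const char* data, unsigned size) noexcept
        : data_(data)
        , size_(size)
    {}
};

foo test() {
    // A constexpr variable must be constant-initialized, so no guard
    // variable or run-time initialization is needed.
    static constexpr foo v{"Hello", 5};
    return v;
}
```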

[Bug c++/84103] New: Dynamic initialization is performed for non-local variables in case when constant initialization is permitted

2018-01-29 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84103

Bug ID: 84103
   Summary: Dynamic initialization is performed for non-local
variables in case when constant initialization is
permitted
   Product: gcc
   Version: 8.0.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

The following code

struct foo {
const char* data_;
unsigned size_;

foo(const char* data, unsigned size) noexcept
: data_(data)
, size_(size)
{}
};

extern const foo v{"Hello", 5};


Produces assembly with dynamic initialization:

.LC0:
  .string "Hello"
_GLOBAL__sub_I_v:
  mov QWORD PTR v[rip], OFFSET FLAT:.LC0
  mov DWORD PTR v[rip+8], 5
  ret
v:
  .zero 16


However, in this case the C++ Standard permits constant initialization:

"An implementation is permitted to perform the initialization of a variable
with static or thread storage duration as a static initialization even if such
initialization is not required to be done statically, provided that

— the dynamic version of the initialization does not change the value of any
other object of static or thread storage duration prior to its initialization,
and
— the static version of the initialization produces the same value in the
initialized variable as would be produced by the dynamic initialization if all
variables not required to be initialized statically were initialized
dynamically.
"

Optimal assembly would look like the following

v:
  .quad .L.str
  .long 5 # 0x5
  .zero 4

.L.str:
  .asciz "Hello"

(clang produces the code from above)

Bug 84099 may be related to this one. That bug is about local variables
initialization, this bug is about non-local variables.

[Bug middle-end/84147] New: RTTI for base class in anonymous namespace could be avoided

2018-01-31 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84147

Bug ID: 84147
   Summary: RTTI for base class in anonymous namespace could be
avoided
   Product: gcc
   Version: 8.0.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example

namespace {
struct base {
virtual int foo() noexcept {return 1;}
};
}

struct derived1 final : base {};
struct derived2 final : base {};

struct pair {
derived1 d1;
derived2 d2;
};

pair test() {
return {};
}


`base` is in an anonymous namespace (so it has internal linkage) and is used
only to provide some functions to the derived classes. There is no complex
inheritance, and there are no dynamic_casts or typeid(base) calls.

RTTI for the base class seems useless in that case, but it is still emitted in
the assembly:


  .type typeinfo for (anonymous namespace)::base, @object
  .size typeinfo for (anonymous namespace)::base, 16
typeinfo for (anonymous namespace)::base:
  .quad vtable for __cxxabiv1::__class_type_info+16
  .quad typeinfo name for (anonymous namespace)::base
  .align 16
  .type typeinfo name for (anonymous namespace)::base, @object
  .size typeinfo name for (anonymous namespace)::base, 23
typeinfo name for (anonymous namespace)::base:
  .string "*N12_GLOBAL__N_14baseE"

[Bug c++/84306] New: Wrong overload selected with -std=c++17, explicit and {}

2018-02-09 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84306

Bug ID: 84306
   Summary: Wrong overload selected with -std=c++17, explicit and
{}
   Product: gcc
   Version: 8.0.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

The following code calls constructor (1); however, constructor (2) must be
selected.

struct foo {
foo() = default;

foo(foo const&);   // (1)

template <class T>
explicit foo(T&&); // (2)
};

int main() {
foo f1;
foo f2{f1};  // (1) - wrong, must be (2)
}


The compiler chooses the right function if 'explicit' is removed, or if '{f1}'
is replaced with '(f1)', or if the -std=c++17 option is changed to -std=c++14.
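
To see which constructor is selected, the class can be instrumented. A small
sketch for the parenthesized form only, which the report says is handled
correctly; `traced_foo` and `picked_with_parens` are hypothetical names:

```cpp
#include <cassert>

// Hypothetical instrumented copy of `foo`: each constructor records
// which overload ran.
struct traced_foo {
    int picked = 0;

    traced_foo() = default;

    traced_foo(traced_foo const&) : picked(1) {}   // (1)

    template <class T>
    explicit traced_foo(T&&) : picked(2) {}        // (2)
};

// Direct-initialization with parentheses from a non-const lvalue:
// T deduces to traced_foo&, an exact match that beats the
// const-qualified reference binding of (1).
inline int picked_with_parens() {
    traced_foo f1;
    traced_foo f2(f1);
    return f2.picked;
}
```

Replacing `f2(f1)` with `f2{f1}` in the helper is the case this report is
about.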

[Bug c++/89301] New: [concepts] requires clause on a template alias is ignored

2019-02-12 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89301

Bug ID: 89301
   Summary: [concepts] requires clause on a template alias is
ignored
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

The following code compiles; however, it should not:

template
requires(condition)
using enable_if_t = T;

template>
void foo();

void test() {
foo();
}


Slightly changed example also compiles on GCC (but fails to compile on Clang):

template
requires(condition)
using enable_if_t = T;

template
enable_if_t foo();

void test() {
foo();
}

[Bug libgcc/89625] New: Freeing memory under the lock in __deregister_frame_info_bases

2019-03-07 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89625

Bug ID: 89625
   Summary: Freeing memory under the lock in
__deregister_frame_info_bases
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: libgcc
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

`__deregister_frame_info_bases` in unwind-dw2-fde.c calls
`free (ob->u.sort);` while holding `object_mutex`. This can be avoided by
remembering the pointer to free and freeing it outside the critical section.

This has been fixed in upstream glibc
https://github.com/bminor/glibc/commit/2604882cefd3281679b8177245fdebc7061b8695#diff-17235859a5d2697ce97070a69ab9a602
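
The pattern can be sketched as follows. This is not the libgcc code: `object`,
`register_object`, `deregister_object` and `frees_done` are hypothetical
stand-ins, and `std::mutex` replaces the gthread mutex for brevity:

```cpp
#include <cassert>
#include <cstdlib>
#include <mutex>

// Hypothetical stand-in for the libgcc object descriptor.
struct object {
    void* sort_buffer;
};

static std::mutex object_mutex;
static object* registered = nullptr;
static int frees_done = 0;  // instrumentation for the sketch only

void register_object(object* ob) {
    std::lock_guard<std::mutex> lock(object_mutex);
    registered = ob;
}

object* deregister_object() {
    void* to_free = nullptr;
    object* ob = nullptr;
    {
        std::lock_guard<std::mutex> lock(object_mutex);
        ob = registered;
        registered = nullptr;
        if (ob) {
            // Only detach the buffer while holding the lock.
            to_free = ob->sort_buffer;
            ob->sort_buffer = nullptr;
        }
    }
    // The potentially slow free() runs after the lock is released.
    if (to_free) {
        std::free(to_free);
        ++frees_done;
    }
    return ob;
}
```

The critical section then covers only pointer updates, so other threads
waiting on the mutex are not delayed by the allocator.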

[Bug c++/89700] New: Warn if move constructor is not generated and not deleted

2019-03-13 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89700

Bug ID: 89700
   Summary: Warn if move constructor is not generated and not
deleted
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

It would be great to have a warning that helps to identify classes with
sub-optimal move semantics. For example, it would be nice to have such a
warning for cases like the following:

struct member {
member();
member(const member&);
member(member&&);
private:
int* data_;
};

// warning: `my_class(const my_class&)` disables the
// implicit move constructor generation. Use
// `my_class(my_class&&) = default;` to generate it or
// `my_class(my_class&&) = delete;` to disable this warning.
struct my_class {
my_class() = default;
my_class(const my_class&);
private:
member member1;
member member2;
};

void foo(my_class c);

void test() {
my_class c;
foo(static_cast<my_class&&>(c)); // copies
}



The rules for the warning could be the following:
Issue a warning if at least one of the class members has a move constructor,
the class has a copy constructor, and the move constructor is not implicitly
deleted.
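
The cost of the missing move constructor can be observed by counting member
copies and moves. A minimal sketch; `counted_member`, `copies_on_move` and
`moves_on_move` are hypothetical names used only for this illustration:

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumented member: counts copies and moves.
struct counted_member {
    static int copies;
    static int moves;
    counted_member() = default;
    counted_member(const counted_member&) { ++copies; }
    counted_member(counted_member&&) noexcept { ++moves; }
};
int counted_member::copies = 0;
int counted_member::moves = 0;

// The user-declared copy constructor suppresses the implicit move
// constructor, so rvalues of this class are copied.
struct copies_on_move {
    copies_on_move() = default;
    copies_on_move(const copies_on_move& other)
        : member1(other.member1), member2(other.member2) {}
private:
    counted_member member1;
    counted_member member2;
};

// Defaulting the move constructor restores member-wise moves.
struct moves_on_move {
    moves_on_move() = default;
    moves_on_move(const moves_on_move& other)
        : member1(other.member1), member2(other.member2) {}
    moves_on_move(moves_on_move&&) = default;
private:
    counted_member member1;
    counted_member member2;
};
```

Constructing a `copies_on_move` from `std::move(x)` copies both members, while
the same operation on `moves_on_move` moves them.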

[Bug libstdc++/89728] New: ctype is underconstrained

2019-03-15 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89728

Bug ID: 89728
   Summary: ctype is underconstrained
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Because of that, the overloads from [locale.convenience] compile with
nonsensical charT template arguments such as std::string:

std::tolower(std::string{}, std::locale::classic());


That leads to runtime exceptions (bad cast to ctype<std::string>)
instead of a compile-time error.


Some other standard library implementations are more restrictive and do not
allow such weird template parameters for ctype:

error: implicit instantiation of undefined template
'std::__1::ctype >'

[Bug c++/89785] New: Incorrect "not a constant expression" error with switch statement that returns

2019-03-21 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89785

Bug ID: 89785
   Summary: Incorrect "not a constant expression" error with
switch statement that returns
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Keywords: rejects-valid
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

The following code fails to compile:

constexpr int Addrlen(int domain) {
switch (domain) {
  case 0:
return 0;
  case 2:
return 42;
}
throw 42;
}

The error message is the following:

<source>: In function 'constexpr int Addrlen(int)':
<source>:8:11: error: expression '<throw-expression>' is not a constant
expression
8 | throw 42;
  |   ^~

[Bug c++/89785] Incorrect "not a constant expression" error with switch statement that returns

2019-03-21 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89785

--- Comment #2 from Antony Polukhin  ---
> So you say that Addrlen(0) and Addrlen(2) are proper constexprs?  Of course
> Addrlen(1) is not.

Yes. But GCC does not even allow defining the Addrlen function:
https://godbolt.org/z/xqR2Lr
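
The expected behaviour can be written down as a self-checking snippet. A
sketch, with the function renamed to the hypothetical `addrlen_sketch` to keep
it apart from the original test case:

```cpp
#include <cassert>

constexpr int addrlen_sketch(int domain) {
    switch (domain) {
      case 0:
        return 0;
      case 2:
        return 42;
    }
    throw 42;  // reached only for unsupported domains
}

// Both calls return before the throw, so they are usable in
// constant expressions.
static_assert(addrlen_sketch(0) == 0, "");
static_assert(addrlen_sketch(2) == 42, "");
```

A call such as `addrlen_sketch(1)` is still valid at run time, where it
throws; it just cannot appear in a constant expression.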

[Bug libstdc++/89816] New: [9 Regression] std::variant move construction regressed since GCC 8.3

2019-03-25 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89816

Bug ID: 89816
   Summary: [9 Regression] std::variant move construction
regressed since GCC 8.3
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

The following code


#include <variant>

struct my_type{
my_type(my_type&&) noexcept;
};

using V1 = std::variant;
V1 test1(V1 v ) { return v; }


produced a jump table of size 5 on GCC 8.3. GCC 9 produces huge jump tables
with over 30 entries, which makes the binaries 3 times bigger with GCC 9.
https://godbolt.org/z/SUWL5T

[Bug c++/89700] Warn if move constructor is not generated and not deleted

2019-03-25 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89700

--- Comment #6 from Antony Polukhin  ---
Another way to work around the warning is to use something like
`my_class(my_class&&) requires false;`. That's too ugly to use.

I'd be fine with closing this issue as a 'won't fix'.

[Bug libstdc++/89816] [9 Regression] std::variant move construction regressed since GCC 8.3

2019-03-25 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89816

--- Comment #6 from Antony Polukhin  ---
The fix seems pretty trivial: in function `__variant_construct`, get the
address of the storage before entering `__do_visit` and make it switch only on
`__rhs`.

Pseudo-code:

  template<typename... _Types, typename _Tp, typename _Up>
    void __variant_construct(_Tp&& __lhs, _Up&& __rhs)
    {
      __lhs._M_index = __rhs._M_index;
      void* storage = std::addressof(__lhs._M_u);
      __do_visit([storage](auto&& __rhs_mem)
                 -> __detail::__variant::__variant_cookie
        {
          using _Type = remove_reference_t<decltype(__rhs_mem)>;
          ::new (storage)
            _Type(std::forward<decltype(__rhs_mem)>(__rhs_mem));
          return {};
        }, __variant_cast<_Types...>(std::forward<_Up>(__rhs)));
    }

[Bug libstdc++/89819] New: [9 Regression] std::variant operators regressed since GCC 8.3

2019-03-25 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89819

Bug ID: 89819
   Summary: [9 Regression] std::variant operators regressed since
GCC 8.3
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

The following code


#include <variant>

struct my_type{};
bool operator==(const my_type&, const my_type&) noexcept;

using V1 = std::variant;
auto test1(const V1& v) { return v == v; }


produced a jump table of size 5 on GCC 8.3. GCC 9 produces huge jump tables
with over 30 entries. This leads to ~15 times bigger binaries with GCC 9 and a
~25% compilation slowdown. https://godbolt.org/z/yoAIrP

This could be fixed by changing `_VARIANT_RELATION_FUNCTION_TEMPLATE` from
binary visitation to unary: first compare the `index()` of `__lhs` and
`__rhs`, and do the visitation only if they match (hold the same type).

Pseudo-code:

#define _VARIANT_RELATION_FUNCTION_TEMPLATE(__OP, __NAME) \
  template<typename... _Types> \
    constexpr bool operator __OP(const variant<_Types...>& __lhs, \
                                 const variant<_Types...>& __rhs) \
    { \
      bool __ret = true; \
      if ((__lhs.index() + 1) != (__rhs.index() + 1)) { \
        return (__lhs.index() + 1) __OP (__rhs.index() + 1); \
      } \
      __do_visit([&__ret, &__lhs] \
                 (auto&& __rhs_mem) mutable \
                 -> __detail::__variant::__variant_cookie \
        { \
          using __Type = remove_reference_t<decltype(__rhs_mem)>; \
          if constexpr (!is_same_v< \
                          __Type, \
                          __detail::__variant::__variant_cookie>) \
            __ret = __detail::__variant::__get< \
                      __detail::__variant::__index_of_v<__Type, \
                                                        _Types...>>(__lhs) \
                    __OP __rhs_mem; \
          return {}; \
        }, __rhs); \
      return __ret; \
    } \
\
  constexpr bool operator __OP(monostate, monostate) noexcept \
    { return 0 __OP 0; }
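
The same idea can be shown with the public API only. A rough sketch, not the
libstdc++ implementation: `equal_by_unary_visit` is a hypothetical helper, and
it assumes that every alternative type occurs only once in the variant:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical helper; assumes every alternative type occurs once in
// Ts..., otherwise std::get_if<T> is ill-formed.
template <class... Ts>
bool equal_by_unary_visit(const std::variant<Ts...>& lhs,
                          const std::variant<Ts...>& rhs) {
    if (lhs.index() != rhs.index())
        return false;
    if (rhs.valueless_by_exception())
        return true;  // both valueless
    // One visitation over rhs only: sizeof...(Ts) table entries
    // rather than sizeof...(Ts) squared.
    return std::visit(
        [&lhs](const auto& rhs_mem) {
            using T = std::decay_t<decltype(rhs_mem)>;
            // Indices match, so lhs holds a T and get_if is non-null.
            return *std::get_if<T>(&lhs) == rhs_mem;
        },
        rhs);
}
```

Because the indices are compared first, the visitor only needs to be
instantiated for pairs of identical alternatives.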

[Bug libstdc++/89816] [9 Regression] std::variant move construction regressed since GCC 8.3

2019-03-25 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89816

--- Comment #9 from Antony Polukhin  ---
BTW, I think there are some other cases where binary visitation could be
simplified to unary (significantly reducing the code size and improving the
compile times). I've filed Bug 89819, but it looks like assignment and swap
could also be optimized.

[Bug c++/89820] New: Returning empty type produces unnecessary instructions

2019-03-25 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89820

Bug ID: 89820
   Summary: Returning empty type produces unnecessary instructions
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the following code


struct my_type_impl {};

my_type_impl foo0() { return {}; }
my_type_impl foo1() { my_type_impl tmp; return tmp; }


For `foo0` and `foo1` GCC generates the following assembly:

xor eax, eax
ret


However xoring the `eax` seems unnecessary and some of the other compilers just
generate the `ret` instruction.

The additional `xor` instruction could significantly increase the code size for
generic C++ programs. For example, in Bug 89819 and Bug 89816 each of the 36
jump table entries has that additional instruction.

[Bug libstdc++/89824] New: Variant jump table reserves space for __variant_cookie twice

2019-03-26 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89824

Bug ID: 89824
   Summary: Variant jump table reserves space for __variant_cookie
twice
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Space for the `__variant_cookie` state is already reserved in _Multi_array
`_Multi_array<_Tp, __rest...> _M_arr[__first + __do_cookie];`.

Additionally reserving it inside `__gen_vtable` produces a jump table with
gaps: https://godbolt.org/z/Vx_wEU


Fix: remove the `+ (is_same_v<_Result_type, __variant_cookie> ? 1 : 0)` from
`__gen_vtable`.


This removes the zeros from the jump table and slightly reduces the binary
size: https://godbolt.org/z/gyo0-j

[Bug libstdc++/89825] New: Jump table for variant visitation could be shortened for never empty variants

2019-03-26 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89825

Bug ID: 89825
   Summary: Jump table for variant visitation could be shortened
for never empty variants
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

The `__do_cookie` computation in `_Multi_array` seems suboptimal. There are
variant types that are never empty, so they never need the cookie value at all.
`_Variant_storage::_M_valid()` already uses that knowledge to
always return `true`. The same logic could be used for `__do_cookie`.

Pseudo-code:

+  template
+  struct _Never_empty;

+  template
+  struct _Never_empty>
+  {
+static constexpr bool _S_value = (is_trivially_copyable_v<_Types> && ...);
+  };

  template
struct _Multi_array<_Ret(*)(_Visitor, _Variants...), __first, __rest...>
{
+  static constexpr size_t __index = sizeof...(_Variants) -
sizeof...(__rest) - 1;
+  using _Variant_current = __remove_cvref_t::type>;
  static constexpr int __do_cookie =
-   is_same_v<_Ret, __variant_cookie> ? 1 : 0;
+   is_same_v<_Ret, __variant_cookie> &&
_Never_empty<_Variant_current>::_S_value ? 1 : 0;
  using _Tp = _Ret(*)(_Visitor, _Variants...);
  template
constexpr const _Tp&
_M_access(size_t __first_index, _Args... __rest_indices) const
{ return _M_arr[__first_index +
__do_cookie]._M_access(__rest_indices...); }

  _Multi_array<_Tp, __rest...> _M_arr[__first + __do_cookie];
  };


  template
static constexpr void
_S_apply_all_alts(_Array_type& __vtable,
  std::index_sequence<__var_indices...>)
{
- if constexpr (is_same_v<_Result_type, __variant_cookie>)
+ if constexpr (is_same_v<_Result_type, __variant_cookie>
+   && !_Never_empty>::_S_value)
(_S_apply_single_alt(
  __vtable._M_arr[__var_indices + 1],
  &(__vtable._M_arr[0])), ...);
  else
(_S_apply_single_alt(
  __vtable._M_arr[__var_indices]), ...);
}



The above patch reduces the jump table size by up to 2*sizeof...(_Types)
entries for binary visitations.
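
The never-empty condition itself can be expressed with public names. A sketch
that mirrors the `_Never_empty` helper from the pseudo-code; `never_empty` is
a hypothetical name, and the trivially-copyable criterion is taken from the
patch above rather than from the standard:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical public-name analogue of the `_Never_empty` helper.
template <class Variant>
struct never_empty;

template <class... Types>
struct never_empty<std::variant<Types...>>
    : std::bool_constant<(std::is_trivially_copyable_v<Types> && ...)> {};

static_assert(never_empty<std::variant<int, char, double>>::value);
static_assert(!never_empty<std::variant<int, std::string>>::value);
```

Only variants for which the trait is false would then need the extra cookie
slot in the jump table.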

[Bug libstdc++/89825] Jump table for variant visitation could be shortened for never empty variants

2019-03-26 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89825

--- Comment #1 from Antony Polukhin  ---
There's a typo in the proposed solution: it should be `&& !_Never_empty` in
`_Multi_array`.

[Bug libstdc++/89825] Jump table for variant visitation could be shortened for never empty variants

2019-03-26 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89825

--- Comment #4 from Antony Polukhin  ---
> Would you be willing to complete a copyright assignment for contributions to
> GCC?

Yes, I can do that. Please send the instructions to my email.

[Bug libstdc++/89851] New: [Regression] std::variant comparison operators violate [variant.relops]

2019-03-27 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89851

Bug ID: 89851
   Summary: [Regression] std::variant comparison operators violate
[variant.relops]
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

The following function should return `false` according to [variant.relops]:


#include <variant>

using V = std::variant;
bool test1() { 
V v1{std::in_place_index<0>, 0};
V v2{std::in_place_index<1>, 0};
return v1 == v2;
}


std::variant in GCC-8 was returning `false`, however the variant from GCC-9
returns `true`.


This could be quickly fixed by comparing the indexes at the start of each
operator. Another fix is to pass integral_constants instead of types into the
__do_visit function.
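
The required behaviour can be captured in a small regression check. A sketch
that assumes both alternatives are `int`, which fits the `in_place_index`
usage in the test above; `V_same` and `same_value_different_index` are
hypothetical names:

```cpp
#include <cassert>
#include <variant>

// Assumption: both alternatives are int, which matches the
// in_place_index<0>/<1> usage in the report.
using V_same = std::variant<int, int>;

inline bool same_value_different_index() {
    V_same v1{std::in_place_index<0>, 0};
    V_same v2{std::in_place_index<1>, 0};
    return v1 == v2;  // [variant.relops]: indices differ, so false
}
```

On a conforming implementation the function returns `false` even though both
variants hold the value 0.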

[Bug middle-end/89922] New: Loop on fixed size array is not unrolled and poorly optimized

2019-04-02 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89922

Bug ID: 89922
   Summary: Loop on fixed size array is not unrolled and poorly
optimized
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:


struct array {
   int data[5];
};

array test(int i) {
array a = {1, i, 2, 3, 4};

for (int j = 0; j < 5; ++j) {
  a.data[j] += j;
}

return a;
}


GCC-9 generates ~20 instructions with jumps.

Rewriting the same function with the loop unrolled by hand makes the assembly
much better:

array test2(int i) {
array a = {1, i, 2, 3, 4};
a.data[0] += 0;
a.data[1] += 1;
a.data[2] += 2;
a.data[3] += 3;
a.data[4] += 4;

return a;
}


Assembly for `test2` takes only ~8 instructions:
test2(int):
add esi, 1
mov DWORD PTR [rdi], 1
mov rax, rdi
movabs  rdx, 25769803780
mov DWORD PTR [rdi+4], esi
mov QWORD PTR [rdi+8], rdx
mov DWORD PTR [rdi+16], 8
ret
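
As a possible workaround until the heuristic changes, the loop can be
annotated with GCC's `#pragma GCC unroll`. This is only a sketch: whether the
resulting assembly matches `test2` depends on the compiler version and flags,
and `array5` and `test_pragma` are hypothetical names:

```cpp
#include <cassert>

struct array5 {
    int data[5];
};

inline array5 test_pragma(int i) {
    array5 a = {{1, i, 2, 3, 4}};
    // GCC-specific hint; other compilers may ignore or warn about it.
#pragma GCC unroll 5
    for (int j = 0; j < 5; ++j) {
        a.data[j] += j;
    }
    return a;
}
```

The function computes the same values as `test` and `test2`; only the
unrolling decision is influenced.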

[Bug middle-end/89922] Loop on fixed size array is not unrolled and poorly optimized at -O2

2019-04-04 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89922

--- Comment #2 from Antony Polukhin  ---
The estimation is very close to the actual result for the loop.

But it does not take into account the instructions before the loop that are
eliminated by unrolling. Some heuristic could help here, for example: "the
initialization of a local variable goes away for unrolled loops if the variable
is rewritten in the loop or is not used outside the loop".

[Bug middle-end/89922] Loop on fixed size array is not unrolled and poorly optimized at -O2

2019-04-05 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89922

--- Comment #4 from Antony Polukhin  ---
> Was the testcase just an artificial one or does it appear (in this
> isolated form!) in a real application/benchmark?

I was not investigating a particular benchmark or real world application at
first.

My guess is that such a heuristic would affect cryptography (initializing big
arrays with magic constants) and math (matrix multiplication with an identity
matrix, for example).

I've tried to check the validity of the guess. The very first attempt
succeeded. Hash computation for a constant string is not well optimized:
https://godbolt.org/z/iKi0pb The heuristic may notice that the string is a
local variable and may force the loop unrolling. Hash computation on a
constant is a common case in libstdc++ when working with unordered maps and
sets.

There's definitely some room for improvement for cases when a local variable is
used in the loop only.

[Bug libstdc++/90008] New: [9 Regression] variant attempts to copy rhs in comparison operators

2019-04-08 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90008

Bug ID: 90008
   Summary: [9 Regression] variant attempts to copy rhs in
comparison operators
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Keywords: rejects-valid
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

_VARIANT_RELATION_FUNCTION_TEMPLATE accidentally accepts the second visitable
by copy in `__do_visit<__detail::__variant::__visit_with_index>`.

The following test fails to compile right now, but worked in GCC-8:

#include <variant>

struct user_defined {
user_defined();
user_defined(const user_defined&) = delete;
user_defined(user_defined&&) = delete;
};

bool operator==(const user_defined& x, const user_defined& y) { return true; }

using v_t = std::variant;

auto test(const v_t& v, const v_t& v2) {
return v == v2;
}

[Bug target/90202] New: AVX-512 instructions not used

2019-04-22 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90202

Bug ID: 90202
   Summary: AVX-512 instructions not used
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the following test program:


struct v {
int val[16];
};

v test(v a, v b) {
v res;

for (int i = 0; i < 16; i++)
res.val[i] = a.val[i] + b.val[i];

return res;
}


When compiled with `g++ -O3 -march=skylake-avx512` the following assembly is
produced:
test(v, v):
  push rbp
  mov rax, rdi
  mov rbp, rsp
  vmovdqu32 ymm1, YMMWORD PTR [rbp+16]
  vmovdqu32 ymm2, YMMWORD PTR [rbp+48]
  vpaddd ymm0, ymm1, YMMWORD PTR [rbp+80]
  vmovdqu32 YMMWORD PTR [rdi], ymm0
  vpaddd ymm0, ymm2, YMMWORD PTR [rbp+112]
  vmovdqu32 YMMWORD PTR [rdi+32], ymm0
  vzeroupper
  pop rbp
  ret

This seems suboptimal, as the 512-bit registers are available and better
assembly is possible:
test(v, v):
  vmovdqu32 zmm0, zmmword ptr [rsp + 72]
  vpaddd zmm0, zmm0, zmmword ptr [rsp + 8]
  vmovdqu32 zmmword ptr [rdi], zmm0
  mov rax, rdi
  vzeroupper
  ret

[Bug c/90204] New: [8 Regression] C code is optimized worse than C++

2019-04-22 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90204

Bug ID: 90204
   Summary: [8 Regression] C code is optimized worse than C++
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:


struct v {
int val[16];
};

struct v test(struct v a, struct v b) {
struct v res;

for (int i = 0; i < 16; i++)
res.val[i] = a.val[i] + b.val[i];

return res;
}


Compiling that snippet with `g++ -O3 -march=skylake-avx512` gives short
assembly:
test(v, v):
  push rbp
  mov rax, rdi
  mov rbp, rsp
  vmovdqu32 ymm1, YMMWORD PTR [rbp+16]
  vmovdqu32 ymm2, YMMWORD PTR [rbp+48]
  vpaddd ymm0, ymm1, YMMWORD PTR [rbp+80]
  vmovdqu32 YMMWORD PTR [rdi], ymm0
  vpaddd ymm0, ymm2, YMMWORD PTR [rbp+112]
  vmovdqu32 YMMWORD PTR [rdi+32], ymm0
  vzeroupper
  pop rbp
  ret


Compiling the same sample with the C compiler and the same flags produces ~150
lines of assembly with a lot of jumps and comparisons. The regression appeared
after GCC-7.3.

[Bug target/90202] AVX-512 instructions not used

2019-04-22 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90202

--- Comment #2 from Antony Polukhin  ---
Then I'm fine with the current codegen.

However, with -mavx512f it produces a few additional instructions for the rbp
register:

test(v, v):
  push rbp  ; not necessary
  mov rax, rdi
  mov rbp, rsp  ; not necessary
  vmovdqu32 zmm1, ZMMWORD PTR [rbp+16]; could use rsp directly
  vpaddd zmm0, zmm1, ZMMWORD PTR [rbp+80] ; could use rsp directly
  vmovdqu32 ZMMWORD PTR [rdi], zmm0
  vzeroupper
  pop rbp   ; not necessary
  ret

[Bug c++/90647] New: Warn on returning a lambda with captured local variables

2019-05-27 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90647

Bug ID: 90647
   Summary: Warn on returning a lambda with captured local
variables
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:

auto test(int s) {
return [&s] { return s; };
}


`s` is a local variable, so we return a lambda that has a dangling reference.

It would be nice to have a warning for such cases.
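
For comparison, the safe form captures by value. A minimal sketch;
`test_by_value` is a hypothetical name:

```cpp
#include <cassert>

// Capturing by value copies `s` into the closure, so the closure
// stays valid after the function returns.
inline auto test_by_value(int s) {
    return [s] { return s; };
}
```

The proposed warning would fire only for the `[&s]` form, where the closure
outlives the variable it refers to.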

[Bug c++/90666] New: Warn if an UB was met during constexpr evaluation attempt

2019-05-29 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90666

Bug ID: 90666
   Summary: Warn if an UB was met during constexpr evaluation
attempt
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:


constexpr int test() {
  const char* from = "wow";
  char dest[1] = {*from};

  // assignment to dereferenced one-past-the-end pointer
  dest[1] = 0;
  return 0;
}

const auto r = test();


`test()` function is a constexpr function, yet any attempt to call it causes
UB.

It would be very helpful to have a warning for a constexpr evaluation attempt
that encountered UB and fell back to runtime evaluation.


Note that such a warning would be extremely helpful for contracts. It would
allow detecting contract violations at compile time:


constexpr int impl(int num)
  [[pre: num > 0]]
{
  return num + 42;
}

auto test() {
  // Core constant expression:
  const auto f0 = impl(1);

  // Runtime call to __on_contract_violation.
  // Warning would be very helpful.
  const auto f1 = impl(0); 

}
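
Without such a warning, the check can already be forced by declaring the
result `constexpr` instead of `const`. A sketch using a corrected, in-bounds
version of the first example; `test_in_bounds` and `r_checked` are
hypothetical names:

```cpp
#include <cassert>

constexpr int test_in_bounds() {
  const char* from = "wow";
  char dest[2] = {*from, 0};

  dest[1] = 0;  // in bounds now: dest has two elements
  return dest[0] == 'w' ? 0 : 1;
}

// `constexpr` (unlike plain `const`) requires constant evaluation, so
// an out-of-bounds write in the callee would be a hard error here
// instead of a silent fallback to runtime initialization.
constexpr auto r_checked = test_in_bounds();
static_assert(r_checked == 0, "");
```

This works only where the caller can be changed to require a constant
expression, which is why a warning for the `const` case would still be useful.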

[Bug libstdc++/71579] type_traits miss checks for type completeness in some traits

2019-05-31 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71579

--- Comment #9 from Antony Polukhin  ---
(In reply to Jonathan Wakely from comment #8)
> Is there more work to do to support the whole of https://wg21.link/lwg2797 ?

Looks like I've missed the is_nothrow_invocable_r, is_convertible,
is_nothrow_convertible, is_swappable_with, is_nothrow_swappable_with. I'll add
static asserts in a separate patch.

is_base_of is a hard one. But doable.

Non-first template arguments of some traits could be hardened further. However,
there are doubts about hardening those, especially the `R` parameter of the
is_*invocable_r traits:


#include <type_traits>

struct X;
struct foo{
  X operator()(X&, X&);
};

// OK on GCC and Clang
constexpr bool r0 = std::is_invocable_r::value;


struct Y {
Y& operator=(X );
};

// OK on GCC, ill-formed on clang
constexpr bool r1 = std::is_assignable::value;

I'm not sure what to do. We may harden those and make the behavior match the
Comments/Preconditions columns in the [meta.*], or relax those preconditions in
the WD, or do nothing and leave it as is. Right now I'm in favor of the second
approach.
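
For reference, the hardening technique itself is a static assertion on
completeness inside the trait. A simplified sketch, not the libstdc++ code:
`checked_is_move_constructible` is a hypothetical wrapper, and unlike the
standard's precondition it also rejects cv void and arrays of unknown bound:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical wrapper, not the libstdc++ implementation. Simplification:
// the standard also allows cv void and arrays of unknown bound, which
// this sketch rejects.
template <class T>
struct checked_is_move_constructible : std::is_move_constructible<T> {
    // sizeof on an incomplete type is ill-formed, so instantiating the
    // trait with one stops compilation here.
    static_assert(sizeof(T) > 0,
                  "template argument must be a complete type");
};

struct complete_type {};
struct incomplete_type;  // using the trait with this type would not compile

static_assert(checked_is_move_constructible<complete_type>::value, "");
```

Hardening a non-first argument such as `R` would apply the same assertion to
that parameter, which is exactly where the doubts above come from.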

[Bug libstdc++/71579] type_traits miss checks for type completeness in some traits

2019-05-31 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71579

--- Comment #12 from Antony Polukhin  ---
(In reply to Jonathan Wakely from comment #11)
> This change broke a compiler test: g++.dg/cpp0x/noexcept15.C
> 
> I'll have to figure out how to update that test to keep testing what it was
> meant to test, without triggering the library assertion.

Something like the following should do the trick:
- noexcept(std::is_nothrow_move_constructible::value)
+ noexcept(noexcept(std::declval() = std::declval()))

[Bug libstdc++/71579] type_traits miss checks for type completeness in some traits

2019-05-31 Thread antoshkka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71579

--- Comment #13 from Antony Polukhin  ---
I meant 
+ noexcept(noexcept(Tp(std::declval(

but now I'm not sure that it would test exactly the same thing.
