[llvm-bugs] [Bug 40678] New: llvm-config generates options that are incompatible with clang++

2019-02-10 Thread via llvm-bugs
https://bugs.llvm.org/show_bug.cgi?id=40678

Bug ID: 40678
   Summary: llvm-config generates options that are incompatible
with clang++
   Product: clang
   Version: 7.0
  Hardware: PC
OS: Linux
Status: NEW
  Severity: normal
  Priority: P
 Component: Driver
  Assignee: unassignedclangb...@nondot.org
  Reporter: michael.finn.jorgen...@gmail.com
CC: llvm-bugs@lists.llvm.org, neeil...@live.com,
richard-l...@metafoo.co.uk

Created attachment 21460
  --> https://bugs.llvm.org/attachment.cgi?id=21460&action=edit
Output from "llvm-config --cxxflags"

I have a minimal C++ file (named test.cc), and I try to compile it with clang++
and llvm-config.

I run the command:
clang++ `llvm-config --cxxflags` test.cc

This command fails with the error:
clang-7: error: unknown argument: '-fstack-clash-protection'

If I run "llvm-config --cxxflags" alone I see that the problematic option is
indeed part of the output. The full output is attached.

I believe this is an error, and that the options generated by the llvm-config
command should all be accepted by the clang++ command.


Additional information:

I can compile the given file test.cc without problems if I don't invoke
llvm-config, i.e. the following command works without error:
clang++ test.cc

The versions I'm using are: 7.0.1 for both clang++ and llvm-config.

I'm using Fedora 29 (64-bit).

P.S. This is all part of a large project (not my own) on github
(https://github.com/ghdl/ghdl).

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs


[llvm-bugs] [Bug 40679] New: Merge r351322 into the 8.0 branch : [MSan] Apply the ctor creation scheme of TSan

2019-02-10 Thread via llvm-bugs
https://bugs.llvm.org/show_bug.cgi?id=40679

Bug ID: 40679
   Summary: Merge r351322 into the 8.0 branch : [MSan] Apply the
ctor creation scheme of TSan
   Product: new-bugs
   Version: 8.0
  Hardware: All
OS: All
Status: NEW
  Severity: enhancement
  Priority: P
 Component: new bugs
  Assignee: unassignedb...@nondot.org
  Reporter: philip.pfa...@gmail.com
CC: htmldevelo...@gmail.com, llvm-bugs@lists.llvm.org
Blocks: 40331

Is it OK to merge the following revision(s) to the 8.0 branch?


Referenced Bugs:

https://bugs.llvm.org/show_bug.cgi?id=40331
[Bug 40331] [meta] 8.0.0 Release Blockers


[llvm-bugs] Issue 10250 in oss-fuzz: llvm: Build failure

2019-02-10 Thread ClusterFuzz-External via monorail via llvm-bugs


Comment #36 on issue 10250 by ClusterFuzz-External: llvm: Build failure
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10250#c36

Friendly reminder that the build is still failing.
Please try to fix this failure to ensure that fuzzing remains productive.
Latest build log:  
https://oss-fuzz-build-logs.storage.googleapis.com/log-e51246c5-5dd0-434c-bc0f-00e151c28048.txt


--
You received this message because:
  1. You were specifically CC'd on the issue

You may adjust your notification preferences at:
https://bugs.chromium.org/hosting/settings

Reply to this email to add a comment.


[llvm-bugs] [Bug 30690] error in backend: Cannot select: masked_gather

2019-02-10 Thread via llvm-bugs
https://bugs.llvm.org/show_bug.cgi?id=30690

Simon Pilgrim changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #7 from Simon Pilgrim ---
Fixed in trunk



[llvm-bugs] [Bug 40657] IR programs printing incorrect results after being compiled with -O0

2019-02-10 Thread via llvm-bugs
https://bugs.llvm.org/show_bug.cgi?id=40657

Sanjay Patel changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|CONFIRMED                   |RESOLVED
 Fixed By Commit(s)|                            |r353615, r353639
         Resolution|---                         |FIXED

--- Comment #10 from Sanjay Patel ---
All attached examples should be fixed after:
https://reviews.llvm.org/rL353639

Feel free to reopen if that's not correct.

Side note about fuzzing for LLVM bugs:
If you're looking for backend bugs, it might be worth trying something like
this:

$ clang -O2 test.c -S -emit-llvm -Xclang -disable-llvm-optzns -o unoptimized_ir.ll
$ llc -O2 unoptimized_ir.ll -o unoptimized_ir_optimized_asm.s
$ clang unoptimized_ir_optimized_asm.s -o maybe_buggy_executable
$ clang test.c -o reference_executable
$ { compare output of } reference_executable maybe_buggy_executable

A lot of people get that 1st step wrong: you want clang to create an IR file
that allows optimization by the backend, but skip intermediate optimization.
That doesn't happen if you use "-O0 -emit-llvm" with clang.



[llvm-bugs] [Bug 38971] [X86] Reductions should use smaller vector types later on

2019-02-10 Thread via llvm-bugs
https://bugs.llvm.org/show_bug.cgi?id=38971

Sanjay Patel changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED
 Fixed By Commit(s)|                            |r353641

--- Comment #6 from Sanjay Patel ---
We should get the ideal output for this example with or without -fast-hops
after:
https://reviews.llvm.org/rL353641
https://reviews.llvm.org/D57841



[llvm-bugs] [Bug 40680] New: clang: error: unable to execute command: Segmentation fault when compiling Apache Qpid Proton

2019-02-10 Thread via llvm-bugs
https://bugs.llvm.org/show_bug.cgi?id=40680

Bug ID: 40680
   Summary: clang: error: unable to execute command: Segmentation
fault when compiling Apache Qpid Proton
   Product: clang
   Version: trunk
  Hardware: PC
OS: Linux
Status: NEW
  Severity: release blocker
  Priority: P
 Component: -New Bugs
  Assignee: unassignedclangb...@nondot.org
  Reporter: jda...@redhat.com
CC: htmldevelo...@gmail.com, llvm-bugs@lists.llvm.org,
neeil...@live.com, richard-l...@metafoo.co.uk

Created attachment 21461
  --> https://bugs.llvm.org/attachment.cgi?id=21461&action=edit
/tmp/bitmask-079a44.c

-- Build files have been written to: /home/buildadm/qpid-dispatch/build
+ CC='clang-9 -fsanitize=thread'
+ CXX='clang++-9 -fsanitize=thread'
+ LDFLAGS=-fsanitize=thread
+ ninja -v install
[1/90] cd /home/buildadm/qpid-dispatch/build/src && /usr/bin/python
/home/buildadm/qpid-dispatch/build/tests/run.py -s
/home/buildadm/qpid-dispatch/src/schema_c.py
[2/90] /usr/bin/clang-9  -fsanitize=thread -Dqpid_dispatch_EXPORTS -I../include
-Iinclude -I/home/buildadm/qpid-proton/build/install/include
-I/usr/include/python2.7 -I../src -I../src/router_core -Isrc -fsanitize=thread
-O2 -g -DNDEBUG -fPIC   -g -fno-omit-frame-pointer -Werror -Wall -Wpedantic
-std=gnu99 -pthread -Wno-gnu-statement-expression -MD -MT
src/CMakeFiles/qpid-dispatch.dir/bitmask.c.o -MF
src/CMakeFiles/qpid-dispatch.dir/bitmask.c.o.d -o
src/CMakeFiles/qpid-dispatch.dir/bitmask.c.o   -c
/home/buildadm/qpid-dispatch/src/bitmask.c
FAILED: src/CMakeFiles/qpid-dispatch.dir/bitmask.c.o 
/usr/bin/clang-9  -fsanitize=thread -Dqpid_dispatch_EXPORTS -I../include
-Iinclude -I/home/buildadm/qpid-proton/build/install/include
-I/usr/include/python2.7 -I../src -I../src/router_core -Isrc -fsanitize=thread
-O2 -g -DNDEBUG -fPIC   -g -fno-omit-frame-pointer -Werror -Wall -Wpedantic
-std=gnu99 -pthread -Wno-gnu-statement-expression -MD -MT
src/CMakeFiles/qpid-dispatch.dir/bitmask.c.o -MF
src/CMakeFiles/qpid-dispatch.dir/bitmask.c.o.d -o
src/CMakeFiles/qpid-dispatch.dir/bitmask.c.o   -c
/home/buildadm/qpid-dispatch/src/bitmask.c
Stack dump:
0.  Program arguments: /usr/lib/llvm-9/bin/clang -cc1 -triple
x86_64-pc-linux-gnu -emit-obj -disable-free -disable-llvm-verifier
-discard-value-names -main-file-name bitmask.c -mrelocation-model pic
-pic-level 2 -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose
-mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64
-dwarf-column-info -debug-info-kind=limited -dwarf-version=4
-debugger-tuning=gdb -momit-leaf-frame-pointer -coverage-notes-file
/home/buildadm/qpid-dispatch/build/src/CMakeFiles/qpid-dispatch.dir/bitmask.c.gcno
-resource-dir /usr/lib/llvm-9/lib/clang/9.0.0 -dependency-file
src/CMakeFiles/qpid-dispatch.dir/bitmask.c.o.d -sys-header-deps -MT
src/CMakeFiles/qpid-dispatch.dir/bitmask.c.o -D qpid_dispatch_EXPORTS -I
../include -I include -I /home/buildadm/qpid-proton/build/install/include -I
/usr/include/python2.7 -I ../src -I ../src/router_core -I src -D NDEBUG
-internal-isystem /usr/local/include -internal-isystem
/usr/lib/llvm-9/lib/clang/9.0.0/include -internal-externc-isystem
/usr/include/x86_64-linux-gnu -internal-externc-isystem /include
-internal-externc-isystem /usr/include -O2 -Werror -Wall -Wpedantic
-Wno-gnu-statement-expression -std=gnu99 -fdebug-compilation-dir
/home/buildadm/qpid-dispatch/build -ferror-limit 19 -fmessage-length 0
-fsanitize=thread -pthread -fobjc-runtime=gcc -fdiagnostics-show-option
-vectorize-loops -vectorize-slp -o src/CMakeFiles/qpid-dispatch.dir/bitmask.c.o
-x c /home/buildadm/qpid-dispatch/src/bitmask.c -faddrsig 
1.  clang: error: unable to execute command: Segmentation fault
clang: error: clang frontend command failed due to signal (use -v to see
invocation)
clang version 9.0.0-svn353471-1~exp1+0~20190207214556.1054~1.gbp77e1bc (trunk)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
clang: note: diagnostic msg: PLEASE submit a bug report to
https://bugs.llvm.org/ and include the crash backtrace, preprocessed source,
and associated run script.
clang: note: diagnostic msg: 


PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /tmp/bitmask-079a44.c
clang: note: diagnostic msg: /tmp/bitmask-079a44.sh
clang: note: diagnostic msg: 


2019/02/10 16:21:20 Clang crash detected
[3/90] /usr/bin/clang-9  -fsanitize=thread -Dqpid_dispatch_EXPORTS -I../include
-Iinclude -I/home/buildadm/qpid-proton/build/install/include
-I/usr/include/python2.7 -I../src -I../src/router_core -Isrc -fsanitize=thread
-O2 -g -DNDEBUG -fPIC   -g -fno-omit-frame-pointer -Werror -Wall -Wpedantic
-std=gnu99 -pthread -Wno-gnu-statement-expression -MD -MT
src/CMakeFiles/qpid-dispatch

[llvm-bugs] [Bug 40681] New: [X86] LLVM 7.0.x optimises out variable init at -O1

2019-02-10 Thread via llvm-bugs
https://bugs.llvm.org/show_bug.cgi?id=40681

Bug ID: 40681
   Summary: [X86] LLVM 7.0.x optimises out variable init at -O1
   Product: libraries
   Version: 7.0
  Hardware: PC
OS: All
Status: NEW
  Severity: enhancement
  Priority: P
 Component: Backend: X86
  Assignee: unassignedb...@nondot.org
  Reporter: vit9...@avp.su
CC: craig.top...@gmail.com, llvm-bugs@lists.llvm.org,
llvm-...@redking.me.uk, spatel+l...@rotateright.com

Created attachment 21463
  --> https://bugs.llvm.org/attachment.cgi?id=21463&action=edit
Test C file

LLVM 7.0 generates invalid code: it optimises out variable zeroing for 32-bit
X86 at -O1 or higher when sanitizers are enabled. I was able to reproduce the
issue with AddressSanitizer or UndefinedBehaviorSanitizer enabled, yet I believe
they are just the trigger point. The IR looks fine, so most likely the issue
lies in LLVM itself.

The bug is not reproducible on LLVM 8.0 or trunk. If the LLVM 7.1 release is
abandoned, this bug should be closed; otherwise I believe it should be a release
blocker.

A test example is provided in the attachment: both the C file and the generated
.S file.

clang -S -c -target i386-gnu-linux -march=pentium2 -pipe -nostdinc
-fno-asynchronous-unwind-tables -O1 -fno-builtin -I. -fno-omit-frame-pointer
-m32 -fno-stack-protector -fsanitize=address -c d.c -o d.S

Relevant comments for generated asm:

pushl %esi
...
# implicit-def: $esi ; allocates r temporary in %esi, which is filled with random data
...
movl %esi, -16(%ebp) 
...
calll func1
testl %eax, %eax
movl -16(%ebp), %ecx ; writes random data to %ecx
cmovsl %eax, %ecx ; if (%eax < 0) %ecx = %eax
movl %ecx, -16(%ebp) ; %ecx is returned back to stack
...
jns .LBB0_11 ; if (%eax >= 0) goto 11
jmp .LBB0_19
...
.LBB0_19:
...
movl -16(%ebp), %eax ; function returns random data when func1 returns >= 0
...
ret



[llvm-bugs] [Bug 40682] New: clang++ miscompilation while generating copy constructors of classes containing 0-length array

2019-02-10 Thread via llvm-bugs
https://bugs.llvm.org/show_bug.cgi?id=40682

Bug ID: 40682
   Summary: clang++ miscompilation while generating copy
constructors of classes containing 0-length array
   Product: clang
   Version: trunk
  Hardware: PC
OS: All
Status: NEW
  Severity: normal
  Priority: P
 Component: C++
  Assignee: unassignedclangb...@nondot.org
  Reporter: joran.biga...@gmail.com
CC: blitzrak...@gmail.com, dgre...@apple.com,
erik.pilking...@gmail.com, llvm-bugs@lists.llvm.org,
richard-l...@metafoo.co.uk

The auto-generated copy constructor of a struct containing a 0-length array
followed by exactly 1 trivially copyable field will not copy the latter.
The same holds for copy assignment.

Here is a simple example:

===

#include <stdio.h>

struct NonTrivial {
    int n;

    NonTrivial& operator=(NonTrivial o) {
        this->n = o.n;
        return *this;
    }
};

struct S {
    NonTrivial _a;  // to force clang to generate a copy assignment operator
    int ok;  // not mandatory, only to show other trivial fields are still copied
    int _b[0];
    int bugged;  // not copied by the auto-generated copy assignment operator,
                 // unless directly followed by another non-trivial field
};

int main() {
    S foo;
    foo.ok = 11;
    foo.bugged = 22;

    S bar;
    bar.ok = 9876;
    bar.bugged = 4321;

    bar = foo;

    printf("%d %d ; %d %d\n", foo.ok, foo.bugged, bar.ok, bar.bugged);
    // expected: 11 22 ; 11 22
    //   output: 11 22 ; 11 4321

    return 0;
}

===
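
Until the implicit copy assignment is fixed, one possible workaround is to write
the operator by hand so that every field is copied explicitly. The sketch below
is only an illustration, not something proposed in the report: the member names
are shortened, check_assignment_copies_all_fields is a hypothetical helper, and
the zero-length array relies on the GNU extension that the original example also
uses.

```cpp
#include <cassert>

struct NonTrivial {
    int n;

    NonTrivial& operator=(NonTrivial o) {
        n = o.n;
        return *this;
    }
};

struct S {
    NonTrivial a;
    int ok;
    int b[0];    // zero-length array: a GNU extension, not standard C++
    int bugged;

    // Hand-written replacement for the implicit copy assignment operator.
    // Every data member is copied explicitly, so the result does not depend
    // on how the compiler groups trivially copyable fields.
    S& operator=(const S& o) {
        a = o.a;
        ok = o.ok;
        bugged = o.bugged;
        return *this;
    }
};

// Hypothetical self-check mirroring main() from the report.
void check_assignment_copies_all_fields() {
    S foo;
    foo.a.n = 1;
    foo.ok = 11;
    foo.bugged = 22;

    S bar;
    bar.a.n = 2;
    bar.ok = 9876;
    bar.bugged = 4321;

    bar = foo;
    assert(bar.ok == 11);
    assert(bar.bugged == 22);
}
```

The same approach would apply to the copy constructor, at the cost of having to
keep the hand-written member list in sync with the struct.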



[llvm-bugs] Issue 11097 in oss-fuzz: llvm/llvm-isel-fuzzer--x86_64-O2: Timeout in llvm_llvm-isel-fuzzer--x86_64-O2

2019-02-10 Thread ClusterFuzz-External via monorail via llvm-bugs

Updates:
Labels: -Reproducible Unreproducible

Comment #6 on issue 11097 by ClusterFuzz-External:  
llvm/llvm-isel-fuzzer--x86_64-O2: Timeout in  
llvm_llvm-isel-fuzzer--x86_64-O2

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11097#c6

ClusterFuzz testcase 5642269969874944 appears to be flaky, updating  
reproducibility label.




[llvm-bugs] [Bug 40681] [X86] LLVM 7.0.x optimises out variable init at -O1

2019-02-10 Thread via llvm-bugs
https://bugs.llvm.org/show_bug.cgi?id=40681

vit9696 changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID

--- Comment #3 from vit9696 ---
Thanks for that catch and sorry. After a subsequent review, I discovered that
the issue was lost during the minimisation.

However, a second attempt to minimise it actually showed that the issue only
exists in our local copy, and not in upstream. Closing this as invalid.



[llvm-bugs] Issue 13044 in oss-fuzz: llvm/clang-fuzzer: Stack-overflow in clang::Parser::ParseConstantExpressionInExprEvalContext

2019-02-10 Thread ClusterFuzz-External via monorail via llvm-bugs

Status: New
Owner: 
CC: k...@google.com, masc...@google.com, jdevlieg...@apple.com,  
igm...@gmail.com, mit...@google.com, eney...@google.com,  
llvm-b...@lists.llvm.org, j...@chromium.org, v...@apple.com,  
mitchphi...@outlook.com, xpl...@gmail.com, akils...@apple.com
Labels: ClusterFuzz Stability-Memory-AddressSanitizer Reproducible  
Engine-libfuzzer Proj-llvm Reported-2019-02-11

Type: Bug

New issue 13044 by ClusterFuzz-External: llvm/clang-fuzzer: Stack-overflow  
in clang::Parser::ParseConstantExpressionInExprEvalContext

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13044

Detailed report: https://oss-fuzz.com/testcase?key=5687174530334720

Project: llvm
Fuzzer: libFuzzer_llvm_clang-fuzzer
Fuzz target binary: clang-fuzzer
Job Type: libfuzzer_asan_llvm
Platform Id: linux

Crash Type: Stack-overflow
Crash Address: 0x7ffc296c8f80
Crash State:
  clang::Parser::ParseConstantExpressionInExprEvalContext
  clang::Parser::ParseTemplateArgument
  clang::Parser::ParseTemplateArgumentList

Sanitizer: address (ASAN)

Reproducer Testcase:  
https://oss-fuzz.com/download?testcase_id=5687174530334720


Issue filed automatically.

See https://github.com/google/oss-fuzz/blob/master/docs/reproducing.md for  
instructions to reproduce this bug locally.


When you fix this bug, please
  * mention the fix revision(s).
  * state whether the bug was a short-lived regression or an old bug in any  
stable releases.

  * add any other useful information.
This information can help downstream consumers.

If you need to contact the OSS-Fuzz team with a question, concern, or any  
other feedback, please file an issue at  
https://github.com/google/oss-fuzz/issues.




[llvm-bugs] [Bug 40683] New: [OptimizePHIs + Expensive Checks] Bad machine code: Virtual register killed in block, but needed live out.

2019-02-10 Thread via llvm-bugs
https://bugs.llvm.org/show_bug.cgi?id=40683

Bug ID: 40683
   Summary: [OptimizePHIs + Expensive Checks]  Bad machine code:
Virtual register killed in block, but needed live out.
   Product: libraries
   Version: trunk
  Hardware: PC
OS: Linux
Status: NEW
  Severity: enhancement
  Priority: P
 Component: Common Code Generator Code
  Assignee: unassignedb...@nondot.org
  Reporter: pauls...@linux.vnet.ibm.com
CC: llvm-bugs@lists.llvm.org

Created attachment 21465
  --> https://bugs.llvm.org/attachment.cgi?id=21465&action=edit
reduced testcase

bin/llc -mcpu=z10 -O3 tc_vregkill_liveout.ll -o -

*** Bad machine code: Virtual register killed in block, but needed live out.
***
- function:main
- basic block: %bb.2  (0x2aa66bceb88)
Virtual register %1 is used after the block.
LLVM ERROR: Found 1 machine code errors.

It seems that "Optimize machine instruction PHIs" is transforming this block,
which is inside a loop:

bb.2 (%ir-block.3):
; predecessors: %bb.1, %bb.3
  successors: %bb.4(0x4000), %bb.5(0x4000); %bb.4(50.00%),
%bb.5(50.00%)

  %3:gr32bit = PHI %1:gr32bit, %bb.1, %0:gr32bit, %bb.3
  %4:gr32bit = IMPLICIT_DEF
  CR killed %3:gr32bit, %4:gr32bit, implicit-def $cc
  %5:gr32bit = LHI 0
  %6:gr32bit = LHI -10
  BRC 14, 4, %bb.4, implicit $cc

to

bb.2 (%ir-block.3):
; predecessors: %bb.1, %bb.3
  successors: %bb.4(0x4000), %bb.5(0x4000); %bb.4(50.00%),
%bb.5(50.00%)

  %4:gr32bit = IMPLICIT_DEF
  CR killed %1:gr32bit, %4:gr32bit, implicit-def $cc
  %5:gr32bit = LHI 0
  %6:gr32bit = LHI -10
  BRC 14, 4, %bb.4, implicit $cc

Both %1 and %0 are defined outside the loop, and %0 is a copy of %1. %1 should
not have the kill flag since it is used in the next iteration as well.



[llvm-bugs] [Bug 40684] New: Merge r353656 to the 8.0 branch

2019-02-10 Thread via llvm-bugs
https://bugs.llvm.org/show_bug.cgi?id=40684

Bug ID: 40684
   Summary: Merge r353656 to the 8.0 branch
   Product: clang
   Version: 8.0
  Hardware: Macintosh
OS: OpenBSD
Status: NEW
  Severity: normal
  Priority: P
 Component: LLVM Codegen
  Assignee: unassignedclangb...@nondot.org
  Reporter: b...@comstyle.com
CC: llvm-bugs@lists.llvm.org, neeil...@live.com,
richard-l...@metafoo.co.uk
Blocks: 40331

Merge r353656 back to the 8.0 branch. OpenBSD/NetBSD/PPC diff for proper long
double setting.


Referenced Bugs:

https://bugs.llvm.org/show_bug.cgi?id=40331
[Bug 40331] [meta] 8.0.0 Release Blockers


[llvm-bugs] [Bug 40553] wrong sizeof(long double) in 32-bit PowerPC NetBSD, OpenBSD

2019-02-10 Thread via llvm-bugs
https://bugs.llvm.org/show_bug.cgi?id=40553

Brad Smith changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
                 CC|                            |b...@comstyle.com

--- Comment #1 from Brad Smith ---
Committed r353656.



[llvm-bugs] [Bug 40685] New: [5, 6, 7, 8, 9 regression] auto-vectorization unpacks, repacks, and unpacks to 32-bit again for count += (bool_arr[i]==0) for boolean array, using 3x the shuffles needed

2019-02-10 Thread via llvm-bugs
https://bugs.llvm.org/show_bug.cgi?id=40685

Bug ID: 40685
   Summary: [5,6,7,8,9 regression] auto-vectorization unpacks,
repacks, and unpacks to 32-bit again for count +=
(bool_arr[i]==0) for boolean array, using 3x the
shuffles needed
   Product: new-bugs
   Version: trunk
  Hardware: PC
OS: Linux
Status: NEW
  Keywords: performance, regression
  Severity: enhancement
  Priority: P
 Component: new bugs
  Assignee: unassignedb...@nondot.org
  Reporter: pe...@cordes.ca
CC: htmldevelo...@gmail.com, llvm-bugs@lists.llvm.org

int count(const bool *visited, int len) {
    int counter = 0;

    for(int i=0;i<100;i++) {  // len unused or not doesn't matter
        if (visited[i]==0)
            counter++;
    }
    return counter;
}

(adapted from:
https://stackoverflow.com/questions/54618685/what-is-the-meaning-use-of-the-movzx-cdqe-instructions-in-this-code-output-by-a)

I expected compilers not to notice that byte elements wouldn't overflow (and
make code that unpacks to dword inside the loop), and probably to fail to use
psadbw to hsum bytes inside a loop.  (ICC does that, gcc and MSVC just go
scalar.)

But I didn't expect clang to pack back down to bytes with pshufb after PXOR,
before redoing the expansion to dword with another PMOVZX.  (This is a
regression from clang4.0.1)

https://godbolt.org/z/1SEmTu

# clang version 9.0.0 (trunk 353629) on Godbolt
# -O3 -Wall -march=haswell -fno-unroll-loops -mno-avx
count(bool const*, int):
        pxor    xmm0, xmm0
        xor     eax, eax
        movdqa  xmm1, xmmword ptr [rip + .LCPI0_0] # xmm1 = [1,1,1,1]
        movdqa  xmm2, xmmword ptr [rip + .LCPI0_1] # xmm2 = <0,4,8,12,u,u,u,u,u,u,u,u,u,u,u,u>

.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        pmovzxbd xmm3, dword ptr [rdi + rax] # xmm3 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero
        pxor    xmm3, xmm1
        pshufb  xmm3, xmm2
        pmovzxbd xmm3, xmm3             # xmm3 = xmm3[0],zero,zero,zero,xmm3[1],zero,zero,zero,xmm3[2],zero,zero,zero,xmm3[3],zero,zero,zero
        paddd   xmm0, xmm3

        add     rax, 4
        cmp     rax, 100
        jne     .LBB0_1
   ... horizontal sum

Unrolling just repeats this pattern
-march=haswell -mno-avx is basically the same.  -march=haswell *with* AVX2 does
slightly better, only unpacking to 16-bit elements in an XMM before repacking,
otherwise it would have needed a lane-crossing byte shuffle to pack back to
bytes for vpmovzxbd ymm, xmm.

So it looks like something really wants to fill up a whole XMM before flipping
bits with PXOR, instead of just flipping packed bits in an XMM with high
garbage.  If you're going to unpack though, you might as well just flip
unpacked booleans so you can load with pmovzx.  movd + pxor would be worse,
especially on CPUs other than Intel SnB-family where an indexed addressing mode
for pmovzx saves front-end bandwidth vs. a separate load.

The pshufb + 2nd pmovzxbd can literally be removed with zero change to the
result, because xmm1 = set1_epi32(1).

pmovzxbd  xmm3, dword ptr [rdi + rax]  ; un-laminates on SnB including HSW/SKL
pxor      xmm3, xmm1
paddd     xmm0, xmm3
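
The redundancy can be checked with a scalar model of a single dword lane. This
is only a sketch: the function names are made up for illustration, and each
statement stands in for the instruction named in its comment. After zero
extension and XOR with 1, the upper 24 bits of the lane are still zero, so
packing back to a byte and zero-extending again cannot change the value.

```cpp
#include <cassert>
#include <cstdint>

// Shorter sequence: pmovzxbd, then pxor with set1_epi32(1).
std::uint32_t flip_direct(std::uint8_t b) {
    std::uint32_t lane = b;  // pmovzxbd: zero-extend the byte to a dword
    return lane ^ 1u;        // pxor with a lane value of 1
}

// Sequence emitted by clang 5+: the same, followed by pshufb and pmovzxbd.
std::uint32_t flip_with_repack(std::uint8_t b) {
    std::uint32_t lane = b;  // pmovzxbd
    lane ^= 1u;              // pxor
    assert((lane >> 8) == 0);  // upper 24 bits are already zero here
    std::uint8_t packed = static_cast<std::uint8_t>(lane);  // pshufb keeps the low byte
    return packed;           // second pmovzxbd: zero-extend again
}

// Exhaustive check over every possible input byte.
bool repack_is_redundant() {
    for (unsigned v = 0; v < 256; ++v) {
        const std::uint8_t b = static_cast<std::uint8_t>(v);
        if (flip_direct(b) != flip_with_repack(b)) return false;
    }
    return true;
}
```

The check passes for all 256 byte values, not only 0 and 1, because the XOR
constant never sets any bit above bit 0.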

Of course, avoiding an indexed addressing mode would also be a good thing
when tuning for Haswell.  Clang/LLVM still use indexed for -march=haswell,
costing an extra uop from un-lamination (pmovzx destination is write-only, so
it always un-laminates an indexed addressing mode.  vpmovzx can't micro-fuse
with a ymm destination, but it can with an xmm destination.)


We could also consider unpacking against zero with punpcklbw / hbw to feed 2x
punpcklwd / hwd, but that saves PXOR instructions and load uops at the cost of
more shuffle uops (6 instead of 4 to get 4 dword vectors).

--

This changed between Clang 4.0.1 and clang 5.0:

# clang4.0.1 inner loop

pmovzxbd xmm3, dword ptr [rdi + rax] # xmm3 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero
pxor    xmm3, xmm1 # ^= set1_epi32(1)
pand    xmm3, xmm2 # &= set1_epi32(255)
paddd   xmm0, xmm3


The clang 4.0.1 loop is less bad: the current output has 3x the shuffle-port
bottleneck on Haswell/Skylake, so this is a regression.



## Other missed optimizations

I'm reporting these separately and will link the bug number here: LLVM's failure
to efficiently sum 8-bit elements with PSADBW, and so on.

-- 
You are receiving this mail because:
You are on the CC list for the bug.___
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs


[llvm-bugs] [Bug 40686] New: Use PSADBW for horizontal uint8_t byte sums (and accumulate multiple booleans before using it), instead of widening right away

2019-02-10 Thread via llvm-bugs
https://bugs.llvm.org/show_bug.cgi?id=40686

Bug ID: 40686
   Summary: Use PSADBW for horizontal uint8_t byte sums (and
accumulate multiple booleans before using it), instead
of widening right away
   Product: new-bugs
   Version: trunk
  Hardware: PC
OS: Linux
Status: NEW
  Keywords: performance
  Severity: enhancement
  Priority: P
 Component: new bugs
  Assignee: unassignedb...@nondot.org
  Reporter: pe...@cordes.ca
CC: htmldevelo...@gmail.com, llvm-bugs@lists.llvm.org

Same code as PR40685, but this bug is about the other major optimizations that
are possible, not the silly extra shuffles:

// https://godbolt.org/z/1SEmTu  (same Godbolt link as the other PR)
int count(const bool *visited, int len) {
    int counter = 0;

    for(int i=0;i<100;i++) {  // len unused or not doesn't matter
        if (visited[i]==0)
            counter++;
    }
    return counter;
}
(adapted from:
https://stackoverflow.com/questions/54618685/what-is-the-meaning-use-of-the-movzx-cdqe-instructions-in-this-code-output-by-a)

At best, once Bug 40685 is fixed, clang / LLVM is probably doing something like 

pmovzxbd xmm3, dword ptr [rdi + rax]
pxor    xmm3, xmm1    # flip the bits
paddd   xmm0, xmm3

This costs us a shuffle and 2 other ALU instructions per 4 bools, regardless of
unrolling.

## Other missed optimizations

For large arrays, we only need ~1.25 vector ALU instruction (and a pure load)
plus loop overhead per 16 / 32 / 64 bools (1 per vector with an extra ALU
amortized over a minor unroll by 4).  See below.

Instead of flipping inside the loop, we can do

   for()
  notcount += arr[i];
   return 100-notcount;

We can prove that this can't overflow notcount, because len is small enough and
bool->int can only be 0 or 1.
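
Written out in source form, the transformation looks roughly like this. It is a
sketch only: both function names are invented for the comparison, and the size
is passed as a parameter rather than hard-coded to 100.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Straightforward form: test and increment inside the loop.
int count_zeros_flip_inside(const bool* visited, std::size_t n) {
    int counter = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (visited[i] == 0) counter++;
    }
    return counter;
}

// Inversion sunk out of the loop: sum the true elements, subtract once.
int count_zeros_flip_outside(const bool* visited, std::size_t n) {
    // n must fit in int so that neither the sum nor the result can overflow.
    assert(n <= static_cast<std::size_t>(INT_MAX));
    int notcount = 0;
    for (std::size_t i = 0; i < n; ++i) {
        notcount += visited[i];  // bool converts to 0 or 1
    }
    return static_cast<int>(n) - notcount;
}
```

Both functions return the same value for any input; the second leaves a plain
sum in the loop body, which is the shape the byte-accumulator asm below builds on.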

Byte counters won't overflow for a compile-time-constant array size of 100, so
we can simply sum outside the loop.

count:   # for hard-coded size=100, fully unrolled
vmovdqu  ymm0, [rdi]
vpaddb   ymm0, [rdi+32] # asm syntax leaving out first src = dst operand
vpaddb   ymm0, [rdi+64]
; 3 * 32 = 96 elements fully unrolled

movd     xmm1, [rdi+96]
vpaddb   ymm0, ymm1 # add the last 4 elements

# then hsum the 32x byte accumulators
vpxor        xmm1, xmm1
vpsadbw      ymm0, ymm1        # hsum unsigned bytes into 64-bit elements
vextracti128 xmm1, ymm0, 1
vpaddd       xmm0, xmm1
vpunpckhqdq  xmm1, xmm0,xmm0   # saves an imm8 of code size vs. vpshufd
vpaddd       xmm0, xmm1

# and do 100 - that
vmovd    edx, xmm0 # original source used int, so it's only a 32-bit result
mov      eax, 100
sub      eax, edx
ret

Probably actually best to vextracti128 + paddb first, then psadbw xmm (if we
care about Excavator / Ryzen), but that would make overflow of byte elements
possible for sizes half as large.  If we're using that hsum of bytes as a
canned sequence,  probably easiest to have one that works for all sizes up to
255 vectors.  It's a minor difference, like 1 extra ALU uop.

32 * 255 = 8160 fits in 16 bits, so it really doesn't matter what SIMD element
size we use on the result of psadbw.  PADDQ is slower than PADDD/W/B on
Atom/Silvermont including Goldmont Plus (non-AVX CPUs only), so we might as
well avoid it so the same auto-vectorization pattern is good with just SSE2 /
SSE4 on those CPUs.

Signed int overflow is undefined behaviour, but this is counting something
different than the source; the C abstract machine never has a sum of the true
elements in an `int`.  So we'd better only do it in a way that can't overflow.

## For unknown array sizes, we can use psadbw inside the inner loop.  e.g.

(Without AVX, we'd either need an alignment guarantee or separate movdqu loads,
so for large len reaching an alignment boundary could become valuable to avoid
front-end bottlenecks.)

vpxor    ymm1, ymm1
.loop:
vmovdqu  ymm0, [rdi]
vpaddb   ymm0, [rdi+32] # leaving out first src = dst
vpaddb   ymm0, [rdi+64]
sub      rdi, -128
vpaddb   ymm0, [rdi + 96 - 128]  # still using a disp8 addressing mode

vpsadbw  ymm0, ymm1 # hsum bytes to qword elements
vpaddq   ymm2, ymm0 # accumulate into a qword vector

cmp      rdi, rsi
jb  .loop

return len - hsum(ymm2)
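
The same loop structure can be sketched with SSE2 intrinsics. This uses 128-bit
vectors instead of the ymm registers above, takes the input as bytes on the
assumption that sizeof(bool) == 1 and every element is 0 or 1, and falls back
to a scalar loop when SSE2 is not available. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Counts the zero entries of a 0/1 byte array as len - (sum of all bytes).
std::size_t count_zero_bytes(const std::uint8_t* p, std::size_t len) {
    std::uint64_t ones = 0;
    std::size_t i = 0;

#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    __m128i qsum = zero;  // two 64-bit accumulators
    for (; i + 64 <= len; i += 64) {
        const __m128i* v = reinterpret_cast<const __m128i*>(p + i);
        // Byte-wise adds of four vectors: each byte lane ends up at most 4.
        __m128i bytes = _mm_loadu_si128(v);
        bytes = _mm_add_epi8(bytes, _mm_loadu_si128(v + 1));
        bytes = _mm_add_epi8(bytes, _mm_loadu_si128(v + 2));
        bytes = _mm_add_epi8(bytes, _mm_loadu_si128(v + 3));
        // psadbw against zero sums each group of 8 bytes into a 64-bit lane.
        qsum = _mm_add_epi64(qsum, _mm_sad_epu8(bytes, zero));
    }
    // Horizontal sum of the two qword lanes.
    std::uint64_t lanes[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), qsum);
    ones += lanes[0] + lanes[1];
#endif

    // Scalar tail (and the whole array when SSE2 is unavailable).
    for (; i < len; ++i) {
        assert(p[i] <= 1);
        ones += p[i];
    }
    return len - static_cast<std::size_t>(ones);
}
```

The sketch keeps the unroll-by-4 shape of the asm: three byte adds plus one
psadbw and one qword add per four vectors loaded.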

Or if we can't / don't want to sink the boolean inversion out of the loop, we
can start with a vector of set1_epi8( 4 ) and do 

vpsubb  ymm0, ymm3, [rdi]  ; 4 - v0.  ymm3 = set1(4)
vpsubb  ymm0, ymm0, [rdi+32]   ; . - v1
...

Or other even-less optimal ways of flipping inside the loop before summing.
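
A scalar model of one byte lane shows why starting from set1_epi8( 4 ) works:
after subtracting four 0/1 inputs, the lane holds the number of zeros among
them. This is an illustrative sketch with a hypothetical function name, not
code from any compiler.

```cpp
#include <cassert>
#include <cstdint>

// One byte lane of the vpsubb chain: 4 - v0 - v1 - v2 - v3.
std::uint8_t zeros_among_four(std::uint8_t v0, std::uint8_t v1,
                              std::uint8_t v2, std::uint8_t v3) {
    assert(v0 <= 1 && v1 <= 1 && v2 <= 1 && v3 <= 1);
    std::uint8_t lane = 4;                        // one lane of set1_epi8(4)
    lane = static_cast<std::uint8_t>(lane - v0);  // vpsubb with the first vector
    lane = static_cast<std::uint8_t>(lane - v1);  // vpsubb with the second vector
    lane = static_cast<std::uint8_t>(lane - v2);
    lane = static_cast<std::uint8_t>(lane - v3);
    return lane;  // number of inputs that were 0
}
```

Since the lane never goes below 0 or above 4, the byte arithmetic cannot wrap.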

In the general case of counting compare results, vpcmpeqb / vpsubb is good.

---

For hot loops we can put off psadbw for up to 255 iterations (254 vpaddb)
without