[Bug target/88798] New: AVX512BW code does not use bit-operations that work on mask registers

2019-01-10 Thread wojciech_mula at poczta dot onet.pl
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wojciech_mula at poczta dot onet.pl Target Milestone: --- Hi! AVX512BW-related issue: the C compiler generates superfluous moves from 64-bit mask registers to 64-bit GPRs and

[Bug target/88798] AVX512BW code does not use bit-operations that work on mask registers

2019-01-11 Thread wojciech_mula at poczta dot onet.pl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88798 --- Comment #3 from Wojciech Mula --- Sorry, I didn't find that bug; I think you may close this one. BTW, I had checked the code on godbolt.org before submitting. I also tested with their "GCC (trunk)", but the generated code is the same as for

[Bug tree-optimization/88868] New: [SSE] pshufb can be omitted for a specific pattern

2019-01-15 Thread wojciech_mula at poczta dot onet.pl
Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wojciech_mula at poczta dot onet.pl Target Milestone: --- SSSE3 instruction PSHUFB (and the AVX2 counterpart VPSHUFB) acts as a no-operation when its argument is a sequence 0..15. Such an invocation does not
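A scalar model makes the no-op claim easy to check. The sketch below is an illustration, not code from the report; the helper names `pshufb_model` and `pshufb_is_identity` are hypothetical. It follows the documented PSHUFB rule for one 16-byte lane (VPSHUFB applies the same rule independently to each 128-bit lane): an index byte with bit 7 set produces zero, otherwise its low four bits select a source byte.

```c
#include <stdint.h>

/* Scalar model of PSHUFB on one 16-byte lane (hypothetical helper):
   out[i] is 0 when bit 7 of idx[i] is set, else src[idx[i] & 0x0F]. */
static void pshufb_model(const uint8_t src[16], const uint8_t idx[16],
                         uint8_t out[16]) {
    for (int i = 0; i < 16; i++) {
        out[i] = (idx[i] & 0x80) ? 0 : src[idx[i] & 0x0F];
    }
}

/* Returns 1 when idx is exactly the sequence 0, 1, ..., 15, the pattern
   from the report for which the shuffle copies src unchanged. */
static int pshufb_is_identity(const uint8_t idx[16]) {
    for (int i = 0; i < 16; i++) {
        if (idx[i] != (uint8_t)i) return 0;
    }
    return 1;
}
```

With `idx[i] == i`, every output byte is `src[i]`, so a compiler that recognizes the constant index vector could drop the instruction entirely.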

[Bug target/88916] New: [x86] suboptimal code generated for integer comparisons joined with boolean operators

2019-01-18 Thread wojciech_mula at poczta dot onet.pl
Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wojciech_mula at poczta dot onet.pl Target Milestone: --- Let's consider these two simple, yet pretty useful functions: --test.c--- int both_nonnegative(l
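The excerpt is cut off after `both_nonnegative(l`, so the following is an assumed reconstruction with `long` parameters, not the report's exact test case. It shows why two comparisons joined with `&&` can collapse into one: a value is negative exactly when its sign bit is set, and the sign bit of `a | b` is set exactly when at least one operand is negative. The name `both_nonnegative_or` is hypothetical.

```c
/* Assumed reconstruction: two comparisons joined with &&. */
int both_nonnegative(long a, long b) {
    return a >= 0 && b >= 0;
}

/* Hypothetical branch-free equivalent: OR the operands and test the
   sign bit of the result once. */
int both_nonnegative_or(long a, long b) {
    return (a | b) >= 0;
}
```

The second form needs a single OR and one sign test, with no branch and no second comparison.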

[Bug tree-optimization/88916] [x86] suboptimal code generated for integer comparisons joined with boolean operators

2019-01-21 Thread wojciech_mula at poczta dot onet.pl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88916 --- Comment #2 from Wojciech Mula --- (In reply to Richard Biener from comment #1) > Confirmed. The first case is OK, but the second (for `both_nonzero`) is obviously wrong. Sorry for that.

[Bug tree-optimization/88916] [x86] suboptimal code generated for integer comparisons joined with boolean operators

2019-01-22 Thread wojciech_mula at poczta dot onet.pl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88916 --- Comment #3 from Wojciech Mula --- A similar case: ---sign.c--- int different_sign(long a, long b) { return (a >= 0 && b < 0) || (a < 0 && b >= 0); } ---eof-- This is compiled into: different_sign: notq %rdi movq %
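The condition in `different_sign` is true exactly when the sign bits of `a` and `b` differ, which is what the sign bit of `a ^ b` encodes. A minimal sketch of the equivalence (the name `different_sign_xor` is hypothetical):

```c
/* Function from the comment: true when exactly one of a, b is negative. */
int different_sign(long a, long b) {
    return (a >= 0 && b < 0) || (a < 0 && b >= 0);
}

/* Hypothetical equivalent: the XOR has its sign bit set iff the
   operands' sign bits differ, so one comparison suffices. */
int different_sign_xor(long a, long b) {
    return (a ^ b) < 0;
}
```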

[Bug tree-optimization/89018] New: common subexpression present in both branches of condition is not factored out

2019-01-23 Thread wojciech_mula at poczta dot onet.pl
Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wojciech_mula at poczta dot onet.pl Target Milestone: --- A common transformation used in a C condition expression is not detected and code is duplicated. Below

[Bug target/89063] New: [x86] lack of support for BEXTR from BMI extension

2019-01-25 Thread wojciech_mula at poczta dot onet.pl
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wojciech_mula at poczta dot onet.pl Target Milestone: --- Instruction BEXTR extracts an arbitrary unsigned bit field from a 32- or 64-bit value. As far as I can see in `config/i386.md`, there's support for the immediate va
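For reference, here is a portable C model of what BEXTR computes on 64-bit operands when the start and length are run-time values rather than immediates. It is a sketch of the semantics (hardware reads the start from control bits 7:0 and the length from bits 15:8), not GCC code, and `bextr_model` is a hypothetical name. The edge cases matter for pattern matching: a start at or beyond the operand width yields 0, and an oversized length keeps all remaining bits.

```c
#include <stdint.h>

/* Hypothetical model of 64-bit BEXTR: zero-extended field of `len` bits
   of `x` starting at bit `start`. */
static uint64_t bextr_model(uint64_t x, unsigned start, unsigned len) {
    if (start >= 64) return 0;   /* field starts beyond the operand */
    x >>= start;
    if (len >= 64) return x;     /* field covers all remaining bits */
    return x & ((UINT64_C(1) << len) - 1);
}
```

The explicit range checks avoid shifts by 64 or more, which are undefined in C; a compiler targeting BMI could in principle replace the whole function body with one BEXTR plus the code that packs `start | (len << 8)` into the control register.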

[Bug target/89081] New: [x86] suboptimal code generated for condition expression returning negation

2019-01-27 Thread wojciech_mula at poczta dot onet.pl
Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wojciech_mula at poczta dot onet.pl Target Milestone: --- Let's consider this trivial function: ---clamp.c--- #include uint64_t clamp1(int64_t x) { return (x <

[Bug target/85832] New: [AVX512] possible shorter code when comparing with vector of zeros

2018-05-18 Thread wojciech_mula at poczta dot onet.pl
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wojciech_mula at poczta dot onet.pl Target Milestone: --- Consider this simple function, which yields a mask for non-zero elements: ---cat cmp.c--- #include int fun(__m512i x
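The excerpt stops before the proposed code, so this is an assumption about the "shorter code": comparing `x` with a zero vector (VPCMPD with a not-equal predicate, e.g. `_mm512_cmpneq_epi32_mask`) needs the zero vector materialized, whereas VPTESTMD with both operands equal to `x` (`_mm512_test_epi32_mask(x, x)`) gives the same mask directly. The scalar model below, with hypothetical helper names, checks that the two masks agree on 16 lanes of 32-bit elements.

```c
#include <stdint.h>

/* Model of "compare each lane with zero, not-equal predicate". */
static uint16_t mask_cmpneq_zero(const int32_t v[16]) {
    uint16_t m = 0;
    for (int i = 0; i < 16; i++)
        if (v[i] != 0) m |= (uint16_t)(1u << i);
    return m;
}

/* Model of VPTESTMD: bit i set when (a[i] & b[i]) != 0.
   Calling it with a == b needs no zero vector at all. */
static uint16_t mask_test(const int32_t a[16], const int32_t b[16]) {
    uint16_t m = 0;
    for (int i = 0; i < 16; i++)
        if ((a[i] & b[i]) != 0) m |= (uint16_t)(1u << i);
    return m;
}
```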

[Bug target/85833] New: [AVX512] use mask registers instructions instead of scalar code

2018-05-18 Thread wojciech_mula at poczta dot onet.pl
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wojciech_mula at poczta dot onet.pl Target Milestone: --- There is a simple function, which checks if there is any non-zero element in a vector: ---ktest.c--- #include int

[Bug target/85833] [AVX512] use mask registers instructions instead of scalar code

2018-05-22 Thread wojciech_mula at poczta dot onet.pl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85833 --- Comment #3 from Wojciech Mula --- Uroš, thank you very much. I didn't pay attention to the AVX512 variant, as I thought this was such a basic instruction that it should be available from AVX512F.

[Bug target/85073] New: [x86] extra check after BLSR

2018-03-25 Thread wojciech_mula at poczta dot onet.pl
Assignee: unassigned at gcc dot gnu.org Reporter: wojciech_mula at poczta dot onet.pl Target Milestone: --- GCC is able to use the BLSR instruction in place of expression (x - 1) & x [which is REALLY nice, thank you :)], but does not utilize CPU flags set by the instruction. Below
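The report's own example is cut off at "Below", so the loop here is only an illustration of the kind of code affected, with a hypothetical name. BLSR computes `(x - 1) & x` and sets ZF when the result is zero, so a loop that clears the lowest set bit and then tests the result for zero could, in principle, branch on BLSR's flags instead of issuing a separate TEST.

```c
#include <stdint.h>

/* Illustrative popcount: each iteration clears the lowest set bit with
   x & (x - 1), the expression GCC can turn into BLSR. */
static unsigned popcount_blsr(uint64_t x) {
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}
```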

[Bug target/88798] AVX512BW code does not use bit-operations that work on mask registers

2022-01-31 Thread wojciech_mula at poczta dot onet.pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88798 --- Comment #6 from Wojciech Mula --- Hongtao, thank you for your patch and for pinging back! I checked the code from this issue against version 11.2.0 (Debian 11.2.0-14), but still, there are KMOVQs before performing any bit ops. Here is the out

[Bug target/88798] AVX512BW code does not use bit-operations that work on mask registers

2022-02-07 Thread wojciech_mula at poczta dot onet.pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88798 --- Comment #8 from Wojciech Mula --- Thank you for the answer. Thus my question is: is it possible to delay conversion from kmasks into ints? I'm not a language lawyer, but I guess an `x binop y` has to be treated as `(int)x binop (int)y`. If it'

[Bug target/114172] [13 only] ICE with riscv rvv VSETVL intrinsic

2024-03-28 Thread wojciech_mula at poczta dot onet.pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114172 Wojciech Mula changed: CC: added wojciech_mula at poczta dot onet.p

[Bug c++/114747] New: [RISC-V RVV] Wrong SEW set for mixed-size intrinsics

2024-04-16 Thread wojciech_mula at poczta dot onet.pl via Gcc-bugs
Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: wojciech_mula at poczta dot onet.pl Target Milestone: --- This is a distilled procedure from simdutf project: --- #include #include #include size_t convert_latin1_to_utf16le(const char *src, size_t len
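The RVV body of the distilled procedure is truncated in this excerpt. As a yardstick for any vectorized version, a scalar reference of the conversion is short: every Latin-1 byte is the code point U+0000..U+00FF, so each UTF-16LE unit is that byte zero-extended to 16 bits, stored low byte first. This widening from 8-bit to 16-bit elements is presumably where the mixed element widths (and hence the SEW settings) come in. The function name and byte-buffer output are hypothetical choices made so the result does not depend on host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for Latin-1 -> UTF-16LE.
   dst must hold 2 * len bytes; returns the number of 16-bit units. */
static size_t latin1_to_utf16le_ref(const uint8_t *src, size_t len, uint8_t *dst) {
    for (size_t i = 0; i < len; i++) {
        dst[2 * i]     = src[i]; /* low byte: the code point itself */
        dst[2 * i + 1] = 0;      /* high byte: always zero for Latin-1 */
    }
    return len;
}
```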

[Bug target/114809] New: [RISC-V RVV] Counting elements might be simpler

2024-04-22 Thread wojciech_mula at poczta dot onet.pl via Gcc-bugs
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wojciech_mula at poczta dot onet.pl Target Milestone: --- Consider this simple procedure --- #include #include size_t count_chars(const char *src, size_t len, char c) { size_t count = 0; for (size_t i=0; i
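The loop in the excerpt is cut off; the completion below is an assumption that it counts bytes equal to `c`. In RVV terms, one natural vector form of this loop is a byte compare that produces a mask followed by a mask population count (`vcpop.m`) added to the running total per chunk, which is presumably the simpler code the title refers to.

```c
#include <stddef.h>

/* Assumed completion of the truncated procedure: count bytes equal to c. */
size_t count_chars(const char *src, size_t len, char c) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        count += (src[i] == c);   /* comparison yields 0 or 1 */
    }
    return count;
}
```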

[Bug target/117421] New: [RISCV] Use byte comparison instead of word comparison

2024-11-02 Thread wojciech_mula at poczta dot onet.pl via Gcc-bugs
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wojciech_mula at poczta dot onet.pl Target Milestone: --- Consider this simple function: --- #include bool ext_is_gzip(std::string_view ext) { return ext == "gzip"; } --- For the x86 target,

[Bug target/109279] RISC-V: complex constants synthesized should be improved

2024-11-12 Thread wojciech_mula at poczta dot onet.pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109279 --- Comment #20 from Wojciech Mula --- This constant is worth checking (it appears in division by 10): ``` unsigned long ccd() { return 0xcccd; } ``` riscv64-unknown-linux-gnu-g++ (crosstool-NG UNKNOWN) 15.0.0 2024 (experime
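Why such a constant shows up: compilers replace unsigned division by 10 with a multiply by a "magic" reciprocal followed by a shift. The constant quoted above appears shortened in this archive excerpt, so the sketch below uses the well-known 32-bit magic 0xCCCCCCCD = ceil(2^35 / 10) rather than the comment's value; for 64-bit operands the analogous constant is 0xCCCCCCCCCCCCCCCD, used with the high half of a 128-bit product shifted right by 3. The function name is hypothetical.

```c
#include <stdint.h>

/* Unsigned 32-bit x / 10 via multiply and shift. The 64-bit product
   cannot overflow, and the result equals x / 10 for every 32-bit x. */
static uint32_t div10_u32(uint32_t x) {
    return (uint32_t)(((uint64_t)x * UINT64_C(0xCCCCCCCD)) >> 35);
}
```

On RISC-V the cost of this approach depends on how cheaply the constant can be synthesized, which is the point of the bug report.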

[Bug target/117421] [RISCV] Use byte comparison instead of word comparison

2024-11-12 Thread wojciech_mula at poczta dot onet.pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117421 --- Comment #4 from Wojciech Mula --- However, there's no word-wise set-on-equal instruction, so I think this sequence would be better. ``` lbu a0, 1(a1) lbu a2, 0(a1) lbu a3, 2(a1) lb a1, 3(a1)
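The suggested assembly is truncated after the four loads. One way to finish such a sequence without a set-if-equal instruction (an assumption, not the comment's exact code) is to XOR each byte with its expected value, OR the differences together, and test the result for zero once. In C, with a hypothetical function name:

```c
#include <stdbool.h>
#include <stddef.h>

/* Branch-free byte-wise check against "gzip": each XOR is zero iff the
   byte matches, so the OR of all four is zero iff every byte matches. */
static bool is_gzip_bytes(const unsigned char *p, size_t len) {
    if (len != 4) return false;
    unsigned diff = (unsigned)(p[0] ^ 'g') | (unsigned)(p[1] ^ 'z')
                  | (unsigned)(p[2] ^ 'i') | (unsigned)(p[3] ^ 'p');
    return diff == 0;
}
```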

[Bug target/117421] [RISCV] Use byte comparison instead of word comparison

2024-11-12 Thread wojciech_mula at poczta dot onet.pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117421 --- Comment #3 from Wojciech Mula --- It's worth noting that Clang first synthesizes a 32-bit word from individual bytes and then uses a single comparison. ``` ext_is_gzip(std::basic_string_view>): li a2, 4 bne a0, a2,
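The word-wise strategy described for Clang can be written portably in C: load the four bytes as one 32-bit value and compare once. `memcpy` keeps the possibly unaligned load well defined; under strict alignment a compiler may expand it into byte loads and shifts, which matches the "synthesizes a word from individual bytes" behaviour. Building the expected value the same way keeps the comparison independent of byte order. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Word-wise check: one 32-bit comparison instead of four byte compares. */
static bool is_gzip_word(const char *p, size_t len) {
    if (len != 4) return false;
    uint32_t got, want;
    memcpy(&got, p, 4);        /* load 4 bytes, alignment-agnostic */
    memcpy(&want, "gzip", 4);  /* expected word in the same byte order */
    return got == want;
}
```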

[Bug target/117421] [RISCV] Use byte comparison instead of word comparison

2024-11-08 Thread wojciech_mula at poczta dot onet.pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117421 --- Comment #2 from Wojciech Mula --- First of all, thanks for looking at this! > I should note that -mno-strict-align still does not do it but that is because > it might be slow still to do unaligned access. OK, maybe `-mno-strict-align` sh

[Bug target/119040] New: [PPC/Altivec] Missing bit-level optimization (select)

2025-02-27 Thread wojciech_mula at poczta dot onet.pl via Gcc-bugs
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wojciech_mula at poczta dot onet.pl Target Milestone: --- This comes from real-world usage. Suppose we have a vector of words, we want to move around some bit-fields of that words. We isolate the bit-fields with
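The report is truncated before its code, but merging bit-fields from two words under a mask is the classic bit-select idiom, which AltiVec provides per element as `vec_sel(a, b, mask)` (bits come from `b` where the mask is 1, from `a` where it is 0). A scalar sketch with hypothetical names, showing the AND/OR form and an equivalent XOR form that avoids complementing the mask:

```c
#include <stdint.h>

/* Bit select: bits of b where mask is 1, bits of a where mask is 0. */
static uint32_t bit_select(uint32_t a, uint32_t b, uint32_t mask) {
    return (a & ~mask) | (b & mask);
}

/* Equivalent form without the complement: a ^ ((a ^ b) & mask). */
static uint32_t bit_select_xor(uint32_t a, uint32_t b, uint32_t mask) {
    return a ^ ((a ^ b) & mask);
}
```

Either form is something a compiler could hypothetically collapse into a single select instruction on targets that have one.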