Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: wojciech_mula at poczta dot onet.pl
Target Milestone: ---
Hi!
AVX512BW-related issue: the C compiler generates superfluous moves from 64-bit
mask registers to 64-bit GPRs and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88798
--- Comment #3 from Wojciech Mula ---
Sorry, I didn't find that bug; I think you may close this one.
BTW, I had checked the code on godbolt.org before submitting. I tested also
with their "GCC (trunk)", but the generated code is the same as for
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: wojciech_mula at poczta dot onet.pl
Target Milestone: ---
SSSE3 instruction PSHUFB (and the AVX2 counterpart VPSHUFB) acts as a no-operation
when its argument is a sequence 0..15. Such invocation does not
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: wojciech_mula at poczta dot onet.pl
Target Milestone: ---
Let's consider these two simple, yet pretty useful functions:
--test.c---
int both_nonnegative(l
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88916
--- Comment #2 from Wojciech Mula ---
(In reply to Richard Biener from comment #1)
> Confirmed.
The first case is OK, but the second (for `both_nonzero`) is obviously wrong.
Sorry for that.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88916
--- Comment #3 from Wojciech Mula ---
A similar case:
---sign.c---
int different_sign(long a, long b) {
return (a >= 0 && b < 0) || (a < 0 && b >= 0);
}
---eof--
This is compiled into:
different_sign:
notq %rdi
movq%
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: wojciech_mula at poczta dot onet.pl
Target Milestone: ---
A common transformation used in a C condition expression is not detected and
code is duplicated. Below
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: wojciech_mula at poczta dot onet.pl
Target Milestone: ---
Instruction BEXTR extracts an arbitrary unsigned bit field from 32- or 64-bit
value. As I see in `config/i386.md`, there's support for the immediate
va
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: wojciech_mula at poczta dot onet.pl
Target Milestone: ---
Let's consider this trivial function:
---clamp.c---
#include
uint64_t clamp1(int64_t x) {
return (x <
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: wojciech_mula at poczta dot onet.pl
Target Milestone: ---
Consider this simple function, which yields a mask for non-zero elements:
---cat cmp.c---
#include
int fun(__m512i x
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: wojciech_mula at poczta dot onet.pl
Target Milestone: ---
There is a simple function, which checks if there is any non-zero element
in a vector:
---ktest.c---
#include
int
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85833
--- Comment #3 from Wojciech Mula ---
Uroš, thank you very much. I didn't pay attention to the AVX512 variant, as I
thought this is such a basic instruction that it should be available from AVX512F.
Assignee: unassigned at gcc dot gnu.org
Reporter: wojciech_mula at poczta dot onet.pl
Target Milestone: ---
GCC is able to use the BLSR instruction in place of expression (x - 1) & x
[which is REALLY nice, thank you :)], but does not utilize CPU flags set by the
instruction. Below
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88798
--- Comment #6 from Wojciech Mula ---
Hongtao, thank you for your patch and for pinging back! I checked the code from
this issue against version 11.2.0 (Debian 11.2.0-14), but still, there are
KMOVQs before performing any bit ops. Here is the out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88798
--- Comment #8 from Wojciech Mula ---
Thank you for the answer. Thus my question is: is it possible to delay
conversion from kmasks into ints? I'm not a language lawyer, but I guess a `x
binop y` has to be treated as `(int)x binop (int)y`. If it'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114172
Wojciech Mula changed:
What|Removed |Added
CC||wojciech_mula at poczta dot
onet.p
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: wojciech_mula at poczta dot onet.pl
Target Milestone: ---
This is a distilled procedure from the simdutf project:
---
#include
#include
#include
size_t convert_latin1_to_utf16le(const char *src, size_t len
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: wojciech_mula at poczta dot onet.pl
Target Milestone: ---
Consider this simple procedure
---
#include
#include
size_t count_chars(const char *src, size_t len, char c) {
size_t count = 0;
for (size_t i=0; i
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: wojciech_mula at poczta dot onet.pl
Target Milestone: ---
Consider this simple function:
---
#include
bool ext_is_gzip(std::string_view ext) {
return ext == "gzip";
}
---
For the x86 target,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109279
--- Comment #20 from Wojciech Mula ---
This constant is worth checking (it appears in division by 10):
```
unsigned long ccd() {
return 0xcccd;
}
```
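For context, here is a minimal sketch of where this constant shows up, assuming the usual multiply-and-shift lowering of an unsigned 16-bit division by 10. The helper name `div10_u16` is hypothetical and not part of the report. Since 0xcccd * 10 = 2^19 + 2, multiplying by 0xcccd and shifting right by 19 yields the quotient for every 16-bit input.

```c
#include <stdint.h>

/* Hypothetical helper: x / 10 for a 16-bit x via multiply and shift.
   0xcccd == 52429 and 52429 * 10 == 2^19 + 2, so the accumulated
   error stays below 1/10 for every x < 65536. */
static uint16_t div10_u16(uint16_t x) {
    /* The widest product, 65535 * 52429, still fits in 32 bits. */
    return (uint16_t)(((uint32_t)x * 0xcccdu) >> 19);
}
```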
riscv64-unknown-linux-gnu-g++ (crosstool-NG UNKNOWN) 15.0.0 2024
(experime
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117421
--- Comment #4 from Wojciech Mula ---
Although, there's no word-wise set for equality, thus I think this sequence
would be better.
```
lbu a0, 1(a1)
lbu a2, 0(a1)
lbu a3, 2(a1)
lb a1, 3(a1)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117421
--- Comment #3 from Wojciech Mula ---
It's worth noting that Clang first synthesizes a 32-bit word from individual
bytes, and then uses a single comparison.
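In C terms, the approach described above might look like the following sketch. It is not Clang's actual output: the helper name `ext_is_gzip_sketch`, the explicit length parameter, and the order in which the bytes are assembled into the word are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compare a 4-byte extension against "gzip"
   with one 32-bit comparison instead of four byte comparisons. */
static bool ext_is_gzip_sketch(const char *data, size_t len) {
    if (len != 4)
        return false;

    /* Build the word from single-byte loads, so no unaligned
       32-bit access is needed. */
    const unsigned char *p = (const unsigned char *)data;
    uint32_t word = (uint32_t)p[0]
                  | ((uint32_t)p[1] << 8)
                  | ((uint32_t)p[2] << 16)
                  | ((uint32_t)p[3] << 24);

    /* The constant is assembled in the same byte order as the word. */
    const uint32_t gzip = (uint32_t)'g'
                        | ((uint32_t)'z' << 8)
                        | ((uint32_t)'i' << 16)
                        | ((uint32_t)'p' << 24);

    return word == gzip;
}
```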
```
ext_is_gzip(std::basic_string_view>):
li a2, 4
bne a0, a2,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117421
--- Comment #2 from Wojciech Mula ---
First of all, thanks for looking at this!
> I should note that -mno-strict-align still does not do it but that is because
> it might be slow still to do unaligned access.
OK, maybe `-mno-strict-align` sh
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: wojciech_mula at poczta dot onet.pl
Target Milestone: ---
This comes from real-world usage. Suppose we have a vector of words and we want to
move around some bit-fields of those words. We isolate the bit-fields with