: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
unsigned int rotr32(unsigned int v, unsigned int r)
{
return (v>>r)|(v<<(32-r));
}
unsigned long long rotr64(unsigned long long v, unsigned
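The rotr32 input above shifts left by 32-r, which C leaves undefined when r is 0. A minimal sketch of a rotate that is defined for every r, assuming a 32-bit unsigned int; the name rotr32_masked is hypothetical and does not appear in the report:

```c
/* Hypothetical helper, not from the bug report: rotate right with both
   shift counts masked into 0..31, so no shift count ever reaches 32. */
unsigned int rotr32_masked(unsigned int v, unsigned int r)
{
    return (v >> (r & 31u)) | (v << ((0u - r) & 31u));
}
```

For r between 1 and 31 this computes the same value as the rotr32 input; it only differs in being well defined for r == 0 and for counts of 32 or more.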
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94135
--- Comment #2 from Jens Seifert ---
POWER8 Processor User’s Manual for the Single-Chip Module:
addi addis add add. subf subf. addic subfic adde addme subfme addze. subfze neg
neg. nego
1 - 2 cycles (GPR)
2 cycles (XER)
5 cycles (CR)
6/cycle,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94135
--- Comment #4 from Jens Seifert ---
Setting CA in XER increases issue-to-issue latency by 1 on Power8.
See:
Table 10-14. Issue-to-Issue Latencies
In addition, setting the CA restricts instruction reordering.
++
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
#include
#include
void patch(std::string& s)
{
std::replace(s.begin(),s.end(),'.','-');
}
gcc replace.C
In file included from
/opt/rh/devtoolset-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94297
Jens Seifert changed:
What|Removed |Added
Summary|std::replace internal |PPCLE std::replace internal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94297
--- Comment #3 from Jens Seifert ---
Created attachment 48110
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48110&action=edit
Pre-processed file created using -save-temps
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94297
--- Comment #5 from Jens Seifert ---
No options. Same failure with -O2. System is a RHEL 7.5.
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/rh/devtoolset-8/root/usr/libexec/gcc/ppc64le-redhat-linux/8/lto-wrapper
Target: ppc64le-
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
#include
static const double dsmall[] = { -DBL_MAX };
gcc ccerr.C
ccerr.C:3:1: internal compiler error: Segmentation fault
static const double dsmall[] = { -DBL_MAX
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94519
Jens Seifert changed:
What|Removed |Added
Status|RESOLVED|CLOSED
--- Comment #2 from Jens Seifert
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94297
Jens Seifert changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94297
--- Comment #8 from Jens Seifert ---
A too-old libgmp got picked up. Setting LD_LIBRARY_PATH=/lib64 solved the issue.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94297
Jens Seifert changed:
What|Removed |Added
Status|RESOLVED|CLOSED
--- Comment #9 from Jens Seifert
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Created attachment 48741
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48741&action=edit
input with branchless 128-bit shifts
PowerPC processors don
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95704
--- Comment #1 from Jens Seifert ---
Created attachment 48742
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48742&action=edit
assembly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95704
--- Comment #3 from Jens Seifert ---
GCC 8.3 generates:
_Z3shloy:
.LFB0:
.cfi_startproc
addi 9,5,-64
cmpwi 7,9,0
blt 7,.L2
sld 4,3,9
li 3,0
blr
.p2align 4,,15
.L2:
srdi 9,3,1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95704
--- Comment #5 from Jens Seifert ---
Power9 code is branchfree but not good at all.
_Z3shloy:
.LFB0:
.cfi_startproc
addi 8,5,-64
subfic 6,5,63
srdi 10,3,1
li 7,0
sld 4,4,5
sld 5,3,5
: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
unsigned long long negativeLessThan(unsigned long long a, unsigned long long b)
{
return -(a < b);
}
gcc -m64 -O2 -save-temps negativeLessThan.C
crea
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737
Jens Seifert changed:
What|Removed |Added
Status|RESOLVED|UNCONFIRMED
Resolution|DUPLICATE
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
unsigned long long msk66()
{
return 0xULL;
}
gcc -maix64 -O2 const.C -save-temps
Output:
._Z5msk66v:
LFB..0
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
int mod(int x, int y, int &z)
{
z = x % y;
if (y == 0)
{
// division by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93013
--- Comment #7 from Jens Seifert ---
The modulo at the beginning was done for optimization purposes. As the divide
takes a long time and the special cases are extreme edge cases, it is wise to execute
the divide as early as possible on PPC as divide on P
: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
unsigned __int128 and128WithConst(unsigned __int128 a)
{
unsigned __int128 c128 = (((unsigned __int128)(~0ULL
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
#include
double vmax(double a, double b)
{
#ifdef _BIG_ENDIAN
const long PREF = 0;
#else
const long PREF = 1;
#endif
vector double va
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93127
Jens Seifert changed:
What|Removed |Added
Target||powerpc-*-*-*
--- Comment #1 from Jens Se
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
vec_promote can leave half of the register undefined and therefore should not issue
an extra instruction.
Input:
#include
double vmax
: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
#include
double d2()
{
return 2.0;
}
vector double v2()
{
return vec_splats(2.0);
}
gcc -O2 -maix64
P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
void memclear16(char *p)
{
memset(p, 0, 16);
}
void memFF16(char *p)
{
memset(p, 0xFF, 16);
}
Output:
._Z10memclear16Pc:
LFB..0:
li
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93128
--- Comment #1 from Jens Seifert ---
Wrong number range for Power7: -16..15
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
void memspace16(char *p)
{
memset(p, ' ', 16);
}
Expected result:
li 4,0x2020
rldimi 4,4,16,0
rldimi 4,4,32,0
std 4,0(3)
Splatting the memset input to 64-bit c
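The li / rldimi / rldimi steps in the expected result above can be modelled in portable C. This is only a sketch of the byte-splat idea, assuming the intent is to replicate the memset byte across a 64-bit register; the name splat8 is hypothetical:

```c
/* Sketch: replicate one byte into all eight bytes of a 64-bit value,
   mirroring the li / rldimi / rldimi steps from the expected result. */
unsigned long long splat8(unsigned char c)
{
    unsigned long long v = ((unsigned long long)c << 8) | c; /* 0x2020 for ' ' */
    v |= v << 16;  /* first rldimi: 16-bit pattern copied to 32 bits */
    v |= v << 32;  /* second rldimi: 32-bit pattern copied to 64 bits */
    return v;
}
```

For ' ' this yields 0x2020202020202020, the same value as multiplying the byte by 0x0101010101010101.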
: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
All 64-bit constants containing a sequence of ones can be constructed with 2
instructions (li/lis + rldicl). gcc creates up to 5 instructions.
Input:
unsigned long
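To make "a sequence of ones" concrete, the sketch below tests whether a 64-bit value is a single unbroken run of one bits. It is an illustration only: the helper name is hypothetical, and runs that wrap around from bit 63 to bit 0 are deliberately not covered.

```c
/* Hypothetical classifier: returns 1 when x is a single unbroken run of
   one bits (for example 0x0000FFFF00000000), 0 otherwise. */
int is_contiguous_ones(unsigned long long x)
{
    unsigned long long filled;
    if (x == 0)
        return 0;
    filled = x | (x - 1);                 /* set every bit below the run */
    return (filled & (filled + 1)) == 0;  /* filled must now be 2^k - 1 */
}
```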
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
unsigned long long hi16msbon_low16msboff()
{
return 0x87654321ULL; // expected: li 3,0x4321 ; oris 3,0x8765
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
I am currently porting an application to PPCLE and found that I am lacking
compiler builtins for decimal floating point quantize on
: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
I am currently porting an application from AIX to PPCLE and found that I am
lacking compiler builtins for transforming
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
2 samples:
unsigned long long load8r(unsigned long long *in)
{
return __builtin_bswap64(*in);
}
unsigned long long rldimi(unsigned int hi, unsigned int lo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93449
--- Comment #2 from Jens Seifert ---
#include
typedef float _Decimal128 __attribute__((mode(TD)));
_Decimal128 bcdtodpd(vector double v)
{
_Decimal128 res;
memcpy(&res, &v, sizeof(res));
res = __builtin_denbcdq(0, res);
return res;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93448
--- Comment #4 from Jens Seifert ---
The inline asm constraint "d" works. Thank you.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93449
--- Comment #4 from Jens Seifert ---
Power8 has bcdadd, which can only be combined with _Decimal128 if you have some
kind of conversion between BCDs stored in a vector register and _Decimal128.
On Power9 vec_load_len/vec_store_len can be used to
: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Documentation says:
double __builtin_mtfsf(const int,double)
Not documented in 8.3.0, but it somehow works. Nevertheless, it looks like the
prototype is wrong and should be
: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
fmr is a 6 cycle instruction on Power8. Why is gcc not using the 2 cycle xxlor
instruction?
Input:
double setflm(double x)
{
double r = __builtin_mffs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70928
Jens Seifert changed:
What|Removed |Added
CC||jens.seifert at de dot ibm.com
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
int extract(vector signed int v)
{
return v[2];
}
Command line:
gcc -mcpu=power8 -maltivec -m64 -O3 -save-temps extract.C
Output:
_Z7extractDv4_i:
.LFB0
: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
#include
double sign(double in)
{
return in == 0.0 ? 0.0 : copysign(1.0, in);
}
Command line:
gcc -m64 -O2 -save
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98020
Jens Seifert changed:
What|Removed |Added
Status|WAITING |RESOLVED
Resolution|---
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
gcc only provides
unsigned int __builtin_addg6s (unsigned int, unsigned int);
but addg6s is a 64-bit operation. I require
unsigned long long __builtin_addg6s (unsigned long long
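Why the 64-bit width matters can be seen from a software model. The sketch below is an assumption drawn from the Power ISA description of addg6s (a 6 for every 4-bit digit that produced no carry-out), not GCC's builtin; both function names are hypothetical.

```c
/* Software model (an assumption, not GCC's builtin): every 4-bit digit whose
   addition produced no carry-out yields 6, every digit that carried yields 0. */
unsigned long long addg6s_model(unsigned long long a, unsigned long long b)
{
    unsigned long long sum = a + b;
    /* Bit i of carry is the carry out of bit i in a + b. */
    unsigned long long carry = (a & b) | ((a | b) & ~sum);
    /* Move each digit's top-bit carry to the digit's low bit, invert, scale. */
    return ((~carry >> 3) & 0x1111111111111111ULL) * 6ULL;
}

/* Packed-BCD addition built on the model: bias by 6, add, remove unused 6s. */
unsigned long long bcd_add_model(unsigned long long a, unsigned long long b)
{
    unsigned long long biased = a + 0x6666666666666666ULL;
    return (biased + b) - addg6s_model(biased, b);
}
```

With 64-bit operands the model handles 16 BCD digits per call, for example bcd_add_model(0x19, 0x03) gives 0x22; a 32-bit prototype would limit this to 8 digits.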
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Initializing a __int128 from 2 64-bit integers is implemented very inefficiently.
The most natural code, which works well on all other platforms, generate
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
https://gcc.gnu.org/onlinedocs/gcc/Basic-PowerPC-Built-in-Functions-Available-on-ISA-3_002e1.html#Basic-PowerPC-Built-in-Functions-Available-on-ISA-3_002e1
Please improve the
: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
unsigned __int128 div(unsigned __int128 a, unsigned __int128 b)
{
return a/b;
}
__int128 div(__int128 a, __int128 b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100809
--- Comment #1 from Jens Seifert ---
Same applies to modulo.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100808
--- Comment #1 from Jens Seifert ---
https://gcc.gnu.org/onlinedocs/gcc/PowerPC-AltiVec-Built-in-Functions-Available-on-ISA-3_002e1.html
vector unsigned long long int vec_gnb (vector unsigned __int128, const unsigned
char)
should be
unsigned
mal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
vector unsigned short revb(vector unsigned short a)
{
return vec_revb(a);
}
creates:
_Z4revbDv8_t:
.L
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
vector unsigned short revb(vector unsigned short a)
{
return vec_revb(a);
}
Creates:
_Z4revbDv4_j:
.LFB1
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
vector double reve(vector double a)
{
return vec_reve(a);
}
creates:
_Z4reveDv2_d:
.LFB3:
.cfi_startproc
.LCF3:
0: addis 2,12,.TOC.-.LCF3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
vector double reve(vector double a)
{
return vec_reve(a);
}
creates:
_Z4reveDv2_d:
.LFB3:
.cfi_startproc
larl%r5,.L12
vl
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
#include
Input:
vector double doublee(vector float a)
{
return vec_doublee(a);
}
causes a compile error:
vec.C: In function ‘__vector(2) double doublee
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100808
--- Comment #3 from Jens Seifert ---
- Avoid additional "int" unsigned long long int => unsigned long long
Why? Those are exactly the same types!
Yes, but the rest of the documentation uses unsigned long long.
This is just for consistency wit
mal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
vector unsigned short load_be(unsigned short *c)
{
return vec_xl_be(0L, c);
}
creates:
_Z7load_bePt:
.L
: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Using the same names as xlC would be appreciated:
vec_extsbd, vec_extsbw, vec_extshd, vec_extshw, vec_extswd
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
#include
vector unsigned long long mul64(vector unsigned long long a, vector unsigned
long long b)
{
return a * b;
}
creates
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866
--- Comment #7 from Jens Seifert ---
Regarding vec_revb for vector unsigned int. I agree that
revb:
.LFB0:
.cfi_startproc
vspltish %v1,8
vspltisw %v0,-16
vrlh %v2,%v2,%v1
vrlw %v2,%v2,%v0
blr
work
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866
--- Comment #9 from Jens Seifert ---
I know that if I used the vec_perm builtin as an end user, you would need
to conform to the LE specification, but you can always optimize the code as you
like as long as it creates correct results afterw
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
#include
vector unsigned __int128 vsubcuq(vector unsigned __int128 a, vector unsigned
__int128 b)
{
return vec_vsubcuq(a, b);
}
Command line:
gcc -m64 -O2 -maltivec -mcpu
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
#include
bool test(const char *fmt, size_t numTokens, ...)
{
return __builtin_va_arg_pack_len() != numTokens
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106770
--- Comment #4 from Jens Seifert ---
PPCLE with no special option means -mcpu=power8 -maltivec (altivecle to be more
precise).
vec_promote(, 1) should be a noop on ppcle. But the value gets
splatted to both the left and right part of the vector register.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106770
--- Comment #6 from Jens Seifert ---
The left part of VSX registers overlaps with floating point registers, that is
why no register xxpermdi is required and mfvsrd can access all (left) parts of
VSX registers directly.
The xxpermdi x,y,y,3 indic
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Created attachment 53409
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53409&action=edit
source code
1)
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
int compare2(unsigned long long a, unsigned long long b)
{
return (a > b ? 1 : (a < b ? -1 : 0));
}
Output:
_Z8compare2yy:
cmpld 0,3,4
bgt
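For comparison with the branching output above, here is one common branch-free way to write a three-way compare in C. Whether this matches the sequence the report goes on to propose is not visible in the excerpt; the function name is hypothetical.

```c
/* One common branch-free three-way compare; each comparison yields 0 or 1,
   so the difference is 1, 0 or -1. */
int compare2_branchless(unsigned long long a, unsigned long long b)
{
    return (a > b) - (a < b);
}
```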
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Created attachment 53443
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53443&action=edit
source code
long long gtRef(long
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
int lt(int a, int b)
{
return a < b;
}
generates:
cr %r2,%r3
lhi %r1,1
lhi %r2,0
locrnl %r1,%r2
l
: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
unsigned long long subfic(unsigned long long a)
{
if (a > 15) __builtin_unreacha
: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
#include
unsigned int extr(vector unsigned int v)
{
return vec_extract(v, 2);
}
Generates:
_Z4extrDv4_j:
.LFB1
: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
#include
int cmp2(double a, double b)
{
vector double va = vec_promote(a, 1);
vector double vb = vec_promote(b, 1);
vector long long vlt = (vector long long
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106770
--- Comment #1 from Jens Seifert ---
vec_extract(vr, 1) should extract the left element. But xxpermdi x,x,x,3
extracts the right element.
Looks like a bug in vec_extract for PPCLE and not a problem regarding
unnecessary xxpermdi.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106770
--- Comment #2 from Jens Seifert ---
vec_extract(vr, 1) should extract the left element. But xxpermdi x,x,x,3
extracts the right element.
Looks like a bug in vec_extract for PPCLE and not a problem regarding
unnecessary xxpermdi.
Using assembly
: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
#include
vector unsigned short popcnt(vector unsigned short a)
{
return vec_popcnt(a);
}
Generates with -march=z13
_Z6popcntDv8_t:
.LFB1
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
I can't find a builtin for the vmsumudm instruction.
I also found nothing in the Power vector intrinsic programming reference.
https://openpowerfoundation.org/?resource_lib=
: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
unsigned long long M8()
{
return 0x;
}
Generates:
.LC0:
.quad 0x
.text
.align 8
.globl _Z2M8v
.type
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
int overflow();
int negOverflow(long long in)
{
if (in
mal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
__int128 imul128(long long a, long long b)
{
return (__int128)a * (__int128)b;
}
creates sequence with 3 multipl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102117
--- Comment #1 from Jens Seifert ---
Sorry, small bug in the optimal sequence.
__int128 imul128_opt(long long a, long long b)
{
unsigned __int128 x = (unsigned __int128)(unsigned long long)a;
unsigned __int128 y = (unsigned __int128)(unsigned
: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
unsigned long long ctzll(unsigned long long x)
{
return __builtin_ctzll(x);
}
creates:
lcgr%r1,%r2
ngr %r2,%r1
lghi%r1,63
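The visible start of the output above (negate, AND, load 63) appears to follow the usual identity for deriving a trailing-zero count from a leading-zero count. A sketch of that identity in C, using GCC's __builtin_clzll; the function name is hypothetical:

```c
/* Sketch of the identity: x & -x keeps only the lowest set bit, and 63 minus
   its leading-zero count is that bit's index.
   Like __builtin_ctzll, the result is undefined for x == 0. */
unsigned long long ctzll_via_clz(unsigned long long x)
{
    unsigned long long lowest = x & (0ULL - x);
    return 63ULL - (unsigned long long)__builtin_clzll(lowest);
}
```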
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86160
--- Comment #4 from Jens Seifert ---
I am looking forward to getting Power9 optimization using xststdcdp etc.
: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Due to the fact that vslw, vsld, vsrd, ... only use the modulo of bit width for
shifting, the combination with 0xFF..FF vector can be used to create vector
constants
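The modulo behaviour described above can be modelled per lane in scalar C. This is only a sketch for a 32-bit lane with hypothetical helper names; which constants the report goes on to list is not visible in the excerpt.

```c
/* Scalar model of one 32-bit lane of a vector shift: only the low five bits
   of the shift operand are used, as the report describes. */
unsigned int lane_shl32(unsigned int v, unsigned int s)
{
    return v << (s & 31u);
}

unsigned int lane_shr32(unsigned int v, unsigned int s)
{
    return v >> (s & 31u);
}
```

Using the all-ones value as both operands, lane_shl32(0xFFFFFFFF, 0xFFFFFFFF) gives 0x80000000 and lane_shr32(0xFFFFFFFF, 0xFFFFFFFF) gives 1, since the effective shift count is 31.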
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
extern unsigned char magic1[256];
unsigned int hash(const unsigned char inp[4])
{
const unsigned long long INIT = 0x1ULL;
unsigned long long h1 = INIT;
h1 = magic1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107949
--- Comment #1 from Jens Seifert ---
hash2 is only provided to show what the code should look like (without rlwinm).
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
extern unsigned char magic1[256];
unsigned int hash(const unsigned char inp[4])
{
const unsigned long long INIT = 0x1ULL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107949
--- Comment #3 from Jens Seifert ---
*** Bug 108048 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108048
Jens Seifert changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Same issue for PPC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107949
extern unsigned char magic1[256];
unsigned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108049
--- Comment #1 from Jens Seifert ---
Sample above got compiled with -march=z196
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Missing builtins for vector instructions xxblendvb, xxblendvw, xxblendvd.
#include
vector int blendv(vector int a, vector int b, vector int c)
{
return
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106043
--- Comment #1 from Jens Seifert ---
Found in documentation:
https://gcc.gnu.org/onlinedocs/gcc-11.3.0/gcc/PowerPC-AltiVec-Built-in-Functions-Available-on-ISA-3_002e1.html#PowerPC-AltiVec-Built-in-Functions-Available-on-ISA-3_002e1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106043
Jens Seifert changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
unsigned short swap16(unsigned short in)
{
return __builtin_bswap16(in);
}
generates with -O3 -march=z196:
swap16(unsigned short):
lrvr%r2,%r2
srl %r2,16
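The two instructions shown above correspond to a 32-bit byte reversal followed by a 16-bit right shift. A sketch of that equivalence in C, using GCC's __builtin_bswap32; the function name is hypothetical and not part of the report:

```c
/* Sketch: a 16-bit byte swap expressed as a 32-bit byte reversal followed by
   a 16-bit right shift, the same two steps as lrvr + srl above. */
unsigned short swap16_via_swap32(unsigned short in)
{
    return (unsigned short)(__builtin_bswap32((unsigned int)in) >> 16);
}
```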
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93176
--- Comment #7 from Jens Seifert ---
What happened? Still waiting for an improvement.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93176
--- Comment #10 from Jens Seifert ---
Looks like no patch in this area got delivered. I did a small test for
unsigned long long c()
{
return 0xULL;
}
gcc 13.2.0:
li 3,0
ori 3,3,0x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115973
--- Comment #2 from Jens Seifert ---
Assembly that better integrates:
unsigned long long addc_opt(unsigned long long a, unsigned long long b,
unsigned long long *res)
{
unsigned long long rc;
__asm__("addc %0,%2,%3;\n\tsubfe
%1,%1,%1":"=r
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
unsigned long long bcdadd(vector __int128 a, vector __int128 b, vector __int128
*c)
{
return __builtin_bcdadd_ov(a, b, 0
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input setToIdentity.C:
#include
#include
#include
void setToIdentityGOOD(unsigned long long *mVec, unsigned int mLen)
{
for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355
--- Comment #1 from Jens Seifert ---
Same issue with gcc 13.2.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355
--- Comment #10 from Jens Seifert ---
Does this affect both the loop vectorizer and the SLP vectorizer?
-fno-tree-loop-vectorize prevents loop vectorization from being performed and
works around this issue. Does the same problem also affect SLP vectorization,
which