[RFC] Formation of vector function name
Hi,

compiling the following test with the options -fopenmp -ffast-math -O1 -msse4:

#pragma omp declare simd notinbranch simdlen(2)
extern double log (double);

int N = 3200;
double b[3200];
double a[3200];

int main (void)
{
  int i;

#pragma omp simd
  for (i = 0; i < N; i += 1)
    {
      b[i] = log (a[i]);
    }

  return (0);
}

results in an asm redirection of log to __log_finite, so the final vector function name becomes _ZGVbN2v___log_finite.

From the C library side, this means adding asm redirections _ZGVbN2v___log_finite = _ZGVbN2v_log to the headers.

Maybe the cleaner way is to base the vector name on the original name of the declaration, not on the asm declaration name (use DECL_NAME instead of DECL_ASSEMBLER_NAME)?

The corresponding libc-alpha thread is https://sourceware.org/ml/libc-alpha/2015-06/msg00213.html

--
WBR,
Andrew
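For readers unfamiliar with asm redirection, here is a minimal sketch of the mechanism in GNU C. The names demo_log and demo_log_finite are hypothetical stand-ins (they are not the Glibc symbols), and the placeholder body is not a logarithm; the point is only that the C-level name and the emitted symbol name can differ, which is why a name derived from the assembler name picks up the redirected spelling.

```c
#include <assert.h>

/* Two-step stringification so that __USER_LABEL_PREFIX__ is expanded
   before it is turned into a string (empty on ELF targets).  */
#define DEMO_STR2(x) #x
#define DEMO_STR(x) DEMO_STR2 (x)

/* Hypothetical stand-in for a finite-math variant such as __log_finite.
   The body is a placeholder, not a real logarithm.  */
double
demo_log_finite (double x)
{
  assert (x > 0.0);
  return x - 1.0;
}

/* Asm redirection: calls written as demo_log () in C are emitted as
   calls to the symbol demo_log_finite.  */
extern double demo_log (double)
  __asm__ (DEMO_STR (__USER_LABEL_PREFIX__) "demo_log_finite");
```

Calling demo_log (3.0) here ends up in demo_log_finite. In GCC terms, the source-level name of the declaration is still demo_log while its assembler name is the redirected one.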
[RFC] Difference between gcc and g++ drivers
Hi,

Glibc 2.22 will have libm.so implemented as a linker script, which helps to link as needed against the vector math library libmvec.so without adding -lmvec (for non-static builds). In other words, -lm is enough to link against libmvec.so.

But the g++ driver inserts -lm into the linker command, while the gcc driver does not. This gives different behavior for g++ and gcc (assuming Glibc 2.22 is installed system-wide):

g++ -fopenmp -ffast-math -O1 cos.c
(compiles, -lm is passed to the linker by the driver)

gcc -fopenmp -ffast-math -O1 cos.c
(linker error because -lm is not passed)
/tmp/cclVmUxv.o: In function `main':
cos.c:(.text+0x3a): undefined reference to `_ZGVbN2v_cos'
cos.c:(.text+0x6b): undefined reference to `cos'
collect2: error: ld returned 1 exit status

cos.c:

#include <math.h>

int N = 3200;
double b[3200];
double a[3200];

int main (void)
{
  int i;

#pragma omp simd
  for (i = 0; i < N; i += 1)
    {
      b[i] = cos (a[i]);
    }

  return (0);
}

For static builds with gcc, both options must be added in the following order: -lmvec -lm.
For static builds with g++, only -lmvec needs to be added.

Can anybody comment on this difference between the drivers?
Should the g++ driver be fixed to also add -lmvec if a GLIBC version >= 2.22 is found at configure time?
Or is some other way needed to make the two drivers behave alike?

--
WBR,
Andrew
Re: [RFC] Formation of vector function name
2015-06-16 17:23 GMT+03:00 Joseph Myers:
> On Mon, 15 Jun 2015, Andrew Pinski wrote:
>
>> > results in asm redirection for log to __log_finite and final vector
>> > function name becomes _ZGVbN2v___log_finite.
>> >
>> > With point of view from C Library side, it reflects in addition of asm
>> > redirections _ZGVbN2v___log_finite = _ZGVbN2v_log in the headers.
>> >
>> > May be the cleaner way is to base vector name on original name of
>> > declaration, not asm declaration name (use DECL_NAME instead of
>> > DECL_ASSEMBLER_NAME)?
>>
>>
>> I don't think this would be useful really because if you have a
>> function say logl where you have two options of long double, you want
>> to support both you would name one logl and the other logl128 and then
>> using DECL_ASSEMBLER_NAME to form the SIMD name would be useful to use
>> the one with logl128 in it rather than logl.
>
> The point is that the vector versions may not be in one-to-one
> correspondence with the scalar versions - you might have several different
> scalar versions depending on compiler options, all of which correspond to
> a single vector version.

Can we agree to change how the vector function name is formed in future GCC versions?
The patch to fix it looks quite simple; I am going to regtest it.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d41688b..8d9de76 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -18687,7 +18687,7 @@ simd_clone_mangle (struct cgraph_node *node,
     }

   pp_underscore (&pp);
-  const char *str = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->decl));
+  const char *str = IDENTIFIER_POINTER (DECL_NAME (node->decl));
   if (*str == '*')
     ++str;
   pp_string (&pp, str);

--
WBR,
Andrew
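To make the effect of the one-line change concrete, here is a small standalone sketch. It is not GCC code: struct demo_decl and demo_simd_name are hypothetical, simplified stand-ins for the declaration node and the tail of simd_clone_mangle, and the leading '*' on the assembler name is an assumption taken from the check visible in the patch context.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a function declaration.  */
struct demo_decl
{
  const char *name;      /* source-level name, e.g. "log" */
  const char *asm_name;  /* assembler name, may start with '*' */
};

/* Append the chosen base name to PREFIX, skipping a leading '*' the
   same way the patched code does.  Returns OUT.  */
const char *
demo_simd_name (const struct demo_decl *d, int use_asm_name,
                const char *prefix, char *out, size_t size)
{
  const char *str = use_asm_name ? d->asm_name : d->name;

  assert (str != NULL);
  if (*str == '*')
    ++str;
  assert (strlen (prefix) + strlen (str) < size);
  snprintf (out, size, "%s%s", prefix, str);
  return out;
}
```

With a declaration { "log", "*__log_finite" } and the prefix "_ZGVbN2v_", choosing the assembler name gives _ZGVbN2v___log_finite, while choosing the source-level name gives _ZGVbN2v_log, the single vector symbol the library would like to export.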
[RFC] Support register groups in inline asm
Hi,

the new Intel instructions AVX512_4FMAPS and AVX512_4VNNIW introduce the use of register groups.

Supporting register groups in inline asm requires an extension with new constraints.

The current proposal is the following syntax:

__asm__ ("SMTH %[group], %[single]" :
         [single] "+x"(v0) :
         [group] "Yg4"(v1), "1+1"(v2), "1+2"(v3), "1+3"(v4));

where the "YgN" constraint specifies a group of N consecutive registers (starting from a register whose number is 0 mod 2^ceil(log2(N))), and "1+K" specifies the next registers in the group.

Is this syntax OK? How should it be implemented?

Any comments or proposals will be appreciated, thanks.

--
WBR,
Andrew
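The alignment rule "0 mod 2^ceil(log2(N))" can be checked with a few lines of C. This is only an illustration of the arithmetic; the function names are hypothetical and nothing here is part of GCC.

```c
#include <assert.h>

/* Smallest power of two that is >= n, i.e. 2^ceil(log2(n)) for n >= 1.  */
unsigned
demo_group_alignment (unsigned n)
{
  unsigned align = 1;

  assert (n >= 1);
  while (align < n)
    align <<= 1;
  return align;
}

/* Nonzero if register number REG may start a group of N consecutive
   registers under the proposed rule.  */
int
demo_can_start_group (unsigned reg, unsigned n)
{
  return reg % demo_group_alignment (n) == 0;
}
```

For a group of 4 the alignment is 4, so registers 0, 4, 8, ... may start a group and register 6 may not. A group of 3 would also be aligned to 4.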
Re: [RFC] Support register groups in inline asm
2016-11-16 8:02 GMT+03:00 Andrew Pinski:
> On Tue, Nov 15, 2016 at 9:36 AM, Andrew Senkevich wrote:
>> Hi,
>>
>> new Intel instructions AVX512_4FMAPS and AVX512_4VNNIW introduce use
>> of register groups.
>>
>> To support register groups feature in inline asm needed some extension
>> with new constraints.
>>
>> Current proposal is the following syntax:
>>
>> __asm__ ("SMTH %[group], %[single]" :
>>          [single] "+x"(v0) :
>>          [group] "Yg4"(v1), "1+1"(v2), "1+2"(v3), "1+3"(v4));
>>
>> where "YgN" constraint specifies group of N consecutive registers
>> (which is started from register having number as "0 mod
>> 2^ceil(log2(N))"),
>> and "1+K" specifies the next registers in the group.
>>
>> Is this syntax ok? How to implement it?
>
>
> Have you looked into how AARCH64 back-end handles this via OI, etc.
> Like:
> /* Oct Int: 256-bit integer mode needed for 32-byte vector arguments.  */
> INT_MODE (OI, 32);
>
> /* Opaque integer modes for 3 or 4 Neon q-registers / 6 or 8 Neon d-registers
>    (2 d-regs = 1 q-reg = TImode).  */
> INT_MODE (CI, 48);
> INT_MODE (XI, 64);
>
>
> And then it implements TARGET_ARRAY_MODE_SUPPORTED_P. target hook?
> And the x2 types are defined as a struct of an array like:
> typedef struct int8x8x2_t
> {
>   int8x8_t val[2];
> } int8x8x2_t;

Thanks!

We have to update the proposal, replacing the "+" symbol with "#" to specify the offset within a group (to avoid overloading the other meaning of "+", which marks an operand as both input and output).

So the current proposed syntax is:

__asm__ ("INSTR %[group], %[single]" :
         [single] "+x"(v0) :
         [group] "Yg4"(v1), "1#1"(v2), "1#2"(v3), "1#3"(v4));

where the "YgN" constraint specifies a group of N consecutive registers (starting from a register whose number is 0 mod 2^ceil(log2(N))), and "1#K" specifies the next registers in the group.

Any other questions or comments?

Is there consensus on this syntax?

--
WBR,
Andrew
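As an illustration of how a constraint such as "1#K" could be decoded, here is a tiny parser sketch. It assumes one reading of the proposal, namely that the number before '#' is the operand that heads the group and the number after it is the offset within the group. The function name and return convention are hypothetical; no released GCC accepts this constraint syntax.

```c
#include <assert.h>
#include <stdio.h>

/* Parse a string of the form "<operand>#<offset>".  Returns 1 and fills
   *OPERAND and *OFFSET on success; returns 0 otherwise (the outputs may
   have been partially written in that case).  This is a sketch: sscanf's
   %u also tolerates leading whitespace and a sign.  */
int
demo_parse_group_ref (const char *s, unsigned *operand, unsigned *offset)
{
  int consumed = 0;

  assert (s != NULL && operand != NULL && offset != NULL);
  if (sscanf (s, "%u#%u%n", operand, offset, &consumed) != 2)
    return 0;
  /* Reject trailing characters after the offset.  */
  return s[consumed] == '\0';
}
```

With this sketch "1#3" parses as operand 1, offset 3, while the earlier spelling "1+3" and the group constraint "Yg4" are rejected. That matches the motivation for the change: '+' keeps its single existing meaning.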
Re: Getting a build failure in glibc due to gcc changes on 32bit x86 glibc
On Wed, Nov 26, 2014 at 02:59:34PM -0800, Andrew Pinski wrote:
> This looks like the same issue as I reported before about
> check_consistency() since that is what is failing to assemble here
> too.

This is also fixed by this patch: https://sourceware.org/ml/libc-alpha/2014-10/msg00746.html

--
WBR,
Andrew
OpenMP vector function ABI for x86_64
Hi,

while working on adding vector math functions to Glibc, and in discussions with the community, an issue was found with the meaning of "#pragma omp declare simd" (which will appear in math.h).

The issue is that there is no working way to specify the ISA of a vector function in GCC 5.0, and hence no way to determine the exact vector function name.

Here is a description of the exact meaning of "#pragma omp declare simd" for x86_64.

It is proposed as an agreement between compilers supporting OpenMP.

*** OpenMP vector function ABI for x86_64 ***

The name of a vector math function is based on the Intel Vector Function ABI
(http://www.cilkplus.org/sites/default/files/open_specifications/Intel-ABI-Vector-Function-2012-v0.9.5.pdf),
with a small difference in the part of the name that specifies the ISA: the letters b, c, d are used instead of x, y, Y.

#pragma omp declare simd notinbranch simdlen(2)
for some function "func" means that the name of the vector version is:

_ZGVbN2v_func (the SSE4 implementation).

#pragma omp declare simd notinbranch simdlen(4)
for some function "func" means that the following names are available:

_ZGVcN4v_func (the AVX implementation)
and
_ZGVdN4v_func (the AVX2 implementation).

Every vector function should be provided by the math library for each supported ISA (currently SSE4, AVX and AVX2).
The semantics of those pragmas are independent of the processor for which code is being generated.
Those pragmas must not be interpreted as meaning that versions of the functions for other ISAs are available, even if code is being built for a processor that supports such an ISA.
Any future ABI extension that defines additional vector function versions will also define a different pragma to declare their availability.

*

Any feedback?

--
WBR,
Andrew
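The naming scheme above can be spelled out as a short C sketch. It only covers the case used in the proposal: an unmasked ('N', from notinbranch) function with a single vector parameter ('v'). The enum and function names are hypothetical, and the ISA letters are taken as listed in the proposal text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* ISA classes named after the letters used in the proposal:
   b (SSE), c (AVX), d (AVX2).  */
enum demo_isa { DEMO_ISA_B, DEMO_ISA_C, DEMO_ISA_D };

char
demo_isa_letter (enum demo_isa isa)
{
  switch (isa)
    {
    case DEMO_ISA_B: return 'b';
    case DEMO_ISA_C: return 'c';
    case DEMO_ISA_D: return 'd';
    }
  return '?';
}

/* Build "_ZGV" <isa> "N" <simdlen> "v_" <name> into OUT and return OUT.
   'N' stands for notinbranch (unmasked), 'v' for one vector parameter.  */
const char *
demo_vector_name (enum demo_isa isa, unsigned simdlen, const char *name,
                  char *out, size_t size)
{
  int n = snprintf (out, size, "_ZGV%cN%uv_%s",
                    demo_isa_letter (isa), simdlen, name);

  assert (n >= 0 && (size_t) n < size);
  return out;
}
```

For "func" this reproduces the three names in the proposal: _ZGVbN2v_func for simdlen(2), and _ZGVcN4v_func and _ZGVdN4v_func for simdlen(4).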
Re: gcc Digest 26 Dec 2014 16:51:42 -0000 Issue 7953
> From: Andrew Senkevich > To: GCC Mailing List , > openmp-...@dcs-maillist2.engr.illinois.edu, libc-alpha > > Cc: > Date: Fri, 26 Dec 2014 19:51:05 +0300 > Subject: OpenMP vector function ABI for x86_64 > Hi, > > during work on addition vector math functions to Glibc and discussions > with community was found an issue with meaning of “#pragma omp declare > simd” (which will appear in math.h). > > Issue is there are no working way to specify ISA of vector function > in GCC 5.0, and hence no way to determine exact vector function name. > > Here is description of exact meaning of “#pragma omp declare simd” for x86_64. > > This is proposed as agreement between compilers supporting OpenMP. > > *** OpenMP vector function ABI for x86_64 *** > > Name of vector math function is based on Intel Vector Function ABI > (http://www.cilkplus.org/sites/default/files/open_specifications/Intel-ABI-Vector-Function-2012-v0.9.5.pdf) > with a little difference in part of name specifying ISA – namely > letters b, c, d instead of x, y, Y. > > #pragma omp declare simd notinbranch simdlen(2) for some function > “func” means what the name of vector version is: > > _ZGVbN2v_func (it is SSE4 implementation). > > #pragma omp declare simd notinbranch simdlen(4) for some function > “func” means what the following names are available: > > _ZGVcN4v_func (it is AVX implementation) > and > _ZGVdN4v_func (it is AVX2 implementation). > > Every vector function should be provided by math library for each > supported ISA (currently SSE4, AVX and AVX2). > Semantics of those pragmas are independent of the processor for which > code is being generated. > Those pragmas must not be interpreted as meaning version of other ISA > of functions are available even if code is being built for a processor > with such ISA support. > Any future ABI extension that defines additional vector function > versions will also define a different pragma to declare their > availability. > > * > > Any feedback? 
Hi Jakub,

is this agreement OK?

Consensus is required before the Glibc maintainer can commit the x86_64 vector math functions.

--
WBR,
Andrew
Re: gcc Digest 26 Dec 2014 16:51:42 -0000 Issue 7953
2015-01-12 19:46 GMT+03:00 Jakub Jelinek : > On Mon, Jan 12, 2015 at 07:38:10PM +0300, Andrew Senkevich wrote: >> > during work on addition vector math functions to Glibc and discussions >> > with community was found an issue with meaning of “#pragma omp declare >> > simd” (which will appear in math.h). >> > >> > Issue is there are no working way to specify ISA of vector function >> > in GCC 5.0, and hence no way to determine exact vector function name. >> > >> > Here is description of exact meaning of “#pragma omp declare simd” for >> > x86_64. >> > >> > This is proposed as agreement between compilers supporting OpenMP. >> > >> > *** OpenMP vector function ABI for x86_64 *** >> > >> > Name of vector math function is based on Intel Vector Function ABI >> > (http://www.cilkplus.org/sites/default/files/open_specifications/Intel-ABI-Vector-Function-2012-v0.9.5.pdf) >> > with a little difference in part of name specifying ISA – namely >> > letters b, c, d instead of x, y, Y. >> > >> > #pragma omp declare simd notinbranch simdlen(2) for some function >> > “func” means what the name of vector version is: >> > >> > _ZGVbN2v_func (it is SSE4 implementation). >> > >> > #pragma omp declare simd notinbranch simdlen(4) for some function >> > “func” means what the following names are available: >> > >> > _ZGVcN4v_func (it is AVX implementation) >> > and >> > _ZGVdN4v_func (it is AVX2 implementation). >> > >> > Every vector function should be provided by math library for each >> > supported ISA (currently SSE4, AVX and AVX2). >> > Semantics of those pragmas are independent of the processor for which >> > code is being generated. >> > Those pragmas must not be interpreted as meaning version of other ISA >> > of functions are available even if code is being built for a processor >> > with such ISA support. >> > Any future ABI extension that defines additional vector function >> > versions will also define a different pragma to declare their >> > availability. 
>> >
>> > *
>> >
>> > Any feedback?
>>
>> is this agreement OK?
>>
>> Consensus is required to commit x86_64 vector math functions by Glibc
>> maintainer.
>
> With the difference that b stands for SSE2, not SSE4, and the fact
> that those functions do not use the __regcall calling conventions, but
> normal psABI calling conventions after replacing the arguments/return values
> with the vectors documented in the 0.9.5 pdf (and/or adding the vector mask
> arg) it describes what has been implemented, yes.

But which name should be used for SSE4?
GCC generates the same name as for SSE2, and we now have SSE4 implementations.

--
WBR,
Andrew
Re: gcc Digest 26 Dec 2014 16:51:42 -0000 Issue 7953
2015-01-13 14:28 GMT+03:00 Jakub Jelinek:
> On Tue, Jan 13, 2015 at 02:14:30PM +0300, Andrew Senkevich wrote:
>> >> Consensus is required to commit x86_64 vector math functions by Glibc
>> >> maintainer.
>> >
>> > With the difference that b stands for SSE2, not SSE4, and the fact
>> > that those functions do not use the __regcall calling conventions, but
>> > normal psABI calling conventions after replacing the arguments/return
>> > values
>> > with the vectors documented in the 0.9.5 pdf (and/or adding the vector mask
>> > arg) it describes what has been implemented, yes.
>>
>> But which name use for SSE4?
>> Gcc generates the same as for SSE2, and we now have SSE4 implementations.
>
> You probably need to use IFUNC for that.  The problem is that the
> _Z*b* symbol can be called even in code that requires only SSE2 HW, so you
> can't assume that because somebody called you through this symbol you have
> SSE4 available.  You know you have at least SSE2 or higher available.

OK, we can add SSE2 implementations with IFUNC selection and reflect that in the agreement as well.

--
WBR,
Andrew
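The selection idea can be sketched without IFUNC itself. IFUNC resolves the symbol once at dynamic-link time; the sketch below only imitates that with a function pointer chosen on first call, using the GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports on x86. All demo_* names are hypothetical, and both variants share a placeholder body (squaring two doubles) rather than a real vector math routine.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*demo_vec_fn) (const double *in, double *out);

/* Baseline variant: may be reached on any SSE2-capable CPU.  */
void
demo_square2_sse2 (const double *in, double *out)
{
  out[0] = in[0] * in[0];
  out[1] = in[1] * in[1];
}

/* Variant that would be allowed to use SSE4 instructions.  */
void
demo_square2_sse4 (const double *in, double *out)
{
  out[0] = in[0] * in[0];
  out[1] = in[1] * in[1];
}

/* Pick the best variant for the running CPU; falls back to the
   baseline when the builtins are not available.  */
demo_vec_fn
demo_select_square2 (void)
{
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("sse4.1"))
    return demo_square2_sse4;
#endif
  return demo_square2_sse2;
}

/* Single entry point, analogous to the one exported 'b' symbol.
   The lazy initialisation here is not thread-safe; a real IFUNC
   resolver does not have that problem.  */
void
demo_square2 (const double *in, double *out)
{
  static demo_vec_fn impl = NULL;

  assert (in != NULL && out != NULL);
  if (impl == NULL)
    impl = demo_select_square2 ();
  impl (in, out);
}
```

Callers only ever see demo_square2, so code built for plain SSE2 and code built for newer CPUs can share one symbol, which is the property asked for in the reply above.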
Re: [RFC] Support register groups in inline asm
2016-12-05 16:31 GMT+01:00 Andrew Senkevich : > 2016-11-16 8:02 GMT+03:00 Andrew Pinski : >> On Tue, Nov 15, 2016 at 9:36 AM, Andrew Senkevich >> wrote: >>> Hi, >>> >>> new Intel instructions AVX512_4FMAPS and AVX512_4VNNIW introduce use >>> of register groups. >>> >>> To support register groups feature in inline asm needed some extension >>> with new constraints. >>> >>> Current proposal is the following syntax: >>> >>> __asm__ (“SMTH %[group], %[single]" : >>> [single] >>> "+x"(v0) : >>> [group] >>> "Yg4"(v1), “1+1"(v2), “1+2"(v3), “1+3"(v4)); >>> >>> where "YgN" constraint specifies group of N consecutive registers >>> (which is started from register having number as "0 mod >>> 2^ceil(log2(N))"), >>> and "1+K" specifies the next registers in the group. >>> >>> Is this syntax ok? How to implement it? >> >> >> Have you looked into how AARCH64 back-end handles this via OI, etc. >> Like: >> /* Oct Int: 256-bit integer mode needed for 32-byte vector arguments. */ >> INT_MODE (OI, 32); >> >> /* Opaque integer modes for 3 or 4 Neon q-registers / 6 or 8 Neon d-registers >>(2 d-regs = 1 q-reg = TImode). */ >> INT_MODE (CI, 48); >> INT_MODE (XI, 64); >> >> >> And then it implements TARGET_ARRAY_MODE_SUPPORTED_P. target hook? >> And the x2 types are defined as a struct of an array like: >> typedef struct int8x8x2_t >> { >> int8x8_t val[2]; >> } int8x8x2_t; > > Thanks! > > We have to update proposal with changing "+" symbol to "#" specifying > offset in a group (to avoid overloading the other meaning of “+” > specifying that operand is both input and output). > > So current proposal of syntax is: > > __asm__ (“INSTR %[group], %[single]" : > [single] "+x"(v0) > : > [group] > "Yg4"(v1), “1#1"(v2), “1#2"(v3), “1#3"(v4)); > > where "YgN" constraint specifies group of N consecutive registers > (which is started from register having number as "0 mod 2^ceil(log2(N))"), > and "1#K" specifies the next registers in the group. > > Some other questions or comments? 
>
> What about consensus on this syntax?

Hi Richard!

Can we reach agreement on this syntax? What do you think?

--
WBR,
Andrew
Re: [RFC] Support register groups in inline asm
2017-03-16 9:50 GMT+01:00 Richard Biener : > On Wed, 15 Mar 2017, Andrew Senkevich wrote: > >> 2016-12-05 16:31 GMT+01:00 Andrew Senkevich : >> > 2016-11-16 8:02 GMT+03:00 Andrew Pinski : >> >> On Tue, Nov 15, 2016 at 9:36 AM, Andrew Senkevich >> >> wrote: >> >>> Hi, >> >>> >> >>> new Intel instructions AVX512_4FMAPS and AVX512_4VNNIW introduce use >> >>> of register groups. >> >>> >> >>> To support register groups feature in inline asm needed some extension >> >>> with new constraints. >> >>> >> >>> Current proposal is the following syntax: >> >>> >> >>> __asm__ (“SMTH %[group], %[single]" : >> >>> [single] >> >>> "+x"(v0) : >> >>> [group] >> >>> "Yg4"(v1), “1+1"(v2), “1+2"(v3), “1+3"(v4)); >> >>> >> >>> where "YgN" constraint specifies group of N consecutive registers >> >>> (which is started from register having number as "0 mod >> >>> 2^ceil(log2(N))"), >> >>> and "1+K" specifies the next registers in the group. >> >>> >> >>> Is this syntax ok? How to implement it? >> >> >> >> >> >> Have you looked into how AARCH64 back-end handles this via OI, etc. >> >> Like: >> >> /* Oct Int: 256-bit integer mode needed for 32-byte vector arguments. */ >> >> INT_MODE (OI, 32); >> >> >> >> /* Opaque integer modes for 3 or 4 Neon q-registers / 6 or 8 Neon >> >> d-registers >> >>(2 d-regs = 1 q-reg = TImode). */ >> >> INT_MODE (CI, 48); >> >> INT_MODE (XI, 64); >> >> >> >> >> >> And then it implements TARGET_ARRAY_MODE_SUPPORTED_P. target hook? >> >> And the x2 types are defined as a struct of an array like: >> >> typedef struct int8x8x2_t >> >> { >> >> int8x8_t val[2]; >> >> } int8x8x2_t; >> > >> > Thanks! >> > >> > We have to update proposal with changing "+" symbol to "#" specifying >> > offset in a group (to avoid overloading the other meaning of “+” >> > specifying that operand is both input and output). 
>> >
>> > So current proposal of syntax is:
>> >
>> > __asm__ ("INSTR %[group], %[single]" :
>> >          [single] "+x"(v0) :
>> >          [group] "Yg4"(v1), "1#1"(v2), "1#2"(v3), "1#3"(v4));
>> >
>> > where "YgN" constraint specifies group of N consecutive registers
>> > (which is started from register having number as "0 mod 2^ceil(log2(N))"),
>> > and "1#K" specifies the next registers in the group.
>> >
>> > Some other questions or comments?
>> >
>> > What about consensus on this syntax?
>>
>> Hi Richard!
>>
>> Can we have agreement on this syntax, what do you think?
>
> I have no expertise / opinion here.

Hi Jeff,

are you the proper person to ask?

--
WBR,
Andrew
Re: Can I use -Ofast without libmvec
2018-03-22 19:08 GMT+01:00 Steve Ellcey:
> I have a question about the math vector library routines in libmvec.
> If I compile a program on x86 with -Ofast, something like:
>
> void foo(double * __restrict x, double * __restrict y, double * __restrict z)
> {
>   for (int i = 0; i < 1000; i++) x[i] = sin(y[i]);
> }
>
> I get a call to the vector sin routine _ZGVbN2v_sin.  That is fine, but
> is there some way to compile with -Ofast and not use the libmvec vector
> routines?  I have tried -fopenmp, -fopenmp-simd, -fno-openmp, and
> -fno-openmp-simd and I always get a call to _ZGVbN2v_sin.  Is there anyway
> to stop the use of the vectorized calls (without turning off -Ofast)?

It looks like you have Glibc version >= 2.23 and GCC >= 6.1?

-fno-tree-loop-vectorize may help, together with -fno-openmp, for GCC >= 6.1.
Or build your test against a Glibc built with libmvec disabled.

(A description of libmvec is here: https://sourceware.org/glibc/wiki/libmvec)

--
WBR,
Andrew