[RFC] Formation of vector function name

2015-06-15 Thread Andrew Senkevich
Hi,

Compiling the following test with options -fopenmp -ffast-math -O1 -msse4

#pragma omp declare simd notinbranch simdlen(2)
extern double log (double);

int N = 3200;
double b[3200];
double a[3200];

int main (void)
{
  int i;

#pragma omp simd
  for (i = 0; i < N; i += 1)
  {
    b[i] = log (a[i]);
  }

  return (0);
}

results in an asm redirection of log to __log_finite, so the final vector
function name becomes _ZGVbN2v___log_finite.

From the C library side, this requires adding the asm redirection
_ZGVbN2v___log_finite = _ZGVbN2v_log in the headers.

Maybe a cleaner way is to base the vector name on the original name of the
declaration, not on the asm declaration name (use DECL_NAME instead of
DECL_ASSEMBLER_NAME)?
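As a rough illustration of the naming scheme, here is a minimal C sketch that assembles an x86_64 vector function name from its parts. The helper name format_vector_name is hypothetical; the layout _ZGV<isa><mask><simdlen><params>_<base> follows the names quoted in this thread (for example _ZGVbN2v_log), with 'N' for notinbranch and 'M' for inbranch. Passing "__log_finite" versus "log" as the base shows why the choice between DECL_ASSEMBLER_NAME and DECL_NAME changes the resulting symbol.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>   /* strcmp, for callers checking the result */

/* Hypothetical helper: builds _ZGV<isa><mask><simdlen><params>_<base>.
   isa:    'b', 'c' or 'd' as in the x86_64 proposal
   mask:   'N' for notinbranch, 'M' for inbranch
   params: one letter per parameter, 'v' for a vector argument
   base:   the scalar function name the vector name is derived from
   Returns the length written (excluding NUL), or -1 on truncation. */
static int format_vector_name (char *buf, size_t size, char isa, char mask,
                               int simdlen, const char *params,
                               const char *base)
{
  assert (buf != NULL && params != NULL && base != NULL);
  int n = snprintf (buf, size, "_ZGV%c%c%d%s_%s",
                    isa, mask, simdlen, params, base);
  return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

With base "log" this yields _ZGVbN2v_log; with base "__log_finite" it yields _ZGVbN2v___log_finite, the name reported above.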

The corresponding libc-alpha thread is
https://sourceware.org/ml/libc-alpha/2015-06/msg00213.html


--
WBR,
Andrew


[RFC] Difference between gcc and g++ drivers

2015-06-24 Thread Andrew Senkevich
Hi,

Glibc 2.22 will implement libm.so as a linker script that links, as
needed, against the vector math library libmvec.so without adding
-lmvec (for non-static builds). In other words, -lm is enough to link
against libmvec.so.

But the g++ driver adds -lm to the linker command line, while the gcc driver does not.

This leads to different behavior of g++ and gcc (assuming Glibc 2.22 is
installed system-wide):
g++ -fopenmp -ffast-math -O1 cos.c (compiles; the driver passes -lm to the linker)
gcc -fopenmp -ffast-math -O1 cos.c (linker error because -lm is not passed)
/tmp/cclVmUxv.o: In function `main':
cos.c:(.text+0x3a): undefined reference to `_ZGVbN2v_cos'
cos.c:(.text+0x6b): undefined reference to `cos'
collect2: error: ld returned 1 exit status

cos.c:
#include <math.h>

int N = 3200;
double b[3200];
double a[3200];

int main (void)
{
  int i;

  #pragma omp simd
  for (i = 0; i < N; i += 1)
  {
    b[i] = cos (a[i]);
  }

  return (0);
}

For static builds with gcc, both options must be added in the following
order: -lmvec -lm.
For static builds with g++, only -lmvec needs to be added.

Can anybody explain this difference between the drivers?

Maybe the g++ driver needs a fix to also add -lmvec if Glibc >= 2.22 is
found at configure time?

Or is some other way needed to make the two drivers behave the same?
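The configure-time check suggested above could look roughly like the following sketch, which compiles a small C file against the host headers and reads glibc's __GLIBC__ and __GLIBC_MINOR__ macros. The function names are hypothetical and this is not how GCC's configure is written; it also assumes libmvec is only relevant on x86_64, where it first appeared in Glibc 2.22.

```c
#include <assert.h>
#include <stdio.h>  /* on glibc, standard headers pull in <features.h>,
                       which defines __GLIBC__ and __GLIBC_MINOR__ */

/* Nonzero if version major.minor is at least req_major.req_minor. */
static int version_at_least (int major, int minor, int req_major, int req_minor)
{
  assert (minor >= 0 && req_minor >= 0);
  return major > req_major || (major == req_major && minor >= req_minor);
}

/* Hypothetical configure-style probe: would the driver need to care
   about libmvec on this host (x86_64 glibc 2.22 or newer)? */
static int host_has_libmvec (void)
{
#if defined (__GLIBC__) && defined (__GLIBC_MINOR__) && defined (__x86_64__)
  return version_at_least (__GLIBC__, __GLIBC_MINOR__, 2, 22);
#else
  return 0;   /* not x86_64 glibc, so assume no libmvec */
#endif
}
```

A driver-side fix would still have to decide where to add -lmvec in the link order; the sketch only covers the version test.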


--
WBR,
Andrew


Re: [RFC] Formation of vector function name

2016-02-20 Thread Andrew Senkevich
2015-06-16 17:23 GMT+03:00 Joseph Myers :
> On Mon, 15 Jun 2015, Andrew Pinski wrote:
>
>> > results in asm redirection for log to __log_finite and final vector
>> > function name becomes _ZGVbN2v___log_finite.
>> >
>> > With point of view from C Library side, it reflects in addition of asm
>> > redirections _ZGVbN2v___log_finite = _ZGVbN2v_log in the headers.
>> >
>> > May be the cleaner way is to base vector name on original name of
>> > declaration, not asm declaration name (use DECL_NAME instead of
>> > DECL_ASSEMBLER_NAME)?
>>
>>
>> I don't think this would be useful really because if you have a
>> function say logl where you have two options of long double, you want
>> to support both you would name one logl and the other logl128 and then
>> using DECL_ASSEMBLER_NAME to form the SIMD name would be useful to use
>> the one with logl128 in it rather than logl.
>
> The point is that the vector versions may not be in one-to-one
> correspondence with the scalar versions - you might have several different
> scalar versions depending on compiler options, all of which correspond to
> a single vector version.

Can we agree to change how vector function names are formed in
future GCC versions?

The patch to fix it looks quite simple. I am going to regtest it.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d41688b..8d9de76 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -18687,7 +18687,7 @@ simd_clone_mangle (struct cgraph_node *node,
 }

   pp_underscore (&pp);
-  const char *str = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->decl));
+  const char *str = IDENTIFIER_POINTER (DECL_NAME (node->decl));
   if (*str == '*')
 ++str;
   pp_string (&pp, str);
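The effect of the patch can be modelled outside GCC. The sketch below is hypothetical and only mirrors the lines around the diff: it assumes, as the if (*str == '*') check suggests, that a user asm label such as __asm__ ("__log_finite") is stored in DECL_ASSEMBLER_NAME with a leading '*', and it skips that marker before the name is appended to the mangled vector name.

```c
#include <assert.h>
#include <string.h>   /* strcmp, for callers checking the result */

/* Models the base-name choice in simd_clone_mangle.
   decl_name: source-level name (DECL_NAME), e.g. "log"
   asm_name:  assembler name (DECL_ASSEMBLER_NAME), e.g. "*__log_finite"
              when an asm label redirected the declaration
   use_decl_name: 0 = behavior before the patch, 1 = after the patch */
static const char *simd_clone_base_name (const char *decl_name,
                                         const char *asm_name,
                                         int use_decl_name)
{
  const char *str = use_decl_name ? decl_name : asm_name;
  assert (str != NULL);
  if (*str == '*')   /* skip the "emit verbatim" marker, as in the diff */
    ++str;
  return str;
}
```

Before the patch the redirected declaration contributes "__log_finite"; after it, "log", so one vector name can serve several scalar variants.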



--
WBR,
Andrew


[RFC] Support register groups in inline asm

2016-11-15 Thread Andrew Senkevich
Hi,

The new Intel instructions AVX512_4FMAPS and AVX512_4VNNIW introduce the
use of register groups.

Supporting register groups in inline asm requires an extension with
new constraints.

The current proposal is the following syntax:

__asm__ ("SMTH %[group], %[single]" :
         [single] "+x"(v0) :
         [group] "Yg4"(v1), "1+1"(v2), "1+2"(v3), "1+3"(v4));

where the "YgN" constraint specifies a group of N consecutive registers
(starting at a register whose number is 0 mod 2^ceil(log2(N))),
and "1+K" specifies the register at offset K in the group.
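To make the alignment rule concrete, here is a small C sketch (helper names are hypothetical, not part of any GCC interface) that computes 2^ceil(log2(N)), checks whether a register number may start a "YgN" group, and finds the register referred to by a "1+K" operand. For N = 4 a group may start at register 0, 4, 8, and so on. The sketch does not model the size of the register file.

```c
#include <assert.h>

/* Smallest power of two >= n, i.e. 2^ceil(log2(n)), for small n >= 1. */
static unsigned group_alignment (unsigned n)
{
  assert (n >= 1 && n <= 1024u);
  unsigned a = 1;
  while (a < n)
    a <<= 1;
  return a;
}

/* Nonzero if a group of n registers may start at register `start`
   under the proposed rule: start == 0 mod 2^ceil(log2(n)). */
static int valid_group_start (unsigned start, unsigned n)
{
  return start % group_alignment (n) == 0;
}

/* Register number of the member at offset k (the "1+K" operand). */
static unsigned group_member (unsigned start, unsigned k)
{
  return start + k;
}
```

So a "Yg4" group placed at register 8 makes "1+3" refer to register 11, while register 6 would not be a valid start for a group of 4.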

Is this syntax OK? How should it be implemented?

Any comments or proposals will be appreciated, thanks.


--
WBR,
Andrew


Re: [RFC] Support register groups in inline asm

2016-12-05 Thread Andrew Senkevich
2016-11-16 8:02 GMT+03:00 Andrew Pinski :
> On Tue, Nov 15, 2016 at 9:36 AM, Andrew Senkevich
>  wrote:
>> Hi,
>>
>> new Intel instructions AVX512_4FMAPS and AVX512_4VNNIW introduce use
>> of register groups.
>>
>> To support register groups feature in inline asm needed some extension
>> with new constraints.
>>
>> Current proposal is the following syntax:
>>
>> __asm__ ("SMTH %[group], %[single]" :
>>          [single] "+x"(v0) :
>>          [group] "Yg4"(v1), "1+1"(v2), "1+2"(v3), "1+3"(v4));
>>
>> where "YgN" constraint specifies group of N consecutive registers
>> (which is started from register having number as "0 mod
>> 2^ceil(log2(N))"),
>> and "1+K" specifies the next registers in the group.
>>
>> Is this syntax ok? How to implement it?
>
>
> Have you looked into how AARCH64 back-end handles this via OI, etc.
> Like:
> /* Oct Int: 256-bit integer mode needed for 32-byte vector arguments.  */
> INT_MODE (OI, 32);
>
> /* Opaque integer modes for 3 or 4 Neon q-registers / 6 or 8 Neon d-registers
>    (2 d-regs = 1 q-reg = TImode).  */
> INT_MODE (CI, 48);
> INT_MODE (XI, 64);
>
>
> And then it implements TARGET_ARRAY_MODE_SUPPORTED_P. target hook?
> And the x2 types are defined as a struct of an array like:
> typedef struct int8x8x2_t
> {
>   int8x8_t val[2];
> } int8x8x2_t;

Thanks!

We have to update the proposal by changing the "+" symbol to "#" to
specify the offset in a group (to avoid overloading the other meaning of
"+", which marks an operand as both input and output).

So the currently proposed syntax is:

__asm__ ("INSTR %[group], %[single]" :
         [single] "+x"(v0) :
         [group] "Yg4"(v1), "1#1"(v2), "1#2"(v3), "1#3"(v4));

where the "YgN" constraint specifies a group of N consecutive registers
(starting at a register whose number is 0 mod 2^ceil(log2(N))),
and "1#K" specifies the register at offset K in the group.
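The "1#K" form would have to be recognized when constraints are parsed. The following is a minimal, hypothetical C sketch of such parsing (it is not GCC's constraint parser): it splits "1#K" into the operand number of the group head and the offset K, and maps the offset onto a concrete register once the head register of the "YgN" group is known.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Reads a small non-negative decimal number at *p and advances *p.
   Returns 1 on success, 0 if there is no digit or the value is too big. */
static int parse_small_uint (const char **p, int *out)
{
  const char *s = *p;
  int v = 0;
  if (!isdigit ((unsigned char) *s))
    return 0;
  while (isdigit ((unsigned char) *s))
    {
      if (v > 9999)          /* keep the sketch free of int overflow */
        return 0;
      v = v * 10 + (*s++ - '0');
    }
  *out = v;
  *p = s;
  return 1;
}

/* Parses a hypothetical "<op>#<k>" constraint such as "1#2": <op> is the
   operand number of the group head, <k> the offset inside the group.
   Returns 1 on success; on failure *op and *k may be partially written. */
static int parse_group_offset (const char *c, int *op, int *k)
{
  assert (c != NULL && op != NULL && k != NULL);
  if (!parse_small_uint (&c, op) || *c != '#')
    return 0;
  c++;
  if (!parse_small_uint (&c, k) || *c != '\0')
    return 0;
  return *k >= 1;    /* offset 0 would be the group head itself */
}

/* Register for offset k in a group of n registers starting at head_reg,
   or -1 if k lies outside the group. */
static int group_offset_register (int head_reg, int n, int k)
{
  return (k >= 1 && k < n) ? head_reg + k : -1;
}
```

Note that the old "1+3" spelling is rejected here, which is the point of switching to "#": "+" keeps its existing input/output meaning.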

Any other questions or comments?

Can we reach consensus on this syntax?


--
WBR,
Andrew


Re: Getting a build failure in glibc due to gcc changes on 32bit x86 glibc

2014-11-27 Thread Andrew Senkevich
On Wed, Nov 26, 2014 at 02:59:34PM -0800, Andrew Pinski wrote:
> This looks like the same issue as I reported before about
> check_consistency() since that is what is failing to assemble here
> too.

This is also fixed by this patch -
https://sourceware.org/ml/libc-alpha/2014-10/msg00746.html


--
WBR,
Andrew


OpenMP vector function ABI for x86_64

2014-12-26 Thread Andrew Senkevich
Hi,

During work on adding vector math functions to Glibc, discussions with
the community revealed an issue with the meaning of “#pragma omp declare
simd” (which will appear in math.h).

The issue is that there is no working way to specify the ISA of a vector
function in GCC 5.0, and hence no way to determine the exact vector
function name.

Here is a description of the exact meaning of “#pragma omp declare simd” for x86_64.

This is proposed as an agreement between compilers supporting OpenMP.

*** OpenMP vector function ABI for x86_64 ***

The name of a vector math function is based on the Intel Vector Function ABI
(http://www.cilkplus.org/sites/default/files/open_specifications/Intel-ABI-Vector-Function-2012-v0.9.5.pdf)
with a small difference in the part of the name specifying the ISA, namely
the letters b, c, d instead of x, y, Y.

#pragma omp declare simd notinbranch simdlen(2) for some function
“func” means that the name of the vector version is:

_ZGVbN2v_func (the SSE4 implementation).

#pragma omp declare simd notinbranch simdlen(4) for some function
“func” means that the following names are available:

_ZGVcN4v_func (the AVX implementation)
and
_ZGVdN4v_func (the AVX2 implementation).

Every vector function should be provided by the math library for each
supported ISA (currently SSE4, AVX and AVX2).
The semantics of those pragmas are independent of the processor for which
code is being generated.
Those pragmas must not be interpreted as meaning that versions of the
functions for other ISAs are available, even if code is being built for a
processor supporting such an ISA.
Any future ABI extension that defines additional vector function
versions will also define a different pragma to declare their
availability.
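The pairing of simdlen(2) with 'b' and simdlen(4) with 'c' and 'd' for double arguments follows from register width: XMM registers are 128 bits and YMM registers are 256 bits. A minimal C sketch of that arithmetic (the helper names are hypothetical, and the sketch does not say which instruction set extensions a given variant may use):

```c
#include <assert.h>

/* Vector register width in bits implied by the ISA letter of an x86_64
   vector function name, as proposed above; 0 if the letter is unknown. */
static int isa_vector_bits (char isa)
{
  switch (isa)
    {
    case 'b': return 128;  /* XMM registers */
    case 'c': return 256;  /* AVX: YMM registers */
    case 'd': return 256;  /* AVX2: YMM registers */
    default:  return 0;
    }
}

/* Lanes of elem_bits-wide elements per register for the given ISA letter. */
static int isa_lanes (char isa, int elem_bits)
{
  assert (elem_bits > 0);
  return isa_vector_bits (isa) / elem_bits;
}
```

For double (64-bit) this gives 2 lanes for 'b' and 4 for 'c' and 'd', matching _ZGVbN2v_func, _ZGVcN4v_func and _ZGVdN4v_func.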

*

Any feedback?


--
WBR,
Andrew


Re: gcc Digest 26 Dec 2014 16:51:42 -0000 Issue 7953

2015-01-12 Thread Andrew Senkevich
> From: Andrew Senkevich 
> To: GCC Mailing List , 
> openmp-...@dcs-maillist2.engr.illinois.edu, libc-alpha 
> 
> Cc:
> Date: Fri, 26 Dec 2014 19:51:05 +0300
> Subject: OpenMP vector function ABI for x86_64
> [...]
>
> Any feedback?

Hi Jakub,

Is this agreement OK?

The Glibc maintainers require consensus before committing the x86_64
vector math functions.


--
WBR,
Andrew


Re: gcc Digest 26 Dec 2014 16:51:42 -0000 Issue 7953

2015-01-13 Thread Andrew Senkevich
2015-01-12 19:46 GMT+03:00 Jakub Jelinek :
> On Mon, Jan 12, 2015 at 07:38:10PM +0300, Andrew Senkevich wrote:
>> > [...]
>> >
>> > Any feedback?
>>
>> is this agreement OK?
>>
>> Consensus is required to commit x86_64 vector math functions by Glibc
>> maintainer.
>
> With the difference that b stands for SSE2, not SSE4, and the fact
> that those functions do not use the __regcall calling conventions, but
> normal psABI calling conventions after replacing the arguments/return values
> with the vectors documented in the 0.9.5 pdf (and/or adding the vector mask
> arg) it describes what has been implemented, yes.

But which name should be used for SSE4?
GCC generates the same name as for SSE2, and we now have SSE4 implementations.


--
WBR,
Andrew


Re: gcc Digest 26 Dec 2014 16:51:42 -0000 Issue 7953

2015-01-13 Thread Andrew Senkevich
2015-01-13 14:28 GMT+03:00 Jakub Jelinek :
> On Tue, Jan 13, 2015 at 02:14:30PM +0300, Andrew Senkevich wrote:
>> >> Consensus is required to commit x86_64 vector math functions by Glibc
>> >> maintainer.
>> >
>> > With the difference that b stands for SSE2, not SSE4, and the fact
>> > that those functions do not use the __regcall calling conventions, but
>> > normal psABI calling conventions after replacing the arguments/return 
>> > values
>> > with the vectors documented in the 0.9.5 pdf (and/or adding the vector mask
>> > arg) it describes what has been implemented, yes.
>>
>> But which name use for SSE4?
>> Gcc generates the same as for SSE2, and we now have SSE4 implementations.
>
> You probably need to use IFUNC for that.  The problem is that the
> _Z*b* symbol can be called even in code that requires only SSE2 HW, so you
> can't assume that because somebody called you through this symbol you have
> SSE4 available.  You know you have at least SSE2 or higher available.

OK, we can add SSE2 implementations with IFUNC selection, and reflect
that in the agreement as well.
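The selection Jakub describes can be sketched as a resolver that picks between two bodies behind the same 'b' symbol. This is a portable, hypothetical sketch: the kernels are placeholders that copy their input and return an id, and the CPU feature is passed in as a flag. On an x86_64 ELF target a real library would put this logic in an IFUNC resolver (for example using GCC's ifunc attribute together with __builtin_cpu_supports ("sse4.1")) rather than in a plain function.

```c
#include <assert.h>

/* Hypothetical simdlen-2 kernel: two double lanes in, two out.  The return
   value identifies which body ran, only so the selection can be observed. */
typedef int (*kernel2_fn) (const double in[2], double out[2]);

/* Placeholder for a body that may only assume SSE2. */
static int kernel2_sse2 (const double in[2], double out[2])
{
  out[0] = in[0];
  out[1] = in[1];
  return 2;
}

/* Placeholder for a body that may use SSE4 instructions. */
static int kernel2_sse4 (const double in[2], double out[2])
{
  out[0] = in[0];
  out[1] = in[1];
  return 4;
}

/* Resolver logic: a caller of the 'b' symbol only guarantees SSE2, so the
   SSE4 body is returned only if the CPU reports SSE4 support. */
static kernel2_fn select_kernel2 (int cpu_has_sse4)
{
  return cpu_has_sse4 ? kernel2_sse4 : kernel2_sse2;
}
```

The key property is that the SSE2 body stays the fallback, so code built for plain SSE2 hardware can still call the 'b' symbol safely.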


--
WBR,
Andrew


Re: [RFC] Support register groups in inline asm

2017-03-15 Thread Andrew Senkevich
2016-12-05 16:31 GMT+01:00 Andrew Senkevich :
> 2016-11-16 8:02 GMT+03:00 Andrew Pinski :
>> [...]
>
> [...]
>
> Some other questions or comments?
>
> What about consensus on this syntax?

Hi Richard!

Can we reach agreement on this syntax? What do you think?


--
WBR,
Andrew


Re: [RFC] Support register groups in inline asm

2017-03-16 Thread Andrew Senkevich
2017-03-16 9:50 GMT+01:00 Richard Biener :
> On Wed, 15 Mar 2017, Andrew Senkevich wrote:
>
>> 2016-12-05 16:31 GMT+01:00 Andrew Senkevich :
>> > 2016-11-16 8:02 GMT+03:00 Andrew Pinski :
>> >> [...]
>> >
>> > [...]
>> >
>> > Some other questions or comments?
>> >
>> > What about consensus on this syntax?
>>
>> Hi Richard!
>>
>> Can we have agreement on this syntax, what do you think?
>
> I have no expertise / opinion here.

Hi Jeff, are you the right person to ask?


--
WBR,
Andrew


Re: Can I use -Ofast without libmvec

2018-03-22 Thread Andrew Senkevich
2018-03-22 19:08 GMT+01:00 Steve Ellcey :
> I have a question about the math vector library routines in libmvec.
> If I compile a program on x86 with -Ofast, something like:
>
> void foo(double * __restrict x, double * __restrict y, double * __restrict z)
> {
> for (int i = 0; i < 1000; i++) x[i] = sin(y[i]);
> }
>
> I get a call to the vector sin routine _ZGVbN2v_sin.  That is fine, but
> is there some way to compile with -Ofast and not use the libmvec vector
> routines?  I have tried -fopenmp, -fopenmp-simd, -fno-openmp, and -fno-
> openmp-simd and I always get a call to _ZGVbN2v_sin.  Is there anyway
> to stop the use of the vectorized calls (without turning off -Ofast)?

It looks like you have Glibc version >= 2.23 and GCC >= 6.1?
-fno-tree-loop-vectorize together with -fno-openmp may help for GCC >= 6.1.

Or build your test against a Glibc built with libmvec disabled.

(Some description of libmvec is here:
https://sourceware.org/glibc/wiki/libmvec)


--
WBR,
Andrew