[RFC, VECTOR ABI] Allow __attribute__((vector)) in GCC by default.

2015-10-05 Thread Kirill Yukhin
Hello, Recently vector ABI was introduced into GCC. Vector versions of math functions were incorporated into glibc starting from v2.22. Unfortunately, to get these functions to work, the `-fopenmp' switch must be added to the compiler invocation. This is due to the fact that vector variant of math functions gen
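
To illustrate the dependency on the OpenMP switch described above, here is a minimal sketch using a user-defined function rather than a glibc math routine. The names `scale_add` and `scale_add_array` are hypothetical. The assumption is that the compiler only honours the `declare simd` pragma when invoked with `-fopenmp` or `-fopenmp-simd`; without either switch the pragmas are ignored and only the scalar version is generated.

```c
#include <stddef.h>

/* Request vector variants of this function in addition to the scalar one.
   a and b are the same for every lane, so they are marked uniform.
   Without -fopenmp or -fopenmp-simd the pragma is ignored. */
#pragma omp declare simd uniform(a, b) notinbranch
float scale_add(float x, float a, float b)
{
    return a * x + b;
}

/* A loop the vectorizer may turn into calls to a vector variant of
   scale_add when the OpenMP SIMD pragmas are enabled. */
void scale_add_array(float *dst, const float *src, size_t n, float a, float b)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        dst[i] = scale_add(src[i], a, b);
}
```

The code computes the same values with or without the switch; only the generated code differs, which is why the missing flag is easy to overlook.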

Offload Library

2014-05-16 Thread Kirill Yukhin
Dear steering committee, To support the offloading features for Intel's Xeon Phi cards we need to add a foreign library (liboffload) into the gcc repository. README with build instructions is attached. I am also copy-pasting the header comment from one of the liboffload files. The header

Re: Offload Library

2014-05-19 Thread Kirill Yukhin
Hello Ian, On 16 May 07:07, Ian Lance Taylor wrote: > On Fri, May 16, 2014 at 4:47 AM, Kirill Yukhin > wrote: > > > > To support the offloading features for Intel's Xeon Phi cards > > we need to add a foreign library (liboffload) into the gcc repo

Re: Offload Library

2014-05-19 Thread Kirill Yukhin
Hello, Thomas! On 16 May 19:30, Thomas Schwinge wrote: > On Fri, 16 May 2014 15:47:58 +0400, Kirill Yukhin > wrote: > > To support the offloading features for Intel's Xeon Phi cards > > we need to add a foreign library (liboffload) into the gcc repository. >

Re: Offload Library

2014-05-26 Thread Kirill Yukhin
Hello, On 19 May 16:53, Kirill Yukhin wrote: > Hello Ian, > On 16 May 07:07, Ian Lance Taylor wrote: > > On Fri, May 16, 2014 at 4:47 AM, Kirill Yukhin > > wrote: > > > > > > To support the offloading features for Intel's Xeon Phi cards > >

Re: Offload Library

2014-06-24 Thread Kirill Yukhin
Hello David, On 20 Jun 14:46, David Edelsohn wrote: > On Fri, May 16, 2014 at 7:47 AM, Kirill Yukhin > wrote: > > Does this look OK? > > The GCC SC has decided to allow this library in the GCC sources. Great news, thanks! > If the library is not going to be expanded to

Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2014-11-13 Thread Kirill Yukhin
Hello, Support of OpenMP 4.0 offloading to future Xeon Phi was fully checked in to main trunk. Thanks everybody who helped w/ development and review. -- Thanks, K

Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2014-11-13 Thread Kirill Yukhin
Hi Tobias, On 13 Nov 16:15, Tobias Burnus wrote: > Kirill Yukhin wrote: > > Support of OpenMP 4.0 offloading to future Xeon Phi was > > fully checked in to main trunk. > > Thanks. If I understood it correctly: > > * GCC 5 supports code generation for Xeon Phi (

Re: Offloading GSOC 2015

2015-03-20 Thread Kirill Yukhin
Hello Güray, On 20 Mar 12:14, guray ozen wrote: > I've started to prepare my gsoc proposal for gcc's openmp for gpus. I think that there is a wide range for exploration here. As you know, OpenMP 4 contains vectorization pragmas (`pragma omp simd') which do not perfectly suit GPGPU. Another problem is

Re: loading of zeros into {x,y,z}mm registers

2017-11-30 Thread Kirill Yukhin
Hello Jan, On 29 Nov 08:59, Jan Beulich wrote: > Kirill, > > in an unrelated context I've stumbled across a change of yours > from Aug 2014 (revision 213847) where you "extend" the ways > of loading zeros into registers. I don't understand why this was > done, and the patch submission mail also do

Re: loading of zeros into {x,y,z}mm registers

2017-12-01 Thread Kirill Yukhin
Hello Richard, On 01 Dec 12:44, Richard Biener wrote: > On Fri, Dec 1, 2017 at 6:45 AM, Kirill Yukhin wrote: > > Hello Jan, > > On 29 Nov 08:59, Jan Beulich wrote: > >> Kirill, > >> > >> in an unrelated context I've stumbled across a change of yours

Defining constraint for registers tuple

2011-07-29 Thread Kirill Yukhin
Hi guys, I'm working on an implementation of `mulx` (which is part of BMI2). One of its improvements compared to generic `mul` is that it allows specifying the destination registers. For `mul` we have the `A` constraint, which stands for the AX:DX pair. So, is there a possibility to relax such a constraint and allow any p
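
For readers unfamiliar with the operation, the following sketch shows what a widening multiply such as `mulx` computes: the full 128-bit product of two unsigned 64-bit values, delivered as two separate 64-bit halves. The function name `mul_wide_u64` is hypothetical, and the sketch assumes a 64-bit target where the GCC extension `unsigned __int128` is available. It says nothing about which instruction the compiler selects.

```c
#include <stdint.h>

/* Multiply two unsigned 64-bit values and return the low half of the
   128-bit product; the high half is stored through *hi.
   The two halves are independent outputs, which is the property that a
   register-pair constraint has to describe. */
uint64_t mul_wide_u64(uint64_t a, uint64_t b, uint64_t *hi)
{
    unsigned __int128 product = (unsigned __int128)a * b;
    *hi = (uint64_t)(product >> 64);
    return (uint64_t)product;
}
```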

Re: Defining constraint for registers tuple

2011-08-01 Thread Kirill Yukhin
> Don't change the constraint, just add an alternative.  Or use a > different insn with an insn predicate. This is a misunderstanding because of my great English :) I am not going to update the existing constraint. I am going to implement a new one. Actually, I am looking for some example, where similar

Re: Defining constraint for registers tuple

2011-08-16 Thread Kirill Yukhin
"imul") (set_attr "length_immediate" "0") (set_attr "mode" "")]) Maybe there are examples from other ports? Any help is appreciated. Thanks, K On Mon, Aug 1, 2011 at 4:28 PM, Kirill Yukhin wrote: >> Don't change the constraint, just

Re: define_split for specific split pass

2011-08-16 Thread Kirill Yukhin
I think Ilya wants to run his pass, say, in 208r.split4 only. It seems split2, split3 and split4 all run with `reload_completed` set to true. Any ideas? -- Thanks, K On Tue, Aug 16, 2011 at 8:47 PM, Andrew Pinski wrote: > On Tue, Aug 16, 2011 at 6:32 AM, Ilya Enkovich wrote: >> Hello, >> >

Re: Defining constraint for registers tuple

2011-08-16 Thread Kirill Yukhin
That is exactly it! Thank you very much! BMI2 support is almost here :) -- K On Tue, Aug 16, 2011 at 6:58 PM, Richard Henderson wrote: > On 08/16/2011 04:20 AM, Kirill Yukhin wrote: >> Hi guys, >> the question is still opened. Let me try to explain further. >> >>

GCC testing infrastructure issue

2011-09-28 Thread Kirill Yukhin
Hi folks, I have a question. For DejaGNU we have only one option for each test. It may be e.g. either "dg-do compile" or "dg-do run". This is not really suitable. For instance, we are checking some new instruction autogeneration. We have to do 2 tests: 1. We have to write some routine which will co

Re: GCC testing infrastructure issue

2011-09-28 Thread Kirill Yukhin
Thanks a lot. That is exactly what I was looking for! K On Wed, Sep 28, 2011 at 2:49 PM, Richard Guenther wrote: > On Wed, Sep 28, 2011 at 12:18 PM, Kirill Yukhin > wrote: >> Hi folks, >> I have a question. For DejaGNU we have only one option for each test. >> >>

Re: _mm{,256}_i{32,64}gather_{ps,pd,epi32,epi64} intrinsics semantics

2011-11-02 Thread Kirill Yukhin
Hi Jakub, Actually I did not get the point. If we have no src/masking, the destination must be unchanged until the gather writes to it (at least partially). If we have all 1's in the mask, src must not be changed at all. So, nullification in the intrinsics is just useless. Having such snippet: (1) vmovdqa k(

Re: _mm{,256}_i{32,64}gather_{ps,pd,epi32,epi64} intrinsics semantics

2011-11-03 Thread Kirill Yukhin
> %ymm0 is all ones (this is code from the auto-vectorization). > (2) is not useless, %ymm6 contains the mask, for auto-vectorization > (3) is useless, it is there just because the current gather insn patterns > always use the previous value of the destination register. Sure, I am constantly mix In

Re: _mm{,256}_i{32,64}gather_{ps,pd,epi32,epi64} intrinsics semantics

2011-11-05 Thread Kirill Yukhin
Hello Jakub, I've talked to our engineers who work on vectorization in ICC. They all said, "yes you can optimize vpxor out both in f1 and f2". Thanks, K

Vectorizer question: DIV to RSHIFT conversion

2011-12-13 Thread Kirill Yukhin
Hi guys, While looking at Spec2006/401.bzip2 I found such a loop: for (i = 1; i <= alphaSize; i++) { j = weight[i] >> 8; j = 1 + (j / 2); weight[i] = j << 8; } Which is not vectorizable (using Intel's AVX2) because division by two is not recognized as rshift: 5: ==> exa

Re: Vectorizer question: DIV to RSHIFT conversion

2011-12-13 Thread Kirill Yukhin
The full case is attached. Jakub, you are right, we have to convert signed ints into something a bit more tricky. BTW, here is the output for that case from the Intel compiler: vpxor %ymm1, %ymm1, %ymm1 #184.23 vmovdqu .L_2il0floatpacket.12(%rip), %ymm0
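
The reason signed operands need "something a bit more tricky" can be sketched in plain C. Signed division by two truncates toward zero, whereas an arithmetic right shift rounds toward negative infinity, so the two differ for negative odd values. Adding the sign bit before shifting reconciles them. The function name `div2_via_shift` is hypothetical, and the sketch assumes that `>>` on a negative `int32_t` is an arithmetic shift, as it is with GCC; the C standard leaves this implementation-defined. This is an illustration of the arithmetic, not the pattern recognizer that was later written.

```c
#include <stdint.h>

/* Compute j / 2 for signed j using only an add and shifts.
   sign is 1 for negative j and 0 otherwise, so negative values are
   bumped by one before the shift, turning floor rounding into
   truncation toward zero. */
int32_t div2_via_shift(int32_t j)
{
    int32_t sign = (int32_t)((uint32_t)j >> 31);  /* logical shift: 0 or 1 */
    return (j + sign) >> 1;                       /* assumes arithmetic shift */
}
```

For unsigned operands the correction term is always zero, which is why a plain shift suffices there.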

Re: Vectorizer question: DIV to RSHIFT conversion

2011-12-13 Thread Kirill Yukhin
Great! Thanks, K > > Let me hack up a quick pattern recognizer for this... > >        Jakub

Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512

2013-07-30 Thread Kirill Yukhin
On Wed, Jul 24, 2013 at 08:25:14AM -1000, Richard Henderson wrote: > On 07/24/2013 05:23 AM, Richard Biener wrote: > > "H.J. Lu" wrote: > > > >> Hi, > >> > >> Here is a patch to extend x86-64 psABI to support AVX-512: > > > > Afaik avx 512 doubles the amount of xmm registers. Can we get them cal

Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512

2013-08-02 Thread Kirill Yukhin
On 30 Jul 17:55, Kirill Yukhin wrote: > On Wed, Jul 24, 2013 at 08:25:14AM -1000, Richard Henderson wrote: > > On 07/24/2013 05:23 AM, Richard Biener wrote: > > > "H.J. Lu" wrote: > > > > > >> Hi, > > >> > > >> Here is a pa

Re: [RFC] Offloading Support in libgomp

2013-09-13 Thread Kirill Yukhin
Hello, Adding Richard who might want to take a look at LTO stuff. -- Thanks, K

[gomp4] GOMP_target fall back execution

2013-09-18 Thread Kirill Yukhin
Hello, It seems that currently GOMP_target performs a call to the host variant of the routine: void GOMP_target (int device, void (*fn) (void *), const char *fnname, size_t mapnum, void **hostaddrs, size_t *sizes, unsigned char *kinds) { device = resolve_device (device); if
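
The idea of falling back to the host variant can be sketched independently of libgomp. Everything below is hypothetical: `sketch_resolve_device`, `sketch_target` and `sketch_region` are illustrative names, not libgomp functions, and the stub device lookup simply reports that no offload device exists. The sketch only shows the control flow of calling the host function with the host addresses when offloading is not possible.

```c
#include <stddef.h>

/* Hypothetical device lookup: always reports "no device" (-1).
   This is an assumption for illustration, not libgomp's logic. */
static int sketch_resolve_device(int device)
{
    (void)device;
    return -1;
}

/* Run a target region. Returns 1 if the host fallback was used and
   0 if the region would have been offloaded. */
int sketch_target(int device, void (*fn)(void *), void **hostaddrs)
{
    device = sketch_resolve_device(device);
    if (device == -1) {
        fn(hostaddrs);  /* host fallback: call the host variant directly */
        return 1;
    }
    return 0;           /* a real runtime would dispatch to the device here */
}

/* Example region body: increments the int that the first address points to. */
void sketch_region(void *data)
{
    void **addrs = data;
    *(int *)addrs[0] += 1;
}
```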

[gomp4] Building binaries for offload.

2013-10-15 Thread Kirill Yukhin
Hello, Let me somewhat summarize the current understanding of host binary linking as well as target binary building/linking. We put code which is supposed to be offloaded into dedicated sections, with names starting with gnu.target_lto_. At link time (I mean, link time of host app): 1. Generate dedicated

setjmp () detection in RTL

2013-02-14 Thread Kirill Yukhin
Hi, Could anybody pls advise, if I can detect that given RTL `call` is actually a setjmp ()? I see no references in dump... (call_insn 6 5 7 (set (reg:SI 0 ax) (call (mem:QI (symbol_ref:DI ("_setjmp") [flags 0x41] ) [0 _setjmp S1 A8]) (const_int 0 [0]))) 4.c:17 -1 (expr_li

Re: setjmp () detection in RTL

2013-02-14 Thread Kirill Yukhin
> Isn't the REG_SETJMP note sufficient for this purpose? Yeah, missed that. Sorry for flood. Thanks a lot!