Hello,
Recently, a vector ABI was introduced into GCC.
Vector versions of math functions were incorporated into glibc
starting from v2.22.
Unfortunately, to get these functions to work, the `-fopenmp'
switch must be added to the compiler invocation. This is due to the fact that
vector variant of math functions gen
Dear steering committee,
To support the offloading features for Intel's Xeon Phi cards
we need to add a foreign library (liboffload) into the gcc repository.
A README with build instructions is attached.
I am also copy-pasting the header comment from one of the liboffload files.
The header
Hello Ian,
On 16 May 07:07, Ian Lance Taylor wrote:
> On Fri, May 16, 2014 at 4:47 AM, Kirill Yukhin
> wrote:
> >
> > To support the offloading features for Intel's Xeon Phi cards
> > we need to add a foreign library (liboffload) into the gcc repo
Hello, Thomas!
On 16 May 19:30, Thomas Schwinge wrote:
> On Fri, 16 May 2014 15:47:58 +0400, Kirill Yukhin
> wrote:
> > To support the offloading features for Intel's Xeon Phi cards
> > we need to add a foreign library (liboffload) into the gcc repository.
>
Hello,
On 19 May 16:53, Kirill Yukhin wrote:
> Hello Ian,
> On 16 May 07:07, Ian Lance Taylor wrote:
> > On Fri, May 16, 2014 at 4:47 AM, Kirill Yukhin
> > wrote:
> > >
> > > To support the offloading features for Intel's Xeon Phi cards
> >
Hello David,
On 20 Jun 14:46, David Edelsohn wrote:
> On Fri, May 16, 2014 at 7:47 AM, Kirill Yukhin
> wrote:
> > Does this look OK?
>
> The GCC SC has decided to allow this library in the GCC sources.
Great news, thanks!
> If the library is not going to be expanded to
Hello,
Support for OpenMP 4.0 offloading to future Xeon Phi has been fully checked in to the main
trunk.
Thanks to everybody who helped with development and review.
--
Thanks, K
Hi Tobias,
On 13 Nov 16:15, Tobias Burnus wrote:
> Kirill Yukhin wrote:
> > Support of OpenMP 4.0 offloading to future Xeon Phi was
> > fully checked in to main trunk.
>
> Thanks. If I understood it correctly:
>
> * GCC 5 supports code generation for Xeon Phi (
Hello Güray,
On 20 Mar 12:14, guray ozen wrote:
> I've started to prepare my gsoc proposal for gcc's openmp for gpus.
I think there is a wide range for exploration here. As you know, OpenMP 4
contains vectorization pragmas (`pragma omp simd') which do not perfectly
suit GPGPU.
Another problem is
Hello Jan,
On 29 Nov 08:59, Jan Beulich wrote:
> Kirill,
>
> in an unrelated context I've stumbled across a change of yours
> from Aug 2014 (revision 213847) where you "extend" the ways
> of loading zeros into registers. I don't understand why this was
> done, and the patch submission mail also do
Hello Richard,
On 01 Dec 12:44, Richard Biener wrote:
> On Fri, Dec 1, 2017 at 6:45 AM, Kirill Yukhin wrote:
> > Hello Jan,
> > On 29 Nov 08:59, Jan Beulich wrote:
> >> Kirill,
> >>
> >> in an unrelated context I've stumbled across a change of yours
Hi guys,
I'm working on an implementation of `mulx` (which is part of BMI2). One
of its improvements compared to the generic `mul` is that it allows
specifying the destination registers.
For `mul` we have the `A` constraint, which stands for the AX:DX pair.
So, is there a possibility to relax such a constraint and allow any p
> Don't change the constraint, just add an alternative. Or use a
> different insn with an insn predicate.
This is a misunderstanding because of my great English :)
I am not going to update the existing constraint. I am going to implement a new one.
Actually, I am looking for some example, where similar
" "imul")
(set_attr "length_immediate" "0")
(set_attr "mode" "")])
Maybe there are examples from other ports? Any help is appreciated.
Thanks, K
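For readers unfamiliar with the operation discussed in this thread, here is a minimal sketch in C of the widening multiply that `mul` and `mulx` implement. It assumes GCC's `unsigned __int128` extension on a 64-bit target, and the function name `mul_64x64` is made up for illustration. Whether the compiler selects `mul` or `mulx` for it depends on the compiler version and on flags such as `-mbmi2`.

```c
#include <stdint.h>

/* Full 64x64 -> 128-bit unsigned multiply.
   Returns the low 64 bits of the product and stores the high 64 bits
   in *hi.  The name mul_64x64 is hypothetical, and unsigned __int128
   is a GCC extension, not standard C.  */
static uint64_t
mul_64x64 (uint64_t a, uint64_t b, uint64_t *hi)
{
  unsigned __int128 p = (unsigned __int128) a * b;

  *hi = (uint64_t) (p >> 64);   /* upper half of the product */
  return (uint64_t) p;          /* lower half of the product */
}
```

With `mul` both halves of the result land in a fixed register pair, which is what the `A` constraint describes; the destination registers of `mulx` are not fixed, hence the question about a more relaxed constraint.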
On Mon, Aug 1, 2011 at 4:28 PM, Kirill Yukhin wrote:
>> Don't change the constraint, just
I think Ilya wants to run his pass only in, say, 208r.split4. It seems
that split2, split3 and split4 all run with `reload_completed` set to
true.
Any ideas?
--
Thanks, K
On Tue, Aug 16, 2011 at 8:47 PM, Andrew Pinski wrote:
> On Tue, Aug 16, 2011 at 6:32 AM, Ilya Enkovich wrote:
>> Hello,
>>
>
That is exactly it! Thank you very much!
BMI2 support is almost here :)
--
K
On Tue, Aug 16, 2011 at 6:58 PM, Richard Henderson wrote:
> On 08/16/2011 04:20 AM, Kirill Yukhin wrote:
>> Hi guys,
>> the question is still opened. Let me try to explain further.
>>
>>
Hi folks,
I have a question. For DejaGnu we have only one option for each test.
It may be e.g. either "dg-do compile" or "dg-do run". This is not
really suitable.
For instance, suppose we are checking autogeneration of some new instruction. We have
to do 2 tests:
1. We have to write some routine which will co
Thanks a lot. That is exactly what I was looking for!
K
On Wed, Sep 28, 2011 at 2:49 PM, Richard Guenther
wrote:
> On Wed, Sep 28, 2011 at 12:18 PM, Kirill Yukhin
> wrote:
>> Hi folks,
>> I have a question. For DejaGNU we have only one option for each test.
>>
>>
Hi Jakub,
Actually, I did not get the point.
If we have no src/masking, the destination must stay unchanged until the gather
writes to it (at least partially).
If the mask is all 1's, src must not be changed at all.
So, nullification in the intrinsics is just useless.
Given such a snippet:
(1) vmovdqa k(
> %ymm0 is all ones (this is code from the auto-vectorization).
> (2) is not useless, %ymm6 contains the mask, for auto-vectorization
> (3) is useless, it is there just because the current gather insn patterns
> always use the previous value of the destination register.
Sure, I constantly mix In
Hello Jakub,
I've talked to our engineers who work on vectorization in ICC.
They all said, "yes, you can optimize vpxor out both in f1 and f2".
Thanks, K
Hi guys,
While looking at SPEC2006/401.bzip2 I found this loop:
  for (i = 1; i <= alphaSize; i++) {
    j = weight[i] >> 8;
    j = 1 + (j / 2);
    weight[i] = j << 8;
  }
which is not vectorizable (using Intel's AVX2) because division by two
is not recognized as a right shift:
5: ==> exa
The full test case is attached.
Jakub, you are right, we have to convert signed ints into something a
bit more tricky.
BTW, here is the output for those cases from the Intel compiler:
vpxor %ymm1, %ymm1, %ymm1 #184.23
vmovdqu .L_2il0floatpacket.12(%rip), %ymm0
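To illustrate why signed division by two needs "something a bit more tricky" than a plain right shift, here is a minimal sketch in C. The function names are made up for illustration, and it assumes that `>>` on a negative `int32_t` is an arithmetic shift, which is implementation-defined in C but is what GCC does. It is not the pattern the vectorizer actually emits, only the underlying arithmetic.

```c
#include <stdint.h>

/* Plain arithmetic right shift: rounds toward negative infinity,
   so it differs from j / 2 for odd negative j.  */
static int32_t
shift_only (int32_t j)
{
  return j >> 1;
}

/* Shift with a sign fix-up: add 1 to negative inputs before shifting,
   so the result rounds toward zero exactly like j / 2.
   The addition cannot overflow because bias is 1 only when j < 0.  */
static int32_t
div2_by_shift (int32_t j)
{
  int32_t bias = (int32_t) ((uint32_t) j >> 31);  /* 1 if j < 0, else 0 */

  return (j + bias) >> 1;
}
```

For non-negative `j` both functions agree with `j / 2`; for `j = -3` the plain shift gives -2 while `j / 2` and the fixed-up shift give -1.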
Great!
Thanks, K
>
> Let me hack up a quick pattern recognizer for this...
>
> Jakub
On Wed, Jul 24, 2013 at 08:25:14AM -1000, Richard Henderson wrote:
> On 07/24/2013 05:23 AM, Richard Biener wrote:
> > "H.J. Lu" wrote:
> >
> >> Hi,
> >>
> >> Here is a patch to extend x86-64 psABI to support AVX-512:
> >
> > Afaik avx 512 doubles the amount of xmm registers. Can we get them cal
On 30 Jul 17:55, Kirill Yukhin wrote:
> On Wed, Jul 24, 2013 at 08:25:14AM -1000, Richard Henderson wrote:
> > On 07/24/2013 05:23 AM, Richard Biener wrote:
> > > "H.J. Lu" wrote:
> > >
> > >> Hi,
> > >>
> > >> Here is a pa
Hello,
Adding Richard who might want to take a look at LTO stuff.
--
Thanks, K
Hello,
It seems that GOMP_target currently calls the host variant of the routine:
void
GOMP_target (int device, void (*fn) (void *), const char *fnname,
             size_t mapnum, void **hostaddrs, size_t *sizes,
             unsigned char *kinds)
{
  device = resolve_device (device);
  if
Hello,
Let me summarize the current understanding of
host binary linking as well as target binary building/linking.
We put the code that is supposed to be offloaded into dedicated sections,
with names starting with gnu.target_lto_
At link time (I mean, link time of host app):
1. Generate dedicated
Hi,
Could anybody please advise whether I can detect that a given RTL `call` is
actually a call to setjmp ()?
I see no references in dump...
(call_insn 6 5 7 (set (reg:SI 0 ax)
(call (mem:QI (symbol_ref:DI ("_setjmp") [flags 0x41]
) [0 _setjmp S1 A8])
(const_int 0 [0]))) 4.c:17 -1
(expr_li
> Isn't the REG_SETJMP note sufficient for this purpose?
Yeah, I missed that. Sorry for the flood. Thanks a lot!