[Bug tree-optimization/56741] New: Why not to perform 128-bit vector iteration when vectorizing loop by 256-bit

2013-03-26 Thread kirill.yukhin at intel dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56741



 Bug #: 56741

   Summary: Why not to perform 128-bit vector iteration when vectorizing loop by 256-bit

Classification: Unclassified

   Product: gcc

   Version: 4.9.0

Status: UNCONFIRMED

  Severity: normal

  Priority: P3

 Component: tree-optimization

AssignedTo: unassig...@gcc.gnu.org

ReportedBy: kirill.yuk...@intel.com





Created attachment 29730

  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29730

Reproducer



Hi guys,

Suppose we vectorize a loop with AVX[2].

E.g.:

do i=0..N-1, ++i

  stmt [i];

enddo



If vectorization is allowed & possible we'll have something like



rem = N % VL /* VL is vector length.  */

/* Vectorized loop.  */

do i=0..N-rem-1, i+=VL

  v_stmt [i..i+VL];

enddo



/* Remainder.  */

do j=0..rem-1, ++j

  stmt [j+i];

enddo



The remainder may be unrolled, if allowed.



For 128-bit vectors, the remainder loop runs at most 3 iterations for floats

and 1 for doubles.



For 256-bit vectors, the maximum is 7 and 3 iterations, respectively.
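
The worst-case counts above follow from the number of lanes per vector: a loop vectorized by VL lanes leaves at most VL - 1 scalar iterations. A minimal C sketch of that arithmetic, assuming 32-bit floats and 64-bit doubles (the helper name `max_remainder_iters` is illustrative only, not part of GCC):

```c
/* Worst-case scalar remainder for a loop vectorized with vectors of
   vector_bits bits over elements of elem_bits bits.  */
static unsigned
max_remainder_iters (unsigned vector_bits, unsigned elem_bits)
{
  unsigned vl = vector_bits / elem_bits;  /* Lanes per vector.  */
  return vl - 1;                          /* N % VL is at most VL - 1.  */
}
```

With 128-bit vectors this gives 3 (float) and 1 (double); with 256-bit vectors, 7 and 3.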



The attached test shows a 30% increase in instruction count because of the

higher maximum iteration count of the loop remainder.



Why not add, for AVX[2], one iteration on 128-bit registers, leaving at most

3 and 1 iterations in the remainder?



Like this (necessary checks are omitted):



rem_1 = N % VL1 /* VL1 is widest vector length - 256-bit.  */

/* Vectorized loop.  */

do i=0..N-rem_1-1, i+=VL1

  v1_stmt[i..i+VL1]; /* Vectorized with 256-bit vector.  */

enddo



/* Additional iteration.  */

v2_stmt [i..(i+VL2)]; /* Vectorized with 128-bit vector.  */

i += VL2;



rem_2 = rem_1-VL2; /* VL2 is narrow vector length - 128-bit.  */



/* Remainder.  */

do j=0..rem_2-1, ++j

  stmt[j+i];

enddo
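
The scheme proposed above can be sketched in plain C, including the check that the pseudocode omits (the 128-bit step runs only if at least VL2 elements are left). This is an illustration, not vectorizer output: the fixed-trip-count inner loops stand in for single 256-bit and 128-bit vector statements, and `add_arrays`, `VL1`, and `VL2` are names chosen for this example.

```c
#include <stddef.h>

enum { VL1 = 8, VL2 = 4 };  /* Floats per 256-bit and per 128-bit vector.  */

/* a[i] += b[i] for i in [0, n).  */
void
add_arrays (float *a, const float *b, size_t n)
{
  size_t i = 0;

  /* Main loop: one "256-bit" statement per iteration.  */
  for (; i + VL1 <= n; i += VL1)
    for (size_t k = 0; k < VL1; k++)
      a[i + k] += b[i + k];

  /* Additional iteration: at most one "128-bit" statement.  */
  if (i + VL2 <= n)
    {
      for (size_t k = 0; k < VL2; k++)
        a[i + k] += b[i + k];
      i += VL2;
    }

  /* Scalar remainder: at most VL2 - 1 = 3 iterations.  */
  for (; i < n; i++)
    a[i] += b[i];
}
```

For n = 15, the main loop handles 8 elements, the extra 128-bit step handles 4, and only 3 scalar iterations remain instead of 7.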





Here is how to reproduce:

$ gcc -static -m64 -fstrict-aliasing -fno-prefetch-loop-arrays -Ofast -funroll-loops -fwhole-program -msse4 ./loop_vers.c -o loop_sse



$ gcc -static -m64 -fstrict-aliasing -fno-prefetch-loop-arrays -Ofast -funroll-loops -fwhole-program -mavx ./loop_vers.c -o loop_avx



$ sde -icount -- ./loop_sse 7

0.00$$ TID: 0 ICOUNT: 16001317



$ sde -icount -- ./loop_avx 7

0.00$$ TID: 0 ICOUNT: 20847322


[Bug target/54564] [4.8 Regression] Broken __builtin_ia32_vfmadds[sd]3

2012-09-13 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54564

--- Comment #3 from Yukhin Kirill  2012-09-13 
11:57:35 UTC ---
Fails also occur on real HW.


[Bug tree-optimization/58137] New: [trunk, ICE] full unroll + AVX2 vectorization

2013-08-12 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58137

Bug ID: 58137
   Summary: [trunk, ICE] full unroll + AVX2 vectorization
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kirill.yukhin at intel dot com

Created attachment 30635
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30635&action=edit
Reproducer

Hello, the attached test produces an ICE when compiled as
$ gcc -S -O3 1.c -mavx2

It seems that full unroll or copyprop (or whatever) introduces something wrong.

1.c: In function 'more_xrv':
1.c:23:1: error: type mismatch in pointer plus expression
 more_xrv(void)
 ^
struct XRV *

struct XRV *

struct XRV *

vect_vec_iv_.15_88 = vect_cst_.13_60 + { 64B, 64B, 64B, 64B };
1.c:23:1: error: type mismatch in pointer plus expression
struct XRV *

struct XRV *

struct XRV *

...


[Bug tree-optimization/58137] [trunk, ICE] full unroll + AVX2 vectorization

2013-08-12 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58137

--- Comment #1 from Yukhin Kirill  ---
Actually, this case came up while debugging Spec2000's perl workload with the
AVX-512 changes (with a bigger trip count).


[Bug tree-optimization/58137] [trunk, ICE] full unroll + AVX2 vectorization

2013-08-13 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58137

--- Comment #4 from Yukhin Kirill  ---
> Could some one check if the generated code is now correct ?
Patch works both on attached AVX2 testcase and on root AVX-512 issue, thanks.

I think it should be submitted to gcc-patches.


[Bug middle-end/50074] [4.7 Regression] gcc.dg/sibcall-6.c execution test on x86_64-apple-darwin10

2011-08-16 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50074

--- Comment #8 from Yukhin Kirill  2011-08-16 
08:48:21 UTC ---
Hi,
I agree, this is a performance regression. The fix to tail-call optimization made
it very conservative. With some additional tweaks, we may be able to relax it.
However, my fix cured a stability problem (see PR 49519 for details).


[Bug rtl-optimization/50107] New: [IRA, i386] allocates regiters in very non-optimal way

2011-08-17 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

 Bug #: 50107
   Summary: [IRA, i386] allocates regiters in very non-optimal way
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: kirill.yuk...@intel.com


Created attachment 25032
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25032
Patch, enabling MULX insn

Hi,
I am working on enabling the new MULX instruction in GCC.
It relaxes the generic unsigned multiply in two ways: no flags are clobbered, and
(the main one) the destination may be any two GPRs.

The patch is attached along with a testcase.

The problem is that this relaxation leads to useless spills/fills.
Command line is:
./build-x86_64-linux/gcc/xgcc -B./build-x86_64-linux/gcc test.c -S -Ofast
Here is assembly with MULX:
test_mul_64:
.LFB0:
        movq    %rdi, %rdx
        pushq   %rbx              <
        mulx    %rsi, %rbx, %rcx
        addq    $3, %rcx
        adcq    $0, %rbx
        movq    %rcx, %rax
        movq    %rcx, k2(%rip)
        movq    %rbx, %rdx        <
        movq    %rbx, k2+8(%rip)
        popq    %rbx              <
        ret

You can see that if we replace the rbx usage with rdx, the instructions marked
with arrows will disappear.
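
The attached testcase is not reproduced in this message. Judging only from the assembly above (a 64x64-bit multiply, an add of 3 with carry, a store to `k2`, and a 128-bit value returned in rax:rdx), a source of roughly the following shape would produce such code. This is a guess for illustration, not the actual attachment; `unsigned __int128` is a GCC extension.

```c
#include <stdint.h>

unsigned __int128 k2;

/* Widening 64x64 -> 128-bit unsigned multiply, plus a constant, stored to a
   global and returned.  The multiply is what MULX would implement.  */
unsigned __int128
test_mul_64 (uint64_t a, uint64_t b)
{
  k2 = (unsigned __int128) a * b + 3;
  return k2;
}
```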

Maybe the problem is connected with my definition of MULX?
But it looks to me like an IRA misoptimization.

BTW, r8, r9 etc. are caller-saved registers, so we could just use them without
saving to the stack. Why doesn't IRA do that?

Thanks, K


[Bug rtl-optimization/50107] [IRA, i386] allocates regiters in very non-optimal way

2011-08-17 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

--- Comment #1 from Yukhin Kirill  2011-08-17 
13:41:58 UTC ---
Created attachment 25033
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25033
Testcase


[Bug target/50155] [4.7 Regression] AVX2 support broke -mavx

2011-08-22 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50155

--- Comment #1 from Yukhin Kirill  2011-08-22 
18:52:40 UTC ---
Hi,
thanks for the investigation.
Here is a patch:
http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01808.html

K


[Bug testsuite/50185] [4.7 Regression] Bad AVX2 tests

2011-08-26 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50185

Yukhin Kirill  changed:

   What|Removed |Added

URL||http://gcc.gnu.org/ml/gcc-patches/2011-08/msg02137.html

--- Comment #2 from Yukhin Kirill  2011-08-26 
11:46:23 UTC ---
Thanks for finding that.


[Bug testsuite/50185] [4.7 Regression] Bad AVX2 tests

2011-08-26 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50185

--- Comment #4 from Yukhin Kirill  2011-08-26 
12:04:59 UTC ---
(In reply to comment #3)
> I don't think using -dp and matching insn names is a good approach, any time
> you macroize the insns or rename you'll need to adjust the tests.
> You can try to match the insn name followed by spaces/tabs followed by the
> first operand...

Thanks, I've updated the patch (see ML).


[Bug tree-optimization/50480] New: 10% performance regression on Spec2006 410.bwaves

2011-09-22 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50480

 Bug #: 50480
   Summary: 10% performance regression on Spec2006 410.bwaves
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: kirill.yuk...@intel.com


Hi,
Richard recently fixed http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49957
According to my measurements, the fix for that bug caused the following changes
(on Spec2006):

For SandyBridge CPU:
* 410.bwaves degradation is -9.54% for peak32
* 410.bwaves degradation is -6.91% for base32
* 410.bwaves improvement is 1.00% for peak64
* 410.bwaves improvement is 0.91% for base64

For Corei7 CPU:
* 410.bwaves degradation is -3.91% for peak32
* 410.bwaves degradation is -3.91% for base32
* 410.bwaves improvement is 1.94% for peak64
* 410.bwaves improvement is 3.23% for base64

For AMD (Phenom(tm) II X3 B75) CPU:
* 410.bwaves degradation is -7.32% for peak32
* 410.bwaves degradation is -6.56% for base32
* 410.bwaves improvement is 2.01% for peak64
* 410.bwaves degradation is -1.34% for base64


[Bug tree-optimization/50480] 10% performance regression on Spec2006 410.bwaves

2011-09-22 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50480

--- Comment #1 from Yukhin Kirill  2011-09-22 
10:00:34 UTC ---
Checkin URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=177368


[Bug tree-optimization/50480] 10% performance regression on Spec2006 410.bwaves

2011-09-22 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50480

--- Comment #2 from Yukhin Kirill  2011-09-22 
10:33:06 UTC ---
Here are the option set details:
base=-static -O2 -ffast-math ("-m32 -msse2 -mfpmath=sse" if 32 bit mode)
peak=-static -O3 -funroll-loops -ffast-math ("-m32 -msse2 -mfpmath=sse" if 32
bit mode)

For SandyBridge: += "-mavx -march=corei7" 
For Core i7: += "-march=corei7" 
For AMD: += "-march=amdfam10" (not sure this is the best)


[Bug tree-optimization/50480] 10% performance regression on Spec2006 410.bwaves

2011-09-27 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50480

--- Comment #4 from Yukhin Kirill  2011-09-27 
08:31:35 UTC ---
(In reply to comment #3)
> For 32bit only it seems.  Supposedly a cost model issue, the register pressure
> will be higher and we have only half the number of SSE regs.

Richard, what might be wrong with the cost model? If you increase live ranges
and have fewer registers (the 32-bit case), register pressure will obviously
increase and degrade performance. But again, how is that connected with the cost
model?


[Bug bootstrap/50543] New: Bootstrap fails to build for latest 4.6.0

2011-09-27 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50543

 Bug #: 50543
   Summary: Bootstrap fails to build for latest 4.6.0
Classification: Unclassified
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: kirill.yuk...@intel.com


Hi,
I am not sure, but possibly this will duplicate some other bug.

Bootstrap miscompares on latest 4.6.0 sources:


$ ../gcc/configure i686-linux --with-arch=corei7 --with-cpu=corei7
--enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld
--enable-cloog-backend=isl --with-fpmath=sse --prefix=$PREFIX
--enable-languages=c,c++,fortran
$ make -j24
...
make[2]: Entering directory `/export/users/kyukhin/ws_ref/4.6_build'
make[3]: Entering directory `/export/users/kyukhin/ws_ref/4.6_build'
rm -f stage_current
make[3]: Leaving directory `/export/users/kyukhin/ws_ref/4.6_build'
Comparing stages 2 and 3
warning: gcc/cc1plus-checksum.o differs
warning: gcc/cc1-checksum.o differs
Bootstrap comparison failure!
gcc/tree-ssa-loop-im.o differs
gcc/loop-iv.o differs
gcc/ira-build.o differs
gcc/reload.o differs
gcc/gcov.o differs
gcc/tree-vect-stmts.o differs
gcc/tree-vrp.o differs
gcc/reload1.o differs
gcc/bb-reorder.o differs
gcc/real.o differs
gcc/ira.o differs
gcc/dwarf2out.o differs
gcc/tree-ssa-loop-prefetch.o differs
gcc/fold-const.o differs
gcc/ira-emit.o differs
gcc/cfgexpand.o differs
gcc/build/genautomata.o differs
gcc/omega.o differs
gcc/ira-conflicts.o differs
gcc/gcc.o differs
gcc/store-motion.o differs
gcc/ipa-split.o differs
gcc/sel-sched.o differs
gcc/plugin.o differs
gcc/tree-predcom.o differs
gcc/driver-i386.o differs
gcc/tree.o differs
libcpp/directives.o differs
libcpp/charset.o differs
libcpp/traditional.o differs
libcpp/expr.o differs
libdecnumber/bid2dpd_dpd2bid.o differs
libiberty/sha1.o differs
libiberty/pic/sha1.o differs
lto-plugin/.libs/lto-plugin.o differs
make[2]: *** [compare] Error 1
make[2]: Leaving directory `/export/users/kyukhin/ws_ref/4.6_build'
make[1]: *** [stage3-bubble] Error 2
make[1]: Leaving directory `/export/users/kyukhin/ws_ref/4.6_build'
make: *** [all] Error 2


[Bug bootstrap/50543] Bootstrap fails to build for latest 4.6.0

2011-09-27 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50543

--- Comment #2 from Yukhin Kirill  2011-09-27 
16:17:15 UTC ---
(In reply to comment #1)
> what do you mean latest 4.6.0?  the 4.6.0 release, or the latest sources on 
> the
> 4.6 branch? (which will become 4.6.2)

Latest sources. Sorry for the confusion.


[Bug bootstrap/50543] Bootstrap fails to build for latest 4.6

2011-09-28 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50543

--- Comment #5 from Yukhin Kirill  2011-09-28 
07:30:46 UTC ---
(In reply to comment #4)
> I have no problem with
> 
> /export/gnu/import/git/gcc-release/configure --enable-clocale=gnu
> --with-system-zlib --with-demangler-in-ld --enable-languages=c,c++ i686-linux
> --prefix=/usr/gcc-4.6.2-corei7 --with-local-prefix=/usr/local
> --enable-gnu-indirect-function --with-arch=corei7 --with-cpu=corei7
> --with-fpmath=sse
> 
> on gcc-4_6-branch at revision 179242.

That is strange.
The build fails starting from the first svn revision of the 4.6 branch.
Trunk had the same problem, which was fixed on 20110729 between svn revisions
176852 and 176905.


[Bug bootstrap/50543] Bootstrap fails to build for latest 4.6

2011-09-28 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50543

--- Comment #7 from Yukhin Kirill  2011-09-28 
19:42:52 UTC ---
Can anybody besides me and Evgeny confirm that?

I've tried a really generic build configuration and the stage comparison still
fails...


[Bug bootstrap/50621] New: [4.7 Regression] Bootstrap failure

2011-10-05 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50621

 Bug #: 50621
   Summary: [4.7 Regression] Bootstrap failure
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: kirill.yuk...@intel.com


Hi, rev 179554 fails to bootstrap,
at least on x86-64-generic and x86-64-corei7-avx.

Here is a log

\
gcc -g -O2 -DIN_GCC   -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes
-Wmissing-prototypes -Wold-style-definition  -isystem ./include  -fPIC -g
-DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -fb\
uilding-libgcc -fno-stack-protector   -I. -I. -I../../.././gcc
-I../../../../src-trunk/libgcc -I../../../../src-trunk/libgcc/.
-I../../../../src-trunk/libgcc/../gcc -I../../../../src-trunk/\
libgcc/../include -I../../../../src-trunk/libgcc/config/libbid
-DENABLE_DECIMAL_BID_FORMAT -DHAVE_CC_TLS  -DUSE_TLS -o _gcov_merge_ior.o -MT
_gcov_merge_ior.o -MD -MP -MF _gcov_merge_ior.de\
p -DL_gcov_merge_ior -c ../../../../src-trunk/libgcc/libgcov.c
/nightly/gcc.svn/test-intel64corei7avx/gcc-build-trunk/bld/./gcc/xgcc
-B/nightly/gcc.svn/test-intel64corei7avx/gcc-build-trunk/bld/./gcc/
-B/usr/local/x86_64-unknown-linux-gnu/bin/ -B/usr/l\
ocal/x86_64-unknown-linux-gnu/lib/ -isystem
/usr/local/x86_64-unknown-linux-gnu/include -isystem
/usr/local/x86_64-unknown-linux-gnu/sys-include-g -O2 -m32 -O2  -I. -I.
-I../../src-trun\
k/gcc -I../../src-trunk/gcc/. -I../../src-trunk/gcc/../include
-I../../src-trunk/gcc/../libdecnumber -I../../src-trunk/gcc/../libdecnumber/bid
-I../libdecnumber -I../../src-trunk/gcc/../lib\
gcc -g -O2 -DIN_GCC   -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes
-Wmissing-prototypes -Wold-style-definition  -isystem ./include  -fPIC -g
-DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -fb\
uilding-libgcc -fno-stack-protector   -I. -I. -I../../.././gcc
-I../../../../src-trunk/libgcc -I../../../../src-trunk/libgcc/.
-I../../../../src-trunk/libgcc/../gcc -I../../../../src-trunk/\
libgcc/../include -I../../../../src-trunk/libgcc/config/libbid
-DENABLE_DECIMAL_BID_FORMAT -DHAVE_CC_TLS  -DUSE_TLS -o unwind-dw2.o -MT
unwind-dw2.o -MD -MP -MF unwind-dw2.dep -fexceptions \
-c ../../../../src-trunk/libgcc/unwind-dw2.c -fvisibility=hidden -DHIDE_EXPORTS
../../../src-trunk/libgcc/config/libbid/bid64_noncomp.c: In function
'__bid64_totalOrderMag':
../../../src-trunk/libgcc/config/libbid/bid64_noncomp.c:938:1: internal
compiler error: in maybe_record_trace_start, at dwarf2cfi.c:2243
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
make[6]: *** [bid64_noncomp.o] Error 1


[Bug bootstrap/50621] [4.7 Regression] Bootstrap failure

2011-10-05 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50621

--- Comment #1 from Yukhin Kirill  2011-10-05 
14:36:19 UTC ---
Revision 179538 is ok.


[Bug bootstrap/50621] [4.7 Regression] Bootstrap failure

2011-10-05 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50621

--- Comment #6 from Yukhin Kirill  2011-10-05 
15:43:54 UTC ---
This was caused by
gcc.gnu.org/svn/gcc/trunk@179553

Previous one bootstraps ok:
gcc.gnu.org/svn/gcc/trunk@179549


[Bug target/50766] Binutils 2.22.51 rejects bmi2 pext operation with memory operands

2011-10-19 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50766

--- Comment #2 from Yukhin Kirill  2011-10-19 
09:37:17 UTC ---
Hi,
this is obviously a bug (introduced by me).
In GCC notation, the memory operand must come first.


[Bug target/50766] Binutils 2.22.51 rejects bmi2 pext operation with memory operands

2011-10-19 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50766

--- Comment #3 from Yukhin Kirill  2011-10-19 
09:38:22 UTC ---
Created attachment 25553
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25553
Patch

I am testing it now.


[Bug target/50766] Binutils 2.22.51 rejects bmi2 pext operation with memory operands

2011-10-19 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50766

--- Comment #5 from Yukhin Kirill  2011-10-19 
09:48:51 UTC ---
(In reply to comment #4)
> (In reply to comment #2)
> > Hi,
> > this is obviously a bug (introduced by me).
> > Memory operand in GCC notation must occur at first place.
> 
> Please note that gcc also supports Intel notation with -masm=intel. Probably
> you need to introduce multiple variants of assembler languages syntax into 
> insn
> templates. Please see "Instruction output" section in GCC Internals manual.

Thanks, I'll verify it as well


[Bug target/50766] Binutils 2.22.51 rejects bmi2 pext operation with memory operands

2011-10-19 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50766

--- Comment #6 from Yukhin Kirill  2011-10-19 
13:09:30 UTC ---
Thread on gcc-patches ML:
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01719.html


[Bug middle-end/50823] [4.7 Regression] ICE in inline_small_functions, at ipa-inline.c:1407

2011-11-07 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50823

Yukhin Kirill  changed:

   What|Removed |Added

 CC||kirill.yukhin at intel dot com

--- Comment #10 from Yukhin Kirill  2011-11-07 
09:59:07 UTC ---
Spec2000/176.gcc fails on peak with '-flto' option (actually this was mentioned
in #50868).

Here is the output:
gcc -static -flto -O3 -funroll-loops -ffast-math  -DSPEC_CPU2000_LP64
c-parse.o c-lang.o c-lex.o c-pragma.o c-decl.o c-typeck.o c-convert.o
c-aux-info.o c-common.o c-iterate.o toplev.o version.o tree.o print-tree.o
stor-layout.o fold-const.o function.o stmt.o expr.o calls.o expmed.o explow.o
optabs.o varasm.o rtl.o print-rtl.o rtlanal.o emit-rtl.o real.o dbxout.o
sdbout.o dwarfout.o xcoffout.o integrate.o jump.o cse.o loop.o unroll.o flow.o
stupid.o combine.o regclass.o local-alloc.o global.o reload.o reload1.o
caller-save.o insn-peep.o reorg.o sched.o final.o recog.o reg-stack.o
insn-opinit.o insn-recog.o insn-extract.o insn-output.o insn-emit.o
insn-attrtab.o m88k.o getpwd.o convert.o bc-emit.o bc-optab.o obstack.o   -lm 
-o cc1
...
lto1: internal compiler error: in inline_small_functions, at ipa-inline.c:1413
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
lto-wrapper: gcc returned 1 exit status
/usr/bin/ld: lto-wrapper failed
collect2: error: ld returned 1 exit status
specmake: *** [cc1] Error 1


[Bug target/53192] New: Incorrect arguments to AVX2's gather intrinsics

2012-05-02 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53192

 Bug #: 53192
   Summary: Incorrect arguments to AVX2's gather intrinsics
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: kirill.yuk...@intel.com


Hello,
Seems Intel's latest Spec contains a bug, which was reproduced in avx2intrin.h

_mm_i32gather_epi64 (long long int const *base,
 __m128i index, const int scale)

This has led to an incompatibility between Intel and GNU compilers. The Intel
version of immintrin.h specifies the type as __int64 const *. The type __int64
is a non-standard MS invention, which apparently is compatible with the
standard type int64_t (inttypes.h). The GCC version of avx2intrin.h specifies
the same parameter as long long int const *. 
Unfortunately, these two types are incompatible under 64-bit Linux.
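
The incompatibility can be observed directly. On x86-64 Linux with glibc, `int64_t` is a typedef for `long`, so `const int64_t *` and `const long long *` are distinct pointer types even though both pointees are 64 bits wide. A small C11 sketch, assuming a compiler with `_Generic` (the macro and function names are made up for this example):

```c
#include <stdint.h>

/* Evaluate to 1 if the argument has exactly the named type, else 0.  */
#define IS_LONG_LONG(x) _Generic ((x), long long: 1, default: 0)
#define IS_LONG(x)      _Generic ((x), long: 1, default: 0)

/* Which underlying type does this platform use for int64_t?  */
int int64_is_long_long (void) { return IS_LONG_LONG ((int64_t) 0); }
int int64_is_long (void)      { return IS_LONG ((int64_t) 0); }
```

On 64-bit Linux `int64_is_long ()` is expected to return 1 and `int64_is_long_long ()` 0, which is why passing an `int64_t *` where `long long int const *` is declared draws an incompatible-pointer diagnostic.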


[Bug target/53194] [4.8 Regression] Many x86 failures

2012-05-02 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53194

--- Comment #2 from Yukhin Kirill  2012-05-02 
19:20:59 UTC ---
The problem is here:
+
+  sprintf (hle_macro, "__ATOMIC_HLE_ACQUIRE=%d", IX86_HLE_ACQUIRE);
+  def_or_undef (parse_in, hle_macro);
+
+  sprintf (hle_macro, "__ATOMIC_HLE_RELEASE=%d", IX86_HLE_RELEASE);
+  def_or_undef (parse_in, hle_macro);

Seems, when def_or_undef acts as `undef` we've got a problem:
$ /export/users/kyukhin/ws/build/build-x86_64-linux/gcc/xgcc
-B/export/users/kyukhin/ws/build/build-x86_64-linux/gcc/ /export/users/ky
o-diagnostics-show-caret  -S -o sse-22.s
/export/users/kyukhin/ws/git/gcc/testsuite/gcc.target/i386/sse-22.c:54:21:
warning: extra tokens at end of #undef directive [enabled b
/export/users/kyukhin/ws/git/gcc/testsuite/gcc.target/i386/sse-22.c:54:21:
warning: extra tokens at end of #undef directive [enabled b
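
To see why the `undef` path misbehaves: a string of the form `NAME=VALUE` is fine for defining a macro, but if the same string is used to undefine it, the resulting directive carries `=VALUE` after the name. The toy model below is not GCC's code; `render_directive` is a hypothetical stand-in that only formats the directive text a preprocessor would effectively see, and the value 65536 is an arbitrary placeholder.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model: turn "NAME=VALUE" (or "NAME") into directive text.  */
void
render_directive (char *out, size_t size, const char *macro, int define_p)
{
  if (define_p)
    {
      const char *eq = strchr (macro, '=');
      if (eq)
        snprintf (out, size, "#define %.*s %s",
                  (int) (eq - macro), macro, eq + 1);
      else
        snprintf (out, size, "#define %s 1", macro);
    }
  else
    /* The "=VALUE" tail is passed through untouched: extra tokens.  */
    snprintf (out, size, "#undef %s", macro);
}
```

Undefining `"__ATOMIC_HLE_ACQUIRE=65536"` in this model yields `#undef __ATOMIC_HLE_ACQUIRE=65536`, which matches the "extra tokens at end of #undef directive" warning shown above.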


[Bug target/53201] [4.8 Regression] unrecognized command line option '-mno-lzcnt-mno-hle

2012-05-02 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53201

Yukhin Kirill  changed:

   What|Removed |Added

 CC||kirill.yukhin at intel dot com

--- Comment #1 from Yukhin Kirill  2012-05-03 
04:18:22 UTC ---
Hi,
I think this is an obvious fix; nevertheless, I've started a bootstrap with
-march=native.


[Bug target/53201] [4.8 Regression] unrecognized command line option '-mno-lzcnt-mno-hle

2012-05-02 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53201

--- Comment #2 from Yukhin Kirill  2012-05-03 
05:15:47 UTC ---
Tobias, bootstrap (-march=native) passes with your fix.
If nobody objects, I'll commit it as an obvious fix.


[Bug target/53194] [4.8 Regression] Many x86 failures

2012-05-03 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53194

--- Comment #3 from Yukhin Kirill  2012-05-03 
07:01:37 UTC ---
Created attachment 27299
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27299
Proposed solution


[Bug target/53194] [4.8 Regression] Many x86 failures

2012-05-03 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53194

--- Comment #4 from Yukhin Kirill  2012-05-03 
07:02:50 UTC ---
(In reply to comment #3)
> Created attachment 27299 [details]
> Proposed solution

The attached patch cures the failing tests.


[Bug target/53291] New: Code generated for xtest is wrong

2012-05-09 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53291

 Bug #: 53291
   Summary: Code generated for xtest is wrong
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: kirill.yuk...@intel.com


Hi, 
Andi discovered that xtest instruction generation is wrong.
It generates

  xorl    %esi, %esi
  xtest
  sete    %sil

but the correct sequence is the reverse:

  movl    $1, %esi
  xtest
  setne   %sil
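
The difference comes down to which way the zero flag is read. As documented for the instruction, xtest clears ZF when executed inside a transactional region and sets ZF otherwise, while `_xtest ()` is expected to return nonzero inside a transaction. A tiny C model of the two flag readings (the function names are made up for this sketch):

```c
/* zf is the zero flag as left by xtest: 0 inside a transaction, 1 outside.  */

/* Reported sequence: sete yields 1 when ZF is set, i.e. outside.  */
int xtest_sete (int zf)  { return zf != 0; }

/* Corrected sequence: setne yields 1 when ZF is clear, i.e. inside.  */
int xtest_setne (int zf) { return zf == 0; }
```

Only the `setne` form reports "inside a transaction" as a nonzero result.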


[Bug target/53291] Code generated for xtest is wrong

2012-05-09 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53291

--- Comment #2 from Yukhin Kirill  2012-05-09 
16:53:12 UTC ---
(In reply to comment #1)
> Testcase?

It is trivial, so posting right here:

#include <immintrin.h>
unsigned a;
int
rtm_xtest (void)
{
  if (_xtest ())
    a = 1;
}

./build-x86_64-linux/gcc/xgcc -B./build-x86_64-linux/gcc 1.c -S -mrtm
$ cat 1.s
...
        xtest
        sete    %al
        movsbl  %al, %eax
        testl   %eax, %eax
        je      .L4
        movl    $1, a(%rip)
...


[Bug target/53399] New: "*ffs" pattern generates wrong code with BMI enabled (for corner cases)

2012-05-18 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53399

 Bug #: 53399
   Summary: "*ffs" pattern generates wrong code with BMI enabled
(for corner cases)
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: kirill.yuk...@intel.com


The GCC Internals manual says (in the __ffs description):
These functions return the index of the least significant 1-bit in a, or the
value zero if a is zero.

and in i386.md:
(define_insn "*ffs<mode>_1"
  [(set (reg:CCZ FLAGS_REG)
        (compare:CCZ (match_operand:SWI48 1 "nonimmediate_operand" "rm")
                     (const_int 0)))
   (set (match_operand:SWI48 0 "register_operand" "=r")
        (ctz:SWI48 (match_dup 1)))]
  ""
{
  if (TARGET_BMI)
    return "tzcnt{<imodesuffix>}\t{%1, %0|%0, %1}";
  else

This pattern works fine for the bsf insn (although the result for a zero input
is undefined).
But tzcnt with 0 as input returns the operand size, so ffs yields
(operand_size+1) instead of 0.

That contradicts the GCC Internals description, right?

It also seems to fail gcc.c-torture/execute/builtin-bitops-1.c
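
The corner case can be modelled in a few lines of C. The sketch below assumes a 32-bit `unsigned` and uses a software stand-in for tzcnt, which returns the operand size for a zero input; all function names are invented for this illustration.

```c
/* Model of tzcnt on a 32-bit operand: trailing zero count, or 32 for 0.  */
static int
tzcnt32_model (unsigned x)
{
  int n = 0;

  if (x == 0)
    return 32;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* ffs computed as ctz + 1 with no zero check: wrong for x == 0.  */
int ffs_naive (unsigned x) { return tzcnt32_model (x) + 1; }

/* ffs as documented: 0 for a zero input.  */
int ffs_expected (unsigned x) { return x ? tzcnt32_model (x) + 1 : 0; }
```

The two agree for every nonzero input and differ only at zero, where the naive form gives 33 rather than 0.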


[Bug target/53399] "*ffs" pattern generates wrong code with BMI enabled (for corner cases)

2012-05-18 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53399

--- Comment #1 from Yukhin Kirill  2012-05-18 
13:58:22 UTC ---
(In reply to comment #0)
> It also seems to fail gcc.c-torture/execute/builtin-bitops-1.c
It fails on BMI-capable CPU


[Bug target/53399] "*ffs" pattern generates wrong code with BMI enabled (for corner cases)

2012-05-20 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53399

--- Comment #4 from Yukhin Kirill  2012-05-20 
15:53:30 UTC ---
Created attachment 27449
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27449
testcase


[Bug target/53399] "*ffs" pattern generates wrong code with BMI enabled (for corner cases)

2012-05-20 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53399

--- Comment #5 from Yukhin Kirill  2012-05-20 
15:54:08 UTC ---
> 
> Can you please isolate failing test?

Sure, it is attached.
It works when compiled this way:
/export/home/kyukhin/gcc/build/build-x86_64-linux/gcc/xgcc
-B/export/home/kyukhin/gcc/build/build-x86_64-linux/gcc/ 1.c -march=core2

And fails to run (on BMI*- capable HW) when compiled this way:
/export/home/kyukhin/gcc/build/build-x86_64-linux/gcc/xgcc
-B/export/home/kyukhin/gcc/build/build-x86_64-linux/gcc/ 1.c -march=core-avx2


[Bug target/53399] "*ffs" pattern generates wrong code with BMI enabled

2012-05-21 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53399

--- Comment #11 from Yukhin Kirill  2012-05-21 
11:02:07 UTC ---
> 
> Please test the attached patch. The patch checks CCCmode for TARGET_BMI in ffs
> patterns.

Hi Uros, your patch seems to fix the problem. Here is a piece of asm from the
testcase:
...
        movl    ints(%rip), %eax
        movl    %eax, %edx
        movl    $-1, %eax
        tzcntl  %edx, %ebx
        cmovc   %eax, %ebx
...
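
The sequence above relies on tzcnt setting the carry flag when its source is zero, so cmovc can substitute -1 for the count. A C model of that logic, assuming a 32-bit `unsigned` and assuming the usual `+ 1` of ffs follows in the elided part of the listing (all names are illustrative):

```c
/* tzcnt model that also reports CF, which is set when the input is zero.  */
static int
tzcnt32_cf (unsigned x, int *cf)
{
  int n = 0;

  *cf = (x == 0);
  if (x == 0)
    return 32;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Mirrors: movl $-1, %eax; tzcntl %edx, %ebx; cmovc %eax, %ebx.  */
int
ffs_with_cmovc (unsigned x)
{
  int cf;
  int r = tzcnt32_cf (x, &cf);

  if (cf)       /* cmovc: take -1 when the input was zero.  */
    r = -1;
  return r + 1; /* Assumed to follow in the elided code.  */
}
```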


[Bug target/53435] (ix86_expand_vec_perm) and (ix86_expand_vec_perm) do not pass arguments to avx2_permvar8s[f,i] correctly

2012-05-21 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53435

--- Comment #2 from Yukhin Kirill  2012-05-21 
12:17:41 UTC ---
(In reply to comment #0)
> 
> gcc.c-torture/execute/vshuf-v* and gcc.dg/torture/pr45720.c fail.
> 
This occurs on AVX2-capable HW


[Bug target/53192] Incorrect arguments to AVX2's gather intrinsics

2012-05-22 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53192

Yukhin Kirill  changed:

   What|Removed |Added

 CC||areg.melikadamyan at gmail dot com

--- Comment #3 from Yukhin Kirill  2012-05-22 
08:23:56 UTC ---
(In reply to comment #1)
> Please provide a testcase to show the problem.

I have no idea which kind of test it should be.
This is just an MS-ICC-GCC incompatibility issue.


[Bug target/53877] __lzcnt_u16/__lzcnt_u32/__lzcnt_u64 aren't implemented

2012-07-20 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53877

--- Comment #3 from Yukhin Kirill  2012-07-20 
08:58:17 UTC ---
Done.


[Bug target/54156] New: New fail on AVX target: gcc.dg/vect/pr53773.c. 190010 vs revision 189996

2012-08-01 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54156

 Bug #: 54156
   Summary: New fail on AVX target: gcc.dg/vect/pr53773.c. 190010
vs revision 189996
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: kirill.yuk...@intel.com


Hi,
we've got new fails on AVX-enabled machine
FAIL: gcc.dg/vect/pr53773.c -flto  scan-tree-dump-times vect "\\* 10" 4
FAIL: gcc.dg/vect/pr53773.c -flto  scan-tree-dump-times vect "\\* 10" 4
FAIL: gcc.dg/vect/pr53773.c scan-tree-dump-times vect "\\* 10" 4
FAIL: gcc.dg/vect/pr53773.c scan-tree-dump-times vect "\\* 10" 4

http://gcc.gnu.org/ml/gcc-regression/2012-07/msg00276.html


[Bug target/52932] AVX2 intrinsic _mm256_permutevar8x32_ps has wrong parameter type

2012-04-11 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52932

Yukhin Kirill  changed:

   What|Removed |Added

 CC||kirill.yukhin at intel dot
   ||com

--- Comment #3 from Yukhin Kirill  2012-04-11 
17:32:18 UTC ---
> Kirill, can you please test proposed patch on AVX2 target?
Sure, will do tomorrow morning!

K


[Bug target/52932] AVX2 intrinsic _mm256_permutevar8x32_ps has wrong parameter type

2012-04-12 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52932

--- Comment #5 from Yukhin Kirill  2012-04-12 
13:52:26 UTC ---
(In reply to comment #2)
> Created attachment 27133 [details]
> Proposed patch
> 
> Kirill, can you please test proposed patch on AVX2 target?

Uros, I've slightly updated your patch: idx and vector were intermixed.
Attached.
It passes AVX2 testing now

K


[Bug target/52932] AVX2 intrinsic _mm256_permutevar8x32_ps has wrong parameter type

2012-04-12 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52932

--- Comment #4 from Yukhin Kirill  2012-04-12 
13:51:26 UTC ---
Created attachment 27140
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27140
Updated patch


[Bug target/53020] New: __atomic_fetch_or doesn't generate `1 insn` variant

2012-04-17 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53020

 Bug #: 53020
   Summary: __atomic_fetch_or doesn't generate `1 insn` variant
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: kirill.yuk...@intel.com


Hello,
while working on Intel's TSX extensions, I've found a thing that seems strange to me.

We have in config/i386/sync.md:
(define_insn "atomic_"
  [(set (match_operand:SWI 0 "memory_operand" "+m")
(unspec_volatile:SWI
  [(any_logic:SWI (match_dup 0)
  (match_operand:SWI 1 "nonmemory_operand" ""))
   (match_operand:SI 2 "const_int_operand")];; model
  UNSPECV_LOCK))
...

any_logic (unconditionally) covers the AND, IOR and XOR ops.

However, the generated insn-opinit.c lacks the IOR variant initialization:
...
set_direct_optab_handler (atomic_and_optab, QImode, CODE_FOR_atomic_andqi);
set_direct_optab_handler (atomic_xor_optab, QImode, CODE_FOR_atomic_xorqi);
...

So, having such simple test:
void
foo (int *p, int v)
{
  __atomic_fetch_or (p, 1, __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE);
}

`lock orl  %edx, (%eax)` won't be generated, since there is no
corresponding entry in the IOR optab.
Here is the code that fails to find it:
optabs.c:maybe_emit_op
...
  if (use_memmodel)
{
  icode = direct_optab_handler (optab->mem_no_result, mode);
...

The strangest thing to me is that it works fine with the XOR and AND ops.
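
For illustration, here is a minimal C sketch of the three logic operations in the form where the fetched value is discarded, which is the case where a single locked instruction is expected on x86. The helper names are hypothetical, and the HLE flag from the test above is left out to keep the sketch portable; only the `__atomic_*` builtins themselves come from GCC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers.  Each one ignores the value returned by the
   builtin, so a single `lock or/and/xor` is the hoped-for x86 code.
   That is an assumption about code generation, not something this
   sketch verifies.  */
void
set_bits (int *p, int mask)
{
  assert (p != NULL);
  __atomic_fetch_or (p, mask, __ATOMIC_ACQUIRE);
}

void
clear_bits (int *p, int mask)
{
  assert (p != NULL);
  __atomic_fetch_and (p, ~mask, __ATOMIC_ACQUIRE);
}

void
toggle_bits (int *p, int mask)
{
  assert (p != NULL);
  __atomic_fetch_xor (p, mask, __ATOMIC_ACQUIRE);
}
```

Compiling such a file with -S and comparing the three functions is one way to see whether the IOR variant is treated differently from AND and XOR.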


[Bug target/53020] __atomic_fetch_or doesn't generate `1 insn` variant

2012-04-17 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53020

--- Comment #1 from Yukhin Kirill  2012-04-17 
16:23:26 UTC ---
Instead of a single `locked` instruction, it generates:
.L2:
movl    %eax, %ecx
orl     $1, %ecx
lock cmpxchgl   %ecx, (%edx)
Similar variant for the AND operation:
lock andl  %edx, (%eax)


[Bug target/53020] __atomic_fetch_or doesn't generate `1 insn` variant

2012-04-17 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53020

--- Comment #3 from Yukhin Kirill  2012-04-17 
17:00:34 UTC ---
(In reply to comment #2)
> Uh...
> 
> Index: config/i386/sync.md
> ===
> --- config/i386/sync.md (revision 186501)
> +++ config/i386/sync.md (working copy)
> @@ -576,7 +576,7 @@
>return "lock{%;} sub{}\t{%1, %0|%0, %1}";
>  })
> 
> -(define_insn "atomic_"
> +(define_insn "atomic_"
>[(set (match_operand:SWI 0 "memory_operand" "+m")
> (unspec_volatile:SWI
>   [(any_logic:SWI (match_dup 0)

Oh, I see. Thanks!


[Bug target/58421] [4.9 regression] FAIL: gcc.c-torture/compile/20051216-1.c -O3 -fomit-frame-pointer (internal compiler error)

2013-11-06 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58421

Yukhin Kirill  changed:

   What|Removed |Added

 CC||kirill.yukhin at intel dot com

--- Comment #2 from Yukhin Kirill  ---
I cannot reproduce it on r204382, however it fails on r202525.


[Bug target/52731] internal compiler error: in ia64_st_address_bypass_p, at config/ia64/ia64.c:9357

2013-11-19 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52731

--- Comment #1 from Yukhin Kirill  ---
Reproduced on recent trunk.

It seems that we have in ia64.c:
int
ia64_st_address_bypass_p (rtx producer, rtx consumer)
{
  rtx dest, reg, mem;

  gcc_assert (producer && consumer);
  dest = ia64_single_set (producer);
  gcc_assert (dest);
  ...

The problem is that we have as `producer':
(insn 18 17 4 2 (cond_exec (eq (reg:BI 262 p6 [351])
(const_int 0 [0]))
(parallel [
(set (reg:DI 16 r16 [orig:346 D.1446 ] [346])
(reg/v:DI 112 r32 [orig:340 size ] [340]))
(set (reg/v:DI 112 r32 [orig:340 size ] [340])
(reg/v:DI 112 r32 [orig:340 size ] [340]))
])) 1188 {*p epilogue_deallocate_stack}
 (nil))

ia64_single_set can handle cond_exec (this is actually its purpose).
But (after going into the cond_exec) it calls rtlanal.c's `single_set_2',
which returns nonzero only if there is exactly one *live* set expr after
the insn. It returns 0 otherwise (as in this case), which in turn triggers
the assert in `ia64_st_address_bypass_p'.

I think we could fix `ia64_st_address_bypass_p' not to use
`ia64_single_set', but to iterate through all set exprs in the producer.


[Bug target/59405] New: Incorrect FP<->MMX transition during call/ret

2013-12-06 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59405

Bug ID: 59405
   Summary: Incorrect FP<->MMX transition during call/ret
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kirill.yukhin at intel dot com

Hello,
Attached test reproduces the error:
  $ gcc -m32 -mmmx 1.c
  $ ./a.out
  Aborted (core dumped)

Disassembly of the foo is:
foo32x2_be:
.LFB0:
pushl   %ebp                # 22 *pushsi2 [length = 1]
movl    %esp, %ebp          # 23 *movsi_internal/1 [length = 2]
subl    $16, %esp           # 24 pro_epilogue_adjust_stack_si_add/1 [length = 3]
movq    %mm0, -8(%ebp)      # 3 *movv2sf_internal/9 [length = 4]
movl    -4(%ebp), %eax      # 7 *movsf_internal/4 [length = 3]
movl    %eax, -12(%ebp)     # 14 *movsf_internal/5 [length = 3]
flds    -12(%ebp)           # 21 *movsf_internal/1 [length = 3]
leave                       # 27 leave [length = 1]
ret                         # 28 simple_return_internal [length = 1]

We're passing a v2sf vector in an MMX register, which is aliased to the x87 stack.
Then we try to load an FP value onto it, which leads to NaN.

As far as I understand, we need an `emms' instruction between the last MMX use
and the first x87 use.

Reproduces with every version I have, back to 4.7.2 (maybe earlier; I have no
older ones).
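
For reference, the `emms' transition is available to user code as the `_mm_empty` intrinsic from `<mmintrin.h>`. The sketch below is not the attached testcase; the function name is hypothetical, and a scalar fallback is included as an assumption for targets without MMX. It only shows where the call would sit between the last MMX use and the first x87-style computation.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical example: add two pairs of ints with MMX, then scale the
   low lane in long double (x87 arithmetic on typical x86 targets).  */
long double
low_lane_sum_scaled (const int a[2], const int b[2], long double scale)
{
  int low;
  assert (a != NULL && b != NULL);
#if defined(__MMX__)
  __m64 va = _mm_set_pi32 (a[1], a[0]);   /* a[0] goes to the low lane */
  __m64 vb = _mm_set_pi32 (b[1], b[0]);
  low = _mm_cvtsi64_si32 (_mm_add_pi32 (va, vb));
  _mm_empty ();   /* leave MMX state before any x87 use */
#else
  low = a[0] + b[0];   /* fallback for targets without MMX */
#endif
  return low * scale;
}
```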


[Bug target/59405] Incorrect FP<->MMX transition during call/ret

2013-12-06 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59405

--- Comment #2 from Yukhin Kirill  ---
Created attachment 31389
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31389&action=edit
Testcase


[Bug target/59405] Incorrect FP<->MMX transition during call/ret

2013-12-06 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59405

--- Comment #3 from Yukhin Kirill  ---
(In reply to Uroš Bizjak from comment #1)
> There is no testcase attached, but you need to *manually* insert _mm_empty
> (== emms) to switch from MMX to x87 state.
> 
> The compiler does not automatically insert emms for you.

Well, then the problem is different: the test is as simple as an empty call.
I doubt we should emit wrong code here.

[Bug target/59405] Incorrect FP<->MMX transition during call/ret

2013-12-06 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59405

--- Comment #5 from Yukhin Kirill  ---
I see. So, it seems like a limitation on passing vectors as arguments in 32-bit
mode. We may implement something similar to `vzeroupper' autogeneration or
simply close the bug as `user misunderstanding.'


[Bug tree-optimization/59617] New: [vectorizer] ICE in vectorizable_mask_load_store with AVX-512F's gathers enabled.

2013-12-28 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59617

Bug ID: 59617
   Summary: [vectorizer] ICE in vectorizable_mask_load_store with
AVX-512F's gathers enabled.
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kirill.yukhin at intel dot com

Created attachment 31529
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31529&action=edit
Reproducer

Hello, I am going to check in a patch which will
enable the new AVX-512F gather instructions.

The new gathers use mask registers (%kN), while
vectorizable_mask_load_store asserts that the mask type is compatible
with the operand type.

Because of this, 416.gamess fails to build with -mavx512f -Ofast.

Reproducer attached.

Reproduce:
$ gfortran -S -Ofast -mavx512f hss2a.fppized.f

Back trace:
0xbcfc27 vectorizable_mask_load_store
/export/users/kyukhin/gcc/git/gcc/gcc/tree-vect-stmts.c:1901
0xbddf6c vectorizable_call
/export/users/kyukhin/gcc/git/gcc/gcc/tree-vect-stmts.c:2172
0xbe1021 vect_transform_stmt(gimple_statement_base*, gimple_stmt_iterator*,
bool*, _slp_tree*, _slp_instance*)
/export/users/kyukhin/gcc/git/gcc/gcc/tree-vect-stmts.c:7017
0xbe4731 vect_transform_loop(_loop_vec_info*)
/export/users/kyukhin/gcc/git/gcc/gcc/tree-vect-loop.c:6046
0xc00838 vectorize_loops()
/export/users/kyukhin/gcc/git/gcc/gcc/tree-vectorizer.c:476

I'll check my patch in with gathers disabled; to enable them,
remove the `#if 0' in gcc/config/i386/i386.c


[Bug tree-optimization/59617] [vectorizer] ICE in vectorizable_mask_load_store with AVX-512F's gathers enabled.

2014-04-03 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59617

--- Comment #11 from Yukhin Kirill  ---
Maybe simply do:
#ifdef __restrict
#undef __restrict
#endif

in some common header (say, avx512f-check.h)?


[Bug tree-optimization/59617] [vectorizer] ICE in vectorizable_mask_load_store with AVX-512F's gathers enabled.

2014-04-03 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59617

--- Comment #16 from Yukhin Kirill  ---
(In reply to Dominique d'Humieres from comment #15)
> 19:02:01.0 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512f-gather-5.c  2014-04-03
> 15:17:05.0 +0200
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O3 -mavx512f" } */
> +/* { dg-additional-options "-std=c99" { target *-*-darwin* } } */
>  
>  #include "avx512f-gather-4.c"

Then I think we need to add such a line in all files with __restrict keyword.

[Bug middle-end/61573] New: [ICE] Segfault while Linux 3.15 build

2014-06-20 Thread kirill.yukhin at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61573

Bug ID: 61573
   Summary: [ICE] Segfault while Linux 3.15 build
   Product: gcc
   Version: 4.10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kirill.yukhin at intel dot com

Hello,
While building Linux using recent GCC trunk I've got ICE:
  CC  kernel/locking/spinlock.o
kernel/locking/spinlock.c: In function ‘_raw_read_unlock_bh’:
kernel/locking/spinlock.c:280:1: internal compiler error: Segmentation fault
 }
 ^
0x8b5e25 crash_signal
/export/users/kyukhin/gcc/git/gcc/gcc/toplev.c:337
0x4eca53 lookup_page_table_entry
/export/users/kyukhin/gcc/git/gcc/gcc/ggc-page.c:634
0x4eca53 ggc_set_mark(void const*)
/export/users/kyukhin/gcc/git/gcc/gcc/ggc-page.c:1515
0x711b34 gt_ggc_mx_eh_status(void*)
/export/users/kyukhin/gcc/build/build-x86_64-linux/gcc/gtype-desc.c:928
0x711cb5 gt_ggc_mx_function(void*)
   
/export/users/kyukhin/gcc/build/build-x86_64-linux/gcc/gtype-desc.c:1409
0x410425 gt_ggc_mx_lang_tree_node(void*)
./gt-c-c-decl.h:384
0x410270 gt_ggc_mx_lang_tree_node(void*)
./gt-c-c-decl.h:516
0x410270 gt_ggc_mx_lang_tree_node(void*)
./gt-c-c-decl.h:516
0x70e30a gt_ggc_mx
/export/users/kyukhin/gcc/git/gcc/gcc/vec.h:1098
0x70e30a gt_ggc_mx_vec_constructor_elt_va_gc_(void*)
   
/export/users/kyukhin/gcc/build/build-x86_64-linux/gcc/gtype-desc.c:1326
0x410169 gt_ggc_mx_lang_tree_node(void*)
./gt-c-c-decl.h:568
0x40fd76 gt_ggc_mx_lang_tree_node(void*)
./gt-c-c-decl.h:284
0x410ccd gt_ggc_mx_c_binding(void*)
./gt-c-c-decl.h:104
0x410cf7 gt_ggc_mx_c_binding(void*)
./gt-c-c-decl.h:107
0x40f9dd gt_ggc_mx_lang_tree_node(void*)
./gt-c-c-decl.h:626
0x40fd3e gt_ggc_mx_lang_tree_node(void*)
./gt-c-c-decl.h:280
0x7111f5 gt_ggc_mx_symtab_node(void*)
   
/export/users/kyukhin/gcc/build/build-x86_64-linux/gcc/gtype-desc.c:1283
0x4103d0 gt_ggc_mx_lang_tree_node(void*)
./gt-c-c-decl.h:379
0x410ccd gt_ggc_mx_c_binding(void*)
./gt-c-c-decl.h:104
0x40f9dd gt_ggc_mx_lang_tree_node(void*)
./gt-c-c-decl.h:626
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
make[2]: *** [kernel/locking/spinlock.o] Error 1
make[1]: *** [kernel/locking] Error 2
make: *** [kernel] Error 2

Not reduced reproduction:
- Get Linux (https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.15.tar.xz,
MD5 97ca1625bb40368dc41b9a7971549071)
- make menuconfig (no changes, simply exit)
- make -j1

Revision used:
$ git log -1
commit 03e6428d81ac6978330c5f9cffe0e36aeb754f25
Author: jason 
Date:   Thu Jun 19 09:36:09 2014 +

PR c++/59296
* call.c (add_function_candidate): Also set LOOKUP_NO_TEMP_BIND.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@211821
138bc75d-0d04-0410-961f-82ee72b054a4

[Bug rtl-optimization/59754] New: [ree.c] Incorrect merge while working with vector registers

2014-01-10 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59754

Bug ID: 59754
   Summary: [ree.c] Incorrect merge while working with vector
registers
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kirill.yukhin at intel dot com

Hello,
It seems that this revision:
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@206418
138bc75d-0d04-0410-961f-82ee72b054a4

made a bunch of AVX-512F tests fail (at runtime).

The difference in the assembler is as follows.

For good (prev. rev., testname: gcc.target/i386/avx512f-vpmovzxwd-2.c):
...
vmovdqa     160(%esp), %ymm0
movl        $-22854, %eax
leal        384(%esp), %ebx
kmovw       %eax, %k1
xorl        %edx, %edx
vpmovzxwd   %ymm0, %zmm1
vmovdqa64   %zmm1, 192(%esp)
vmovdqa64   256(%esp), %zmm1
vpmovzxwd   %ymm0, %zmm1{%k1}
vpmovzxwd   %ymm0, %zmm0{%k1}{z}
vmovdqa64   %zmm1, 256(%esp)
vmovdqa64   %zmm0, 320(%esp)
...

For broken:
...
vpmovzxwd   160(%esp), %zmm1
movl        $-22854, %eax
leal        384(%esp), %ebx
kmovw       %eax, %k1
xorl        %edx, %edx
vmovdqa64   %zmm1, %zmm0
vmovdqa64   %zmm1, 192(%esp)
vmovdqa64   256(%esp), %zmm1
vpmovzxwd   %ymm0, %zmm1{%k1}
vpmovzxwd   %ymm0, %zmm0{%k1}{z}
vmovdqa64   %zmm1, 256(%esp)
vmovdqa64   %zmm0, 320(%esp)
...


So it seems that it is allowed to convert:
  (set r0 [mem])
  (set r1 sign_extend (r0))

to:
  (set r1 sign_extend ([mem]))
  (set r0 r1)

IMHO this should work with scalars, but not with vectors.
I suspect eliminating such extends for vector types was
prohibited initially.
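
To make the scalar/vector difference concrete, here is a small plain-C model of the rewritten sequence. It is only a simulation with arrays standing in for registers (the function names are hypothetical, and it models a zero extend as in the vpmovzxwd test); it is not code from ree.c. In the scalar case the low part of the extended register still equals the original value, so `(set r0 r1)` is harmless. In the vector case the low half of the widened register holds half of the elements, each widened, not the original narrow elements.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model: r1 = zero_extend ([mem]); r0 = low part of r1.
   Returns 1 when r0 still equals what a plain load would give.  */
int
scalar_copy_back_ok (uint32_t mem)
{
  uint32_t r0 = mem;                 /* original: (set r0 [mem]) */
  uint64_t r1 = (uint64_t) mem;      /* merged extend from memory */
  uint32_t r0_after = (uint32_t) r1; /* (set r0 r1) in the narrow mode */
  return r0 == r0_after;
}

/* Vector model: 16 x 16-bit elements widened to 16 x 32-bit, then the
   low 32 bytes of the wide register are reused as the narrow one.  */
int
vector_copy_back_ok (const uint16_t mem[16])
{
  uint16_t r0[16];
  uint32_t r1[16];
  uint16_t r0_after[16];

  assert (mem != NULL);
  memcpy (r0, mem, sizeof r0);             /* original narrow load */
  for (int i = 0; i < 16; i++)
    r1[i] = mem[i];                        /* element-wise zero extend */
  memcpy (r0_after, r1, sizeof r0_after);  /* low half of the wide reg */
  return memcmp (r0, r0_after, sizeof r0) == 0;
}
```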


[Bug rtl-optimization/59754] [ree.c] Incorrect merge while working with vector registers

2014-01-10 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59754

--- Comment #1 from Yukhin Kirill  ---
> made bunch of AVX-512F tests failing (at runtime):
FAIL: gcc.target/i386/avx512f-vpmovsxdq-2.c execution test
FAIL: gcc.target/i386/avx512f-vpmovsxwd-2.c execution test
FAIL: gcc.target/i386/avx512f-vpmovzxdq-2.c execution test
FAIL: gcc.target/i386/avx512f-vpmovzxwd-2.c execution test


[Bug rtl-optimization/59754] [ree.c] Incorrect merge while working with vector registers

2014-01-12 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59754

--- Comment #6 from Yukhin Kirill  ---
(In reply to Jeffrey A. Law from comment #3)
> Kirill, can you verify that Jakub's patch restores proper behaviour for your
> tests?  It'd be greatly appreciated.

Hello,
I've checked recent trunk with Jakub's changes checked in and it seems that
at the moment all of the AVX-512 tests pass (under simulator).

Thanks a lot for fixing that!



[Bug target/59797] GCC doesn't warn AVX-512 ABI change

2014-01-13 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59797

--- Comment #1 from Yukhin Kirill  ---
Sorry, I didn't get the problem.

According to the output you provided, GCC does warn about ABI changes.

Here is the analogue for AVX2:
$ cat 2.c
typedef long long __m256i __attribute__ ((__vector_size__ (32),
__may_alias__));

__m256i
f1(__m256i x, __m256i y)
{
  return y;
}
$ gcc -S 2.c
2.c: In function ‘f1’:
2.c:4:1: note: The ABI for passing parameters with 32-byte alignment has
changed in GCC 4.6
 f1(__m256i x, __m256i y)
 ^
2.c:4:1: warning: AVX vector argument without AVX enabled changes the ABI
[enabled by default]

The difference is that AVX[2] warns about using the data types without enabling
AVX2. Is that the case?

[Bug testsuite/59808] [4.9 Regression] r206596 caused: FAIL: gcc.target/i386/sse-14.c (test for excess errors)

2014-01-14 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59808

--- Comment #4 from Yukhin Kirill  ---
(In reply to Uroš Bizjak from comment #2)
> Kirill, please update also sse-13.c with new builtins.

Fix is posted as part of:
http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00761.html
I may strip it into separate one...

[Bug testsuite/59808] [4.9 Regression] r206596 caused: FAIL: gcc.target/i386/sse-14.c (test for excess errors)

2014-01-14 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59808

--- Comment #5 from Yukhin Kirill  ---
(In reply to Uroš Bizjak from comment #3)
> (In reply to Uroš Bizjak from comment #2)
> > Kirill, please update also sse-13.c with new builtins.
> 
> And sse-12.c with new options.

Sure, I think this is an obvious change if there are no regressions.
Will do today.

[Bug testsuite/59808] [4.9 Regression] r206596 caused: FAIL: gcc.target/i386/sse-14.c (test for excess errors)

2014-01-15 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59808

--- Comment #11 from Yukhin Kirill  ---
(In reply to Uroš Bizjak from comment #10)
> (In reply to Uroš Bizjak from comment #9)
>  
> > This is not a good ChangeLog entry. You should say somethin along
> > 
> > * gcc.target/i386/sse-14.c: Update constant avx512erintrin.h tests.
> 
> * gcc.target/i386/sse-14.c: Update constants for avx512erintrin.h tests.

Agree, not good. Fixed.

[Bug target/59952] -march=core-avx2 should not enable RTM

2014-01-27 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59952

--- Comment #9 from Yukhin Kirill  ---
(In reply to Jakub Jelinek from comment #6)
> Prerelease samples shouldn't count, people using those just can avoid using
> -march=haswell and use -march=ivybridge -mavx2 or similar instead.  Can
> anyone from Intel verify if all released Haswell CPUs have BMI2 (and if
> there aren't plans to ship Haswell CPUs without BMI2)?

I am checking and will get back to you. I thought all AVX2 parts had BMI1/2.


[Bug target/59952] -march=core-avx2 should not enable RTM

2014-01-31 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59952

--- Comment #11 from Yukhin Kirill  ---
(In reply to Yukhin Kirill from comment #9)
> (In reply to Jakub Jelinek from comment #6)
> > Prerelease samples shouldn't count, people using those just can avoid using
> > -march=haswell and use -march=ivybridge -mavx2 or similar instead.  Can
> > anyone from Intel verify if all released Haswell CPUs have BMI2 (and if
> > there aren't plans to ship Haswell CPUs without BMI2)?
> 
> I am checking and will get back to. I though all AVX2 parts had BMI1/2

AVX2 must imply BMI. AVX2 w/ BMI disabled are not-for-sale SKUs.


[Bug tree-optimization/60510] New: SLP blocks loop vectorization (with reduction)

2014-03-12 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60510

Bug ID: 60510
   Summary: SLP blocks loop vectorization (with reduction)
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kirill.yukhin at intel dot com

Hello,
This case is not vectorized:
$ cat f2.f
  subroutine foo(a,x,y,n)
  implicit none
  integer n,i

  real*8 y(n),x(n),a

  do i=1,n
 a=a+x(i)*y(i)+x(i)
  enddo

  return
  end

When `+x(i)` is removed, the loop is vectorized.

Compilation: ./build-x86_64-linux/gcc/gfortran -B./build-x86_64-linux/gcc -S
-Ofast -mavx2 f2.f -fno-unroll-loops -fdump-tree-vect-all

vect report says:
f2.f:7:0: note: type of def: 3.
f2.f:7:0: note: vect_is_simple_use: operand _13
f2.f:7:0: note: def_stmt: _13 = _12 + prephitmp_32;

f2.f:7:0: note: type of def: 3.
f2.f:7:0: note: Build SLP for # VUSE <.MEM_2>
_12 = *x_11(D)[_10];

f2.f:7:0: note: Build SLP failed: not grouped load # VUSE <.MEM_2>
_12 = *x_11(D)[_10];
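
For readers who prefer C, here is a sketch of an analogous reduction loop. It is only a rendering of the Fortran above with a hypothetical function name; whether GCC treats the C form the same way as the Fortran one is not claimed here.

```c
#include <assert.h>
#include <stddef.h>

/* C analogue of the Fortran loop: a = a + x(i)*y(i) + x(i).
   x[i] is used twice per iteration, which mirrors the second load
   of x that shows up in the vect report.  */
double
dot_plus_sum (const double *x, const double *y, size_t n, double a)
{
  assert (n == 0 || (x != NULL && y != NULL));
  for (size_t i = 0; i < n; i++)
    a += x[i] * y[i] + x[i];
  return a;
}
```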


[Bug target/49002] 128-bit AVX load incorrectly becomes 256-bit AVX load

2011-05-18 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49002

--- Comment #2 from Yukhin Kirill  2011-05-18 
08:24:10 UTC ---
Created attachment 24278
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24278
The patch

Hi,
Here is a fix for the bug. I ran bootstrap and make check on 4.6.
BTW, it also has to be committed to trunk, since the problem is there as well.

K


[Bug middle-end/49465] [4.7 Regression] Revision 175114 miscompiled 403.gcc in SPEC CPU 2006

2011-06-21 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49465

--- Comment #4 from Yukhin Kirill  2011-06-22 
04:23:34 UTC ---
(In reply to comment #3)
> Fix looking good.  Doing a cpu2k6 int test right now.

Jeffrey, could you please share your patch?


[Bug target/49547] New: LZCNT should be enabled only if ABM or LZCNT bits are set

2011-06-27 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49547

   Summary: LZCNT should be enabled only if ABM or LZCNT bits are
set
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: kirill.yuk...@intel.com


Hi,
according to the latest specs, we need to enable LZCNT only when the ABM or LZCNT
bit of CPUID (leaf 8000_0001h) is set.
However, config/i386/i386.md has:

(define_insn "clz2_abm"
  [(set (match_operand:SWI248 0 "register_operand" "=r")
(clz:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "rm")))
   (clobber (reg:CC FLAGS_REG))]
  "TARGET_ABM || TARGET_BMI"
  "lzcnt{}\t{%1, %0|%0, %1}"
  [(set_attr "prefix_rep" "1")
   (set_attr "type" "bitmanip")
   (set_attr "mode" "")])

There is no connection to BMI anymore.
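
As a small illustration of the CPUID check described above, the sketch below tests the relevant bit in an already-fetched register set. The function and macro names are hypothetical. It assumes the bit in question is bit 5 of ECX for leaf 8000_0001h, which AMD documents as ABM and Intel as LZCNT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of each register in the array returned for one CPUID leaf.  */
enum { REG_EAX, REG_EBX, REG_ECX, REG_EDX };

/* Assumed position of the ABM/LZCNT flag in ECX of leaf 8000_0001h.  */
#define EXT_ECX_LZCNT_BIT (UINT32_C (1) << 5)

/* regs holds EAX, EBX, ECX, EDX as returned for leaf 8000_0001h.
   Returns 1 if the LZCNT instruction is reported as available.  */
int
lzcnt_reported (const uint32_t regs[4])
{
  assert (regs != NULL);
  return (regs[REG_ECX] & EXT_ECX_LZCNT_BIT) != 0;
}
```

On x86 with GCC, the register values could be filled in with `__get_cpuid (0x80000001, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`; the sketch takes them as input so that it stays target-independent.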


[Bug c++/49519] [4.7 Regression] Revision 175272 miscompiled 447.dealII in SPEC CPU 2006

2011-06-28 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519

--- Comment #4 from Yukhin Kirill  2011-06-29 
05:06:04 UTC ---
I dived into the problem yesterday.
It seems the problem is connected with tail call optimization.
The refined difference is below. The assembler is extracted from step-14.cc

Tail call optimization converts this code:

.cfi_startproc
pushl   %ebx
.cfi_def_cfa_offset 8
.cfi_offset 3, -8
subl    $40, %esp
.cfi_def_cfa_offset 48
movl    52(%esp), %eax
movl    56(%esp), %ecx
movl    60(%esp), %ebx
movl    %eax, %edx
testb   $1, %al
je      .L1498
movl    (%ebx,%ecx), %edx
movl    -1(%edx,%eax), %edx
.L1498:
movl    76(%esp), %eax
movl    %eax, 16(%esp)
movl    72(%esp), %eax
movl    %eax, 12(%esp)
movl    68(%esp), %eax
movl    %eax, 8(%esp)
movl    64(%esp), %eax
movl    %eax, 4(%esp)
addl    %ebx, %ecx
movl    %ecx, (%esp)
call    *%edx
addl    $40, %esp
.cfi_def_cfa_offset 8
popl    %ebx
.cfi_def_cfa_offset 4
.cfi_restore 3
ret

to the following tail-call-optimized code:
.cfi_startproc
subl    $8, %esp
.cfi_def_cfa_offset 12
movl    %ebx, (%esp)
movl    %esi, 4(%esp)
movl    16(%esp), %eax
movl    20(%esp), %ecx
movl    24(%esp), %ebx
.cfi_offset 6, -8
.cfi_offset 3, -12
movl    %eax, %edx
testb   $1, %al
je      .L1498
movl    (%ebx,%ecx), %edx
movl    -1(%edx,%eax), %edx
.L1498:
movl    40(%esp), %eax
movl    %eax, 28(%esp)
movl    36(%esp), %esi
movl    %esi, 24(%esp)
movl    32(%esp), %esi
movl    %esi, 20(%esp)
movl    %eax, 16(%esp)
addl    %ebx, %ecx
movl    %ecx, 12(%esp)
movl    (%esp), %ebx
movl    4(%esp), %esi
addl    $8, %esp
.cfi_def_cfa_offset 4
.cfi_restore 6
.cfi_restore 3
jmp     *%edx

I've prepared two assembler files of step-14 whose only difference is the one
mentioned above.
dealII compiled with the first snippet works just fine, while the tail-optimized
case gives a SegFault.

I believe the problem is that the stack adjustment is wrong here.
Still looking into it.


[Bug c++/49519] [4.7 Regression] Revision 175272 miscompiled 447.dealII in SPEC CPU 2006

2011-06-29 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519

--- Comment #5 from Yukhin Kirill  2011-06-29 
12:24:25 UTC ---
The problem here is that GCC incorrectly stores arguments to the stack in case of
tail-call opt.
Here is the snippet:
movl    40(%esp), %eax
movl    %eax, 28(%esp)
movl    36(%esp), %esi
movl    %esi, 24(%esp)
movl    32(%esp), %esi
movl    %esi, 20(%esp)
movl    %eax, 16(%esp)

The argument from 28(%esp) is not copied to 16(%esp) at all.
The correct sequence must be (semantically) like this:
movl    40(%esp), %esi ; <- Use esi to move memory
movl    28(%esp), %eax ; <- Save overlapping value
movl    %esi, 28(%esp)
movl    36(%esp), %esi
movl    %esi, 24(%esp)
movl    32(%esp), %esi
movl    %esi, 20(%esp)
movl    %eax, 16(%esp) ; <- Store saved value

Working toward a patch.
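
The two sequences above can be mimicked in plain C with an array standing in for the stack, one `int` per 4-byte slot. This is only a model of the data movement (the function names and slot constants are hypothetical), meant to show that the first ordering loses the value that was at 28(%esp).

```c
#include <assert.h>
#include <stddef.h>

/* Slot index = byte offset from %esp divided by 4.  */
enum { SLOT_16 = 4, SLOT_20 = 5, SLOT_24 = 6, SLOT_28 = 7,
       SLOT_32 = 8, SLOT_36 = 9, SLOT_40 = 10 };

/* Mirrors the miscompiled sequence: 28(%esp) is overwritten before
   its old value is read, so 16(%esp) receives the wrong argument.  */
void
move_args_clobbering (int *stack)
{
  assert (stack != NULL);
  int eax = stack[SLOT_40];
  stack[SLOT_28] = eax;
  stack[SLOT_24] = stack[SLOT_36];
  stack[SLOT_20] = stack[SLOT_32];
  stack[SLOT_16] = eax;              /* old 28(%esp) was never read */
}

/* Mirrors the suggested sequence: save the overlapping slot first.  */
void
move_args_saving (int *stack)
{
  assert (stack != NULL);
  int eax = stack[SLOT_28];          /* save overlapping value */
  stack[SLOT_28] = stack[SLOT_40];
  stack[SLOT_24] = stack[SLOT_36];
  stack[SLOT_20] = stack[SLOT_32];
  stack[SLOT_16] = eax;              /* store saved value */
}
```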


[Bug c++/49519] [4.7 Regression] Revision 175272 miscompiled 447.dealII in SPEC CPU 2006

2011-06-30 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519

--- Comment #6 from Yukhin Kirill  2011-06-30 
15:11:41 UTC ---
I've looked into tail-call opt. It seems we should not apply it at all if the
new and old stack addresses of the parameters overlap. BTW, I think that is too
conservative anyway...
We have a call through a pointer, passing 5 params. The last param is of no
interest to us, but the first 4 are.
In expand we have:
  GIMPLE snippet:
D.172468_17 = MEM[(struct cons &)&arg_refs + 12].head;
D.172469_18 = MEM[(struct cons &)&arg_refs + 8].head;
D.172470_19 = MEM[(struct cons &)&arg_refs + 4].head;
D.172471_20 = MEM[(struct cons &)&arg_refs];
D.172462_21 = (sizetype) fun_ptr$__delta_26;
D.172463_22 = obj_3(D) + D.172462_21;
fun_ptr$__pfn_23 (D.172463_22, D.172471_20, D.172470_19, D.172469_18,
D.172468_17); [tail call]

And subsequently expanding it we have RTL:
(insn 19 18 20 4 (set (reg/f:SI 80)
(mem/s/f/j/c:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
(const_int 28 [0x1c])) [0 MEM[(struct cons &)&arg_refs +
12].head+0 S4 A32])) include/base/thread_management.h:1534 -1
 (nil))

(insn 20 19 21 4 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
(const_int 16 [0x10])) [0 S4 A32])
(reg/f:SI 80)) include/base/thread_management.h:1534 -1
 (nil))

(insn 21 20 22 4 (set (reg/f:SI 81)
(mem/s/f/j/c:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
(const_int 24 [0x18])) [0 MEM[(struct cons &)&arg_refs +
8].head+0 S4 A32])) include/base/thread_management.h:1534 -1
 (nil))

(insn 22 21 23 4 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
(const_int 12 [0xc])) [0 S4 A32])
(reg/f:SI 81)) include/base/thread_management.h:1534 -1
 (nil))

(insn 23 22 24 4 (set (reg/f:SI 82)
(mem/s/f/j/c:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
(const_int 20 [0x14])) [0 MEM[(struct cons &)&arg_refs +
4].head+0 S4 A32])) include/base/thread_management.h:1534 -1
 (nil))

(insn 24 23 25 4 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
(const_int 8 [0x8])) [0 S4 A32])
(reg/f:SI 82)) include/base/thread_management.h:1534 -1
 (nil))

(insn 25 24 26 4 (parallel [
(set (reg:SI 83)
(plus:SI (reg/f:SI 53 virtual-incoming-args)
(const_int 16 [0x10])))
(clobber (reg:CC 17 flags))
]) step-14.cc:4271 -1
 (nil))

(insn 26 25 27 4 (set (reg/f:SI 84)   <
(mem/f/c:SI (reg:SI 83) [0 MEM[(struct cons &)&arg_refs]+0 S4 A32]))
include/base/thread_management.h:1534 -1 <
 (nil))

(insn 27 26 28 4 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
(const_int 4 [0x4])) [0 S4 A32])
(reg/f:SI 84)) include/base/thread_management.h:1534 -1
 (nil))

(insn 28 27 29 4 (parallel [
(set (reg:SI 85)
(plus:SI (reg/v/f:SI 77 [ obj ])
(reg:SI 74 [ fun_ptr$__delta ])))
(clobber (reg:CC 17 flags))
]) include/base/thread_management.h:1534 -1
 (nil))

(insn 29 28 30 4 (set (mem:SI (reg/f:SI 53 virtual-incoming-args) [0 S4 A32])
(reg:SI 85)) include/base/thread_management.h:1534 -1
 (nil))

(call_insn/j 30 29 31 4 (call (mem:QI (reg/f:SI 59 [ fun_ptr$__pfn ]) [0
*fun_ptr$__pfn_23 S1 A8])
(const_int 20 [0x14])) include/base/thread_management.h:1534 -1
 (nil)
(expr_list:REG_DEP_TRUE (use (mem/f/i:SI (reg/f:SI 53
virtual-incoming-args) [0 S4 A32]))
(expr_list:REG_DEP_TRUE (use (mem/f/i:SI (plus:SI (reg/f:SI 53
virtual-incoming-args)
(const_int 4 [0x4])) [0 S4 A32]))
(expr_list:REG_DEP_TRUE (use (mem/f/i:SI (plus:SI (reg/f:SI 53
virtual-incoming-args)
(const_int 8 [0x8])) [0 S4 A32]))
(expr_list:REG_DEP_TRUE (use (mem/f/i:SI (plus:SI (reg/f:SI 53
virtual-incoming-args)
(const_int 12 [0xc])) [0 S4 A32]))
(expr_list:REG_DEP_TRUE (use (mem/f/i:SI (plus:SI (reg/f:SI
53 virtual-incoming-args)
(const_int 16 [0x10])) [0 S4 A32]))
(nil)))


You can see that the address of the 4th param is calculated in a different
way. We calculate a sum, store it in a register, load memory from that address
and then put it on the new stack.

BUT. The predicate which checks for memory overlap looks like this:
 static bool
  mem_overlaps_already_clobbered_arg_p (rtx addr, unsigned HOST_WIDE_INT size)
  {
HOST_WIDE_INT i;

if (addr == crtl->args.internal_arg_pointer)
  i = 0;
else if (GET_CODE (addr) == PLUS
 && XEXP (addr, 0) == crtl->args.internal_arg_pointer
 && CONST_INT_P (XEXP (addr, 1)))
  i = INTVAL (XEXP (addr, 1));
/* Return true for arg pointer based indexed addressing.  */
else if (GET_CODE (addr) == PLUS
  

[Bug c++/49519] [4.7 Regression] Revision 175272 miscompiled 447.dealII in SPEC CPU 2006

2011-06-30 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519

--- Comment #7 from Yukhin Kirill  2011-06-30 
15:22:58 UTC ---
Expanding arguments in different ways occurs because corresponding GIMPLE
statements are of different types.
For 'good' case we have expression of type
  COMPONENT_REF

While for 'bad' one it is just a 
  MEM_REF

For these different kinds we have slightly different expansion.

The different expression types come from the front end; at least in the einline
phase the accesses are different:

  [include/boost/tuple/detail/tuple_basic.hpp : 130:14] D.167199_17 =
MEM[(struct cons &)arg_list_2(D) + 12].head;
  [include/boost/tuple/detail/tuple_basic.hpp : 130:14] D.167198_18 =
MEM[(struct cons &)arg_list_2(D) + 8].head;
  [include/boost/tuple/detail/tuple_basic.hpp : 130:14] D.167197_19 =
MEM[(struct cons &)arg_list_2(D) + 4].head;
  [step-14.cc : 4271:1] D.167196_20 = MEM[(struct cons &)arg_list_2(D)];
  [include/base/thread_management.h : 1534:13] D.167205_21 = (sizetype)
fun_ptr$__delta_7;
  [include/base/thread_management.h : 1534:13] D.167204_22 = obj_1(D) +
D.167205_21;
  [include/base/thread_management.h : 1534:13] iftmp.53_23 (D.167204_22,
D.167196_20, D.167197_19, D.167198_18, D.167199_17);
  [include/base/thread_management.h : 1826:5] return;

Having said all that, I believe the issue is somehow connected to front-end
code generation.

Jason, could you give me a hint?

Your patch changes a line which has a comment:
 /* Do array-to-pointer, function-to-pointer conversion, and ignore
top-level qualifiers as required.  */
...


[Bug c++/49519] [4.7 Regression] Revision 175272 miscompiled 447.dealII in SPEC CPU 2006

2011-06-30 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519

--- Comment #8 from Yukhin Kirill  2011-06-30 
15:26:36 UTC ---
If someone really needs a quick fix, it may be done like this:
gcc/calls.c:

  static bool
  mem_overlaps_already_clobbered_arg_p (rtx addr, unsigned HOST_WIDE_INT size)
  {
    HOST_WIDE_INT i;

    if (addr == crtl->args.internal_arg_pointer)
      i = 0;
    else if (GET_CODE (addr) == PLUS
             && XEXP (addr, 0) == crtl->args.internal_arg_pointer
             && CONST_INT_P (XEXP (addr, 1)))
      i = INTVAL (XEXP (addr, 1));
    /* Return true for arg pointer based indexed addressing.  */
    else if (GET_CODE (addr) == PLUS
             && (XEXP (addr, 0) == crtl->args.internal_arg_pointer
                 || XEXP (addr, 1) == crtl->args.internal_arg_pointer))
      return true;
    else if (GET_CODE (addr) == REG)   /* <--- added check */
      return true;
    else
      return false;

But we will possibly be too conservative doing so.
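To make the effect of the extra check easier to see, here is a minimal toy model in plain C. It is a sketch under assumptions: `struct addr`, `enum addr_kind` and `may_overlap_clobbered_arg` are hypothetical stand-ins, not GCC's rtx machinery, and the slot bookkeeping done with `i` and `size` in the real function is left out, so every arg-pointer based address simply counts as overlapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for address shapes; these are NOT GCC's rtx types.  */
enum addr_kind {
  ADDR_ARG_POINTER,   /* the incoming argument pointer itself */
  ADDR_PLUS,          /* sum of two operands */
  ADDR_CONST,         /* integer constant */
  ADDR_REG            /* some other register of unknown contents */
};

struct addr {
  enum addr_kind kind;
  const struct addr *op0, *op1; /* operands, used by ADDR_PLUS only */
  long value;                   /* used by ADDR_CONST only */
};

/* Simplified predicate: true means "may overlap an already clobbered
   argument slot", which blocks the tail call.  With treat_reg_as_unknown
   set, a bare register address is handled conservatively, as in the
   quick fix.  */
bool may_overlap_clobbered_arg(const struct addr *a, bool treat_reg_as_unknown)
{
  assert(a != NULL);
  if (a->kind == ADDR_ARG_POINTER)
    return true;
  if (a->kind == ADDR_PLUS
      && (a->op0->kind == ADDR_ARG_POINTER
          || a->op1->kind == ADDR_ARG_POINTER))
    return true;
  if (treat_reg_as_unknown && a->kind == ADDR_REG)
    return true;
  return false;
}
```

In this model the only behavioural difference between the two modes is the register case: every tail call whose argument is loaded through a register address gets blocked, whether or not the register actually points into the argument area. That is where the lost optimization would come from.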


[Bug c++/49519] [4.7 Regression] Revision 175272 miscompiled 447.dealII in SPEC CPU 2006

2011-06-30 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519

--- Comment #9 from Yukhin Kirill  2011-06-30 
15:37:04 UTC ---
One more point for the FE guys.
The function definition makes no distinction between the 4 args. Here it is:

include/base/thread_management.h:
template 
static inline void do_call (PFun fun_ptr,
C   &obj,
ArgList &arg_list,
internal::return_value &ret_val,
const int2type<4> &)
  {
ret_val.set ((obj.*fun_ptr) (arg_list.template get<0>(),
 arg_list.template get<1>(),
 arg_list.template get<2>(),
 arg_list.template get<3>()));
  }


[Bug c++/49639] New: [4.7 Regression] 447.dealII in SPEC CPU 2006 runtime fail

2011-07-05 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49639

   Summary: [4.7 Regression] 447.dealII in SPEC CPU 2006 runtime
fail
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: kirill.yuk...@intel.com


Created attachment 24687
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24687
Build log

Hi,
I've built 447.dealII without optimizations (g++ -O0 -c ...)
and got a segmentation fault.
The failure starts with this checkin:
http://gcc.gnu.org/ml/gcc-cvs/2011-06/msg00832.html

Here is the backtrace:
$ gdb ./dealII
GNU gdb (GDB) Fedora (7.2-51.fc14)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from
/export/users/kyukhin/specs/spec2006_w/benchspec/CPU2006/447.dealII/build/t/dealII...(no
debugging symbols found)...done.
(gdb) r
Starting program:
/export/users/kyukhin/specs/spec2006_w/benchspec/CPU2006/447.dealII/build/t/dealII

Program received signal SIGSEGV, Segmentation fault.
0x in ?? ()
Missing separate debuginfos, use: debuginfo-install glibc-2.13-1.x86_64
(gdb) bt
#0  0x in ?? ()
#1  0x0059ac1c in
internal::GridReordering3d::Mesh::build_connectivity() ()
#2  0x0059a805 in
internal::GridReordering3d::Mesh::Mesh(std::vector,
std::allocator > > const&) ()
#3  0x0059af80 in
internal::GridReordering3d::Orienter::Orienter(std::vector,
std::allocator > > const&) ()
#4  0x0059b021 in
internal::GridReordering3d::Orienter::orient_mesh(std::vector,
std::allocator > >&) ()
#5  0x0059bb2e in
GridReordering<3>::reorder_cells(std::vector,
std::allocator > >&) ()
#6  0x0057e839 in GridGenerator::hyper_ball(Triangulation<3>&, Point<3>
const&, double) ()
#7  0x00643aa2 in
Data::Exercise_2_3<3>::create_coarse_grid(Triangulation<3>&) ()
#8  0x00649612 in Data::SetUp,
3>::create_coarse_grid(Triangulation<3>&) const ()
#9  0x00644a8d in Framework<3>::run(Framework<3>::ProblemDescription
const&) ()
#10 0x00643d92 in main ()

It seems we have a call through a null pointer:
(gdb) fr 1
#1  0x0059ac1c in
internal::GridReordering3d::Mesh::build_connectivity() ()
(gdb) disassemble
...
   0x0059ac14 <+574>:   mov    %rax,%rdi
   0x0059ac17 <+577>:   callq  0x0
=> 0x0059ac1c <+582>:   movl   $0x0,-0x6c(%rbp)
   0x0059ac23 <+589>:   mov    -0x40(%rbp),%esi
...


[Bug c++/49519] [4.7 Regression] Revision 175272 miscompiled 447.dealII in SPEC CPU 2006

2011-07-06 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519

--- Comment #13 from Yukhin Kirill  2011-07-06 
08:47:20 UTC ---
I agree that there is no problem with GIMPLE. As I mentioned, we may just
forbid tailcall opt for non-MEMREFS, but I suspect it will lead to a significant
performance degradation.
BTW, I am working on extracting a simple testcase now.


[Bug c++/49519] [4.7 Regression] Revision 175272 miscompiled 447.dealII in SPEC CPU 2006

2011-07-06 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519

--- Comment #14 from Yukhin Kirill  2011-07-06 
10:25:01 UTC ---
Created attachment 24700
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24700
Reduced testcase


[Bug middle-end/49519] [4.7 Regression] Revision 175272 miscompiled 447.dealII in SPEC CPU 2006

2011-07-06 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519

--- Comment #16 from Yukhin Kirill  2011-07-06 
10:35:03 UTC ---
Yes.

This is because expander prepares arguments like this:
...
(insn 6 5 7 2 (parallel [
(set (reg:SI 64)
(plus:SI (reg/f:SI 53 virtual-incoming-args)
(const_int 4 [0x4])))
(clobber (reg:CC 17 flags))
]) 2.cc:103 -1
 (nil))

(insn 7 6 8 2 (set (reg:SI 65)
(mem/c:SI (plus:SI (reg:SI 64)
(const_int 12 [0xc])) [0 MEM[(int &)&t + 12]+0 S4 A32]))
2.cc:103 -1
 (nil))

(insn 8 7 9 2 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
(const_int 12 [0xc])) [0 S4 A32])
(reg:SI 65)) 2.cc:103 -1
 (nil))
...

So calls.c/mem_overlaps_already_clobbered_arg_p is unable to determine that the
data comes from the stack, so it returns false (no overlap) and the tailcall is
enabled.
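The hazard that the predicate is meant to catch can be sketched with a small simulation. This is only a toy under assumptions: `shift_down_in_place`, `MAX_SLOTS` and the slot layout are hypothetical and do not mirror GCC's actual argument setup. A tail call stores outgoing arguments into the same area that holds the incoming ones, so a load that runs after a store to the same slot reads the wrong value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 16

/* Move slots[i + 1] into slots[i] for i = 0..n-2, reusing the same storage,
   the way a tail call reuses the incoming argument area.  Returns the number
   of loads that read a slot an earlier store had already overwritten.  */
size_t shift_down_in_place(int *slots, size_t n, bool high_to_low)
{
  bool stored[MAX_SLOTS] = { false };
  size_t hazards = 0;

  assert(slots != NULL && n >= 1 && n <= MAX_SLOTS);
  for (size_t step = 0; step + 1 < n; step++) {
    size_t dst = high_to_low ? n - 2 - step : step;
    size_t src = dst + 1;

    if (stored[src])
      hazards++;              /* source was clobbered before being read */
    slots[dst] = slots[src];
    stored[dst] = true;
  }
  return hazards;
}
```

With five slots holding 10, 20, 30, 40, 50, the low-to-high order gives 20, 30, 40, 50, 50 with no hazards, while the high-to-low order smears 50 over every slot and reports three hazardous loads. A check that only recognizes addresses written directly in terms of the argument pointer cannot see such a hazard once the source address has been moved into a register.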


[Bug middle-end/49519] [4.7 Regression] Revision 175272 miscompiled 447.dealII in SPEC CPU 2006

2011-07-06 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519

--- Comment #19 from Yukhin Kirill  2011-07-06 
11:49:34 UTC ---
Created attachment 24701
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24701
Patch to make tailcall check more conservative

The attached patch adds another check for the clobbered stack area.
If the address comes from a register, we have no idea about the destination
address. That means we must act conservatively: the address possibly overlaps
with the stack area of interest.


[Bug middle-end/49519] [4.7 Regression] Revision 175272 miscompiled 447.dealII in SPEC CPU 2006

2011-07-06 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519

--- Comment #20 from Yukhin Kirill  2011-07-06 
11:50:51 UTC ---
With the attached patch, both the testcase and 447.dealII pass.


[Bug middle-end/49519] [4.7 Regression] Revision 175272 miscompiled 447.dealII in SPEC CPU 2006

2011-07-06 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519

--- Comment #22 from Yukhin Kirill  2011-07-06 
11:57:21 UTC ---
(In reply to comment #21)
> On Wed, 6 Jul 2011, kirill.yukhin at intel dot com wrote:
> 
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519
> > 
> > --- Comment #19 from Yukhin Kirill  
> > 2011-07-06 11:49:34 UTC ---
> > Created attachment 24701 [details]
> >   --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24701
> > Patch to make tailcall check more conservative
> > 
> > Attached patch adds another check for clobbered stack area.
> > If address comes from a register - we have no idea about destination 
> > address.
> > That means we must act in conservative way - address possibly overlaps with
> > stack area of interest.
> 
> That looks reasonable.  Can you bootstrap & test this fix and post it to
> gcc-patches?

Already in progress :)


[Bug target/49547] LZCNT should be enabled only if ABM or LZCNT bits are set

2011-07-26 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49547

--- Comment #2 from Yukhin Kirill  2011-07-27 
05:04:04 UTC ---
Patch prepared.
Discussion is here:
http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02266.html


[Bug target/49547] LZCNT should be enabled only if ABM or LZCNT bits are set

2011-07-27 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49547

Yukhin Kirill  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||FIXED

--- Comment #3 from Yukhin Kirill  2011-07-27 
17:58:56 UTC ---
Changes approved and checked in.


[Bug target/49547] LZCNT should be enabled only if ABM or LZCNT bits are set

2011-07-27 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49547

Yukhin Kirill  changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution||FIXED

--- Comment #5 from Yukhin Kirill  2011-07-27 
18:06:59 UTC ---
(In reply to comment #4)
> (In reply to comment #3)
> > Changes approved and checked in.
> 
> When was it checked in? Where is the approved patch?

My fault, I mixed it up with the BMI testcases, which were recently approved. Sorry.


[Bug bootstrap/49964] New: Bootstrap failed with AVX turned on

2011-08-03 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49964

   Summary: Bootstrap failed with AVX turned on
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: kirill.yuk...@intel.com


Revision 177268 failed to bootstrap with AVX enabled.


[Bug bootstrap/49964] Bootstrap failed with AVX turned on

2011-08-03 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49964

--- Comment #1 from Yukhin Kirill  2011-08-03 
14:28:55 UTC ---
It started here:
http://gcc.gnu.org/ml/gcc-regression/2011-08/msg00051.html