Re: [x86-64-psABI] RFC: Add R_X86_64_RELAX_PC32 and R_X86_64_RELAX_PLT32

2015-05-13 Thread H.J. Lu
On Tue, May 12, 2015 at 11:22 PM, Jan Beulich  wrote:
 On 12.05.15 at 20:42,  wrote:
>> Here is the updated proposal.  I changed nop prefix from 0x48
>> to 0x67 and clarified how foo@GOTPCREL(%rip) should be
>> resolved.
>
> Mind clarifying how 67 is better than 48?

0x67 works for both x86-64 and i386.  We can use the same byte
for the "relax" prefix.

>> I am proposing to add 2 new relocations, R_X86_64_RELAX_PC32 and
>> R_X86_64_RELAX_PLT32:
>>
>> 1. They can only be used on 32-bit direct call/jmp instructions.
>> 2. call/jmp instructions must have a 0x67 prefix, which is the address
>> size prefix and is ignored by 32-bit direct call/jmp instructions.
>
> The same could have been said several years ago about segment
> overrides used with conditional branches, yet they obtained a
> meaning (even if only affecting performance, not correctness). Is
> it anywhere publicly stated that the address size override will
> continue to be ignored?

I will ask to put it in Intel SDM.
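
To make the encoding under discussion concrete, here is a minimal sketch in C of how a tool might recognize a 32-bit direct call/jmp that carries the proposed 0x67 prefix. The opcodes 0xE8 (call rel32) and 0xE9 (jmp rel32) are the standard x86 encodings. The function name and the buffer interface are hypothetical; they are not part of the proposal or of any existing linker.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only: return 1 if the bytes at P
   look like a direct call/jmp with a 32-bit displacement preceded by the
   0x67 address-size prefix, and 0 otherwise.  */
int
is_relax_prefixed_branch (const unsigned char *p, size_t len)
{
  /* Prefix (1 byte) + opcode (1 byte) + rel32 (4 bytes).  */
  if (len < 6)
    return 0;

  /* The proposal requires the address-size prefix to come first.  */
  if (p[0] != 0x67)
    return 0;

  /* 0xE8 is call rel32, 0xE9 is jmp rel32.  */
  return p[1] == 0xE8 || p[1] == 0xE9;
}
```

Because the prefix byte is the same in 32-bit and 64-bit mode, a check of this shape would not need to distinguish i386 from x86-64 objects.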


-- 
H.J.


Re: Missing barrier in outof_cfglayout

2015-05-13 Thread Georg-Johann Lay

Am 05/12/2015 um 05:13 PM schrieb Jeff Law:

On 05/12/2015 08:58 AM, Georg-Johann Lay wrote:

Ah, yes.  The ICE is actually in verify_flow_info: "wrong number of
branch edges after unconditional jump in bb 4".

It starts with an almost trivial jump table:


(jump_insn 82 81 83 19 (parallel [
 (set (pc)
 (reg/f:SI 60))
 (use (label_ref 83))
 ])
  (nil)
  -> 83)
;;  succ:   20 [50.0%]
;;  31 [50.0%]

;; Insn is not within a basic block
(code_label 83 82 84 15 "" [3 uses])
;; Insn is not within a basic block
(jump_table_data 84 83 85 (addr_vec:SI [
 (label_ref:SI 130)
 (label_ref:SI 86)
 (label_ref:SI 130)
 (label_ref:SI 130)
 ]))

At .cse1, cse.c:cse_insn executes

   /* We don't normally have an insn matching (set (pc) (pc)), so
      check for this separately here.  We will delete such an
      insn below.

      For other cases such as a table jump or conditional jump
      where we know the ultimate target, go ahead and replace the
      operand.  While that may not make a valid insn, we will
      reemit the jump below (and also insert any necessary
      barriers).  */
   if (n_sets == 1 && dest == pc_rtx
       && (trial == pc_rtx
           || (GET_CODE (trial) == LABEL_REF
               && ! condjump_p (insn))))
     {
       /* Don't substitute non-local labels, this confuses CFG.  */
       if (GET_CODE (trial) == LABEL_REF
           && LABEL_REF_NONLOCAL_P (trial))
         continue;

       SET_SRC (sets[i].rtl) = trial;
       cse_jumps_altered = true;
       break;
     }

on the jump insn and with trial = (label_ref 83).

The code pointed out by Jeff then replaces the original jump insn by

(jump_insn 139 81 82 16 (set (pc)
 (label_ref 83)) bug.c:33 -1
  (nil))

That's obviously nonsense as it jumps to the jump_table_data!

It's likely nonsensical.  You'd have to figure out why TRIAL has that seemingly



Hmm, it would actually lead to valid machine code: the jump table is
actually a dispatch table, i.e. it contains a list of direct jumps to the case
labels.  casesi loads the value of the label associated with the jump table
into a register, adds an offset, and then jumps to that address (which in turn
dispatches to the respective case via a direct jump).


In this specific case cse finds out that the offset (reg 59 below) is always 0,
i.e. the offset computation will always yield the value of the label associated
with the jump table (code_label 84), and then jumps to that label.
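
The address arithmetic described here can be sketched in C. This is only a model of the computation (table label plus index times 4, matching the mult and plus insns in the dump that follows). The function name and the use of uintptr_t are assumptions made for illustration.

```c
#include <stdint.h>

/* Illustrative model only: the address the expanded casesi jumps to,
   computed as table_label + index * 4.  The name is hypothetical.  */
uintptr_t
dispatch_target (uintptr_t table_label, uintptr_t index)
{
  uintptr_t offset = index * 4;   /* corresponds to (mult (reg 59) (const_int 4)) */
  return table_label + offset;    /* corresponds to (plus (reg 60) (reg 59)) */
}
```

With an index that is known to be 0, the result is simply the table label itself, which is why cse ends up with a jump whose target is the label in front of the jump table.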


When expanded, casesi will yield something like

(note 138 77 78 19 [bb 19] NOTE_INSN_BASIC_BLOCK)

(insn 78 138 79 19 (set (reg:SI 61)
        (high:SI (label_ref:SI 84))))

(insn 79 78 80 19 (set (reg:SI 60)
        (lo_sum:SI (reg:SI 61)
            (label_ref:SI 84))))

(insn 80 79 81 19 (set (reg:SI 59)
        (mult:SI (reg:SI 59)
            (const_int 4))))

(insn 81 80 82 19 (set (reg:SI 60)
        (plus:SI (reg:SI 60)
            (reg:SI 59))))

(jump_insn 82 81 83 19 (parallel [(set (pc)
   (reg:SI 60))
  (use (label_ref 84))])
 -> 84)
;;  succ:   20 [50.0%]
;;  31 [50.0%]

(barrier 83 82 84)

;; Insn is not within a basic block
(code_label 84 83 85 15 "" [3 uses])

;; Insn is not within a basic block
(jump_table_data 85 84 86 (addr_vec:SI [
(label_ref:SI 132)
(label_ref:SI 87)
(label_ref:SI 132)
(label_ref:SI 132)
]))
(barrier 86 85 87)



bogus value.  I would have expected it to have one of the entries from the
ADDR_VEC.


Other backends are also using dispatch tables and are jumping into the table, 
and internal docs for casesi read: "Instruction to jump through a dispatch 
table"...


Wrapping the tablejump's register (reg 60 above) into an UNSPEC fixes the problem,
but only because there is no insn to load such an unspec into a register, so
cse does not perform the corresponding optimization.



But even with the right value in TRIAL, we shouldn't be slamming on the SET_SRC
of the jump, but instead should be using the routines from cfgrtl.c to
manipulate the CFG and RTL datastructures properly.

jeff


I am seeing the problem with 4.9, but the current trunk still contains the same
sequence...


If the routines from cfgrtl.c were used, would it then be legal to
jump to a label which is not contained in any basic block?


Johann



Re: gcc -S vs clang -S

2015-05-13 Thread Martin Sebor

On 05/12/2015 07:40 PM, Andrew Pinski wrote:

On Tue, May 12, 2015 at 6:36 PM, Fei Ding  wrote:

I think Thiago and Eric just want to know which code-gen is better and why...



You need to understand that for a complex processor (CISC ISAs) like x86,
there is sometimes no one right answer.  You need to look at each
micro-arch and understand the pipeline.  Sometimes different code
streams will perform the same, but it also depends on the code size.


A good place to start is the Intel 64 and IA-32 Architectures
Optimization Reference Manual. It lists the throughput and
latencies of x86 instructions and gives guidance for which
ones might be more efficient on which processors. For example,
in the section titled Using LEA it discusses why the three
operand form of the instruction is slower on the Sandy Bridge
microarchitecture than on others:

http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf

Martin



Thanks,
Andrew Pinski



2015-05-12 23:29 GMT+08:00 Eric Botcazou :

Note that at -O3 there is a difference still:
clang (3.6.0):
 addl    %esi, %edi
 movl    %edi, %eax
 retq

gcc (4.9.2)
 leal    (%rdi,%rsi), %eax
 ret

Can't tell which is best, if any.
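
The testcase itself is not quoted in this message. Both sequences are consistent with a plain two-argument integer addition such as the following sketch, which is an assumption about the source and not the original poster's file. Under the x86-64 SysV calling convention the two arguments arrive in %edi and %esi and the result is returned in %eax, which matches the registers used above.

```c
/* Assumed testcase, for illustration only: add two ints.  */
int
sum (int a, int b)
{
  return a + b;
}
```

Compiling a function like this with `-O3 -S` under each compiler is one way to compare the add/mov and lea forms directly.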


But what's your point exactly here?  You cannot expect different compilers to
generate exactly the same code on a given testcase for non-toy architectures.

Note that this kind of discussion is more appropriate for gcc-h...@gcc.gnu.org

--
Eric Botcazou




Re: gcc -S vs clang -S

2015-05-13 Thread Jakub Jelinek
On Wed, May 13, 2015 at 08:41:39AM -0600, Martin Sebor wrote:
> On 05/12/2015 07:40 PM, Andrew Pinski wrote:
> >On Tue, May 12, 2015 at 6:36 PM, Fei Ding  wrote:
> >>I think Thiago and Eric just want to know which code-gen is better and 
> >>why...
> >
> >
> >You need to understand that for a complex processor (CISC ISAs) like x86,
> >there is sometimes no one right answer.  You need to look at each
> >micro-arch and understand the pipeline.  Sometimes different code
> >streams will perform the same, but it also depends on the code size.
> 
> A good place to start is the Intel 64 and IA-32 Architectures
> Optimization Reference Manual. It lists the throughput and
> latencies of x86 instructions and gives guidance for which
> ones might be more efficient on which processors. For example,
> in the section titled Using LEA it discusses why the three
> operand form of the instruction is slower on the Sandy Bridge
> microarchitecture than on others:
> 
> http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf

But leal (%rdi,%rsi), %eax is not the slower case it talks about.
Furthermore, the generic tuning is presumably in use, in which case it
doesn't matter much whether the instruction is slower or faster on one
particular CPU; what matters is whether it is in general slower or faster
across the whole basket of CPUs that the generic tuning is based on.

Jakub


target attributes, pragmas and preprocessor macros

2015-05-13 Thread Kyrill Tkachov

Hi all,

Are target attributes supposed to redefine the preprocessor macros available?
For example, on aarch64 if the file is compiled with floating point support
the __ARM_FEATURE_FMA predefine is available. If the user adds to a function
a target attribute disabling floating point, then is __ARM_FEATURE_FMA supposed
to be undefined in the body of that function?

Looking at some backends, it seems that only #pragmas are supposed to have that
effect, but I just wanted to confirm.

Thanks,
Kyrill



Broken test gcc.target/i386/sibcall-2.c

2015-05-13 Thread Alexander Monakov
Hello,

Last year's x86 sibcall improvements added a currently xfailed test:

  /* { dg-do compile { target ia32 } } */
  /* { dg-options "-O2" } */

  extern int doo1 (int);
  extern int doo2 (int);
  extern void bar (char *);

  int foo (int a)
  {
    char s[256];
    bar (s);
    return (a < 0 ? doo1 : doo2) (a);
  }

  /* { dg-final { scan-assembler-not "call\[ \t\]*.%eax" { xfail *-*-* } } } */

It was xfailed by https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00016.html

Can you tell me what the test is supposed to test?  A tail call is impossible
here, because 'bar' might save the address of 's' in a global variable, and
therefore 's' must be live when 'doo1' or 'doo2' are invoked.
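
A minimal sketch of the scenario described above, with made-up definitions for the three external functions (the real test only declares them). Here 'bar' stores the address of 's' in a global and the callees read through it, so the frame of 'foo' has to stay live across the final call. The global 'saved' and the function bodies are assumptions for illustration.

```c
/* Hypothetical global that captures the address of the caller's buffer.  */
static char *saved;

/* Made-up definition: remember the buffer and write a marker into it.  */
void bar (char *p) { saved = p; p[0] = 42; }

/* Made-up definitions: both read the caller's buffer through 'saved'.  */
int doo1 (int a) { return saved[0] + a; }
int doo2 (int a) { return saved[0] - a; }

int foo (int a)
{
  char s[256];
  bar (s);
  /* 's' is still being read by the callee, so the stack frame of foo
     cannot be released before this call returns.  */
  return (a < 0 ? doo1 : doo2) (a);
}
```

If the last call were turned into a jump that reuses the stack frame, the reads through 'saved' would refer to storage that is no longer reserved for 's'.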

Should we remove or unbreak this test?

Thanks.
Alexander


Re: Broken test gcc.target/i386/sibcall-2.c

2015-05-13 Thread Alexander Monakov
Ah.  I realize it's most likely for testing sibcall_[value]_pop_memory
peepholes, right?  In which case the testcase might look like this:

  /* { dg-do compile } */
  /* { dg-options "-O2" } */

  void foo (int a, void (**doo1) (void), void (**doo2) (void))
  {
    char s[16] = {0};
    do s[a] = 1; while (a &= a-1);
    (*(s[8] ? doo1 : doo2)) ();
  }

  /* { dg-final { scan-assembler-not "call" } } */

However, on the above testcase a memory-indirect jump is currently generated
only for 64-bit x86.  With -mx32 it's impossible, but with -m32 the peephole
doesn't match.  Is that expected?

Can you also tell me why ..._pop call and sibcall instructions are predicated
on !TARGET_64BIT?

Thanks.

Alexander


Re: Fwd: xtensa PR65730

2015-05-13 Thread Richard Henderson
On 04/10/2015 06:38 AM, Max Filippov wrote:
> OTOH calling helper function to do multiplication by a constant 8 looks
> rather stupid. I guess we're not going to have non-8-bit bytes on xtensa
> anytime soon, maybe this multiplication can be replaced with shift?

Yes, that's what I'd do.
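
For reference, the equivalence being relied on is the usual one between multiplying by 8 and shifting left by 3. A small sketch in C, assuming 8-bit bytes as the quoted message does. The function names are hypothetical and unrelated to the xtensa backend code.

```c
/* Illustrative only: convert a byte count to a bit count, assuming
   8-bit bytes.  Unsigned arithmetic keeps both forms well defined.  */
unsigned int
bytes_to_bits_mul (unsigned int bytes)
{
  return bytes * 8u;
}

/* Same result using a shift instead of a multiplication.  */
unsigned int
bytes_to_bits_shift (unsigned int bytes)
{
  return bytes << 3;
}
```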


r~


[gomp4] Basic -misa support for nvptx (was: How to use old GPU (Fermi) in gcc with OpenACC?)

2015-05-13 Thread Thomas Schwinge
Hi!

On Sat, 9 May 2015 10:26:22 -0700, Satoshi_OHSHIMA  
wrote:
> I'm trying to use and evaluate gcc with OpenACC on some NVIDIA GPUs.
> I succeeded in building gcc with OpenACC by using
> http://scelementary.com/2015/04/25/openacc-in-gcc.html as a reference.

Heh, their build instructions very much look like the ones I provided in
:
trunk-offload-big.tar.bz2, trunk-offload-light.tar.bz2 (no problem with
them reusing these, of course).

> Then, I succeeded in using a Kepler GPU.

:-)

> However, when I tried to use it on older GPUs (Fermi), my programs failed to
> execute.
> I noticed that there are some "sm_30" and "COMPUTE_30" keywords in the gcc and
> nvptx sources.
> I changed them to "sm_20" and "COMPUTE_20", but my programs still failed to
> execute.
> Are there any developers who can make gcc with OpenACC support targets other
> than "sm_30"?

"Can", yes, but this is unlikely to happen: a nontrivial amount of work
would be required to get the current code (and, in particular, our
patches under development) working on what nowadays is probably
considered "legacy" hardware.  (For example, if I remember correctly,
didn't Nvidia remove support for Fermi-class hardware from recent CUDA
toolkit releases?)

However, I committed the following patch to gomp-4_0-branch in r223182,
and you're of course very welcome to follow that route, and contribute
patches to properly conditionalize the respective PTX instructions,
provide replacement functions, and so on, in gcc/config/nvptx/nvptx.md,
libgcc/config/nvptx/, and probably other locations.

To use this patch, you'll also need to update your nvptx-tools sources.

commit 29001da9572e094164e1fca440925fafbceb67f2
Author: tschwinge 
Date:   Wed May 13 21:25:42 2015 +

Basic -misa support for nvptx

gcc/
* config/nvptx/nvptx-opts.h: New file.
* config/nvptx/nvptx.c (nvptx_file_start): Print the correct .target.
* config/nvptx/nvptx.h: Include "nvptx-opts.h".
(ASM_SPEC): Define.
(TARGET_SM35): New macro.
* config/nvptx/nvptx.md (atomic_fetch_): Enable with the
correct predicate.
* config/nvptx/nvptx.opt (ptx_isa, sm_30, sm_35): New enum and its
values.
(misa=): New option.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@223182 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp| 13 +
 gcc/config/nvptx/nvptx-opts.h | 31 +++
 gcc/config/nvptx/nvptx.c  |  5 -
 gcc/config/nvptx/nvptx.h  |  8 
 gcc/config/nvptx/nvptx.md |  3 +--
 gcc/config/nvptx/nvptx.opt| 14 ++
 6 files changed, 71 insertions(+), 3 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index a4683c3..f43f668 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,16 @@
+2015-05-13  Bernd Schmidt  
+
+   * config/nvptx/nvptx-opts.h: New file.
+   * config/nvptx/nvptx.c (nvptx_file_start): Print the correct .target.
+   * config/nvptx/nvptx.h: Include "nvptx-opts.h".
+   (ASM_SPEC): Define.
+   (TARGET_SM35): New macro.
+   * config/nvptx/nvptx.md (atomic_fetch_): Enable with the
+   correct predicate.
+   * config/nvptx/nvptx.opt (ptx_isa, sm_30, sm_35): New enum and its
+   values.
+   (misa=): New option.
+
 2015-05-13  Cesar Philippidis  
 
* except.c (finish_eh_generation): Don't finalize exeception
diff --git gcc/config/nvptx/nvptx-opts.h gcc/config/nvptx/nvptx-opts.h
new file mode 100644
index 000..512c37a
--- /dev/null
+++ gcc/config/nvptx/nvptx-opts.h
@@ -0,0 +1,31 @@
+/* Definitions for the NVPTX port needed for option handling.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   Contributed by Bernd Schmidt 
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+#ifndef NVPTX_OPTS_H
+#define NVPTX_OPTS_H
+
+enum ptx_isa
+{
+  PTX_ISA_SM30,
+  PTX_ISA_SM35
+};
+
+#endif
+
diff --git gcc/config/nvptx/nvptx.c gcc/config/nvptx/nvptx.c
index 10ac976..9bec12f 100644
--- gcc/config/nvptx/nvptx.c
+++ gcc/config/nvptx/nvptx.c
@@ -2048,7 +2048,10 @@ nvptx_file_start (void)
 {
   fputs ("// BEGIN PREAMBLE\n", asm_out_file);
   fputs ("\t.version\t3.1\n", asm_out_file);
-  fputs ("\t.target\tsm_30\n", asm_out_file);
+  if (TARGET_SM35)
+fput
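
As a rough illustration of the idea behind the option, the following sketch maps an ISA enum like the one added in nvptx-opts.h to the name used in the .target directive. The enum values are taken from the patch. The function name and its shape are hypothetical and are not the code of nvptx_file_start.

```c
/* Enum as introduced by the patch in nvptx-opts.h.  */
enum ptx_isa
{
  PTX_ISA_SM30,
  PTX_ISA_SM35
};

/* Hypothetical helper, for illustration only: return the name that
   would follow ".target" for the selected ISA.  */
const char *
ptx_target_name (enum ptx_isa isa)
{
  switch (isa)
    {
    case PTX_ISA_SM35:
      return "sm_35";
    case PTX_ISA_SM30:
    default:
      return "sm_30";
    }
}
```

Supporting another target along these lines would take at least a new enum value and a new case, in addition to the conditionalized PTX instructions and replacement functions mentioned in the message.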

gcc-4.9-20150513 is now available

2015-05-13 Thread gccadmin
Snapshot gcc-4.9-20150513 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20150513/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.9 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch 
revision 223185

You'll find:

 gcc-4.9-20150513.tar.bz2 Complete GCC

  MD5=b70568dee84bb255c79e0a7d0520f9b2
  SHA1=f7380eaea6e690a8c8ec8d068f3cc76ac276bf64

Diffs from 4.9-20150506 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: [gomp4] Basic -misa support for nvptx (was: How to use old GPU (Fermi) in gcc with OpenACC?)

2015-05-13 Thread Joseph Myers
On Wed, 13 May 2015, Thomas Schwinge wrote:

>   * config/nvptx/nvptx.opt (ptx_isa, sm_30, sm_35): New enum and its
>   values.
>   (misa=): New option.

New options do of course need documenting in invoke.texi.

-- 
Joseph S. Myers
jos...@codesourcery.com