-Warray-bounds false negative
Hello, I recently came across a false negative in GCC's detection of an array bounds violation. At first, I thought the other tool (PC-Lint) was reporting a false positive, but it turns out to be correct. The false negative occurs in GCC 4.3, 4.4.1, and the latest trunk (4.5). I'm curious to understand exactly where the detection breaks down, as I think it may affect if/how the loop in question is optimized. Here is the code:

#include <stdbool.h>

int main(int argc, char** argv)
{
  unsigned char data[8];
  int hyphen = 0, i = 0;
  char *option = *argv;

  for (i = 19; i < 36; ++i) {
    if (option[i] == '-') {
      if (hyphen)
        return false;
      ++hyphen;
      continue;
    }
    if (!(option[i] >= '0' && option[i] <= '9') &&
        !(option[i] >= 'A' && option[i] <= 'F') &&
        !(option[i] >= 'a' && option[i] <= 'f')) {
      return false;
    }
    data[(i - hyphen) / 2] = 0;
  }
  return 0;
}

When i is 35 and hyphen is 0 (and in many other cases), data[] will be overflowed by quite a bit. Where does the breakdown in array bounds detection occur, and why? Once I understand, and if the fix is simple enough, I can try to fix the bug and supply a patch. Thanks!
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
Re: -Warray-bounds false negative
On Fri, 13 Nov 2009, Andrew Pinski wrote:
> On Fri, Nov 13, 2009 at 1:09 PM, Matt wrote:
>> Hello, I recently came across a false negative in GCC's detection of
>> an array bounds violation. At first, I thought the other tool
>> (PC-Lint) was reporting a false positive, but it turns out to be
>> correct. The false negative occurs in GCC 4.3, 4.4.1, and the latest
>> trunk (4.5). I'm curious to understand how exactly the detection
>> breaks down, as I think it may affect if/how the loop in question is
>> optimized.
>
> Well, in this case all of the code that is considered dead is removed
> before the point where the warning would be emitted. If I change it so
> that data is read from (instead of just written to), the trunk warns
> about this code:
>
> t.c:21:20: warning: array subscript is above array bounds
>
> I changed the last return to be: return data[2];

d'oh! Next time I'll look at the objdump output first. Thanks for the quick explanation!
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
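[A minimal self-contained illustration of the same effect, assuming the GCC 4.4/4.5 behavior described above (hypothetical file name warn.c; the exact diagnostic text may differ):

/* warn.c: compile with "gcc -O2 -Wall -c warn.c" */
int f(void)
{
  int a[4];
  int i;
  for (i = 0; i < 8; i++)
    a[i] = i;     /* out of bounds for i >= 4 */
  return a[2];    /* keeping a[] live lets -Warray-bounds fire; with
                     "return 0;" the stores are dead, get removed, and
                     no warning is emitted */
}]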
build failure bootstrapping trunk on Ubuntu 9.10
I'm getting this build failure with the latest trunk, as of the composing of this email:

../gcc-trunk/configure --prefix=/home/matt --enable-stage1-checking=all --enable-bootstrap --enable-lto --enable-languages=c,c++
make -j5
. . .
/home/matt/src/gcc-obj/./prev-gcc/xgcc -B/home/matt/src/gcc-obj/./prev-gcc/ -B/home/matt/x86_64-unknown-linux-gnu/bin/ -B/home/matt/x86_64-unknown-linux-gnu/bin/ -B/home/matt/x86_64-unknown-linux-gnu/lib/ -isystem /home/matt/x86_64-unknown-linux-gnu/include -isystem /home/matt/x86_64-unknown-linux-gnu/sys-include -c -g -O2 -fprofile-use -DIN_GCC -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -Wold-style-definition -Wc++-compat -fno-common -DHAVE_CONFIG_H -I. -I. -I../../gcc-trunk/gcc -I../../gcc-trunk/gcc/. -I../../gcc-trunk/gcc/../include -I../../gcc-trunk/gcc/../libcpp/include -I../../gcc-trunk/gcc/../libdecnumber -I../../gcc-trunk/gcc/../libdecnumber/bid -I../libdecnumber -DCLOOG_PPL_BACKEND -I/usr/include/libelf ../../gcc-trunk/gcc/ira-lives.c -o ira-lives.o
cc1: warnings being treated as errors
../../gcc-trunk/gcc/ira-lives.c: In function 'ira_implicitly_set_insn_hard_regs':
../../gcc-trunk/gcc/ira-lives.c:748:13: error: 'regno' may be used uninitialized in this function

It looks like ira-lives.c:763 has some ambiguous parenthesizing that may be causing the warning that is failing the build. Note that the warning doesn't happen on a similar piece of code on line 830 in the same file. I've been fighting with the configure process for a few days and finally got past that to this issue. So, any help is greatly appreciated :) Thanks!
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
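[For readers unfamiliar with this class of warning, a hypothetical reduction -- not the actual ira-lives.c code -- of the pattern that typically provokes "may be used uninitialized" under -Werror: the variable is assigned only under a condition the compiler cannot prove matches the condition guarding the use:

/* hypothetical reduction; compile with "gcc -O2 -Wall -c" */
extern int cond (int);

int f (int x)
{
  int regno;            /* not initialized on every path */
  if (cond (x))
    regno = x + 1;
  /* ... other work ... */
  if (cond (x))
    return regno;       /* warns: GCC can't prove both tests agree,
                           since cond() may return different values */
  return 0;
}]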
missed IPA/whopr optimization?
Hello all, In the work I'm doing on my new book, I'm trying to show how modern compiler optimizations can eliminate a good deal of the overhead introduced by a modular/unit-testable design. In verifying some of my text, I found that GCC 4.4 and 4.5 (20091018, Ubuntu 9.10 package) isn't doing an optimization that I expected it to do:

#include <stdio.h>

class Calculable
{
public:
  virtual unsigned char calculate() = 0;
};

class X : public Calculable
{
public:
  unsigned char calculate() { return 1; }
};

class Y : public Calculable
{
public:
  unsigned char calculate() { return 2; }
};

static void print(Calculable& c)
{
  printf("%d\n", c.calculate());
  printf("+1: %d\n", c.calculate() + 1);
}

int main()
{
  X x;
  Y y;
  print(x);
  print(y);
  return 0;
}

GCC 4.5 (and 4.4.1) generates this approximate code:

~/src $ /usr/lib/gcc-snapshot/bin/g++ -O3 -ftree-loop-ivcanon -fivopts -ftree-loop-im -fwhole-program -fipa-struct-reorg -fipa-matrix-reorg -fgcse-sm -fgcse-las -fgcse-after-reload --param max-gcse-memory=1 --param max-pending-list-length=10 folding-test-interface.cpp -o folding-test-interface_gcc450_20091018-O3-kitchen-sink
~/src $ objdump -Mintel -S folding-test-interface_gcc450_20091018-O3-kitchen-sink | less -p \

00400310 :
  400310: 53                      push   rbx
  400311: 48 83 ec 20             sub    rsp,0x20
  400315: 48 8d 5c 24 10          lea    rbx,[rsp+0x10]
  40031a: 48 c7 44 24 10 c0 04    mov    QWORD PTR [rsp+0x10],0x4004c0
  400321: 40 00
  400323: 48 c7 04 24 00 05 40    mov    QWORD PTR [rsp],0x400500
  40032a: 00
  40032b: 48 89 df                mov    rdi,rbx
  40032e: ff 15 8c 01 00 00       call   QWORD PTR [rip+0x18c]   # 4004c0 <_ZTV1X+0x10>
  400334: bf ac 04 40 00          mov    edi,0x4004ac
  400339: 0f b6 f0                movzx  esi,al
  40033c: 31 c0                   xor    eax,eax
  40033e: e8 a5 03 00 00          call   4006e8
  400343: 48 8b 44 24 10          mov    rax,QWORD PTR [rsp+0x10]
  400348: 48 89 df                mov    rdi,rbx
  40034b: ff 10                   call   QWORD PTR [rax]
  40034d: 0f b6 f0                movzx  esi,al
  400350: bf a4 04 40 00          mov    edi,0x4004a4
  400355: 31 c0                   xor    eax,eax
  400357: 83 c6 01                add    esi,0x1
  40035a: e8 89 03 00 00          call   4006e8
[...]

As seen here, GCC isn't folding/inlining the constants returned across the virtual function boundary, even though they are visible in the compilation unit and -O3 -fwhole-program is being used. (Note that I started with just that commandline, and added things in an attempt to induce the optimization I was hoping for.)

I was able to induce the optimization by removing a level of indirection in two ways: 1) by having two print() methods, one overloaded to accept X& and a second overload to accept Y&; and 2) by replacing the classes with single-level-indirection function pointers:

--
#include <stdio.h>

typedef unsigned char (*Calculable)(void);

unsigned char one() { return 1; }
unsigned char two() { return 2; }

static void print(Calculable calculate)
{
  printf("%d\n", calculate());
  printf("+1: %d\n", calculate() + 1);
}

int main()
{
  print(one);
  print(two);
  return 0;
}
--

For completeness, here is the code generated from the function-pointer example, which optimizes in the way I expect:

00400390 :
  400390: 48 83 ec 08             sub    rsp,0x8
  400394: ba 01 00 00 00          mov    edx,0x1
  400399: be e4 04 40 00          mov    esi,0x4004e4
  40039e: bf 01 00 00 00          mov    edi,0x1
  4003a3: 31 c0                   xor    eax,eax
  4003a5: e8 c6 02 00 00          call   400670 <__printf_...@plt>
  4003aa: ba 02 00 00 00          mov    edx,0x2
  4003af: be dc 04 40 00          mov    esi,0x4004dc
  4003b4: bf 01 00 00 00          mov    edi,0x1
  4003b9: 31 c0                   xor    eax,eax
  4003bb: e8 b0 02 00 00          call   400670 <__printf_...@plt>

Modifying this last example to include two function pointer indirections once again causes the optimization to be missed.
So, my questions are:

0) Am I missing some existing commandline parameter that would induce the optimization? (e.g. a bad connection between my chair and keyboard)
1) Is this a missed-optimization bug, or is this a missing feature?
2) Either way, what are the steps to correct the issue?

Thanks in advance for insights and/or help!

PS: I would test with a newer 4.5.0 build, but I'm having trouble bootstrapping. Any help with that email (sent yesterday) is appreciated, as well.
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
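[Not what the poster asked for, but one workaround for the missed devirtualization is to move the indirection to compile time with a template, which preserves the unit-testable interface while removing the runtime dispatch. A minimal sketch, assuming the same X/Y class definitions as above:

#include <stdio.h>

template <typename C>                       // C supplies calculate() statically
static void print(C& c)
{
  printf("%d\n", c.calculate());            // resolved at compile time,
  printf("+1: %d\n", c.calculate() + 1);    // so the constants fold
}

int main()
{
  X x;
  Y y;
  print(x);   // instantiates print<X>
  print(y);   // instantiates print<Y>
  return 0;
}]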
Re: GCC 4.5 is uncompilable
Hey Dave, What OS are you bootstrapping on, and with which compiler/version? (Cygwin, I assume, but you never know ;>) I haven't been able to bootstrap for a few weeks, but no one answered my email asking for help (which probably got lost in the kernel-related fighting): http://gcc.gnu.org/ml/gcc/2009-11/msg00476.html

For the code in question, the uninitialized-variable warning (reported as an error) definitely looks valid. I'm surprised no one else has been seeing this, unless they aren't bootstrapping using 4.4.1 or above. Any help is appreciated -- I really want to get cracking on testing 4.5. Thanks!
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
df_changeable_flags use in combine.c
Hi, I'm fixing some compiler errors when configuring with --enable-build-with-cxx, and ran into a curious line of code that may indicate a bug:

static unsigned int
rest_of_handle_combine (void)
{
  int rebuild_jump_labels_after_combine;

  df_set_flags (DF_LR_RUN_DCE + DF_DEFER_INSN_RESCAN);
  // ...
}

The DF_* values are from the df_changeable_flags enum, whose values are typically used in bitwise and/or operations for masking purposes. As such, I'm guessing the author may have meant to do:

df_set_flags (DF_LR_RUN_DCE | DF_DEFER_INSN_RESCAN);

I could have just added the explicit cast necessary to silence the gcc-as-cxx warning I was running into, but I wanted to be a good citizen :) Any pointers are appreciated, Thanks!
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
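[For context, a minimal sketch -- with hypothetical flag values, not the actual df_changeable_flags definitions -- of why + happens to work in the quoted code: as long as the enumerators are distinct powers of two and no flag is combined with itself, a + b and a | b produce the same bit pattern, but | is the conventional, self-documenting choice:

#include <stdio.h>

enum flags {          /* hypothetical values */
  FLAG_A = 1 << 0,
  FLAG_B = 1 << 1
};

int main (void)
{
  /* identical results for disjoint bits... */
  printf ("%d %d\n", FLAG_A + FLAG_B, FLAG_A | FLAG_B);  /* 3 3 */
  /* ...but + breaks as soon as a bit repeats: */
  printf ("%d %d\n", FLAG_A + FLAG_A, FLAG_A | FLAG_A);  /* 2 1 */
  return 0;
}]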
[gcc-as-cxx] enum conversion to int
Hi, I'm trying to fix some errors/warnings to make sure that gcc-as-cxx doesn't bitrot too much. I ran into this issue, and am unsure how to fix it without really ugly casting:

enum df_changeable_flags
df_set_flags (enum df_changeable_flags changeable_flags)
{
  enum df_changeable_flags old_flags = df->changeable_flags;
  df->changeable_flags |= changeable_flags;
  return old_flags;
}

I'm getting this error on the second line of the function:

../../gcc-trunk/gcc/df-core.c: In function 'df_changeable_flags df_set_flags(df_changeable_flags)':
../../gcc-trunk/gcc/df-core.c:474: error: invalid conversion from 'int' to 'df_changeable_flags'

At first blush, it seems like df_changeable_flags should be a typedef to byte (or int, which is what it was being implicitly converted to everywhere), and the enum should be disbanded into individual #defines. I wanted to make sure that this wasn't a false positive first, though.
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
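[A minimal sketch of the two usual C++-side fixes, assuming the enum and the df object from the snippet above; illustrative only, not the change that was eventually committed:

/* 1: cast at the assignment -- in C++ the |= expands to an
   int-typed expression, so spell it out: */
df->changeable_flags =
  (enum df_changeable_flags) (df->changeable_flags | changeable_flags);

/* 2: or overload the operator once for the enum type: */
static inline df_changeable_flags &
operator|= (df_changeable_flags &lhs, df_changeable_flags rhs)
{
  return lhs = (df_changeable_flags) ((int) lhs | (int) rhs);
}]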
Re: [gcc-as-cxx] enum conversion to int
On Tue, 5 Jan 2010, Ian Lance Taylor wrote:
> Matt writes:
>> I'm trying to fix some errors/warnings to make sure that gcc-as-cxx
>> doesn't bitrot too much. I ran into this issue, and am unsure how to
>> fix it without really ugly casting:
>>
>> enum df_changeable_flags
>> df_set_flags (enum df_changeable_flags changeable_flags)
>> {
>>   enum df_changeable_flags old_flags = df->changeable_flags;
>>   df->changeable_flags |= changeable_flags;
>>   return old_flags;
>> }
>
> On trunk df_set_flags looks like this:
>
> int
> df_set_flags (int changeable_flags)

Yes, what I pasted was a local change. I was trying to eliminate the implicit cast to int from the enum type, which was causing my --enable-werror build to fail. At this point, I think the better option would be to break up the enum values into individual #defines and do a typedef int df_changeable_flags;

> The gcc-in-cxx branch is no longer active. All the work was merged to
> trunk, where it is available via --enable-build-with-cxx. If you want
> to work on the gcc-in-cxx branch, start by merging from trunk.

Sorry, I didn't mean to imply I was working on the now-dead branch. I'm doing this work on trunk. I want the build-as-cxx option to work decently so that my profiledbootstrap exercises the C++ front end more, since that is what we compile all our code with here. As such, I'm building trunk to eliminate some of the cxx failures, and will submit a patch once it either builds completely or I've hit a brick wall. This should (hopefully) make for less work when the more invasive changes are started once trunk is open again.

PS: of course, it would be even better if profiledbootstrap allowed me to point at our build's makefile to generate the runtime profile.
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
Re: [gcc-as-cxx] enum conversion to int
On Tue, 5 Jan 2010, Ian Lance Taylor wrote:
> Matt writes:
>> Yes, what I pasted was a local change. I was trying to eliminate the
>> implicit cast to int from the enum type, which was causing my
>> --enable-werror build to fail. At this point, I think the better
>> option would be to break up the enum values into individual #defines
>> and do a typedef int df_changeable_flags;
>
> Don't use #defines. Enums give better debug info by default.
> typedef int df_changeable_flags is fine if that seems necessary.
> Right now the code simply doesn't use the df_changeable_flags type any
> time there is more than one flag.

Okay, good to know about the better debuggability of enums. If the flags are supposed to be mutually exclusive, then the code in my other email where two flags are added together seems contrary. Regardless, does this mean that the bitwise operations in set_flags and clear_flags could be changed to simple assignments? That would indeed fix this issue in a nice way.
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
Re: ICE building svn trunk on Ubuntu 9.x amd64
(now sending to gcc@ instead of gcc-help@, as suggested)

I have narrowed it down to this reduced commandline (the time is there just to show that it may take a while, but this particular issue doesn't cause a hang):

m...@hargett-755:~/src/gcc-obj/prev-gcc$ time /home/matt/src/gcc-obj/./prev-gcc/xgcc -B/home/matt/src/gcc-obj/./prev-gcc/ -B/home/matt/x86_64-unknown-linux-gnu/bin/ -B/home/matt/x86_64-unknown-linux-gnu/bin/ -B/home/matt/x86_64-unknown-linux-gnu/lib/ -isystem /home/matt/x86_64-unknown-linux-gnu/include -isystem /home/matt/x86_64-unknown-linux-gnu/sys-include -c -O2 -ftree-loop-distribution -DIN_GCC -DHAVE_CONFIG_H -I. -I. -I../../gcc-trunk/gcc -I../../gcc-trunk/gcc/. -I../../gcc-trunk/gcc/../include -I../../gcc-trunk/gcc/../libcpp/include -I../../gcc-trunk/gcc/../libdecnumber -I../../gcc-trunk/gcc/../libdecnumber/bid -I../libdecnumber -Iyes/include -Iyes/include -DCLOOG_PPL_BACKEND ../../gcc-trunk/gcc/reload1.c -o reload1.o

../../gcc-trunk/gcc/reload1.c: In function 'delete_output_reload':
../../gcc-trunk/gcc/reload1.c:8391:1: error: type mismatch in binary expression
long unsigned int
long unsigned int
D.65146_650 = D.65145_651 - D.65141_624;
../../gcc-trunk/gcc/reload1.c:8391:1: error: type mismatch in binary expression
long unsigned int
long unsigned int
D.65154_658 = D.65153_659 - D.65149_647;
../../gcc-trunk/gcc/reload1.c:8391:1: internal compiler error: verify_stmts failed
Please submit a full bug report, with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

real    9m25.630s
user    9m23.823s
sys     0m0.972s

-O0 -ftree-loop-distribution doesn't exhibit the problem, and neither does -O1 -ftree-loop-distribution. There's something about the combination of -O2 (or -O3) and -ftree-loop-distribution that causes the ICE on this particular file. I'll try bootstrapping without -ftree-loop-distribution and see if that works for me. If more information is needed, or I should file a bug report, let me know.

On Wed, 24 Jun 2009, Matt wrote:

Hi, I left my profiled bootstrap of svn r148885 to run overnight, and saw this in the morning:

/home/matt/src/gcc-obj/./prev-gcc/xgcc -B/home/matt/src/gcc-obj/./prev-gcc/ -B/home/matt/x86_64-unknown-linux-gnu/bin/ -B/home/matt/x86_64-unknown-linux-gnu/bin/ -B/home/matt/x86_64-unknown-linux-gnu/lib/ -isystem /home/matt/x86_64-unknown-linux-gnu/include -isystem /home/matt/x86_64-unknown-linux-gnu/sys-include -c -O3 -floop-interchange -floop-strip-mine -floop-block -findirect-inlining -ftree-switch-conversion -fvect-cost-model -fgcse-sm -fgcse-las -fgcse-after-reload -fsee -ftree-loop-linear -ftree-loop-distribution -ftree-loop-im -ftree-loop-ivcanon -fivopts -fvpt -funroll-loops -funswitch-loops -fprofile-generate -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wcast-qual -Wold-style-definition -Wc++-compat -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -I. -I. -I../../gcc-trunk/gcc -I../../gcc-trunk/gcc/. -I../../gcc-trunk/gcc/../include -I../../gcc-trunk/gcc/../libcpp/include -I../../gcc-trunk/gcc/../libdecnumber -I../../gcc-trunk/gcc/../libdecnumber/bid -I../libdecnumber -Iyes/include -Iyes/include -DCLOOG_PPL_BACKEND ../../gcc-trunk/gcc/rtl.c -o rtl.o

../../gcc-trunk/gcc/reload1.c: In function 'delete_output_reload':
../../gcc-trunk/gcc/reload1.c:8391:1: error: type mismatch in binary expression
long unsigned int
long unsigned int
D.58046_964 = D.58045_963 - D.58041_946;
../../gcc-trunk/gcc/reload1.c:8391:1: error: type mismatch in binary expression
long unsigned int
long unsigned int
D.58054_972 = D.58053_971 - D.58049_967;
../../gcc-trunk/gcc/reload1.c:8391:1: internal compiler error: verify_stmts failed
Please submit a full bug report, with preprocessed source if appropriate.

This is using the 4:4.4.0-3ubuntu1 version of Ubuntu's gcc package on amd64. Here's my configure cmdline:

CFLAGS="-O3 -floop-interchange -floop-strip-mine -floop-block -findirect-inlining -ftree-switch-conversion -fvect-cost-model -fgcse-sm -fgcse-las -fgcse-after-reload -fsee -ftree-loop-linear -ftree-loop-distribution -ftree-loop-im -ftree-loop-ivcanon -fivopts -fvpt -funroll-loops -funswitch-loops"
CPPFLAGS="-O3 -floop-interchange -floop-strip-mine -floop-block -findirect-inlining -ftree-switch-conversion -fvect-cost-model -fgcse-sm -fgcse-las -fgcse-after-reload -fsee -ftree-loop-linear -ftree-loop-distribution -ftree-loop-im -ftree-loop-ivcanon -fivopts -fvpt -funroll-loops -funswitch-loops"
../gcc-trunk/configure --prefix=/home/matt --enable-stage1-checking=all --enable-bootstrap --enable-lto --enable-languages=c,c++ --with-ppl --with-cloog

and here's my make cmdline:

make BOOT_CFLAGS="-O3 -floop-interchange -floop-strip-mine -floop-block -findi
Re: Phase 1 of gcc-in-cxx now complete
> * Develop some trial patches which require C++, e.g., convert VEC to
>   std::vector.

Do you have any ideas for the easiest starting points? Is there anywhere that is decently self-contained, or will it have to be a big bang? I'd love to see this happen so there's more exercising of template expansion during the profiledbootstrap. If I can get pointed in the right direction, I can probably produce a patch within the next week. Thanks for this work and adding all the extra warnings!
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
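[To make the scope of such a conversion concrete, a hypothetical before/after sketch -- illustrative only; actual VEC call sites vary, and a real migration would have to account for GCC's garbage collector:

/* before: GCC's C-style VEC macros */
VEC(tree,heap) *worklist = NULL;
VEC_safe_push (tree, heap, worklist, decl);
tree t = VEC_index (tree, worklist, 0);
unsigned n = VEC_length (tree, worklist);
VEC_free (tree, heap, worklist);

/* after: the std::vector equivalent */
std::vector<tree> worklist;
worklist.push_back (decl);
tree t = worklist[0];
unsigned n = worklist.size ();
/* storage released automatically by the destructor */]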
4.1.1 profiledbootstrap failure on amd64
I get this failure when trying to do a profiledbootstrap on amd64. This is a Gentoo Linux machine with gcc 3.4.4, glibc 2.3.5, binutils 2.16.1, autoconf 2.59, etc, etc.

make[6]: Entering directory `/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/libstdc++-v3'
if [ -z "32" ]; then \
  true; \
else \
  rootpre=`${PWDCMD-pwd}`/; export rootpre; \
  srcrootpre=`cd ../../../gcc-4.1.1-20060517/libstdc++-v3; ${PWDCMD-pwd}`/; export srcrootpre; \
  lib=`echo ${rootpre} | sed -e 's,^.*/\([^/][^/]*\)/$,\1,'`; \
  compiler="/home/matt/src/gcc-bin/./gcc/xgcc -B/home/matt/src/gcc-bin/./gcc/ -B/usr/local/x86_64-unknown-linux-gnu/bin/ -B/usr/local/x86_64-unknown-linux-gnu/lib/ -isystem /usr/local/x86_64-unknown-linux-gnu/include -isystem /usr/local/x86_64-unknown-linux-gnu/sys-include"; \
  for i in `${compiler} --print-multi-lib 2>/dev/null`; do \
    dir=`echo $i | sed -e 's/;.*$//'`; \
    if [ "${dir}" = "." ]; then \
      true; \
    else \
      if [ -d ../${dir}/${lib} ]; then \
        flags=`echo $i | sed -e 's/^[^;]*;//' -e 's/@/ -/g'`; \
        if (cd ../${dir}/${lib}; make "AR_FLAGS=rc" "CC_FOR_BUILD=gcc" "CC_FOR_TARGET=/home/matt/src/gcc-bin/./gcc/xgcc -B/home/matt/src/gcc-bin/./gcc/ -B/usr/local/x86_64-unknown-linux-gnu/bin/ -B/usr/local/x86_64-unknown-linux-gnu/lib/ -isystem /usr/local/x86_64-unknown-linux-gnu/include -isystem /usr/local/x86_64-unknown-linux-gnu/sys-include" "CFLAGS=-O2 -g -O2 " "CXXFLAGS=-g -O2 -D_GNU_SOURCE" "CFLAGS_FOR_BUILD=-g -O2" "CFLAGS_FOR_TARGET=-O2 -g -O2 " "INSTALL=/usr/bin/install -c" "INSTALL_DATA=/usr/bin/install -c -m 644" "INSTALL_PROGRAM=/usr/bin/install -c" "INSTALL_SCRIPT=/usr/bin/install -c" "LDFLAGS=" "LIBCFLAGS=-O2 -g -O2 " "LIBCFLAGS_FOR_TARGET=-O2 -g -O2 " "MAKE=make" "MAKEINFO=makeinfo --split-size=500 --split-size=500 --split-size=500" "PICFLAG=" "PICFLAG_FOR_TARGET=" "SHELL=/bin/sh" "RUNTESTFLAGS=" "exec_prefix=/usr/local" "infodir=/usr/local/info" "libdir=/usr/local/lib" "includedir=/usr/local/include" "prefix=/usr/local" "tooldir=/usr/local/x86_64-unknown-linux-gnu" "gxx_include_dir=/usr/local/include/c++/4.1.1" "AR=ar" "AS=/home/matt/src/gcc-bin/./gcc/as" "LD=/home/matt/src/gcc-bin/./gcc/collect-ld" "RANLIB=ranlib" "NM=/home/matt/src/gcc-bin/./gcc/nm" "NM_FOR_BUILD=" "NM_FOR_TARGET=nm" "DESTDIR=" "WERROR=" \
          CFLAGS="-O2 -g -O2 ${flags}" \
          FCFLAGS=" ${flags}" \
          FFLAGS=" ${flags}" \
          ADAFLAGS=" ${flags}" \
          prefix="/usr/local" \
          exec_prefix="/usr/local" \
          GCJFLAGS=" ${flags}" \
          CXXFLAGS="-g -O2 -D_GNU_SOURCE ${flags}" \
          LIBCFLAGS="-O2 -g -O2 ${flags}" \
          LIBCXXFLAGS="-g -O2 -D_GNU_SOURCE -fno-implicit-templates ${flags}" \
          LDFLAGS=" ${flags}" \
          MULTIFLAGS="${flags}" \
          DESTDIR="" \
          INSTALL="/usr/bin/install -c" \
          INSTALL_DATA="/usr/bin/install -c -m 644" \
          INSTALL_PROGRAM="/usr/bin/install -c" \
          INSTALL_SCRIPT="/usr/bin/install -c" \
          all); then \
          true; \
        else \
          exit 1; \
        fi; \
      else true; \
      fi; \
    fi; \
  done; \
fi
make[7]: Entering directory `/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/32/libstdc++-v3'
make[7]: *** No rule to make target `all'. Stop.
make[7]: Leaving directory `/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/32/libstdc++-v3'
make[6]: *** [multi-do] Error 1
make[6]: Leaving directory `/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/libstdc++-v3'
make[5]: *** [all-multi] Error 2
make[5]: Leaving directory `/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/libstdc++-v3'
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory `/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/libstdc++-v3'
make[3]: *** [all] Error 2
make[3]: Leaving directory `/home/matt/src/gcc-bin/x86_64-unknown-linux-gnu/libstdc++-v3'
make[2]: *** [all-target-libstdc++-v3] Error 2
make[2]: Leaving directory `/home/matt/src/gcc-bin'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/home/matt/src/gcc-bin'
make: *** [profiledbootstrap] Error 2
-- 
tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
Re: build failure, GMP not available
I have been struggling with this issue, and now that I have successfully built GCC I thought I would share my results. Hopefully it can help someone better versed in autotools to improve the build of GCC with GMP/MPFR.

For reference, a few older threads I've found:
http://gcc.gnu.org/ml/gcc/2006-01/msg00333.html
http://gcc.gnu.org/ml/gcc-bugs/2006-03/msg00723.html

The long and short of it: my builds of the latest versions of GMP and MPFR were perfectly fine, although not ideal for building GCC. However, the GCC 4.1.1 configure script incorrectly decided that it _had_ located useful copies of GMP and MPFR, while in fact the GFortran build fails 90 minutes later with the error message (as in the second thread above): "../.././libgfortran/mk-kinds-h.sh: Unknown type"

This was configuring GCC via:

../srcdir/configure --with-gmp=/usr/local/lib64 --with-mpfr=/usr/local/lib64

I now understand that this is a mis-use of these options; however, recall configure was successful (I still do not understand why), while configure failed with the 'correct' options '--with-gmp=/usr/local --with-mpfr=/usr/local' (because *.h are in /usr/local/include, but *.a are in /usr/local/lib64).

I was finally successful by using the build directories rather than the installed libraries via:

../srcdir/configure --with-gmp-dir=/usr/local/gmp --with-mpfr-dir=/usr/local/mpfr

but only after I made the symlink:

ln -s /usr/local/mpfr/.libs/libmpfr.a /usr/local/mpfr/libmpfr.a

One issue here is that '--with-mpfr=path' assumes that 'libmpfr.a' is in 'path/lib' (not true for how I installed it), while '--with-mpfr-dir=path' assumes that 'libmpfr.a' is in 'path', rather than 'path/.libs' (can this work for anyone?). Note that '--with-gmp-dir=path' does look in 'path/.libs'.

This is all on RHEL4 x86_64. Note I am new to x86_64 and multilibs -- this certainly added to my difficulties. The machine does have older versions of GMP and MPFR installed in /usr/lib and /usr/lib64, while I had installed the latest versions in /usr/local (with the libraries in /usr/local/lib64). I would also note that GMP unfortunately hard-codes the bitness of the libraries in gmp.h, and that the older system /usr/include/gmp.h identifies itself as 64-bit (there are no #define switches as I would have expected).

My comments:

1) It would have been very useful to have explicit configure options such as --with-gmp-lib=path and --with-gmp-include=path (etc) that explicitly locate the *.a and *.h directories, rather than (or in addition to) the existing "install directory" and "build directory" options.

2) Ideally IMHO the top-level configure (or at least the libgfortran configure) would test the execution of some or all of the required functions in GMP/MPFR. I vaguely recall that this is possible with autoconf, and should be more robust. Would it add too much complexity to the top-level configure?

Thanks, - Matt
Re: build failure, GMP not available
>From: "Kaveh R. GHAZI" <[EMAIL PROTECTED]>
>> Matt Fago wrote:
>> One issue here is that '--with-mpfr=path' assumes that 'libmpfr.a' is
>> in 'path/lib' (not true for how I installed it), while
>> '--with-mpfr-dir=path' assumes that 'libmpfr.a' is in 'path', rather
>> than 'path/.libs' (can this work for anyone?). Note that
>> '--with-gmp-dir=path' does look in 'path/.libs'.
>
>This problem appears in the 4.0 series all the way through current
>mainline. I do believe it should be fixed and it is simple to do so. I'll
>take care of it.
>
>> My comments:
>>
>> 1) It would have been very useful to have explicit configure options
>> such as --with-gmp-lib=path and --with-gmp-include=path (etc) that
>> explicitly locate the *.a and *.h directories, rather than (or in
>> addition to) the existing "install directory" and "build directory"
>> options.
>
>Yes, the configure included in mpfr itself has this for searching for GMP
>which it relies on. I'll add something for this in GCC also. Thank you.
>
>> 2) Ideally IMHO the top-level configure (or at least the libgfortran
>> configure) would test the execution of some or all of the required
>> functions in GMP/MPFR. I vaguely recall that this is possible with
>> autoconf, and should be more robust. Would it add too much complexity
>> to the top-level configure?
>
>I tend to be reluctant about run tests because they don't work with a
>cross-compiler. Would you please tell me specifically what problem
>checking at runtime would prevent that the existing compile test doesn't
>detect?

Yes, a cross-compiler could not do runtime tests. I was trying to think of a more robust configuration-time test. This is difficult, as I do not quite understand why configure was successful in finding the libraries with the correct versions, but yet the compilation itself failed. Would a link test against all of the required GMP/MPFR functions (via AC_CHECK_LIB etc) offer anything?

Thanks, - Matt
Re: Bootstrap broken on x86_64 on the trunk in libgfortran?
>> ../../../trunk/libgfortran/mk-kinds-h.sh: Unknown type
>> grep '^#' < kinds.h > kinds.inc
>> /bin/sh: kinds.h: No such file or directory
>> make[2]: *** [kinds.inc] Error 1
>> make[2]: Leaving directory
>> `/home/daney/gccsvn/native-trunk/x86_64-unknown-linux-gnu/libgfortran'
>> make[1]: *** [all-target-libgfortran] Error 2
>> make[1]: Leaving directory `/home/daney/gccsvn/native-trunk'
>> make: *** [all] Error 2
>
>Usually (like 99% of the time), this means your GMP/MPFR are broken
>and are causing gfortran to crash out.

I think the patch concept below may help with these issues. The idea was to make configure try to link to libmpfr using the functions only in mpfr 2.2.0 or greater that GCC is currently using (those I could find, anyhow). Previously, configure could succeed if any version of libmpfr was available so long as the header was the correct version (this is likely on x86_64).

Please excuse any formatting issues -- this is my first patch. I have neither SVN access nor a copyright assignment, but this is a short patch. Would someone be willing to help test and possibly apply? Thanks! Matt

--- configure.in	(Revision 119232)
+++ configure.in	(Working Copy)
@@ -1123,7 +1123,12 @@ if test x"$have_gmp" = xyes; then
 #if MPFR_VERSION_MAJOR < 2 || (MPFR_VERSION_MAJOR == 2 && MPFR_VERSION_MINOR < 2)
 choke me
 #endif
-	mpfr_t n; mpfr_init(n);
+	int t;
+	mpfr_t n, x;
+	mpfr_init (n); mpfr_init (x);
+	mpfr_atan2 (n, n, x, GMP_RNDN);
+	mpfr_erfc (n, x, GMP_RNDN);
+	mpfr_subnormalize (x, t, GMP_RNDN);
 ], [AC_MSG_RESULT([yes])], [AC_MSG_RESULT([no]); have_gmp=no])
 	LIBS="$saved_LIBS"
 fi
Re: mpfr issues when Installing gcc 3.4 on fedora core
You do mean gcc 4.3, right (either a snapshot, or from svn)? Since you're running on x86_64, do you know that the libraries are the correct bitness (running 'file' on the mpfr and gmp libraries will tell)? By default gcc on x86_64 will build 64-bit, but libraries in /usr/local/lib should only be 32-bit (versus /usr/local/lib64). The linker will ignore any 32-bit libraries when linking a 64-bit executable.

How did you install gmp/mpfr (note the package from fedora is broken -- very old)? It took me quite a while to get 4.1 with fortran installed on RHEL until I got this all sorted out (I was new to multilibs). I just upgraded to fc6 and was able to install gcc from svn once I used --with-gmp-lib=/usr/local/lib64 (etc for include and mpfr) and set LD_LIBRARY_PATH=/usr/local/lib64 appropriately. Alternatively one could (carefully!) set up /etc/ld.so.conf and run ldconfig (I did this on RHEL).

I might be able to help tomorrow AM (US mountain time) if you email me directly. FWIW, I understand the reason to keep mpfr out of the gcc tree, but not bundling it makes gcc more difficult to bootstrap for a novice such as myself. Fedora's outdated gmp/mpfr package doesn't help either ... - Matt
Re: mpfr issues when Installing gcc 3.4 on fedora core
> drizzle drizzle wrote:
> And as matt suggested if mpfr is not needed by 3.4, how can I
> configure it that way. --disable-mpfr did not help.

MPFR should not have _anything_ to do with any gcc prior to 4.x. Where did you get gcc 3.4? A tarball from a gnu mirror or somewhere else? I think either the tarball is misnamed or something is terribly wrong with it.

> checking if gmp.h version and libgmp version are the same... (4.2.1/4.1.4) no
> configure: WARNING: 'gmp.h' and 'libgmp' seems to have different versions or
> configure: WARNING: we cannot run a program linked with GMP (if you cannot
> configure: WARNING: see the version numbers above).
> configure: WARNING: However since we can't use 'libtool' inside the configure,
> configure: WARNING: we can't be sure. See 'config.log' for details.

This means that mpfr needs to be told where gmp is and was probably not built correctly. When you configure mpfr, use the options:

  --with-gmp-include=DIR  GMP include directory
  --with-gmp-lib=DIR      GMP lib directory

Make sure these point to the lib and include directories with the new version of gmp. You can also use --libdir=/usr/local/lib64 if you wish to install the 64-bit libraries there instead of ../lib.

Note that fedora installs a 'bad' version of gmp 4.1.4 that includes a very old copy of mpfr. You seem to be picking up the library from this one. - Matt
Re: mpfr issues when Installing gcc 3.4 on fedora core
>From: drizzle drizzle <[EMAIL PROTECTED]>
>Still no luck so far ... I got the gcc 3.4 from the gcc archive. Any way
>I can make gcc 3.4 not use these libraries?

What is the exact file name and URL? I will download the same tarball and try to build it on my fc6 box. - M
Re: mpfr issues when Installing gcc 3.4 on fedora core
>From: drizzle drizzle <[EMAIL PROTECTED]>
>
>svn -q checkout svn://gcc.gnu.org/svn/gcc/trunk gcc_3_4_6_release

This is checking out the latest trunk, not version 3.4. The last argument only changes the name of the directory on your local machine. The path in the 'svn://' URL is what specifies the branch or tag being checked out (in this case 'trunk'). - Matt
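[For reference, a checkout of the actual 3.4.6 release would name the release tag in the URL path -- a sketch, assuming the standard layout of the GCC Subversion repository:

svn -q checkout svn://gcc.gnu.org/svn/gcc/tags/gcc_3_4_6_release gcc-3.4.6]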
gcc gcov and --coverage on x86_64
Having searched in bugzilla and asked on gcc-help to no avail ... gcc --coverage appears to be broken on x86_64 in gcc 4.1.1 on FC6 (works fine with Trunk). I'm almost certain that this is a known issue, but cannot find a reference in Bugzilla. Could someone please give me a pointer to the bug? Thanks, Matt
Re: gcc gcov and --coverage on x86_64
>From: Ben Elliston <[EMAIL PROTECTED]> >> gcc --coverage appears to be broken on x86_64 in gcc 4.1.1 on FC6 >> (works fine with Trunk). I'm almost certain that this is a known >> issue, but cannot find a reference in Bugzilla. > >I implemented that option, so can probably help you. Contact me in >private mail and we'll try and troubleshoot it. If necessary, you can >then file a bug report. FYI, this is an issue with ccache and not gcc (I forgot about that possibility). Guess it's time to dig into ccache. Thanks, Matt
VAX backend status
Over the past several weeks, I've revamped the VAX backend:

- fixed various bugs
- improved 64bit move, add, subtract code.
- added patterns for ffs, bswap16, bswap32, sync_lock_test_and_set, and sync_lock_release
- modified it to generate PIC code.
- fixed the dwarf2 output so it is readonly in shared libraries.
- moved the constraints from vax.h to constraints.md
- moved predicates to predicates.md
- added several peephole and peephole2 patterns

So the last major change to make the VAX backend completely modern is to remove the need for "HAVE_cc0". However, even instructions that modify the CC don't always change all the CC bits; some instructions preserve certain bits. I'd like to do this but currently it's above my level of gcc expertise.

Should the above be submitted as one megapatch? Or as a dozen or two smaller patches?

And finally a few musings ... I've noticed a few things in doing the above. GCC 4.x doesn't seem to do CSE on addresses. Because the VAX binutils doesn't support non-local symbols with a non-zero addend in the GOT, PIC will do a define_expand so that (const (plus (symbol_ref) (const_int))) will be split into separate instructions. However, gcc doesn't seem to be able to take advantage of that. For instance, gcc emits:

movab rpb,%r0
movab 100(%r0),%r1
cvtwl (%r1),%r0

but the "movab 100(%r0),%r1" is completely unneeded; this should have been emitted as:

movab rpb,%r0
cvtwl 100(%r0),%r0

I could add peepholes to find these and fix them but it would be nice if the optimizer could do that for me.

Another issue is that gcc has become "stupider" when it comes to using indexed addressing. For example:

static struct {
  void (*func)(void *);
  void *arg;
  int inuse;
} keys[64];
int nextkey;

int setkey(void (*func)(void *), void *arg)
{
  int i;
  for (i = nextkey; i < 64; i++) {
    if (!keys[i].inuse)
      goto out;
  }

emits:

movl nextkey,%r3
cmpl %r3,$63
jgtr .L38
mull3 %r3,$12,%r0
movab keys+8[%r0],%r0
tstl (%r0)

The last 3 instructions should have been:

mull3 %r3,$3,%r0
tstl keys+8[%r0]
[RFA] Invalid mmap(2) assumption in pch (ggc-common.c)
Running the libstdc++ testsuite on NetBSD/sparc or NetBSD/sparc64 results in most tests failing like:

:1: fatal error: had to relocate PCH
compilation terminated.
compiler exited with status 1

This is due to a misassumption in ggc-common.c:654 (mmap_gt_pch_use_address): this version assumes that the kernel honors the START operand of mmap even without MAP_FIXED if START through START+SIZE are not currently mapped with something. That is not true for NetBSD. Due to MMU idiosyncrasies, some architectures (like sparc and sparc64) will align mmap requests that don't have MAP_FIXED set, for architecture-specific reasons. Is there a reason why MAP_FIXED isn't used even though it probably should be?
-- 
Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.
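[To make the distinction concrete, a minimal sketch of the two behaviors, assuming ordinary POSIX mmap(2) semantics -- not the actual ggc-common.c code:

#include <sys/mman.h>
#include <stddef.h>

/* Map 'size' bytes of 'fd' at address 'base'.  Without MAP_FIXED,
   'base' is only a hint: NetBSD (e.g. on sparc/sparc64) may round the
   address for MMU reasons even when the range is free, and the PCH
   reader must then relocate.  With MAP_FIXED the mapping lands exactly
   at 'base' or the call fails -- but it also silently replaces any
   existing mapping in the range, which is the usual reason generic
   code avoids it.  */
static void *
map_pch_segment (void *base, size_t size, int fd, int use_map_fixed)
{
  int flags = MAP_PRIVATE | (use_map_fixed ? MAP_FIXED : 0);
  return mmap (base, size, PROT_READ | PROT_WRITE, flags, fd, 0);
}]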
[PATCH] VAX: cleanup; move macros from config/vax/vax.h to normal in config/vax/vax.c
This doesn't change any functionality; it just moves and cleans up a large number of complicated macros in vax.h to normal C code in vax.c. It's the first major step to integrating PIC support that I did for gcc 2.95.3. It also switches from using SYMBOL_REF_FLAG to SYMBOL_REF_LOCAL_P. Committed.
-- 
Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.

2005-03-26  Matt Thomas  <[EMAIL PROTECTED]>

	* config/vax/vax.c (legitimate_constant_address_p): New. Formerly
	CONSTANT_ADDRESS_P in config/vax/vax.h
	(legitimate_constant_p): New. Formerly CONSTANT_P in vax.h.
	(INDEX_REGISTER_P): New.
	(BASE_REGISTER_P): New.
	(indirectable_constant_address_p): New. Adapted from
	INDIRECTABLE_CONSTANT_ADDRESS_P in vax.h. Use SYMBOL_REF_LOCAL_P.
	(indirectable_address_p): New. Adapted from INDIRECTABLE_ADDRESS_P
	in vax.h.
	(nonindexed_address_p): New. Adapted from GO_IF_NONINDEXED_ADDRESS
	in vax.h.
	(index_temp_p): New. Adapted from INDEX_TERM_P in vax.h.
	(reg_plus_index_p): New. Adapted from GO_IF_REG_PLUS_INDEX in vax.h.
	(legitimate_address_p): New. Adapted from GO_IF_LEGITIMATE_ADDRESS
	in vax.h
	(vax_mode_dependent_address_p): New. Adapted from
	GO_IF_MODE_DEPENDENT_ADDRESS in vax.h
	* config/vax/vax.h (CONSTANT_ADDRESS_P): Use
	legitimate_constant_address_p
	(CONSTANT_P): Use legitimate_constant_p.
	(INDIRECTABLE_CONSTANT_ADDRESS_P): Removed.
	(INDIRECTABLE_ADDRESS_P): Removed.
	(GO_IF_NONINDEXED_ADDRESS): Removed.
	(INDEX_TEMP_P): Removed.
	(GO_IF_REG_PLUS_INDEX): Removed.
	(GO_IF_LEGITIMATE_ADDRESS): Use legitimate_address_p. Two
	definitions, depending on whether REG_OK_STRICT is defined.
	(GO_IF_MODE_DEPENDENT_ADDRESS): Use vax_mode_dependent_address_p.
	Two definitions, depending on whether REG_OK_STRICT is defined.
	* config/vax/vax-protos.h (legitimate_constant_address_p): Prototype
	added.
	(legitimate_constant_p): Prototype added.
	(legitimate_address_p): Prototype added.
	(vax_mode_dependent_address_p): Prototype added.

Index: vax.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/vax/vax.c,v
retrieving revision 1.60
diff -u -3 -p -r1.60 vax.c
--- vax.c	7 Apr 2005 21:44:57 -0000	1.60
+++ vax.c	26 Apr 2005 20:45:42 -0000
@@ -1100,3 +1100,227 @@ vax_output_conditional_branch (enum rtx_
     }
 }
 
+/* 1 if X is an rtx for a constant that is a valid address.  */
+
+int
+legitimate_constant_address_p (rtx x)
+{
+  return (GET_CODE (x) == LABEL_REF || GET_CODE (x) == SYMBOL_REF
+	  || GET_CODE (x) == CONST_INT || GET_CODE (x) == CONST
+	  || GET_CODE (x) == HIGH);
+}
+
+/* Nonzero if the constant value X is a legitimate general operand.
+   It is given that X satisfies CONSTANT_P or is a CONST_DOUBLE.  */
+
+int
+legitimate_constant_p (rtx x ATTRIBUTE_UNUSED)
+{
+  return 1;
+}
+
+/* The other macros defined here are used only in legitimate_address_p ().  */
+
+/* Nonzero if X is a hard reg that can be used as an index
+   or, if not strict, if it is a pseudo reg.  */
+#define	INDEX_REGISTER_P(X, STRICT) \
+  (GET_CODE (X) == REG && (!(STRICT) || REGNO_OK_FOR_INDEX_P (REGNO (X))))
+
+/* Nonzero if X is a hard reg that can be used as a base reg
+   or, if not strict, if it is a pseudo reg.  */
+#define	BASE_REGISTER_P(X, STRICT) \
+  (GET_CODE (X) == REG && (!(STRICT) || REGNO_OK_FOR_BASE_P (REGNO (X))))
+
+#ifdef NO_EXTERNAL_INDIRECT_ADDRESS
+
+/* Re-definition of CONSTANT_ADDRESS_P, which is true only when there
+   are no SYMBOL_REFs for external symbols present.  */
+
+static int
+indirectable_constant_address_p (rtx x)
+{
+  if (!CONSTANT_ADDRESS_P (x))
+    return 0;
+  if (GET_CODE (x) == CONST && GET_CODE (XEXP ((x), 0)) == PLUS)
+    x = XEXP (XEXP (x, 0), 0);
+  if (GET_CODE (x) == SYMBOL_REF && !SYMBOL_REF_LOCAL_P (x))
+    return 0;
+
+  return 1;
+}
+
+#else /* not NO_EXTERNAL_INDIRECT_ADDRESS */
+
+static int
+indirectable_constant_address_p (rtx x)
+{
+  return CONSTANT_ADDRESS_P (x);
+}
+
+#endif /* not NO_EXTERNAL_INDIRECT_ADDRESS */
+
+/* Nonzero if X is an address which can be indirected.  External symbols
+   could be in a sharable image library, so we disallow those.  */
+
+static int
+indirectable_address_p (rtx x, int strict)
+{
+  if (indirectable_constant_address_p (x))
+    return 1;
+  if (BASE_REGISTER_P (x, strict))
+    return 1;
+  if (GET_CODE (x) == PLUS
+      && BASE_REGISTER_P (XEXP (x, 0), stric
GCC 4.1: Buildable on GHz machines only?
Over the past month I've been making sure that GCC 4.1 works on NetBSD. I've completed bootstraps on sparc, sparc64, arm, x86_64, i386, alpha, mipsel, mipseb, and powerpc. I've done cross-build targets for vax. Results have been sent to gcc-testsuite.

The times to complete bootstraps on older machines have been bothering me. It took nearly 72 hours for a 233MHz StrongArm with 64MB to complete a bootstrap (with libjava). It took over 48 hours for a 120MHz MIPS R4400 (little endian) with 128MB to finish (without libjava) and a bit over 24 hours for a 250MHz MIPS R4400 (big endian) with 256MB to finish (again, no libjava). That doesn't even include the time to run the testsuites. I have a 50MHz 68060 with 96MB of memory (MVME177) approaching 100 hours (48 hours just to exit stage3 and start on the libraries) doing a bootstrap, knowing that it's going to die when doing the ranlib of libjava. The kernel for the 060 isn't configured with a large enough dataspace to complete the ranlib.

Most of the machines I've listed above are relatively powerful machines near the apex of performance of their target architecture. And yet GCC 4.1 can barely be bootstrapped on them. I do most of my GCC work on a 2GHz x86_64 because it's so fast. I'm afraid the widespread availability of such fast machines hides the fact that the current performance of GCC on older architectures is appalling.

I'm going to run some bootstraps with --disable-checking just to see how much faster they are. I hope I'm going to be pleasantly surprised but I'm not counting on it.
-- 
Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.
Re: GCC 4.1: Buildable on GHz machines only?
Richard Henderson wrote: > On Tue, Apr 26, 2005 at 10:57:07PM -0400, Daniel Jacobowitz wrote: > >>I would expect it to be drastically faster. However this won't show up >>clearly in the bootstrap. The, bar none, longest bit of the bootstrap >>is building stage2; and stage1 is always built with optimization off and >>(IIRC) checking on. > > > Which is why I essentially always supply STAGE1_CFLAGS='-O -g' when > building on risc machines. Alas, the --disable-checking and STAGE1_CFLAGS="-O2 -g" (which I was already doing) only decreased the bootstrap time by 10%. By far, the longest bit of the bootstrap is building libjava. -- Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.
[RFA] Which is better? More and simpler patterns? Fewer patterns with more embedded code?
Back when I modified gcc 2.95.3 to produce PIC code for NetBSD/vax, I changed the patterns in vax.md to be more specific about the instructions that got matched. The one advantage (to me as the writer) was it made it much easier to track down what pattern caused what instruction to be emitted. For instance:

(define_insn "*pushal"
  [(set (match_operand:SI 0 "push_operand" "=g")
        (match_operand:SI 1 "address_operand" "p"))]
  ""
  "pushal %a1")

I like the more-and-simpler-patterns approach but I'm wondering what the general recommendation is?
-- 
Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.
Re: GCC 4.1: Buildable on GHz machines only?
Gary Funck wrote:
>> -----Original Message-----
>> From: Matt Thomas
>> Sent: Tuesday, April 26, 2005 10:42 PM
> [...]
>> Alas, the --disable-checking and STAGE1_CFLAGS="-O2 -g" (which I was
>> already doing) only decreased the bootstrap time by 10%. By far, the
>> longest bit of the bootstrap is building libjava.
>
> Is it fair to compare current build times, with libjava included,
> against past build times when it didn't exist? Would a closer
> apples-to-apples comparison be to bootstrap GCC Core only on
> the older sub-GHz platforms?

libjava is built on everything but vax and mips. Bootstrapping core might be better, but doing the configure on the fly isn't as easy as it used to be. It would be nice if bootstrap emitted timestamps when it was started and when it completed a stage so one could just look at the make output. Regardless, GCC 4.1 is a computational pig.
-- 
Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.
Re: GCC 4.1: Buildable on GHz machines only?
David Edelsohn wrote:
>>>>>> Matt Thomas writes:
> Matt> Regardless, GCC 4.1 is a computational pig.
>
> If you are referring to the compiler itself, this has no basis in
> reality. If you are referring to the entire compiler collection,
> including runtimes, you are not using a fair comparison or are making
> extreme statements without considering the cause.

When I see the native stage2 m68k compiler spend 30+ minutes compute bound, with no paging activity, compiling a single source file, I believe that is an accurate term. Compiling stage3 on a 50MHz 68060 took 18 hours. (That 30 minutes was for fold-const.c, if you care to know.) At some points, I had no idea whether GCC had gone into an infinite loop due to a bug or was actually doing what it was supposed to.

> GCC now supports C++, Fortran 90 and Java. Those languages have
> extensive, complicated runtimes. The GCC Java environment is becoming
> much more complete and standards compliant, which means adding more and
> more features.

That's all positive, but if GCC also becomes too expensive to build then all those extra features become worthless. What is the slowest system that GCC has been recently bootstrapped on?

> If your point is that fully supporting modern, richly featured
> languages results in a longer build process, that is correct. Using
> disparaging terms like "pig" is missing the point. As others have pointed
> out, if you do not want to build some languages and runtimes, you can
> disable them. GCC is providing features that users want and that has a
> cost.

Yes, they have a cost, but the cost is mitigated by running on fast processors. They are just so fast they can hide inefficiencies and bloat. We have seen that for NetBSD and it's just as true for GCC or any other software. These slower processors provide useful feedback, but only if a GCC bootstrap is attempted on them on a semi-regular basis. Am I the only person who has attempted to do a native bootstrap on a system as slow as an M68k? I thought about doing a bootstrap on a MicroSPARC-based system but instead I decided to use an UltraSPARC-IIi system running with a 32-bit kernel.
-- 
Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.
Re: GCC 4.1: Buildable on GHz machines only?
Jonathan Wakely wrote:
> On Wed, Apr 27, 2005 at 08:05:39AM -0700, Matt Thomas wrote:
>> David Edelsohn wrote:
>>> GCC now supports C++, Fortran 90 and Java. Those languages have
>>> extensive, complicated runtimes. The GCC Java environment is becoming
>>> much more complete and standards compliant, which means adding more
>>> and more features.
>>
>> That's all positive but if GCC also becomes too expensive to build
>> then all those extra features become worthless.
>
> Worthless to whom?

To users of that platform who can no longer afford to build GCC.

> The features under discussion are new, they didn't exist before.

And because they never existed before, their cost for older platforms may not have been correctly assessed. If no one builds natively on older platforms, the recognition that the new features may be a problem for older platforms will never be made.

> If you survived without them previously you can do so now.
> (i.e. don't build libjava if your machine isn't capable of it)

Yes, you can skip building libjava. But can you skip building GCC? Will GCC 3.x be supported forever? If not, your compiler may have to rely on being cross-built. Being able to do a bootstrap is useful and is part of the expected GCC testing, but when it can only be done once or twice a week, it becomes a less practical test method.

> But claiming it's "worthless" when plenty of people are using it is
> just, well ... worthless.

Depends on your point of view.
-- 
Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.
Re: GCC 4.1: Buildable on GHz machines only?
Mike Stump wrote:
> On Apr 26, 2005, at 11:12 PM, Matt Thomas wrote:
>> It would be nice if bootstrap emitted timestamps when it was started
>> and when it completed a stage so one could just look at the make
>> output.
>
> You can get them differenced for free by using: time make bootstrap

I know that. But it only works overall. I want the per-stage times. Here's a sparc64--netbsd full bootstrap including libjava (the machine has 640MB and was doing nothing but building gcc):

    25406.01 real  21249.17 user  6283.15 sys
           0 maximum resident set size
           0 average shared memory size
           0 average unshared data size
           0 average unshared stack size
    54689526 page reclaims
        5349 page faults
         110 swaps
         723 block input operations
      377302 block output operations
          52 messages sent
          52 messages received
      285329 signals received
     1037478 voluntary context switches
      253151 involuntary context switches

-- 
Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.
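[One low-tech way to approximate per-stage times without touching the Makefiles -- a sketch, assuming GNU awk is available for strftime():

# prefix every line of make output with a wall-clock timestamp;
# grepping the log for stage boundaries then gives per-stage times
make bootstrap 2>&1 | gawk '{ print strftime("%H:%M:%S"), $0 }' | tee make.log]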
Re: GCC 4.1: Buildable on GHz machines only?
Someone complained I was unfair in my gcc bootstrap times since some builds included libjava/gfortran and some did not. So in the past day, I've done bootstraps with just c,c++,objc on both 3.4 and gcc 4.1. I've put the results on a web page at http://3am-software.com/gcc-speed.html. The initial bootstrap compiler was gcc 3.3 and they are all running off the same base of NetBSD 3.99.3. While taking out fortran and java reduced the disparity, there is still a large increase in bootstrap times from 3.4 to 4.1.
-- 
Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.
Re: GCC 4.1: Buildable on GHz machines only?
Joe Buck wrote:
> I think you need to talk to the binutils people. It should be possible
> to make ar and ld more memory-efficient.

Even though systems may be demand paged, having super-large libraries that consume lots of address space can be a problem. I'd like to see libjava split into multiple shared libraries. In C, we have libc, libm, libpthread, etc. In X11, there's X11, Xt, etc. So why does Java have everything in one shared library? Could the Swing stuff be moved to its own? Are there other logical divisions?

Unlike other modern systems with a two-level page table structure, the VAX uses a single level of page table indirection. This greatly reduces the amount of address space a process can efficiently use. If there are components that will not be needed by some Java programs, it would be nice if they could be separated into their own shared libraries.
-- 
Matt Thomas email: [EMAIL PROTECTED] 3am Software Foundry www: http://3am-software.com/bio/matt/ Cupertino, CA disclaimer: I avow all knowledge of this message.
Use $(VARRAY_H) in dependencies?
Howdy, The rules for c-objc-common.o, loop-unroll.o, and tree-inline.o include $(VARRAY_H), which is never defined, in their dependency lists. The rest of the targets that depend on varray.h include varray.h in their dependency list. varray.h includes machmode.h, system.h, coretypes.h, and tm.h, so Makefile.in should define and use VARRAY_H, right? -- Matt
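[A sketch of what that would look like, following the conventions Makefile.in already uses for other headers; the exact dependency list is an assumption and would need to match what varray.h actually includes:

VARRAY_H = varray.h $(MACHMODE_H) $(SYSTEM_H) coretypes.h $(TM_H)

c-objc-common.o : c-objc-common.c $(CONFIG_H) $(VARRAY_H)
loop-unroll.o : loop-unroll.c $(CONFIG_H) $(VARRAY_H)
tree-inline.o : tree-inline.c $(CONFIG_H) $(VARRAY_H)]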
Re: Use $(VARRAY_H) in dependencies?
On Sun, May 08, 2005 at 07:31:38PM -0700, Matt Kraai wrote:
> On Mon, May 09, 2005 at 03:03:23AM +0100, Paul Brook wrote:
> > On Monday 09 May 2005 02:26, Matt Kraai wrote:
> > > Howdy,
> > >
> > > The rules for c-objc-common.o, loop-unroll.o, and tree-inline.o
> > > include $(VARRAY_H), which is never defined, in their dependency
> > > lists. The rest of the targets that depend on varray.h include
> > > varray.h in their dependency list.
> > >
> > > varray.h includes machmode.h, system.h, coretypes.h, and tm.h, so
> > > Makefile.in should define and use VARRAY_H, right?
> >
> > Already one step ahead of you :-)
> >
> > 2005-05-07  Paul Brook  <[EMAIL PROTECTED]>
> >
> >     * Makefile.in: Fix dependencies.
> >     (GCOV_IO_H, VARRAY_H): Set.
>
> Great.

The dependencies for the rules for build/genautomata.o, build/varray.o, and gtype-desc.o still include varray.h instead of $(VARRAY_H). Is this on purpose? If so, why? -- Matt
Targets
Hello: I was wondering if the team could add the following targets to GCC\G++\G77, basically making it even more cross-platform compliant and emulator friendly, e.g. adding the following cpu series: 8080, z80, 6502, 6800, and cpm/8000? :) Maybe OS-specific libraries too (eg CP/M-86\CP/M-86). Also, does G77 support Fortran-66? PS: Can I help in any way (testing the mingw port)? I don't have linux\bsd\unix\vms\os/2 or mac, just windows and dos. Matt Ritchie
bounty available for porting AVR backend to MODE_CC
Hi All, I don't subscribe but wanted developers to know there is a bounty available for porting the gcc AVR backend to use MODE_CC. Here is the reference: https://www.bountysource.com/issues/84630749-avr-convert-the-backend-to-mode_cc-so-it-can-be-kept-in-future-releases And this is a reference to the discussion on avrfreaks.net: https://www.avrfreaks.net/forum/avr-gcc-and-avr-g-are-deprecated-now Matt
Function attribute((optimize(...))) ignored on inline functions?
I'd like to tell gcc that it's okay to inline functions (such as rintf(), to get the SSE4.1 roundss instruction) at particular call sites without compiling the entire source file or calling function with different CFLAGS. I attempted this by making inline wrapper functions annotated with attribute((optimize(...))), but it appears that the annotation does not apply to inline functions? Take for example, ex.c:

#include <math.h>

static inline float __attribute__((optimize("-fno-trapping-math")))
rintf_wrapper_inline(float x)
{
  return rintf(x);
}

float rintf_wrapper_inline_call(float x)
{
  return rintf_wrapper_inline(x);
}

float __attribute__((optimize("-fno-trapping-math")))
rintf_wrapper(float x)
{
  return rintf(x);
}

% gcc -O2 -msse4.1 -c ex.c
% objdump -d ex.o

ex.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <rintf_wrapper_inline_call>:
   0: e9 00 00 00 00          jmpq   5 <rintf_wrapper_inline_call+0x5>
   5: 66 66 2e 0f 1f 84 00    data32 nopw %cs:0x0(%rax,%rax,1)
   c: 00 00 00 00

0000000000000010 <rintf_wrapper>:
  10: 66 0f 3a 0a c0 04       roundss $0x4,%xmm0,%xmm0
  16: c3                      retq

whereas I expected that rintf_wrapper_inline_call would be the same as rintf_wrapper. I've read that per-function optimization is broken [1]. Is this still the case? Is there a way to accomplish what I want?

[1] https://gcc.gnu.org/ml/gcc/2012-07/msg00201.html
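[Not an answer to the per-function-optimization question, but one way to get roundss at specific call sites regardless of -ftrapping-math is to spell it with the SSE4.1 intrinsic; a sketch, assuming the file is compiled with -msse4.1 (or the wrapper is given __attribute__((target("sse4.1")))):

#include <smmintrin.h>

/* round with a single roundss, independent of the -ftrapping-math
   setting in effect at the call site */
static inline float rintf_sse41(float x)
{
  __m128 v = _mm_set_ss(x);
  v = _mm_round_ss(v, v, _MM_FROUND_CUR_DIRECTION);  /* roundss $0x4 */
  return _mm_cvtss_f32(v);
}]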
RFA: [VAX] SUBREG of MEM with a mode dependent address
GCC 4.8 for VAX is generating a subreg:HI of a mem:SI indexed address. This eventually gets caught by an assert in change_address_1. Since the MEM rtx is SI, legitimate_address_p thinks it's fine. I have a change to vax.md which catches these, but it's extremely ugly and I have to think there's a better way. But I have to wonder why gcc is even constructing a subreg of a mem with a mode-dependent address.

(gdb) call debug_rtx(insn)
(insn 73 72 374 12 (set (reg/v:HI 0 %r0 [orig:29 iCol ] [29])
        (subreg:HI (mem/c:SI (plus:SI (mult:SI (reg/v:SI 10 %r10 [orig:22 i ] [22])
                        (const_int 4 [0x4]))
                    (reg/v/f:SI 11 %r11 [orig:101 aiCol ] [101])) [4 MEM[base: _154, offset: 0B]+0 S4 A32]) 0)) sqlite3.c:92031 13 {movhi_2}
     (nil))

Since this wasn't movstricthi, this could be rewritten to avoid the subreg and just treat %r0 as SI, as in:

(insn 73 72 374 12 (set (reg/v:SI 0 %r0 [orig:29 iCol ] [29])
        (mem/c:SI (plus:SI (mult:SI (reg/v:SI 10 %r10 [orig:22 i ] [22])
                    (const_int 4 [0x4]))
                (reg/v/f:SI 11 %r11 [orig:101 aiCol ] [101])) [4 MEM[base: _154, offset: 0B]+0 S4 A32])) sqlite3.c:92031 13 {movsi_2}

But even if movhi is a define_expand, as far as I can tell there isn't enough info to know whether that is possible. At that time, how can I tell that operands[0] will be a hard reg or operands[1] will be a subreg of a mode-dependent memory access?

I've tried using secondary_reload and it gets called with

(subreg:HI (reg:SI 113 [ MEM[base: _154, offset: 0B] ]) 0)

but it dies in change_address_1 before invoking the code returned in sri.

I've tracked this down to reload replacing (reg:SI 113) with reg_equiv_mem (113) in the rtx. However, it doesn't verify the rtx is actually valid. I added a gcc_assert to trap this and got:

#1 0x0089ab87 in eliminate_regs_1 (x=0x7f7fe7b5c498, mem_mode=VOIDmode, insn=0x0, may_use_invariant=true, for_costs=true)
    at /u1/netbsd-HEAD/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/reload1.c:2850
(gdb) list
2845              && reg_equivs
2846              && reg_equiv_memory_loc (REGNO (SUBREG_REG (x))) != 0)
2847            {
2848              new_rtx = SUBREG_REG (x);
2849              rtx z = reg_equiv_memory_loc (REGNO (new_rtx));
2850              gcc_assert (memory_address_addr_space_p (GET_MODE (x),
2851                                                       XEXP (z, 0),
2852                                                       MEM_ADDR_SPACE (z)));
2853            }
2854          else
(gdb) call debug_rtx(z)
(mem:SI (plus:SI (mult:SI (reg/v:SI 22 [ i ])
            (const_int 4 [0x4]))
        (reg/v/f:SI 101 [ aiCol ])) [4 MEM[base: _154, offset: 0B]+0 S4 A32])
(gdb) call debug_rtx(x)
(subreg:HI (reg:SI 113 [ MEM[base: _154, offset: 0B] ]) 0)
#2 0x0089cb31 in elimination_costs_in_insn (insn=0x7f7fe7b5bbd0)
    at /u1/netbsd-HEAD/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/reload1.c:3751
(gdb) call debug_rtx (insn)
(insn 73 72 374 12 (set (nil)
        (subreg:HI (reg:SI 113 [ MEM[base: _154, offset: 0B] ]) 0)) /u1/netbsd-HEAD/src/external/public-domain/sqlite/lib/../dist/sqlite3.c:92031 14 {movhi}
     (expr_list:REG_DEAD (reg:SI 113 [ MEM[base: _154, offset: 0B] ])
        (nil)))

And now I'm stymied. The limits of gcc-ness are now exceeded :) I'm looking for ideas on how to proceed. Thanks.
Re: RFA: [VAX] SUBREG of MEM with a mode dependent address
On May 30, 2014, at 10:39 AM, Jeff Law wrote: > On 05/25/14 18:19, Matt Thomas wrote: >> >> But even if movhi is a define_expand, as far as I can tell there >> isn't enough info to know whether that is possible. At that time, >> how can I tell that operands[0] will be a hard reg or operands[1] >> will be subreg of a mode dependent memory access? > At that time, you can't know those things. Not even close ;-) You certainly > don't want to try and rewrite the insn to just use SImode. This is all an > indication something has gone wrong elsewhere and this would just paper over > the problem. > >> >> I've tried using secondary_reload and it gets called with >> >> (subreg:HI (reg:SI 113 [ MEM[base: _154, offset: 0B] ]) 0) >> >> but it dies in change_address_1 before invoking the code returned in >> sri. > I suspect if you dig deep enough, you can make a secondary reload do what you > want. It's just amazingly painful. > > You want to allocate an SImode temporary, do the load of the SI memory > location into that SImode temporary, then (subreg:SI (tempreg:SI)). Your best > bet is going to be to look at how some other ports handle their secondary > reloads. But I warn you, it's going to be painful. Doesn't work because the assert fires before the secondary reload takes place. In expr.c:convert_move there is code that would seem to prevent this: /* For truncation, usually we can just refer to FROM in a narrower mode. */ if (GET_MODE_BITSIZE (to_mode) < GET_MODE_BITSIZE (from_mode) && TRULY_NOOP_TRUNCATION_MODES_P (to_mode, from_mode)) { if (!((MEM_P (from) && ! MEM_VOLATILE_P (from) && direct_load[(int) to_mode] && ! mode_dependent_address_p (XEXP (from, 0), MEM_ADDR_SPACE (from))) || REG_P (from) || GET_CODE (from) == SUBREG)) from = force_reg (from_mode, from); if (REG_P (from) && REGNO (from) < FIRST_PSEUDO_REGISTER && ! HARD_REGNO_MODE_OK (REGNO (from), to_mode)) from = copy_to_reg (from); emit_move_insn (to, gen_lowpart (to_mode, from)); return; } but from at that point is just (mem:SI (reg:SI 112 [ D.118399 ]) [4 MEM[base: _154, offset: 0B]+0 S4 A32]) So there is not enough information for mode_dependent_address_p to return true. >> >> I've tracked this down to reload replacing (reg:SI 113) with >> reg_equiv_mem (113) in the rtx. However, it doesn't verify the rtx >> is actually valid. I added a gcc_assert to trap this and got: > Right. reload will make that replacement and it's not going to do any > verification at that point. Verification would have happened earlier. See above. If anywhere, that is where it would have been done. > You have to look at the beginning of the main reload loop and poke at that > for a while: > > /* For each pseudo register that has an equivalent location defined, > try to eliminate any eliminable registers (such as the frame pointer) > assuming initial offsets for the replacement register, which > is the normal case. > > If the resulting location is directly addressable, substitute > the MEM we just got directly for the old REG. > > If it is not addressable but is a constant or the sum of a hard reg > and constant, it is probably not addressable because the constant is > out of range, in that case record the address; we will generate > hairy code to compute the address in a register each time it is > needed. Similarly if it is a hard register, but one that is not > valid as an address register. > > If the location is not addressable, but does not have one of the > above forms, assign a stack slot. 
We have to do this to avoid the > potential of producing lots of reloads if, e.g., a location involves > a pseudo that didn't get a hard register and has an equivalent memory > location that also involves a pseudo that didn't get a hard register. > > Perhaps at some point we will improve reload_when_needed handling > so this problem goes away. But that's very hairy. */ I found a simpler solution. It seemed to me that reload_inner_reg_of_subreg was the right place to make this happen. The following diff (to gcc 4.8.3) fixes the problem: diff -u -p -r1.3 reload.c --- gcc/reload.c 1 Mar 2014 08:58:29 - 1.3 +++ gcc/reload.c 3 Jun 2014 17:24:27 - @@ -846,6 +846,7 @@ static bool reload_inner_reg_of_subreg (rtx x, enum machine_mode mode, bool output)
Re: GCC ARM: aligned access
On Aug 31, 2014, at 11:32 AM, Joel Sherrill wrote: >> Hi, >> >> I am writing some code and found that the system crashed. I found it was >> unaligned access which causes `data abort` exception. I wrote a piece >> of code and objdumped >> it. I am not sure this is right or not. >> >> command: >> arm-poky-linux-gnueabi-gcc -marm -mno-thumb-interwork -mabi=aapcs-linux >> -mword-relocations -march=armv7-a -mno-unaligned-access >> -ffunction-sections -fdata-sections -fno-common -ffixed-r9 -msoft-float >> -pipe -O2 -c 2.c -o 2.o >> >> arch is armv7-a and used '-mno-unaligned access' > > I think this is totally expected. You were passed a u8 pointer which is > aligned for that type (no restrictions likely). You cast it to a type with > stricter alignment requirements. The code is just flawed. Some CPUs handle > unaligned accesses but not your ARM. While armv7 and armv6 support unaligned access, that support has to be enabled by the underlying O/S. Not knowing the underlying environment, I can't say whether that support is enabled. One issue we had in NetBSD in moving to gcc4.8 was that the NetBSD/arm kernel didn't enable unaligned access for armv[67] CPUs. We quickly changed things so unaligned access is supported.
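To make the failure mode concrete, a minimal sketch (an illustration, not code from the thread): casting a byte pointer to a wider type is where the data abort comes from, and memcpy is the portable fix, since GCC lowers it to a single word load whenever the target permits:

#include <stdint.h>
#include <string.h>

uint32_t read32_bad(const uint8_t *p)
{
    /* Undefined behavior if p is not 4-byte aligned; with unaligned
       access disabled on ARM this raises a data abort.  */
    return *(const uint32_t *)p;
}

uint32_t read32_ok(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);  /* byte loads or one word load,
                                 whichever is legal for the target */
    return v;
}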
Missed optimization case
Hi all, While digging into some GCC-generated code, I noticed a missed opportunity in GCC that Clang and ICC seem to take advantage of. All versions of GCC (up to 4.9.0) seem to have the same trouble. The following source (for x86_64) shows up the problem: - #include <stdint.h> #define add_carry32(sum, v) __asm__("addl %1, %0 ;" \ "adcl $0, %0 ;" \ : "=r" (sum) \ : "g" ((uint32_t) v), "0" (sum)) unsigned sorta_checksum(const void* src, int n, unsigned sum) { const uint32_t *s4 = (const uint32_t*) src; const uint32_t *es4 = s4 + (n >> 2); while( s4 != es4 ) { add_carry32(sum, *s4++); } add_carry32(sum, *(const uint16_t*) s4); return sum; } - (the example is a contrived version of the original code, which comes from Solarflare's OpenOnload project). GCC optimizes the loop but then re-calculates the "s4" variable outside of the loop before the last add_carry32. ICC and Clang both realise that the 's4' value in the loop is fine to re-use. GCC has an extra four instructions to calculate the same value known to be in a register upon loop exit. Compiler explorer links: GCC 4.9.0: http://goo.gl/fi3p2J ICC 13.0.1: http://goo.gl/PRTTc6 Clang 3.4.1: http://goo.gl/95JEQc I'll happily file a bug if necessary but I'm not clear in what phase the optimization opportunity has been missed. Thanks all, Matt
Re: Missed optimization case
On Tue, Dec 23, 2014 at 2:25 PM, Andi Kleen wrote: > Please file a bug with a test case. No need to worry about the phase > too much initially, just fill in a reasonable component. Thanks - filed as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64396 -matt
volatile access optimization (C++ / x86_64)
Hi all, I'm investigating ways to have single-threaded writers write to memory areas which are then (very infrequently) read from another thread for monitoring purposes. Things like "number of units of work done". I initially modeled this with relaxed atomic operations. This generates a "lock xadd" style instruction, as I can't convey that there are no other writers. As best I can tell, there's no memory order I can use to explain my usage characteristics. Giving up on the atomics, I tried volatiles. These are less than ideal, as they are less expressive, but in my instance I am not trying to fight the ISA's reordering; just prevent the compiler from eliding updates to my shared metrics. GCC's code generation uses a "load; add; store" for volatiles, instead of a single "add 1, [metric]". http://goo.gl/dVzRSq has the example (which is also at the bottom of my email). Is there a reason why (in principle) the volatile increment can't be made into a single add? Clang and ICC both emit the same code for the volatile and non-volatile case. Thanks in advance for any thoughts on the matter, Matt --- example code --- #include <atomic> std::atomic<int> a(0); void base_case() { a++; } void relaxed() { a.fetch_add(1, std::memory_order_relaxed); } void load_and_store_relaxed() { a.store(a.load(std::memory_order_relaxed) + 1, std::memory_order_relaxed); } void cast_as_int_ptr() { (*(int*)&a) ++; } void cast_as_volatile_int_ptr() { (*(volatile int*)&a) ++; } ---example output (gcc490)--- base_case(): lock addl $1, a(%rip) ret relaxed(): lock addl $1, a(%rip) ret load_and_store_relaxed(): movl a(%rip), %eax addl $1, %eax movl %eax, a(%rip) ret cast_as_int_ptr(): addl $1, a(%rip) ret cast_as_volatile_int_ptr(): movl a(%rip), %eax addl $1, %eax movl %eax, a(%rip) ret
Re: volatile access optimization (C++ / x86_64)
On Fri, Dec 26, 2014 at 4:26 PM, Andrew Haley wrote: > On 26/12/14 20:32, Matt Godbolt wrote: >> Is there a reason why (in principle) the volatile increment can't be >> made into a single add? Clang and ICC both emit the same code for the >> volatile and non-volatile case. > > Yes. Volatiles use the "as if" rule, where every memory access is as > written. A volatile increment is defined as a load, an increment, and > a store. That makes sense to me from a logical point of view. My understanding though is the volatile keyword was mainly used when working with memory-mapped devices, where memory loads and stores could not be elided. A single-instruction load-modify-write like "increment [addr]" adheres to these constraints even though it is a single instruction. I realise my understanding could be wrong here! If not though, both clang and icc are taking a short-cut that may put them into non-compliant state. > If you want a single atomic increment, atomics are what you > should use. If you want an increment to be written to memory, use a > store barrier after the increment. Thanks. I realise I was unclear in my original email. I'm really looking for a way to say "do a non-lock-prefixed increment". Atomics are too strong and enforce a bus lock. Doing a store barrier after the increment also appears heavy-handed: while I wish for eventual consistency with memory, I do not require it. I do however need the compiler to not move or elide my increment. At the moment I think the best I can do is to use an inline assembly version of the increment which prevents GCC from doing any optimisation upon it. That seems rather ugly though, and if anyone has any better suggestions I'd be very grateful. To give a concrete example: uint64_t num_done = 0; void process_work() { /* does something somewhat expensive */ } void worker_thread(int num_work) { for (int i = 0; i < num_work; ++i) { process_work(); num_done++; // ideally a relaxed atomic increment here } } void reporting_thread() { while(true) { sleep(60); printf("worker has done %lu\n", num_done); // ideally a relaxed read here } } In the non-atomic case above, no locked instructions are used. Given enough information about what process_work() does, the compiler can realise that num_done can be added to outside of the loop (num_done += num_work); which is the part I'd like to avoid. By making the int atomic and using relaxed, I get this guarantee but at the cost of a "lock addl". Thanks in advance for any ideas, Matt
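The inline-assembly fallback mentioned above might look like this sketch (x86-64 only; the function name is made up for illustration):

#include <stdint.h>

static inline void counter_inc(uint64_t *p)
{
    /* One non-locked read-modify-write. The "+m" constraint makes the
       memory both input and output, so the compiler can neither elide
       the update nor cache the value in a register.  */
    __asm__ __volatile__ ("incq %0" : "+m" (*p));
}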
Re: volatile access optimization (C++ / x86_64)
On Fri, Dec 26, 2014 at 4:51 PM, Marc Glisse wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50677 Thanks Marc
Re: volatile access optimization (C++ / x86_64)
On Fri, Dec 26, 2014 at 5:19 PM, Andrew Haley wrote: > On 26/12/14 22:49, Matt Godbolt wrote: >> On Fri, Dec 26, 2014 at 4:26 PM, Andrew Haley wrote: >>> On 26/12/14 20:32, Matt Godbolt wrote: >> I realise my understanding could be wrong here! >> If not though, both clang and icc are taking a short-cut that may >> put them into non-compliant state. > > It's hard to be certain. The language used by the standard is very > unhelpful: it requires all accesses to be as written, but does not > define exactly what constitutes an access. Thanks. My world is very x86-centric and so I find it hard to understand why a single instruction's RMW is different from three separate instructions; but I appreciate the standard is vague around volatiles, and that atomics go some way to using more well-defined semantics. >> Thanks. I realise I was unclear in my original email. I'm really >> looking for a way to say "do a non-lock-prefixed increment". > > Why? Performance. The single-threaded writers do not need to use a lock prefix: the atomicity of their read-add-write is guaranteed by my knowing no other threads write to the value. Thus the bus lock they take out unnecessarily slows down the instruction and potentially causes extra coherency traffic. The order of stores (on x86) is guaranteed and so provided I take a relaxed view in the consumer there's not even a need for any other flush. The memory write will necessarily "eventually" become visible to the reader. Within the constraints of the architecture I'm working in, this is plenty enough for a metric. > You could just use a compiler barrier: asm volatile(""); But this is > good only for x86 and a few others. This may be all I need, but my worry is this will inhibit other valid optimisations. I know that the "trick" used elsewhere as a barrier (asm volatile("":::"memory");) has the effect of flushing enregistered values to memory. Ideally this wouldn't be necessary. I'll be honest; I don't know the semantics of an empty volatile asm(), but I'm not sure how it could cause only the one write (metric++) to be emitted without affecting other variables too. > Everyone else needs a real store barrier. This is certainly true if the writer needs to guarantee visibility to other threads. But that's not the case for my use case. > Well, that's the problem: do you want a barrier or not? With no > barrier there is no guarantee that the data will ever be written to > memory. Do you only care about x86 processors? I appreciate your patience in understanding my case (given I'm not explaining myself very well!) In this instance, yes, only x86 processors. I do not need an explicit ISA-level flush. I do need a guarantee that the compiler cannot optimise the increment by loop-invariant motion. >> To give a concrete example: [snip] >> By making the int >> atomic and using relaxed, I get this guarantee but at the cost of a >> "lock addl". > > Ok, I get that, but not why. If you care about a particular x86 > instruction, you can use it in an inline asm. I'm not at all sure what > you want, really. I hope my other comments at least help to explain the why! It's not a particular instruction inasmuch as communicating to the compiler that there's only one writer, and so the lock prefix is unnecessary (for x86) as the write of the read-modify-write will not race with other writers (as none exist) and the write will eventually become visible to other threads in strict memory order (as the x86 guarantees). 
This last stage I believe is consistent with a "relaxed" model, with an optimisation that if no other writers exist, no bus lock is required on the writer. Again, thanks for the reply and the time taken thinking about the issue especially at this festive time of year! Best regards, Matt
Re: volatile access optimization (C++ / x86_64)
On Fri, Dec 26, 2014 at 5:20 PM, NightStrike wrote: > Have you tried release and acquire/consume instead? Yes; these emit the same instructions in this case. http://goo.gl/e94Ya7 Regards, Matt
Re: volatile access optimization (C++ / x86_64)
On Sat, Dec 27, 2014 at 11:57 AM, Andrew Haley wrote: > On 27/12/14 00:02, Matt Godbolt wrote: >> On Fri, Dec 26, 2014 at 5:19 PM, Andrew Haley wrote: >>> On 26/12/14 22:49, Matt Godbolt wrote: >>>> On Fri, Dec 26, 2014 at 4:26 PM, Andrew Haley wrote: >>> Why? >> >> Performance. > > Okay, but that's not what I was trying to ask: if you don't need an > atomic access, why do you care that it uses a read-modify-write > instruction instead of three instructions? Is it faster? Have you > measured it? Is it so much faster that it's critical for your > application? Good point. No; I've yet to measure it but I will. I'll be honest: my instinct is that really it won't make a measurable difference. From a microarchitectural point of view it devolves to almost exactly the same set of micro-operations (barring the duplicate memory address calculation). It does encode to a longer instruction stream (15 bytes vs 7 bytes), so there's an argument it puts more pressure than needed on the i-cache. But honestly, it's more from an aesthetic point of view that I prefer the increment. (The locked version *is* measurably slower). Also, it's always nice to understand why particular optimisations aren't performed by the compiler from a correctness point of view! :) Thanks all for your fascinating insights :) -matt
Re: volatile access optimization (C++ / x86_64)
> On Sat, Dec 27, 2014 at 11:57 AM, Andrew Haley wrote: > Is it faster? Have you measured it? Is it so much faster that it's critical > for your > application? Well, I couldn't really leave this be: I did a little bit of benchmarking using my company's proprietary benchmarking library, which I'll try and get open sourced. It follows Intel's recommendations for using RDTSCP/CPUID etc, and I've also spent some time looking at Agner Fog's techniques. I believe it to be pretty accurate, to within a clock cycle or two. On my laptop (Core i5 M520) the volatile and non-volatile increments are so fast as to be within the noise - 1-2 clock cycles. So that certainly lends support to your theory, Andrew, that it's probably not worth the effort (other than offending my aesthetic sensibilities!). Obviously this doesn't really take into account the extra i-cache pressure. As a comparison, the "lock xaddl" versions come out at 18 cycles. Obviously this is also pretty much "free" by any reasonable metric, but it's hard to measure the impact of the bus lock on other processors' memory accesses in a highly multi-threaded environment. For completeness I also tried it on a few other machines: X5670 : 0-2 for normal, 28 clocks for lock xadd E5-2667 v2: as above, 27 clocks for lock xadd E5-2667 v3: as above, 15 clocks for lock xadd On Sat, Dec 27, 2014 at 11:57 AM, Andrew Haley wrote: > Well, in this case you now know: it's a bug! But one that it's > fairly hard to care deeply about, although it might get fixed now. Understood completely! Thanks again, Matt
Re: volatile access optimization (C++ / x86_64)
On Tue, Dec 30, 2014 at 5:05 AM, Torvald Riegel wrote: > I agree with Andrew. My understanding of volatile is that the generated > code must do exactly what the abstract machine would do. That makes sense. I suppose I don't understand what the difference is in terms of an abstract machine of "load; add; store" versus the "load-add-store". At least on x86, from the perspective of the memory bus, there's no difference I'm aware of. > One can use volatiles for synchronization if one is also manually adding > HW barriers and potentially compiler barriers (depending on whether you > need to mix volatile and non-volatile) -- but volatiles really aim at a > different use case than atomics. Again, the processor's reordering and memory barriers are not of huge concern to me in this instance. I completely agree about volatile being the wrong use case. > For the single-writer shared-counter case, a load and a store operation > with memory_order_relaxed seem to be the right approach. I agree: this most closely models my intention: a non-atomic-increment but which has the semantics of being visible to other threads in a finite period of time (as per your previous email). The relaxed-load; add; relaxed-store generates the same code as the volatile code (as in; three separate instructions), but I prefer it over the volatile as it is more intention-revealing. As to whether it's valid to peephole optimize the three instructions to be a single increment in the case of x86 given relaxed memory ordering, I can offer no good opinion (though my instinct is it should be able to be!) Thanks all for your help, Matt
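In C11 terms (the C++ <atomic> spelling used in the thread is equivalent), the single-writer pattern Torvald describes could be sketched as follows; this is an illustration, and whether a compiler may coalesce relaxed accesses is exactly the open question above:

#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t num_done;

void writer_bump(void)
{
    /* Relaxed load + relaxed store: no lock prefix; in practice GCC
       emits one store per call, and x86 store ordering makes it
       eventually visible to the reader.  */
    uint64_t v = atomic_load_explicit(&num_done, memory_order_relaxed);
    atomic_store_explicit(&num_done, v + 1, memory_order_relaxed);
}

uint64_t reader_sample(void)
{
    return atomic_load_explicit(&num_done, memory_order_relaxed);
}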
Re: volatile access optimization (C++ / x86_64)
On Mon, Jan 5, 2015 at 11:53 AM, DJ Delorie wrote: > > Matt Godbolt writes: >> GCC's code generation uses a "load; add; store" for volatiles, instead >> of a single "add 1, [metric]". > > GCC doesn't know if a target's load/add/store patterns are > volatile-safe, so it must avoid them. There are a few targets that have > been audited for volatile-safe-ness such that gcc *can* use the combined > load/add/store when the backend says it's OK. x86 is not yet one of > those targets. Thanks DJ. One question: do you have an example of a non-volatile-safe machine so I can get a feel for the problems one might encounter? At best I can imagine a machine that optimizes "add 0, [mem]" to avoid the read/write, but I'm not aware of such an ISA. Much appreciated, Matt
5.1.0/4.9.2 native mingw64 lto-wrapper.exe issues (PR 65559 and 65582)
I was told I should repost this on this ML rather than the gcc-help list I originally posted this under. Here was my original thread: https://gcc.gnu.org/ml/gcc-help/2015-04/msg00167.html I came across PR 65559 and 65582 while investigating why I was getting the "lto1.exe: internal compiler error: in read_cgraph_and_symbols, at lto/lto.c:2947" error during a native MINGW64 LTO build. This also seems to be present when enabling bootstrap-lto within 5.1.0, presenting an error message akin to what is listed in PR 65582. 1. Under: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/lto-wrapper.c;h=404cb68e0d1f800628ff69b7672385b88450a3d5;hb=HEAD#l927 lto-wrapper processes command-line params for filenames matching (in my case) "./.libs/libspeexdsp.a@0x44e26" and separates the filename from the offset into separate variables. Since the following check to see if that file exists by opening it doesn't use the parsed filename variable and instead continues to use the argv parameter, the attempt to open it always fails and that file is not specifically parsed for LTO options. 2. One other issue I've noticed in my build happens as a result of the open call when trying to parse the options using libiberty. Under mingw64 native, the open call opens the object file in text mode and then passes the fd eventually to libiberty's simple_object_internal_read within simple-object.c. The issue springs up trying to perform a read and it hits a CTRL+Z (0x1A) within the object at which point the next read will return 0 bytes and trigger the break of the loop and a subsequent error message of "file too short" which gets silently ignored. In my testing, changing the 0x1A within the object file to something else returns the full read (or more data until another CTRL+Z is hit). Ref: https://msdn.microsoft.com/en-us/library/wyssk1bs.aspx This still happens within 4.9.2 and 4.9 trunk; however, in 4.9, the object file being checked for LTO sections is still passed along in the command-line whereas in 5.1.0 it gets skipped but is still listed within the res file, most likely leading to the ICE within 65559. This would also explain Kai's comment on why this issue only occurs on native builds. The ICE in 5.1.0 can also be avoided by using an lto-wrapper from 4.9 or prior allowing the link to complete though no LTO options will get processed due to #1. This is my first report so I wouldn't mind some guidance. I'm familiar enough with debugging to gather whatever other level of detail is requested. Most of this was found using gdb. -- Matt Breedlove
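A sketch of the two fixes implied by point 1 and point 2 (an illustration, not the actual lto-wrapper.c patch; the helper name is made up):

#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#ifndef O_BINARY
#define O_BINARY 0   /* no-op outside Windows */
#endif

static int
open_lto_object (const char *arg, long long *offset)
{
  char *filename = strdup (arg);
  char *at = strrchr (filename, '@');
  int fd;

  *offset = 0;
  if (at != NULL)
    {
      *offset = strtoll (at + 1, NULL, 0);  /* "@0x44e26" parses as hex */
      *at = '\0';
    }
  /* Fix 1: open the *parsed* filename, not the raw argument.
     Fix 2: O_BINARY keeps mingw's text mode from treating a 0x1A
     byte inside the object as end-of-file.  */
  fd = open (filename, O_RDONLY | O_BINARY);
  free (filename);
  return fd;
}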
5.1.0 / 5.1.1 mingw64 bootstrap LTO failure questions
I've posted an update to PR 66014 regarding mingw64 slim LTO bootstrap errors I had been getting, which I was hoping to get some comments on. Though this resolves the problem for me, I'm wondering what other potential issues similar to it may spring up and was hoping to get some feedback. In addition, there is another related failure when doing bootstrap-lto or bootstrap-lto-noplugin (slim or fat) in mingw64 relating to sys_siglist. mingw64 (as far as I know) does not have an implementation for it. The issue is as follows: 1. stage1 completes bootstrapping. strsignal and sys_siglist are undetected, resulting in HAVE_STRSIGNAL and HAVE_SYS_SIGLIST being left undefined. 2. stage2 (or stagefeedback) detects strsignal but not sys_siglist, leaving HAVE_SYS_SIGLIST defined. This causes libiberty to define strsignal but skip sys_siglist during the build, leaving an undefined reference to sys_siglist. 3. Build fails when attempting to link against the new LTO libiberty.a(strsignal.o) when building gcc-nm, gcc-ar, etc. Non-LTO builds suffer neither problem and fat bootstraps only suffer from the issue above, which I have worked around by passing in "libiberty_cv_var_sys_siglist=no" during configuration. Combined with building libiberty with "-fno-builtin-stpcpy" (PR 66014), I have gotten all builds to finally succeed. I could use some guidance on where to go from here, however. Thanks, Matt
Re: X32 psABI status
On Feb 12, 2011, at 1:29 PM, H.J. Lu wrote: > On Sat, Feb 12, 2011 at 1:10 PM, Florian Weimer wrote: >> * H. J. Lu: >> >>> We made lots of progress on the x32 psABI: >>> >>> https://sites.google.com/site/x32abi/ >>> >>> 1. Kernel interface with syscall is close to being finalized. >>> 2. GCC x32 branch is stabilizing. >>> 3. The Bionic C library works with the syscall kernel interface. >>> >>> The next major milestone will be x32 glibc port. >> >> It is a bit difficult to extract useful information from these >> resources. > > That is true. Contributions are more than welcome. > >> Is off_t 32 bits? Why is the ia32 compatibility kernel interface used? > Yes. off_t is not part of the psABI since it's OS dependent. >> I'm sure a lot of people want to get rid of that in cases where they >> control the whole software stack. > > That is debatable. The current thought is the x32 user space API > is the same as ia32. time_t is also an issue. Any system call method is beyond the scope of the psABI since it's OS dependent and user-code should never care.
Re: X32 psABI status
On Feb 12, 2011, at 7:02 PM, Andrew Pinski wrote: > On Sat, Feb 12, 2011 at 3:04 PM, H. Peter Anvin wrote: >> On 02/12/2011 01:10 PM, Florian Weimer wrote: >>> Why is the ia32 compatibility kernel interface used? >> >> Because there is no way in hell we're designing in a second >> compatibility ABI in the kernel (and it has to be a compatibility ABI, >> because of the pointer size difference.) > > I think he is asking why not create a new ABI layer for the kernel > like it is done for n32 for MIPS. The kernel syscall ABI needs to be able to pass 64-bit quantities in a single register (since that's what the calling ABI is capable of doing, but which I don't think the ia32 kernel interface can do). Maybe it's me, but I expected X32 to be the X86-64 ABI with 32-bit longs and pointers (converted to 64-bit arguments when passed in register or on the stack). That allows the same syscall argument marshalling that currently exists but just needs a different set of syscall vectors.
Re: RFC: A new MIPS64 ABI
On Feb 14, 2011, at 12:29 PM, David Daney wrote: > Background: > > Current MIPS 32-bit ABIs (both o32 and n32) are restricted to 2GB of > user virtual memory space. This is due to the way MIPS32 memory space is > segmented. Only the range from 0..2^31-1 is available. Pointer > values are always sign extended. > > Because there are not already enough MIPS ABIs, I present the ... > > Proposal: A new ABI to support 4GB of address space with 32-bit > pointers. > > The proposed new ABI would only be available on MIPS64 platforms. It > would be identical to the current MIPS n32 ABI *except* that pointers > would be zero-extended rather than sign-extended when resident in > registers. In the remainder of this document I will call it > 'n32-big'. As a result, applications would have access to a full 4GB > of virtual address space. The operating environment would be > configured such that the entire lower 4GB of the virtual address space > was available to the program. I have to wonder if it's worth the effort. The primary problem I see is that this new ABI requires a 64-bit kernel since faults through the upper 2G will go through the XTLB miss exception vector. > At a low level here is how it would work: > > 1) Load a pointer to a register from memory: > > n32: > LW $reg, offset($reg) > > n32-big: > LWU $reg, offset($reg) That might be sufficient for userland, but the kernel will need to do similar things (even if a 64-bit kernel) when accessing structures supplied by 32-bit syscalls. It seems to be workable, but if you need the additional address space, why not use N64?
Re: RFC: A new MIPS64 ABI
On Feb 14, 2011, at 6:22 PM, David Daney wrote: > On 02/14/2011 04:15 PM, Matt Thomas wrote: >> >> I have to wonder if it's worth the effort. The primary problem I see >> is that this new ABI requires a 64-bit kernel since faults through the >> upper 2G will go through the XTLB miss exception vector. >> > > Yes, that is correct. It is a 64-bit ABI, and like the existing n32 ABI > requires a 64-bit kernel. N32 doesn't require an LP64 kernel, just a 64-bit register aware kernel. Your N32-big does require an LP64 kernel.
Re: RFC: A new MIPS64 ABI
On Feb 14, 2011, at 6:26 PM, David Daney wrote: > On 02/14/2011 06:14 PM, Joe Buck wrote: >> On Mon, Feb 14, 2011 at 05:57:13PM -0800, Paul Koning wrote: >>> It seems that this proposal would benefit programs that need more than 2 GB >>> but less than 4 GB, and for some reason really don't want 64 bit pointers. >>> >>> This seems like a microscopically small market segment. I can't see any >>> sense in such an effort. >> >> I remember the RHEL hugemem patch being a big deal for lots of their >> customers, so a process could address the full 4GB instead of only 3GB >> on a 32-bit machine. If I recall correctly, upstream didn't want it >> (get a 64-bit machine!) but lots of paying customers clamored for it. >> >> (I personally don't have an opinion on whether it's worth bothering with). >> > > Also look at the new x86_64 ABI (See all those X32 psABI messages) that the > Intel folks are actively working on. This proposal is very similar to what > they are doing. untrue. N32 is closer to the X32 ABI since it is limited to 2GB.
Re: RFC: A new MIPS64 ABI
On Feb 14, 2011, at 6:50 PM, David Daney wrote: > On 02/14/2011 06:33 PM, Matt Thomas wrote: >> >> On Feb 14, 2011, at 6:22 PM, David Daney wrote: >> >>> On 02/14/2011 04:15 PM, Matt Thomas wrote: >>>> >>>> I have to wonder if it's worth the effort. The primary problem I see >>>> is that this new ABI requires a 64-bit kernel since faults through the >>>> upper 2G will go through the XTLB miss exception vector. >>>> >>> >>> Yes, that is correct. It is a 64-bit ABI, and like the existing n32 ABI >>> requires a 64-bit kernel. >> >> N32 doesn't require an LP64 kernel, just a 64-bit register aware kernel. >> Your N32-big does require an LP64 kernel. >> > > But using 'official' kernel sources the only way to get a 64-bit register > aware kernel is for it to also be LP64. So effectively, you do in fact need > a 64-bit kernel to run n32 userspace code. Not all the world is Linux. :) NetBSD supports N32 kernels. > My proposed ABI would need trivial kernel changes: > > o Fix a couple of places where pointers are sign extended instead of zero > extended. I think you'll find there are more of these than you'd expect. > o Change the stack address and address ranges returned by mmap(). My biggest concern is that many, many MIPS opcodes expect properly sign-extended values in registers. Thusly N32-big will require using daddu/dadd/dsub/dsubu for addresses. So that's yet another departure from N32, which can use addu/add/sub/subu. > The main work would be in the compiler toolchain and runtime libraries. You'd also need to update gas for la and dla expansion.
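A small C model of the addu/daddu hazard Matt describes (an illustration of the register values involved, not MIPS code):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t p = 0x80000000u;  /* zero-extended n32-big pointer above 2GB */

    /* addu sign-extends its 32-bit result into the 64-bit register,
       corrupting the upper half of the pointer...  */
    int64_t addu_reg = (int32_t)((uint32_t)p + 8);

    /* ...while daddu keeps all 64 bits intact.  */
    uint64_t daddu_reg = p + 8;

    printf("addu:  %016llx\n", (unsigned long long)addu_reg);  /* ffffffff80000008 */
    printf("daddu: %016llx\n", (unsigned long long)daddu_reg); /* 0000000080000008 */
    return 0;
}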
Internal compiler error in targhooks.c: default_secondary_reload (ARM/Thumb)
I'm getting an internal compiler error on the following test program: #include <assert.h> void func(int a, int b, int c, int d, int e, int f, int g, short int h) { assert(a < 100); assert(b < 100); assert(c < 100); assert(d < 100); assert(e < 100); assert(f < 100); assert(g < 100); assert((-1000 < h) && (h < 0)); } Command line and output: $ arm-none-eabi-gcc -mthumb -O2 -c -o test.o test.c test.c: In function 'func': test.c:11:1: internal compiler error: in default_secondary_reload, at targhooks.c:769 Please submit a full bug report, with preprocessed source if appropriate. See <https://support.codesourcery.com/GNUToolchain/> for instructions. This is running on Windows XP. Version information: $ arm-none-eabi-gcc --version arm-none-eabi-gcc.exe (Sourcery G++ Lite 2010.09-51) 4.5.1 Copyright (C) 2010 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. From playing around with this, it looks to be some kind of register allocation problem--it needs to have lots of variables active at once, and the error doesn't occur unless I'm compiling for Thumb. Unfortunately I don't have a way to test this on tip, so I can't tell if it's been fixed there or not. Any information on this would be appreciated. Thanks, Matt
RE: Question about static code analysis features in GCC
Hey Sarah, Many array bounds and format string problems can already be found, especially with LTO, ClooG, loop-unrolling, and -O3 enabled. Seeing across object-file boundaries, understanding loop boundaries, and aggressive inlining allows GCC to warn about a lot of real-world vulnerabilities. When support for multiple IPA passes lands in trunk, it should be even better. What I think is missing is: 1) detection of double-free. There is already a function attribute called 'malloc', which is used to express a specific kind of allocation function whose return value will never be aliased. You could use that attribute, in addition to a new one ('free'), to track potential double-frees of values via VRP/IPA. 2) the ability to annotate functions as to the taint and filtering side-effects on their parameters, like the format() attribute. (I've asked for this feature from the PC-Lint people for some time.) You could make this even more generic and just add a new attribute that allows for tagging and checking of arbitrary tags: ssize_t recv(int sockfd, void *buf, size_t len, int flags) __attribute__ ((add_parameter_tag ("taint", 2))) __attribute__ ((add_return_value_tag ("taint"))); int count_sql_rows_for(const char* name) __attribute__ ((disallow_parameter_tag ("taint", 1))); void filter_sql_characters_from(const char* name) __attribute__ ((removes_parameter_tag ("taint", 1))); then a program like this: int main(void) { char name[20] = {0}; recv(GLOBAL_SOCKET, &name, sizeof(name), 0); filter_sql_characters_from(name); // comment this line to get warning count_sql_rows_for(name); } When I wrote my binary static analysis product, BugScan, we assumed that if a pointer was tainted, so was its contents. (This was especially a necessity for collections like lists and vectors in Java and C++ binaries.) You may want to get more explicit with that, by having a recursively_add_parameter_tag() or somesuch that only applies to pointer parameters. 3) lack of explicit NULL-termination of strings. This one gets really complicated, especially for situations where they are terminated properly and then become un-terminated. 4) if a loop that writes to a pointer, and increments that pointer, is bound by a tainted value. You'd have to add an extension to the loop unroller for that, and just check for the 'taint' tag on the bounds check. Of course, you still run into temporal ordering issues, especially with globals, where the CFG ordering won't help. But don't let that discourage you -- it would be great work to see done and commoditized, and would probably be better than most commercial analyzers as well ;) Let me know if you need any more of my expertise in this area. I can't speak for GCC internals, though.
RE: GCC 4.4/4.6/4.7 uninitialized warning regression?
> > This brings out 2 questions. Why don't GCC 4.4/4.6/4.7 warn it? > > Why doesn't 64bit GCC 4.2 warn it? > Good question. It seems that the difference is whether the compiler > generates a field-by-field copy or a call to memcpy(). According to > David, the trunk gcc in 32-bit mode doesn't call memcpy, but still > doesn't warn. He's looking at it. Is this related to this bug, which I filed a year or two ago? http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42561 It would indeed be very nice to get this taken care of, as this kind of analysis would really help find a lot of bugs that currently require commercial tools.
gcc and scientific computing
Hi, I am involved in a scientific computing podcast, http://inscight.org/. I was wondering if anyone from the GCC project would like to be a special guest on the show to talk about recent developments in GCC for scientific computing in C/C++. We could discuss, e.g., the graphite optimizations, link time optimization, C++0x, ... Thanks, Matt
Detecting global pointers
I am writing a gcc plugin and am trying to detect if a value assigned by a function call is a global variable or not. Unfortunately, all calls to 'is_global_var' with a DECL type are returning false. My pass executes after alias analysis and IPA analysis. The cfun->gimple_df->ipa_pta is set to true, so I know the pta analysis should have resolved global information. Plugin code: if (is_gimple_call(stmt)) { gimple_debug_bb(stmt); tree lhs = gimple_call_lhs(stmt); if (lhs && is_global_var(SSA_NAME_VAR(lhs))) printf("Global detected\n"); } Source code (in Go): package main type T struct {id int} var myglobal *T; func fn() *T { myglobal = new(T); // Should be detected as global return myglobal; } func main() { t := fn(); } Basic Block dump as my plugin code executes for function 'fn': <bb 2>: # .MEM_4 = VDEF <.MEM_3(D)> main.myglobal.13_1 = __go_new_nopointers (4); # .MEM_5 = VDEF <.MEM_4> main.myglobal = main.myglobal.13_1; # VUSE <.MEM_5> D.186_2 = main.myglobal; return D.186_2; Any insight would be helpful. Thanks! -Matt
Re: Detecting global pointers
On Wed, May 4, 2011 at 7:38 PM, Richard Guenther wrote: > On Wed, May 4, 2011 at 6:16 AM, Matt Davis wrote: >> I am writing a gcc plugin and am trying to detect if a value assigned by a >> function call, is a global variable or not. Unfortunately, all calls to >> 'is_global_var' with a DECL type are returning false. >> >> My pass executes after alias analysis, and ipa analysis. The >> cfun->gimple_df->ipa_pta is set to true, so I know the pta analysis should >> have >> resolved global information. > > is_global_var is all you need, no need for PTA analysis (which doesn't > change this but simply uses is_global_var as well). Thanks for the clarification. >> Plugin code: >> if (is_gimple_call(stmt)) >> { >> gimple_debug_bb(stmt); >> tree lhs = gimple_call_lhs(stmt); >> if (lhs && is_global_var(SSA_NAME_VAR(lhs))) >> printf("Global detected\n"); > > That will only reliably work if the global is not of is_gimple_reg_type (), > otherwise the call will store to an automatic temporary and the store > to the global will happen in a separate statement. > >> } >> >> >> Source code (in Go): >> package main >> >> type T struct {id int} >> var myglobal *T; >> >> func fn() *T { >> myglobal = new(T); // Should be detected as global >> return myglobal; >> } >> >> func main() { >> t := fn(); >> } >> >> >> Basic Block dump as my plugin code executes for function 'fn': >> : >> # .MEM_4 = VDEF <.MEM_3(D)> >> main.myglobal.13_1 = __go_new_nopointers (4); > > assigns to a temporary > >> # .MEM_5 = VDEF <.MEM_4> >> main.myglobal = main.myglobal.13_1; > > and here is the store > > You can try looking up the store if the LHS of the call is an SSA name > by looking at its immediate uses, but of course for > > int glob; > > foo() > { > int i = call(); // not global > glob = i; > } > > this would also find the store to glob. > > So I'm not sure you can recover all information up to source level > precision. Thanks very much for the clarification and information. -Matt
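Following Richard's suggestion, a sketch of chasing the call's SSA result to a store into a global (plugin-side code against the GCC 4.6-era API; with the caveat he notes that this also finds the "i = call(); glob = i;" case):

tree lhs = gimple_call_lhs (stmt);
if (lhs && TREE_CODE (lhs) == SSA_NAME)
  {
    gimple use_stmt;
    imm_use_iterator iter;
    /* Walk every statement that consumes the call's result.  */
    FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
      if (gimple_assign_single_p (use_stmt))
        {
          tree slhs = gimple_assign_lhs (use_stmt);
          if (DECL_P (slhs) && is_global_var (slhs))
            printf ("Global store detected\n");
        }
  }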
Non-optimal stack usage with C++ temporaries
I've noticed some behavior with g++ that seems strange to me. I don't know if there's some technicality in the C++ standard that requires this, or if it's just a limitation of the optimization code, but it seemed strange so I thought I'd see if anybody could shed more light on it. Here's a test program that illustrates the behavior: struct Foo { char buf[256]; Foo() {} // suppress automatically-generated constructor code for clarity ~Foo() {} }; void func0(const Foo &); void func1(const Foo &); void func2(const Foo &); void func3(const Foo &); void f() { func0(Foo()); func1(Foo()); func2(Foo()); func3(Foo()); } Compiling with -O2 and "-fno-stack-protector -fno-exceptions" for clarity, on g++ 4.4.3, gives the following: 00000000 <_Z1fv>: 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 81 ec 18 04 00 00 sub $0x418,%esp 9: 8d 85 f8 fb ff ff lea -0x408(%ebp),%eax f: 89 04 24 mov %eax,(%esp) 12: e8 fc ff ff ff call 13 <_Z1fv+0x13> 17: 8d 85 f8 fc ff ff lea -0x308(%ebp),%eax 1d: 89 04 24 mov %eax,(%esp) 20: e8 fc ff ff ff call 21 <_Z1fv+0x21> 25: 8d 85 f8 fd ff ff lea -0x208(%ebp),%eax 2b: 89 04 24 mov %eax,(%esp) 2e: e8 fc ff ff ff call 2f <_Z1fv+0x2f> 33: 8d 85 f8 fe ff ff lea -0x108(%ebp),%eax 39: 89 04 24 mov %eax,(%esp) 3c: e8 fc ff ff ff call 3d <_Z1fv+0x3d> 41: c9 leave 42: c3 ret The function makes four function calls, each of which constructs a temporary for the parameter. The compiler dutifully allocates stack space to construct these, but it seems to allocate separate stack space for each of the temporaries. This seems unnecessary--since their lifetimes don't overlap, the same stack space could be used for each of them. The real-life code I adapted this example from had a fairly large number of temporaries strewn throughout it, each of which was quite large, so this behavior caused the generated function to use up a pretty substantial amount of stack, for what seems like no good reason. My question is, is this expected behavior? My understanding of the C++ standard is that each of those temporaries goes away at the semicolon, so it seems like they have non-overlapping lifetimes, but I know there are some exceptions to that rule. Could someone comment on whether this is an actual bug, or required for some reason by the standard, or just behavior that not enough people have run into problems with? Thanks, Matt
How to get function argument points-to information.
For some analysis I am doing, I need to determine if a particular SSA_NAME_VAR node is pointed-to by a function argument. I am iterating across the function's arguments via DECL_ARGUMENTS(), but each argument is just a DECL node, and contains no associated points-to data, as far as I can tell. I assume there is a better/different way of determining if an argument points to my node? Thanks for any insight. -Matt
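One way this is commonly done (a sketch against the GCC 4.6-era API; 'decl' is the candidate node and 'fndecl' the function, both assumed to exist): go from each PARM_DECL to its default-definition SSA name, whose points-to solution the PTA pass fills in:

tree parm;
for (parm = DECL_ARGUMENTS (fndecl); parm; parm = DECL_CHAIN (parm))
  {
    tree name;
    struct ptr_info_def *pi;

    if (!POINTER_TYPE_P (TREE_TYPE (parm)))
      continue;
    /* The argument's value on entry is its default definition.  */
    name = gimple_default_def (cfun, parm);
    if (!name || !(pi = SSA_NAME_PTR_INFO (name)))
      continue;
    if (pt_solution_includes (&pi->pt, decl))
      printf ("argument may point to the node\n");
  }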
missed optimization: transforming while(n>=1) into if(n>=1)
Hi, While trying to optimize pixman, I noticed that gcc is unable to recognize that 'while (n >= 1)' can often be simplified to 'if (n >= 1)'. Consider the following example, where there are loops that operate on larger amounts of data and smaller loops that deal with small or unaligned data. int sum(const int *l, int n) { int s = 0; while (n >= 2) { s += l[0] + l[1]; l += 2; n -= 2; } while (n >= 1) { s += l[0]; l += 1; n -= 1; } return s; } Clearly the while (n >= 1) loop can never execute more than once, as n must be < 2, and in the body of the loop, n is decremented. The resulting machine code includes the backward branch to the top of the while (n >= 1) loop, which can never be taken. I suppose this is a missed optimization. Is this known, or should I make a new bug report? Thanks, Matt Turner
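For concreteness, the hoped-for transform (a sketch of source-level intent, not actual GCC output): once the compiler proves n < 2 after the first loop, the second loop's back edge is dead:

/* what 'while (n >= 1) { ... }' could become when n < 2 is known: */
if (n >= 1)
    s += l[0];   /* the l/n updates are dead on this final iteration */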
[RFC] alpha/ev6: model 1-cycle cross-cluster delay
Alpha EV6 and newer can execute four instructions per cycle if correctly scheduled. The architecture has two clusters {0, 1}, each with its own register file. In each cluster, there are two slots {upper, lower}. Some instructions only execute from either upper or lower slots. Register values produced in one cluster take 1 cycle to appear in the other cluster, so improperly scheduled instructions may incur a cross-cluster delay. I've duplicated (define_insn_reservation ...) for instructions which can execute from either cluster, increased latencies by 1, and added bypasses. In my limited testing it seems to provide a minor improvement (I wouldn't expect much, since it should only remove single-cycle delays here and there). So, please review and provide feedback. I also have some questions: - The Compiler Writer's Guide [1] [2] doesn't seem to mention anything about cross-cluster delays from integer load/store instructions as producers. It seems plausible that load/stores could be a special case and update both clusters' register files at the same time, but maybe this is an oversight in (two versions of) the manual? - CMOV instructions are internally split as two distinct instructions on >=EV6 that may execute on any cluster/slot. Evidently, this means that the first part may execute on cluster 0 while the second executes on cluster 1, thereby incurring a 1-cycle cross-cluster delay. WTF. So, how can I represent this two-part instruction--by duplicating its define_insn_reservation 4 times? I can't find any rules for scheduling CMOVs in the CWG, so knowing this would be helpful too. - The CWG lists the latency of unconditional branches and jsr/call instructions as 3, whereas we have 1. I guess this latency value is only meaningful if the instruction produces a value? I'm a bit confused by this value in the CWG since it lists the latency of conditional branches as N/A, while these other types of branches as 3, although none produce a register value. - When increasing the default instruction latencies, I've added ',nothing' to the functional unit regexp. Is this the correct way to describe that the functional unit is free? - There's a ??? comment at the top that says "In addition, instruction order affects cluster issue." Does gcc understand how to do this already, or is this a TODO reminder? If it's a reminder, where should I look in gcc to add this? - I also see that fadd/fcmov/fmul instructions take an extra two cycles when the consumer is fst/ftoi, so something similar should be added for them. Can a (define_bypass ...) function specify a latency value greater than the default latency, or should I raise the default latency and special-case fst/ftoi consumers like I've done for cross-cluster delay? Thanks a lot! Matt Turner [1] http://www.compaq.com/cpq-alphaserver/technology/literature/cmpwrgd.pdf [2] http://download.majix.org/dec/comp_guide_v2.pdf --- ev6.md.orig 2007-08-02 06:49:31.0 -0400 +++ ev6.md 2011-05-24 23:15:39.414919424 -0400 @@ -24,19 +24,19 @@ ; EV6 has two symmetric pairs ("clusters") of two asymmetric integer ; units ("upper" and "lower"), yielding pipe names U0, U1, L0, L1. ; -; ??? The clusters have independent register files that are re-synced +; The clusters have independent register files that are re-synced ; every cycle. Thus there is one additional cycle of latency between -; insns issued on different clusters. 
Possibly model that by duplicating -; all EBOX insn_reservations that can issue to either cluster, increasing -; all latencies by one, and adding bypasses within the cluster. +; insns issued on different clusters. ; -; ??? In addition, instruction order affects cluster issue. +; ??? In addition, instruction order affects cluster issue. XXX: what to do? (define_automaton "ev6_0,ev6_1") (define_cpu_unit "ev6_u0,ev6_u1,ev6_l0,ev6_l1" "ev6_0") (define_reservation "ev6_u" "ev6_u0|ev6_u1") (define_reservation "ev6_l" "ev6_l0|ev6_l1") -(define_reservation "ev6_ebox" "ev6_u|ev6_l") +(define_reservation "ev6_ebox" "ev6_u|ev6_l") ; XXX: remove +(define_reservation "ev6_e0" "ev6_l0|ev6_u0") +(define_reservation "ev6_e1" "ev6_l1|ev6_u1") (define_cpu_unit "ev6_fa" "ev6_1") (define_cpu_unit "ev6_fm,ev6_fst0,ev6_fst1" "ev6_0") @@ -50,15 +50,26 @@ ; Integer loads take at least 3 clocks, and only issue to lower units. ; adjust_cost still factors in user-specified memory latency, so return 1 here. -(define_insn_reservation "ev6_ild" 1 +; XXX: CWG doesn't mention cross-cluster delay for ild/ist producers ??? +(define_insn_reservation "ev
Configure gcc with --multilib=... ?
Hi, I'd like to ship multilib Gentoo/MIPS installations with only n32 and n64 ABIs (i.e., no o32). The reasoning is that if your system can use either 64-bit ABI you don't have any reason to run o32, given that o32-only installation media also exists. I saw this mail http://gcc.gnu.org/ml/gcc/2010-01/msg00063.html suggesting the addition of a --multilib= configure option. Has such a thing been added? Is there a way to configure gcc to build only n32 and n64 ABIs? Thanks, Matt
RE: GCC 4.6.1 Status Report (2011-06-20) [BRANCH FROZEN]
> GCC 4.6.1 first release candidate has been uploaded, and the branch > is now frozen. All changes need RM approval now. > Please test it, if all goes well, 4.6.1 will be released early next > week. No chance for a fix for this in 4.6.1? http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48600 This has been a critical regression for us, forcing the removal of cold attributes which in turn has reduced performance by a notable amount due to decreased spatial locality. If cold attributes are a sufficiently obscure feature that doesn't warrant a P1, let me know and I'll set expectations appropriately. Thanks!
RE: C++ bootstrap of GCC - still useful ?
> As of a couple of months, I perform a bootstrap-with-C++ > (--enable-build-with-cxx) daily on my machine between 18:10 and 20:10 UTC. > Is there still interest in daily builds like mine ? Absolutely! Especially if you do a profiled-bootstrap and/or LTO bootstrap in that mode. Hopefully this is feasible given the recent improvements in trunk that allowed Mozilla to be built this way. Even without those things, it's quite useful to make sure it stays working. So, thanks and keep it up :)
Updating the CFG after function modifcation
Hello, I have an IPA pass (implemented as a plugin) which executes after all IPA passes. My pass transforms functions by adding code and also modifying the function prototypes. I have had this work on a per-function basis, via a GIMPLE_PASS, which calls update_ssa, verify_ssa, and cleanup_cfg after each function is processed. However, I have recently moved my plugin to execute after all IPA passes, so I can iterate over the cfg of the program. The first iteration is an analysis, and the second iteration does the transformations. Unfortunately, I keep getting errors now, primarily a segfault in "compute_call_stmt_bb_frequency" in the processing of main(). The segfault occurs because the argument 'bb' is NULL and later dereferenced. (NOTE: I do not modify the prototype of main). The e->call_stmt that the null basic block references is from a statement I have removed via gsi_remove during my transformation pass. I need to clean up the cfg somehow, after I remove the statement. My gimple pass, with this same functionality, worked fine. Something tells me that my plugin should be in a different position. I also tried calling cleanup_tree_cfg() after my transformation pass, still no luck. Any suggestions would be welcome. Thanks for even reading this far. -Matt
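One plausible shape for the missing cleanup (a sketch against the 4.6-era API, not a confirmed fix): when deleting a call statement from an IPA pass, also drop its call-graph edge, so nothing later hands compute_call_stmt_bb_frequency a statement whose basic block is gone:

/* node is the caller's cgraph node; gsi points at the call stmt.  */
struct cgraph_edge *e = cgraph_edge (node, stmt);
if (e)
  cgraph_remove_edge (e);
unlink_stmt_vdef (stmt);   /* release the virtual operands first */
gsi_remove (&gsi, true);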
PARM_DECL to SSA_NAME
Hello, I have a PARM_DECL node that I am passing to a function. Previously, my code was working, but since I have made my optimization pass operate as an IPA pass, versus a GIMPLE pass, I think I am missing some verification/resolution call that I need to make. Of course, when I pass the PARM_DECL to my function, I am now getting an error from verify_ssa() suggesting that I should be passing an SSA_NAME instance. I tried using gimple_default_def() to obtain the SSA_NAME for that PARM_DECL; however, the return value is NULL. Is there some other way of accessing the SSA_NAME information for this PARM_DECL node? The SSA has been generated before my plugin executes. Also, I do call update_ssa() after the routines are processed by my passes. Thanks for any insight. -Matt
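If the parameter is never referenced in the body, no default definition exists yet; a sketch of creating one on demand (4.6-era API, under the assumption that is what is happening here):

tree name = gimple_default_def (cfun, parm);
if (!name)
  {
    /* Default defs are SSA names defined by an empty statement.  */
    name = make_ssa_name (parm, gimple_build_nop ());
    set_default_def (parm, name);
  }
/* 'name' is now the SSA_NAME to pass around instead of the PARM_DECL.  */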
Inline Expansion Problem
Hello, I am having the compiler insert a call to a function which is defined inside another object file. However, during inline expansion via expand_call_inline(), the following assertion fails in tree-inline.c: >> 3775: cg_edge = cgraph_edge (id->dst_node, stmt); >> 3776: gcc_checking_assert (cg_edge); cg_edge comes back as being NULL since there is only one callee and no indirect calls; the function that has the inserted call is main(). Is there something I forgot to do after inserting the gimple call statement? This works fine without optimization. -Matt
Re: Inline Expansion Problem
On Sat, Aug 27, 2011 at 09:27:49AM +0200, Richard Guenther wrote: > On Sat, Aug 27, 2011 at 4:47 AM, Matt Davis wrote: > > Hello, > > I am having the compiler insert a call to a function which is defined inside > > another object file. However, during inline expansion via > > expand_call_inline(), > > the following assertion fails in tree-inline.c: > >>> 3775: cg_edge = cgraph_edge (id->dst_node, stmt); > >>> 3776: gcc_checking_assert (cg_edge); > > > > cg_edge comes back as being NULL since there is only one callee and no > > indirect > > calls; the function that has the inserted call is main(). Is there > > something I > > forgot to do after inserting the gimple call statement? This works fine > > without > > optimization. > > Dependent on where you do it you have to add/rebuild cgraph edges. Thanks Richard, I tried "rebuild_cgraph_edges()" before I sent the initial email. Unfortunately, when I call that function after I add the statement, in an IPA pass, the resulting binary does not link, as it does not seem able to resolve the symbol to the callee. Maybe providing more context would help make more sense of this. insert_func_call inserts the call by adding a new gimple call statement. I've done this tons of times before, but it seems with -O the callgraph isn't happy. >> for (node=cgraph_nodes; node; node=node->next) >> { >> if (!(func = DECL_STRUCT_FUNCTION(node->decl))) >> continue; >> >> push_cfun(func); >> old_fn_decl = current_function_decl; >> current_function_decl = node->decl; >> >> insert_func_call(func); >> >> rebuild_cgraph_edges(); >> current_function_decl = old_fn_decl; >> pop_cfun(); >> } -Matt
Re: Inline Expansion Problem
On Sat, Aug 27, 2011 at 11:25:45AM +0200, Richard Guenther wrote: > On Sat, Aug 27, 2011 at 10:06 AM, Matt Davis wrote: > > On Sat, Aug 27, 2011 at 09:27:49AM +0200, Richard Guenther wrote: > >> On Sat, Aug 27, 2011 at 4:47 AM, Matt Davis wrote: > >> > Hello, > >> > I am having the compiler insert a call to a function which is defined > >> > inside > >> > another object file. However, during inline expansion via > >> > expand_call_inline(), > >> > the following assertion fails in tree-inline.c: > >> >>> 3775: cg_edge = cgraph_edge (id->dst_node, stmt); > >> >>> 3776: gcc_checking_assert (cg_edge); > >> > > >> > cg_edge comes back as being NULL since there is only one callee and no > >> > indirect > >> > calls; the function that has the inserted call is main(). Is there > >> > something I > >> > forgot to do after inserting the gimple call statement? This works fine > >> > without > >> > optimization. > >> > >> Dependent on where you do it you have to add/rebuild cgraph edges. > > > > Thanks Richard, > > I tried "rebuild_cgraph_edges()" before I sent the initial email. > > Unfortunately, when I call that function after I add the statement, in an > > IPA > > pass, the resulting binary does not link, as it does not seem able to > > resolve > > the symbol to the callee. Maybe providing more context would help make more > > sense of this. insert_func_call inserts the call by adding a new gimple call > > statement. > > I've done this tons of times before, but it seems with -O the callgraph > > isn't > > happy. > > If you are doing this from an IPA pass you have to add the edge manually using > update_edges_for_call_stmt. Thanks Richard, I was unable to properly use update_edges_for_call_stmt. It seems that routine is for updating an existing call. In my case I am inserting a new gimple call via gsi_insert_before() with GSI_NEW_STMT. As a gimple pass, this works fine. I appreciate all of your correspondence. -Matt
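For reference, manually registering a freshly inserted call might look like this sketch (API names from roughly the 4.6 era; cgraph_get_create_node is the later spelling of cgraph_node, and the frequency/nest arguments follow the signature of that period):

/* After gsi_insert_before (&gsi, call_stmt, GSI_NEW_STMT): */
basic_block bb = gimple_bb (call_stmt);
struct cgraph_node *callee = cgraph_node (callee_decl);
cgraph_create_edge (node, callee, call_stmt,
                    bb->count,
                    compute_call_stmt_bb_frequency (node->decl, bb),
                    bb->loop_depth);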
Adding functions at compile time
I am creating a few functions at compile time, via a gcc plugin. I create the functions and their bodies, and insert them into the call graph. This is all done before "cgraph_finalize_compilation_unit()" has been called. I then have another compiler pass, which gets started after the SSA representation has been generated, and it is this pass that uses the functions created previously, in the much earlier pass. The problem is that by the time the created functions are used, the cgraph has already removed those nodes since they are disjoint. I tried creating and modifying the functions in the same pass, but that was not successful either. I did not see any flag I could set in the cgraph nodes, which are created in the first pass I mentioned, that would prevent them from being removed. Is there a way I can keep those nodes around so the functions created at compile time actually get built? -Matt
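The usual way to pin such nodes (a sketch using 4.6-era names; the exact call varies across releases) is to mark the decl as force-output before unreachable-node removal runs:

DECL_PRESERVE_P (fndecl) = 1;                    /* like attribute((used)) */
cgraph_mark_needed_node (cgraph_node (fndecl));  /* keep it out of
                                                    unreachable-node removal */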
Go Garbage Collection Roots
As some of you might know, I have been researching and working on a region-based memory management plugin for GCC. My target is specifically the Go language. With that said, I have been making a fair amount of progress. More recently, I have been benchmarking my work, and it came to my attention that I need to handle types defined in external object files. For instance, when a new List object is created, the external package for List calls "new" and returns us a nice sparkly new List object. The Go runtime implements "new" as "__go_new," which calls the runtime's special allocator to produce an object that is garbage collected. This is causing some snags in my system. Mainly, I want to use my own allocator, since there is only a special case in which I want to use garbage collection in my region system. Is there a way/interface to register data as a root in the garbage collector, so that it's not in conflict with my allocation? The other option would be to try to override "__go_new" with my own implementation, keeping the same symbol name so that the linker does the dirty work. -Matt
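The second option is plain symbol interposition. A sketch, under the assumption that libgo's allocator entry point has a signature along the lines of void *__go_new (uintptr_t); the exact prototype would need to be checked against the libgo sources:

    #include <stdint.h>

    /* Hypothetical region allocator provided by the plugin's runtime.  */
    extern void *my_region_alloc (uintptr_t size);

    /* Sketch: define our own __go_new so the linker resolves the Go
       runtime's allocation calls to the region allocator instead.
       The signature is an assumption and must match libgo's.  */
    void *
    __go_new (uintptr_t size)
    {
      return my_region_alloc (size);
    }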
Creating a structure at compile time.
I am working on a gcc-plugin where I need to create a structure at compile time. I have looked over one of the front ends to learn more about creating structures at compile time. What I have thus far is a type node for my struct.

I now need to create an instance of this struct. For illustration, call this type 'struct T' and the instance of T 'my_T'. Using the build_constructor() routine in GCC, I create an instance, my_T, whose address I need to pass to a function. So I take this decl, my_T, and pass it to build_fold_addr_expr(). The result of the latter is what I pass to the function 'fn()'.

Yes, the function I am passing the reference to is expecting the proper type, that of address-of-T. Running this presents me with an error in expand_expr_real_1(): "Variables inherited from containing functions should have been lowered by this point."

So I figured I would create a temp variable, 'V', of type pointer-to-T, run make_ssa_name() on it, and then insert an assignment before the call to fn. Looking at the GIMPLE dump, I see 'V = &my_T; fn(V);', which is correct. However, in the type list of the caller, I only see 'struct * V;'. This concerns me; I would expect to see 'struct T *V;'. As above, this case also fails.

I am baffled. Do I even need to be creating the ssa_name instance ('V' above) to pass to 'fn()'? Or will build_constructor() produce a tree node that I can treat as a variable and pass to 'fn()'?

-Matt
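For concreteness, the shape of the tree manipulation being described looks roughly like this, sketched against the GCC 4.6-era internal API (t_type, the constructor elements, and fn_decl are assumed to already exist, and a real pass would likely also need to register the variable with the varpool):

    /* Sketch: build a static instance my_T of T_TYPE, give it a
       CONSTRUCTOR as its initial value, and pass &my_T to FN_DECL.  */
    VEC(constructor_elt,gc) *elts = NULL;  /* one constructor_elt per field */
    tree my_t = build_decl (UNKNOWN_LOCATION, VAR_DECL,
                            get_identifier ("my_T"), t_type);
    TREE_STATIC (my_t) = 1;
    DECL_INITIAL (my_t) = build_constructor (t_type, elts);
    tree addr = build_fold_addr_expr (my_t);
    gimple call = gimple_build_call (fn_decl, 1, addr);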
Re: Creating a structure at compile time.
On Fri, Dec 2, 2011 at 3:38 PM, Matt Davis wrote:
> I am working on a gcc-plugin where I need to create a structure at compile time. I have looked over one of the front ends to learn more about creating structures at compile time. What I have thus far is a type node for my struct.
>
> I now need to create an instance of this struct. For illustration, call this type 'struct T' and the instance of T 'my_T'. Using the build_constructor() routine in GCC, I create an instance, my_T, whose address I need to pass to a function. So I take this decl, my_T, and pass it to build_fold_addr_expr(). The result of the latter is what I pass to the function 'fn()'.
>
> Yes, the function I am passing the reference to is expecting the proper type, that of address-of-T. Running this presents me with an error in expand_expr_real_1(): "Variables inherited from containing functions should have been lowered by this point."
>
> So I figured I would create a temp variable, 'V', of type pointer-to-T, run make_ssa_name() on it, and then insert an assignment before the call to fn. Looking at the GIMPLE dump, I see 'V = &my_T; fn(V);', which is correct. However, in the type list of the caller, I only see 'struct * V;'. This concerns me; I would expect to see 'struct T *V;'. As above, this case also fails.
>
> I am baffled. Do I even need to be creating the ssa_name instance ('V' above) to pass to 'fn()'? Or will build_constructor() produce a tree node that I can treat as a variable and pass to 'fn()'?
>
> -Matt

Well, I have successfully created and used an initialized structure. Note that I do not need to run make_ssa_name(); I can declare the struct as TREE_STATIC and work from there. Now, my problem with the expand_expr_real_1() check failing is that some of the values I initialize in my compile-time created struct can be different at runtime. Is there a way I can take this constructor tree node and have all of the values in it set in the middle of my function, where those values are defined? I do not need the structure initialized upon function entry. What I need is to have all of the values, which I have already set up, actually filled out in the middle of the function being processed, instead of at function entry. I am unsure how to do this. The constructor node exists, and I'm in the middle of an IPA pass. I assume I can call gimplify_expr(), but I am thinking I need to pass it something different than just a constructor tree node. Thanks for any help. -Matt
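One way to express "fill in the fields in the middle of the function," without relying on DECL_INITIAL at all, is to emit an explicit store per field as a gimple assignment at the desired statement. A sketch, with the field decl, the value, and the iterator assumed in hand:

    /* Sketch: store VAL into my_T.FIELD just before the statement at GSI,
       instead of initializing the whole object at function entry.  */
    tree lhs = build3 (COMPONENT_REF, TREE_TYPE (field_decl),
                       my_t, field_decl, NULL_TREE);
    gimple assign = gimple_build_assign (lhs, val);
    gsi_insert_before (&gsi, assign, GSI_SAME_STMT);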
Obtaining the arguments to a function pointer
I am trying to look at the arguments that are passed to a function pointer. I have an SSA_NAME which is for a pointer-type to a function-type. I want to obtain the arguments being passed to the function pointer, but after looking all over the SSA_NAME node and its corresponding VAR_DECL I cannot seem to find the arguments stashed anywhere. I know this is somewhat of a special case. Typically, if I had a fndecl it would be easy, but all I know in my case is the function type. -Matt
Re: Obtaining the arguments to a function pointer
On Sat, Dec 10, 2011 at 12:40 PM, Ian Lance Taylor wrote:
> Matt Davis writes:
>
>> I am trying to look at the arguments that are passed to a function pointer. I have an SSA_NAME which is for a pointer-type to a function-type. I want to obtain the arguments being passed to the function pointer, but after looking all over the SSA_NAME node and its corresponding VAR_DECL I cannot seem to find the arguments stashed anywhere. I know this is somewhat of a special case. Typically, if I had a fndecl it would be easy, but all I know in my case is the function type.
>
> A function pointer doesn't have any associated arguments, at least not as I use that word. Are you looking for the argument types? Because there are no argument values.
>
> The argument types can be found from the type of the SSA_NAME, which should be a FUNCTION_TYPE. TYPE_ARG_TYPES of the FUNCTION_TYPE will be the argument types.

Ian,
I was actually looking for the argument instances and not the types. However, I have found I can get the gimple statement for this call, and just use that to obtain the actual arguments I need. Thanks for the fast reply!

-Matt
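The resolution, reading the actual arguments off the GIMPLE call statement rather than off the SSA_NAME, looks roughly like this (a sketch; stmt is the call statement, already located):

    /* Sketch: for the GIMPLE_CALL statement STMT of the indirect call,
       the actual argument trees are recorded on the statement itself.  */
    if (is_gimple_call (stmt))
      {
        unsigned i, n = gimple_call_num_args (stmt);
        for (i = 0; i < n; i++)
          {
            tree arg = gimple_call_arg (stmt, i);
            /* ... inspect ARG ... */
          }
      }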
Modifying the datatype of a formal parameter
I am using 'ipa_modify_formal_parameters()' to change the type of a function's formal parameter. After my pass completes, I get a 'gimple_expand_cfg()' error. I must be missing some key piece here, as the failure points to a NULL "SA.partition_to_pseudo" value. I also call set_default_ssa_name() on the value returned from ipa_modify_formal_parameters() (the adjustment's 'reduction' field). Do I need to re-gimplify the function or run some kind of 'cleanup' or 'update' once I modify this formal parameter? Thanks -Matt
Re: Modifying the datatype of a formal parameter
Hi Martin, and thank you very much for your reply. I do have some more resolution to my issue.

On Mon, Dec 19, 2011 at 8:42 PM, Martin Jambor wrote:
> Hi,
>
> On Sun, Dec 18, 2011 at 01:57:17PM +1100, Matt Davis wrote:
>> I am using 'ipa_modify_formal_parameters()' to change the type of a function's formal parameter. After my pass completes, I get a 'gimple_expand_cfg()' error. I must be missing some key piece here, as the failure points to a NULL "SA.partition_to_pseudo" value. I also call set_default_ssa_name() on the value returned from ipa_modify_formal_parameters() (the adjustment's 'reduction' field). Do I need to re-gimplify the function or run some kind of 'cleanup' or 'update' once I modify this formal parameter?
>
> It's difficult to say without knowing what you are doing and at what stage of the compilation.

My pass is getting called as the last IPA pass (PLUGIN_ALL_IPA_PASSES_END). I do use the same function "ipa_modify_formal_parameters()" to add additional parameters to certain functions, and it works well.

> The sad truth is that ipa_modify_formal_parameters is very much crafted for its sole user, which is IPA-SRA, and is probably quite less general than what the original intention was. Any pass using the function must then modify the body itself to reflect the changes, just like IPA-SRA does.
>
> SRA does not re-gimplify the modified functions; it just returns TODO_update_ssa or (TODO_update_ssa | TODO_cleanup_cfg) if any EH cleanup changed the CFG.

Yep, and I do call update_ssa and cleanup_tree_cfg() after my pass.

> So I would suggest having a look at IPA-SRA (grep for the only call to ipa_modify_formal_parameters in tree-sra.c), especially at what you do differently. If you then have any further questions, feel free to ask.

Yeah, that was one of the first things I did. Now, as mentioned, I do have some more clarity on my issue. Basically, I am just changing the type of an existing formal parameter. When I look at "gimple_expand_cfg()", which is called later, I notice that the "SA.partition_to_pseudo" entry for that parameter is NULL, on which "gimple_expand_cfg()" aborts. That value is NULL because "gimple_expand_cfg()" calls "expand_used_vars()", and I need "expand_one_var()" to be called, since that is what should fix up the RTX assigned to the parameter I am modifying. Unfortunately, the bitmap "SA.partition_has_default_def" is set for the parameter, even though I do not set it explicitly. And since it is always set, "expand_one_var()" is never called. I need to unset the default def associated with the param to force "expand_one_var()" to execute. So, for the ssa name assigned to the parameter I am modifying, I use SSA_NAME_IS_DEFAULT_DEF to set the flag to 'false'. This sounds like a really gross hack. If I do this, I will need to set a new ssa definition for the modified parameter.

-Matt
Re: Modifying the datatype of a formal parameter
Here is a follow-up. I am closer to what I need, but not quite there yet. Basically, I just want to switch the type of one formal parameter to a different type.

On Mon, Dec 19, 2011 at 11:05 PM, Matt Davis wrote:
> Hi Martin, and thank you very much for your reply. I do have some more resolution to my issue.
>
> On Mon, Dec 19, 2011 at 8:42 PM, Martin Jambor wrote:
>> Hi,
>>
>> On Sun, Dec 18, 2011 at 01:57:17PM +1100, Matt Davis wrote:
>>> I am using 'ipa_modify_formal_parameters()' to change the type of a function's formal parameter. After my pass completes, I get a 'gimple_expand_cfg()' error. I must be missing some key piece here, as the failure points to a NULL "SA.partition_to_pseudo" value. I also call set_default_ssa_name() on the value returned from ipa_modify_formal_parameters() (the adjustment's 'reduction' field). Do I need to re-gimplify the function or run some kind of 'cleanup' or 'update' once I modify this formal parameter?
>>
>> It's difficult to say without knowing what you are doing and at what stage of the compilation.
>
> My pass is getting called as the last IPA pass (PLUGIN_ALL_IPA_PASSES_END). I do use the same function "ipa_modify_formal_parameters()" to add additional parameters to certain functions, and it works well.
>
>> The sad truth is that ipa_modify_formal_parameters is very much crafted for its sole user, which is IPA-SRA, and is probably quite less general than what the original intention was. Any pass using the function must then modify the body itself to reflect the changes, just like IPA-SRA does.
>>
>> SRA does not re-gimplify the modified functions; it just returns TODO_update_ssa or (TODO_update_ssa | TODO_cleanup_cfg) if any EH cleanup changed the CFG.
>
> Yep, and I do call update_ssa and cleanup_tree_cfg() after my pass.
>
>> So I would suggest having a look at IPA-SRA (grep for the only call to ipa_modify_formal_parameters in tree-sra.c), especially at what you do differently. If you then have any further questions, feel free to ask.
>
> Yeah, that was one of the first things I did. Now, as mentioned, I do have some more clarity on my issue. Basically, I am just changing the type of an existing formal parameter. When I look at "gimple_expand_cfg()", which is called later, I notice that the "SA.partition_to_pseudo" entry for that parameter is NULL, on which "gimple_expand_cfg()" aborts. That value is NULL because "gimple_expand_cfg()" calls "expand_used_vars()", and I need "expand_one_var()" to be called, since that is what should fix up the RTX assigned to the parameter I am modifying. Unfortunately, the bitmap "SA.partition_has_default_def" is set for the parameter, even though I do not set it explicitly. And since it is always set, "expand_one_var()" is never called. I need to unset the default def associated with the param to force "expand_one_var()" to execute. So, for the ssa name assigned to the parameter I am modifying, I use SSA_NAME_IS_DEFAULT_DEF to set the flag to 'false'. This sounds like a really gross hack. If I do this, I will need to set a new ssa definition for the modified parameter.

I use ipa_modify_formal_parameters() and swap the type of the param with my desired type. The resulting PARM_DECL that the latter function gives me has no default definition. So I use make_ssa_name() and set its return value as the default definition for the PARM_DECL.
That works fine; however, I then need to somehow rebuild the SSANAMES for the function. The new name I have for the modified PARM_DECL is out of order, and gimple_expand_cfg() fails, because the new definition of the PARM_DECL is now out of order for SA.partition_to_pseudo by the time gimple_expand_cfg() is called: the partition-to-pseudo logic works off the index of the SSA_NAME in the function's list of SSANAMES, and gimple_expand_cfg() iterates across all SSANAMEs, including the one I no longer need. What I need to do is replace the old SSA_NAME with the newer SSA_NAME I get back from make_ssa_name(). I could do this directly, but I have yet to find an appropriate routine in tree-flow.h and tree-flow-inline.h. -Matt
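For what it's worth, the default-def plumbing being discussed lives in the tree-flow interface. A sketch of installing a fresh default definition for the modified parameter, assuming a GCC 4.6-era API (whether this resolves the partition-ordering issue above is untested):

    /* Sketch: give NEW_PARM a fresh SSA name and register it as the
       default definition, so out-of-ssa sees a consistent view of the
       parameter.  Default defs conventionally use an empty (nop)
       defining statement.  */
    tree new_name = make_ssa_name (new_parm, gimple_build_nop ());
    SSA_NAME_IS_DEFAULT_DEF (new_name) = 1;
    set_default_def (new_parm, new_name);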
RTL Conditional and Call
Hi,
I am having an RTL problem trying to make a function call from a COND_EXEC rtx. The reload pass has already run, and, very simply, I want to compare %rdx on 64-bit x86 with a specific integer value; if the comparison is true, my function call executes. I can call the function fine outside of the conditional, but when I put it inside the conditional expression, I get the following error:

    test.c:6:1: error: unrecognizable insn:
    (insn 27 13 20 2 (cond_exec (eq:BI (const_int 42 [0x2a])
                (reg:DI 1 dx))
            (call (mem:DI (symbol_ref:DI ("abort")) [0 S8 A8])
                (const_int 0 [0]))) -1 (nil))
    test.c:6:1: internal compiler error: in insn_default_length, at config/i386/i386.md:591

The original code for the condition:

    rtx cmp = gen_rtx_EQ (BImode,
                          gen_rtx_CONST_INT (VOIDmode, 42),
                          gen_rtx_REG (DImode, 1));

And the original code for the COND_EXEC expression, which is what I emit into the program:

    rtx sym = gen_rtx_SYMBOL_REF (Pmode, "abort");
    rtx abrt_addr = gen_rtx_MEM (Pmode, sym);
    rtx abrt = gen_rtx_CALL (VOIDmode, abrt_addr, const0_rtx);
    rtx cond = gen_rtx_COND_EXEC (VOIDmode, cmp, abrt);

Thanks
-Matt
Re: RTL Conditional and Call
On Sat, Dec 31, 2011 at 12:51 AM, Alexander Monakov wrote:
>
> On Sat, 31 Dec 2011, Matt Davis wrote:
>
>> Hi,
>> I am having an RTL problem trying to make a function call from a COND_EXEC rtx. The reload pass has already run, and, very simply, I want to compare %rdx on 64-bit x86 with a specific integer value; if the comparison is true, my function call executes. I can call the function fine outside of the conditional, but when I put it inside the conditional expression, I get the following error:
>>
>> test.c:6:1: error: unrecognizable insn:
>
> Indeed, x86 does not have a "conditional call" instruction. You would have to generate the call in a separate basic block and add a conditional branch instruction around it. You can reference the following code, which attempts to convert any COND_EXECs to explicit control flow:
>
> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02383.html
>
> (but you will probably need to additionally generate comparison instructions).
>
> Hope that helps,

Thanks Alexander. This does help. What I have been doing is writing the same code in C, compiling that, and then dumping the RTL. I then try to create the same RTL by hand. The second thing I need to do, as the first is already in place in my code, is to compare a register with a constant. So, just to test things, I perform a simple "COMPARE" and set the mode to CCZ, which is what my analogous C variant produces in the RTL dump. Unfortunately, I'm still getting a similar "unrecognizable insn" error. I feel lame asking so many questions, but this is something I want to get stronger with, so aside from my current gcc research, I am tossing this into the mix in my free time. I've looked at rtl.def and nothing seems incorrect. My RTX:

    rtx cmp2 = gen_rtx_COMPARE (CCZmode,
                                gen_rtx_REG (DImode, 1),
                                gen_rtx_CONST_INT (VOIDmode, 42));

Once this is in place, I would wrap it in a SET rtx and actually set the CCZ register. I'm primarily concerned with getting the comparison piece in place first.

-Matt
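A plausible explanation for the remaining "unrecognizable insn" error, offered as an educated guess rather than something confirmed later in this thread: the i386 patterns only recognize a comparison as a SET of the flags hard register, so the bare COMPARE has to be wrapped before it is emitted. A sketch:

    /* Sketch: on x86 a compare is represented as setting the flags
       register from a COMPARE rtx; a bare COMPARE is not an insn.  */
    rtx flags = gen_rtx_REG (CCZmode, FLAGS_REG);
    rtx cmp   = gen_rtx_COMPARE (CCZmode,
                                 gen_rtx_REG (DImode, 1),  /* %rdx */
                                 GEN_INT (42));
    emit_insn (gen_rtx_SET (VOIDmode, flags, cmp));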
Interface Method Table
For a Go program being compiled with gcc, is there a way, from the middle end, to figure out which routines make up the interface method table? I could check the mangled name of the method table, but is there another way to deduce which methods compose it from the middle end? Thanks! -Matt
RTL AND Instruction
Hello (again),
I have a case where I need to emit an AND operation on a register and a const_int value. The machine architecture I am looking at, for the .md, is i386. Anyway, after matching things up with rtl.def and what is in the .md, I use the gen_rtx_AND macro and wrap that in a gen_rtx_SET. I could insert inline assembly with the ASM_OPERANDS macro, but I really want to do this with pure RTL. Essentially, I just want to emit: "and %eax, $0x7". Once I emit my rtx into the list of insns, GCC gives me an "unrecognizable insn" error. I can trace the code through the first part of the condition specified in i386.md, "ix86_binary_operator_ok," and that passes fine from the "anddi_1" define_insn. What I have in my source is the following:

    rtx eax = gen_rtx_REG (DImode, 0);
    rtx and = gen_rtx_AND (DImode, eax, gen_rtx_CONST_INT (VOIDmode, 7));
    and = gen_rtx_SET (DImode, eax, and);
    emit_insn_before (and, insn);

Thanks for any insight into this. On a side note, this is just for a side project, and I am trying to get a better grasp of RTL. I have gone through the internals manual for RTL and Machine Descriptions, but it seems I am still having a bit of trouble.

-Matt
Re: RTL AND Instruction
On Sun, Jan 29, 2012 at 8:21 PM, James Courtier-Dutton wrote:
>
> On Jan 22, 2012 5:21 AM, "Matt Davis" wrote:
>> Essentially, I just want to emit: "and %eax, $0x7"
>
> Assuming AT&T format, does that instruction actually exist?
> How can you store the result in the constant number 7?
> Did you instead mean
> and $0x7, %eax

Yes, I have it working. Much thanks to everyone :-)

-Matt
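The thread does not record the final fix, but one likely correction (an assumption on my part, not confirmed by the poster) is that a SET rtx must carry VOIDmode, with the machine mode living on the operands, so the DImode passed to gen_rtx_SET in the original snippet would itself leave the insn unrecognizable:

    /* Sketch: pre-GCC 5, gen_rtx_SET takes the mode of the SET itself,
       which must be VOIDmode; hard reg 0 in DImode is %rax.  */
    rtx rax = gen_rtx_REG (DImode, 0);
    rtx op  = gen_rtx_AND (DImode, rax, GEN_INT (7));
    emit_insn_before (gen_rtx_SET (VOIDmode, rax, op), insn);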
[alpha] Request for help wrt gcc bugs 27468, 27469
Hi,
Could someone please take a look at these two bugs?

27468 - sign-extending Alpha instructions not exploited
27469 - zero extension not eliminated [on Alpha]

Andrew Pinski confirmed both of them three and a half years ago. My uninformed feeling, after seeing bugs 8603 and 42113 fixed, is that both of them are relatively simple. I CC'd Richard since you probably know more about Alpha than anyone else, and I CC'd you, Uros, since you were extremely nice and helpful with the other two previously mentioned bugs. I'm more than willing to do any testing I can, and I can get you access to a quad-833MHz ES40 to do testing on, if need be.

Thanks,
Matt Turner
[alpha] Wrong code produced at -Os, -O2, and -O3
Hi Uros and Richard,
I was rewriting the Alpha sched_find_first_bit implementation for the Linux kernel, and in the process I think I've come across a gcc bug. I rewrote the function using cmov instructions, and wrote a small program to test its correctness and performance. I wrote the function initially as an external .S file, and once I was reasonably sure it was correct, converted it to a C function with inline assembly. Compiling both produces the exact same output, as shown:

    <sched_find_first_bit>:
        ldq     t0,0(a0)
        clr     t2
        ldq     t1,8(a0)
        cmoveq  t0,0x40,t2
        cmoveq  t0,t1,t0
        cttz    t0,t3
        addq    t3,t2,v0
        ret

In my test program, I found that when I executed the rewritten implementation _before_ the reference implementation, it produced bogus results. This only happens when using the C/inline asm function. When compiled with the external .S file, the results are correct. Attached is a tar.gz with my test code. Compile the test program with `gcc -O -mcpu=... find.c rewritten.S test.c -o test` with optional -D__REWRITTEN_INLINE and -D__REWRITTEN_FIRST. At -Os, -O2, or -O3 with both -D__REWRITTEN_INLINE and -D__REWRITTEN_FIRST, the program will produce incorrect results and assert(). At -O0 or -O1, or without one or both of the -D flags, it will produce correct results. I've tested with gcc-4.3.4 and gcc-4.4.2. Thanks. Let me know what I can do to help further.

Matt Turner

[Attachment: sched_find_first_bit.tar.gz]
Re: [alpha] Wrong code produced at -Os, -O2, and -O3
On Thu, Apr 8, 2010 at 2:16 AM, Uros Bizjak wrote:
> On Wed, Apr 7, 2010 at 8:38 PM, Matt Turner wrote:
>
>> I was rewriting the Alpha sched_find_first_bit implementation for the Linux kernel, and in the process I think I've come across a gcc bug.
>
> [...]
>
>> Thanks. Let me know what I can do to help further.
>
> Please file a Bugzilla bug report for your problem. Otherwise, it will be lost in the mailing lists.
>
> Uros.

Sure. Thanks for the email. I've filed it in Bugzilla, with as small a test case as I could manage. Thanks!

Matt

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43691
Stack mangling for anonymous function pointers
I'm working on a system where we're jumping from Java into C to pull a function out of a dictionary (indexed by string name) and calling it as a 'long (*)(void *, ...)'. There's some confusion as to whether there is a method to copy a structure or an array onto the stack through the ... arg, such that the remainder of the stack can be used for the specific arguments the function is looking for (i.e., "f(void *, int, long, long, double)"). Online documentation is ambiguous as to whether a pointer to the structure, or the whole structure, is copied onto the stack. Is there a reliable way to write data to the stack such that a called function pointer can extract the values it seeks? Thanks, Matt
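On the structure question, standard C is unambiguous even though an ABI may pass the bytes in registers rather than literally on the stack: a struct passed through '...' goes by value, and the callee retrieves the whole struct with va_arg. A minimal sketch (the struct layout here is illustrative, not taken from the original system):

    #include <stdarg.h>

    struct pair { long a, b; };

    /* Sketch: the callee names the struct type itself in va_arg; it does
       not receive a pointer unless the caller explicitly passed one.  */
    long f(void *ctx, ...)
    {
        va_list ap;
        va_start(ap, ctx);
        struct pair p = va_arg(ap, struct pair);  /* whole struct by value */
        va_end(ap);
        return p.a + p.b;
    }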
Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax
On Mar 13, 2009, at 10:06 AM, Paolo Bonzini wrote:

Hm. In fold-const.c we try to make sure to produce the same result as the target would for constant-folding shifts. Thus, Paolo, I think what fold-const.c does is what we should assume for !SHIFT_COUNT_TRUNCATED. No?

Unfortunately it is not so simple. fold-const.c is actually wrong, as witnessed by this program:

    #include <stdio.h>

    static inline int f (int s) { return 2 << s; }
    int main () { printf ("%d\n", f (33)); }

which prints 4 at -O0 and 0 at -O2 on i686-pc-linux-gnu.

But this is because i?86 doesn't define SHIFT_COUNT_TRUNCATED, no?

Yes, so fold-const.c is *not* modeling the target in this case. But on the other hand, this means we can get by with documenting the effect of a conservative truncation mask: no wrong-code bugs, just differences between optimization levels for undefined programs. I'll check that the optimizations done based on the truncation mask are all conservative, or can be made so.

So, I'd still need the information for arm and m68k, because that information is about the bitfield instructions. For rs6000 it would be nice to see what they do for 64 bits (for 32-bit I know that PowerPCs truncate to 6 bits, not 5). But for the other architectures, we can be conservative. VAX doesn't truncate at all; if you specify >31 bits it raises a reserved operand exception.
Can't pass temporary with hidden copy ctor as const ref
Hi,
I'm having trouble compiling the following with g++ 4.2.1:

    class Uncopyable {
    public:
        Uncopyable(int x) {}
    private:
        Uncopyable(const Uncopyable & other) {}
    };

    class User {
    public:
        void foo(int x) { foo(Uncopyable(x)); }
        void foo(const Uncopyable & x) {
            // do something
        }
    };

    int main () {
        User u;
        u.foo(1);
        return 0;
    }

The compiler complains that it can't find a copy ctor for 'Uncopyable'; why is this? It would seem that temporaries could be passed directly as the const ref rather than needing a copy. Message:

    test.cc: In member function 'void User::foo(int)':
    test.cc:11: error: 'Uncopyable::Uncopyable(const Uncopyable&)' is private
variadic arguments not thread safe on amd64?
I've been trying to write a program with a logging thread that consumes messages in 'printf format' passed via a struct. It seemed that this should be possible using va_copy to copy the variadic arguments, but they would always come out as garbage. This is with gcc 4.1.2 on amd64. Reading through the amd64 ABI, it's now clear that the va_list is just a struct and the actual values are stored in registers. So I imagine that when it switches threads the registers are restored and the va_list isn't valid anymore. But I can't find any documentation about whether the va_* macros were ever supposed to be thread safe. It seems that they probably are everywhere except PPC and amd64. Is there a portable way to pass a va_list between threads?

Here's an example program. If you compile it on a 32-bit machine (or even with -m32) it prints out both strings ok, but on amd64 it will print nulls for the threaded case.

    $ gcc -m32 -g -lpthread test.c
    $ ./a.out hello world
    debug: hello world
    tdebug: hello world

    $ gcc -m64 -g -lpthread test.c
    $ ./a.out hello world
    debug: hello world
    tdebug: (null) (null)

    #include <stdio.h>
    #include <stdarg.h>
    #include <pthread.h>
    #include <unistd.h>

    typedef struct log_s {
        const char *format;
        va_list ap;
    } log_t;

    log_t mylog;
    pthread_mutex_t m;
    pthread_cond_t c;

    void printlog() {
        vprintf(mylog.format, mylog.ap);
    }

    void *tprintlog(void *arg) {
        pthread_mutex_lock(&m);
        pthread_cond_wait(&c, &m);
        vprintf(mylog.format, mylog.ap);
        pthread_mutex_unlock(&m);
        return NULL;
    }

    void debug(const char *format, ...) {
        va_list ap;
        mylog.format = format;
        va_start(ap, format);
        va_copy(mylog.ap, ap);
        printlog();
        va_end(ap);
    }

    void tdebug(const char *format, ...) {
        va_list ap;
        pthread_mutex_lock(&m);
        mylog.format = format;
        va_start(ap, format);
        va_copy(mylog.ap, ap);
        pthread_cond_signal(&c);
        pthread_mutex_unlock(&m);
    }

    int main(int argc, char *argv[]) {
        pthread_t t;
        debug("debug: %s %s\n", argv[1], argv[2]);
        pthread_mutex_init(&m, NULL);
        pthread_cond_init(&c, NULL);
        pthread_create(&t, NULL, tprintlog, NULL);
        sleep(1);
        tdebug("tdebug: %s %s\n", argv[1], argv[2]);
        sleep(1);
    }
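For context, the usual portable workaround (not proposed in this thread) is to avoid handing a va_list across threads at all: render the message with vsnprintf() in the calling thread, while the va_list is still valid, and pass the finished string to the consumer. A sketch, where enqueue_log_message() is a hypothetical hand-off to the logging thread:

    #include <stdarg.h>
    #include <stdio.h>

    extern void enqueue_log_message(const char *msg);  /* hypothetical; copies msg */

    /* Sketch: format in the caller's context, hand off only plain bytes.  */
    void tdebug_safe(const char *format, ...)
    {
        char buf[1024];
        va_list ap;

        va_start(ap, format);
        vsnprintf(buf, sizeof buf, format, ap);
        va_end(ap);

        enqueue_log_message(buf);
    }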
Re: variadic arguments not thread safe on amd64?
On Mon, Apr 27, 2009 at 08:49:27PM -0700, Andrew Pinski wrote:
> On Mon, Apr 27, 2009 at 8:37 PM, Matt Provost wrote:
> > void tdebug(const char *format, ...) {
> >     va_list ap;
> >     pthread_mutex_lock(&m);
> >     mylog.format = format;
> >     va_start(ap, format);
> >     va_copy(mylog.ap, ap);
> >     pthread_cond_signal(&c);
> >     pthread_mutex_unlock(&m);
>
> You are missing two va_end's here

Yes, I had a question about va_end in this situation. Putting one that clears 'ap' seems fine but doesn't change anything. But if you va_end the copy that you put in the struct, then what happens when the other thread goes to use it? Or should the va_end for that be in the tprintlog function, after it's done with it? In any case, none of those combinations seem to affect the output.

Thanks,
Matt