Re: Reload Problem in delete_output_reload

2006-12-04 Thread Ulrich Weigand
Erwin Unruh wrote:

> I have a problem with delete_output_reload. It sometimes deletes
> instructions which are needed. Here is an analysis of a recent case
> (in a private version of the S390 port). The original S390 port shows
> almost the same reloads, but chooses different registers.

What GCC version is your compiler based on?

# Reloads for insn # 1598
> Reloads for insn # 1598
> Reload 0: reload_in (SI) = (const_int 4080 [0xff0])
>   ADDR_REGS, RELOAD_FOR_OUTPUT_ADDRESS (opnum = 0)
>   reload_in_reg: (const_int 4080 [0xff0])
>   reload_reg_rtx: (reg:SI 2 2)
> Reload 1: reload_in (SI) = (const_int 4080 [0xff0])
>   ADDR_REGS, RELOAD_FOR_OTHER_ADDRESS (opnum = 0)
>   reload_in_reg: (const_int 4080 [0xff0])
>   reload_reg_rtx: (reg:SI 2 2)
> Reload 2: reload_in (SI) = (const_int 4080 [0xff0])
>   ADDR_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 2)
>   reload_in_reg: (const_int 4080 [0xff0])
>   reload_reg_rtx: (reg:SI 2 2)
> Reload 3: reload_in (DI) = (mem/c:DI (plus:SI (plus:SI (reg/f:SI 15 15) (const_int 4080 [0xff0])) (const_int 3144 [0xc48])) [0 S8 A8])
>   reload_out (DI) = (mem/c:DI (plus:SI (plus:SI (reg/f:SI 15 15) (const_int 4080 [0xff0])) (const_int 3136 [0xc40])) [0 S8 A8])
>   GENERAL_REGS, RELOAD_OTHER (opnum = 0), can't combine
>   reload_in_reg: (reg:DI 1391)
>   reload_out_reg: (reg:DI 1393)
>   reload_reg_rtx: (reg:DI 0 0)
> Reload 4: ADDR_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 2), can't combine, secondary_reload_p
>   reload_reg_rtx: (reg:SI 3 3)
> Reload 5: reload_in (SI) = (plus:SI (plus:SI (reg/f:SI 15 15) (const_int 4080 [0xff0])) (const_int 3136 [0xc40]))
>   ADDR_REGS, RELOAD_FOR_INPUT (opnum = 2), inc by 8
>   reload_in_reg: (plus:SI (plus:SI (reg/f:SI 15 15) (const_int 4080 [0xff0])) (const_int 3136 [0xc40]))
>   reload_reg_rtx: (reg:SI 2 2)
>   secondary_in_reload = 4
>   secondary_in_icode = reload_insi
> 
> These reloads are OK. In do_output_reload it is noted that both
> insn_1597.Reload_2 and insn_1598.Reload_3 write to the same stack slot.
> So the compiler decides to remove the first reload and use register
> (reg:DI 2) directly. In this analysis it misses the fact that (reg:SI 2)
> is used for input reloads of insn 1598.

This should actually be caught by the free_for_value_p check in
choose_reload_regs.  You cannot inherit a value for a RELOAD_OTHER
reload (3) in a register that is already in use for a
RELOAD_FOR_INPUT_ADDRESS reload (2).

Could you try to find out why this doesn't work correctly?

> One critical point is the timing of the variables reg_reloaded_valid
> and spill_reg_store.
> Within the function emit_reload_insns they are first checked (within
> do_output_reload) and later updated (after the reload instructions are
> written).
> So they reflect the state before the "reload sequence". Not all usages
> reflect these semantics. In particular, the check within
> delete_output_reload is not correct.

I'm not sure how delete_output_reload comes into play here.  The decision
to inherit was already made long ago, in choose_reload_regs, and that is
already incorrect.  Even if the output reload for insn 1597 were not 
deleted at this point, the code would still be incorrect.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  [EMAIL PROTECTED]


Re: Reload Problem in delete_output_reload

2006-12-05 Thread Ulrich Weigand
Erwin Unruh wrote:

> Sorry, I misled you. Somehow I confused (mem/c:DI (reg:SI 2 2) [0 S8 A8])
> with (reg:DI 2). Register 2 is used correctly.
> I do not think any reload is inherited in this case.

Ah, right.  That did confuse me ;-)

> I did find something which might be the real problem. Within
> delete_output_reload there are two calls to count_occurrences.
> The second one will be called with the parameters
> 
> (parallel [
> (set (mem/c:DI (plus:SI (plus:SI (reg/f:SI 15 15)
> (const_int 4080 [0xff0]))
> (const_int 3136 [0xc40])) [0 S8 A8])
> (plus:DI (mem/c:DI (plus:SI (plus:SI (reg/f:SI 15 15)
> (const_int 4080 [0xff0]))
> (const_int 3144 [0xc48])) [0 S8 A8])
> (mem/c:DI (plus:SI (plus:SI (reg/f:SI 15 15)
> (const_int 4080 [0xff0]))
> (const_int 3136 [0xc40])) [0 S8 A8])))
> (clobber (reg:CC 33 %cc))
> ])
> 
> (mem/c:DI (plus:SI (reg/f:SI 15 15)
> (const_int 7216 [0x1c30])) [0 S8 A8])
> 
> The offset from register 15 is represented in two different ways. In the
> first parameter it is split into two constants; in the second it is kept
> as a single constant.
> 
> Due to this difference, no occurrence is found. So the second operand
> of the (plus:DI ...) is not counted.

I see.  This does look exactly like the problem Alexandre Oliva fixed for
PR target/28146:
http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00592.html

This patch is in mainline, but hasn't been applied to the 4.1 branch.
Could you check whether it fixes your problem?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  [EMAIL PROTECTED]


Re: Import GCC 4.2.0 PRs

2007-03-12 Thread Ulrich Weigand
Mark Mitchell wrote:

> * PR 28544 (Brook, Weigand) -- this is an ARM ICE which Paul has tracked
> to reload, but he's not sure what to do there.  Perhaps Ulrich can help.

This description doesn't appear to match the bugzilla record.  Maybe you're
referring to PR 28675?  I can have a look at that one.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  [EMAIL PROTECTED]


Broken debug info with G++ anon struct typedefs

2007-03-15 Thread Ulrich Weigand
Hello,

in debugging a failing GDB test case I've come upon what looks like bad
DWARF-2 debug info being generated by G++.

The problem has to do with typedefs of anonymous structs like:

typedef struct {
  int one;
  int two;
} tagless_struct;
tagless_struct v_tagless;

This causes the following DWARF-2 to be generated:

.uleb128 0x2 # (DIE (0x6c) DW_TAG_structure_type)
.ascii "\0"# DW_AT_name
.byte   0x8  # DW_AT_byte_size
.byte   0x1  # DW_AT_decl_file (xxx.ii)
.byte   0x1  # DW_AT_decl_line
.4byte  0xa4 # DW_AT_sibling
.uleb128 0x3 # (DIE (0x87) DW_TAG_member)
.ascii "one\0"   # DW_AT_name
.byte   0x1  # DW_AT_decl_file (xxx.ii)
.byte   0x2  # DW_AT_decl_line
.4byte  0xa4 # DW_AT_type
.byte   0x2  # DW_AT_data_member_location
.byte   0x23 # DW_OP_plus_uconst
.uleb128 0x0
.uleb128 0x3 # (DIE (0x95) DW_TAG_member)
.ascii "two\0"   # DW_AT_name
.byte   0x1  # DW_AT_decl_file (xxx.ii)
.byte   0x3  # DW_AT_decl_line
.4byte  0xa4 # DW_AT_type
.byte   0x2  # DW_AT_data_member_location
.byte   0x23 # DW_OP_plus_uconst
.uleb128 0x4
.byte   0x0  # end of children of DIE 0x6c

.uleb128 0x4 # (DIE (0xa4) DW_TAG_base_type)
.byte   0x4  # DW_AT_byte_size
.byte   0x5  # DW_AT_encoding
.ascii "int\0"   # DW_AT_name

.uleb128 0x5 # (DIE (0xab) DW_TAG_variable)
.ascii "v_tagless\0" # DW_AT_name
.byte   0x1  # DW_AT_decl_file (xxx.ii)
.byte   0x5  # DW_AT_decl_line
.4byte  0x6c # DW_AT_type
.byte   0x1  # DW_AT_external
.byte   0x5  # DW_AT_location
.byte   0x3  # DW_OP_addr
.4byte  v_tagless

Note how the v_tagless variable records DIE 0x6c as type, which is
the anonymous struct -- the typedef is completely missing.

This is incorrect; we should see "tagless_struct" as type of
"v_tagless".

The reason for this appears to be the following optimization in
cp/decl.c (grokdeclarator):

  /* If the user declares "typedef struct {...} foo" then the
 struct will have an anonymous name.  Fill that name in now.
 Nothing can refer to it, so nothing needs know about the name
 change.  */
  if (type != error_mark_node
  && unqualified_id
  && TYPE_NAME (type)
  && TREE_CODE (TYPE_NAME (type)) == TYPE_DECL
  && TYPE_ANONYMOUS_P (type)
  /* Don't do this if there are attributes.  */
  && (!attrlist || !*attrlist)
  && cp_type_quals (type) == TYPE_UNQUALIFIED)
{
  tree oldname = TYPE_NAME (type);
  tree t;

  /* Replace the anonymous name with the real name everywhere.  */
  for (t = TYPE_MAIN_VARIANT (type); t; t = TYPE_NEXT_VARIANT (t))
if (TYPE_NAME (t) == oldname)
  TYPE_NAME (t) = decl;

This attempts to change the <anonymous> name to the typedef name.
However, it comes too late.  The original anonymous struct type has already
run through rest_of_type_compilation at this stage, which has already called
dwarf2out_decl on it.  This in turn has already allocated the DIE and set up
the name information.  Changing TYPE_NAME afterwards has no effect on the
debug data any more.

Any suggestions how to fix this?

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  [EMAIL PROTECTED]


Re: Trouble understanding reload dump file format..

2007-03-19 Thread Ulrich Weigand
Dave Korn wrote:

> (define_insn "mulsi3"
>   [(set (match_operand:SI 0 "register_operand" "=d")
> (mult:SI (match_operand:SI 2 "register_operand" "r")
>  (match_operand:SI 1 "register_operand" "a")))
>(clobber (match_scratch:SI 3 "b"))]

You should be using "=b" to designate the operand as *output* only.

Otherwise, reload will attempt (unsuccessfully) to load up "scratch:SI"
as input value ...

B.t.w. if the register class denoted by "b" has just a single 
element, you might just as well use the hard reg directly in
the insn pattern here.
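
With that change, the pattern would read roughly as follows (a sketch
based on the fragment quoted above; only the scratch constraint changes,
and the insn body is elided):

```
(define_insn "mulsi3"
  [(set (match_operand:SI 0 "register_operand" "=d")
        (mult:SI (match_operand:SI 2 "register_operand" "r")
                 (match_operand:SI 1 "register_operand" "a")))
   ;; "=b" marks the scratch as output-only, so reload will not
   ;; try to load it up as an input value.
   (clobber (match_scratch:SI 3 "=b"))]
  ...)
```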

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  [EMAIL PROTECTED]


Re: Trouble understanding reload dump file format..

2007-03-19 Thread Ulrich Weigand
Dave Korn wrote:

>   Also, it's not actually a hard reg, it's in fact a memory-mapped peripheral.
> This means that I need to specify secondary reloads (can't be loaded directly
> from memory as I have no mem->mem move insn, needs to go via a gpr) and that
> implies that the register has to exist as a class because you can only specify
> secondary reloads on a per-reg-class basis.

I see, that makes sense.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  [EMAIL PROTECTED]


Re: reload-branch created

2005-03-18 Thread Ulrich Weigand
ode != inmode && ! HARD_REGNO_MODE_OK (offsetted_regno, mode))
! {
!   nregs = HARD_REGNO_NREGS (offsetted_regno, inmode);
! 
!   for (k = 0; k < nregs; k++)
!   if (! TEST_HARD_REG_BIT (*usable_regs, offsetted_regno + k))
! return IT_NONE;
! 
!   can_use_inheritance_reg = 0;
! }
else
  {
nregs = HARD_REGNO_NREGS (offsetted_regno, mode);


-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Re: reload-branch created

2005-03-20 Thread Ulrich Weigand
Bernd Schmidt <[EMAIL PROTECTED]> wrote on 03/20/2005 07:41:14 PM:

> This is OK.  Would you check it in?

Done, thanks.


Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

--
  Dr. Ulrich Weigand
  Linux for S/390 Design & Development
  IBM Deutschland Entwicklung GmbH, Schoenaicher Str. 220, 71032 Boeblingen
  Phone: +49-7031/16-3727   ---   Email: [EMAIL PROTECTED]



Re: [m68k]: Trouble trying to figure out LEGITIMIZE_RELOAD_ADDRESS

2005-03-30 Thread Ulrich Weigand
Peter Barada wrote:

> I figured out how to make it work using LEGITIMIZE_RELOAD_ADDRESS(at
> least for gcc-3.4.3) via:

I seem to recall rth opposing any use of LEGITIMIZE_RELOAD_ADDRESS
to ensure *correctness* of generated code; L_R_A's only purpose is
supposed to be to provide additional optimizations.

> For ColdFire v4e, FP loads and stores can not use REG+REG addressing.
> 
> I think you are correct that this current hack can be improved by
> making an array of modes to determine if double_reg_address_ok should
> be true, but I think in the end a more flexible scheme should be
> thought about, since this isn't the *only* peculiarity of the ColdFire.
> One is that pc-relative addressing is only available on the *source*,
> not the destination, and currently neither GO_IF_LEGITIMATE_ADDRESS nor
> LEGITIMIZE_ADDRESS has any way of knowing whether it's a source or a
> destination.

It looks like you really have different types of valid addresses,
depending on which instruction uses the address.  We have a similar
situation on s390, and to handle this I've introduced the
EXTRA_MEMORY_CONSTRAINT mechanism.

The idea is, you keep GO_IF_LEGITIMATE_ADDRESS defined so as to accept
the most general form of addresses accepted by any instruction.  You
continue to use the 'm' constraint for instructions accepting the most
general form.

For each more restricted form of address you define an extra constraint,
and recognize exactly the appropriate forms in your implementation of
EXTRA_CONSTRAINT.  You use these constraints instead of 'm' in those
instructions that accept only a restricted form of addresses.

In addition, you define EXTRA_MEMORY_CONSTRAINT and have it accept all
constraints that correspond to such restricted memory addresses.  (Note
that EXTRA_MEMORY_CONSTRAINT assumes that at least a single base
register will be a valid address accepted by any constraint marked
thus.)

Reload will now go and first try to handle each operand marked with
an EXTRA_MEMORY_CONSTRAINT just like it does 'm' operands.  However,
it additionally checks EXTRA_CONSTRAINT, and if the current operand
is rejected, reload will automatically reload the operand into a
base register.
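
As a concrete illustration, the target macros for one such restricted
constraint might look like this (a sketch only; the constraint letter
'Q' and the helper restricted_address_p are made-up names for the
example, not taken from any real port):

```c
/* In the target's .h file.  'Q' accepts only the restricted address
   form; restricted_address_p is a hypothetical predicate in the
   target's .c file recognizing exactly those forms.  */
#define EXTRA_CONSTRAINT(OP, C) \
  ((C) == 'Q' ? (GET_CODE (OP) == MEM \
                 && restricted_address_p (XEXP (OP, 0))) : 0)

/* Tell reload that 'Q' is a memory constraint: if an operand is
   rejected, reload automatically pushes its address into a base
   register (which, as noted above, must then be accepted).  */
#define EXTRA_MEMORY_CONSTRAINT(C, STR) ((C) == 'Q')
```

Instructions accepting only the restricted form then use "Q" in place
of "m" in their constraint strings.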

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Re: reload-branch created

2005-04-14 Thread Ulrich Weigand
Bernd Schmidt <[EMAIL PROTECTED]> wrote on 11.04.2005 14:43:38:

>* reload.c (find_reloads): Only set INC field if we know we have an
>autoinc reload.

Yes, this helps for s390.  With the current reload-branch, and just my
scan_rtx patch on top, I was able to bootstrap and run the test suite
without regressions (all languages, including Ada) on s390-ibm-linux
and s390x-ibm-linux.

Thanks for taking care of this problem!


Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

--
  Dr. Ulrich Weigand
  Linux for S/390 Design & Development
  IBM Deutschland Entwicklung GmbH, Schoenaicher Str. 220, 71032 Boeblingen
  Phone: +49-7031/16-3727   ---   Email: [EMAIL PROTECTED]



Re: GCC 3.4.4 RC2

2005-05-14 Thread Ulrich Weigand
Greg Schafer wrote:
> On Fri, May 13, 2005 at 03:44:59PM -0700, Mark Mitchell wrote:
> 
> > GCC 3.4.4 RC2 is now available here:
> > 
> > ftp://gcc.gnu.org/pub/gcc/prerelease-3.4.4-20050512
> > 
> > There are just a few changes from RC1 to fix critical problems people 
> > experienced with RC1.
> > 
> > Please download, build, and test.
> 
> Done.

Likewise, s390(x)-ibm-linux are working fine now:
http://gcc.gnu.org/ml/gcc-testresults/2005-05/msg00916.html
http://gcc.gnu.org/ml/gcc-testresults/2005-05/msg00917.html


> Something went wrong with the libstdc++ testsuite. It doesn't show up in the
> test results:
> 
>   http://gcc.gnu.org/ml/gcc-testresults/2005-05/msg00856.html
> 
> AFAICT, the libstdc++ testsuite _did_ run, but it doesn't show up. This
> could be pilot error on my behalf, but I've followed the exact same scripted
> procedure for ages so I don't think so.
> 
> Oops, I failed to realise it also happened with RC1:
> 
>   http://gcc.gnu.org/ml/gcc-testresults/2005-05/msg00670.html
Um?  This does have the libstdc++ results ...

> I've run out of time right now and cannot investigate it immediately. Anyone
> else seeing this? Quick browse of other test results suggests I'm not alone.
> Maybe there's a BZ already? Dunno.

It would appear the problem is this patch:
2005-05-12  Mark Mitchell  <[EMAIL PROTECTED]>

2005-04-04  Mark Mitchell  <[EMAIL PROTECTED]>
* testsuite/Makefile.am (check-local): Remove.
(curent_symbols.txt): Likewise.
(check-abi): Do not depend on current_symbols.txt.
* testsuite/Makefile.in: Regenerated.
* testsuite/libstdc++-abi/abi.exp: Build current_symbols.txt.
2005-04-01  Mark Mitchell  <[EMAIL PROTECTED]>
* testsuite/Makefile.am (noinst_PROGRAMS): Remove.
(site.exp): Write out the path to the baseline file.
(check-abi): Use DejaGNU.
(check-abi-verbose): Remove.
* testsuite/Makefile.in: Regenerated.
* testsuite/abi_check.cc (main): Check the return value from
compare_symbols.
* testsuite/testsuite_abi.cc (compare_symbols): Return a value.
* testsuite/testsuite_abi.h (compare_symbols): Adjust prototype.
* testsuite/libstdc++-abi/abi.exp: New file.
2004-03-19  Phil Edwards  <[EMAIL PROTECTED]>
* testsuite/Makefile.am (site.exp):  New target, based on that
created by automake.  Also set libiconv.

While using DejaGNU for check-abi might be a good idea in principle, the way
it is currently set up means that the 'make check-abi' run will overwrite the
libstdc++.log and libstdc++.sum files generated by the main 'make check' run.
This is why the results are missing from the test_summary report ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Re: RFC: (use) useful for avoiding unnecessary compare instructions during cc0->CCmode ?!

2005-05-14 Thread Ulrich Weigand
Bjoern Haase wrote:

> IIUC correctly one of the two or three difficulties with cc0->CCmode
> conversion is, that combine in it's present form is not able to
> recognize that in a sequence like
> 
> (parallel[
> (set reg:SI 100) (minus:SI (reg:SI 101) (reg:SI 102))
> (set reg:CC_xxx CC) (compare (reg:SI 101) (reg:SI 102)))])
> ((set reg:CC_xxx CC) (compare (reg:SI 101) (reg:SI 102)))
> 
> the compare instruction could be avoided. IIUC it is presently suggested to
> make use of *many* peephole2 patterns for solving this problem.

The current CCmode back-ends (e.g. s390, but also i386 and others) handle
this problem by initially expanding into:

 (parallel [(set (reg:SI 100) (minus:SI (reg:SI 101) (reg:SI 102)))
(clobber reg:CC_xxx CC)])
 (set (reg:CC_xxx CC) (compare:CC (reg:SI 101) (reg:SI 102)))

and *also* offer another pattern, to be matched by combine:

 (parallel [(set (reg:CC_xxx CC) (compare:CC (reg:SI 101) (reg:SI 102)))
(set (reg:SI 100) (minus:SI (reg:SI 101) (reg:SI 102)))])

See e.g. the "subsi3" expander vs. the "*subsi3_cc" and "*subsi3_cc2"
patterns in s390.md.

This works because combine will treat the initial parallel as a 
'single-set' insn, and nearly all optimizations will be enabled.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


4.0 regression: missing debug info for global variables in C with -O2

2005-05-30 Thread Ulrich Weigand
Hello,

we've just noticed a quite serious regression in debug info output
in GCC 4.0 over previous releases: when building with -funit-at-a-time
(which is on by default with -O2), *no* debug info for global variables
appears to be emitted at all.

The problem appears to be this piece of code in check_global_declarations:

  /* Do not emit debug information about variables that are in
 static storage, but not defined.  */
  if (TREE_CODE (decl) == VAR_DECL
  && TREE_STATIC (decl)
  && !TREE_ASM_WRITTEN (decl))
DECL_IGNORED_P (decl) = 1;

which was added by:
http://gcc.gnu.org/ml/gcc-cvs/2004-11/msg01250.html

2004-11-25  Mark Mitchell  <[EMAIL PROTECTED]>

PR c++/18556
* toplev.c (check_global_declarations): Set DECL_IGNORED_P on
unemitted variables with static storage duration.


However, check_global_declarations is called (by the C front-end at least)
*before* the call to cgraph_optimize.  At this time, when -funit-at-a-time
is in effect, *no* variables have been emitted to assembler as far as I
can tell, and thus all global variables get the DECL_IGNORED_P flag.

Interestingly enough, when building with the C++ front-end, the debug
info is properly emitted.  I'm not sure where exactly the difference is ...


Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Re: 4.0 regression: missing debug info for global variables in C with -O2

2005-05-30 Thread Ulrich Weigand
Andrew Pinski wrote:

> You can reproduce it using:
> static int i;
> int main(void)
> {
>i += 3;
>i *= 5;
>return 0;
> }
> 
> and readelf and looking for the DW_TAG_variable tag.

Yes; in fact 'main' is even superfluous.  Just compile

  int var;

with -S -O2 -g on gcc 3.4 and 4.0 and look at the resulting
assembler file, the difference is quite obvious ...

Bye,
Ulrich


-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Re: 4.0 regression: missing debug info for global variables in C

2005-05-31 Thread Ulrich Weigand
Paolo Bonzini wrote:

> Maybe this is responsible for part of PR21828?

I'd say this *is* PR21828: note that the variables whose
type is unknown are global variables in C code compiled
with -O2 ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Edges, predictions, and GC crashes ...

2005-06-02 Thread Ulrich Weigand
Hello,

I'm seeing compiler crashes during garbage collection when using mudflap.

The problem appears to be that some basic_block_def structures point to
edge_prediction structures which point to edge_def structures that have
already been ggc_free()'d.

Now, looking at remove_edge (cfg.c), it does indeed appear that it is
possible for edges to get deleted without the corresponding prediction
structure being removed as well ...

How is this supposed to work?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Re: Edges, predictions, and GC crashes ...

2005-06-02 Thread Ulrich Weigand
Jan Hubicka wrote:

> I didn't have any cleanup_cfg in between the earliest place putting
> predictions and the profiling pass consuming them, so this scenario
> didn't happen.  This has however changed a long time ago.  I guess just
> teaching remove_edge to walk the prediction list if it is present and
> kill now-dead predictions would not be a performance overkill, as
> predictions are rare; I can implement that if you pass me some testcase.

FAIL: libmudflap.c++/pass57-frag.cxx (test for excess errors)
WARNING: libmudflap.c++/pass57-frag.cxx compilation failed to produce executable
FAIL: libmudflap.c++/pass57-frag.cxx (-static) (test for excess errors)
WARNING: libmudflap.c++/pass57-frag.cxx (-static) compilation failed to produce 
executable
FAIL: libmudflap.c++/pass57-frag.cxx (-O2) (test for excess errors)
WARNING: libmudflap.c++/pass57-frag.cxx (-O2) compilation failed to produce 
executable
FAIL: libmudflap.c++/pass57-frag.cxx (-O3) (test for excess errors)
WARNING: libmudflap.c++/pass57-frag.cxx (-O3) compilation failed to produce 
executable

with current mainline on s390-ibm-linux.

Thanks for looking into this issue!

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Re: Edges, predictions, and GC crashes ...

2005-06-04 Thread Ulrich Weigand
Jan Hubicka wrote:

> I've committed the attached patch.  I didn't succeed in reproducing your
> failures, but Danny reported it fixes his, and it bootstraps/regtests on
> i686-pc-linux-gnu.

Thanks; this does fix one crash on s390x, but doesn't fix the
pass57-frag crashes on s390.

What happens is that after the predictions are created, but before 
remove_edge is called, the edge is modified in rtl_split_block
(called from tree_expand_cfg):

  /* Redirect the outgoing edges.  */
  new_bb->succs = bb->succs;
  bb->succs = NULL;
  FOR_EACH_EDGE (e, ei, new_bb->succs)
e->src = new_bb;

Now the 'src' link points to a different basic block, but the old
basic block still has the prediction pointing to the edge.

When remove_edge is finally called, your new code tries to find
and remove the prediction from the *new* basic block's prediction
list -- but it still remains on the old one's list ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Re: GCC 4.01 RC1 Available

2005-06-09 Thread Ulrich Weigand
Mark Mitchell wrote:

> The GCC 4.0.1 RC1 prerelease is available here:
> 
>ftp://gcc.gnu.org/pub/gcc/prerelease-4.0.1-20050607/
> 
> Please test these tarballs, and let me know about showstoppers.

s390(x)-ibm-linux is looking fine:
http://gcc.gnu.org/ml/gcc-testresults/2005-06/msg00583.html
http://gcc.gnu.org/ml/gcc-testresults/2005-06/msg00584.html

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Re: GCC 4.0.1 RC2

2005-06-18 Thread Ulrich Weigand
Mark Mitchell wrote:
> GCC 4.0.1 RC2 is now available here:
> 
>ftp://gcc.gnu.org/pub/gcc/prerelease-4.0.1-20050616

Still fine on s390(x):
http://gcc.gnu.org/ml/gcc-testresults/2005-06/msg01103.html
http://gcc.gnu.org/ml/gcc-testresults/2005-06/msg01104.html

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Re: Do C++ signed types have modulo semantics?

2005-06-28 Thread Ulrich Weigand
Steven Bosscher wrote:

> Anyway, I've started a SPEC run with "-O2" vs. "-O2 -fwrapv".  Let's see
> how big the damage would be ;-)

Please make sure to include a 64-bit target, where it actually makes any
difference.  (I recall performance degradations of 20-30% in some
SPECfp cases from getting induction variable reduction wrong ...)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Bootstrap failure -- verify_ssa failed

2005-06-28 Thread Ulrich Weigand
Hello,

with current GCC mainline bootstrap fails on s390(x)-ibm-linux
during stage2 build with:

../../gcc-head/gcc/tree-ssa-operands.c: In function 'finalize_ssa_uses':
../../gcc-head/gcc/tree-ssa-operands.c:570: error: Definition in block 64 does 
not dominate use in block 13
for SSA_NAME: TMT.628_22 in statement:
TMT.628_41 = PHI ;
PHI argument
TMT.628_22
for PHI node
TMT.628_41 = PHI ;
../../gcc-head/gcc/tree-ssa-operands.c:570: internal compiler error: verify_ssa 
failed.

Before I start debugging, does this ring any bells?

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Re: Bootstrap failure -- verify_ssa failed

2005-06-30 Thread Ulrich Weigand
Diego Novillo <[EMAIL PROTECTED]> wrote on 28.06.2005 20:51:52:

> On Tue, Jun 28, 2005 at 08:38:13PM +0200, Ulrich Weigand wrote:
> > Hello,
> >
> > with current GCC mainline bootstrap fails on s390(x)-ibm-linux
> > during stage2 build with:
> >
> I'm on it.

s390(x) bootstrap failures are now fixed, thanks!


Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

--
  Dr. Ulrich Weigand
  Linux for S/390 Design & Development
  IBM Deutschland Entwicklung GmbH, Schoenaicher Str. 220, 71032 Boeblingen
  Phone: +49-7031/16-3727   ---   Email: [EMAIL PROTECTED]



Re: GCC 4.0.1 RC3

2005-07-04 Thread Ulrich Weigand
Mark Mitchell wrote:
> GCC 4.0.1 RC3 is now available here:
> 
>ftp://gcc.gnu.org/pub/gcc/prerelease-4.0.1-20050702/
> 
> With luck, this will be the last 4.0.1 release candidate.
> 
> Please do download tarballs from the above URL and confirm that they 
> work OK on your systems.

s390(x)-ibm-linux is still looking good:
http://gcc.gnu.org/ml/gcc-testresults/2005-07/msg00182.html
http://gcc.gnu.org/ml/gcc-testresults/2005-07/msg00183.html

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Re: GCC 4.0.2 RC1 Available

2005-09-14 Thread Ulrich Weigand
Mark Mitchell wrote:

> It's important to test the actual tarballs, rather than CVS, to check
> for any packaging issues.  If you can, download and build the tarballs,
> post test results to the gcc-testresults mailing list with
> contrib/test_summary.  If you encounter problems, please file them in
> bugzilla, and add me to the CC: list.

s390(x)-ibm-linux are looking good:
http://gcc.gnu.org/ml/gcc-testresults/2005-09/msg00688.html
http://gcc.gnu.org/ml/gcc-testresults/2005-09/msg00689.html

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Re: GCC 4.0.2 RC2

2005-09-18 Thread Ulrich Weigand
Mark Mitchell wrote:

> Please test, post test results to gcc-testresults, and send me an email
> pointing at the results.

s390(x)-ibm-linux is still fine:
http://gcc.gnu.org/ml/gcc-testresults/2005-09/msg00883.html
http://gcc.gnu.org/ml/gcc-testresults/2005-09/msg00884.html

In fact, as far as I can recall, 4.0.2 will be the first
ever GCC release with zero testsuite FAILs across all
languages on s390-ibm-linux ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Re: GCC 4.0.2 Released

2005-09-29 Thread Ulrich Weigand
Mark Mitchell wrote:

> GCC 4.0.2 has been released.

Results on s390(x)-ibm-linux are here:
http://gcc.gnu.org/ml/gcc-testresults/2005-09/msg01323.html
http://gcc.gnu.org/ml/gcc-testresults/2005-09/msg01324.html

Unfortunately, it is not zero-FAIL after all; at the last
minute this one appears to have crept in:
FAIL: g++.dg/template/array14.C (test for excess errors)

The error is:
array14.C: In function 'void t(int)':
array14.C:9: error: invalid lvalue in unary '&'

Any idea what this is all about?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Aliasing violation generated by fold_builtin_memcmp?

2005-09-29 Thread Ulrich Weigand
Hello,

in debugging the remaining Ada failures on s390x, I've come upon what
looks like a bug in the middle end.  See
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19382
for details.

In short, the problem appears to be this code in fold_builtin_memcmp:

  /* If len parameter is one, return an expression corresponding to
     (*(const unsigned char*)arg1 - *(const unsigned char*)arg2).  */
  if (host_integerp (len, 1) && tree_low_cst (len, 1) == 1)
{
  tree cst_uchar_node = build_type_variant (unsigned_char_type_node, 1, 0);
  tree cst_uchar_ptr_node = build_pointer_type (cst_uchar_node);
  tree ind1 = fold_convert (integer_type_node,
build1 (INDIRECT_REF, cst_uchar_node,
fold_convert (cst_uchar_ptr_node,
  arg1)));
  tree ind2 = fold_convert (integer_type_node,
build1 (INDIRECT_REF, cst_uchar_node,
fold_convert (cst_uchar_ptr_node,
  arg2)));
  return fold_build2 (MINUS_EXPR, integer_type_node, ind1, ind2);
}

This generates code accessing the data pointed to by the arguments
using a synthesized 'const unsigned char' type.  This is only OK
if that type is allowed to alias whatever type the arguments
originally had.

Now this is not an issue in the C family of languages, due to the
special case of 'char' / 'signed char' / 'unsigned char' being
explicitly allowed to alias every other type.

However, when building for some *other* language, this assumption
is not correct -- e.g. in the Ada test case in PR 19382, the type
synthesized here is completely unknown to the front end and gets
assigned a default alias set allowing it to alias *nothing* else.

(Note that even for Ada, fold_builtin_memcmp *will* be called,
e.g. from gimplify.c:gimplify_variable_sized_compare.)

Any suggestions how to fix this?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Re: GCC 4.0.2 Released

2005-09-30 Thread Ulrich Weigand
Mark Mitchell wrote:

> No, that's very weird; that was PR 23993, which I fixed.  Or, thought I
> did.  It's definitely fixed for me on x86_64-unknown-linux-gnu.  But,
> the nature of the bug didn't seem at all system-dependent.  I've checked
> that I have no local patches in my GCC 4.0.x tree.  So, I'm pretty
> mystified...

Well, current 4_0 branch fixes the problem for me too.  It appears that
the test case made it into the release, but the fix didn't ...

Comparing the cp/ChangeLog files from 4.0.2 and the 4_0 branch: according
to the current ChangeLog the fix was in the release, but in fact it
wasn't:

[EMAIL PROTECTED] fsf]$ diff -u gcc-4.0.2/gcc/cp/ChangeLog 
gcc-4_0/gcc/cp/ChangeLog
--- gcc-4.0.2/gcc/cp/ChangeLog  2005-09-21 05:56:51.0 +0200
+++ gcc-4_0/gcc/cp/ChangeLog2005-09-30 12:40:52.0 +0200
@@ -1,7 +1,19 @@
-2005-09-20  Release Manager
+2005-09-29  Jakub Jelinek  <[EMAIL PROTECTED]>
+
+   PR c++/21983
+   * class.c (find_final_overrider): Move diagnostic about no unique final
+   overrider to...
+   (update_vtable_entry_for_fn): ... here.
+
+2005-09-27  Release Manager

* GCC 4.0.2 released.

+2005-09-21  Mark Mitchell  <[EMAIL PROTECTED]>
+
+   PR c++/23993
+   * init.c (integral_constant_value): Use DECL_INTEGRAL_CONSTANT_VAR_P.
+
 2005-09-16  Mark Mitchell  <[EMAIL PROTECTED]>

PR c++/23914

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


[RFC] Fortran ICE caused by array addressability issue

2005-10-13 Thread Ulrich Weigand
Hello,

I've been investigating the failure of namelist_14.f90 -O0,
the last s390x-ibm-linux Fortran regression.

A much reduced test case is:
program test
  type :: mt
    integer :: ii(4)
  end type mt

  type(mt) :: t
  t = mt ((/1,1,1,1/))
end program test

Compiling this program without optimization fails with
test.f90:9: internal compiler error: in expand_assignment, at expr.c:3929
(Note that *with* optimization everything is fine.)

The ICE in expand_assignment results from expanding the
following line of code (from test.f90.t24.fixupcfg):
  mt.0.ii[S.6] = D.557;

"mt.0" is a temporary of type "mt" generated by the gimplifier.  As the
size of mt is 16, the expander uses TImode to hold the variable.

The ICE occurs because the expander tries to use a *register* to hold
that variable, and indexing a *variable* slot inside an array held in
a register is not supported by expand_assignment.

The decision whether to use a register or a stack slot is made in the
function use_register_for_decl (function.c):

bool
use_register_for_decl (tree decl)
{
  /* Honor volatile.  */
  if (TREE_SIDE_EFFECTS (decl))
return false;

  /* Honor addressability.  */
  if (TREE_ADDRESSABLE (decl))
return false;

  /* Only register-like things go in registers.  */
  if (DECL_MODE (decl) == BLKmode)
return false;

  /* If -ffloat-store specified, don't put explicit float variables
 into registers.  */
  /* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa
 propagates values across these stores, and it probably shouldn't.  */
  if (flag_float_store && FLOAT_TYPE_P (TREE_TYPE (decl)))
return false;

  /* If we're not interested in tracking debugging information for
 this decl, then we can certainly put it in a register.  */
  if (DECL_IGNORED_P (decl))
return true;

  return (optimize || DECL_REGISTER (decl));
}

As the variable is a temporary generated by the gimplifier, it has
its DECL_IGNORED_P flag set, which is why use_register_for_decl
decides to put it into a register, causing the ICE.  I was unable
to reproduce this in C, because I don't get such temporaries there.

Now, the interesting point is that when optimization is on, this
routine would return true anyway, so why does it work then?  This
is because then the TREE_ADDRESSABLE flag is set, so it will always
be placed onto the stack.

However, the routine mark_array_ref_addressable (tree-cfg.c) which
sets the TREE_ADDRESSABLE flag is only ever called from the routine
execute_cleanup_cfg_post_optimizing (tree-optimize.c), which is never
called when not optimizing ...


Now, I'm not sure how to fix this.  One fix would be to always make
sure mark_array_ref_addressable is called, even when not optimizing.
I don't know what the right call point would be.

The other fix would be to remove the special treatment of 
DECL_IGNORED_P from use_register_for_decl.  This may cause somewhat
worse code generation, but only in the -O0 case, which I'm not
particularly concerned about ...


Any suggestions what the proper fix should be?

Thanks,
Ulrich


-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Re: Link-time optimzation

2005-11-17 Thread Ulrich Weigand
Ian Lance Taylor wrote:

> In section 3.4 (Linker) I have the same comment: for non-GNU targets,
> the native linker is sometimes required, so modifying the linker
> should not be a requirement.  And the exact handling of .a files is
> surprisingly target dependent, so while it would be easy to code an
> archive searcher in gcc, it would be tedious, though doable, to get it
> right for all platforms.
> 
> Conversely, I don't know much we are going to care about speed here,
> but I assume that we are going to care a bit.  For the linker to
> determine which files to pull in from an archive, it is going to have
> to read the symbol tables of all the input objects, and it is going to
> have to read the archive symbol table, and it is going to have to read
> the symbols table of each object included from an archive.  It will
> have to build a symbol hash table as it goes along.  This isn't fast;
> it's a significant component of link time.  Since the compiler is also
> going to have to build a symbol hash table, it is going to be faster
> to have the compiler search the archive symbol table and decide which
> objects to pull in.  Searching an archive symbol table isn't hard; the
> target dependencies come in when deciding which objects to include.

I'm wondering whether we can't simply employ the linker to handle
all those issues:  Have the driver always (not just as fall-back)
call "ld -r" and the linker will pull together all input files,
including those from archives, and combine them into one single
object file.  Then invoke the new "link-optimizer" on that single
object file, resulting in an optimized object file.

Any reasons why this cannot work?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Re: LEGITIMIZE_RELOAD_ADDRESS vs address_reloaded

2005-11-25 Thread Ulrich Weigand
Hi Alan,

the problem you point out is really just the tip of the iceberg --
currently reload frequently decides it *will* perform an action in the
future, and assumes that other code will behave as if that action
had already been completed ...

This is why e.g. the 'm' constraint accepts *any* MEM without checking
whether the address is valid.  Checking of extra memory constraints
currently needs to account for future reload actions in the same way ...

> a) Before matching constraints in find_reloads, substitute dummy regs
> for any reloads that have been identified.  I'm not sure how much work
> is involved in doing this, or whether it is even possible.  It sounds
> like this would be the best solution technically, as then the output
> of LEGITIMIZE_RELOAD_ADDRESS is properly checked.

I'd really like to see something like this in the future -- first steps
have been done in the reload-branch.  This is unlikely to be fixable
in the short term, though.

> b) Modify LEGITIMIZE_RELOAD_ADDRESS to return a constraint letter that
> the address is guaranteed to match after reloading.  A bit of mechanical
> work changing all targets.

This is probably not general enough -- in LEGITIMIZE_RELOAD_ADDRESS
you have no idea what the current insn is, or what constraints it can
possibly accept -- if the result of LEGITIMIZE_RELOAD_ADDRESS can
match multiple extra memory constraints, which one should you select?

> c) Modify the ppc 'Z' constraint to match the indexed address reload
> generates.  This would rely on the pattern we generate in
> LEGITIMIZE_RELOAD_ADDRESS never being generated elsewhere.

This would fit in the spirit of how reload currently does things.
Seeing as the 'temporary' address created by LEGITIMIZE_RELOAD_ADDRESS
is non-canonical RTL and thus shouldn't occur elsewhere, such a test
should be fine, in particular if done under if (reload_in_progress).

> d) Hacks like the patch below, that effectively perform the reload
> substitution with a dummy reg.  I fear this isn't proper, even though it
> seems to work..

I'm not sure this is safe, although I don't see right away why it
would break either ...


Overall, I'd tend to prefer something along the lines of (c), in
particular as it would also catch the cases where 
LEGITIMIZE_RELOAD_ADDRESS isn't actually involved, as you note:

> (*) This is exactly what code in find_reloads_address does on
> encountering an invalid indexed address.  The trouble is that its
> transformation isn't valid until the reloads are done, and we check
> constraints before doing the substitutions.  :-(

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


Re: ppc-linux and s390-linux compilers requiring 64-bit HWI?

2005-11-27 Thread Ulrich Weigand
Jakub Jelinek wrote:

> What's the reason why ppc-linux and s390-linux targetted GCC
> requires 64-bit HWI these days?
> If it is a native compiler, this means using long long in a huge part
> of the compiler and therefore slower compile times and bigger memory
> consumption.
> ppc-linux configured compiler (nor s390-linux) isn't multilibbed and
> ppc-linux only supports -m32 (s390-linux one apparently supports
> both -m31 and -m64 on the GCC side, but without multilibs it isn't very
> helpful).

There are two reasons why we require 64-bit HWI on s390: we want to
support -m64 (multilibs can be easily added), and 64-bit HWI simplifies
constant handling significantly.  There are multiple places in the
s390 backend that rely on the size of HWI being > 32-bit, and would
likely cause overflow problems otherwise ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]


fast_math_flags_set_p vs. set_fast_math_flags inconsistency?

2020-01-20 Thread Ulrich Weigand
Hello,

the -ffast-math command line option sets a bunch of other flags
internally, as implemented in set_fast_math_flags.  It is possible
to selectively override those flags on the command line as well.

I'm now wondering under what circumstances the __FAST_MATH__ macro
should still be defined.  This is currently implemented in the
fast_math_flags_set_p routine, which checks the status of *some*
(but not all!) of the flags implied by -ffast-math.

This has the effect that e.g. after

  -ffast-math -fno-finite-math-only

the __FAST_MATH__ macro is no longer predefined, but after

  -ffast-math -fno-associative-math

the __FAST_MATH__ macro still *is* predefined, even though both
-ffinite-math-only and -fassociative-math are implied by -ffast-math.

Is this deliberate?  (If so, is it documented somewhere?)

Or is this just a bug and fast_math_flags_set_p ought to check
all flags implied by -ffast-math?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: fast_math_flags_set_p vs. set_fast_math_flags inconsistency?

2020-01-21 Thread Ulrich Weigand
Joseph Myers wrote:
> On Mon, 20 Jan 2020, Ulrich Weigand wrote:
> 
> > This has the effect that e.g. after
> > 
> >   -ffast-math -fno-finite-math-only
> > 
> > the __FAST_MATH__ macro is no longer predefined, but after
> > 
> >   -ffast-math -fno-associative-math
> > 
> > the __FAST_MATH__ macro still *is* predefined, even though both
> > -ffinite-math-only and -fassociative-math are implied by -ffast-math.
> > 
> > Is this deliberate?  (If so, is it documented somewhere?)
> 
> I'd expect both to cause it to be undefined.  My guess would be that some 
> past patch, after fast_math_flags_set_p was added, added more flags to 
> -ffast-math but accidentally failed to update fast_math_flags_set_p; you'd 
> need to look at past patches adding flags to -ffast-math to confirm if 
> it's accidental.

It looks like there's multiple cases here.  For the two flags
-fassociative-math and -freciprocal-math, it seems to have happened just as
you describe: they were created (split out of -funsafe-math-optimizations)
in commit a1a826110720eda37c73f829daa4ee243ee953f5, which however did not
update fast_math_flags_set_p.

For the other three flags, -fsignaling-nans, -frounding-math, and
-fcx-limited-range, the story is a bit different: from the very
beginning, these were only modified as a result of -ffast-math, not
of -fno-fast-math (as the other flags derived from -ffast-math are),
and they are not considered by fast_math_flags_set_p.  (I'm not sure
if there is any particular reason why they should be treated
differently here, though.)

Finally, there is one "mixed" flag, -fexcess-precision, which is handled
like the above three in that its default is only modified as a result of
-ffast-math, not -fno-fast-math; but nevertheless this flag *is* checked
in fast_math_flags_set_p.

Do you think there is something that ought to be fixed here?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: fast_math_flags_set_p vs. set_fast_math_flags inconsistency?

2020-01-27 Thread Ulrich Weigand
Joseph Myers wrote:
> On Tue, 21 Jan 2020, Ulrich Weigand wrote:
> 
> > It looks like there's multiple cases here.  For the two flags
> > -fassociative-math and -freciprocal-math, it seems to have happened just as
> > you describe: they were created (split out of -funsafe-math-optimizations)
> > in commit a1a826110720eda37c73f829daa4ee243ee953f5, which however did not
> > update fast_math_flags_set_p.
> 
> So that's a bug.

OK, agreed.

> > For the other three flags, -fsignaling-nans, -frounding-math, and
> > -fcx-limited-range, the story appears to be a bit different: from the
> 
> The first two of those are disabled by default as well as disabled by 
> -ffast-math, so it seems right that -fno-fast-math does nothing with them 
> and that they aren't checked by fast_math_flags_set_p.

I see.  I guess that makes me wonder what -fno-fast-math *ever* does
(except canceling a -ffast-math earlier on the command line).  Looking
at the current code, -fno-fast-math (just like -ffast-math) only ever
sets flags whose default is not overridden on the command line, but
then it always sets them to their default value!

Am I missing something here?  If that's the intent, it might be cleaner
to write the body of set_fast_math_flags as just one big "if (set)"
block.

> The last one is disabled by default but enabled by -ffast-math.  So it 
> would seem appropriate to handle it like other such options, disable it 
> with -fno-fast-math, and check it in fast_math_flags_set_p.

OK.

> > Finally, there is one "mixed" flag, -fexcess-precision, which is handled
> > like the above three in that its default is only modified as a result of
> > -ffast-math, not -fno-fast-math; but nevertheless this flag *is* checked
> > in fast_math_flags_set_p.
> 
> That one's trickier because the default depends on whether a C standards 
> conformance mode is specified.

This also makes sense if we consider the semantics of -fno-fast-math to
just leave all component flags at their default, as above ...

(As an aside, the current code is even more confusing as it has a dead
condition:

  if (set)
{
  if (opts->frontend_set_flag_excess_precision == EXCESS_PRECISION_DEFAULT)
opts->x_flag_excess_precision
  = set ? EXCESS_PRECISION_FAST : EXCESS_PRECISION_DEFAULT;

The second test of "set" must always be true here, so this can never
actually set the flag to EXCESS_PRECISION_DEFAULT.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: fast_math_flags_set_p vs. set_fast_math_flags inconsistency?

2020-01-28 Thread Ulrich Weigand
Joseph Myers wrote:
> On Mon, 27 Jan 2020, Ulrich Weigand wrote:
> > I see.  I guess that makes me wonder what -fno-fast-math *ever* does
> > (except canceling a -ffast-math earlier on the command line).  Looking
> > at the current code, -fno-fast-math (just like -ffast-math) only ever
> > sets flags whose default is not overridden on the command line, but
> > then it always sets them to their default value!
> 
> As a general principle, more specific flags take precedence over less 
> specific ones, regardless of the command-line order.  So it's correct for 
> -ffast-math and -fno-fast-math not to do anything with a flag that was 
> explicitly overridden by the user (modulo any issues where a particular 
> combination of flags is unsupported by GCC, as with the 
> "%<-fassociative-math%> disabled; other options take precedence" case in 
> toplev.c).

Sure, I agree with the intended behavior as you describe.  I was just 
confused under what circumstances set_fast_math_flags with set == 0
ever does anything.  However, I've now understood that this is required
e.g. for the -Ofast -fno-fast-math case (this seems to be the only case
where set_fast_math_flags is called twice, first with set == 1 and then
with set == 0).

And indeed, in this case we see the effect of not resetting the
flag_cx_limited_range (on x86_64):

$ gcc -E -dD -x c /dev/null -std=c99 | grep GCC_IEC
#define __GCC_IEC_559 2
#define __GCC_IEC_559_COMPLEX 2
$ gcc -E -dD -x c /dev/null -std=c99 -Ofast | grep GCC_IEC
#define __GCC_IEC_559 0
#define __GCC_IEC_559_COMPLEX 0
$ gcc -E -dD -x c /dev/null -std=c99 -Ofast -fno-fast-math | grep GCC_IEC
#define __GCC_IEC_559 2
#define __GCC_IEC_559_COMPLEX 0

Note how __GCC_IEC_559_COMPLEX is not properly reset.

Similarly, we see the effect of not resetting flag_excess_precision
(only on i386):

$ gcc -E -dD -x c /dev/null -m32 -std=c99 | grep GCC_IEC
#define __GCC_IEC_559 2
#define __GCC_IEC_559_COMPLEX 2
$ gcc -E -dD -x c /dev/null -m32 -std=c99 -Ofast | grep GCC_IEC
#define __GCC_IEC_559 0
#define __GCC_IEC_559_COMPLEX 0
$ gcc -E -dD -x c /dev/null -m32 -std=c99 -Ofast -fno-fast-math | grep GCC_IEC
#define __GCC_IEC_559 0
#define __GCC_IEC_559_COMPLEX 0

Note how __GCC_IEC_559 is not properly reset either.

So it seems to me that indeed both of these should be reset
in the !set case of set_fast_math_flags.

In addition, we've already agreed that these three flags should be
checked in fast_math_flags_set_p, where they are currently missing:

  flag_associative_math
  flag_reciprocal_math
  flag_cx_limited_range

Finally, this remaining piece of code in set_fast_math_flags:

  if (set)
    {
      if (!opts->frontend_set_flag_signaling_nans)
	opts->x_flag_signaling_nans = 0;
      if (!opts->frontend_set_flag_rounding_math)
	opts->x_flag_rounding_math = 0;
    }

seems to always be a no-op, since it only ever sets the flags to their
default value when they still must have that same default value.

For clarity, I'd suggest to either remove this code or replace
it with a comment.


The following patch (not yet fully tested) implements the above changes.
Note that this now brings the set of flags set and reset by
set_fast_math_flags completely in sync with the set of flags
tested in fast_math_flags_set_p.

Does this make sense to you?

Thanks,
Ulrich


ChangeLog:

* opts.c (set_fast_math_flags): In the !set case, also reset
x_flag_cx_limited_range and x_flag_excess_precision.  Remove dead
code resetting x_flag_signaling_nans and x_flag_rounding_math.
(fast_math_flags_set_p): Also test x_flag_cx_limited_range,
x_flag_associative_math, and x_flag_reciprocal_math.


diff --git a/gcc/opts.c b/gcc/opts.c
index 7affeb4..4452793 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2850,18 +2850,14 @@ set_fast_math_flags (struct gcc_options *opts, int set)
 opts->x_flag_finite_math_only = set;
   if (!opts->frontend_set_flag_errno_math)
 opts->x_flag_errno_math = !set;
-  if (set)
-{
-  if (opts->frontend_set_flag_excess_precision == EXCESS_PRECISION_DEFAULT)
-   opts->x_flag_excess_precision
- = set ? EXCESS_PRECISION_FAST : EXCESS_PRECISION_DEFAULT;
-  if (!opts->frontend_set_flag_signaling_nans)
-   opts->x_flag_signaling_nans = 0;
-  if (!opts->frontend_set_flag_rounding_math)
-   opts->x_flag_rounding_math = 0;
-  if (!opts->frontend_set_flag_cx_limited_range)
-   opts->x_flag_cx_limited_range = 1;
-}
+  if (!opts->frontend_set_flag_cx_limited_range)
+opts->x_flag_cx_limited_range = set;
+  if (!opts->frontend_set_flag_excess_precision)
+opts->x_flag_excess_precision
+  = set ? EXCESS_PRECISION_FAST : EXCESS_PRECISION_DEFAULT;
+
+  // -ffast-math should also reset -fsignaling-nans and -frounding-math,
+  // but since those are off by default, there

Re: fast_math_flags_set_p vs. set_fast_math_flags inconsistency?

2020-01-29 Thread Ulrich Weigand
Joseph Myers wrote:
> On Tue, 28 Jan 2020, Ulrich Weigand wrote:
> 
> > The following patch (not yet fully tested) implements the above changes.
> > Note that this now brings the set of flags set and reset by
> > set_fast_math_flags completely in sync with the set of flags
> > tested in fast_math_flags_set_p.
> > 
> > Does this make sense to you?
> 
> It makes sense, but this is not a review of the patch.

Understood, thanks for having a look!  I'll go ahead and submit the
patch for review as usual then.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: reload question about unmet constraints

2015-09-15 Thread Ulrich Weigand
Jim Wilson wrote:
> On Mon, Sep 14, 2015 at 11:05 PM, DJ Delorie  wrote:
> > As a test, I added this API.  It seems to work.  I suppose there could
> > be a better API where we determine if a constrain matches various
> > memory spaces, then compare with the memory space of the operand, but
> > I can't prove that's sufficiently flexible for all targets that
> > support memory spaces.  Heck, I'm not even sure what to call the
> > macro, and 
> > "TARGET_IS_THIS_MEMORY_ADDRESS_RELOADABLE_TO_MATCH_THIS_CONTRAINT_P()"
> > is a little long ;-)
> >
> > What do we think of this direction?
> 
> We already have define_constraint and define_memory_constraint.  We
> could perhaps add a define_special_memory_constraint that returns
> CT_SPECIAL_MEMORY which mostly operates like CT_MEMORY, except that it
> doesn't assume any MEM can be reloaded to match.

But the only difference between define_memory_constraint and a plain
define_constraint is just that define_memory_constraint guarantees
that any memory operand can be made valid by reloading the address
into a base register ...

If the set of operands accepted by a constraint does *not* have that
property, it must not be defined via define_memory_constraint, and
you should simply use define_constraint instead.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: reload question about unmet constraints

2015-09-15 Thread Ulrich Weigand
Jim Wilson wrote:
> On Tue, Sep 15, 2015 at 7:42 AM, Ulrich Weigand  wrote:
> > But the only difference between define_memory_constraint and a plain
> > define_constraint is just that define_memory_constraint guarantees
> > that any memory operand can be made valid by reloading the address
> > into a base register ...
> >
> > If the set of operands accepted by a constraint does *not* have that
> > property, it must not be defined via define_memory_constraint, and
> > you should simply use define_constraint instead.
> 
> An invalid near mem can be converted to a valid near mem by reloading
> its address into a base reg.  An invalid far mem can be converted to a
> valid far mem by reloading its address into a base reg.  But one can't
> convert a near mem to a far mem by reloading the address, nor can one
> convert a far mem to a near mem by reloading its address.  So we need
> another dimension to the validity testing here, besides the question
> of whether the address can be reloaded, there is the question of
> whether it is in the right address space.  Though I don't think the
> rl78 is actually using address spaces, and it isn't clear if that
> would help.

I see.  Is it correct then to say that reload will never be able to
change a near mem into a far mem or vice versa?  If that is true, there
doesn't appear to be any real benefit to having both near and far mem
operands as *alternatives* in the same insn pattern.

In that case, you might be able to fix the bug by splitting the
offending insns into two patterns, one handling only near mems
and one handling far mems, where the near/far-ness of the mem
is verified by the *predicate* and not the constraints.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: reload question about unmet constraints

2015-09-16 Thread Ulrich Weigand
DJ Delorie wrote:

> > In that case, you might be able to fix the bug by splitting the
> > offending insns into two patterns, one only handling near mems
> > and one handling one far mems, where the near/far-ness of the mem
> > is verified by the *predicate* and not the constraints.
> 
> But this means that when reload needs to, it moves far mems into
> registers, which changes which insn is matched...  It also adds a
> *lot* of new patterns, since any of the three operands can be far, and
> '0' constraints on far are allowed also - and most insns allow far
> this way, so could be up to seven times as many patterns.

I still think you should get by with two patterns, basically one that
represents the case where you emit an override, and one for the case
without override.

And in fact, you should be able to decide at *expand* time which
of the two you need for the given set of operands.  It may be
that you'll have to load a far mem into a register here, at
expand time, if none of the two cases would otherwise match.
This is best done at expand, since there's really nothing
reload can do any better even if you wait until then.

Then you need to ensure that reload will not accidentally switch
that decision again, so you'll probably need constraints after all.
Something like this ought to work:

 (define_insn "*add3_virt_far"
   [(set (match_operand:QHI   0 "rl78_nonimmediate_operand" "=Wfr,v,v")
 (plus:QHI (match_operand:QHI 1 "rl78_general_operand" "vi0,Wfr,vi")
   (match_operand:QHI 2 "rl78_general_operand" "vi0,vi1,Wfr")))
]
   "rl78_virt_insns_ok () && rl78_far_insn_p (operands)"
   "v.add\t%0, %1, %2"
 )

 (define_insn "*add3_virt_near"
   [(set (match_operand:QHI   0 "rl78_nonfar_nonimm_operand" "=vY,S")
 (plus:QHI (match_operand:QHI 1 "rl78_nonfar_operand" "viY,0")
   (match_operand:QHI 2 "rl78_nonfar_operand" "viY,i")))
]
   "rl78_virt_insns_ok ()"
   "v.add\t%0, %1, %2"
 )

[It might be possible to emit both patterns from a single macro.]

rl78_far_insn_p needs to be a function that returns true if there is
exactly one, possibly duplicated, operand that is a far mem.

The Wfr constraint must not be marked as memory constraint (so as to avoid
reload attempting to use it to access a stack slot).  But the Y constraint
*can* be marked as memory constraint.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: reload question about unmet constraints

2015-09-17 Thread Ulrich Weigand
DJ Delorie wrote:

> > And in fact, you should be able to decide at *expand* time which
> > of the two you need for the given set of operands.
> 
> I already check for multiple fars at expand, and force all but one of
> them to registers.

OK, that's good.

> Somewhere before reload they get put back in.

To avoid that, you need a check like this:

> >"rl78_virt_insns_ok () && rl78_far_insn_p (operands)"
> 
> Since when does this work reliably?  I've seen cases where insns get
> mashed together without regard for validity before...
> 
> I tested just this change - adding that function to addhi3 plus the
> Wfr constraint sets - and it seems to work.  The big question to me
> now is - is this *supposed* to work this way?  Or is it a coincidence
> that the relevant passes happen to check that function?

This is certainly supposed to work reliably.  Every pass that changes an
insn is supposed to re-recognize it, and that will check the operand
predicates as well as the overall insn predicate.

Note that there are some exceptions, in particular reload, which will
make certain types of changes to insn operands *without* fully
revalidating the insn, because it simply assumes those changes are
safe.  That's why in addition to the insn predicate, you also need to
constrain reload's possible actions to those that are guaranteed 
to not break the predicate.

> > The Wfr constraint must not be marked as memory constraint (so as to
> > avoid reload attempting to use it to access a stack slot).
> 
> This also prevents reload from reloading the address when it *is*
> needed.  However, it seems to work ok even as a memory constraint.  Is
> this change *just* because of the stack slots?  Could you give an
> example of how it could be misused, so I can understand the need?

If the constraint is marked as "memory constraint", reload will assume
that everything that is forced to memory by reload can be handled by
this alternative.  This includes:
- a pseudo that was forced to a stack slot
- a pseudo that is REG_EQUIV to some memory location
- any memory operand whose address already had to be reloaded
  since it was not legitimate
- any immediate that was forced to the constant pool
(and possibly others that I'm not thinking of right now).

So in general, it's really not safe to mark a constraint that accepts
only far memory as "memory constraint" with current reload.

Note that *not* marking the constraint as memory constraint actually
does not prevent reload from fixing up illegitimate addresses, so you
shouldn't really see many drawbacks from not marking it ...

(You really need the memory constraint marker for constraints that
accept only a *subset* of legitimate addresses, so that reload will
attempt to reload even addresses that would otherwise be considered
legitimate.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[commit, spu] Re: [BUILDROBOT] spu: left shift of negative value

2015-09-21 Thread Ulrich Weigand
Jan-Benedict Glaw wrote:

> I just noticed that (for config_list.mk builds), current GCC errors
> out at spu.c, see eg. build
> http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=469639 :
> 
> g++ -fno-PIE -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -I. -I. -I../../../gcc/gcc -I../../../gcc/gcc/. -I../../../gcc/gcc/../include -I../../../gcc/gcc/../libcpp/include -I/opt/cfarm/mpc/include -I../../../gcc/gcc/../libdecnumber -I../../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I../../../gcc/gcc/../libbacktrace -o spu.o -MT spu.o -MMD -MP -MF ./.deps/spu.TPo ../../../gcc/gcc/config/spu/spu.c
> ../../../gcc/gcc/config/spu/spu.c: In function 'void spu_expand_insv(rtx_def**)':
> ../../../gcc/gcc/config/spu/spu.c:530:46: error: left shift of negative value [-Werror=shift-negative-value]
>    maskbits = (-1ll << (32 - width - start));
>                    ^
> ../../../gcc/gcc/config/spu/spu.c:536:46: error: left shift of negative value [-Werror=shift-negative-value]
>    maskbits = (-1ll << (64 - width - start));
>                    ^
> cc1plus: all warnings being treated as errors
> Makefile:2092: recipe for target 'spu.o' failed
> make[2]: *** [spu.o] Error 1
> make[2]: Leaving directory '/home/jbglaw/build-configlist_mk/spu-elf/build-gcc/mk/spu-elf/gcc'

I've now checked in the following fix.

Thanks,
Ulrich


ChangeLog:

* config/spu/spu.c (spu_expand_insv): Avoid undefined behavior.

Index: gcc/config/spu/spu.c
===================================================================
*** gcc/config/spu/spu.c	(revision 227968)
--- gcc/config/spu/spu.c	(working copy)
*** spu_expand_insv (rtx ops[])
*** 472,478 
  {
HOST_WIDE_INT width = INTVAL (ops[1]);
HOST_WIDE_INT start = INTVAL (ops[2]);
!   HOST_WIDE_INT maskbits;
machine_mode dst_mode;
rtx dst = ops[0], src = ops[3];
int dst_size;
--- 472,478 
  {
HOST_WIDE_INT width = INTVAL (ops[1]);
HOST_WIDE_INT start = INTVAL (ops[2]);
!   unsigned HOST_WIDE_INT maskbits;
machine_mode dst_mode;
rtx dst = ops[0], src = ops[3];
int dst_size;
*** spu_expand_insv (rtx ops[])
*** 527,541 
switch (dst_size)
  {
  case 32:
!   maskbits = (-1ll << (32 - width - start));
if (start)
!   maskbits += (1ll << (32 - start));
emit_move_insn (mask, GEN_INT (maskbits));
break;
  case 64:
!   maskbits = (-1ll << (64 - width - start));
if (start)
!   maskbits += (1ll << (64 - start));
emit_move_insn (mask, GEN_INT (maskbits));
break;
  case 128:
--- 527,541 
switch (dst_size)
  {
  case 32:
!   maskbits = (~(unsigned HOST_WIDE_INT)0 << (32 - width - start));
if (start)
!   maskbits += ((unsigned HOST_WIDE_INT)1 << (32 - start));
emit_move_insn (mask, GEN_INT (maskbits));
break;
  case 64:
!   maskbits = (~(unsigned HOST_WIDE_INT)0 << (64 - width - start));
if (start)
!   maskbits += ((unsigned HOST_WIDE_INT)1 << (64 - start));
emit_move_insn (mask, GEN_INT (maskbits));
break;
  case 128:


-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Debugger support for __float128 type?

2015-09-30 Thread Ulrich Weigand
Hello,

I've been looking into supporting __float128 in the debugger, since we're
now introducing this type on PowerPC.  Initially, I simply wanted to do
whatever GDB does on Intel, but it turns out debugging __float128 doesn't
work on Intel either ...

The most obvious question is, how should the type be represented in
DWARF debug info in the first place?  Currently, GCC generates on i386:

.uleb128 0x3# (DIE (0x2d) DW_TAG_base_type)
.byte   0xc # DW_AT_byte_size
.byte   0x4 # DW_AT_encoding
.long   .LASF0  # DW_AT_name: "long double"

and

.uleb128 0x3# (DIE (0x4c) DW_TAG_base_type)
.byte   0x10# DW_AT_byte_size
.byte   0x4 # DW_AT_encoding
.long   .LASF1  # DW_AT_name: "__float128"

On x86_64, __float128 is encoded the same way, but long double is:

.uleb128 0x3# (DIE (0x31) DW_TAG_base_type)
.byte   0x10# DW_AT_byte_size
.byte   0x4 # DW_AT_encoding
.long   .LASF0  # DW_AT_name: "long double"

Now, GDB doesn't recognize __float128 on either platform, but on i386
it could at least in theory distinguish the two via DW_AT_byte_size.

But on x86_64 (and also on powerpc), long double and __float128 have
the identical DWARF encoding, except for the name.

Looking at the current DWARF standard, it's not really clear how to
make a distinction, either.  The standard has no way to specify any
particular floating-point format; the only attributes for a base type
of DW_ATE_float encoding are related to the size.

(For the Intel case, one option might be to represent the fact that
for long double, there are only 80 data bits and the rest is padding, via
some combination of the DW_AT_bit_size and DW_AT_bit_offset or
DW_AT_data_bit_offset attributes.  But that wouldn't help for PowerPC
since both long double and __float128 really use 128 data bits,
just different encodings.)

Some options might be:

- Extend the official DWARF standard in some way

- Use a private extension (e.g. from the platform-reserved
  DW_AT_encoding value range)

- Have the debugger just hard-code a special case based
  on the __float128 name 

Am I missing something here?  Any suggestions welcome ...

B.t.w. is there interest in fixing this problem for Intel?  I notice
there is a GDB bug open on the issue, but nothing seems to have happened
so far: https://sourceware.org/bugzilla/show_bug.cgi?id=14857

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: Debugger support for __float128 type?

2015-10-01 Thread Ulrich Weigand
Joseph Myers wrote:
> On Wed, 30 Sep 2015, Ulrich Weigand wrote:
> 
> > - Extend the official DWARF standard in some way
> 
> I think you should do this.
> 
> Note that TS 18661-4 will be coming out very soon, and includes (optional) 
> types
> 
> * _FloatN, where N is 16, 32, 64 or >= 128 and a multiple of 32;
> 
> * _DecimalN, where N >= 32 and a multiple of 32;
> 
> * _Float32x, _Float64x, _Float128x, _Decimal64x, _Decimal128x
> 
> so this is not simply a matter of supporting a GNU extension (not that 
> it's simply a GNU extension on x86_64 anyway - __float128 is explicitly 
> mentioned in the x86_64 ABI document), but of supporting an ISO C 
> extension, in any case where one of the above types is the same size and 
> radix as float / double / long double but has a different representation.

Ah, thanks for pointing these out!

The _DecimalN types are already supported by DWARF using a base type with
encoding DW_ATE_decimal_float and the appropriate DW_AT_byte_size.

For the interchange type, it seems one could define a new encoding,
e.g. DW_ATE_interchange_float, and use this together with the
appropriate DW_AT_byte_size to identify the format.

However, for the extended types, the standard does not define a storage
size or even a particular encoding, so it's not quite clear how to
handle those.  In theory, two different extended types could even have
the same size ...

On the other hand, in practice on current systems, it will probably be
true that if you look at the set of all of the basic and extended (binary)
types, there will be at most two different encodings for any given size,
one corresponding to the interchange format of that size, and one other;
so mapping those to DW_ATE_float vs. DW_ATE_interchange_float might
suffice.

I'm not sure how to handle an extended decimal format that does not
match any of the decimal interchange formats.  Does this occur in
practice at all?

> (All the above are distinct types in C, and distinct from float, double, 
> long double even if the representations are the same.  But I don't think 
> DWARF needs to distinguish e.g. float and _Float32 other than by their 
> name - it's only the case of different representations that needs 
> distinguishing.  The _Float* and _Float*x types have corresponding complex 
> types, but nothing further should be needed in DWARF for those once you 
> can represent _Float*.)

Well, complex types have their own encoding (DW_ATE_complex_float), so we'd
have to define the corresponding variants for those as well, e.g.
DW_ATE_complex_interchange_float or the like.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: Debugger support for __float128 type?

2015-10-02 Thread Ulrich Weigand
Joseph Myers wrote:
> On Thu, 1 Oct 2015, Ulrich Weigand wrote:
> 
> > The _DecimalN types are already supported by DWARF using a base type with
> > encoding DW_ATE_decimal_float and the appropriate DW_AT_byte_size.
> 
> Which doesn't actually say whether the DPD or BID encoding is used, but as 
> long as each architecture uses only one that's not a problem in practice.

I see.  Well, one could add a DW_ATE_decimal_interchange_float for
completeness, if necessary.
 
> > For the interchange type, it seems one could define a new encoding,
> > e.g. DW_ATE_interchange_float, and use this together with the
> > appropriate DW_AT_byte_size to identify the format.
> 
> It's not clear to me that (for example) distinguishing float and _Float32 
> (other than by name) is useful in DWARF (and if you change float from 
> DW_ATE_float to DW_ATE_interchange_float that would affect old debuggers - 
> is the idea to use DW_ATE_interchange_float only for the new types, not 
> for old types with the same encodings, so for _Float32 but not float?).  
> But it's true that if you say it's an interchange type then together with 
> size and endianness that uniquely determines the encoding.

So my thinking here was: today, DWARF deliberately does not specify the
details of the floating-point encoding format, so that it doesn't have
to get into all the various formats that exist on all the platforms
supported by DWARF.  That is why a DW_ATE_float encoding simply says:
this is a floating-point number of size N encoded as defined by the
platform ABI.

The new DW_ATE_interchange_float encoding would say instead: this is
a floating-point number of size N encoded as defined by the IEEE
interchange format.

On platforms where the ABI-defined format actually *is* the interchange
format, a DWARF producer would be free to use either DW_ATE_float or
DW_ATE_interchange_float.  This decision could of course take into
consideration compatibility requirements with older debuggers etc.

However, having two encoding definitions would allow platforms to use
both the interchange format and one additional platform-defined
non-interchange format of the same size, if needed.

> > Well, complex types have their own encoding (DW_ATE_complex_float), so we'd
> > have to define the corresponding variants for those as well, e.g.
> > DW_ATE_complex_interchange_float or the like.
> 
> And DW_ATE_imaginary_interchange_float, I suppose.

Right.


As an alternative to specifying the well-defined interchange format,
another option might be to simply add a second DWARF attribute,
e.g. DW_AT_encoding_variant, to floating-point and related base types.
This would simply be an integer with platform-specific semantics.
So DWARF producers could simply describe a type as:
  this is a floating-point number of size N encoded as defined by
  platform ABI encoding variant #V

(If the attribute isn't present, we'd default to variant 0, which
is just the current encoding.)

This would allow an arbitrary number of platform-specific encodings,
any of which might or might not be IEEE-defined formats ...
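In assembler terms, mirroring the DIEs quoted earlier in this thread, such a producer might emit something like the following; note that DW_AT_encoding_variant, its form, and the variant number are entirely hypothetical here:

.uleb128 0x3# (DIE DW_TAG_base_type)
.byte   0x10# DW_AT_byte_size
.byte   0x4 # DW_AT_encoding (DW_ATE_float)
.long   .LASF1  # DW_AT_name: "long double"
.byte   0x1 # DW_AT_encoding_variant: platform encoding variant #1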


Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: Debugger support for __float128 type?

2015-10-02 Thread Ulrich Weigand
Joseph Myers wrote:
> On Fri, 2 Oct 2015, Ulrich Weigand wrote:
> > I see.  Well, one could add a DW_ATE_decimal_interchange_float for
> > completeness, if necessary.
> 
> Since both DPD and BID are interchange encodings, that doesn't actually 
> determine things without some way to say which was used (that is, both 
> DW_ATE_decimal_float and DW_ATE_decimal_interchange_float would rely on 
> platform-specific information to determine the format).  I don't know if 
> DW_ATE_decimal_float is being used anywhere for something that's not an 
> interchange format.

Ah, yes.  I missed that both DPD and BID are defined as interchange
formats.  This suggestion doesn't make sense then ...

> > As an alternative to specifying the well-defined interchange format,
> > another option might be to simply add a second DWARF attribute,
> > e.g. DW_AT_encoding_variant, to floating-point and related base types.
> > This would simply be an integer with platform-specific semantics.
> > So DWARF producers could simply describe a type as:
> >   this is a floating-point number of size N encoded as defined by
> >   platform ABI encoding variant #V
> 
> Do you want entirely platform-specific semantics?  Or would it be better 
> to define standard values to mean it's an IEEE interchange format (or, for 
> decimal floating point, to specify whether it's DPD or BID), plus space 
> for future standard values and space for platform-specific values?

Hmm, I had been thinking of leaving that entirely platform-specific.
I guess one could indeed define some values with well-defined standard
semantics; that would assume DWARF would want to start getting into the
game of defining floating-point formats -- not sure what the position
of the committee would be on this question ...

[ Back when DW_ATE_decimal_float was added, the initial proposal did
indeed specify the encoding should follow IEEE-754R, but that was
removed when the proposal was actually added to the standard.  ]

> Would existing consumers safely ignore that attribute (so that producers 
> could safely specify IEEE interchange encoding for float, double etc. if 
> applicable, without breaking existing consumers)?

Yes, existing consumers should simply ignore attributes they are not
aware of.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: reload question about unmet constraints

2015-10-07 Thread Ulrich Weigand
DJ Delorie wrote:

> For such constraints that are memory operands but not
> define_memory_constraint, you need to use '*' to keep reload from
> trying to guess a register class from them (it guesses wrong for
> rl78).
> 
> I.e. use "*Wfr" instead of "Wfr".

Huh?  That seems weird.  It should make no difference for purposes
of register class preferences whether a constraint is marked as
memory constraint or extra constraint: neither contributes to
register class preferences at all.

If the '*' makes any difference, I guess this can only be because
IRA chooses another alternative for the insn to begin with.

Do you have an example that shows the problem?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: "cc" clobber

2016-02-01 Thread Ulrich Weigand
David Wohlferd wrote:
> On 1/26/2016 4:31 PM, Bernd Schmidt wrote:
> > On 01/27/2016 12:12 AM, David Wohlferd wrote:
> >> I started by just commenting out the code in ix86_md_asm_adjust that
> >> unconditionally clobbered the flags.  I figured this would allow the
> >> 'normal' "cc" handling to occur.  But apparently there is no 'normal'
> >> "cc" handling.
> >
> > I have a dim memory that there's a difference between the "cc" and 
> > "CC" spellings. You might want to check that.
> 
> I checked, but my scan of the current code isn't turning up anything for 
> "CC" related to clobbers either.
> 
> While presumably "cc" did something at one time, apparently now it's 
> just an unenforced comment (on extended asm).  Not a problem, just a bit 
> of a surprise.

I think on many targets a clobber "cc" works because the backend
actually defines a register named "cc" to correspond to the flags.
Therefore the normal handling of clobbering named hard registers
catches this case as well.

This doesn't work on i386 because there the flags register is called
"flags" in the back end.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: GCC libatomic ABI specification draft

2016-12-20 Thread Ulrich Weigand
Torvald Riegel wrote:
> On Fri, 2016-12-02 at 12:13 +0100, Gabriel Paubert wrote:
> > On Thu, Dec 01, 2016 at 11:13:37AM -0800, Bin Fan at Work wrote:
> > > Thanks for the comment. Yes, the ABI requires libatomic must query the 
> > > hardware. This is 
> > > necessary if we want the compiler to generate inlined code for 16-byte 
> > > atomics. Note that 
> > > this particular issue only affects x86. 
> > 
> > Why? Power (at least recent ones) has 128 bit atomic instructions
> > (lqarx/stqcx.) and Z has 128 bit compare and swap. 
> 
> That's not the only factor affecting whether cmpxchg16b or such is used
> for atomics.  If the HW just offers a wide CAS but no wide atomic load,
> then even an atomic load is not truly just a load, which breaks (1)
> atomic loads on read-only mapped memory and (2) volatile atomic loads
> (unless we claim that an idempotent store is like a load, which is quite
> a stretch for volatile I think).

I may have missed the context of the discussion, but just on the
specific ISA question here: both Power and z not only have the
16-byte CAS (or load-and-reserve/store-conditional), but they also both
have specific 16-byte atomic load and store instructions (lpq/stpq
on z, lq/stq on Power).

Those are available on any system supporting z/Architecture (z900 and up),
and on any Power system supporting the V2.07 ISA (POWER8 and up).  GCC
does in fact use those instructions to implement atomic operations on
16-byte data types on those machines.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: GCC libatomic ABI specification draft

2016-12-22 Thread Ulrich Weigand
Szabolcs Nagy wrote:
> On 20/12/16 13:26, Ulrich Weigand wrote:
> > I may have missed the context of the discussion, but just on the
> > specific ISA question here: both Power and z not only have the
> > 16-byte CAS (or load-and-reserve/store-conditional), but they also both
> > have specific 16-byte atomic load and store instructions (lpq/stpq
> > on z, lq/stq on Power).
> > 
> > Those are available on any system supporting z/Architecture (z900 and up),
> > and on any Power system supporting the V2.07 ISA (POWER8 and up).  GCC
> > does in fact use those instructions to implement atomic operations on
> > 16-byte data types on those machines.
> 
> that's a bug.
> 
> at least i don't see how gcc makes sure the libatomic
> calls can interoperate with inlined atomics.

Hmm, interesting.  On z, there is no issue with ISA levels, since *all*
64-bit platforms support the 16-byte atomics (and on non-64-bit platforms,
16-byte data types are not supported at all).

However, there still seems to be a problem, but this time related to
alignment issues.  We do have the 16-byte atomic instructions, but they
only work on 16-byte aligned data.  This is a problem in particular
since the default alignment of 16-byte data types is still 8 bytes
on our platform (since the ABI only guarantees 8-byte stack alignment).

That's why the libatomic configure check thinks it cannot use the
atomic instructions when building on z, and generates code that uses
the separate lock.  However, *if* a particular object can be proven
by the compiler to be 16-byte aligned, it will emit the inline
atomic instruction.  This means there is indeed a bug if that same
object is also operated on via the library routine.

Andreas suggested that the best way to fix this would be to add a
runtime alignment check to the libatomic routines and also use the
atomic instructions in the library whenever the object actually
happens to be correctly aligned.  It seems that this should indeed
fix the problem (and also use the most efficient way in all cases).


Not sure about Power -- adding David and Segher on CC ...


Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Issue with sub-object __builtin_object_size

2014-06-10 Thread Ulrich Weigand
Hello,

the following C++ test case:

struct pollfd
  {
int fd;
short int events;
short int revents;
  };

struct Pollfd : public pollfd { };

struct Pollfd myfd[10];

int test (void)
{
  return __builtin_object_size ((struct pollfd *)myfd, 1);
}

ends up returning 8 from the "test" routine, not 80.


In the real-world application this test case was extracted from,
this causes a call:

  poll(myfd, count, 0);  // 1 < count < 10

to fail with a "Buffer overflow detected" message at run-time
when building with _FORTIFY_SOURCE = 2 against glibc.  [ Here,
there is no explicit cast, but it is implied by the prototype
of the "poll" routine. ]

(Note that in the real-world application, the derived struct Pollfd
has some member functions to construct and pretty-print the structure,
but has no additional data members.)


From the __builtin_object_size documentation, it's not immediately
clear to me whether this is supposed to work or not:

   If the least significant
   bit is clear, objects are whole variables, if it is set, a closest
   surrounding subobject is considered the object a pointer points to.

Is the presence of the above cast (explicit or implicit) supposed to
modify the notion of "closest surrounding subobject"?


Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Issue with attribute ((aligned)) ?

2014-06-17 Thread Ulrich Weigand
Hello,

in comparing ABI conformance across compilers I ran across an issue where
GCC's behavior does appear to deviate from the manual.

In http://gcc.gnu.org/onlinedocs/gcc/Variable-Attributes.html, we have:

aligned (alignment)
This attribute specifies a minimum alignment for the variable or
structure field, measured in bytes. 
[snip]
When used on a struct, or struct member, the aligned attribute can
only increase the alignment; in order to decrease it, the packed
attribute must be specified as well.


However, the following test case:

struct test
{
  int * __attribute__ ((aligned (2))) a;
};

int main (void)
{
  printf ("align: %d\n", (int)__alignof__(struct test));
}

outputs an alignment of 2.  Based on the above wording, I would
have expected an alignment of 8 on a 64-bit machine.  (And the
latter is indeed what LLVM, for example, implements.)

Note that this appears to happen only in the specific
declaration above; things work as documented when using e.g.

  int __attribute__ ((aligned (2))) a;

or even

  int *a __attribute__ ((aligned (2)));


In fact, this also raises the question of what the exact semantics
of the above construct is; according to the manual it seems that

  int * __attribute__ ((aligned (2))) a;

ought to define a (default-aligned) pointer to a 2-aligned int,
in which case the above distinction would be moot and we'd
expect __alignof__(struct test) to be 8 in any case.
(see http://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html)

That doesn't appear to be what's happening either; with
  struct test x;
  printf ("align: %d %d\n", (int)__alignof__(x.a), (int)__alignof__(*x.a));
I get 2 for __alignof__(x.a) and 4 for __alignof__(*x.a), so it
does appear the alignment is being applied (if incorrectly) to
the pointer, not the pointed-to data ...  (LLVM actually also
does it this way.)


Is this a bug in GCC (or the documentation)?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: Issue with sub-object __builtin_object_size

2014-09-15 Thread Ulrich Weigand
Jakub Jelinek wrote:
> On Tue, Jun 10, 2014 at 07:37:50PM +0200, Ulrich Weigand wrote:
> > the following C++ test case:
> > 
> > struct pollfd
> >   {
> > int fd;
> > short int events;
> > short int revents;
> >   };
> > 
> > struct Pollfd : public pollfd { };
> > 
> > struct Pollfd myfd[10];
> > 
> > int test (void)
> > {
> >   return __builtin_object_size ((struct pollfd *)myfd, 1);
> > }
> > 
> > ends up returning 8 from the "test" routine, not 80.
> 
> The thing is that the C++ FE transforms this kind of cast to
> &((struct Pollfd *) &myfd)->D.2233
> so for middle-end where __builtin_object_size is evaluated this
> is like:
> struct Pollfd { struct pollfd something; };
> struct Pollfd myfd[10];
> int test (void)
> {
>   return __builtin_object_size (&myfd[0].something, 1);
> }
> and returning 8 in that case is completely correct.
> So the question is why instead of a simple cast it created
> ADDR_EXPR of a FIELD_DECL instead.

Sorry for the long delay, I finally found time to look into this
problem again.  This behavior of the C++ front-end seems to trace
back to a deliberate change by Jason here:
https://gcc.gnu.org/ml/gcc-patches/2004-04/msg02000.html

"This patch changes the representation to b..i,
which is more friendly to alias analysis."

It still seems the effect of this on __builtin_object_size is
surprising and probably not what was intended.  I'm not sure
whether the front-end or the middle-end is more "at fault" here.

Thoughts?

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: Issue with sub-object __builtin_object_size

2014-09-15 Thread Ulrich Weigand
Jason Merrill wrote:
> On 09/15/2014 11:21 AM, Ulrich Weigand wrote:
> > Jakub Jelinek wrote:
> >> On Tue, Jun 10, 2014 at 07:37:50PM +0200, Ulrich Weigand wrote:
> >>> the following C++ test case:
> >>>
> >>> struct pollfd
> >>>{
> >>>  int fd;
> >>>  short int events;
> >>>  short int revents;
> >>>};
> >>>
> >>> struct Pollfd : public pollfd { };
> >>>
> >>> struct Pollfd myfd[10];
> >>>
> >>> int test (void)
> >>> {
> >>>return __builtin_object_size ((struct pollfd *)myfd, 1);
> >>> }
> >>>
> >>> ends up returning 8 from the "test" routine, not 80.
> 
> This example strikes me as strange.  Why do you expect the size of a 
> pointer to the base class subobject to give a meaningful answer, any 
> more than taking the address of a data member?

Well, it's a test case simplified from a real-world example.  To quote
the original mail at the start of this thread:

>In the real-world application this test case was extracted from,
>this causes a call:
>
>  poll(myfd, count, 0);  // 1 < count < 10
>
>to fail with a "Buffer overflow detected" message at run-time
>when building with _FORTIFY_SOURCE = 2 against glibc.  [ Here,
>there is no explicit cast, but it is implied by the prototype
>of the "poll" routine. ]
>
>(Note that in the real-world application, the derived struct Pollfd
>has some member functions to construct and pretty-print the structure,
>but has no additional data members.)

(https://gcc.gnu.org/ml/gcc/2014-06/msg00116.html)

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: Issue with sub-object __builtin_object_size

2014-09-16 Thread Ulrich Weigand
Jason Merrill wrote:
> On 09/15/2014 11:55 AM, Ulrich Weigand wrote:
> > (https://gcc.gnu.org/ml/gcc/2014-06/msg00116.html)
> 
> > From the __builtin_object_size documentation, it's not immediately
> > clear to me whether this is supposed to work or not:
> >
> >If the least significant
> >bit is clear, objects are whole variables, if it is set, a closest
> >surrounding subobject is considered the object a pointer points to.
> >
> > Is the presence of the above cast (explicit or implicit) supposed to
> > modify the notion of "closest surrounding subobject"?
> 
> IMO, yes.  A base class subobject is a subobject, so since the least 
> significant bit is set, it seems to me that GCC is correctly returning 
> the size of that subobject.

I guess I'm still a bit confused about the special handling of the array
case.  Even with the last bit set, array elements normally do not count
as "subobjects", so __builtin_object_size still returns the size of the
full array.

Now in this case, we cast a pointer to the array to a pointer to a base
type of the array element type -- but the intent is for the pointer to still
refer to the whole array.  (Of course, this only works if the base type
is actually the same size as the array type.)

Note that with a somewhat equivalent C construct:

struct pollfd
  {
int fd;
short int events;
short int revents;
  };

struct Pollfd
  {
struct pollfd x;
  };

struct Pollfd myfd[10];

we still get an object size of 80 for either:

  __builtin_object_size ((struct pollfd *)myfd, 1);

or even

  __builtin_object_size (&myfd->x, 1);

so there still seems to be something special in the C++ case ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: Issue with sub-object __builtin_object_size

2014-09-16 Thread Ulrich Weigand
Jason Merrill wrote:
> On 09/16/2014 06:23 AM, Ulrich Weigand wrote:
> > Now in this case, we cast a pointer to the array to a pointer to a base
> > type of the array element type -- but the intent is for the pointer to still
> > refer to the whole array.  (Of course, this only works if the base type
> > is actually the same size as the array type.)
> 
> And I'm arguing that this intent is not well expressed by the code.  :)
> 
> If they want to refer to the whole array, why are they casting the 
> pointer to a different type?  And why are they passing the "subobject 
> only" value as the second argument?

It looks more reasonable in the original source :-)  What we have there
is a definition of struct Pollfd that wraps the standard header file type
struct pollfd and adds a couple of helpers (like constructors that set up
the struct in some standard ways etc.).  Subsequent source code simply
uses struct Pollfd wherever struct pollfd would otherwise be used.

In particular, this occurs in a call to the system "poll" routine:

  struct Pollfd myfd[10];
[...]
  poll(myfd, count, 0);  // 1 < count < 10

Now, that routine comes from glibc header files, and if _FORTIFY_SOURCE
is defined, it becomes an inline function that resolves after inlining
and constant folding in this instance to something like:

  __poll_chk ((struct pollfd *)myfd, count, 0,
  __builtin_object_size ((struct pollfd *)myfd, 1));

The __poll_chk routine then performs a runtime check that the value of
"count" is within range of the object size passed in as fourth argument.

Note that the cast is actually implicit, it simply occurs because the
parameter of the inlined "poll" function is "struct pollfd *", but the
caller passes in a "struct Pollfd *".

The "subobject only" value comes from glibc headers, where they use it
deliberately: poll is supposed to get an array of pollfd structures,
and it is OK to access those; but if that array is embedded within a
larger structure, we want to avoid overruns into the rest of the
structure.

Is this "wrapping" of system types in a derived class supposed to work?
If there is a better way to express this without breaking glibc headers,
I'd be happy to let the authors know ...

> > Note that with a somewhat equivalent C construct:
> >
> > struct pollfd
> >{
> >  int fd;
> >  short int events;
> >  short int revents;
> >};
> >
> > struct Pollfd
> >{
> >  struct pollfd x;
> >};
> >
> > struct Pollfd myfd[10];
> >
> > we still get an object size of 80 for either:
> >
> >__builtin_object_size ((struct pollfd *)myfd, 1);
> >
> > or even
> >
> >__builtin_object_size (&myfd->x, 1);
> 
> That strikes me as a bug, especially the second one.

Jakub, any thoughts on this?

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: Issue with sub-object __builtin_object_size

2014-09-17 Thread Ulrich Weigand
Jason Merrill wrote:

> Well, __builtin_object_size seems to be pretty fragile magic.  To make 
> it work I guess you could add a wrapper for poll that takes a Pollfd and 
> does a reinterpret_cast, i.e.
> 
> inline int poll(Pollfd* p, nfds_t n, int t)
> {
>    return poll(reinterpret_cast<pollfd *>(p), n, t);
> }
> 
> so that you force the conversion to ignore the inheritance relationship 
> and thereby avoid the COMPONENT_REF.

That does indeed seem to work for my test case, thanks!

Given that the _FORTIFY_SOURCE documentation already states that:

  With _FORTIFY_SOURCE set to 2 some more checking is added,
  but some conforming  programs  might fail. 

I guess having to use a workaround like the above is good enough.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[RFC] GCC vector extension: binary operators vs. differing signedness

2014-12-10 Thread Ulrich Weigand
Hello,

we've noticed the following behavior of the GCC vector extension, and were
wondering whether this is actually intentional:

When you use binary operators on two vectors, GCC will accept not only operands
that use the same vector type, but also operands whose types only differ in
signedness of the vector element type.  The result type of such an operation
(in C) appears to be the type of the first operand of the binary operator.

For example, the following test case compiles:

typedef signed int vector_signed_int __attribute__ ((vector_size (16)));
typedef unsigned int vector_unsigned_int __attribute__ ((vector_size (16)));

vector_unsigned_int test (vector_unsigned_int x, vector_signed_int y)
{
  return x + y;
}

However, this variant

vector_unsigned_int test1 (vector_unsigned_int x, vector_signed_int y)
{
  return y + x;
}

fails to build:

xxx.c: In function 'test1':
xxx.c:12:3: note: use -flax-vector-conversions to permit conversions between 
vectors with differing element types or numbers of subparts
   return y + x;
   ^
xxx.c:12:10: error: incompatible types when returning type 'vector_signed_int 
{aka __vector(4) int}' but 'vector_unsigned_int {aka __vector(4) unsigned int}' 
was expected
   return y + x;
  ^

Given a commutative operator, this behavior seems surprising.


Note that for C++, the behavior is apparently different: both test
and test1 above compile as C++ code, but this version:

vector_signed_int test2 (vector_unsigned_int x, vector_signed_int y)
{
  return y + x;
}

which builds on C, fails on C++ with:

xxx.C:17:14: note: use -flax-vector-conversions to permit conversions between 
vectors with differing element types or numbers of subparts
   return y + x;
  ^
xxx.C:17:14: error: cannot convert 'vector_unsigned_int {aka __vector(4) 
unsigned int}' to 'vector_signed_int {aka __vector(4) int}' in return

This C vs. C++ mismatch likewise seems surprising.


Now, the manual page for the GCC vector extension says:

You cannot operate between vectors of different lengths or different signedness 
without a cast.

And the change log of GCC 4.3, where the strict vector type checks (and the
above-mentioned -flax-vector-conversions option) were introduced, says:

Implicit conversions between generic vector types are now only permitted 
when the two vectors in question have the same number of elements and 
compatible element types. (Note that the restriction involves compatible 
element types, not implicitly-convertible element types: thus, a vector type 
with element type int may not be implicitly converted to a vector type with 
element type unsigned int.) This restriction, which is in line with 
specifications for SIMD architectures such as AltiVec, may be relaxed using the 
flag -flax-vector-conversions. This flag is intended only as a compatibility 
measure and should not be used for new code. 

Both of these statements appear to imply (as far as I can tell) that all
the functions above ought to be rejected (unless -flax-vector-conversions).
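Under the documented rule, the only way to mix signedness would be an
explicit cast so that both operands end up with identical vector types.
A minimal sketch (the function name test_cast is mine, not from the
examples above):

```c
typedef signed int vector_signed_int __attribute__ ((vector_size (16)));
typedef unsigned int vector_unsigned_int __attribute__ ((vector_size (16)));

/* After the cast both operands are vector_unsigned_int, so this is
   valid no matter which strictness option is chosen.  The cast
   reinterprets each element's bit pattern as unsigned.  */
vector_unsigned_int
test_cast (vector_unsigned_int x, vector_signed_int y)
{
  return x + (vector_unsigned_int) y;
}
```

Unsigned element arithmetic wraps modulo 2^32, so e.g. adding an element
value 1 to a reinterpreted -1 yields 0.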

So at the very least, we should bring the documentation in line with the
actual behavior.  However, as seen above, that actual behavior is probably
not really useful in any case, at least in C.


So I'm wondering whether we should:

A. Bring C in line with C++ by making the result of a vector binary operator
   use the unsigned type if the two input types differ in signedness?

and/or

B. Enforce that both operands to a vector binary operator must have the same
   type (except for opaque vector types) unless -flax-vector-conversions?


Thanks,
Ulrich


PS: FYI some prior discussion of related issues that I found:

https://gcc.gnu.org/ml/gcc/2006-10/msg00235.html
https://gcc.gnu.org/ml/gcc/2006-10/msg00682.html
https://gcc.gnu.org/ml/gcc-patches/2006-11/msg00926.html

https://gcc.gnu.org/ml/gcc-patches/2013-08/msg01634.html
https://gcc.gnu.org/ml/gcc-patches/2013-09/msg00450.html


-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [RFC] GCC vector extension: binary operators vs. differing signedness

2014-12-11 Thread Ulrich Weigand
Richard Biener wrote:
> On Wed, Dec 10, 2014 at 10:09 PM, Ulrich Weigand  wrote:
> > So at the very least, we should bring the documentation in line with the
> > actual behavior.  However, as seen above, that actual behavior is probably
> > not really useful in any case, at least in C.
> >
> >
> > So I'm wondering whether we should:
> >
> > A. Bring C in line with C++ by making the result of a vector binary operator
> >use the unsigned type if the two input types differ in signedness?
> >
> > and/or
> >
> > B. Enforce that both operands to a vector binary operator must have the same
> >type (except for opaque vector types) unless -flax-vector-conversions?
> 
> I suppose the current behavior comes from the idea that C would
> apply promotion rules to the operands of binary operators.  To the extent
> we can apply the same rules to vector operands we should probably
> support that.  This means that for example
> 
>  1) promotions that only differ in signedness happen (partly the C frontend
> behavior)
>  2) promotions that change the size of the elements and thus the size of
> the vector happen (probably not a good idea by default)
>  3) promotions that change the kind of the elements of the vectors
> (float vs. non-float) happen (probably neither a good idea)

Just to clarify: your 1) is pretty much the same as my A., right?

The difference in behavior between C and C++ seems to originate in different
implementations in c_common_type vs. cp_common_type:

C has:

  /* If one type is a vector type, return that type.  (How the usual
 arithmetic conversions apply to the vector types extension is not
 precisely specified.)  */
  if (code1 == VECTOR_TYPE)
return t1;

  if (code2 == VECTOR_TYPE)
return t2;

while C++ has:

  if (code1 == VECTOR_TYPE)
{
  /* When we get here we should have two vectors of the same size.
 Just prefer the unsigned one if present.  */
  if (TYPE_UNSIGNED (t1))
return build_type_attribute_variant (t1, attributes);
  else
return build_type_attribute_variant (t2, attributes);
}


> I think it's sensible to support 1) by default (also to not regress existing
> code) but fix it so it applies to C++ as well and accepts both cases
> in C.  And of course fix documentation.

I'm not sure I understand exactly what you mean here.  C++ already implements
the sign promotion rule; just C doesn't.  So we could do the same for C as is
currently done for C++.

However, if we make that change, there will be some cases that regress: the
problem is that an expression "x + y" has *one* result type, and some things
you do with the result will require that type to match precisely (including
signedness).  So *any* change that affects what that result type is will
regress some code that happened to rely on the precise result type ...
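For instance (a sketch with a hypothetical helper name), any code that
captures the sum via __typeof__ and later requires one specific vector
type is sensitive to the chosen rule:

```c
typedef signed int vsi __attribute__ ((vector_size (16)));
typedef unsigned int vui __attribute__ ((vector_size (16)));

/* The element values of a + b are identical under either rule, but the
   deduced type of r flips between vsi and vui depending on which
   common-type rule the front end applies; a later use of r in a context
   requiring one fixed vector type would then start to error out.  */
int
sum_elt0 (vsi a, vui b)
{
  __typeof__ (a + b) r = a + b;
  return (int) r[0];
}
```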

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [RFC] GCC vector extension: binary operators vs. differing signedness

2014-12-12 Thread Ulrich Weigand
Richard Biener wrote:
> On Thu, Dec 11, 2014 at 4:04 PM, Ulrich Weigand  wrote:
> > However, if we make that change, there will be some cases that regress: the
> > problem is that an expression "x + y" has *one* result type, and some things
> > you do with the result will require that type to match precisely (including
> > signedness).  So *any* change that affects what that result type is will
> > regress some code that happened to rely on the precise result type ...
> 
> True, but IMHO that's still better.  You may want to check the openCL spec
> which we tried to follow losely as to what we allow.
> 
> So again, implementing your A is ok with me.

Well, the openCL spec says that operations between signed and unsigned
vectors are simply prohibited (both openCL 1.2 and openCL 2.0 agree on
this, and it matches the behavior of my old openCL compiler ...):

6.1.2 Implicit Conversions:
Implicit conversions between built-in vector data types are disallowed.

6.2.6 Usual Arithmetic Conversions:
If the operands are of more than one vector type, then an error shall
occur. Implicit conversions between vector types are not permitted,
per section 6.2.1.

6.3 Operators:

All arithmetic operators return result of the same built-in type
(integer or floating-point) as the type of the operands, after operand
type conversion. After conversion, the following cases are valid:
- The two operands are scalars. [...]
- One operand is a scalar, and the other is a vector. [...]
- The two operands are vectors of the same type. [...]
All other cases of implicit conversions are illegal.

xlcl error message:
1506-068 (S) Operation between types "__private uint4" and "__private int4" is 
not allowed.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [RFC] GCC vector extension: binary operators vs. differing signedness

2014-12-16 Thread Ulrich Weigand
Richard Biener wrote:
> On Fri, Dec 12, 2014 at 1:08 PM, Ulrich Weigand  wrote:
> > Richard Biener wrote:
> >> On Thu, Dec 11, 2014 at 4:04 PM, Ulrich Weigand  
> >> wrote:
> >> > However, if we make that change, there will be some cases that regress: 
> >> > the
> >> > problem is that an expression "x + y" has *one* result type, and some 
> >> > things
> >> > you do with the result will require that type to match precisely 
> >> > (including
> >> > signedness).  So *any* change that affects what that result type is will
> >> > regress some code that happened to rely on the precise result type ...
> >>
> >> True, but IMHO that's still better.  You may want to check the openCL spec
> >> which we tried to follow losely as to what we allow.
> >>
> >> So again, implementing your A is ok with me.
> >
> > Well, the openCL spec says that operations between signed and unsigned
> > vectors are simply prohibited (both openCL 1.2 and openCL 2.0 agree on
> > this, and it matches the behavior of my old openCL compiler ...):
[snip]
> The question is what the fallout is if we reject this by default (I suppose
> we accept it with -flax-vector-conversions).  I'm ok with following
> OpenCL as well,
> that is either solution that makes behavior consistent between C and C++.

I agree that we should certainly continue to support mixed types with
-flax-vector-conversions.  (This means we really should fix the C/C++
inconsistency as to return type in any case, even if we disable mixed
type support by default.)

What the fallout of disabling mixed types by default would be is hard
to say.  On the one hand, all other standards I've looked at (OpenCL,
Altivec/VSX, Cell SPU) prohibit mixing signed/unsigned types.  So this
hopefully means users of the GCC extension don't use this feature (much).
[Of course, Altivec does not really talk about operators, but ideally
GCC's a + b should be equivalent to the Altivec vec_add (a, b), which
does not support mixing signed/unsigned types.]

On the other hand, I've noticed at least two related areas where disabling
mixed types could result in unexpected fallout: opaque types and platform
specific vector types (e.g. "vector bool" on Altivec).

Opaque types are a somewhat under-specified GCC feature that is used for
different purposes by various platforms and the middle-end itself:
- some platform-specific types (PPC SPE __ev64_opaque__ or MEP cp_vector)
- function parameters for overloaded builtins in C before resolution
- the output of vector comparison operators and vector truth_type_for

It should be possible to use an opaque type together with vectors of
different types, even with -flax-vector-conversions (and even if we
disable support for signed/unsigned mixed types); the result of an
operation on an opaque type and a non-opaque type should be the
non-opaque type; it's not quite clear to me how operations on two
different opaque types are (should be?) defined.

Platform-specific types like Altivec "vector bool" are not really known
to the middle-end; this particular case is treated just like a normal
unsigned integer vector type.  This means that as a side-effect of
disabling signed/unsigned mixed types, we would also disallow mixing
signed/bool types.  But at least for Altivec vec_add, the latter is
explicitly *allowed* (returning the signed type).  It would certainly
be preferable for + to be compatible to vec_add also for the bool types.
[ Note that the current implementation also does not fully match that
goal, because while signed + bool is allowed, the return value is
sometimes the bool type instead of the signed type.  ]

This can probably only be fully solved by making the middle-end aware
that things like "vector bool" need to be handled specially.  I had
thought that maybe the back-end simply needs to mark "vector bool"
as an opaque type (after all, the middle-end also uses this for
vector truth types), but that doesn't work as-is, since one of the
other features of opaque types is that they cannot be initialized.
(Altivec vector bool *can* be initialized.)  Maybe those two features
should be decoupled, so we can have opaque types used as truth types,
and those that cannot be initialized ...


So overall the list of actions probably looks like this:

1. Fix the selection of the common type to use for binary vector operations
   - C and C++ need to be consistent
   - If one type is opaque and the other isn't, use the non-opaque type
   - If one type is unsigned and the other is signed, use the unsigned type
   - What to do with different types where neither of the above rules apply?

2. Rework support for opaque and platform-specific vector types
   - Instead of the single "opaqu

Failure to dlopen libgomp due to static TLS data

2015-02-12 Thread Ulrich Weigand
a area in such cases), then we wouldn't have to init
the DTV in init_one_static_tls, and then we could do without the
dl_open_worker check.  Does this sound reasonable?

Bye,
Ulrich

P.S.: Appended is a small test case that shows the issue.  Note that
just two libraries using TLS suffice to trigger the problem, because
module IDs are not even reliably re-used after a dlclose ...

Makefile


all: module1.so module2.so main

clean:
rm -f module.so module1.so module2.so main

module1.so: module.c
gcc -g -Wall -DMODULE=1 -fpic -shared -o module1.so module.c

module2.so: module.c
gcc -g -Wall -DMODULE=2 -fpic -shared -o module2.so module.c

main: main.c
gcc -g -Wall -D_GNU_SOURCE -o main main.c -ldl -lpthread

main.c
==

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <dlfcn.h>

pthread_t thread_id;

void *thread_start (void *arg)
{
  printf ("Thread started\n");
  for (;;)
;
}

void run_thread (void)
{
  pthread_create(&thread_id, NULL, &thread_start, NULL);
}

void *test (const char *name)
{
  void *handle, *func;
  size_t modid;

  handle = dlopen (name, RTLD_NOW);
  if (!handle)
{
  printf ("Cannot open %s\n", name);
  exit (1);
}

  func = dlsym (handle, "func");
  if (!func)
{
  printf ("Cannot find func\n");
  exit (1);
}

  ((void (*)(void))func)();

  if (dlinfo(handle, RTLD_DI_TLS_MODID, &modid))
{
  printf ("Cannot find TLS module ID\n");
  exit (1);
}

  printf ("Module ID: %ld\n", (long) modid);

  return handle;
}

int main (void)
{
  void *m1, *m2;
  int i;

  run_thread ();

  m1 = test ("./module1.so");
  m2 = test ("./module2.so");

  for (i = 0; i < 100; i++)
{
  dlclose (m1);
  m1 = test ("./module1.so");
  dlclose (m2);
  m2 = test ("./module2.so");
}

  dlclose (m1);
  dlclose (m2);
  return 0;
}


module.c


#include <stdio.h>

__thread int x __attribute__ ((tls_model ("initial-exec")));

void func (void)
{
  printf ("Module %d TLS variable is: %d\n", MODULE, x);
}


-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: Failure to dlopen libgomp due to static TLS data

2015-02-12 Thread Ulrich Weigand
Alexander Monakov wrote:
> 
> There's a pending patch for glibc that addresses this issue among others:
> https://sourceware.org/ml/libc-alpha/2014-11/msg00469.html
> 
> ([BZ#17090/17620/17621]: fix DTV race, assert, and DTV_SURPLUS Static TLS
> limit)

Ah, indeed, that would fix the issue!  Thanks for pointing this out.

I see that the latest revision:
https://sourceware.org/ml/libc-alpha/2014-11/msg00590.html
has been pinged a couple of times already, so let me add another ping :-)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[RFC/RFA] Obsolete Cell Broadband Engine SPU targets

2019-04-02 Thread Ulrich Weigand
Hello,

the spu-elf target in GCC supports generating code for the SPU processors
of the Cell Broadband Engine; it has been part of upstream GCC since 2008.

However, at this point I believe this target is no longer in use:
- There is no supported Cell/B.E. hardware any more.
- There is no supported operating system supporting Cell/B.E. any more.

I've still been running daily regression tests until now, but I'll be
unable to continue to do so much longer since the systems I've been
using for this will go away.

Rather than leave SPU support untested/maintained, I'd therefore
propose to declare all SPU targets obsolete in GCC 9 and remove
the code with GCC 10.

Any objections to this approach?

Bye,
Ulrich


gcc/ChangeLog:

* config.gcc: Mark spu* targets as deprecated/obsolete.

Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc  (revision 270076)
+++ gcc/config.gcc  (working copy)
@@ -248,6 +248,7 @@ md_file=
 # Obsolete configurations.
 case ${target} in
   *-*-solaris2.10* \
+  | spu*-*-*   \
   | tile*-*-*  \
  )
 if test "x$enable_obsolete" != xyes; then
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [RFC/RFA] Obsolete Cell Broadband Engine SPU targets

2019-04-02 Thread Ulrich Weigand
Thanks, Richi and Jeff!

>   * config.gcc: Mark spu* targets as deprecated/obsolete.

I've now checked this in.

I've also checked in the following patch to announce the change
in htdocs.

Bye,
Ulrich

Index: htdocs/gcc-9/changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v
retrieving revision 1.56
diff -u -r1.56 changes.html
--- htdocs/gcc-9/changes.html   1 Apr 2019 14:55:53 -   1.56
+++ htdocs/gcc-9/changes.html   2 Apr 2019 13:45:27 -
@@ -54,6 +54,11 @@
   in the <a href="https://gcc.gnu.org/ml/gcc/2018-10/msg00139.html">
   announcement</a>.
 
+
+  Cell Broadband Engine SPU (spu*-*-*).  Details can be found
+  in the <a href="https://gcc.gnu.org/ml/gcc/2019-04/msg00023.html">
+  announcement</a>.
+    
   
 
 

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [RFC/RFA] Obsolete Cell Broadband Engine SPU targets

2019-04-02 Thread Ulrich Weigand
Eric Gallagher wrote:
> On 4/2/19, Ulrich Weigand  wrote:
> > Hello,
> >
> > the spu-elf target in GCC supports generating code for the SPU processors
> > of the Cell Broadband Engine; it has been part of upstream GCC since 2008.
> >
> > However, at this point I believe this target is no longer in use:
> > - There is no supported Cell/B.E. hardware any more.
> 
> Wait, SPU includes the Playstation 3, right? I'm pretty sure there are
> still plenty of PS3s in use out there... AFAIK my university (GWU) is
> still running its PS3 supercomputer cluster... (I'd have to check to
> make sure...)

GCC has only ever supported Linux targets on Cell/B.E.  I understand
Sony has removed Linux support for PS3 a long time ago.  (And in fact,
I believe support for PS3 in general has run out this year.)

Of course, there may be some people out there still running PS3 with
old firmware that runs Linux.  But they will then have to also use
older GCC versions, sorry.

(Unless, of course, somebody steps up and volunteers to take over
the maintenance of the target.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: -fsanitize=thread support on ppc64

2017-01-23 Thread Ulrich Weigand
Bill Schmidt wrote:
> On Jan 23, 2017, at 8:32 AM, Jakub Jelinek  wrote:
> >
> > Another question is, it seems upstream has s390{,x}-*-linux* support for
> > asan/ubsan, does that work?  In that case we should add it to configure.tgt
> > too (similarly to the sparc*-*-linux* entry).
> 
> CCing Uli for the s390 question.

Asan support was added just recently to LLVM for s390x-linux.  However,
I'm not sure it will work out-of-the-box on GCC, since we haven't done any
back-end work to enable it.  Also, LLVM is 64-bit only, so there probably
would have to be extra work in the libraries to enable 31-bit mode.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: Register conflicts between output memory address and output register

2018-04-19 Thread Ulrich Weigand
Matthew Fortune wrote:

> If however the address of my_mem is lowered after IRA i.e. when validating
> constraints in LRA then IRA has nothing to do as the address is just a
> symbol_ref. When LRA resolves the constraint for the address it introduces
> a register for the output memory address but does not seem to acknowledge
> any conflict with the output register (bar) it can therefore end up
> using the same register for the output memory address as the output register.
> This leads to the obvious problem if the ASM updates %1 before %0 as it will
> corrupt the address.
> 
> This can of course be worked around by making (bar) an early clobber or an
> in/out but this does seem unnecessary.
> 
> The question is... Should LRA recognise a conflict between the registers
> involved in the address portion of an output memory operand and any output
> register operands or is this a case where you strictly have to use early
> clobber.

Usually, the constraints are interpreted under the assumption that inputs
and outputs are used atomically, as if by a single target instruction.

If you consider an instruction that both updates memory and sets a register,
then most architectures will indeed allow for the output register to be the
same as one of the address registers for the memory operand, and therefore
GCC should allow this allocation.

In those cases where this is not true, e.g. because your inline asm does
in fact use multiple instructions which set the two outputs at different
times, then you should indeed use the earlyclobber flag -- that is
exactly what this flag is intended for.

The GCC documentation of the earlyclobber flag says:

  Means (in a particular alternative) that this operand is an
  earlyclobber operand, which is written before the instruction is
  finished using the input operands.  Therefore, this operand may not lie
  in a register that is read by the instruction or as part of any memory
  address.

Note in particular "... as part of any memory address."
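A concrete illustration (x86-64 AT&T assembly; this example is mine, not
from the original report): a two-instruction asm that writes its output
before it has finished reading through the memory operand's address
register must use "=&r":

```c
/* Without the '&', the register allocator may legally pick the same
   register for %0 and for the address register holding p; the first
   movl would then clobber the address before the addl reads through
   it.  The earlyclobber modifier rules that allocation out.  */
static int
load_and_double (int *p)
{
  int out;
  __asm__ ("movl (%1), %0\n\t"
           "addl (%1), %0"
           : "=&r" (out)
           : "r" (p));
  return out;
}
```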

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Wrong-code bug in execute_update_addresses_taken?

2011-02-05 Thread Ulrich Weigand
Hello,

the following program seems to be miscompiled at -O or higher:

int
main (void)
{
  int data = 1;

  struct ptr
{
  int val;
} *ptr = (struct ptr *) &data;

  ptr->val = 0;

  return data;
}

This program should return 0, but actually returns 1.

[ As far as I can tell, this program does not violate the aliasing
  rules; all memory accesses refer to an object of effective (because
  declared) type "int", and use lvalue references of type "int".
  The pointer cast might depend on implementation-defined behaviour,
  but is not undefined.  In any case, there are no warnings, and use
  of -fno-strict-aliasing does not fix the problem.  ]

In 024t.forwprop1, we still have (correctly):

  # .MEMD.3455_4 = VDEF <.MEMD.3455_3(D)>
  dataD.1975 = 1;

  # .MEMD.3455_5 = VDEF <.MEMD.3455_4>
  MEM[(struct ptr *)&dataD.1975].valD.1977 = 0;

  # VUSE <.MEMD.3455_5>
  D.3453_2 = dataD.1975;
  return D.3453_2;

but then in 025t.ealias, we get:

  dataD.1975_4 = 1;

  # .MEMD.3455_5 = VDEF <.MEMD.3455_3(D)>
  MEM[(struct ptr *)&dataD.1975].valD.1977 = 0;

  D.3453_2 = dataD.1975_4;
  return D.3453_2;

This is apparently caused by the "addresses taken" pass,
which results in those messages:

  No longer having address taken: data

  Registering new PHI nodes in block #0
  Registering new PHI nodes in block #2

  Updating SSA information for statement data = 1;
  Updating SSA information for statement D.3453_2 = data;

It seems that for some reason, the pass claims to be able to
eliminate all statements that take the address of "data",
but then leaves the MEM reference unchanged ... 

Any suggestions on what might cause this problem?

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: Wrong-code bug in execute_update_addresses_taken?

2011-02-06 Thread Ulrich Weigand
Richard Guenther wrote:
> A bug?  Can you file a bugreport and CC me?  I'll look into the
> problem.

Sure, this is now PR tree-optimization/47621.

Thanks for looking into it!

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: libgcc: strange optimization

2011-08-03 Thread Ulrich Weigand
Richard Guenther wrote:
> On Tue, Aug 2, 2011 at 3:23 PM, Ian Lance Taylor  wrote:
> > Richard Guenther  writes:
> >> I suggest to amend the documentation for local call-clobbered register
> >> variables to say that the only valid sequence using them is from a
> >> non-inlinable function that contains only direct initializations of the
> >> register variables from constants or parameters.
> >
> > Let's just implement those requirements in the compiler itself.
> 
> Doesn't work for existing code, no?  And if thinking new code then
> I'd rather have explicit dependences (and a way to represent them).
> Thus, for example
> 
> asm ("scall" : : "asm("r0")" (10), ...)
> 
> thus, why force new constraints when we already can figure out
> local register vars by register name?  Why not extend the constraint
> syntax somehow to allow specifying the same effect?

Maybe it would be possible to implement this while keeping the syntax
of existing code by (re-)defining the semantics of register asm to
basically say that:

 If a variable X is declared as register asm for register Y, and X
 is later on used as operand to an inline asm, the register allocator
 will choose register Y to hold that asm operand.  (And this is the
 full specification of register asm semantics, nothing beyond this
 is guaranteed.)

It seems this semantics could be implemented very early on, probably
in the frontend itself.  The frontend would mark the *asm* statement
as using the specified register (there would be no special handling
of the *variable* as such, after the frontend is done).  The optimizers
would then simply be required to pass the asm-statement register
annotations though, much like today they pass constraints through.
At the point where register allocation decisions are made, those
register annotations would then be acted on.
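Under that reading, the canonical safe pattern looks like the following
sketch (x86-64 Linux assumed, where syscall number 1 is write; the
wrapper name is mine): register variables initialized directly from
parameters or constants, then used immediately as asm operands.

```c
/* Direct initialization followed by immediate use in the asm -- the
   only sequence whose behavior the proposed semantics would
   guarantee.  */
static long
my_write (int fd, const void *buf, unsigned long len)
{
  register long nr __asm__ ("rax") = 1;          /* __NR_write */
  register long a0 __asm__ ("rdi") = fd;
  register long a1 __asm__ ("rsi") = (long) buf;
  register long a2 __asm__ ("rdx") = len;

  __asm__ volatile ("syscall"
                    : "+r" (nr)
                    : "r" (a0), "r" (a1), "r" (a2)
                    : "rcx", "r11", "memory");
  return nr;   /* byte count, or negated errno on failure */
}
```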

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [named address] rejects-valid: error: initializer element is not computable at load time

2011-08-04 Thread Ulrich Weigand
Georg-Johann Lay wrote:

> For the following 1-liner I get an error with current trunk r177267:
> 
> const __pgm char * pallo = "pallo";
> 
> __pgm as a named address space qualifier.
[snip]
> Moreover, if a user writes a line like
>const __pgm char * pallo = "pallo";
> he wants the string literal to be placed in the non-default address
> space.  There is no other way to express that except something like
>const __pgm char * pallo = (const __pgm char*) "pallo";
> which doesn't work either and triggers
> 
> pgm.c:1:1: error: initializer element is not constant
> 
> with a tree dump similar to above.

This is pretty much working as expected.  "pallo" is a string literal
which (always) resides in the default address space.  According to the
named address space specification (TR 18037) there are no string literals
in non-default address spaces ...

The assignment above would therefore need to convert a pointer to the
string literal in the default space to a pointer to the __pgm address
space.  This might be impossible (depending on whether __pgm encloses
the generic space), and even if it is possible, it is not guaranteed
that the conversion can be done as a constant expression at compile time.

What I'd do to get a string placed into the __pgm space is to explicitly
initialize an *array* in __pgm space, e.g. like so:

const __pgm char pallo[] = "pallo";

This is defined to work according to TR 18037, and it does actually
work for me on spu-elf.
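The array/pointer distinction matters even without address spaces; in
plain C (no __pgm in this sketch, and the variable names are mine):

```c
/* The array form allocates exactly the six string bytes, and a
   qualified variant would place those bytes in the named space.  The
   pointer form allocates a pointer object plus an anonymous string
   literal -- and per TR 18037 that literal always lives in the generic
   address space, which is what triggers the conversion problem
   described above.  */
const char pallo_arr[] = "pallo";
const char *pallo_ptr  = "pallo";
```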

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands

2011-08-04 Thread Ulrich Weigand
DJ Delorie wrote:
> > The only target supporting named address spaces today is spu-elf,
> 
> m32c-elf does too.

Huh, I totally missed that, sorry ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [named address] rejects-valid: error: initializer element is not computable at load time

2011-08-05 Thread Ulrich Weigand
Georg-Johann Lay wrote:
> Ulrich Weigand wrote:
> > This is pretty much working as expected.  "pallo" is a string literal
> > which (always) resides in the default address space.  According to the
> > named address space specification (TR 18037) there are no string literals
> > in non-default address spaces ...
> 
> The intension of TR 18037 to supply "Extension to Support Embedded Systems"
> and these are systems where every byte counts -- and it counts *where* the
> byte will be placed.
> 
> Basically named AS provide something like target specific qualifiers, and
> if GCC, maybe under the umbrella of GNU-C, would actually implement a feature
> like target specific qualifiers, that would be a great gain and much more
> appreciated than -- users will perceive it that way -- being more catholic
> than the pope ;-)

The problem with all language extensions is that you really need to be careful
that the new feature you want to add is fully specified, in all its potential
interactions with the rest of the (existing) language features.  If you don't,
some of those ambiguities are certain to be causing you problems later on --
in fact, that's just what has happened time and again with GCC extensions
that were added early on ...  This is why these days, extensions usually are
accepted only if they *are* fully specified (which usually means providing
a "diff" to the C standard text that would add the feature to the standard).

This is a non-trivial task.  One of the reasons why we decided to follow the
TR 18037 spec when implementing the __ea extension for SPU is that this task
had already been done for us.  If you want to deviate from that existing spec,
you're back to doing this work yourself.

> For example, you can have any combination of qualifiers like const, restrict
> or volatile, but it is not possible for named AS.  That's clear as long as
> named AS is as strict as TR 18037.  However, users want features to write
> down their code an a comfortable, type-safe way and not as it is at the 
> moment,
> i.e. by means of dreaded inline assembler macros (for my specific case).

A named AS qualifier *can* be combined with other qualifiers like const.
It cannot be combined with *other* named AS qualifiers, because that doesn't
make sense in the semantics underlying the address space concept of TR 18037.
What would you expect a combination of two AS qualifiers to mean?

> > The assignment above would therefore need to convert a pointer to the
> > string literal in the default space to a pointer to the __pgm address
> > space.  This might be impossible (depending on whether __pgm encloses
> > the generic space), and even if it is possible, it is not guaranteed
> > that the conversion can be done as a constant expression at compile time.
> 
> The backend can tell. It likes to implement features to help users.
> It knows about the semantic and if it's legal or not.
> 
> And even if it's all strict under TR 18037, the resulting error messages
> are *really* confusing to users because to them, a string literal's address
> is known.

It would be possible to extend the named AS implementation to allow AS pointer
conversions in initializers in those cases where the back-end knows this can
be done at load time.  (Since this is all implementation-defined anyway, it
would cause no issues with the standard.  We simply didn't do it because on
the SPU, it is not easily possible.)

Of course, that still wouldn't place the string literal into the non-generic
address space, it just would convert its address.

> > What I'd do to get a string placed into the __pgm space is to explicitly
> > initialize an *array* in __pgm space, e.g. like so:
> > 
> > const __pgm char pallo[] = "pallo";
> > 
> > This is defined to work according to TR 18037, and it does actually
> > work for me on spu-elf.
> 
> Ya, but it different to the line above.

Sure, because it allocates only the string data, and not in addition a
pointer to it as your code did ...

> Was just starting with the work and
> it worked some time ago, so I wondered.

I think some time in the past, there was a bug where initializers like
your original line were silently accepted but then incorrect code was
generated (i.e. the pointer would just be initialized to an address in
the generic address space, without any conversion).

> And I must admit I am not familiar
> will all the dreaded restriction TR 18037 imposes to render it less 
> functional :-(

It's not so much a restriction; it's simply that TR 18037 says nothing
about string literals at all, so they keep working as they do in standard C.

> Do you think a feature like "target specific qualifier" would be 

Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands

2011-08-05 Thread Ulrich Weigand
Georg-Johann Lay wrote:
> Ulrich Weigand wrote:
> > I'd be happy to bring this up to date if you're willing to work with
> > me to get this tested on a target that needs this support ...
> 
> Just attached a patch to bugzilla because an AVR user wanted to play
> with the AS support and asked me to supply my changes. It's still in
> a mess but you could get a more reasonable base than on a target where
> all named addresses vanish at expand.
> 
> The patch is fresh and attached to the enhancement PR49868, just BE stuff.
> There is also some sample code.

OK, I'll have a look.

Looking at your definition of the routines avr_addr_space_subset_p and
avr_addr_space_convert, they appear to imply that any generic address
can be used without conversion as a __pgm address and vice versa.

That is: avr_addr_space_subset_p says that __pgm is both a superset
and a subset of the generic address space, so they must be co-extensive.
avr_addr_space_convert then says that the addresses can be converted
between the two without changing their value.

Is this really true?  If so, why have a distinct __pgm address space
in the first place?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands

2011-08-05 Thread Ulrich Weigand
Georg-Johann Lay wrote:

> AVR hardware has basically three address spaces:
[snip]

OK, thanks for the information!

> Of course, RAM and Flash are no subsets of each other when regarded as
> physical memory, but they are subsets when regarded as numbers. This
> lead to my mistake to define RAM and Flash being no subsets of each other:
>http://gcc.gnu.org/ml/gcc/2010-11/msg00170.html

Right, in your situation those are *not* subsets according to the AS rules,
so your avr_addr_space_subset_p routine needs to always return false
(which of course implies you don't need an avr_addr_space_convert routine).

Getting back to the discussion in the other thread, this also means that
pointer conversions during initialization cannot happen either, so this
discussion is basically moot.

> However, there are situations like the following where you like to take
> the decision at runtime:
> 
> char cast_3 (char in_pgm, void * p)
> {
>  return in_pgm ? (*((char __pgm *) p)) : (*((char *) p));
> }

That's really an abuse of "void *" ... if you have an address in the
Flash space, you should never assign it to a "void *".

Instead, if you just have an address and you don't know ahead of time
whether it refers to Flash or RAM space, you ought to hold that number
in an "int" (or "short" or whatever integer type is most appropriate),
and then convert from that integer type to either a "char *" or a
"char __pgm *".

> Linearizing the address space at compiler level is not wanted because 
> that lead to bulky, slow code and reduced the effective address space
> available for Flash which might be up to 64kWords.

I guess to simplify things like the above, you might have a third
address space (e.g. "__far") that is a superset of both the default
address space and the __pgm address space.  Pointers in the __far
address space might be e.g. 3 bytes wide, with the low 2 bytes
holding the address and the high byte identifying whether the address
is in Flash or RAM.

Then a plain dereference of a __far pointer would do the equivalent
of your cast_3 routine above.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands

2011-08-05 Thread Ulrich Weigand
Michael Matz wrote:
> On Fri, 5 Aug 2011, Ulrich Weigand wrote:
> > Instead, if you just have a address and you don't know ahead of time
> > whether it refers to Flash or RAM space, you ought to hold that number
> > in an "int" (or "short" or whatever integer type is most appropriate),
> > and then convert from that integer type to either a "char *" or a
> > "char __pgm *".
> 
> That would leave standard C.  You aren't allowed to construct pointers out 
> of random integers.

C leaves integer-to-pointer conversion *implementation-defined*,
not undefined, and GCC has always chosen to implement this by
(usually) keeping the value unchanged:
http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Arrays-and-pointers-implementation.html
This works both for default and non-default address spaces.

Of course, my suggested implementation would therefore rely on
implementation-defined behaviour (but by simply using the __pgm
address space, it does so anyway).

> That would point to a third address space, call it "undef" :)  It would be 
> superset of default and pgm, conversions between undef to {default,pgm} 
> are allowed freely (and value preserving, i.e. trivial).

That would probably violate the named AS specification, since two different
entities in the undef space would share the same pointer value ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands

2011-08-05 Thread Ulrich Weigand
DJ Delorie wrote:

> Was this reproducible for m32c also?  I can test it if so...

The patch simply passes the destination address space through
to MODE_CODE_BASE_REG_CLASS and REGNO_MODE_CODE_OK_FOR_BASE_P,
to allow targets to make register allocation decisions based
on address space.

As long as m32c doesn't implement those, just applying the
patch wouldn't change anything.  But if that capability
*would* be helpful on your target, it would certainly be
good if you could try it out ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [RFC PATCH, i386]: Allow zero_extended addresses (+ problems with reload and offsetable address, "o" constraint)

2011-08-08 Thread Ulrich Weigand
Uros Bizjak wrote:

> Although, it would be nice for reload to subsequently fix CSE'd
> non-offsetable memory by copying address to temporary reg (*as said in
> the documentation*), we could simply require an XMM temporary for
> TImode reloads to/from integer registers, and this fixes ICE for x32.

Moves are special as far as reload is concerned.  If there is already
a move instruction present *before* reload, it will get fixed up
according to its constraints as any other instruction.

However, reload will *introduce* new moves as part of its operation,
and those will *not* themselves get reloaded.  Instead, reload simply
assumes that every plain move will just succeed without requiring
any reload; if this is not true, the target *must* provide a
secondary reload for this move.

(Note that the secondary reload could also work by reloading the
target address into a temporary; that's up to the target to
implement.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [named address] rejects-valid: error: initializer element is not computable at load time

2011-08-09 Thread Ulrich Weigand
Georg-Johann Lay wrote:
> Ulrich Weigand schrieb:
> > It's not a restriction so much, it's simply that TR 18037 does not say 
> > anything
> > about string literals at all, so they keep working as they do in standard C.
> 
> Take the following bit more elaborate example where you want to describe 
> a menu:
[snip]
> Thus you have no alternative and have to use menu1_t:
> 
> const char __as STR_YES[] = "Yes";
> const char __as STR_NO[]  = "No";
> const char __as STR_DUNNO[] = "Don't really know and will decide later";
> 
> const menu1_t menu3 =
> {
>  1,
>  {
>  STR_YES,
>  STR_NO,
>  STR_DUNNO
>  }
> };
> 
> With some menues you soon end up with dozens of unnecessary variables 
> just because TR 18037 has a blank area on it's map.

I agree that this is the best approach.  You can even avoid the unnecessary
variables by using a compound literal with AS-qualified char array type,
and initialize the AS pointer to its address, like e.g. so:

const __as char *pallo = (const __as char []){"pallo"};

That's still a bit awkward, but that could be hidden via a macro ...

> This means that a line like
> const __as char * pallo = "pallo";
> can be very useful and that there is exactly one meaning that makes 
> sense: put "pallo" in __as because otherwise you won't be able to 
> reference that string.

That's not really true: on some targets, the generic address space
will indeed be a subset of __as, so you *can* assign the address of
a string literal in the generic space to an __as pointer variable.

Even on the SPU, this works when the assignment is not done at
initialization time:

const __ea char *p;
int test (void)
{
  p = "test";
}

This will generate code to convert the address of the literal "test"
in the generic address space into a __ea address and assign it to
the __ea pointer.

> Even more preferred would be to assign the only sensible meaning 
> (provided a target allows it), which should be legal as that extension 
> is implementation defined, anyway.

What is implementation defined is whether conversion of a pointer
to a different address space is allowed at *initialization* time
or not.

What is *not* implementation defined is that string literals are
always in the generic address space -- that's simply a requirement
of the standard.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands

2011-08-09 Thread Ulrich Weigand
Georg-Johann Lay wrote:

> Thanks, it works.

OK, thanks for testing!

>   std Y+2,r31  ;  30  *movphi/3   [length = 7]
>   std Y+1,r30

I'm actually not seeing those (maybe I'm using a different code
base than you were using ...)

But I still see that the frame is created.  The problem is that IRA
thinks it needs to allocate a pseudo on the stack, and creates a
stack slot for it.  But then reload goes and just reloads the
pseudo into a hard register anyway, and simply doesn't need the
stack slot ... but it was already allocated and accounted for
such that get_frame_size () no longer returns 0.

> I frequently see IRA doing a very bad job for small register classes
> like here.  Maybe it's better to take it completely away from IRA
> and write the load as
> 
> (set (reg:HI)
>  (unspec:HI (post_inc:PHI (reg:PHI Z
> 
> Loading from __pgm is actually an unspec, i.e. reading two times from
> the same address will yield the same result.

This really seems to be a problem in IRA somewhere, but I'd guess it
would be better to fix in there instead of working around it.  Maybe
you should open a bug and get in touch with the IRA maintainers ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: libgcc: strange optimization

2011-08-09 Thread Ulrich Weigand
Richard Earnshaw wrote:

> Better still would be to change the specification and implementation of
> local register variables to only guarantee them at the beginning of ASM
> statements.  At other times they are simply the same as other local
> variables.  Now we have a problem that the register allocator knows how
> to solve.

This seems to be pretty much the same as my proposal here:
http://gcc.gnu.org/ml/gcc/2011-08/msg00064.html

But there was some push-back on requiring additional semantics
by some users ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: i370 port

2011-08-15 Thread Ulrich Weigand
Paul Edwards wrote:

> I was surprised that an instruction that is marked as s_operand
> was getting a seemingly non-s_operand given to it, so I added an
> "S" constraint:

That's right.  It is not good to have a constraint that accepts
more than the predicate, since reload will at this point only
consider the constraint.  Adding a more restricted constraint
should be the proper fix for this problem.

> That then gave an actual compiler error instead of generating bad
> code, which is a step forward:
> 
> pdos.c: In function `pdosLoadExe':
> pdos.c:2703: error: unable to generate reloads for:

You'll need to mark your new constraint as EXTRA_MEMORY_CONSTRAINT
so that reload knows what to do when an argument doesn't match.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: i370 port

2011-08-15 Thread Ulrich Weigand
Paul Edwards wrote:

> Unfortunately it's not quite right, seemingly not loading R9 properly:
> 
> LR9,13
> AR9,13
> MVC   0(10,9),0(2)

That's weird.  What does the reload dump (.greg) say?
 
> And it had a knock-on effect too, producing bad code elsewhere:
> 
> <  SLR   2,2
> <  SLR   3,3
> <  ST2,128(13)
> <  ST3,4+128(13)
> <  ST2,136(13)
> <  ST3,4+136(13)
> <  ST2,144(13)
> <  ST3,4+144(13)
> ---
> >  MVC   128(8,13),=F'0'
> >  MVC   136(8,13),=F'0'
> >  MVC   144(8,13),=F'0'
> 
> But I guess that is another can of worms to investigate.

It seems the literal is not marked as being doubleword.  That might
be related to the fact that const_int's do not carry a mode, so you
cannot just look at the literal's mode to determine the required
size, but have to take the full instruction into account ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: i370 port

2011-08-16 Thread Ulrich Weigand
p1, op1 = tem;

  if (op0 != XEXP (in, 0) || op1 != XEXP (in, 1))
in = gen_rtx_PLUS (GET_MODE (in), op0, op1);

  insn = emit_insn (gen_rtx_SET (VOIDmode, out, in));
  code = recog_memoized (insn);

Note how this actually performs the check whether to swap operands
for commutativity.

Can you debug this and find out why this doesn't work in your case?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: i370 port

2011-08-18 Thread Ulrich Weigand
Paul Edwards wrote:

> Hi Ulrich.  I put in the following debug:
> 
>   op0 = find_replacement (&XEXP (in, 0));
>   op1 = find_replacement (&XEXP (in, 1));
> 
>   /* Since constraint checking is strict, commutativity won't be
> checked, so we need to do that here to avoid spurious failure
> if the add instruction is two-address and the second operand
> of the add is the same as the reload reg, which is frequently
> the case.  If the insn would be A = B + A, rearrange it so
> it will be A = A + B as constrain_operands expects.  */
> 
>   fprintf(stderr, "REGNO(out) is %d\n", REGNO(out));
>   fprintf(stderr, " REG in 1 is %d\n", REGNO(XEXP(in,1)));
>   if (GET_CODE (XEXP (in, 1)) == REG
>   && REGNO (out) == REGNO (XEXP (in, 1)))
>   tem = op0, op0 = op1, op1 = tem;
> 
> And it produced this output (for exactly the same code I showed
> you previously):
> 
> C:\devel\pdos\s370>\devel\gcc\gcc\gccmvs -da -DUSE_MEMMGR -Os -DS390 -S -I 
> . -I ../pdpclib pdos.c
> REGNO(out) is 3
> REG in 1 is 32880
> REGNO(out) is 2
> REG in 1 is 32880
> REGNO(out) is 2
> REG in 1 is 32880
> REGNO(out) is 2
> REG in 1 is 112
> REGNO(out) is 3
> REG in 1 is 32880
> REGNO(out) is 4
> REG in 1 is 112
> REGNO(out) is 2
> REG in 1 is 112
> 
> which looks to me like it is not seeing a register, only a constant,
> so cannot perform a swap.

Oops, there's clearly a bug here.  "in" at this point is the original
expression that has not yet been reloaded, so its second operand will
indeed be a constant, not a register.  However, reload has already
decided that this constant will end up being replaced by a register,
and that is what the "find_replacement" call is checking.

So at this point in the program, XEXP (in, 1) will be the constant,
but op1 will be the register it is going to be replaced with.

Unfortunately the test whether to swap looks at XEXP (in, 1) -- it
really needs to look at op1 instead.

Can you try changing the lines

  if (GET_CODE (XEXP (in, 1)) == REG
  && REGNO (out) == REGNO (XEXP (in, 1)))

to

  if (GET_CODE (op1) == REG
  && REGNO (out) == REGNO (op1))

instead?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands

2011-08-18 Thread Ulrich Weigand
Georg-Johann Lay wrote:

> http://gcc.gnu.org/ml/gcc/2011-08/msg00131.html
>
> Are you going to install that patch? Or maybe you already installed it?

No, it isn't approved yet (in fact, it isn't even posted for approval).
Usually, patches that add new target macros, or new arguments to target
macros, but do not actually add any *exploiter* of the new features,
are frowned upon ...

Thus, I'd prefer to wait until you have a patch ready that exploits this
in the AVR backend, and then submit the whole series.

> Then, I wonder how the following named AS code translates:
> 
> int a = *((__as int*) 1000);
> 
> As const_int don't have a machmode (yet), I guess the above line just 
> reads from generic AS and reading from a specific address from named AS 
> cannot work.

This should work fine.  Address space processing doesn't really depend
on the machine mode; the address space is annotated to the MEM RTX
directly.  Code like the above ought to generate a MEM with either
an immediate CONST_INT operand or one with the immediate loaded into
a register, depending on what the target supports, but in either case
the MEM_ADDR_SPACE will be set correctly.  It's then up to the target to
implement the access as appropriate.

> Moreover, INDEX_REG_CLASS, REGNO_OK_FOR_INDEX_P, HAVE_PRE_INCREMENT et 
> al. and maybe others are AS-dependent.

I agree for INDEX_REG_CLASS and REGNO_OK_FOR_INDEX_P.  In fact, I'd
suggest they should first be converted to look similar to base registers
(i.e. add MODE_CODE_INDEX_REG_CLASS and REGNO_MODE_CODE_OK_FOR_INDEX_P)
and then add address space support to those extended macros, so as to
avoid having to change lots of back-ends.   Do you need this for AVR?
I could add that to the patch I posted previously ...

Now for HAVE_PRE_INCREMENT, I don't think we need anything special.
This is used just as a short-cut to bypass pre-increment handling
completely on targets that don't support it at all.  On targets
that *do*, there will always be additional requirement on just
which memory accesses support pre-increment.  Therefore, the
middle-end will still always check the target's legitimate_address
callback to ensure any particular pre-incremented memory access
is actually valid.  This mechanism can already look at the address
space to make its decision ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: i370 port

2011-08-22 Thread Ulrich Weigand
Paul Edwards wrote:

>   if (operands[1] == const0_rtx)
>   {
> CC_STATUS_INIT;
> mvs_check_page (0, 6, 8);
> return \"MVC%O0(8,%R0),=XL8'00'\";
>   }
>   mvs_check_page (0, 6, 8);
>   return \"MVC%O0(8,%R0),%1\";
> }"
>[(set_attr "length" "8")]
> )
> 
> forces it to use XL8'00' instead of the default F'0' and that
> seems to work.  Does that seem like a proper solution to
> you?

Well, there isn't really anything special about const0_rtx.
*Any* CONST_INT that shows up as second operand to the movdi
pattern must be emitted into an 8 byte literal at this point.

You can do that inline; but the more usual way would be to
define an operand print format that encodes the fact that
a 64-bit operand is requested.

In fact, looking at the i370.h PRINT_OPERAND, there already
seems to be such a format: 'W'.  (Maybe not quite; since 'W'
sign-extends a 32-bit operand to 64-bit.  But since 'W'
doesn't seem to be used anyway, maybe this can be changed.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands

2011-08-22 Thread Ulrich Weigand
Georg-Johann Lay wrote:
> Ulrich Weigand schrieb:
> > Georg-Johann Lay wrote:
> > 
> >>http://gcc.gnu.org/ml/gcc/2011-08/msg00131.html
> >>
> >>Are you going to install that patch? Or maybe you already installed it?
> > 
> > No, it isn't approved yet (in fact, it isn't even posted for approval).
> > Usually, patches that add new target macros, or new arguments to target
> > macros, but do not actually add any *exploiter* of the new features,
> > are frowned upon ...
> 
> I thought about implementing a "hidden" named AS first and not exposing 
> it to user land, e.g. to be able to do optimizations like
> http://gcc.gnu.org/PR49857
> http://gcc.gnu.org/PR43745
> which need named AS to express that some pointers/accesses are different.
> 
> The most prominent drawback of named AS at the moment is that AVR has 
> few address registers and register allocation often regerates unpleasant 
> code or even runs into spill fails.
> 
> The AS in question can only be accessed by means of post-increment 
> addressing via one single hard register.

Well, it doesn't really matter whether you want to expose the AS externally
or just use it internally.  Either way, I'll be happy to propose my patch
for inclusion once you have a patch ready that depends on it ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: Derive more alias information from named address space

2011-09-19 Thread Ulrich Weigand
Bingfeng Mei wrote:

> Therefore, A & B could only be disjoint, i.e., not aliased to each other.
> We should be able to write:
> 
>   if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x))
>   {
> if (!targetm.addr_space.subset_p (MEM_ADDR_SPACE (mem), MEM_ADDR_SPACE 
> (x))
>&& !targetm.addr_space.subset_p (MEM_ADDR_SPACE (x), MEM_ADDR_SPACE 
> (mem)))
>   return 0;
> else
>   return 1;
>   }
> 
> Is this correct?

Yes, this looks correct to me ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: IRA changes rules of the game

2011-10-20 Thread Ulrich Weigand
Paulo J. Matos wrote:

> (define_insn_and_split "neghi_internal"
> [(set (match_operand:QI 0 "register_operand" "=c")
>   (neg:QI (match_dup 0)))
>  (set (match_operand:QI 1 "register_operand" "=c")
>   (plus:QI
> (plus:QI
>   (ltu:QI (neg:QI (match_dup 0)) (const_int 0))
>   (match_dup 1))
> (const_int 0)))
>  (clobber (reg:CC RCC))]

> Am I missing something or something here is broken?

When reload looks at the above pattern, it will see just
two operands, both of which are output-only.  So when it
decides to reload one of the operands, it will only provide
an output reload, no input reload.

For operands that are actually used for both input and
output, you need to provide two match_operand clauses,
and tie them using a matching constraint.  Simply using
match_dup doesn't accomplish that.

Something along the lines of:

 (define_insn_and_split "neghi_internal"
 [(set (match_operand:QI 0 "register_operand" "=c")
   (neg:QI (match_operand:QI 1 "register_operand" "0")))
  (set (match_operand:QI 2 "register_operand" "=c")
   (plus:QI
 (plus:QI
   (ltu:QI (neg:QI (match_dup 1)) (const_int 0))
   (match_operand:QI 3 "register_operand" "2"))
 (const_int 0)))
  (clobber (reg:CC RCC))]

ought to work as expected.

(The remaining match_dup is fine, since reload already knows
operand 1 is used as input, so the dup doesn't introduce a
new type of use.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: i370 port

2009-07-20 Thread Ulrich Weigand
Paul Edwards wrote:

> I then found out that even with old versions of the machine definition,
> I can have the warning removed by simply not defining CONST_INT
> in the PREDICATE_CODES, even though it is allowed when the
> function is called.  ie it seems to have no effect on the code
> generation, but succeeds in eliminating the warning, and without
> needing to define an extra constraint for non-constant situations.

Actually PREDICATE_CODES does have to match the predicate definitions.
If it does not, you can run into subtle bugs, as the code in genrecog.c
that generates the finite state automaton matching an RTX pattern
against the .md insn definitions *does* make use of PREDICATE_CODES;
for example, it will assume that two predicates with disjoint sets
of PREDICATE_CODES can never both match the same RTX.

This may or may not lead to visible differences when compiling simple
test cases, but I'd expect effects to be visible in more complex
scenarios.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: i370 port

2009-08-10 Thread Ulrich Weigand
Paul Edwards wrote:

> GCC 4 complained (on my Linux system) that I didn't have
> various things (gimp etc) installed, which means I would need
> that other software to be ported too, which is not a project
> I want to work on at the moment.  However, at least it means
> that i have successfully merged all the GCCMVS changes
> to 3.2.3 in with the (last available) 3.4.6 development, which
> was a precursor to any GCC 4 port anyway.  I'll see over time
> how GCC 3.4.6 pans out.

This probably is not gimp (the graphics editor) but gmp (the
multi-precision integer operation library) and mpfr (same for
floating-point).  To build any recent GCC you'll indeed need
these two libraries.  Fortunately, they're already available
on s390(x) on Linux, and shouldn't really contain anything
that is OS-specific, so porting those to MVS should hopefully
be straightforward ...

> Until then, back at GCC 3.2.3, I have a "need" to make the entry
> point 0 in my MVS modules.

> Currently I generate this:
[snip]
> for a main program.  In order to get the entry point to 0, I need to move 
> that
> @@MAIN function to the top of the file.

I don't think there is a reliable way to change the sequence of
functions / objects in the output file.

However, it seems to me it shouldn't really be necessary to treat "main"
as special.  Usually, the "entry point" isn't really "main", but rather some
function in crt0.o, which then in turn *calls* main after all startup
processing has been done.  As crt0.o can be (and usually is) completely
written in assembler, you can arrange for everything to be in the sequence
that is required.  (On the linker command line, crt0.o would be placed
first, so if the entry point is the first thing in crt0.o it will then
also be the first thing in the executable.)

> And I have another question - where is the code for __asm__?
> 
> Currently that is generating garbage for me:
> 
> unsigned int bytes = 258;
> 
> __asm__("? :");
> 
> int main(void)
> 
> BYTESEQU   *
>  DCF'258'
>  o...@z
> * Program text area
> 
> when done in cross-compiler mode, and need to find out where
> the literal is being translated (or more likely - failing to be
> translated in the first instance).  Any idea where that is?

Hmm, this seems to be the bug fixed by Eric Christopher in 2004:
http://gcc.gnu.org/ml/gcc-cvs/2004-02/msg01425.html


> And final thing - the interest in the __asm__ comes from the hope
> that in the generated 370 assembler, it would be possible to have
> the C code interspersed to whatever extent possible.  The plan was
> to basically somehow get every line of C code, e.g. "int main(void)"
> above, and automatically generate an:
> __asm__("int main void)");

As you note, this doesn't seem workable as the result wouldn't
be syntactically valid ...

> which gives a syntax error.  Any idea how to get the mixed
> C/assembler when I'm running with the "-S" option and don't
> have the luxury of calling the normal "as" which would
> normally handle that?

There doesn't seem to be an easy way to do this from the
compiler.  At the time compiler output is generated, the
original source code text isn't even kept any more; you'd
have to go back and re-read the original source files using
the line-number debug data, just as the assembler would ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: i370 port

2009-08-11 Thread Ulrich Weigand
Paul Edwards wrote:

> I expected that the assembler generation wouldn't happen until
> after the optimization was completed, and thus, by then, I
> would know whether this was a main.

That depends a bit on the compiler version and optimization level,
but (in particular in the 3.x time frame) GCC may output assembler
code on a function-by-function basis, without necessarily reading
in the whole source file first.

As I said, it seems the best way would be not to care at all
whether or not any particular source file contains a main routine,
but instead do all entry-point processing in the crt0 startup file.

> > As crt0.o can be (and usually is) completely
> > written in assembler, you can arrange for everything to be in the sequence
> > that is required.  (On the linker command line, crt0.o would be placed
> > first, so if the entry point is the first thing in crt0.o it will then
> > also be the first thing in the executable.)
> 
> Yes, I can do that, but that means I need to have a linker command
> line!  The way the MVS linker works, I can link my main program,
> (which obviously doesn't have crt0 in it), and, thanks to the "END"
> statement, I can specify an entry point.  This means no complaint
> from the linker about a default (and wrong) entry point, and no
> need for any linker statements.  It all automatically resolves.

I don't know exactly how your port handles this on MVS, but usually
GCC itself will invoke the linker, and will itself prepare an
appropriate command line for the linker.  As part of this command
line, things like crt files will be specified.  You should set the
STARTFILE_SPEC macro in your back-end to do that ...

> > There doesn't seem to be an easy way to do this from the
> > compiler.  At the time compiler output is generated, the
> > original source code text isn't even kept any more; you'd
> > have to go back and re-read the original source files using
> > the line-number debug data, just as the assembler would ...
> 
> Ok, well - I would be content with just the source line number
> appearing in the output assembler.  Is this information
> available as each assembler line is output?

In current GCC, every insn contains "location" information pointing
back to source code line (and column) numbers.  However, in GCC 3.x
I think this wasn't yet present, but you had to rely on line number
notes interspersed with the insn stream.

In any case, if you build with -g and use in addition the "debug
assembler output" flag -dA the assembler output should contain
human-readable comments containing line number information.


Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: i370 port

2009-08-12 Thread Ulrich Weigand
Paul Edwards wrote:
> > That depends a bit on the compiler version and optimization level,
> > but (in particular in the 3.x time frame) GCC may output assembler
> > code on a function-by-function basis, without necessarily reading
> > in the whole source file first.
> 
> Ok, actually it doesn't matter if it doesn't work all the time.  I'll
> always be compiling with -Os anyway, so it sounds like I'm in
> with a chance of the whole file being read first?
> 
> If so, where is my first opportunity, in 3.2.3, to see if there's a
> "main" function in this file?

Hmm, it seems 3.2.x would *always* operate on a function-by-function
basis.  The unit-at-a-time mode was only introduced with 3.4 (I don't
recall if it was already present in 3.3).  I don't think there is any
way in 3.2.3 to check whether there is a "main" function in the file
before it is processed ...

> > I don't know exactly how your port handles this on MVS, but usually
> > GCC itself will invoke the linker, and will itself prepare an
> appropriate command line for the linker.
> 
GCC on MVS *only* outputs assembler, i.e. you always have to
use the "-S" option.
> 
> By doing so, it turns GCC into a pure text-processing application,
> which will thus work in any C90 environment.

Huh.  So the user will always have to invoke the linker manually, and
pass all the correct options (libraries etc.)?

What is the problem with having GCC itself invoke the linker, just like
it does on other platforms?

> > In current GCC, every insn contains "location" information pointing
> > back to source code line (and column) numbers.  However, in GCC 3.x
> > I think this wasn't yet present, but you had to rely on line number
> > notes interspersed with the insn stream.
> >
> > In any case, if you build with -g and use in addition the "debug
> > assembler output" flag -dA the assembler output should contain
> > human-readable comments containing line number information.
> 
> The GCC assembler is never invoked.  After getting the assembler
> output from the -S option, this is fed into IFOX00/IEV90/ASMA90.

As Paolo mentioned, the -dA flag is an option to the *compiler* that
causes it to place additional information into its output stream
(the assembler source code).


Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: i370 port

2009-08-12 Thread Ulrich Weigand
Paul Edwards wrote:

> > What is the problem with having GCC itself invoke the linker, just like
> > it does on other platforms?
> 
> 1. That requires extensions to the C90 standard.  The behaviour of
> system() is undefined, much less even the existence of fork() etc.
> 
> 2. The attempt to add functionality to system() in MVS has made
> leaps and bounds, but has not reached the stage of being able
> to detect if the SYSPRINT DCB is already open so it knows not
> to reopen it, and close it, stuffing up the caller.
> 
> 3. MVS compilers don't normally invoke the linker.  That's always
> a separate step.  GCCMVS is an MVS compiler also.  It would
> be normal to generate object code though.  But the compiler
> normally generates the object code, rather than invoking the
> assembler.  In any case, the facility to allocate temporary
> datasets for HLASM and deciding what space parameters
> should be used etc etc has not yet been determined, and is
> unwieldy regardless, and the functionality doesn't exist yet
> anyway, and may be years away still, if it even makes sense.

I cannot really comment on what would be desirable behaviour for
MVS ...   Just as an implementation note, while it is true that
the GCC compiler driver needs to be able to execute other processes
(preprocessor, compiler, assembler, linker) as sub-tasks, it does
not require the full POSIX system/fork/exec functionality to do so.
GCC relies on the libiberty "pex" family of routines, which are
much more narrow in scope, and have in fact been ported to several
non-UNIX systems, including even MS-DOS.  Providing a port of "pex"
to MVS should be much easier than porting a full Unix "system" or
"fork" feature.
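To illustrate, here is a minimal sketch of how a driver can launch a
sub-process through libiberty's pex routines.  This is not code from the
GCC driver itself, just an assumed usage pattern; it needs libiberty's
header and library to actually build, so it is shown for reference only:

```c
#include <stdio.h>
/* libiberty's process-execution interface (pex_init, pex_run, etc.)
   is declared here; link against libiberty.  */
#include "libiberty.h"

/* Run PROG with ARGV as a single sub-process and return its exit
   status, without using POSIX system/fork/exec directly.  */
int
run_subprocess (const char *prog, char * const *argv)
{
  struct pex_obj *pex;
  const char *errmsg;
  int err, status;

  pex = pex_init (0, "driver-sketch", NULL);
  if (pex == NULL)
    return -1;

  /* PEX_LAST: final (only) program in the pipeline;
     PEX_SEARCH: look PROG up along PATH.  */
  errmsg = pex_run (pex, PEX_LAST | PEX_SEARCH, prog, argv,
                    NULL, NULL, &err);
  if (errmsg != NULL)
    {
      fprintf (stderr, "%s: %s\n", errmsg, xstrerror (err));
      pex_free (pex);
      return -1;
    }

  if (!pex_get_status (pex, 1, &status))
    status = -1;
  pex_free (pex);
  return status;
}
```

A port would only need to supply an MVS implementation behind this narrow
interface, rather than a general "system" or "fork".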

B.t.w. if you cannot execute sub-tasks at all, how does the
MVS GCC invoke the preprocessor (I think this was still a
separate process in 3.2) and the core compiler (cc1) from
the compiler driver?   Do you even have a separate compiler
driver?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: i370 port

2009-08-20 Thread Ulrich Weigand
Paul Edwards wrote:

> > Hmm, it seems 3.2.x would *always* operate on a function-by-function
> > basis.  The unit-at-a-time mode was only introduced with 3.4 (I don't
> > recall if it was already present in 3.3).  I don't think there is any
> > way in 3.2.3 to check whether there is a "main" function in the file
> > before it is processed ...
> 
> Does that mean I could take advantage of this behaviour?

I don't think this would be a good idea.

> /* Store in OUTPUT a string (made with alloca) containing an
>assembler-name for a local static variable named NAME.
>LABELNO is an integer which is different for each call.  */
> 
> #ifdef TARGET_PDPMAC
> #define ASM_FORMAT_PRIVATE_NAME(OUTPUT, NAME, LABELNO)  \
> { \
>   (OUTPUT) = (char *) alloca (strlen ((NAME)) + 10); \
>   sprintf ((OUTPUT), "__%d", (LABELNO)); \
> }

How does this work?  ASM_FORMAT_PRIVATE_NAME is not supposed
to completely ignore the NAME argument; the macro may well
be called with the same LABELNO but different NAME strings,
and this must not result in conflicting symbols ...

> static void
> i370_output_function_prologue (f, l)
>  FILE *f;
>  HOST_WIDE_INT l;
> {
> /* Don't print stack and args in PDPMAC as it makes the
>comment too long */
> #ifdef TARGET_PDPMAC
>   fprintf (f, "* %c-func %s prologue\n",
>mvs_need_entry ? 'X' : 'S',
>mvs_function_name);
> #else

At this point, you may refer to "current_function_decl" to
retrieve information about the function currently being output.
In particular, you can retrieve the original source-level name
associated with the routine via DECL_NAME (current_function_decl).

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: i370 port

2009-08-21 Thread Ulrich Weigand
Paul Edwards wrote:

> >> /* Store in OUTPUT a string (made with alloca) containing an
> >>assembler-name for a local static variable named NAME.
> >>LABELNO is an integer which is different for each call.  */
> >>
> >> #ifdef TARGET_PDPMAC
> >> #define ASM_FORMAT_PRIVATE_NAME(OUTPUT, NAME, LABELNO)  \
> >> { \
> >>   (OUTPUT) = (char *) alloca (strlen ((NAME)) + 10); \
> >>   sprintf ((OUTPUT), "__%d", (LABELNO)); \
> >> }
> >
> > How does this work?  ASM_FORMAT_PRIVATE_NAME is not supposed
> > to completely ignore the NAME argument, the function may well
> > be called with the same LABELNO but different NAME strings,
> > and this must not result in conflicting symbols ...
> 
> I have compiled the entire GCC and not come up with any duplicate
> static function names, so I think the number is always unique.

Hmm, I see that in the 3.2.x code base this is indeed true.
However, in later compilers ASM_FORMAT_PRIVATE_NAME is used
for other purposes by the middle-end, not just static function
or variable names.  You definitely can get number collisions
in later compilers ...

> > At this point, you may refer to "current_function_decl" to
> > retrieve information about the function currently being output.
> > In particular, you can retrieve the original source-level name
> > associated with the routine via DECL_NAME (current_function_decl).
> 
> Thanks a lot!  I couldn't use that directly, but this:

Why not?  I'd have thought something like

  printf ("%s", IDENTIFIER_POINTER (DECL_NAME (current_function_decl)));

should work fine ...


> c:\devel\gcc\gcc\config\i370>cvs diff -r 1.37 i370.c

B.t.w. if you use the -u or -c option to cvs diff, the diffs are
a lot more readable ...

> <mvs_function_name);
> ---
> >fname_as_string(0));

This is a bit problematic as fname_as_string is a function defined in
the C front-end.  If you were e.g. to build the Fortran compiler, your
back-end gets linked against the Fortran front-end instead of the C
front-end, and that function simply will not be there.  Generally,
the rule is that the back-end must not directly call front-end routines.

In any case, for C source code fname_as_string does pretty much
nothing else than what I suggested above ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: i370 port

2009-09-14 Thread Ulrich Weigand
Paul Edwards wrote:

> int
> i370_branch_dest (branch)
>  rtx branch;
> {
>   rtx dest = SET_SRC (PATTERN (branch));
>   int dest_uid;
>   int dest_addr;
> 
>   /* first, compute the estimated address of the branch target */
>   if (GET_CODE (dest) == IF_THEN_ELSE)
> dest = XEXP (dest, 1);
>   dest = XEXP (dest, 0);

This is set up only to handle direct branches of the form

  (set (pc) (label_ref ...))

and indirect branches of the form

  (set (pc) (if_then_else (...) (label_ref ...) (pc)))

but *not* indirect branches of the form

  (set (pc) (if_then_else (...) (pc) (label_ref ...)))

This latter form is accepted by the "negated" conditional
jump instructions in the i370.md file, like so:

(define_insn ""
  [(set (pc)
        (if_then_else (eq (cc0)
                          (const_int 0))
                      (pc)
                      (label_ref (match_operand 0 "" ""))))
;   (clobber (reg:SI 14))
   ]
  ""
  "*
{
  check_label_emit ();
  mvs_check_page (0, 4, 0);
  if (i370_short_branch (insn)
      || mvs_check_label (CODE_LABEL_NUMBER (operands[0])))
    {

Therefore, the i370_branch_dest routine needs to handle
those as well.  Probably something along the following lines:

  if (GET_CODE (dest) == IF_THEN_ELSE)
    {
      if (GET_CODE (XEXP (dest, 1)) == LABEL_REF)
        dest = XEXP (dest, 1);
      else
        dest = XEXP (dest, 2);
    }

  gcc_assert (GET_CODE (dest) == LABEL_REF);
  dest = XEXP (dest, 0);

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com

