Re: Reload Problem in delete_output_reload
Erwin Unruh wrote:

> I have a problem with delete_output_reload.  It sometimes deletes
> instructions which are needed.  Here is an analysis of a recent case
> (in a private version of the S390 port).  The original S390 shows
> almost the same reloads, but chooses different registers.

Which GCC version is your compiler based on?

> Reloads for insn # 1598
> Reload 0: reload_in (SI) = (const_int 4080 [0xff0])
>         ADDR_REGS, RELOAD_FOR_OUTPUT_ADDRESS (opnum = 0)
>         reload_in_reg: (const_int 4080 [0xff0])
>         reload_reg_rtx: (reg:SI 2 2)
> Reload 1: reload_in (SI) = (const_int 4080 [0xff0])
>         ADDR_REGS, RELOAD_FOR_OTHER_ADDRESS (opnum = 0)
>         reload_in_reg: (const_int 4080 [0xff0])
>         reload_reg_rtx: (reg:SI 2 2)
> Reload 2: reload_in (SI) = (const_int 4080 [0xff0])
>         ADDR_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 2)
>         reload_in_reg: (const_int 4080 [0xff0])
>         reload_reg_rtx: (reg:SI 2 2)
> Reload 3: reload_in (DI) = (mem/c:DI (plus:SI (plus:SI (reg/f:SI 15 15)
>                                                        (const_int 4080 [0xff0]))
>                                               (const_int 3144 [0xc48])) [0 S8 A8])
>         reload_out (DI) = (mem/c:DI (plus:SI (plus:SI (reg/f:SI 15 15)
>                                                       (const_int 4080 [0xff0]))
>                                              (const_int 3136 [0xc40])) [0 S8 A8])
>         GENERAL_REGS, RELOAD_OTHER (opnum = 0), can't combine
>         reload_in_reg: (reg:DI 1391)
>         reload_out_reg: (reg:DI 1393)
>         reload_reg_rtx: (reg:DI 0 0)
> Reload 4: ADDR_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 2), can't combine,
>         secondary_reload_p
>         reload_reg_rtx: (reg:SI 3 3)
> Reload 5: reload_in (SI) = (plus:SI (plus:SI (reg/f:SI 15 15)
>                                              (const_int 4080 [0xff0]))
>                                     (const_int 3136 [0xc40]))
>         ADDR_REGS, RELOAD_FOR_INPUT (opnum = 2), inc by 8
>         reload_in_reg: (plus:SI (plus:SI (reg/f:SI 15 15)
>                                          (const_int 4080 [0xff0]))
>                                 (const_int 3136 [0xc40]))
>         reload_reg_rtx: (reg:SI 2 2)
>         secondary_in_reload = 4
>         secondary_in_icode = reload_insi
>
> These reloads are ok.  In do_output_reload it is noted that both
> insn_1597.Reload_2 and insn_1598.Reload_3 write to the same stack slot.
> So the compiler decides to remove the first reload and use register
> (reg:DI 2) directly.  In this analysis it misses the fact that
> (reg:SI 2) is used for input reloads of insn 1598.

This should actually be caught by the free_for_value_p check in
choose_reload_regs.  You cannot inherit a value for a RELOAD_OTHER
reload (3) in a register that is already in use for a
RELOAD_FOR_INPUT_ADDRESS reload (2).  Could you try to find out why
this doesn't work correctly?

> One critical point is the timing on the variables reg_reloaded_valid
> and spill_reg_store.  Within the function emit_reload_insns they are
> first checked (within do_output_reload) and later updated (after the
> reload instructions are written).  So they reflect the state before
> the "reload sequence".  Not all usages reflect this semantics.
> Especially the check within delete_output_reload is not correct.

I'm not sure how delete_output_reload comes into play here.  The
decision to inherit was already made long ago, in choose_reload_regs,
and that is already incorrect.  Even if the output reload for insn
1597 were not deleted at this point, the code would still be
incorrect.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  [EMAIL PROTECTED]
Re: Reload Problem in delete_output_reload
Erwin Unruh wrote:

> Sorry, I misled you.  Somehow I did confuse
> (mem/c:DI (reg:SI 2 2) [0 S8 A8]) with (reg:DI 2).  Register 2 is
> used correctly.
> I do not think any reload is inherited in this case.

Ah, right.  That did confuse me ;-)

> I did find something which might be the real problem.  Within
> delete_output_reload there are two calls to count_occurrences.  The
> second one will be called with parameters
>
> (parallel [
>         (set (mem/c:DI (plus:SI (plus:SI (reg/f:SI 15 15)
>                                          (const_int 4080 [0xff0]))
>                                 (const_int 3136 [0xc40])) [0 S8 A8])
>              (plus:DI (mem/c:DI (plus:SI (plus:SI (reg/f:SI 15 15)
>                                                   (const_int 4080 [0xff0]))
>                                          (const_int 3144 [0xc48])) [0 S8 A8])
>                       (mem/c:DI (plus:SI (plus:SI (reg/f:SI 15 15)
>                                                   (const_int 4080 [0xff0]))
>                                          (const_int 3136 [0xc40])) [0 S8 A8])))
>         (clobber (reg:CC 33 %cc))
>     ])
>
> (mem/c:DI (plus:SI (reg/f:SI 15 15)
>                    (const_int 7216 [0x1c30])) [0 S8 A8])
>
> The offset from register 15 is represented in two different ways.  In
> the first parameter it is split in two constants, in the second it is
> kept as a single constant.
>
> Due to this difference, no occurrence is found.  So the second
> operand of the (plus:DI ...) is not counted.

I see.  This does look exactly like the problem Alexandre Oliva fixed
for PR target/28146:
http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00592.html

This patch is in mainline, but hasn't been applied to the 4.1 branch.
Could you check whether it fixes your problem?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  [EMAIL PROTECTED]
Re: Import GCC 4.2.0 PRs
Mark Mitchell wrote:

> * PR 28544 (Brook, Weigand) -- this is an ARM ICE which Paul has
>   tracked to reload, but he's not sure what to do there.  Perhaps
>   Ulrich can help.

This description doesn't appear to match the bugzilla record.  Maybe
you're referring to PR 28675?  I can have a look at that one.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  [EMAIL PROTECTED]
Broken debug info with G++ anon struct typedefs
Hello,

in debugging a failing GDB test case I've come upon what looks like
bad DWARF-2 debug info being generated by G++.  The problem has to do
with typedefs of anonymous structs like:

typedef struct {
  int one;
  int two;
} tagless_struct;

tagless_struct v_tagless;

This causes the following DWARF-2 to be generated:

        .uleb128 0x2    # (DIE (0x6c) DW_TAG_structure_type)
        .ascii "\0"     # DW_AT_name
        .byte   0x8     # DW_AT_byte_size
        .byte   0x1     # DW_AT_decl_file (xxx.ii)
        .byte   0x1     # DW_AT_decl_line
        .4byte  0xa4    # DW_AT_sibling
        .uleb128 0x3    # (DIE (0x87) DW_TAG_member)
        .ascii "one\0"  # DW_AT_name
        .byte   0x1     # DW_AT_decl_file (xxx.ii)
        .byte   0x2     # DW_AT_decl_line
        .4byte  0xa4    # DW_AT_type
        .byte   0x2     # DW_AT_data_member_location
        .byte   0x23    # DW_OP_plus_uconst
        .uleb128 0x0
        .uleb128 0x3    # (DIE (0x95) DW_TAG_member)
        .ascii "two\0"  # DW_AT_name
        .byte   0x1     # DW_AT_decl_file (xxx.ii)
        .byte   0x3     # DW_AT_decl_line
        .4byte  0xa4    # DW_AT_type
        .byte   0x2     # DW_AT_data_member_location
        .byte   0x23    # DW_OP_plus_uconst
        .uleb128 0x4
        .byte   0x0     # end of children of DIE 0x6c
        .uleb128 0x4    # (DIE (0xa4) DW_TAG_base_type)
        .byte   0x4     # DW_AT_byte_size
        .byte   0x5     # DW_AT_encoding
        .ascii "int\0"  # DW_AT_name
        .uleb128 0x5    # (DIE (0xab) DW_TAG_variable)
        .ascii "v_tagless\0"    # DW_AT_name
        .byte   0x1     # DW_AT_decl_file (xxx.ii)
        .byte   0x5     # DW_AT_decl_line
        .4byte  0x6c    # DW_AT_type
        .byte   0x1     # DW_AT_external
        .byte   0x5     # DW_AT_location
        .byte   0x3     # DW_OP_addr
        .4byte  v_tagless

Note how the v_tagless variable records DIE 0x6c as its type, which is
the anonymous struct -- the typedef is completely missing.  This is
incorrect; we should see "tagless_struct" as the type of "v_tagless".

The reason for this appears to be the following optimization in
cp/decl.c (grokdeclarator):

  /* If the user declares "typedef struct {...} foo" then the struct will
     have an anonymous name.  Fill that name in now.  Nothing can
     refer to it, so nothing needs know about the name change.  */
  if (type != error_mark_node
      && unqualified_id
      && TYPE_NAME (type)
      && TREE_CODE (TYPE_NAME (type)) == TYPE_DECL
      && TYPE_ANONYMOUS_P (type)
      /* Don't do this if there are attributes.  */
      && (!attrlist || !*attrlist)
      && cp_type_quals (type) == TYPE_UNQUALIFIED)
    {
      tree oldname = TYPE_NAME (type);
      tree t;

      /* Replace the anonymous name with the real name everywhere.  */
      for (t = TYPE_MAIN_VARIANT (type); t; t = TYPE_NEXT_VARIANT (t))
        if (TYPE_NAME (t) == oldname)
          TYPE_NAME (t) = decl;

This attempts to change the name to the typedef name.  However, it
comes too late: the original anonymous struct type has already run
through rest_of_type_compilation at this stage, which has already
called dwarf2out_decl on it.  This in turn has already allocated the
DIE and set up the name information.  Changing TYPE_NAME afterwards
has no effect on the debug data any more.

Any suggestions how to fix this?

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  [EMAIL PROTECTED]
Re: Trouble understanding reload dump file format..
Dave Korn wrote:

> (define_insn "mulsi3"
>   [(set (match_operand:SI 0 "register_operand" "=d")
>         (mult:SI (match_operand:SI 2 "register_operand" "r")
>                  (match_operand:SI 1 "register_operand" "a")))
>    (clobber (match_scratch:SI 3 "b"))]

You should be using "=b" to designate the operand as *output* only.
Otherwise, reload will attempt (unsuccessfully) to load up
"scratch:SI" as an input value ...

B.t.w., if the register class denoted by "b" has just a single
element, you might just as well use the hard reg directly in the insn
pattern here.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  [EMAIL PROTECTED]
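[A minimal sketch of the pattern with the suggested fix applied.  The
predicates and constraint letters are taken from Dave's quoted pattern;
only the "=" modifier on the scratch constraint is new, and the empty
condition and template are placeholders.]

(define_insn "mulsi3"
  [(set (match_operand:SI 0 "register_operand" "=d")
        (mult:SI (match_operand:SI 2 "register_operand" "r")
                 (match_operand:SI 1 "register_operand" "a")))
   ;; "=" marks the scratch as output-only, so reload will not try to
   ;; load the (scratch:SI) expression as an input value.
   (clobber (match_scratch:SI 3 "=b"))]
  ""
  "...")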
Re: Trouble understanding reload dump file format..
Dave Korn wrote:

> Also, it's not actually a hard reg, it's in fact a memory-mapped
> peripheral.  This means that I need to specify secondary reloads
> (it can't be loaded directly from memory as I have no mem->mem move
> insn, it needs to go via a gpr), and that implies that the register
> has to exist as a class, because you can only specify secondary
> reloads on a per-reg-class basis.

I see, that makes sense.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  [EMAIL PROTECTED]
Re: reload-branch created
ode != inmode && ! HARD_REGNO_MODE_OK (offsetted_regno, mode))
!     {
!       nregs = HARD_REGNO_NREGS (offsetted_regno, inmode);
!
!       for (k = 0; k < nregs; k++)
!         if (! TEST_HARD_REG_BIT (*usable_regs, offsetted_regno + k))
!           return IT_NONE;
!
!       can_use_inheritance_reg = 0;
!     }
    else
      {
        nregs = HARD_REGNO_NREGS (offsetted_regno, mode);

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
Re: reload-branch created
Bernd Schmidt <[EMAIL PROTECTED]> wrote on 03/20/2005 07:41:14 PM:

> This is OK.  Would you check it in?

Done, thanks.

Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

-- 
  Dr. Ulrich Weigand
  Linux for S/390 Design & Development
  IBM Deutschland Entwicklung GmbH, Schoenaicher Str. 220, 71032 Boeblingen
  Phone: +49-7031/16-3727   ---   Email: [EMAIL PROTECTED]
Re: [m68k]: Trouble trying to figure out LEGITIMIZE_RELOAD_ADDRESS
Peter Barada wrote:

> I figured out how to make it work using LEGITIMIZE_RELOAD_ADDRESS (at
> least for gcc-3.4.3) via:

I seem to recall rth opposing any use of LEGITIMIZE_RELOAD_ADDRESS to
ensure *correctness* of generated code; L_R_A's only purpose is
supposed to be providing additional optimizations.

> For ColdFire v4e, FP loads and stores can not use REG+REG addressing.
>
> I think you are correct that this current hack can be improved by
> making an array of modes to determine if double_reg_address_ok
> should be true, but I think in the end a more flexible scheme should
> be thought about, since this isn't the *only* peculiarity of the
> ColdFire.  One is that pc-relative addressing is only available on
> the *source*, not the destination, and currently neither
> GO_IF_LEGITIMATE_ADDRESS nor LEGITIMIZE_ADDRESS has any way of
> knowing whether it's a source or a destination.

It looks like you really have different types of valid addresses,
depending on which instruction uses the address.  We have a similar
situation on s390, and to handle this I've introduced the
EXTRA_MEMORY_CONSTRAINT mechanism.

The idea is, you keep GO_IF_LEGITIMATE_ADDRESS defined so as to accept
the most general form of addresses accepted by any instruction.  You
continue to use the 'm' constraint for instructions accepting this
most general form.

For each more restricted form of address you define an extra
constraint, and recognize exactly the appropriate forms in your
implementation of EXTRA_CONSTRAINT.  You use these constraints instead
of 'm' in those instructions that accept only a restricted form of
addresses.  In addition, you define EXTRA_MEMORY_CONSTRAINT and have
it accept all constraints that correspond to such restricted memory
addresses.  (Note that EXTRA_MEMORY_CONSTRAINT assumes that at least a
single base register will be a valid address accepted by any
constraint marked thus.)

Reload will now go and first try to handle each operand marked with an
EXTRA_MEMORY_CONSTRAINT just like it does 'm' operands.  However, it
additionally checks EXTRA_CONSTRAINT, and if the current operand is
rejected, reload will automatically reload the operand's address into
a base register.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
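[A hedged sketch of what target macros following this scheme might look
like, using the GCC 3.x/4.x constraint interface.  The constraint
letter 'Q' and the base-register-only address test are illustrative
assumptions, not the actual s390 or ColdFire definitions.]

/* Accept only the restricted address form -- here, a plain base
   register -- for memory operands constrained with 'Q'.  */
#define EXTRA_CONSTRAINT(OP, C)                          \
  ((C) == 'Q'                                            \
   ? (GET_CODE (OP) == MEM && REG_P (XEXP (OP, 0)))      \
   : 0)

/* Tell reload that 'Q' is a memory constraint, so that an operand
   rejected by EXTRA_CONSTRAINT is fixed up by reloading its address
   into a base register rather than reloading the whole operand.  */
#define EXTRA_MEMORY_CONSTRAINT(C, STR) ((C) == 'Q')

An insn that accepts only the restricted form would then use "Q" in
place of "m" in its operand constraints.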
Re: reload-branch created
Bernd Schmidt <[EMAIL PROTECTED]> wrote on 11.04.2005 14:43:38:

> * reload.c (find_reloads): Only set INC field if we know we have an
>   autoinc reload.

Yes, this helps for s390.  With the current reload-branch, and just my
scan_rtx patch on top, I was able to bootstrap and run the test suite
without regressions (all languages, including Ada) on s390-ibm-linux
and s390x-ibm-linux.

Thanks for taking care of this problem!

Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

-- 
  Dr. Ulrich Weigand
  Linux for S/390 Design & Development
  IBM Deutschland Entwicklung GmbH, Schoenaicher Str. 220, 71032 Boeblingen
  Phone: +49-7031/16-3727   ---   Email: [EMAIL PROTECTED]
Re: GCC 3.4.4 RC2
Greg Schafer wrote:

> On Fri, May 13, 2005 at 03:44:59PM -0700, Mark Mitchell wrote:
>
> > GCC 3.4.4 RC2 is now available here:
> >
> >   ftp://gcc.gnu.org/pub/gcc/prerelease-3.4.4-20050512
> >
> > There are just a few changes from RC1 to fix critical problems
> > people experienced with RC1.
> >
> > Please download, build, and test.
>
> Done.

Likewise, s390(x)-ibm-linux are working fine now:

http://gcc.gnu.org/ml/gcc-testresults/2005-05/msg00916.html
http://gcc.gnu.org/ml/gcc-testresults/2005-05/msg00917.html

> Something went wrong with the libstdc++ testsuite.  It doesn't show
> up in the test results:
>
> http://gcc.gnu.org/ml/gcc-testresults/2005-05/msg00856.html
>
> AFAICT, the libstdc++ testsuite _did_ run, but it doesn't show up.
> This could be pilot error on my behalf, but I've followed the exact
> same scripted procedure for ages, so I don't think so.
>
> Oops, I failed to realise it also happened with RC1:
>
> http://gcc.gnu.org/ml/gcc-testresults/2005-05/msg00670.html

Um?  This one does have the libstdc++ results ...

> I've run out of time right now and cannot investigate it immediately.
> Anyone else seeing this?  Quick browse of other test results suggests
> I'm not alone.  Maybe there's a BZ already?  Dunno.

It would appear the problem is this patch:

2005-05-12  Mark Mitchell  <[EMAIL PROTECTED]>

        2005-04-04  Mark Mitchell  <[EMAIL PROTECTED]>

        * testsuite/Makefile.am (check-local): Remove.
        (curent_symbols.txt): Likewise.
        (check-abi): Do not depend on current_symbols.txt.
        * testsuite/Makefile.in: Regenerated.
        * testsuite/libstdc++-abi/abi.exp: Build current_symbols.txt.

        2005-04-01  Mark Mitchell  <[EMAIL PROTECTED]>

        * testsuite/Makefile.am (noinst_PROGRAMS): Remove.
        (site.exp): Write out the path to the baseline file.
        (check-abi): Use DejaGNU.
        (check-abi-verbose): Remove.
        * testsuite/Makefile.in: Regenerated.
        * testsuite/abi_check.cc (main): Check the return value from
        compare_symbols.
        * testsuite/testsuite_abi.cc (compare_symbols): Return a value.
        * testsuite/testsuite_abi.h (compare_symbols): Adjust prototype.
        * testsuite/libstdc++-abi/abi.exp: New file.

        2004-03-19  Phil Edwards  <[EMAIL PROTECTED]>

        * testsuite/Makefile.am (site.exp): New target, based on that
        created by automake.  Also set libiconv.

While using DejaGNU for check-abi might be a good idea in principle,
the way it is currently set up means that the 'make check-abi' run
will overwrite the libstdc++.log and libstdc++.sum files generated by
the main 'make check' run.  This is why the results are missing from
the test_summary report ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
Re: RFC: (use) useful for avoiding unnecessary compare instructions during cc0->CCmode ?!
Bjoern Haase wrote:

> IIUC, one of the two or three difficulties with cc0->CCmode
> conversion is that combine in its present form is not able to
> recognize that in a sequence like
>
> (parallel [(set (reg:SI 100) (minus:SI (reg:SI 101) (reg:SI 102)))
>            (set (reg:CC_xxx CC) (compare (reg:SI 101) (reg:SI 102)))])
> (set (reg:CC_xxx CC) (compare (reg:SI 101) (reg:SI 102)))
>
> the compare instruction could be avoided.  IIUC it is presently
> suggested to make use of *many* peephole2 patterns for solving this
> problem.

The current CCmode back-ends (e.g. s390, but also i386 and others)
handle this problem by initially expanding into:

(parallel [(set (reg:SI 100) (minus:SI (reg:SI 101) (reg:SI 102)))
           (clobber (reg:CC_xxx CC))])
(set (reg:CC_xxx CC) (compare:CC (reg:SI 101) (reg:SI 102)))

and by *also* offering another pattern, to be matched by combine:

(parallel [(set (reg:CC_xxx CC) (compare:CC (reg:SI 101) (reg:SI 102)))
           (set (reg:SI 100) (minus:SI (reg:SI 101) (reg:SI 102)))])

See e.g. the "subsi3" expander vs. the "*subsi3_cc" and "*subsi3_cc2"
patterns in s390.md.  This works because combine will treat the
initial parallel as a 'single-set' insn, and nearly all optimizations
will be enabled.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
4.0 regression: missing debug info for global variables in C with -O2
Hello,

we've just noticed a quite serious regression in debug info output in
GCC 4.0 over previous releases: when building with -funit-at-a-time
(which is on by default with -O2), *no* debug info for global
variables appears to be emitted at all.

The problem appears to be this piece of code in
check_global_declarations:

  /* Do not emit debug information about variables that are in
     static storage, but not defined.  */
  if (TREE_CODE (decl) == VAR_DECL
      && TREE_STATIC (decl)
      && !TREE_ASM_WRITTEN (decl))
    DECL_IGNORED_P (decl) = 1;

which was added by:
http://gcc.gnu.org/ml/gcc-cvs/2004-11/msg01250.html

2004-11-25  Mark Mitchell  <[EMAIL PROTECTED]>

        PR c++/18556
        * toplev.c (check_global_declarations): Set DECL_IGNORED_P on
        unemitted variables with static storage duration.

However, check_global_declarations is called (by the C front-end at
least) *before* the call to cgraph_optimize.  At this time, when
-funit-at-a-time is in effect, *no* variables have been emitted to
assembler as far as I can tell, and thus all global variables get the
DECL_IGNORED_P flag.

Interestingly enough, when building with the C++ front-end, the debug
info is properly emitted.  I'm not sure where exactly the difference
is ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
Re: 4.0 regression: missing debug info for global variables in C with -O2
Andrew Pinski wrote:

> You can reproduce it using:
>
> static int i;
> int main(void)
> {
>   i += 3;
>   i *= 5;
>   return 0;
> }
>
> and readelf and looking for the DW_TAG_variable tag.

Yes; in fact 'main' is even superfluous.  Just compile

int var;

with -S -O2 -g on gcc 3.4 and 4.0 and look at the resulting assembler
file; the difference is quite obvious ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
Re: 4.0 regression: missing debug info for global variables in C
Paolo Bonzini wrote:

> Maybe this is responsible for part of PR21828?

I'd say this *is* PR21828: note that the variables whose type is
unknown are global variables in C code compiled with -O2 ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
Edges, predictions, and GC crashes ...
Hello,

I'm seeing compiler crashes during garbage collection when using
mudflap.  The problem appears to be that some basic_block_def
structures point to edge_prediction structures which point to edge_def
structures that have already been ggc_free()'d.

Now, looking at remove_edge (cfg.c), it does indeed appear to be
possible for edges to get deleted without the corresponding prediction
structure being removed as well ...

How is this supposed to work?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
Re: Edges, predictions, and GC crashes ...
Jan Hubicka wrote:

> I didn't have any cleanup_cfg in between the earliest place putting
> predictions and the profiling pass consuming them, so this scenario
> didn't happen.  This has however changed a long time ago.  I guess
> just teaching remove_edge to walk the prediction list if it is
> present and kill now-dead predictions would not be a performance
> overkill, as predictions are rare.  I can implement that if you pass
> me some testcase.

FAIL: libmudflap.c++/pass57-frag.cxx (test for excess errors)
WARNING: libmudflap.c++/pass57-frag.cxx compilation failed to produce executable
FAIL: libmudflap.c++/pass57-frag.cxx (-static) (test for excess errors)
WARNING: libmudflap.c++/pass57-frag.cxx (-static) compilation failed to produce executable
FAIL: libmudflap.c++/pass57-frag.cxx (-O2) (test for excess errors)
WARNING: libmudflap.c++/pass57-frag.cxx (-O2) compilation failed to produce executable
FAIL: libmudflap.c++/pass57-frag.cxx (-O3) (test for excess errors)
WARNING: libmudflap.c++/pass57-frag.cxx (-O3) compilation failed to produce executable

with current mainline on s390-ibm-linux.

Thanks for looking into this issue!

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
Re: Edges, predictions, and GC crashes ...
Jan Hubicka wrote:

> I've committed the attached patch.  I didn't succeed in reproducing
> your failures, but Danny reported it fixes his, and it
> bootstraps/regtests on i686-pc-gnu-linux.

Thanks; this does fix one crash on s390x, but doesn't fix the
pass57-frag crashes on s390.

What happens is that after the predictions are created, but before
remove_edge is called, the edge is modified in rtl_split_block (called
from tree_expand_cfg):

  /* Redirect the outgoing edges.  */
  new_bb->succs = bb->succs;
  bb->succs = NULL;
  FOR_EACH_EDGE (e, ei, new_bb->succs)
    e->src = new_bb;

Now the 'src' link points to a different basic block, but the old
basic block still has the prediction pointing to the edge.  When
remove_edge is finally called, your new code tries to find and remove
the prediction from the *new* basic block's prediction list -- but it
still remains on the old one's list ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
Re: GCC 4.01 RC1 Available
Mark Mitchell wrote:

> The GCC 4.0.1 RC1 prerelease is available here:
>
>   ftp://gcc.gnu.org/pub/gcc/prerelease-4.0.1-20050607/
>
> Please test these tarballs, and let me know about showstoppers.

s390(x)-ibm-linux is looking fine:

http://gcc.gnu.org/ml/gcc-testresults/2005-06/msg00583.html
http://gcc.gnu.org/ml/gcc-testresults/2005-06/msg00584.html

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
Re: GCC 4.0.1 RC2
Mark Mitchell wrote:

> GCC 4.0.1 RC2 is now available here:
>
>   ftp://gcc.gnu.org/pub/gcc/prerelease-4.0.1-20050616

Still fine on s390(x):

http://gcc.gnu.org/ml/gcc-testresults/2005-06/msg01103.html
http://gcc.gnu.org/ml/gcc-testresults/2005-06/msg01104.html

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
Re: Do C++ signed types have modulo semantics?
Steven Bosscher wrote:

> Anyway, I've started a SPEC run with "-O2" vs. "-O2 -fwrapv".  Let's
> see how big the damage would be ;-)

Please make sure to include a 64-bit target, where it actually makes
any difference.  (I recall performance degradations of 20-30% in some
SPECfp cases from getting induction variable reduction wrong ...)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
Bootstrap failure -- verify_ssa failed
Hello,

with current GCC mainline, bootstrap fails on s390(x)-ibm-linux during
the stage2 build with:

../../gcc-head/gcc/tree-ssa-operands.c: In function 'finalize_ssa_uses':
../../gcc-head/gcc/tree-ssa-operands.c:570: error: Definition in block 64 does not dominate use in block 13 for SSA_NAME: TMT.628_22 in statement:
TMT.628_41 = PHI ;
PHI argument
TMT.628_22
for PHI node
TMT.628_41 = PHI ;
../../gcc-head/gcc/tree-ssa-operands.c:570: internal compiler error: verify_ssa failed.

Before I start debugging, does this ring any bells?

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
Re: Bootstrap failure -- verify_ssa failed
Diego Novillo <[EMAIL PROTECTED]> wrote on 28.06.2005 20:51:52:

> On Tue, Jun 28, 2005 at 08:38:13PM +0200, Ulrich Weigand wrote:
> > Hello,
> >
> > with current GCC mainline bootstrap fails on s390(x)-ibm-linux
> > during stage2 build with:
>
> I'm on it.

s390(x) bootstrap failures are now fixed, thanks!

Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

-- 
  Dr. Ulrich Weigand
  Linux for S/390 Design & Development
  IBM Deutschland Entwicklung GmbH, Schoenaicher Str. 220, 71032 Boeblingen
  Phone: +49-7031/16-3727   ---   Email: [EMAIL PROTECTED]
Re: GCC 4.0.1 RC3
Mark Mitchell wrote:

> GCC 4.0.1 RC3 is now available here:
>
>   ftp://gcc.gnu.org/pub/gcc/prerelease-4.0.1-20050702/
>
> With luck, this will be the last 4.0.1 release candidate.
>
> Please do download tarballs from the above URL and confirm that they
> work OK on your systems.

s390(x)-ibm-linux is still looking good:

http://gcc.gnu.org/ml/gcc-testresults/2005-07/msg00182.html
http://gcc.gnu.org/ml/gcc-testresults/2005-07/msg00183.html

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
Re: GCC 4.0.2 RC1 Available
Mark Mitchell wrote:

> It's important to test the actual tarballs, rather than CVS, to
> check for any packaging issues.  If you can, download and build the
> tarballs, post test results to the gcc-testresults mailing list with
> and contrib/test_summary.  If you encounter problems, please file
> them in bugzilla, and add me to the CC: list.

s390(x)-ibm-linux are looking good:

http://gcc.gnu.org/ml/gcc-testresults/2005-09/msg00688.html
http://gcc.gnu.org/ml/gcc-testresults/2005-09/msg00689.html

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
Re: GCC 4.0.2 RC2
Mark Mitchell wrote:

> Please test, post test results to gcc-testresults, and send me an
> email pointing at the results.

s390(x)-ibm-linux is still fine:

http://gcc.gnu.org/ml/gcc-testresults/2005-09/msg00883.html
http://gcc.gnu.org/ml/gcc-testresults/2005-09/msg00884.html

In fact, as far as I can recall, 4.0.2 will be the first ever GCC
release with zero testsuite FAILs across all languages on
s390-ibm-linux ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
Re: GCC 4.0.2 Released
Mark Mitchell wrote:

> GCC 4.0.2 has been released.

Results on s390(x)-ibm-linux are here:

http://gcc.gnu.org/ml/gcc-testresults/2005-09/msg01323.html
http://gcc.gnu.org/ml/gcc-testresults/2005-09/msg01324.html

Unfortunately, it is not zero-FAIL after all; at the last minute this
one appears to have crept in:

FAIL: g++.dg/template/array14.C (test for excess errors)

The error is:

array14.C: In function 'void t(int)':
array14.C:9: error: invalid lvalue in unary '&'

Any idea what this is all about?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
Aliasing violation generated by fold_builtin_memcmp?
Hello,

in debugging the remaining Ada failures on s390x, I've come to what
looks like a bug in the middle end.  See
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19382 for details.

In short, the problem appears to be this code in fold_builtin_memcmp:

  /* If len parameter is one, return an expression corresponding to
     (*(const unsigned char*)arg1 - (const unsigned char*)arg2).  */
  if (host_integerp (len, 1) && tree_low_cst (len, 1) == 1)
    {
      tree cst_uchar_node = build_type_variant (unsigned_char_type_node, 1, 0);
      tree cst_uchar_ptr_node = build_pointer_type (cst_uchar_node);
      tree ind1 = fold_convert (integer_type_node,
                                build1 (INDIRECT_REF, cst_uchar_node,
                                        fold_convert (cst_uchar_ptr_node,
                                                      arg1)));
      tree ind2 = fold_convert (integer_type_node,
                                build1 (INDIRECT_REF, cst_uchar_node,
                                        fold_convert (cst_uchar_ptr_node,
                                                      arg2)));
      return fold_build2 (MINUS_EXPR, integer_type_node, ind1, ind2);
    }

This generates code accessing the data pointed to by the arguments
using a synthesized 'const unsigned char' type.  This is only OK if
that type is allowed to alias whatever type the arguments originally
had.

Now this is not an issue in the C family of languages, due to the
special case of 'char' / 'signed char' / 'unsigned char' being
explicitly allowed to alias every other type.  However, when building
for some *other* language, this assumption is not correct -- e.g. in
the Ada test case in PR 19382, the type synthesized here is completely
unknown to the front end and gets assigned a default alias set
allowing it to alias *nothing* else.  (Note that even for Ada,
fold_builtin_memcmp *will* be called, e.g. from
gimplify.c:gimplify_variable_sized_compare.)

Any suggestions how to fix this?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
Re: GCC 4.0.2 Released
Mark Mitchell wrote:

> No, that's very weird; that was PR 23993, which I fixed.  Or, thought
> I did.  It's definitely fixed for me on x86_64-unknown-linux-gnu.
> But, the nature of the bug didn't seem at all system-dependent.  I've
> checked that I have no local patches in my GCC 4.0.x tree.  So, I'm
> pretty mystified...

Well, current 4_0 branch fixes the problem for me too.  It appears
that the test case made it into the release, but the fix didn't ...

Comparing the cp/ChangeLog files from 4.0.2 and the 4_0 branch, it
looks like the fix is in the release according to the current
ChangeLog, but in fact it wasn't:

[EMAIL PROTECTED] fsf]$ diff -u gcc-4.0.2/gcc/cp/ChangeLog gcc-4_0/gcc/cp/ChangeLog
--- gcc-4.0.2/gcc/cp/ChangeLog  2005-09-21 05:56:51.0 +0200
+++ gcc-4_0/gcc/cp/ChangeLog    2005-09-30 12:40:52.0 +0200
@@ -1,7 +1,19 @@
-2005-09-20  Release Manager
+2005-09-29  Jakub Jelinek  <[EMAIL PROTECTED]>
+
+       PR c++/21983
+       * class.c (find_final_overrider): Move diagnostic about no unique final
+       overrider to...
+       (update_vtable_entry_for_fn): ... here.
+
+2005-09-27  Release Manager

        * GCC 4.0.2 released.

+2005-09-21  Mark Mitchell  <[EMAIL PROTECTED]>
+
+       PR c++/23993
+       * init.c (integral_constant_value): Use DECL_INTEGRAL_CONSTANT_VAR_P.
+
 2005-09-16  Mark Mitchell  <[EMAIL PROTECTED]>

        PR c++/23914

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  [EMAIL PROTECTED]
[RFC] Fortran ICE caused by array addressability issue
Hello, I've been investigating the failure of namelist_14.f90 -O0, the last s390x-ibm-linux Fortran regression. A much reduced test case is: program test type :: mt integer:: ii(4) end type mt type(mt) :: t t = mt ((/1,1,1,1/)) end program test Compiling this program without optimization fails with test.f90:9: internal compiler error: in expand_assignment, at expr.c:3929 (Note that *with* optimization everything is fine.) The ICE in expand_assignment results from expanding the following line of code (from test.f90.t24.fixupcfg): mt.0.ii[S.6] = D.557; "mt.0" is a temporary of type "mt" generated by the gimplifier. As the size of mt is 16, the expander uses TImode to hold the variable. The ICE occurs because the expander tries to use a *register* to hold that variable, and indexing a *variable* slot inside an array held in a register is not supported by expand_assignment. The decision whether to use a register or a stack slot is made in the function use_register_for_decl (function.c): bool use_register_for_decl (tree decl) { /* Honor volatile. */ if (TREE_SIDE_EFFECTS (decl)) return false; /* Honor addressability. */ if (TREE_ADDRESSABLE (decl)) return false; /* Only register-like things go in registers. */ if (DECL_MODE (decl) == BLKmode) return false; /* If -ffloat-store specified, don't put explicit float variables into registers. */ /* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa propagates values across these stores, and it probably shouldn't. */ if (flag_float_store && FLOAT_TYPE_P (TREE_TYPE (decl))) return false; /* If we're not interested in tracking debugging information for this decl, then we can certainly put it in a register. */ if (DECL_IGNORED_P (decl)) return true; return (optimize || DECL_REGISTER (decl)); } As the variable is a temporary generated by the gimplifier, it has its DECL_IGNORED_P flag set, which is why use_register_for_decl decides to put it into a register, causing the ICE. 
I was unable to reproduce this in C, because I don't get such temporaries there.

Now, the interesting point is that when optimization is on, this routine would return true anyway, so why does it work then? This is because then the TREE_ADDRESSABLE flag is set, so the variable will always be placed onto the stack. However, the routine mark_array_ref_addressable (tree-cfg.c) which sets the TREE_ADDRESSABLE flag is only ever called from the routine execute_cleanup_cfg_post_optimizing (tree-optimize.c), which is never called when not optimizing ...

Now, I'm not sure how to fix this. One fix would be to always make sure mark_array_ref_addressable is called, even when not optimizing; I don't know what the right call point would be. The other fix would be to remove the special treatment of DECL_IGNORED_P from use_register_for_decl. This may cause somewhat worse code generation, but only in the -O0 case, which I'm not particularly concerned about ...

Any suggestions what the proper fix should be?

Thanks,
Ulrich

--
Dr. Ulrich Weigand
Linux on zSeries Development
[EMAIL PROTECTED]
Re: Link-time optimization
Ian Lance Taylor wrote:
> In section 3.4 (Linker) I have the same comment: for non-GNU targets,
> the native linker is sometimes required, so modifying the linker
> should not be a requirement.  And the exact handling of .a files is
> surprisingly target dependent, so while it would be easy to code an
> archive searcher in gcc, it would be tedious, though doable, to get it
> right for all platforms.
>
> Conversely, I don't know how much we are going to care about speed here,
> but I assume that we are going to care a bit.  For the linker to
> determine which files to pull in from an archive, it is going to have
> to read the symbol tables of all the input objects, and it is going to
> have to read the archive symbol table, and it is going to have to read
> the symbol table of each object included from an archive.  It will
> have to build a symbol hash table as it goes along.  This isn't fast;
> it's a significant component of link time.  Since the compiler is also
> going to have to build a symbol hash table, it is going to be faster
> to have the compiler search the archive symbol table and decide which
> objects to pull in.  Searching an archive symbol table isn't hard; the
> target dependencies come in when deciding which objects to include.

I'm wondering whether we can't simply employ the linker to handle all those issues: Have the driver always (not just as a fall-back) call "ld -r", and the linker will pull together all input files, including those from archives, and combine them into one single object file. Then invoke the new "link-optimizer" on that single object file, resulting in an optimized object file.

Any reasons why this cannot work?

Bye,
Ulrich

--
Dr. Ulrich Weigand
Linux on zSeries Development
[EMAIL PROTECTED]
Re: LEGITIMIZE_RELOAD_ADDRESS vs address_reloaded
Hi Alan,

the problem you point out is really just the tip of the iceberg -- currently reload frequently decides it *will* perform an action in the future, and assumes that other code will anyway behave as if that action had already been completed ... This is why e.g. the 'm' constraint accepts *any* MEM without checking whether the address is valid. Extra memory constraint checking needs to similarly account for future reload actions.

> a) Before matching constraints in find_reloads, substitute dummy regs
> for any reloads that have been identified.  I'm not sure how much work
> is involved in doing this, or whether it is even possible.  It sounds
> like this would be the best solution technically, as then the output
> of LEGITIMIZE_RELOAD_ADDRESS is properly checked.

I'd really like to see something like this in the future -- first steps have been done on the reload-branch. This is unlikely to be fixable in the short term, though.

> b) Modify LEGITIMIZE_RELOAD_ADDRESS to return a constraint letter that
> the address is guaranteed to match after reloading.  A bit of mechanical
> work changing all targets.

This is probably not general enough -- in LEGITIMIZE_RELOAD_ADDRESS you have no idea what the current insn is, or what constraints it can possibly accept -- if the result of LEGITIMIZE_RELOAD_ADDRESS can match multiple extra memory constraints, which one should you select?

> c) Modify the ppc 'Z' constraint to match the indexed address reload
> generates.  This would rely on the pattern we generate in
> LEGITIMIZE_RELOAD_ADDRESS never being generated elsewhere.

This would fit in the spirit of how reload currently does things. Seeing as the 'temporary' address created by LEGITIMIZE_RELOAD_ADDRESS is non-canonical RTL and thus shouldn't occur elsewhere, such a test should be fine, in particular if done under if (reload_in_progress).

> d) Hacks like the patch below, that effectively perform the reload
> substitution with a dummy reg.  I fear this isn't proper, even though it
> seems to work.

I'm not sure this is safe, although I don't see right away why it would break either ...

Overall, I'd tend to prefer something along the lines of (c), in particular as it would also catch the cases where LEGITIMIZE_RELOAD_ADDRESS isn't actually involved, as you note:

> (*) This is exactly what code in find_reloads_address does on
> encountering an invalid indexed address.  The trouble is that its
> transformation isn't valid until the reloads are done, and we check
> constraints before doing the substitutions.  :-(

Bye,
Ulrich

--
Dr. Ulrich Weigand
Linux on zSeries Development
[EMAIL PROTECTED]
Re: ppc-linux and s390-linux compilers requiring 64-bit HWI?
Jakub Jelinek wrote:
> What's the reason why ppc-linux and s390-linux targeted GCC
> requires 64-bit HWI these days?
> If it is a native compiler, this means using long long in a huge part
> of the compiler and therefore slower compile times and bigger memory
> consumption.
> ppc-linux configured compiler (nor s390-linux) isn't multilibbed and
> ppc-linux only supports -m32 (s390-linux one apparently supports
> both -m31 and -m64 on the GCC side, but without multilibs it isn't very
> helpful).

There are two reasons why we require 64-bit HWI on s390: we want to support -m64 (multilibs can be easily added), and 64-bit HWI simplifies constant handling significantly. There are multiple places in the s390 back end that rely on the size of HWI being larger than 32 bits, and that would likely cause overflow problems otherwise ...

Bye,
Ulrich

--
Dr. Ulrich Weigand
Linux on zSeries Development
[EMAIL PROTECTED]
fast_math_flags_set_p vs. set_fast_math_flags inconsistency?
Hello,

the -ffast-math command line option sets a bunch of other flags internally, as implemented in set_fast_math_flags. It is possible to selectively override those flags on the command line as well.

I'm now wondering under what circumstances the __FAST_MATH__ macro should still be defined. This is currently implemented in the fast_math_flags_set_p routine, which checks the status of *some* (but not all!) of the flags implied by -ffast-math.

This has the effect that e.g. after

  -ffast-math -fno-finite-math-only

the __FAST_MATH__ macro is no longer predefined, but after

  -ffast-math -fno-associative-math

the __FAST_MATH__ macro still *is* predefined, even though both -ffinite-math-only and -fassociative-math are implied by -ffast-math.

Is this deliberate? (If so, is it documented somewhere?) Or is this just a bug, and fast_math_flags_set_p ought to check all flags implied by -ffast-math?

Bye,
Ulrich

--
Dr. Ulrich Weigand
GNU/Linux compilers and toolchain
ulrich.weig...@de.ibm.com
Re: fast_math_flags_set_p vs. set_fast_math_flags inconsistency?
Joseph Myers wrote:
> On Mon, 20 Jan 2020, Ulrich Weigand wrote:
>
> > This has the effect that e.g. after
> >
> >   -ffast-math -fno-finite-math-only
> >
> > the __FAST_MATH__ macro is no longer predefined, but after
> >
> >   -ffast-math -fno-associative-math
> >
> > the __FAST_MATH__ macro still *is* predefined, even though both
> > -ffinite-math-only and -fassociative-math are implied by -ffast-math.
> >
> > Is this deliberate?  (If so, is it documented somewhere?)
>
> I'd expect both to cause it to be undefined.  My guess would be that some
> past patch, after fast_math_flags_set_p was added, added more flags to
> -ffast-math but accidentally failed to update fast_math_flags_set_p; you'd
> need to look at past patches adding flags to -ffast-math to confirm if
> it's accidental.

It looks like there are multiple cases here. For the two flags -fassociative-math and -freciprocal-math, it seems to have happened just as you describe: they were created (split out of -funsafe-math-optimizations) in commit a1a826110720eda37c73f829daa4ee243ee953f5, which however did not update fast_math_flags_set_p.

For the other three flags, -fsignaling-nans, -frounding-math, and -fcx-limited-range, the story appears to be a bit different: from the very beginning, those were treated somewhat differently: their defaults are only modified as a result of -ffast-math, not -fno-fast-math (unlike the other flags derived from -ffast-math), and they are not considered by fast_math_flags_set_p. (I'm not sure if there is any particular reason why these should be treated differently here, though.)

Finally, there is one "mixed" flag, -fexcess-precision, which is handled like the above three in that its default is only modified as a result of -ffast-math, not -fno-fast-math; but nevertheless this flag *is* checked in fast_math_flags_set_p.

Do you think there is something that ought to be fixed here?

Bye,
Ulrich

--
Dr. Ulrich Weigand
GNU/Linux compilers and toolchain
ulrich.weig...@de.ibm.com
Re: fast_math_flags_set_p vs. set_fast_math_flags inconsistency?
Joseph Myers wrote:
> On Tue, 21 Jan 2020, Ulrich Weigand wrote:
>
> > It looks like there are multiple cases here.  For the two flags
> > -fassociative-math and -freciprocal-math, it seems to have happened just as
> > you describe: they were created (split out of -funsafe-math-optimizations)
> > in commit a1a826110720eda37c73f829daa4ee243ee953f5, which however did not
> > update fast_math_flags_set_p.
>
> So that's a bug.

OK, agreed.

> > For the other three flags, -fsignaling-nans, -frounding-math, and
> > -fcx-limited-range, the story appears to be a bit different: from the
>
> The first two of those are disabled by default as well as disabled by
> -ffast-math, so it seems right that -fno-fast-math does nothing with them
> and that they aren't checked by fast_math_flags_set_p.

I see. I guess that makes me wonder what -fno-fast-math *ever* does (except canceling a -ffast-math earlier on the command line). Looking at the current code, -fno-fast-math (just like -ffast-math) only ever sets flags whose default is not overridden on the command line, but then it always sets them to their default value! Am I missing something here? If that's the intent, it might be cleaner to write set_fast_math_flags as just one big if (set) { ... }.

> The last one is disabled by default but enabled by -ffast-math.  So it
> would seem appropriate to handle it like other such options, disable it
> with -fno-fast-math, and check it in fast_math_flags_set_p.

OK.

> > Finally, there is one "mixed" flag, -fexcess-precision, which is handled
> > like the above three in that its default is only modified as a result of
> > -ffast-math, not -fno-fast-math; but nevertheless this flag *is* checked
> > in fast_math_flags_set_p.
>
> That one's trickier because the default depends on whether a C standards
> conformance mode is specified.

This also makes sense if we consider the semantics of -fno-fast-math to just leave all component flags at their default, as above ...
(As an aside, the current code is even more confusing as it has a dead condition:

  if (set)
    {
      if (opts->frontend_set_flag_excess_precision == EXCESS_PRECISION_DEFAULT)
        opts->x_flag_excess_precision
          = set ? EXCESS_PRECISION_FAST : EXCESS_PRECISION_DEFAULT;

The second test of "set" must always be true here, so this will never actually actively set the flag to EXCESS_PRECISION_DEFAULT.)

Bye,
Ulrich

--
Dr. Ulrich Weigand
GNU/Linux compilers and toolchain
ulrich.weig...@de.ibm.com
Re: fast_math_flags_set_p vs. set_fast_math_flags inconsistency?
Joseph Myers wrote:
> On Mon, 27 Jan 2020, Ulrich Weigand wrote:
>
> > I see.  I guess that makes me wonder what -fno-fast-math *ever* does
> > (except canceling a -ffast-math earlier on the command line).  Looking
> > at the current code, -fno-fast-math (just like -ffast-math) only ever
> > sets flags whose default is not overridden on the command line, but
> > then it always sets them to their default value!
>
> As a general principle, more specific flags take precedence over less
> specific ones, regardless of the command-line order.  So it's correct for
> -ffast-math and -fno-fast-math not to do anything with a flag that was
> explicitly overridden by the user (modulo any issues where a particular
> combination of flags is unsupported by GCC, as with the
> "%<-fassociative-math%> disabled; other options take precedence" case in
> toplev.c).

Sure, I agree with the intended behavior as you describe it. I was just confused under what circumstances set_fast_math_flags with set == 0 ever does anything. However, I've now understood that this is required e.g. for the -Ofast -fno-fast-math case (this seems to be the only case where set_fast_math_flags is called twice, first with set == 1 and then with set == 0).

And indeed, in this case we see the effect of not resetting flag_cx_limited_range (on x86_64):

  $ gcc -E -dD -x c /dev/null -std=c99 | grep GCC_IEC
  #define __GCC_IEC_559 2
  #define __GCC_IEC_559_COMPLEX 2
  $ gcc -E -dD -x c /dev/null -std=c99 -Ofast | grep GCC_IEC
  #define __GCC_IEC_559 0
  #define __GCC_IEC_559_COMPLEX 0
  $ gcc -E -dD -x c /dev/null -std=c99 -Ofast -fno-fast-math | grep GCC_IEC
  #define __GCC_IEC_559 2
  #define __GCC_IEC_559_COMPLEX 0

Note how __GCC_IEC_559_COMPLEX is not properly reset.
Similarly, we see the effect of not resetting flag_excess_precision (only on i386):

  $ gcc -E -dD -x c /dev/null -m32 -std=c99 | grep GCC_IEC
  #define __GCC_IEC_559 2
  #define __GCC_IEC_559_COMPLEX 2
  $ gcc -E -dD -x c /dev/null -m32 -std=c99 -Ofast | grep GCC_IEC
  #define __GCC_IEC_559 0
  #define __GCC_IEC_559_COMPLEX 0
  $ gcc -E -dD -x c /dev/null -m32 -std=c99 -Ofast -fno-fast-math | grep GCC_IEC
  #define __GCC_IEC_559 0
  #define __GCC_IEC_559_COMPLEX 0

Note how __GCC_IEC_559 is not properly reset either. So it seems to me that indeed both of these should be reset in the !set case of set_fast_math_flags.

In addition, we've already agreed that these three flags should be checked in fast_math_flags_set_p, where they are currently missing:

  flag_associative_math
  flag_reciprocal_math
  flag_cx_limited_range

Finally, this remaining piece of code in set_fast_math_flags:

  if (set)
    {
      if (!opts->frontend_set_flag_signaling_nans)
        opts->x_flag_signaling_nans = 0;
      if (!opts->frontend_set_flag_rounding_math)
        opts->x_flag_rounding_math = 0;
    }

seems always a no-op, since it only ever sets the flags to their default value when they still must have that same default value. For clarity, I'd suggest to either remove this code or replace it with a comment.

The following patch (not yet fully tested) implements the above changes. Note that this now brings the set of flags set and reset by set_fast_math_flags completely in sync with the set of flags tested in fast_math_flags_set_p.

Does this make sense to you?

Thanks,
Ulrich

ChangeLog:

	* opts.c (set_fast_math_flags): In the !set case, also reset
	x_flag_cx_limited_range and x_flag_excess_precision.  Remove
	dead code resetting x_flag_signaling_nans and
	x_flag_rounding_math.
	(fast_math_flags_set_p): Also test x_flag_cx_limited_range,
	x_flag_associative_math, and x_flag_reciprocal_math.
diff --git a/gcc/opts.c b/gcc/opts.c
index 7affeb4..4452793 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2850,18 +2850,14 @@ set_fast_math_flags (struct gcc_options *opts, int set)
     opts->x_flag_finite_math_only = set;
   if (!opts->frontend_set_flag_errno_math)
     opts->x_flag_errno_math = !set;
-  if (set)
-    {
-      if (opts->frontend_set_flag_excess_precision == EXCESS_PRECISION_DEFAULT)
-	opts->x_flag_excess_precision
-	  = set ? EXCESS_PRECISION_FAST : EXCESS_PRECISION_DEFAULT;
-      if (!opts->frontend_set_flag_signaling_nans)
-	opts->x_flag_signaling_nans = 0;
-      if (!opts->frontend_set_flag_rounding_math)
-	opts->x_flag_rounding_math = 0;
-      if (!opts->frontend_set_flag_cx_limited_range)
-	opts->x_flag_cx_limited_range = 1;
-    }
+  if (!opts->frontend_set_flag_cx_limited_range)
+    opts->x_flag_cx_limited_range = set;
+  if (!opts->frontend_set_flag_excess_precision)
+    opts->x_flag_excess_precision
+      = set ? EXCESS_PRECISION_FAST : EXCESS_PRECISION_DEFAULT;
+
+  // -ffast-math should also reset -fsignaling-nans and -frounding-math,
+  // but since those are off by default, there
Re: fast_math_flags_set_p vs. set_fast_math_flags inconsistency?
Joseph Myers wrote:
> On Tue, 28 Jan 2020, Ulrich Weigand wrote:
>
> > The following patch (not yet fully tested) implements the above changes.
> > Note that this now brings the set of flags set and reset by
> > set_fast_math_flags completely in sync with the set of flags
> > tested in fast_math_flags_set_p.
> >
> > Does this make sense to you?
>
> It makes sense, but this is not a review of the patch.

Understood, thanks for having a look! I'll go ahead and submit the patch for review as usual then.

Thanks,
Ulrich

--
Dr. Ulrich Weigand
GNU/Linux compilers and toolchain
ulrich.weig...@de.ibm.com
Re: reload question about unmet constraints
Jim Wilson wrote:
> On Mon, Sep 14, 2015 at 11:05 PM, DJ Delorie wrote:
> > As a test, I added this API.  It seems to work.  I suppose there could
> > be a better API where we determine if a constraint matches various
> > memory spaces, then compare with the memory space of the operand, but
> > I can't prove that's sufficiently flexible for all targets that
> > support memory spaces.  Heck, I'm not even sure what to call the
> > macro, and
> > "TARGET_IS_THIS_MEMORY_ADDRESS_RELOADABLE_TO_MATCH_THIS_CONTRAINT_P()"
> > is a little long ;-)
> >
> > What do we think of this direction?
>
> We already have define_constraint and define_memory_constraint.  We
> could perhaps add a define_special_memory_constraint that returns
> CT_SPECIAL_MEMORY which mostly operates like CT_MEMORY, except that it
> doesn't assume any MEM can be reloaded to match.

But the only difference between define_memory_constraint and a plain define_constraint is that define_memory_constraint guarantees that any memory operand can be made valid by reloading the address into a base register ...

If the set of operands accepted by a constraint does *not* have that property, it must not be defined via define_memory_constraint, and you should simply use define_constraint instead.

Bye,
Ulrich

--
Dr. Ulrich Weigand
GNU/Linux compilers and toolchain
ulrich.weig...@de.ibm.com
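The distinction can be sketched as a machine-description fragment. This is purely illustrative: the constraint letters and the helper predicates base_reg_address_p and special_space_mem_p are made up for the example, not real GCC functions.

```lisp
;; OK as a memory constraint: any MEM qualifies once its address is
;; reloaded into a base register, which is exactly what reload assumes.
(define_memory_constraint "Q"
  "A memory operand whose address is a single base register."
  (and (match_code "mem")
       (match_test "base_reg_address_p (XEXP (op, 0))")))

;; NOT reloadable that way: membership depends on the address space of
;; the MEM, which no address reload can change -- so this must be a
;; plain define_constraint, not define_memory_constraint.
(define_constraint "W"
  "A memory operand in the special address space."
  (and (match_code "mem")
       (match_test "special_space_mem_p (op)")))
```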
Re: reload question about unmet constraints
Jim Wilson wrote:
> On Tue, Sep 15, 2015 at 7:42 AM, Ulrich Weigand wrote:
> > But the only difference between define_memory_constraint and a plain
> > define_constraint is just that define_memory_constraint guarantees
> > that any memory operand can be made valid by reloading the address
> > into a base register ...
> >
> > If the set of operands accepted by a constraint does *not* have that
> > property, it must not be defined via define_memory_constraint, and
> > you should simply use define_constraint instead.
>
> An invalid near mem can be converted to a valid near mem by reloading
> its address into a base reg.  An invalid far mem can be converted to a
> valid far mem by reloading its address into a base reg.  But one can't
> convert a near mem to a far mem by reloading the address, nor can one
> convert a far mem to a near mem by reloading its address.  So we need
> another dimension to the validity testing here: besides the question
> of whether the address can be reloaded, there is the question of
> whether it is in the right address space.  Though I don't think the
> rl78 is actually using address spaces, and it isn't clear if that
> would help.

I see. Is it correct then to say that reload will never be able to change a near mem into a far mem or vice versa?

If that is true, there doesn't appear to be any real benefit to having both near and far mems as *alternatives* of the same insn pattern. In that case, you might be able to fix the bug by splitting the offending insns into two patterns, one handling only near mems and one handling only far mems, where the near/far-ness of the mem is verified by the *predicate* and not the constraints.

Bye,
Ulrich

--
Dr. Ulrich Weigand
GNU/Linux compilers and toolchain
ulrich.weig...@de.ibm.com
Re: reload question about unmet constraints
DJ Delorie wrote:
> > In that case, you might be able to fix the bug by splitting the
> > offending insns into two patterns, one handling only near mems
> > and one handling only far mems, where the near/far-ness of the mem
> > is verified by the *predicate* and not the constraints.
>
> But this means that when reload needs to, it moves far mems into
> registers, which changes which insn is matched...  It also adds a
> *lot* of new patterns, since any of the three operands can be far, and
> '0' constraints on far are allowed also - and most insns allow far
> this way, so could be up to seven times as many patterns.

I still think you should get by with two patterns, basically one that represents the case where you emit an override, and one for the case without override.

And in fact, you should be able to decide at *expand* time which of the two you need for the given set of operands. It may be that you'll have to load a far mem into a register here, at expand time, if none of the two cases would otherwise match. This is best done at expand, since there's really nothing reload can do any better even if you wait until then.

Then you need to ensure that reload will not accidentally switch that decision again, so you'll probably need constraints after all. Something like this ought to work:

  (define_insn "*add3_virt_far"
    [(set (match_operand:QHI 0 "rl78_nonimmediate_operand" "=Wfr,v,v")
          (plus:QHI (match_operand:QHI 1 "rl78_general_operand" "vi0,Wfr,vi")
                    (match_operand:QHI 2 "rl78_general_operand" "vi0,vi1,Wfr")))
    ]
    "rl78_virt_insns_ok () && rl78_far_insn_p (operands)"
    "v.add\t%0, %1, %2"
  )

  (define_insn "*add3_virt_near"
    [(set (match_operand:QHI 0 "rl78_nonfar_nonimm_operand" "=vY,S")
          (plus:QHI (match_operand:QHI 1 "rl78_nonfar_operand" "viY,0")
                    (match_operand:QHI 2 "rl78_nonfar_operand" "viY,i")))
    ]
    "rl78_virt_insns_ok ()"
    "v.add\t%0, %1, %2"
  )

[It might be possible to emit both patterns from a single macro.]
rl78_far_insn_p needs to be a function that returns true if there is exactly one, possibly duplicated, operand that is a far mem.

The Wfr constraint must not be marked as a memory constraint (so as to avoid reload attempting to use it to access a stack slot). But the Y constraint *can* be marked as a memory constraint.

Bye,
Ulrich

--
Dr. Ulrich Weigand
GNU/Linux compilers and toolchain
ulrich.weig...@de.ibm.com
Re: reload question about unmet constraints
DJ Delorie wrote:
> > And in fact, you should be able to decide at *expand* time which
> > of the two you need for the given set of operands.
>
> I already check for multiple fars at expand, and force all but one of
> them to registers.

OK, that's good.

> Somewhere before reload they get put back in.

To avoid that, you need a check like this:

> > "rl78_virt_insns_ok () && rl78_far_insn_p (operands)"
>
> Since when does this work reliably?  I've seen cases where insns get
> mashed together without regard for validity before...
>
> I tested just this change - adding that function to addhi3 plus the
> Wfr constraint sets - and it seems to work.  The big question to me
> now is - is this *supposed* to work this way?  Or is it a coincidence
> that the relevant passes happen to check that function?

This is certainly supposed to work reliably. Every pass that changes an insn is supposed to re-recognize it, and that will check the operand predicates as well as the overall insn predicate.

Note that there are some exceptions, in particular reload, which will make certain types of changes to insn operands *without* fully revalidating the insn, because it simply assumes those changes are safe. That's why, in addition to the insn predicate, you also need to constrain reload's possible actions to those that are guaranteed to not break the predicate.

> > The Wfr constraint must not be marked as memory constraint (so as to
> > avoid reload attempting to use it to access a stack slot).
>
> This also prevents reload from reloading the address when it *is*
> needed.  However, it seems to work ok even as a memory constraint.  Is
> this change *just* because of the stack slots?  Could you give an
> example of how it could be misused, so I can understand the need?

If the constraint is marked as "memory constraint", reload will assume that everything that is forced to memory by reload can be handled by this alternative.
This includes:

- a pseudo that was forced to a stack slot
- a pseudo that is REG_EQUIV to some memory location
- any memory operand whose address already had to be reloaded since it
  was not legitimate
- any immediate that was forced to the constant pool

(and possibly others that I'm not thinking of right now).

So in general, it's really not safe to mark a constraint that accepts only far memory as "memory constraint" with current reload.

Note that *not* marking the constraint as memory constraint actually does not prevent reload from fixing up illegitimate addresses, so you shouldn't really see many drawbacks from not marking it ... (You really need the memory constraint marker for constraints that accept only a *subset* of legitimate addresses, so that reload will attempt to reload even addresses that would otherwise be considered legitimate.)

Bye,
Ulrich

--
Dr. Ulrich Weigand
GNU/Linux compilers and toolchain
ulrich.weig...@de.ibm.com
[commit, spu] Re: [BUILDROBOT] spu: left shift of negative value
Jan-Benedict Glaw wrote:
> I just noticed that (for config_list.mk builds), current GCC errors
> out at spu.c, see e.g. build
> http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=469639 :
>
> g++ -fno-PIE -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE
>   -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
>   -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute
>   -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
>   -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -I. -I.
>   -I../../../gcc/gcc -I../../../gcc/gcc/. -I../../../gcc/gcc/../include
>   -I../../../gcc/gcc/../libcpp/include -I/opt/cfarm/mpc/include
>   -I../../../gcc/gcc/../libdecnumber -I../../../gcc/gcc/../libdecnumber/dpd
>   -I../libdecnumber -I../../../gcc/gcc/../libbacktrace -o spu.o -MT spu.o
>   -MMD -MP -MF ./.deps/spu.TPo ../../../gcc/gcc/config/spu/spu.c
> ../../../gcc/gcc/config/spu/spu.c: In function 'void spu_expand_insv(rtx_def**)':
> ../../../gcc/gcc/config/spu/spu.c:530:46: error: left shift of negative value [-Werror=shift-negative-value]
>    maskbits = (-1ll << (32 - width - start));
>               ^
> ../../../gcc/gcc/config/spu/spu.c:536:46: error: left shift of negative value [-Werror=shift-negative-value]
>    maskbits = (-1ll << (64 - width - start));
>               ^
> cc1plus: all warnings being treated as errors
> Makefile:2092: recipe for target 'spu.o' failed
> make[2]: *** [spu.o] Error 1
> make[2]: Leaving directory '/home/jbglaw/build-configlist_mk/spu-elf/build-gcc/mk/spu-elf/gcc'

I've now checked in the following fix.

Thanks,
Ulrich

ChangeLog:

	* config/spu/spu.c (spu_expand_insv): Avoid undefined behavior.

Index: gcc/config/spu/spu.c
===================================================================
*** gcc/config/spu/spu.c	(revision 227968)
--- gcc/config/spu/spu.c	(working copy)
*************** spu_expand_insv (rtx ops[])
*** 472,478 ****
  {
    HOST_WIDE_INT width = INTVAL (ops[1]);
    HOST_WIDE_INT start = INTVAL (ops[2]);
!   HOST_WIDE_INT maskbits;
    machine_mode dst_mode;
    rtx dst = ops[0], src = ops[3];
    int dst_size;
--- 472,478 ----
  {
    HOST_WIDE_INT width = INTVAL (ops[1]);
    HOST_WIDE_INT start = INTVAL (ops[2]);
!   unsigned HOST_WIDE_INT maskbits;
    machine_mode dst_mode;
    rtx dst = ops[0], src = ops[3];
    int dst_size;
*************** spu_expand_insv (rtx ops[])
*** 527,541 ****
    switch (dst_size)
      {
      case 32:
!       maskbits = (-1ll << (32 - width - start));
        if (start)
! 	maskbits += (1ll << (32 - start));
        emit_move_insn (mask, GEN_INT (maskbits));
        break;
      case 64:
!       maskbits = (-1ll << (64 - width - start));
        if (start)
! 	maskbits += (1ll << (64 - start));
        emit_move_insn (mask, GEN_INT (maskbits));
        break;
      case 128:
--- 527,541 ----
    switch (dst_size)
      {
      case 32:
!       maskbits = (~(unsigned HOST_WIDE_INT)0 << (32 - width - start));
        if (start)
! 	maskbits += ((unsigned HOST_WIDE_INT)1 << (32 - start));
        emit_move_insn (mask, GEN_INT (maskbits));
        break;
      case 64:
!       maskbits = (~(unsigned HOST_WIDE_INT)0 << (64 - width - start));
        if (start)
! 	maskbits += ((unsigned HOST_WIDE_INT)1 << (64 - start));
        emit_move_insn (mask, GEN_INT (maskbits));
        break;
      case 128:

--
Dr. Ulrich Weigand
GNU/Linux compilers and toolchain
ulrich.weig...@de.ibm.com
Debugger support for __float128 type?
Hello,

I've been looking into supporting __float128 in the debugger, since we're now introducing this type on PowerPC. Initially, I simply wanted to do whatever GDB does on Intel, but it turns out debugging __float128 doesn't work on Intel either ...

The most obvious question is, how should the type be represented in DWARF debug info in the first place? Currently, GCC generates on i386:

        .uleb128 0x3    # (DIE (0x2d) DW_TAG_base_type)
        .byte 0xc       # DW_AT_byte_size
        .byte 0x4       # DW_AT_encoding
        .long .LASF0    # DW_AT_name: "long double"

and

        .uleb128 0x3    # (DIE (0x4c) DW_TAG_base_type)
        .byte 0x10      # DW_AT_byte_size
        .byte 0x4       # DW_AT_encoding
        .long .LASF1    # DW_AT_name: "__float128"

On x86_64, __float128 is encoded the same way, but long double is:

        .uleb128 0x3    # (DIE (0x31) DW_TAG_base_type)
        .byte 0x10      # DW_AT_byte_size
        .byte 0x4       # DW_AT_encoding
        .long .LASF0    # DW_AT_name: "long double"

Now, GDB doesn't recognize __float128 on either platform, but on i386 it could at least in theory distinguish the two via DW_AT_byte_size. But on x86_64 (and also on powerpc), long double and __float128 have the identical DWARF encoding, except for the name.

Looking at the current DWARF standard, it's not really clear how to make a distinction, either. The standard has no way to specify any particular floating-point format; the only attributes for a base type of DW_ATE_float encoding are related to the size.

(For the Intel case, one option might be to represent the fact that long double has only 80 data bits and the rest is padding, via some combination of the DW_AT_bit_size and DW_AT_bit_offset or DW_AT_data_bit_offset attributes. But that wouldn't help for PowerPC, since both long double and __float128 really use 128 data bits, just different encodings.)

Some options might be:

- Extend the official DWARF standard in some way
- Use a private extension (e.g. from the platform-reserved DW_AT_encoding
  value range)
- Have the debugger just hard-code a special case based on the __float128 name

Am I missing something here? Any suggestions welcome ...

B.t.w. is there interest in fixing this problem for Intel? I notice there is a GDB bug open on the issue, but nothing seems to have happened so far: https://sourceware.org/bugzilla/show_bug.cgi?id=14857

Bye,
Ulrich

--
Dr. Ulrich Weigand
GNU/Linux compilers and toolchain
ulrich.weig...@de.ibm.com
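For illustration, the private-extension option could look like this in the generated assembly, mirroring the DIEs quoted above. The encoding value 0x80 here is hypothetical -- it merely stands for some value from the vendor range starting at DW_ATE_lo_user (0x80), not an assigned constant:

```
        .uleb128 0x3    # (DIE (...) DW_TAG_base_type)
        .byte 0x10      # DW_AT_byte_size: 16
        .byte 0x80      # DW_AT_encoding: vendor value >= DW_ATE_lo_user
        .long .LASF1    # DW_AT_name: "__float128"
```

A debugger aware of this vendor value could then select the IEEE binary128 format even when size and name alone are ambiguous.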
Re: Debugger support for __float128 type?
Joseph Myers wrote:
> On Wed, 30 Sep 2015, Ulrich Weigand wrote:
>
> > - Extend the official DWARF standard in some way
>
> I think you should do this.
>
> Note that TS 18661-4 will be coming out very soon, and includes (optional)
> types
>
> * _FloatN, where N is 16, 32, 64 or >= 128 and a multiple of 32;
>
> * _DecimalN, where N >= 32 and a multiple of 32;
>
> * _Float32x, _Float64x, _Float128x, _Decimal64x, _Decimal128x
>
> so this is not simply a matter of supporting a GNU extension (not that
> it's simply a GNU extension on x86_64 anyway - __float128 is explicitly
> mentioned in the x86_64 ABI document), but of supporting an ISO C
> extension, in any case where one of the above types is the same size and
> radix as float / double / long double but has a different representation.

Ah, thanks for pointing these out!

The _DecimalN types are already supported by DWARF using a base type with encoding DW_ATE_decimal_float and the appropriate DW_AT_byte_size.

For the interchange types, it seems one could define a new encoding, e.g. DW_ATE_interchange_float, and use this together with the appropriate DW_AT_byte_size to identify the format.

However, for the extended types, the standard does not define a storage size or even a particular encoding, so it's not quite clear how to handle those. In theory, two different extended types could even have the same size ... On the other hand, in practice on current systems, it will probably be true that if you look at the set of all of the basic and extended (binary) types, there will be at most two different encodings for any given size, one corresponding to the interchange format of that size, and one other; so mapping those to DW_ATE_float vs. DW_ATE_interchange_float might suffice.

I'm not sure how to handle an extended decimal format that does not match any of the decimal interchange formats. Does this occur in practice at all?
> (All the above are distinct types in C, and distinct from float, double, > long double even if the representations are the same. But I don't think > DWARF needs to distinguish e.g. float and _Float32 other than by their > name - it's only the case of different representations that needs > distinguishing. The _Float* and _Float*x types have corresponding complex > types, but nothing further should be needed in DWARF for those once you > can represent _Float*.) Well, complex types have their own encoding (DW_ATE_complex_float), so we'd have to define the corresponding variants for those as well, e.g. DW_ATE_complex_interchange_float or the like. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: Debugger support for __float128 type?
Joseph Myers wrote: > On Thu, 1 Oct 2015, Ulrich Weigand wrote: > > > The _DecimalN types are already supported by DWARF using a base type with > > encoding DW_ATE_decimal_float and the appropriate DW_AT_byte_size. > > Which doesn't actually say whether the DPD or BID encoding is used, but as > long as each architecture uses only one that's not a problem in practice. I see. Well, one could add a DW_ATE_decimal_interchange_float for completeness, if necessary. > > For the interchange type, it seems one could define a new encoding, > > e.g. DW_ATE_interchange_float, and use this together with the > > appropriate DW_AT_byte_size to identify the format. > > It's not clear to me that (for example) distinguishing float and _Float32 > (other than by name) is useful in DWARF (and if you change float from > DW_ATE_float to DW_ATE_interchange_float that would affect old debuggers - > is the idea to use DW_ATE_interchange_float only for the new types, not > for old types with the same encodings, so for _Float32 but not float?). > But it's true that if you say it's an interchange type then together with > size and endianness that uniquely determines the encoding. So my thinking here was: today, DWARF deliberately does not specify the details of the floating-point encoding format, so that it doesn't have to get into all the various formats that exist on all the platforms supported by DWARF. That is why a DW_ATE_float encoding simply says: this is a floating-point number of size N encoded as defined by the platform ABI. The new DW_ATE_interchange_float encoding would say instead: this is a floating-point number of size N encoded as defined by the IEEE interchange format. On platforms where the ABI-defined format actually *is* the interchange format, a DWARF producer would be free to use either DW_ATE_float or DW_ATE_interchange_float. This decision could of course take into consideration compatibility requirements with older debuggers etc.
However, having two encoding definitions would allow platforms to use both the interchange format and one additional platform-defined non-interchange format of the same size, if needed. > > Well, complex types have their own encoding (DW_ATE_complex_float), so we'd > > have to define the corresponding variants for those as well, e.g. > > DW_ATE_complex_interchange_float or the like. > > And DW_ATE_imaginary_interchange_float, I suppose. Right. As an alternative to specifying the well-defined interchange format, another option might be to simply add a second DWARF attribute, e.g. DW_AT_encoding_variant, to floating-point and related base types. This would simply be an integer with platform-specific semantics. So DWARF producers could simply describe a type as: this is a floating-point number of size N encoded as defined by platform ABI encoding variant #V (If the attribute isn't present, we'd default to variant 0, which is just the current encoding.) This would allow an arbitrary number of platform-specific encodings, any of which might or might not be IEEE-defined formats ... Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
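The two alternatives discussed above can be sketched as hypothetical DWARF DIEs. Note that both DW_ATE_interchange_float and DW_AT_encoding_variant are the *proposed* names from this thread, not part of any published DWARF standard, and the layout below is illustrative only:

```
; Alternative 1: a new encoding value identifying the IEEE
; interchange format of the given size.
DW_TAG_base_type
    DW_AT_name        "_Float128"
    DW_AT_encoding    DW_ATE_interchange_float   ; proposed value
    DW_AT_byte_size   16

; Alternative 2: keep DW_ATE_float and add a platform-specific
; variant attribute (absent attribute = variant 0 = current ABI
; encoding).
DW_TAG_base_type
    DW_AT_name        "long double"
    DW_AT_encoding    DW_ATE_float
    DW_AT_byte_size   16
    DW_AT_encoding_variant  1    ; proposed attribute, platform-defined
```

Alternative 2 has the property mentioned below that existing consumers would simply skip the unknown attribute.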
Re: Debugger support for __float128 type?
Joseph Myers wrote: > On Fri, 2 Oct 2015, Ulrich Weigand wrote: > > I see. Well, one could add a DW_ATE_decimal_interchange_float for > > completeness, if necessary. > > Since both DPD and BID are interchange encodings, that doesn't actually > determine things without some way to say which was used (that is, both > DW_ATE_decimal_float and DW_ATE_decimal_interchange_float would rely on > platform-specific information to determine the format). I don't know if > DW_ATE_decimal_float is being used anywhere for something that's not an > interchange format. Ah, yes. I missed that both DPD and BID are defined as interchange formats. This suggestion doesn't make sense then ... > > As an alternative to specifying the well-defined interchange format, > > another option might be to simply add a second DWARF attribute, > > e.g. DW_AT_encoding_variant, to floating-point and related base types. > > This would simply be an integer with platform-specific semantics. > > So DWARF producers could simply describe a type as: > > this is a floating-point number of size N encoded as defined by > > platform ABI encoding variant #V > > Do you want entirely platform-specific semantics? Or would it be better > to define standard values to mean it's an IEEE interchange format (or, for > decimal floating point, to specify whether it's DPD or BID), plus space > for future standard values and space for platform-specific values? Hmm, I had been thinking of leaving that entirely platform-specific. I guess one could indeed define some values with well-defined standard semantics; that would assume DWARF would want to start getting into the game of defining floating-point formats -- not sure what the position of the committee would be on this question ... [ Back when DW_ATE_decimal_float was added, the initial proposal did indeed specify the encoding should follow IEEE-754R, but that was removed when the proposal was actually added to the standard. 
] > Would existing consumers safely ignore that attribute (so that producers > could safely specify IEEE interchange encoding for float, double etc. if > applicable, without breaking existing consumers)? Yes, existing consumers should simply ignore attributes they are not aware of. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: reload question about unmet constraints
DJ Delorie wrote: > For such constraints that are memory operands but not > define_memory_constraint, you need to use '*' to keep reload from > trying to guess a register class from them (it guesses wrong for > rl78). > > I.e. use "*Wfr" instead of "Wfr". Huh? That seems weird. It should make no difference for purposes of register class preferences whether a constraint is marked as memory constraint or extra constraint: neither contributes to register class preferences at all. If the '*' makes any difference, I guess this can only be because IRA chooses another alternative for the insn to begin with. Do you have an example that shows the problem? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
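For readers unfamiliar with the notation, DJ's suggestion amounts to something like the following machine-description fragment. This is a hypothetical, rl78-flavored sketch, not the actual rl78.md source:

```lisp
;; Hypothetical pattern sketch.  "Wfr" stands for a far-memory extra
;; constraint (defined elsewhere via define_constraint, not
;; define_memory_constraint).  Prefixing it with '*' asks the register
;; allocator to ignore this alternative when computing register-class
;; preferences -- the point under dispute in this thread.
(define_insn "*movqi_far"
  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,*Wfr")
        (match_operand:QI 1 "general_operand"      "*Wfr,r"))]
  ""
  "mov\t%0, %1")
```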
Re: "cc" clobber
David Wohlferd wrote: > On 1/26/2016 4:31 PM, Bernd Schmidt wrote: > > On 01/27/2016 12:12 AM, David Wohlferd wrote: > >> I started by just commenting out the code in ix86_md_asm_adjust that > >> unconditionally clobbered the flags. I figured this would allow the > >> 'normal' "cc" handling to occur. But apparently there is no 'normal' > >> "cc" handling. > > > > I have a dim memory that there's a difference between the "cc" and > > "CC" spellings. You might want to check that. > > I checked, but my scan of the current code isn't turning up anything for > "CC" related to clobbers either. > > While presumably "cc" did something at one time, apparently now it's > just an unenforced comment (on extended asm). Not a problem, just a bit > of a surprise. I think on many targets a clobber "cc" works because the backend actually defines a register named "cc" to correspond to the flags. Therefore the normal handling of clobbering named hard registers catches this case as well. This doesn't work on i386 because there the flags register is called "flags" in the back end. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: GCC libatomic ABI specification draft
Torvald Riegel wrote: > On Fri, 2016-12-02 at 12:13 +0100, Gabriel Paubert wrote: > > On Thu, Dec 01, 2016 at 11:13:37AM -0800, Bin Fan at Work wrote: > > > Thanks for the comment. Yes, the ABI requires libatomic must query the > > > hardware. This is > > > necessary if we want the compiler to generate inlined code for 16-byte > > > atomics. Note that > > > this particular issue only affects x86. > > > > Why? Power (at least recent ones) has 128 bit atomic instructions > > (lqarx/stqcx.) and Z has 128 bit compare and swap. > > That's not the only factor affecting whether cmpxchg16b or such is used > for atomics. If the HW just offers a wide CAS but no wide atomic load, > then even an atomic load is not truly just a load, which breaks (1) > atomic loads on read-only mapped memory and (2) volatile atomic loads > (unless we claim that an idempotent store is like a load, which is quite > a stretch for volatile I think). I may have missed the context of the discussion, but just on the specific ISA question here: both Power and z not only have the 16-byte CAS (or load-and-reserve/store-conditional), but they also both have specific 16-byte atomic load and store instructions (lpq/stpq on z, lq/stq on Power). Those are available on any system supporting z/Architecture (z900 and up), and on any Power system supporting the V2.07 ISA (POWER8 and up). GCC does in fact use those instructions to implement atomic operations on 16-byte data types on those machines. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: GCC libatomic ABI specification draft
Szabolcs Nagy wrote: > On 20/12/16 13:26, Ulrich Weigand wrote: > > I may have missed the context of the discussion, but just on the > > specific ISA question here: both Power and z not only have the > > 16-byte CAS (or load-and-reserve/store-conditional), but they also both > > have specific 16-byte atomic load and store instructions (lpq/stpq > > on z, lq/stq on Power). > > > > Those are available on any system supporting z/Architecture (z900 and up), > > and on any Power system supporting the V2.07 ISA (POWER8 and up). GCC > > does in fact use those instructions to implement atomic operations on > > 16-byte data types on those machines. > > that's a bug. > > at least i don't see how gcc makes sure the libatomic > calls can interoperate with inlined atomics. Hmm, interesting. On z, there is no issue with ISA levels, since *all* 64-bit platforms support the 16-byte atomics (and on non-64-bit platforms, 16-byte data types are not supported at all). However, there still seems to be a problem, but this time related to alignment issues. We do have the 16-byte atomic instructions, but they only work on 16-byte aligned data. This is a problem in particular since the default alignment of 16-byte data types is still 8 bytes on our platform (since the ABI only guarantees 8-byte stack alignment). That's why the libatomic configure check thinks it cannot use the atomic instructions when building on z, and generates code that uses the separate lock. However, *if* a particular object can be proven by the compiler to be 16-byte aligned, it will emit the inline atomic instruction. This means there is indeed a bug if that same object is also operated on via the library routine. Andreas suggested that the best way to fix this would be to add a runtime alignment check to the libatomic routines and also use the atomic instructions in the library whenever the object actually happens to be correctly aligned. 
It seems that this should indeed fix the problem (and also use the most efficient way in all cases). Not sure about Power -- adding David and Segher on CC ... Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Issue with sub-object __builtin_object_size
Hello, the following C++ test case: struct pollfd { int fd; short int events; short int revents; }; struct Pollfd : public pollfd { }; struct Pollfd myfd[10]; int test (void) { return __builtin_object_size ((struct pollfd *)myfd, 1); } ends up returning 8 from the "test" routine, not 80. In the real-world application this test case was extracted from, this causes a call: poll(myfd, count, 0); // 1 < count < 10 to fail with a "Buffer overflow detected" message at run-time when building with _FORTIFY_SOURCE = 2 against glibc. [ Here, there is no explicit cast, but it is implied by the prototype of the "poll" routine. ] (Note that in the real-world application, the derived struct Pollfd has some member functions to construct and pretty-print the structure, but has no additional data members.) From the __builtin_object_size documentation, it's not immediately clear to me whether this is supposed to work or not: If the least significant bit is clear, objects are whole variables, if it is set, a closest surrounding subobject is considered the object a pointer points to. Is the presence of the above cast (explicit or implicit) supposed to modify the notion of "closest surrounding subobject"? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Issue with attribute ((aligned)) ?
Hello, in comparing ABI conformance across compilers I ran across an issue where GCC's behavior does appear to deviate from the manual. In http://gcc.gnu.org/onlinedocs/gcc/Variable-Attributes.html, we have: aligned (alignment) This attribute specifies a minimum alignment for the variable or structure field, measured in bytes. [snip] When used on a struct, or struct member, the aligned attribute can only increase the alignment; in order to decrease it, the packed attribute must be specified as well. However, the following test case: struct test { int * __attribute__ ((aligned (2))) a; }; int main (void) { printf ("align: %d\n", (int)__alignof__(struct test)); } outputs an alignment of 2. Based on the above wording, I would have expected an alignment of 8 on a 64-bit machine. (And the latter is indeed what LLVM, for example, implements.) Note that this appears to happen only in the specific declaration above; things work as documented when using e.g. int __attribute__ ((aligned (2))) a; or even int *a __attribute__ ((aligned (2))); In fact, this also raises the question what the exact semantics of the above construct is; according to the manual it seems that int * __attribute__ ((aligned (2))) a; ought to define a (default-aligned) pointer to a 2-aligned int, in which case the above distinction would be moot and we'd expect __alignof__(struct test) to be 8 in any case. (see http://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html) That doesn't appear to be what's happening either; with struct test x; printf ("align: %d %d\n", (int)__alignof__(x.a), (int)__alignof__(*x.a)); I get 2 for __alignof__(x.a) and 4 for __alignof__(*x.a), so it does appear the alignment is being applied (if incorrectly) to the pointer, not the pointed-to data ... (LLVM actually also does it this way.) Is this a bug in GCC (or the documentation)? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: Issue with sub-object __builtin_object_size
Jakub Jelinek wrote: > On Tue, Jun 10, 2014 at 07:37:50PM +0200, Ulrich Weigand wrote: > > the following C++ test case: > > > > struct pollfd > > { > > int fd; > > short int events; > > short int revents; > > }; > > > > struct Pollfd : public pollfd { }; > > > > struct Pollfd myfd[10]; > > > > int test (void) > > { > > return __builtin_object_size ((struct pollfd *)myfd, 1); > > } > > > > ends up returning 8 from the "test" routine, not 80. > > The thing is that the C++ FE transforms this kind of cast to > &((struct Pollfd *) &myfd)->D.2233 > so for middle-end where __builtin_object_size is evaluated this > is like: > struct Pollfd { struct pollfd something; }; > struct Pollfd myfd[10]; > int test (void) > { > return __builtin_object_size (&myfd[0].something, 1); > } > and returning 8 in that case is completely correct. > So the question is why instead of a simple cast it created > ADDR_EXPR of a FIELD_DECL instead. Sorry for the long delay, I finally found time to look into this problem again. This behavior of the C++ front-end seems to trace back to a deliberate change by Jason here: https://gcc.gnu.org/ml/gcc-patches/2004-04/msg02000.html "This patch changes the representation to b..i, which is more friendly to alias analysis." It still seems the effect of this on __builtin_object_size is surprising and probably not what was intended. I'm not sure whether the front-end or the middle-end is more "at fault" here. Thoughts? Thanks, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: Issue with sub-object __builtin_object_size
Jason Merrill wrote: > On 09/15/2014 11:21 AM, Ulrich Weigand wrote: > > Jakub Jelinek wrote: > >> On Tue, Jun 10, 2014 at 07:37:50PM +0200, Ulrich Weigand wrote: > >>> the following C++ test case: > >>> > >>> struct pollfd > >>>{ > >>> int fd; > >>> short int events; > >>> short int revents; > >>>}; > >>> > >>> struct Pollfd : public pollfd { }; > >>> > >>> struct Pollfd myfd[10]; > >>> > >>> int test (void) > >>> { > >>>return __builtin_object_size ((struct pollfd *)myfd, 1); > >>> } > >>> > >>> ends up returning 8 from the "test" routine, not 80. > > This example strikes me as strange. Why do you expect the size of a > pointer to the base class subobject to give a meaningful answer, any > more than taking the address of a data member? Well, it's a test case simplified from a real-world example. To quote the original mail at the start of this thread: >In the real-world application this test case was extracted from, >this causes a call: > > poll(myfd, count, 0); // 1 < count < 10 > >to fail with a "Buffer overflow detected" message at run-time >when building with _FORTIFY_SOURCE = 2 against glibc. [ Here, >there is no explicit cast, but it is implied by the prototype >of the "poll" routine. ] > >(Note that in the real-world application, the derived struct Pollfd >has some member functions to construct and pretty-print the structure, >but has no additional data members.) (https://gcc.gnu.org/ml/gcc/2014-06/msg00116.html) Thanks, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: Issue with sub-object __builtin_object_size
Jason Merrill wrote: > On 09/15/2014 11:55 AM, Ulrich Weigand wrote: > > (https://gcc.gnu.org/ml/gcc/2014-06/msg00116.html) > > > From the __builtin_object_size documentation, it's not immediately > > clear to me whether this is supposed to work or not: > > > >If the least significant > >bit is clear, objects are whole variables, if it is set, a closest > >surrounding subobject is considered the object a pointer points to. > > > > Is the presence of the above cast (explicit or implicit) supposed to > > modify the notion of "closest surrounding subobject"? > > IMO, yes. A base class subobject is a subobject, so since the least > significant bit is set, it seems to me that GCC is correctly returning > the size of that subobject. I guess I'm still a bit confused about the special handling of the array case. Even with the last bit set, array elements normally do not count as "subobjects", so __builtin_object_size still returns the size of the full array. Now in this case, we cast a pointer to the array to a pointer to a base type of the array element type -- but the intent is for the pointer to still refer to the whole array. (Of course, this only works if the base type is actually the same size as the array type.) Note that with a somewhat equivalent C construct: struct pollfd { int fd; short int events; short int revents; }; struct Pollfd { struct pollfd x; }; struct Pollfd myfd[10]; we still get an object size of 80 for either: __builtin_object_size ((struct pollfd *)myfd, 1); or even __builtin_object_size (&myfd->x, 1); so there still seems to be something special in the C++ case ... Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: Issue with sub-object __builtin_object_size
Jason Merrill wrote: > On 09/16/2014 06:23 AM, Ulrich Weigand wrote: > > Now in this case, we cast a pointer to the array to a pointer to a base > > type of the array element type -- but the intent is for the pointer to still > > refer to the whole array. (Of course, this only works if the base type > > is actually the same size as the array type.) > > And I'm arguing that this intent is not well expressed by the code. :) > > If they want to refer to the whole array, why are they casting the > pointer to a different type? And why are they passing the "subobject > only" value as the second argument? It looks more reasonable in the original source :-) What we have there is a definition of struct Pollfd that wraps the standard header file type struct pollfd and adds a couple of helpers (like constructors that set up the struct in some standard ways etc.). Subsequent source code simply uses struct Pollfd wherever struct pollfd would otherwise be used. In particular, this occurs in a call to the system "poll" routine: struct Pollfd myfd[10]; [...] poll(myfd, count, 0); // 1 < count < 10 Now, that routine comes from glibc header files, and if _FORTIFY_SOURCE is defined, it becomes an inline function that resolves after inlining and constant folding in this instance to something like: __poll_chk ((struct pollfd *)myfd, count, 0, __builtin_object_size ((struct pollfd *)myfd, 1)); The __poll_chk routine then performs a runtime check that the value of "count" is within range of the object size passed in as fourth argument. Note that the cast is actually implicit, it simply occurs because the parameter of the inlined "poll" function is "struct pollfd *", but the caller passes in a "struct Pollfd *". 
The "subobject only" value comes from glibc headers, where they use it deliberately: poll is supposed to get an array of pollfd structures, and it is OK to access those; but if that array is embedded within a larger structure, we want to avoid overruns into the rest of the structure. Is this "wrapping" of system types in a derived class supposed to work? If there is a better way to express this without breaking glibc headers, I'd be happy to let the authors know ... > > Note that with a somewhat equivalent C construct: > > > > struct pollfd > >{ > > int fd; > > short int events; > > short int revents; > >}; > > > > struct Pollfd > >{ > > struct pollfd x; > >}; > > > > struct Pollfd myfd[10]; > > > > we still get an object size of 80 for either: > > > >__builtin_object_size ((struct pollfd *)myfd, 1); > > > > or even > > > >__builtin_object_size (&myfd->x, 1); > > That strikes me as a bug, especially the second one. Jakub, any thoughts on this? Thanks, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: Issue with sub-object __builtin_object_size
Jason Merrill wrote: > Well, __builtin_object_size seems to be pretty fragile magic. To make > it work I guess you could add a wrapper for poll that takes a Pollfd and > does a reinterpret_cast, i.e. > > inline int poll(Pollfd* p, nfds_t n, int t) > { >return poll(reinterpret_cast<pollfd*>(p), n, t); > } > > so that you force the conversion to ignore the inheritance relationship > and thereby avoid the COMPONENT_REF. That does indeed seem to work for my test case, thanks! Given that the _FORTIFY_SOURCE documentation already states that: With _FORTIFY_SOURCE set to 2 some more checking is added, but some conforming programs might fail. I guess having to use a workaround like the above is good enough. Thanks, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
[RFC] GCC vector extension: binary operators vs. differing signedness
Hello, we've noticed the following behavior of the GCC vector extension, and were wondering whether this is actually intentional: When you use binary operators on two vectors, GCC will accept not only operands that use the same vector type, but also operands whose types only differ in signedness of the vector element type. The result type of such an operation (in C) appears to be the type of the first operand of the binary operator. For example, the following test case compiles: typedef signed int vector_signed_int __attribute__ ((vector_size (16))); typedef unsigned int vector_unsigned_int __attribute__ ((vector_size (16))); vector_unsigned_int test (vector_unsigned_int x, vector_signed_int y) { return x + y; } However, this variant vector_unsigned_int test1 (vector_unsigned_int x, vector_signed_int y) { return y + x; } fails to build: xxx.c: In function 'test1': xxx.c:12:3: note: use -flax-vector-conversions to permit conversions between vectors with differing element types or numbers of subparts return y + x; ^ xxx.c:12:10: error: incompatible types when returning type 'vector_signed_int {aka __vector(4) int}' but 'vector_unsigned_int {aka __vector(4) unsigned int}' was expected return y + x; ^ Given a commutative operator, this behavior seems surprising. Note that for C++, the behavior is apparently different: both test and test1 above compile as C++ code, but this version: vector_signed_int test2 (vector_unsigned_int x, vector_signed_int y) { return y + x; } which builds on C, fails on C++ with: xxx.C:17:14: note: use -flax-vector-conversions to permit conversions between vectors with differing element types or numbers of subparts return y + x; ^ xxx.C:17:14: error: cannot convert 'vector_unsigned_int {aka __vector(4) unsigned int}' to 'vector_signed_int {aka __vector(4) int}' in return This C vs. C++ mismatch likewise seems surprising. 
Now, the manual page for the GCC vector extension says: You cannot operate between vectors of different lengths or different signedness without a cast. And the change log of GCC 4.3, where the strict vector type checks (and the above-mentioned -flax-vector-conversions option) were introduced, says: Implicit conversions between generic vector types are now only permitted when the two vectors in question have the same number of elements and compatible element types. (Note that the restriction involves compatible element types, not implicitly-convertible element types: thus, a vector type with element type int may not be implicitly converted to a vector type with element type unsigned int.) This restriction, which is in line with specifications for SIMD architectures such as AltiVec, may be relaxed using the flag -flax-vector-conversions. This flag is intended only as a compatibility measure and should not be used for new code. Both of these statements appear to imply (as far as I can tell) that all the functions above ought to be rejected (unless -flax-vector-conversions). So at the very least, we should bring the documentation in line with the actual behavior. However, as seen above, that actual behavior is probably not really useful in any case, at least in C. So I'm wondering whether we should: A. Bring C in line with C++ by making the result of a vector binary operator use the unsigned type if the two input types differ in signedness? and/or B. Enforce that both operands to a vector binary operator must have the same type (except for opaque vector types) unless -flax-vector-conversions? Thanks, Ulrich PS: FYI some prior discussion of related issues that I found: https://gcc.gnu.org/ml/gcc/2006-10/msg00235.html https://gcc.gnu.org/ml/gcc/2006-10/msg00682.html https://gcc.gnu.org/ml/gcc-patches/2006-11/msg00926.html https://gcc.gnu.org/ml/gcc-patches/2013-08/msg01634.html https://gcc.gnu.org/ml/gcc-patches/2013-09/msg00450.html -- Dr. 
Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [RFC] GCC vector extension: binary operators vs. differing signedness
Richard Biener wrote: > On Wed, Dec 10, 2014 at 10:09 PM, Ulrich Weigand wrote: > > So at the very least, we should bring the documentation in line with the > > actual behavior. However, as seen above, that actual behavior is probably > > not really useful in any case, at least in C. > > > > > > So I'm wondering whether we should: > > > > A. Bring C in line with C++ by making the result of a vector binary operator > >use the unsigned type if the two input types differ in signedness? > > > > and/or > > > > B. Enforce that both operands to a vector binary operator must have the same > >type (except for opaque vector types) unless -flax-vector-conversions? > > I suppose the current behavior comes from the idea that C would > apply promotion rules to the operands of binary operators. To the extent > we can apply the same rules to vector operands we should probably > support that. This means that for example > > 1) promotions that only differ in signedness happen (partly the C frontend > behavior) > 2) promotions that change the size of the elements and thus the size of > the vector happen (probably not a good idea by default) > 3) promotions that change the kind of the elements of the vectors > (float vs. non-float) happen (probably neither a good idea) Just to clarify: your 1) is pretty much the same as my A., right? The difference in behavior between C and C++ seems to originate in different implementations in c_common_type vs. cp_common_type: C has: /* If one type is a vector type, return that type. (How the usual arithmetic conversions apply to the vector types extension is not precisely specified.) */ if (code1 == VECTOR_TYPE) return t1; if (code2 == VECTOR_TYPE) return t2; while C++ has: if (code1 == VECTOR_TYPE) { /* When we get here we should have two vectors of the same size. Just prefer the unsigned one if present. 
*/ if (TYPE_UNSIGNED (t1)) return build_type_attribute_variant (t1, attributes); else return build_type_attribute_variant (t2, attributes); } > I think it's sensible to support 1) by default (also to not regress existing > code) but fix it so it applies to C++ as well and accepts both cases > in C. And of course fix documentation. I'm not sure I understand exactly what you mean here. C++ already implements the sign promotion rule; just C doesn't. So we could do the same for C as is currently done for C++. However, if we make that change, there will be some cases that regress: the problem is that an expression "x + y" has *one* result type, and some things you do with the result will require that type to match precisely (including signedness). So *any* change that affects what that result type is will regress some code that happened to rely on the precise result type ... Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [RFC] GCC vector extension: binary operators vs. differing signedness
Richard Biener wrote: > On Thu, Dec 11, 2014 at 4:04 PM, Ulrich Weigand wrote: > > However, if we make that change, there will be some cases that regress: the > > problem is that an expression "x + y" has *one* result type, and some things > > you do with the result will require that type to match precisely (including > > signedness). So *any* change that affects what that result type is will > > regress some code that happened to rely on the precise result type ... > > True, but IMHO that's still better. You may want to check the openCL spec > which we tried to follow losely as to what we allow. > > So again, implementing your A is ok with me. Well, the openCL spec says that operations between signed and unsigned vectors are simply prohibited (both openCL 1.2 and openCL 2.0 agree on this, and it matches the behavior of my old openCL compiler ...): 6.1.2 Implicit Conversions: Implicit conversions between built-in vector data types are disallowed. 6.2.6 Usual Arithmetic Conversions: If the operands are of more than one vector type, then an error shall occur. Implicit conversions between vector types are not permitted, per section 6.2.1. 6.3 Operators: All arithmetic operators return result of the same built-in type (integer or floating-point) as the type of the operands, after operand type conversion. After conversion, the following cases are valid: - The two operands are scalars. [...] - One operand is a scalar, and the other is a vector. [...] - The two operands are vectors of the same type. [...] All other cases of implicit conversions are illegal. xlcl error message: 1506-068 (S) Operation between types "__private uint4" and "__private int4" is not allowed. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [RFC] GCC vector extension: binary operators vs. differing signedness
Richard Biener wrote: > On Fri, Dec 12, 2014 at 1:08 PM, Ulrich Weigand wrote: > > Richard Biener wrote: > >> On Thu, Dec 11, 2014 at 4:04 PM, Ulrich Weigand > >> wrote: > >> > However, if we make that change, there will be some cases that regress: > >> > the > >> > problem is that an expression "x + y" has *one* result type, and some > >> > things > >> > you do with the result will require that type to match precisely > >> > (including > >> > signedness). So *any* change that affects what that result type is will > >> > regress some code that happened to rely on the precise result type ... > >> > >> True, but IMHO that's still better. You may want to check the openCL spec > >> which we tried to follow losely as to what we allow. > >> > >> So again, implementing your A is ok with me. > > > > Well, the openCL spec says that operations between signed and unsigned > > vectors are simply prohibited (both openCL 1.2 and openCL 2.0 agree on > > this, and it matches the behavior of my old openCL compiler ...): [snip] > The question is what the fallout is if we reject this by default (I suppose > we accept it with -flax-vector-conversions). I'm ok with following > OpenCL as well, > that is either solution that makes behavior consistent between C and C++. I agree that we should certainly continue to support mixed types with -flax-vector-conversions. (This means we really should fix the C/C++ inconsistency as to return type in any case, even if we disable mixed type support by default.) What the fallout of disabling mixed types by default would be is hard to say. On the one hand, all other standards I've looked at (OpenCL, Altivec/VSX, Cell SPU) prohibit mixing signed/unsigned types. So this hopefully means users of the GCC extension don't use this feature (much). [Of course, Altivec does not really talk about operators, but ideally GCC's a + b should be equivalent to the Altivec vec_add (a, b), which does not support mixing signed/unsigned types.] 
On the other hand, I've noticed at least two related areas where disabling mixed types could result in unexpected fallout: opaque types and platform specific vector types (e.g. "vector bool" on Altivec). Opaque types are a somewhat under-specified GCC feature that is used for different purposes by various platforms and the middle-end itself: - some platform-specific types (PPC SPE __ev64_opaque__ or MEP cp_vector) - function parameters for overloaded builtins in C before resolution - the output of vector comparison operators and vector truth_type_for It should be possible to use an opaque type together with vectors of different types, even with -flax-vector-conversions (and even if we disable support for signed/unsigned mixed types); the result of an operation on an opaque type and a non-opaque type should be the non-opaque type; it's not quite clear to me how operations on two different opaque types are (should be?) defined. Platform-specific types like Altivec "vector bool" are not really known to the middle-end; this particular case is treated just like a normal unsigned integer vector type. This means that as a side-effect of disabling signed/unsigned mixed types, we would also disallow mixing signed/bool types. But at least for Altivec vec_add, the latter is explicitly *allowed* (returning the signed type). It would certainly be preferable for + to be compatible to vec_add also for the bool types. [ Note that the current implementation also does not fully match that goal, because while signed + bool is allowed, the return value is sometimes the bool type instead of the signed type. ] This can probably only be fully solved by making the middle-end aware that things like "vector bool" need to be handled specially. 
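As a concrete illustration of the "output of vector comparison operators" mentioned above (again a sketch of my own, not thread code): in the GNU extension a comparison yields a signed integer "truth" vector of the same width, which is exactly the kind of type Altivec exposes as "vector bool":

```c
#include <assert.h>

typedef int v4si __attribute__((vector_size(16)));

/* Element i of the truth vector is 0 (false) or -1, i.e. all bits
   set (true) - the middle-end's vector truth type in action.  */
int compare_elem(int i)
{
    v4si a = {1, 5, 3, 7};
    v4si b = {2, 4, 6, 6};
    v4si m = a > b;
    return m[i];
}
```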
I had thought that maybe the back-end simply needs to mark "vector bool" as an opaque type (after all, the middle-end also uses this for vector truth types), but that doesn't work as-is, since one of the other features of opaque types is that they cannot be initialized. (Altivec vector bool *can* be initialized.) Maybe those two features should be decoupled, so we can have opaque types used as truth types, and those that cannot be initialized ... So overall the list of actions probably looks like this: 1. Fix the selection of the common type to use for binary vector operations - C and C++ need to be consistent - If one type is opaque and the other isn't, use the non-opaque type - If one type is unsigned and the other is signed, use the unsigned type - What to do with different types where neither of the above rules apply? 2. Rework support for opaque and platform-specific vector types - Instead of the single "opaqu
Failure to dlopen libgomp due to static TLS data
an area in such cases), then we wouldn't have to init the DTV in init_one_static_tls, and then we could do without the dl_open_worker check. Does this sound reasonable? Bye, Ulrich P.S.: Appended is a small test case that shows the issue. Note that just two libraries using TLS suffice to trigger the problem, because module IDs are not even reliably re-used after a dlclose ...

Makefile:

all: module1.so module2.so main

clean:
	rm -f module.so module1.so module2.so main

module1.so: module.c
	gcc -g -Wall -DMODULE=1 -fpic -shared -o module1.so module.c

module2.so: module.c
	gcc -g -Wall -DMODULE=2 -fpic -shared -o module2.so module.c

main: main.c
	gcc -g -Wall -D_GNU_SOURCE -o main main.c -ldl -lpthread

main.c:

#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <pthread.h>

pthread_t thread_id;

void *thread_start (void *arg)
{
  printf ("Thread started\n");
  for (;;)
    ;
}

void run_thread (void)
{
  pthread_create (&thread_id, NULL, &thread_start, NULL);
}

void *test (const char *name)
{
  void *handle, *func;
  size_t modid;

  handle = dlopen (name, RTLD_NOW);
  if (!handle)
    {
      printf ("Cannot open %s\n", name);
      exit (1);
    }

  func = dlsym (handle, "func");
  if (!func)
    {
      printf ("Cannot find func\n");
      exit (1);
    }
  ((void (*)(void)) func) ();

  if (dlinfo (handle, RTLD_DI_TLS_MODID, &modid))
    {
      printf ("Cannot find TLS module ID\n");
      exit (1);
    }
  printf ("Module ID: %ld\n", (long) modid);

  return handle;
}

int main (void)
{
  void *m1, *m2;
  int i;

  run_thread ();

  m1 = test ("./module1.so");
  m2 = test ("./module2.so");

  for (i = 0; i < 100; i++)
    {
      dlclose (m1);
      m1 = test ("./module1.so");
      dlclose (m2);
      m2 = test ("./module2.so");
    }

  dlclose (m1);
  dlclose (m2);
  return 0;
}

module.c:

#include <stdio.h>

__thread int x __attribute__ ((tls_model ("initial-exec")));

void func (void)
{
  printf ("Module %d TLS variable is: %d\n", MODULE, x);
}

-- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: Failure to dlopen libgomp due to static TLS data
Alexander Monakov wrote: > > There's a pending patch for glibc that addresses this issue among others: > https://sourceware.org/ml/libc-alpha/2014-11/msg00469.html > > ([BZ#17090/17620/17621]: fix DTV race, assert, and DTV_SURPLUS Static TLS > limit) Ah, indeed, that would fix the issue! Thanks for pointing this out. I see that the latest revision: https://sourceware.org/ml/libc-alpha/2014-11/msg00590.html has been pinged a couple of times already, so let me add another ping :-) Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
[RFC/RFA] Obsolete Cell Broadband Engine SPU targets
Hello, the spu-elf target in GCC supports generating code for the SPU processors of the Cell Broadband Engine; it has been part of upstream GCC since 2008. However, at this point I believe this target is no longer in use: - There is no supported Cell/B.E. hardware any more. - There is no supported operating system supporting Cell/B.E. any more. I've still been running daily regression tests until now, but I'll be unable to continue to do so much longer since the systems I've been using for this will go away. Rather than leave SPU support untested/unmaintained, I'd therefore propose to declare all SPU targets obsolete in GCC 9 and remove the code with GCC 10. Any objections to this approach? Bye, Ulrich gcc/ChangeLog: * config.gcc: Mark spu* targets as deprecated/obsolete. Index: gcc/config.gcc === --- gcc/config.gcc (revision 270076) +++ gcc/config.gcc (working copy) @@ -248,6 +248,7 @@ md_file= # Obsolete configurations. case ${target} in *-*-solaris2.10* \ + | spu*-*-* \ | tile*-*-* \ ) if test "x$enable_obsolete" != xyes; then -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [RFC/RFA] Obsolete Cell Broadband Engine SPU targets
Thanks, Richi and Jeff! > * config.gcc: Mark spu* targets as deprecated/obsolete. I've now checked this in. I've also checked in the following patch to announce the change in htdocs. Bye, Ulrich Index: htdocs/gcc-9/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v retrieving revision 1.56 diff -u -r1.56 changes.html --- htdocs/gcc-9/changes.html 1 Apr 2019 14:55:53 - 1.56 +++ htdocs/gcc-9/changes.html 2 Apr 2019 13:45:27 - @@ -54,6 +54,11 @@ in the https://gcc.gnu.org/ml/gcc/2018-10/msg00139.html";> announcement. + + Cell Broadband Engine SPU (spu*-*-*). Details can be found + in the https://gcc.gnu.org/ml/gcc/2019-04/msg00023.html";> + announcement. + -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [RFC/RFA] Obsolete Cell Broadband Engine SPU targets
Eric Gallagher wrote: > On 4/2/19, Ulrich Weigand wrote: > > Hello, > > > > the spu-elf target in GCC supports generating code for the SPU processors > > of the Cell Broadband Engine; it has been part of upstream GCC since 2008. > > > > However, at this point I believe this target is no longer in use: > > - There is no supported Cell/B.E. hardware any more. > > Wait, SPU includes the Playstation 3, right? I'm pretty sure there are > still plenty of PS3s in use out there... AFAIK my university (GWU) is > still running its PS3 supercomputer cluster... (I'd have to check to > make sure...) GCC has only ever supported Linux targets on Cell/B.E. I understand Sony has removed Linux support for PS3 a long time ago. (And in fact, I believe support for PS3 in general has run out this year.) Of course, there may be some people out there still running PS3 with old firmware that runs Linux. But they will then have to also use older GCC versions, sorry. (Unless, of course, somebody steps up and volunteers to take over the maintenance of the target.) Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: -fsanitize=thread support on ppc64
Bill Schmidt wrote: > On Jan 23, 2017, at 8:32 AM, Jakub Jelinek wrote: > > > > Another question is, it seems upstream has s390{,x}-*-linux* support for > > asan/ubsan, does that work? In that case we should add it to configure.tgt > > too (similarly to the sparc*-*-linux* entry). > > CCing Uli for the s390 question. Asan support was added just recently to LLVM for s390x-linux. However, I'm not sure it will work out-of-the-box on GCC, since we haven't done any back-end work to enable it. Also, LLVM is 64-bit only, so there probably would have to be extra work in the libraries to enable 31-bit mode. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: Register conflicts between output memory address and output register
Matthew Fortune wrote: > If however the address of my_mem is lowered after IRA i.e. when validating > constraints in LRA then IRA has nothing to do as the address is just a > symbol_ref. When LRA resolves the constraint for the address it introduces > a register for the output memory address but does not seem to acknowledge > any conflict with the output register (bar) it can therefore end up > using the same register for the output memory address as the output register. > This leads to the obvious problem if the ASM updates %1 before %0 as it will > corrupt the address. > > This can of course be worked around by making (bar) an early clobber or an > in/out but this does seem unnecessary. > > The question is... Should LRA recognise a conflict between the registers > involved in the address portion of an output memory operand and any output > register operands or is this a case where you strictly have to use early > clobber. Usually, the constraints are interpreted under the assumption that inputs and outputs are used atomically, as if by a single target instruction. If you consider an instruction that both updates memory and sets a register, then most architectures will indeed allow for the output register to be the same as one of the address registers for the memory operand, and therefore GCC should allow this allocation. In those cases where this is not true, e.g. because your inline asm does in fact use multiple instructions which set the two outputs at different times, then you should indeed use the earlyclobber flag -- that is exactly what this flag is intended for. The GCC documentation of the earlyclobber flag says: Means (in a particular alternative) that this operand is an earlyclobber operand, which is written before the instruction is finished using the input operands. Therefore, this operand may not lie in a register that is read by the instruction or as part of any memory address. Note in particular "... as part of any memory address."
Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
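A minimal sketch of the recommended constraint (my own example; the empty asm template stands in for a real multi-instruction sequence, and the C statements after it do the work so the sketch stays portable). The "&" in "=&r" is what prevents the output from sharing a register with the memory operand's address registers:

```c
#include <assert.h>

static int mem_cell;

/* "=&r" marks 'out' earlyclobber: the register allocator must not put
   it in any register the asm reads, *including* registers forming the
   address of the "=m" memory operand.  Without "&", reusing an address
   register for 'out' would be a legal allocation.  */
int store_through(int val)
{
    int out;
    __asm__ ("" : "=&r" (out), "=m" (mem_cell) : "r" (val));
    out = val;        /* the empty template did nothing; do it in C */
    mem_cell = val;
    return out;
}
```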
Wrong-code bug in execute_update_addresses_taken?
Hello, the following program seems to be miscompiled at -O or higher: int main (void) { int data = 1; struct ptr { int val; } *ptr = (struct ptr *) &data; ptr->val = 0; return data; } This program should return 0, but actually returns 1. [ As far as I can tell, this program does not violate the aliasing rules; all memory accesses refer to an object of effective (because declared) type "int", and use lvalue references of type "int". The pointer cast might depend on implementation-defined behaviour, but is not undefined. In any case, there are no warnings, and use of -fno-strict-aliasing does not fix the problem. ] In 024t.forwprop1, we still have (correctly): # .MEMD.3455_4 = VDEF <.MEMD.3455_3(D)> dataD.1975 = 1; # .MEMD.3455_5 = VDEF <.MEMD.3455_4> MEM[(struct ptr *)&dataD.1975].valD.1977 = 0; # VUSE <.MEMD.3455_5> D.3453_2 = dataD.1975; return D.3453_2; but then in 025t.ealias, we get: dataD.1975_4 = 1; # .MEMD.3455_5 = VDEF <.MEMD.3455_3(D)> MEM[(struct ptr *)&dataD.1975].valD.1977 = 0; D.3453_2 = dataD.1975_4; return D.3453_2; This is apparently caused by the "addresses taken" pass, which results in those messages: No longer having address taken: data Registering new PHI nodes in block #0 Registering new PHI nodes in block #2 Updating SSA information for statement data = 1; Updating SSA information for statement D.3453_2 = data; It seems that for some reason, the pass claims to be able to eliminate all statements that take the address of "data", but then leaves the MEM reference unchanged ... Any suggestions on what might cause this problem? Thanks, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: Wrong-code bug in execute_update_addresses_taken?
Richard Guenther wrote: > A bug? Can you file a bugreport and CC me? I'll look into the > problem. Sure, this is now PR tree-optimization/47621. Thanks for looking into it! Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: libgcc: strange optimization
Richard Guenther wrote: > On Tue, Aug 2, 2011 at 3:23 PM, Ian Lance Taylor wrote: > > Richard Guenther writes: > >> I suggest to amend the documentation for local call-clobbered register > >> variables to say that the only valid sequence using them is from a > >> non-inlinable function that contains only direct initializations of the > >> register variables from constants or parameters. > > > > Let's just implement those requirements in the compiler itself. > > Doesn't work for existing code, no? And if thinking new code then > I'd rather have explicit dependences (and a way to represent them). > Thus, for example > > asm ("scall" : : "asm("r0")" (10), ...) > > thus, why force new constraints when we already can figure out > local register vars by register name? Why not extend the constraint > syntax somehow to allow specifying the same effect? Maybe it would be possible to implement this while keeping the syntax of existing code by (re-)defining the semantics of register asm to basically say that: If a variable X is declared as register asm for register Y, and X is later on used as operand to an inline asm, the register allocator will choose register Y to hold that asm operand. (And this is the full specification of register asm semantics, nothing beyond this is guaranteed.) It seems this semantics could be implemented very early on, probably in the frontend itself. The frontend would mark the *asm* statement as using the specified register (there would be no special handling of the *variable* as such, after the frontend is done). The optimizers would then simply be required to pass the asm-statement register annotations though, much like today they pass constraints through. At the point where register allocation decisions are made, those register annotations would then be acted on. Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: [named address] rejects-valid: error: initializer element is not computable at load time
Georg-Johann Lay wrote: > For the following 1-liner I get an error with current trunk r177267: > > const __pgm char * pallo = "pallo"; > > __pgm as a named address space qualifier. [snip] > Moreover, if a user writes a line like >const __pgm char * pallo = "pallo"; > he wants the string literal to be placed in the non-default address > space. There is no other way to express that except something like >const __pgm char * pallo = (const __pgm char*) "pallo"; > which doesn't work either and triggers > > pgm.c:1:1: error: initializer element is not constant > > with a tree dump similar to above. This is pretty much working as expected. "pallo" is a string literal which (always) resides in the default address space. According to the named address space specification (TR 18037) there are no string literals in non-default address spaces ... The assignment above would therefore need to convert a pointer to the string literal in the default space to a pointer to the __pgm address space. This might be impossible (depending on whether __pgm encloses the generic space), and even if it is possible, it is not guaranteed that the conversion can be done as a constant expression at compile time. What I'd do to get a string placed into the __pgm space is to explicitly initialize an *array* in __pgm space, e.g. like so: const __pgm char pallo[] = "pallo"; This is defined to work according to TR 18037, and it does actually work for me on spu-elf. Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands
DJ Delorie wrote: > > The only target supporting named address spaces today is spu-elf, > > m32c-elf does too. Huh, I totally missed that, sorry ... Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: [named address] rejects-valid: error: initializer element is not computable at load time
Georg-Johann Lay wrote: > Ulrich Weigand wrote: > > This is pretty much working as expected. "pallo" is a string literal > > which (always) resides in the default address space. According to the > > named address space specification (TR 18037) there are no string literals > > in non-default address spaces ... > > The intension of TR 18037 to supply "Extension to Support Embedded Systems" > and these are systems where every byte counts -- and it counts *where* the > byte will be placed. > > Basically named AS provide something like target specific qualifiers, and > if GCC, maybe under the umbrella of GNU-C, would actually implement a feature > like target specific qualifiers, that would be a great gain and much more > appreciated than -- users will perceive it that way -- being more catholic > than the pope ;-) The problem with all language extensions is that you really need to be careful that the new feature you want to add is fully specified, in all its potential interactions with the rest of the (existing) language features. If you don't, some of those ambiguities are certain to be causing you problems later on -- in fact, that's just what has happened time and again with GCC extensions that were added early on ... This is why these days, extensions usually are accepted only if they *are* fully specified (which usually means providing a "diff" to the C standard text that would add the feature to the standard). This is a non-trivial task. One of the reasons why we decided to follow the TR 18037 spec when implementing the __ea extension for SPU is that this task had already been done for us. If you want to deviate from that existing spec, you're back to doing this work yourself. > For example, you can have any combination of qualifiers like const, restrict > or volatile, but it is not possible for named AS. That's clear as long as > named AS is as strict as TR 18037. 
However, users want features to write > down their code an a comfortable, type-safe way and not as it is at the > moment, > i.e. by means of dreaded inline assembler macros (for my specific case). A named AS qualifier *can* be combined with other qualifiers like const. It cannot be combined with *other* named AS qualifiers, because that doesn't make sense in the semantics underlying the address space concept of TR 18037. What would you expect a combination of two AS qualifiers to mean? > > The assignment above would therefore need to convert a pointer to the > > string literal in the default space to a pointer to the __pgm address > > space. This might be impossible (depending on whether __pgm encloses > > the generic space), and even if it is possible, it is not guaranteed > > that the conversion can be done as a constant expression at compile time. > > The backend can tell. It likes to implement features to help users. > It knows about the semantic and if it's legal or not. > > And even if it's all strict under TR 18037, the resulting error messages > are *really* confusing to users because to them, a string literal's address > is known. It would be possible to the extend named AS implementation to allow AS pointer conversions in initializers in those cases where the back-end knows this can be done at load time. (Since this is all implementation-defined anyway, it would cause no issues with the standard. We simply didn't do it because on the SPU, it is not easily possible.) Of course, that still wouldn't place the string literal into the non-generic address space, it just would convert its address. > > What I'd do to get a string placed into the __pgm space is to explicitly > > initialize an *array* in __pgm space, e.g. like so: > > > > const __pgm char pallo[] = "pallo"; > > > > This is defined to work according to TR 18037, and it does actually > > work for me on spu-elf. > > Ya, but it different to the line above. 
Sure, because it allocates only the string data, and not in addition a pointer to it as your code did ... > Was just starting with the work and > it worked some time ago, so I wondered. I think some time in the past, there was a bug where initializers like in your original line were silently accepted but then incorrect code was generated (i.e. the pointer would just be initialized to an address in the generic address space, without any conversion). > And I must admit I am not familiar > will all the dreaded restriction TR 18037 imposes to render it less > functional :-( It's not a restriction so much, it's simply that TR 18037 does not say anything about string literals at all, so they keep working as they do in standard C. > Do you think a feature like "target specific qualifier" would be
Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands
Georg-Johann Lay wrote: > Ulrich Weigand wrote: > > I'd be happy to bring this up to date if you're willing to work with > > me to get this tested on a target that needs this support ... > > Just attached a patch to bugzilla because an AVR user wanted to play > with the AS support and asked me to supply my changes. It's still in > a mess but you could get a more reasonable base than on a target where > all named addresses vanish at expand. > > The patch is fresh and attached to the enhancement PR49868, just BE stuff. > There is also some sample code. OK, I'll have a look. Looking at your definition of the routines avr_addr_space_subset_p and avr_addr_space_convert, they appear to imply that any generic address can be used without conversion as a __pgm address and vice versa. That is: avr_addr_space_subset_p says that __pgm is both a superset and a subset of the generic address space, so they must be co-extensive. avr_addr_space_convert then says that the addresses can be converted between the two without changing their value. Is this really true? If so, why have a distinct __pgm address space in the first place? Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands
Georg-Johann Lay wrote: > AVR hardware has basically three address spaces: [snip] OK, thanks for the information! > Of course, RAM and Flash are no subsets of each other when regarded as > physical memory, but they are subsets when regarded as numbers. This > lead to my mistake to define RAM and Flash being no subsets of each other: >http://gcc.gnu.org/ml/gcc/2010-11/msg00170.html Right, in your situation those are *not* subsets according to the AS rules, so your avr_addr_space_subset_p routine needs to always return false (which of course implies you don't need a avr_addr_space_convert routine). Getting back to the discussion in the other thread, this also means that pointer conversions during initialization cannot happen either, so this discussion is basically moot. > However, there are situations like the following where you like to take > the decision at runtime: > > char cast_3 (char in_pgm, void * p) > { > return in_pgm ? (*((char __pgm *) p)) : (*((char *) p)); > } That's really an abuse of "void *" ... if you have an address in the Flash space, you should never assign it to a "void *". Instead, if you just have a address and you don't know ahead of time whether it refers to Flash or RAM space, you ought to hold that number in an "int" (or "short" or whatever integer type is most appropriate), and then convert from that integer type to either a "char *" or a "char __pgm *". > Linearizing the address space at compiler level is not wanted because > that lead to bulky, slow code and reduced the effective address space > available for Flash which might be up to 64kWords. I guess to simplify things like the above, you might have a third address space (e.g. "__far") that is a superset of both the default address space and the __pgm address space. Pointers in the __far address space might be e.g. 3 bytes wide, with the low 2 bytes holding the address and the high byte identifying whether the address is in Flash or RAM. 
Then a plain dereference of a __far pointer would do the equivalent of your cast_3 routine above. Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
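The suggested three-byte __far pointer can be modelled in portable C roughly as follows. Everything here (the struct layout, the tag values, the names) is a hypothetical illustration of the idea, not the actual AVR design:

```c
#include <stdint.h>

/* Hypothetical "__far" pointer: a 16-bit address plus one tag byte
   (the "high byte") identifying the physical memory it points into.  */
enum mem_space { SPACE_RAM = 0, SPACE_FLASH = 1 };

struct far_ptr {
    uint16_t addr;
    uint8_t  space;
};

static char ram[16]   = "ram-data";
static char flash[16] = "flash-data";   /* stands in for program memory */

/* A plain dereference dispatches at run time, doing the job of the
   cast_3 routine from the mail above.  */
char far_deref(struct far_ptr p)
{
    return p.space == SPACE_FLASH ? flash[p.addr] : ram[p.addr];
}
```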
Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands
Michael Matz wrote: > On Fri, 5 Aug 2011, Ulrich Weigand wrote: > > Instead, if you just have a address and you don't know ahead of time > > whether it refers to Flash or RAM space, you ought to hold that number > > in an "int" (or "short" or whatever integer type is most appropriate), > > and then convert from that integer type to either a "char *" or a > > "char __pgm *". > > That would leave standard C. You aren't allowed to construct pointers out > of random integers. C leaves integer-to-pointer conversion *implementation-defined*, not undefined, and GCC has always chosen to implement this by (usually) keeping the value unchanged: http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Arrays-and-pointers-implementation.html This works both for default and non-default address spaces. Of course, my suggested implementation would therefore rely on implementation-defined behaviour (but by simply using the __pgm address space, it does so anyway). > That would point to a third address space, call it "undef" :) It would be > superset of default and pgm, conversions between undef to {default,pgm} > are allowed freely (and value preserving, i.e. trivial). That would probably violate the named AS specification, since two different entities in the undef space would share the same pointer value ... Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
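The integer detour Ulrich suggests can be sketched in portable C using uintptr_t. It relies exactly on the implementation-defined, value-preserving integer/pointer conversion that the linked GCC documentation promises:

```c
#include <stdint.h>

/* Hold an address as a plain number, then convert it back to the
   appropriate pointer type.  Implementation-defined per C, but GCC
   documents the conversion as value-preserving, so the round trip
   recovers the original pointer.  */
int roundtrip(int v)
{
    int obj = v;
    uintptr_t raw = (uintptr_t)&obj;   /* address as an integer */
    int *back = (int *)raw;            /* ... and back to a pointer */
    return *back;
}
```

On a target with multiple address spaces, the same integer could instead be converted to either a generic or a __pgm pointer, which is the point of the suggestion.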
Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands
DJ Delorie wrote: > Was this reproducible for m32c also? I can test it if so... The patch simply passes the destination address space through to MODE_CODE_BASE_REG_CLASS and REGNO_MODE_CODE_OK_FOR_BASE_P, to allow targets to make register allocation decisions based on address space. As long as m32c doesn't implement those, just applying the patch wouldn't change anything. But if that capability *would* be helpful on your target, it would certainly be good if you could try it out ... Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: [RFC PATCH, i386]: Allow zero_extended addresses (+ problems with reload and offsetable address, "o" constraint)
Uros Bizjak wrote: > Although, it would be nice for reload to subsequently fix CSE'd > non-offsetable memory by copying address to temporary reg (*as said in > the documentation*), we could simply require an XMM temporary for > TImode reloads to/from integer registers, and this fixes ICE for x32. Moves are special as far as reload is concerned. If there is already a move instruction present *before* reload, it will get fixed up according to its constraints as any other instruction. However, reload will *introduce* new moves as part of its operation, and those will *not* themselves get reloaded. Instead, reload simply assumes that every plain move will just succeed without requiring any reload; if this is not true, the target *must* provide a secondary reload for this move. (Note that the secondary reload could also work by reloading the target address into a temporary; that's up to the target to implement.) Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: [named address] rejects-valid: error: initializer element is not computable@load time
Georg-Johann Lay wrote: > Ulrich Weigand schrieb: > > It's not a restriction so much, it's simply that TR 18037 does not say > > anything > > about string literals at all, so they keep working as they do in standard C. > > Take the following bit more elaborate example where you want to describe > a menu: [snip] > Thus you have no alternative and have to use menu1_t: > > const char __as STR_YES[] = "Yes"; > const char __as STR_NO[] = "No"; > const char __as STR_DUNNO[] = "Don't really know and will decide later"; > > const menu1_t menu3 = > { > 1, > { > STR_YES, > STR_NO, > STR_DUNNO > } > }; > > With some menues you soon end up with dozens of unnecessary variables > just because TR 18037 has a blank area on it's map. I agree that this is the best approach. You can even avoid the unnecessary variables by using a compound literal with AS-qualified char array type, and initialize the AS pointer to its address, like e.g. so: const __as char *pallo = (const __as char []){"pallo"}; That's still a bit awkward, but that could be hidden via a macro ... > This means that a line like > const __as char * pallo = "pallo"; > can be very useful and that there is exactly one meaning that makes > sense: put "pallo" in __as because otherwise you won't be able to > reference that string. That's not really true: on some targets, the generic address space will indeed be a subset of __as, so you *can* assign the address of a string literal in the generic space to an __as pointer variable. Even on the SPU, this works when the assignment is not done at initialization time: const __ea char *p; int test (void) { p = "test"; } This will generate code to convert the address of the literal "test" in the generic address space into a __ea address and assign it to the __ea pointer. > Even more preferred would be to assign the only sensible meaning > (provided a target allows it), which should be legal as that extension > is implementation defined, anyway.
What is implementation defined is whether conversion of a pointer to a different address space is allowed at *initialization* time or not. What is *not* implementation defined is that string literals are always in the generic address space -- that's simply a requirement of the standard. Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
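The macro trick mentioned above can be sketched as follows. This is an illustration, not code from the thread: `__as` stands in for whatever address-space qualifier a target provides (e.g. `__flash` on AVR or `__ea` on SPU), and it is defined away here so the sketch also compiles in the plain generic address space.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for a target-specific address-space qualifier; defined away
   so this sketch compiles on a host without named address spaces.  */
#ifndef __as
#define __as
#endif

/* Wrap a string literal in an AS-qualified compound literal, so the
   pointer initializer needs no separately named array variable.
   At file scope, a C99 compound literal has static storage duration,
   so this is a valid static initializer.  */
#define AS_STR(s) ((const __as char []){ s })

const __as char *pallo = AS_STR ("pallo");
const __as char *str_dunno = AS_STR ("Don't really know");
```

With such a macro, each menu entry can reference its string directly, avoiding the dozens of named dummy variables complained about above.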
Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands
Georg-Johann Lay wrote:
> Thanks, it works.

OK, thanks for testing!

> std Y+2,r31 ; 30 *movphi/3 [length = 7]
> std Y+1,r30

I'm actually not seeing those (maybe I'm using a different code base than you were using ...) But I still see that the frame is created. The problem is that IRA thinks it needs to allocate a pseudo on the stack, and creates a stack slot for it. But then reload goes and just reloads the pseudo into a hard register anyway, and simply doesn't need the stack slot ... but it was already allocated and accounted for such that get_frame_size () no longer returns 0.

> I frequently see IRA doing a very bad job for small register classes
> like here. Maybe it's better to take it completely away from IRA
> and write the load as
>
> (set (reg:HI)
>      (unspec:HI (post_inc:PHI (reg:PHI Z
>
> Loading from __pgm is actually an unspec, i.e. reading two times from
> the same address will yield the same result.

This really seems to be a problem in IRA somewhere, but I'd guess it would be better to fix it in there instead of working around it. Maybe you should open a bug and get in touch with the IRA maintainers ...

Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: libgcc: strange optimization
Richard Earnshaw wrote: > Better still would be to change the specification and implementation of > local register variables to only guarantee them at the beginning of ASM > statements. At other times they are simply the same as other local > variables. Now we have a problem that the register allocator knows how > to solve. This seems to be pretty much the same as my proposal here: http://gcc.gnu.org/ml/gcc/2011-08/msg00064.html But there was some push-back on requiring additional semantics by some users ... Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: i370 port
Paul Edwards wrote: > I was surprised that an instruction that is marked as s_operand > was getting a seemingly non-s_operand given to it, so I added an > "S" constraint: That's right. It is not good to have a constraint that accepts more than the predicate, since reload will at this point only consider the constraint. Adding a more restricted constraint should be the proper fix for this problem. > That then gave an actual compiler error instead of generating bad > code, which is a step forward: > > pdos.c: In function `pdosLoadExe': > pdos.c:2703: error: unable to generate reloads for: You'll need to mark your new constraint as EXTRA_MEMORY_CONSTRAINT so that reload knows what to do when an argument doesn't match. Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
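As a hedged sketch of the advice above (the macro names are real 3.x-era target macros, but their exact signatures vary between GCC versions -- later 3.x takes a constraint-string argument -- and the `s_operand` predicate is taken from the i370 port under discussion), the 'S' constraint might be wired up along these lines:

```c
/* Accept 'S' only for operands that satisfy the stricter s_operand
   predicate (storage-to-storage operands), so the constraint never
   accepts more than the insn's predicate does.  */
#define EXTRA_CONSTRAINT(OP, C) \
  ((C) == 'S' ? s_operand ((OP), GET_MODE (OP)) : 0)

/* Tell reload that 'S' is a memory constraint, so that when an operand
   does not match, reload knows it can fix things up by reloading the
   operand's address rather than giving up.  */
#define EXTRA_MEMORY_CONSTRAINT(C, STR) ((C) == 'S')
```

This is a config fragment for a port's .h file, not stand-alone compilable code.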
Re: i370 port
Paul Edwards wrote:
> Unfortunately it's not quite right, seemingly not loading R9 properly:
>
> LR    9,13
> AR    9,13
> MVC   0(10,9),0(2)

That's weird. What does the reload dump (.greg) say?

> And it had a knock-on effect too, producing bad code elsewhere:
>
> < SLR   2,2
> < SLR   3,3
> < ST    2,128(13)
> < ST    3,4+128(13)
> < ST    2,136(13)
> < ST    3,4+136(13)
> < ST    2,144(13)
> < ST    3,4+144(13)
> ---
> > MVC   128(8,13),=F'0'
> > MVC   136(8,13),=F'0'
> > MVC   144(8,13),=F'0'
>
> But I guess that is another can of worms to investigate.

It seems the literal is not marked as being doubleword. That might be related to the fact that const_ints do not carry a mode, so you cannot just look at the literal's mode to determine the required size, but have to take the full instruction into account ...

Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: i370 port
tem = op0, op0 = op1, op1 = tem;

  if (op0 != XEXP (in, 0) || op1 != XEXP (in, 1))
    in = gen_rtx_PLUS (GET_MODE (in), op0, op1);

  insn = emit_insn (gen_rtx_SET (VOIDmode, out, in));
  code = recog_memoized (insn);

Note how this actually performs the check whether to swap operands for commutativity. Can you debug this and find out why this doesn't work in your case?

Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: i370 port
Paul Edwards wrote:
> Hi Ulrich. I put in the following debug:
>
> op0 = find_replacement (&XEXP (in, 0));
> op1 = find_replacement (&XEXP (in, 1));
>
> /* Since constraint checking is strict, commutativity won't be
>    checked, so we need to do that here to avoid spurious failure
>    if the add instruction is two-address and the second operand
>    of the add is the same as the reload reg, which is frequently
>    the case. If the insn would be A = B + A, rearrange it so
>    it will be A = A + B as constrain_operands expects. */
>
> fprintf(stderr, "REGNO(out) is %d\n", REGNO(out));
> fprintf(stderr, " REG in 1 is %d\n", REGNO(XEXP(in,1)));
> if (GET_CODE (XEXP (in, 1)) == REG
>     && REGNO (out) == REGNO (XEXP (in, 1)))
>   tem = op0, op0 = op1, op1 = tem;
>
> And it produced this output (for exactly the same code I showed
> you previously):
>
> C:\devel\pdos\s370>\devel\gcc\gcc\gccmvs -da -DUSE_MEMMGR -Os -DS390 -S -I . -I ../pdpclib pdos.c
> REGNO(out) is 3
> REG in 1 is 32880
> REGNO(out) is 2
> REG in 1 is 32880
> REGNO(out) is 2
> REG in 1 is 32880
> REGNO(out) is 2
> REG in 1 is 112
> REGNO(out) is 3
> REG in 1 is 32880
> REGNO(out) is 4
> REG in 1 is 112
> REGNO(out) is 2
> REG in 1 is 112
>
> which looks to me like it is not seeing a register, only a constant,
> so cannot perform a swap.

Oops, there's clearly a bug here. "in" at this point is the original expression that has not yet been reloaded, so its second operand will indeed be a constant, not a register. However, reload has already decided that this constant will end up being replaced by a register, and that is what the "find_replacement" call is checking. So at this point in the program, XEXP (in, 1) will be the constant, but op1 will be the register it is going to be replaced with. Unfortunately the test whether to swap looks at XEXP (in, 1) -- it really needs to look at op1 instead.
Can you try changing the lines

  if (GET_CODE (XEXP (in, 1)) == REG
      && REGNO (out) == REGNO (XEXP (in, 1)))

to

  if (GET_CODE (op1) == REG
      && REGNO (out) == REGNO (op1))

instead?

Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands
Georg-Johann Lay wrote:
> http://gcc.gnu.org/ml/gcc/2011-08/msg00131.html
>
> Are you going to install that patch? Or maybe you already installed it?

No, it isn't approved yet (in fact, it isn't even posted for approval). Usually, patches that add new target macros, or new arguments to target macros, but do not actually add any *exploiter* of the new features, are frowned upon ... Thus, I'd prefer to wait until you have a patch ready that exploits this in the AVR backend, and then submit the whole series.

> Then, I wonder how the following named AS code translates:
>
> int a = *((__as int*) 1000);
>
> As const_int don't have a machmode (yet), I guess the above line just
> reads from generic AS and reading from a specific address from named AS
> cannot work.

This should work fine. Address space processing doesn't really depend on the machine mode; the address space is annotated on the MEM RTX directly. Code like the above ought to generate a MEM with either an immediate CONST_INT operand or one with the immediate loaded into a register, depending on what the target supports, but in either case the MEM_ADDR_SPACE will be set correctly. It's then up to the target to implement the access as appropriate.

> Moreover, INDEX_REG_CLASS, REGNO_OK_FOR_INDEX_P, HAVE_PRE_INCREMENT et
> al. and maybe others are AS-dependent.

I agree for INDEX_REG_CLASS and REGNO_OK_FOR_INDEX_P. In fact, I'd suggest they should first be converted to look similar to base registers (i.e. add MODE_CODE_INDEX_REG_CLASS and REGNO_MODE_CODE_OK_FOR_INDEX_P) and then add address space support to those extended macros, so as to avoid having to change lots of back-ends. Do you need this for AVR? I could add that to the patch I posted previously ...

Now for HAVE_PRE_INCREMENT, I don't think we need anything special. This is used just as a short-cut to bypass pre-increment handling completely on targets that don't support it at all.
On targets that *do*, there will always be additional requirements on just which memory accesses support pre-increment. Therefore, the middle-end will still always check the target's legitimate_address callback to ensure any particular pre-incremented memory access is actually valid. This mechanism can already look at the address space to make its decision ... Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
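The division of labour described above can be shown with a toy model in plain C (this mimics the idea, not GCC's actual hook API): a global HAVE_PRE_INCREMENT-style capability flag only short-circuits the question, while the per-access decision consults the address space.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of an address-space-aware legitimate-address check.
   AS_PGM stands for a hypothetical flash-like space that is only
   reachable through pre/post-increment addressing.  */
enum addr_space { AS_GENERIC, AS_PGM };

/* Global capability, in the spirit of HAVE_PRE_INCREMENT.  */
static const bool target_have_pre_increment = true;

static bool pre_inc_ok_for_space (enum addr_space as)
{
  /* Hypothetical target rule: only AS_PGM supports pre-increment.  */
  return as == AS_PGM;
}

/* Models the target's legitimate_address callback: even on a target
   that has pre-increment in general, each access is checked, and the
   check may look at the address space.  */
static bool legitimate_address_p (enum addr_space as, bool pre_inc)
{
  if (pre_inc)
    return target_have_pre_increment && pre_inc_ok_for_space (as);
  return true;  /* plain register addressing is always fine here */
}
```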
Re: i370 port
Paul Edwards wrote:
> if (operands[1] == const0_rtx)
>   {
>     CC_STATUS_INIT;
>     mvs_check_page (0, 6, 8);
>     return \"MVC%O0(8,%R0),=XL8'00'\";
>   }
> mvs_check_page (0, 6, 8);
> return \"MVC%O0(8,%R0),%1\";
> }"
> [(set_attr "length" "8")]
> )
>
> forces it to use XL8'00' instead of the default F'0' and that
> seems to work. Does that seem like a proper solution to you?

Well, there isn't really anything special about const0_rtx. *Any* CONST_INT that shows up as second operand to the movdi pattern must be emitted into an 8-byte literal at this point. You can do that inline; but the more usual way would be to define an operand print format that encodes the fact that a 64-bit operand is requested. In fact, looking at the i370.h PRINT_OPERAND, there already seems to be such a format: 'W'. (Maybe not quite, since 'W' sign-extends a 32-bit operand to 64-bit. But since 'W' doesn't seem to be used anyway, maybe this can be changed.)

Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands
Georg-Johann Lay wrote:
> Ulrich Weigand schrieb:
> > Georg-Johann Lay wrote:
> >
> >> http://gcc.gnu.org/ml/gcc/2011-08/msg00131.html
> >>
> >> Are you going to install that patch? Or maybe you already installed it?
> >
> > No, it isn't approved yet (in fact, it isn't even posted for approval).
> > Usually, patches that add new target macros, or new arguments to target
> > macros, but do not actually add any *exploiter* of the new features,
> > are frowned upon ...
>
> I thought about implementing a "hidden" named AS first and not exposing
> it to user land, e.g. to be able to do optimizations like
> http://gcc.gnu.org/PR49857
> http://gcc.gnu.org/PR43745
> which need named AS to express that some pointers/accesses are different.
>
> The most prominent drawback of named AS at the moment is that AVR has
> few address registers and register allocation often generates unpleasant
> code or even runs into spill fails.
>
> The AS in question can only be accessed by means of post-increment
> addressing via one single hard register.

Well, it doesn't really matter whether you want to expose the AS externally or just use it internally. Either way, I'll be happy to propose my patch for inclusion once you have a patch ready that depends on it ...

Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: Derive more alias information from named address space
Bingfeng Mei wrote:
> Therefore, A & B could only be disjoint, i.e., not aliased to each other.
> We should be able to write:
>
> if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x))
>   {
>     if (!targetm.addr_space.subset_p (MEM_ADDR_SPACE (mem), MEM_ADDR_SPACE (x))
>         && !targetm.addr_space.subset_p (MEM_ADDR_SPACE (x), MEM_ADDR_SPACE (mem)))
>       return 0;
>     else
>       return 1;
>   }
>
> Is this correct?

Yes, this looks correct to me ...

Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
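The disjointness reasoning behind that check can be modeled in plain C (an illustration of the rule, not GCC's implementation): treat each address space as a set of memory regions; two references may alias only if the spaces are equal or one is a subset of the other.

```c
#include <assert.h>
#include <stdbool.h>

/* Model an address space as a bitmask of memory "regions".  */
typedef unsigned int as_mask;

/* True if every region of SUB also lies in SUPER, mirroring
   targetm.addr_space.subset_p in spirit.  */
static bool subset_p (as_mask sub, as_mask super)
{
  return (sub & ~super) == 0;
}

/* Mirrors the logic quoted above: references in different spaces can
   only alias when one space is a subset of the other; if neither
   contains the other, the spaces are disjoint and cannot alias.  */
static bool may_alias_p (as_mask a, as_mask b)
{
  if (a == b)
    return true;
  return subset_p (a, b) || subset_p (b, a);
}
```

For example, with a generic space contained inside a larger `__ea`-style space, generic and `__ea` references may alias, while two unrelated spaces never do.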
Re: IRA changes rules of the game
Paulo J. Matos wrote:
> (define_insn_and_split "neghi_internal"
>   [(set (match_operand:QI 0 "register_operand" "=c")
>         (neg:QI (match_dup 0)))
>    (set (match_operand:QI 1 "register_operand" "=c")
>         (plus:QI
>           (plus:QI
>             (ltu:QI (neg:QI (match_dup 0)) (const_int 0))
>             (match_dup 1))
>           (const_int 0)))
>    (clobber (reg:CC RCC))]
>
> Am I missing something or something here is broken?

When reload looks at the above pattern, it will see just two operands, both of which are output-only. So when it decides to reload one of the operands, it will only provide an output reload, no input reload. For operands that are actually used for both input and output, you need to provide two match_operand clauses, and tie them using a matching constraint. Simply using match_dup doesn't accomplish that. Something along the lines of:

(define_insn_and_split "neghi_internal"
  [(set (match_operand:QI 0 "register_operand" "=c")
        (neg:QI (match_operand:QI 1 "register_operand" "0")))
   (set (match_operand:QI 2 "register_operand" "=c")
        (plus:QI
          (plus:QI
            (ltu:QI (neg:QI (match_dup 1)) (const_int 0))
            (match_operand:QI 3 "register_operand" "2"))
          (const_int 0)))
   (clobber (reg:CC RCC))]

ought to work as expected. (The remaining match_dup is fine, since reload already knows operand 1 is used as input, so the dup doesn't introduce a new type of use.)

Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: i370 port
Paul Edwards wrote: > I then found out that even with old versions of the machine definition, > I can have the warning removed by simply not defining CONST_INT > in the PREDICATE_CODES, even though it is allowed when the > function is called. ie it seems to have no effect on the code > generation, but succeeds in eliminating the warning, and without > needing to define an extra constraint for non-constant situations. Actually PREDICATE_CODES does have to match the predicate definitions. If it does not, you can run into subtle bugs, as the code in genrecog.c that generates the finite state automaton matching an RTX pattern against the .md insn definitions *does* make use of PREDICATE_CODES; for example, it will assume that two predicates with disjoint sets of PREDICATE_CODES can never both match the same RTX. This may or may not lead to visible differences when compiling simple test cases, but I'd expect effects to be visible in more complex scenarios. Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: i370 port
Paul Edwards wrote:
> GCC 4 complained (on my Linux system) that I didn't have
> various things (gimp etc) installed, which means I would need
> that other software to be ported too, which is not a project
> I want to work on at the moment. However, at least it means
> that I have successfully merged all the GCCMVS changes
> to 3.2.3 in with the (last available) 3.4.6 development, which
> was a precursor to any GCC 4 port anyway. I'll see over time
> how GCC 3.4.6 pans out.

This probably is not gimp (the graphics editor) but gmp (the multi-precision integer operation library) and mpfr (same for floating-point). To build any recent GCC you'll indeed need these two libraries. Fortunately, they're already available on s390(x) on Linux, and shouldn't really contain anything that is OS-specific, so porting those to MVS should hopefully be straightforward ...

> Until then, back at GCC 3.2.3, I have a "need" to make the entry
> point 0 in my MVS modules.
> Currently I generate this: [snip]
> for a main program. In order to get the entry point to 0, I need to
> move that @@MAIN function to the top of the file.

I don't think there is a reliable way to change the sequence of functions / objects in the output file. However, it seems to me it shouldn't really be necessary to treat "main" specially. Usually, the "entry point" isn't really "main", but rather some function in crt0.o, which then in turn *calls* main after all startup processing has been done. As crt0.o can be (and usually is) completely written in assembler, you can arrange for everything to be in the sequence that is required. (On the linker command line, crt0.o would be placed first, so if the entry point is the first thing in crt0.o it will then also be the first thing in the executable.)

> And I have another question - where is the code for __asm__?
>
> Currently that is generating garbage for me:
>
> unsigned int bytes = 258;
>
> __asm__("? 
:"); > > int main(void) > > BYTESEQU * > DCF'258' > o...@z > * Program text area > > when done in cross-compiler mode, and need to find out where > the literal is being translated (or more likely - failing to be > translated in the first instance). Any idea where that is? Hmm, this seems to be the bug fixed by Eric Christopher in 2004: http://gcc.gnu.org/ml/gcc-cvs/2004-02/msg01425.html > And final thing - the interest in the __asm__ comes from the hope > that in the generated 370 assembler, it would be possible to have > the C code interspersed to whatever extent possible. The plan was > to basically somehow get every line of C code, e.g. "int main(void)" > above, and automatically generate an: > __asm__("int main void)"); As you note, this doesn't seem workable as the result wouldn't be syntactically valid ... > which gives a syntax error. Any idea how to get the mixed > C/assembler when I'm running with the "-S" option and don't > have the luxury of calling the normal "as" which would > normally handle that? There doesn't seem to be an easy way to do this from the compiler. At the time compiler output is generated, the original source code text isn't even kept any more; you'd have to go back and re-read the original source files using the line-number debug data, just as the assembler would ... Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: i370 port
Paul Edwards wrote:
> I expected that the assembler generation wouldn't happen until
> after the optimization was completed, and thus, by then, I
> would know whether this was a main.

That depends a bit on the compiler version and optimization level, but (in particular in the 3.x time frame) GCC may output assembler code on a function-by-function basis, without necessarily reading in the whole source file first. As I said, it seems the best way would be to not care at all whether or not any particular source file contains a main routine, but instead do all entry-point processing in the crt0 startup file.

> > As crt0.o can be (and usually is) completely written in assembler,
> > you can arrange for everything to be in the sequence that is
> > required. (On the linker command line, crt0.o would be placed
> > first, so if the entry point is the first thing in crt0.o it will
> > then also be the first thing in the executable.)
>
> Yes, I can do that, but that means I need to have a linker command
> line! The way the MVS linker works, I can link my main program,
> (which obviously doesn't have crt0 in it), and, thanks to the "END"
> statement, I can specify an entry point. This means no complaint
> from the linker about a default (and wrong) entry point, and no
> need for any linker statements. It all automatically resolves.

I don't know exactly how your port handles this on MVS, but usually GCC itself will invoke the linker, and will itself prepare an appropriate command line for the linker. As part of this command line, things like crt files will be specified. You should set the STARTFILE_SPEC macro in your back-end to do that ...

> > There doesn't seem to be an easy way to do this from the
> > compiler. At the time compiler output is generated, the
> > original source code text isn't even kept any more; you'd
> > have to go back and re-read the original source files using
> > the line-number debug data, just as the assembler would ...
> > Ok, well - I would be content with just the source line number > appearing in the output assembler. Is this information > available as each assembler line is output? In current GCC, every insn contains "location" information pointing back to source code line (and column) numbers. However, in GCC 3.x I think this wasn't yet present, but you had to rely on line number notes interspersed with the insn stream. In any case, if you build with -g and use in addition the "debug assembler output" flag -dA the assembler output should contain human-readable comments containing line number information. Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: i370 port
Paul Edwards wrote: > > That depends a bit on the compiler version and optimization level, > > but (in particular in the 3.x time frame) GCC may output assembler > > code on a function-by-function basis, without necessarily reading > > in the whole source file first. > > Ok, actually it doesn't matter if it doesn't work all the time. I'll > always be compiling with -Os anyway, so it sounds like I'm in > with a chance of the whole file being read first? > > If so, where is my first opportunity, in 3.2.3, to see if there's a > "main" function in this file? Hmm, it seems 3.2.x would *always* operate on a function-by-function basis. The unit-at-a-time mode was only introduced with 3.4 (I don't recall if it was already present in 3.3). I don't think there is any way in 3.2.3 to check whether there is a "main" function in the file before it is processed ... > > I don't know exactly how your port handles this on MVS, but usually > > GCC itself will invoke the linker, and will itself prepare an > > appropriate command linker for the linker. > > GCC on MVS *only* outputs assembler. ie you always have to > use the "-S" option. > > By doing so, it turns GCC into a pure text-processing application, > which will thus work in any C90 environment. Huh. So the user will always have to invoke the linker manually, and pass all the correct options (libraries etc.)? What is the problem with having GCC itself invoke the linker, just like it does on other platforms? > > In current GCC, every insn contains "location" information pointing > > back to source code line (and column) numbers. However, in GCC 3.x > > I think this wasn't yet present, but you had to rely on line number > > notes interspersed with the insn stream. > > > > In any case, if you build with -g and use in addition the "debug > > assembler output" flag -dA the assembler output should contain > > human-readable comments containing line number information. > > The GCC assembler is never invoked. 
After getting the assembler > output from the -S option, this is fed into IFOX00/IEV90/ASMA90. As Paolo mentioned, the -dA flag is an option to the *compiler* that causes it to place additional information into its output stream (the assembler source code). Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: i370 port
Paul Edwards wrote:
> > What is the problem with having GCC itself invoke the linker, just
> > like it does on other platforms?
>
> 1. That requires extensions to the C90 standard. The behaviour of
> system() is undefined, much less even the existence of fork() etc.
>
> 2. The attempt to add functionality to system() in MVS has made
> leaps and bounds, but has not reached the stage of being able
> to detect if the SYSPRINT DCB is already open so it knows not
> to reopen it, and close it, stuffing up the caller.
>
> 3. MVS compilers don't normally invoke the linker. That's always
> a separate step. GCCMVS is an MVS compiler also. It would
> be normal to generate object code though. But the compiler
> normally generates the object code, rather than invoking the
> assembler. In any case, the facility to allocate temporary
> datasets for HLASM and deciding what space parameters
> should be used etc etc has not yet been determined, and is
> unwieldy regardless, and the functionality doesn't exist yet
> anyway, and may be years away still, if it even makes sense.

I cannot really comment on what would be desirable behaviour for MVS ... Just as an implementation note, while it is true that the GCC compiler driver needs to be able to execute other processes (preprocessor, compiler, assembler, linker) as sub-tasks, it does not require the full POSIX system/fork/exec functionality to do so. GCC relies on the libiberty "pex" family of routines, which are much narrower in scope, and have in fact been ported to several non-UNIX systems, including even MS-DOS. Providing a port of "pex" to MVS should be much easier than porting a full Unix "system" or "fork" feature.

B.t.w. if you cannot execute sub-tasks at all, how does the MVS GCC invoke the preprocessor (I think this was still a separate process in 3.2) and the core compiler (cc1) from the compiler driver? Do you even have a separate compiler driver?

Bye, Ulrich -- Dr. 
Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: i370 port
Paul Edwards wrote:
> > Hmm, it seems 3.2.x would *always* operate on a function-by-function
> > basis. The unit-at-a-time mode was only introduced with 3.4 (I don't
> > recall if it was already present in 3.3). I don't think there is any
> > way in 3.2.3 to check whether there is a "main" function in the file
> > before it is processed ...
>
> Does that mean I could take advantage of this behaviour?

I don't think this would be a good idea.

> /* Store in OUTPUT a string (made with alloca) containing an
>    assembler-name for a local static variable named NAME.
>    LABELNO is an integer which is different for each call. */
>
> #ifdef TARGET_PDPMAC
> #define ASM_FORMAT_PRIVATE_NAME(OUTPUT, NAME, LABELNO) \
> { \
>   (OUTPUT) = (char *) alloca (strlen ((NAME)) + 10); \
>   sprintf ((OUTPUT), "__%d", (LABELNO)); \
> }

How does this work? ASM_FORMAT_PRIVATE_NAME is not supposed to completely ignore the NAME argument; the function may well be called with the same LABELNO but different NAME strings, and this must not result in conflicting symbols ...

> static void
> i370_output_function_prologue (f, l)
>      FILE *f;
>      HOST_WIDE_INT l;
> {
>   /* Don't print stack and args in PDPMAC as it makes the
>      comment too long */
> #ifdef TARGET_PDPMAC
>   fprintf (f, "* %c-func %s prologue\n",
>            mvs_need_entry ? 'X' : 'S',
>            mvs_function_name);
> #else

At this point, you may refer to "current_function_decl" to retrieve information about the function currently being output. In particular, you can retrieve the original source-level name associated with the routine via DECL_NAME (current_function_decl).

Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: i370 port
Paul Edwards wrote:
> >> /* Store in OUTPUT a string (made with alloca) containing an
> >>    assembler-name for a local static variable named NAME.
> >>    LABELNO is an integer which is different for each call. */
> >>
> >> #ifdef TARGET_PDPMAC
> >> #define ASM_FORMAT_PRIVATE_NAME(OUTPUT, NAME, LABELNO) \
> >> { \
> >>   (OUTPUT) = (char *) alloca (strlen ((NAME)) + 10); \
> >>   sprintf ((OUTPUT), "__%d", (LABELNO)); \
> >> }
> >
> > How does this work? ASM_FORMAT_PRIVATE_NAME is not supposed
> > to completely ignore the NAME argument, the function may well
> > be called with the same LABELNO but different NAME strings,
> > and this must not result in conflicting symbols ...
>
> I have compiled the entire GCC and not come up with any duplicate
> static function names, so I think the number is always unique.

Hmm, I see that in the 3.2.x code base this is indeed true. However, in later compilers ASM_FORMAT_PRIVATE_NAME is used for other purposes by the middle-end, not just static function or variable names. You definitely can get number collisions in later compilers ...

> > At this point, you may refer to "current_function_decl" to
> > retrieve information about the function currently being output.
> > In particular, you can retrieve the original source-level name
> > associated with the routine via DECL_NAME (current_function_decl).
>
> Thanks a lot! I couldn't use that directly, but this:

Why not? I'd have thought something like

  printf ("%s", IDENTIFIER_POINTER (DECL_NAME (current_function_decl)));

should work fine ...

> c:\devel\gcc\gcc\config\i370>cvs diff -r 1.37 i370.c

B.t.w. if you use the -u or -c option to cvs diff, the diffs are a lot more readable ...

> <            mvs_function_name);
> ---
> >            fname_as_string(0));

This is a bit problematic as fname_as_string is a function defined in the C front-end. If you were e.g. to build the Fortran compiler, your back-end gets linked against the Fortran front-end instead of the C front-end, and that function simply will not be there. 
Generally, the rule is that the back-end must not directly call front-end routines. In any case, for C source code fname_as_string does pretty much nothing else than what I suggested above ... Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
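The collision hazard discussed above is easy to demonstrate with a hedged sketch (this is not the actual GCC macro, just its naming scheme, and `format_private_name` is a hypothetical helper): a formatter that keeps NAME in the symbol cannot emit the same assembler name for two different statics that happen to share a LABELNO.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of a private-name formatter that, unlike the PDPMAC macro
   quoted above, does not discard NAME -- so the same LABELNO with
   different NAMEs can never produce conflicting symbols.  Caller
   frees the result.  */
static char *format_private_name (const char *name, int labelno)
{
  size_t len = strlen (name) + 32;     /* room for "__", digits, NUL */
  char *out = malloc (len);
  if (out)
    snprintf (out, len, "__%s_%d", name, labelno);
  return out;
}
```

A name-only scheme like `"__%d"` works only as long as the caller guarantees unique numbers, which later GCC versions do not.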
Re: i370 port
Paul Edwards wrote:
> int
> i370_branch_dest (branch)
>      rtx branch;
> {
>   rtx dest = SET_SRC (PATTERN (branch));
>   int dest_uid;
>   int dest_addr;
>
>   /* first, compute the estimated address of the branch target */
>   if (GET_CODE (dest) == IF_THEN_ELSE)
>     dest = XEXP (dest, 1);
>   dest = XEXP (dest, 0);

This is set up only to handle direct branches of the form

  (set (pc) (label_ref ...))

and indirect branches of the form

  (set (pc) (if_then_else (...) (label_ref ...) (pc)))

but *not* indirect branches of the form

  (set (pc) (if_then_else (...) (pc) (label_ref ...)))

This latter form is accepted by the negated conditional jump instructions in the i370.md file, like so:

(define_insn ""
  [(set (pc)
        (if_then_else (eq (cc0) (const_int 0))
                      (pc)
                      (label_ref (match_operand 0 "" ""))))
   (clobber (reg:SI 14))]
  ""
  "*
{
  check_label_emit ();
  mvs_check_page (0, 4, 0);
  if (i370_short_branch (insn)
      || mvs_check_label (CODE_LABEL_NUMBER (operands[0])))
    {

Therefore, the i370_branch_dest routine needs to handle those as well. Probably something along the following lines:

  if (GET_CODE (dest) == IF_THEN_ELSE)
    {
      if (GET_CODE (XEXP (dest, 1)) == LABEL_REF)
        dest = XEXP (dest, 1);
      else
        dest = XEXP (dest, 2);
    }
  gcc_assert (GET_CODE (dest) == LABEL_REF);
  dest = XEXP (dest, 0);

Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
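The corrected logic can be exercised with a small stand-alone model of the three jump shapes (plain C structs standing in for RTL; the codes and accessors mimic GCC's, they are not the real definitions):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal RTL stand-in: a node is either PC, a LABEL_REF wrapping a
   label id, or an IF_THEN_ELSE with two arms (arm[0] and arm[1] play
   the roles of XEXP (dest, 1) and XEXP (dest, 2)).  */
enum code { PC, LABEL_REF, IF_THEN_ELSE };

struct rtx_model
{
  enum code code;
  int label;                    /* valid when code == LABEL_REF */
  struct rtx_model *arm[2];     /* valid when code == IF_THEN_ELSE */
};

/* Mirrors the fixed i370_branch_dest: for a conditional jump, pick
   whichever arm is the label_ref (handling both the normal and the
   negated form); a direct branch is already the label_ref.  */
static int branch_dest_label (struct rtx_model *dest)
{
  if (dest->code == IF_THEN_ELSE)
    dest = (dest->arm[0]->code == LABEL_REF) ? dest->arm[0] : dest->arm[1];
  assert (dest->code == LABEL_REF);
  return dest->label;
}
```

With the original one-armed logic, the negated form (pc in the first arm) would have fallen through to `XEXP (dest, 1)` and extracted `(pc)` instead of the label.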