LRA reloads of subregs
I'm working on converting sparc to LRA, and thanks probably to the work the powerpc folks did this is going much better than when I last tried this. The first major stumbling block I've run into is when LRA forces a reload for a SUBREG, and specifically there is a MEM involved that itself needs a reload due to having an invalid address. For example, simplify_operand_subreg() is working on this insn: (insn 18631 1099 1100 14 (set (reg:SI 13423) (subreg:SI (mem/c:QI (plus:SI (reg/f:SI 101 %sfp) (const_int -14269 [0xc843])) [0 %sfp+-14269 S1 A8]) 0)) x.c:104 63 {*movsi_insn} (expr_list:REG_DEAD (reg:QI 287) (nil))) lra_emit_move() (via insert_move_for_subreg()) is called (here, 'reg' is the MEM expression). Because the expression is a MEM, all of the special cased code in lra_emit_move() meant to avoid invalid displacements and indexes is not used, and it just performs a plain emit_move_insn(). Calling emit_move_insn() does not work properly because it emits code which needs reloads, to handle the too large CONST_INT offset in the MEM expression. We abort because lra_process_new_insns() expects everything emitted by insert_move_for_subreg() to be recognizable, and with that too large offset it cannot. I wonder why another target's LRA conversion hasn't hit this :-) Vlad I wonder how you'd like this to be handled? The code to handle this kind of situation is there in the process_address infrastructure.
Re: LRA reloads of subregs
From: Segher Boessenkool Date: Thu, 3 Sep 2015 20:26:51 -0500 > On Thu, Sep 03, 2015 at 03:33:56PM -0700, David Miller wrote: >> (insn 18631 1099 1100 14 (set (reg:SI 13423) >> (subreg:SI (mem/c:QI (plus:SI (reg/f:SI 101 %sfp) >> (const_int -14269 [0xc843])) [0 %sfp+-14269 >> S1 A8]) 0)) x.c:104 63 {*movsi_insn} >> (expr_list:REG_DEAD (reg:QI 287) >> (nil))) > >> I wonder why another target's LRA conversion hasn't hit this :-) > > Maybe a stupid question but... why are you seeing subregs of mem at all? > Does sparc not have INSN_SCHEDULING defined? The paradoxical subreg restriction in general_operand() is only enforced when reload_completed is true, which will not be the case while LRA is working.
Re: LRA reloads of subregs
From: Vladimir Makarov Date: Fri, 4 Sep 2015 10:00:54 -0400 > LRA porting frequently needs changing in constraints.md, .c, > and .md files. I did make such changes, trust me :-) First obstacle was that, unlike reload, LRA is very strict about register constraints. If a constraint doesn't evaluate to a register class, LRA refuses to consider it a register. So we had this ugly thing: (define_constraint "U" "Pseudo-register or hard even-numbered integer register" (and (match_test "TARGET_ARCH32") (match_code "reg") (ior (match_test "REGNO (op) < FIRST_PSEUDO_REGISTER") (not (match_test "reload_in_progress && reg_renumber [REGNO (op)] < 0"))) (match_test "register_ok_for_ldd (op)"))) A few years ago I tried the simple thing, changing this to a plain GENERAL_REGS register constraint and hoping that HARD_REGNO_OK would properly enforce the even register number requirement. Back then it didn't work but now it appears to work properly. I've included the full patch I am working with below in case you are curious. > I don't think we should add a new LRA code calling process_address > before adding insns for further processing. LRA just needs to get > operands from insns to make them valid. So again I'd try to make insn > recognizable for LRA first and only if it does not work then think > about other solutions in case when such change creates other problems > (it is hard for me to predict LRA behaviour definitely just reading > source files and not knowing sparc port well). If LRA is prepared to do a blind emit_move_insn() on an arbitrary MEM, before it even validates the displacements in such a MEM as needing reloads or not, it has to do something to accomodate this situation. If LRA had not done the special SUBREG processing on this insn, it indeed would have fixed up the invalid displacement using a reload. Anyways, here is the patch I am working with. diff --git a/gcc/config/sparc/constraints.md b/gcc/config/sparc/constraints.md index e12efa1..7a18879 100644 --- a/gcc/config/sparc/constraints.md +++ b/gcc/config/sparc/constraints.md @@ -44,6 +44,8 @@ (define_register_constraint "h" "(TARGET_V9 && TARGET_V8PLUS ? I64_REGS : NO_REGS)" "64-bit global or out register in V8+ mode") +(define_register_constraint "U" "(TARGET_ARCH32 ? GENERAL_REGS : NO_REGS)") + ;; Floating-point constant constraints (define_constraint "G" @@ -135,51 +137,6 @@ (match_code "mem") (match_test "memory_ok_for_ldd (op)"))) -;; This awkward register constraint is necessary because it is not -;; possible to express the "must be even numbered register" condition -;; using register classes. The problem is that membership in a -;; register class requires that all registers of a multi-regno -;; register be included in the set. It is add_to_hard_reg_set -;; and in_hard_reg_set_p which populate and test regsets with these -;; semantics. -;; -;; So this means that we would have to put both the even and odd -;; register into the register class, which would not restrict things -;; at all. -;; -;; Using a combination of GENERAL_REGS and HARD_REGNO_MODE_OK is not a -;; full solution either. In fact, even though IRA uses the macro -;; HARD_REGNO_MODE_OK to calculate which registers are prohibited from -;; use in certain modes, it still can allocate an odd hard register -;; for DImode values. This is due to how IRA populates the table -;; ira_useful_class_mode_regs[][]. It suffers from the same problem -;; as using a register class to describe this restriction. Namely, it -;; sets both the odd and even part of an even register pair in the -;; regset. Therefore IRA can and will allocate odd registers for -;; DImode values on 32-bit. -;; -;; There are legitimate cases where DImode values can end up in odd -;; hard registers, the most notable example is argument passing. -;; -;; What saves us is reload and the DImode splitters. Both are -;; necessary. The odd register splitters cannot match if, for -;; example, we have a non-offsetable MEM. Reload will notice this -;; case and reload the address into a single hard register. -;; -;; The real downfall of this awkward register constraint is that it does -;; not evaluate to a true register class like a bonafide use of -;; define_register_constraint would. This currently means that we cannot -;; use LRA on Sparc, since the constraint processing of LRA really depends -;; upon whether an extra constraint is for registers or not. It uses -;; reg_class_for_constraint, and checks it against NO_REGS. -(define_constraint "U" - "Pseudo-register or hard even-numbered integer register" - (and (match_test "TARGET_ARCH32") - (match_code "reg") - (ior (match_test "REGNO (op) < FIRST_PSEUDO_REGISTER") - (not (match_test "reload_in_progress && reg_renumber [REGNO (op)] < 0"))) - (match_test "register_ok_for_ldd (op)"))) - ;; Equivalent to 'T' but available in 64-bit mode (define_memory_constraint "W" "Memory refere
Re: LRA reloads of subregs
From: Segher Boessenkool Date: Fri, 4 Sep 2015 06:46:04 -0500 > On Thu, Sep 03, 2015 at 11:19:43PM -0700, David Miller wrote: >> The paradoxical subreg restriction in general_operand() is only >> enforced when reload_completed is true, which will not be the >> case while LRA is working. > > This one? > > #ifdef INSN_SCHEDULING > /* On machines that have insn scheduling, we want all memory >reference to be explicit, so outlaw paradoxical SUBREGs. >However, we must allow them after reload so that they can >get cleaned up by cleanup_subreg_operands. */ > if (!reload_completed && MEM_P (sub) > && GET_MODE_SIZE (mode) > GET_MODE_SIZE (GET_MODE (sub))) > return 0; > #endif > > I think you misread that. Also doc/rtl.texi makes pretty clear that > you really shouldn't see subregs of mem. So where does it come from? I see what you are saying, I'll take a look into this. Thanks.
Re: LRA reloads of subregs
From: David Miller Date: Fri, 04 Sep 2015 11:30:26 -0700 (PDT) > From: Segher Boessenkool > Date: Fri, 4 Sep 2015 06:46:04 -0500 > >> On Thu, Sep 03, 2015 at 11:19:43PM -0700, David Miller wrote: >>> The paradoxical subreg restriction in general_operand() is only >>> enforced when reload_completed is true, which will not be the >>> case while LRA is working. >> >> This one? >> >> #ifdef INSN_SCHEDULING >> /* On machines that have insn scheduling, we want all memory >> reference to be explicit, so outlaw paradoxical SUBREGs. >> However, we must allow them after reload so that they can >> get cleaned up by cleanup_subreg_operands. */ >> if (!reload_completed && MEM_P (sub) >>&& GET_MODE_SIZE (mode) > GET_MODE_SIZE (GET_MODE (sub))) >> return 0; >> #endif >> >> I think you misread that. Also doc/rtl.texi makes pretty clear that >> you really shouldn't see subregs of mem. So where does it come from? > > I see what you are saying, I'll take a look into this. It looks like it is created in LRA itself, initially LRA is looking at: (insn 1100 1099 1101 14 (set (reg:SI 3376) (ior:SI (subreg:SI (reg:QI 287) 0) (subreg:SI (reg:QI 289) 0))) x.c:104 234 {iorsi3} (expr_list:REG_DEAD (reg:QI 289) (expr_list:REG_DEAD (reg:QI 287) (nil in curr_insn_transform(), and emits the move: (set (reg:SI 13423) (subreg:SI (reg:QI 287) 0)) Later the reg inside of this subreg appears to get transformed into an on-stack MEM. (insn 18631 1099 1100 14 (set (reg:SI 13423) (subreg:SI (mem/c:QI (plus:SI (reg/f:SI 101 %sfp) (const_int -14269 [0xc843])) [0 %sfp+-14269 S1 A8]) 0)) x.c:104 63 {*movsi_insn} (expr_list:REG_DEAD (reg:QI 287) (nil))) I suppose perhaps I need to make the input_operand predicate more strict on sparc. So I'll look into that now.
Re: LRA reloads of subregs
From: David Miller Date: Fri, 04 Sep 2015 11:27:31 -0700 (PDT) > From: Vladimir Makarov > Date: Fri, 4 Sep 2015 10:00:54 -0400 > >> I don't think we should add a new LRA code calling process_address >> before adding insns for further processing. LRA just needs to get >> operands from insns to make them valid. So again I'd try to make insn >> recognizable for LRA first and only if it does not work then think >> about other solutions in case when such change creates other problems >> (it is hard for me to predict LRA behaviour definitely just reading >> source files and not knowing sparc port well). I've taken some time to see exactly what is going on here, perhaps you can give me some guidance, I'm quite happy to implement anything :-) We start with: (insn 1100 1099 1101 14 (set (reg:SI 3376) (ior:SI (subreg:SI (reg:QI 287) 0) (subreg:SI (reg:QI 289) 0))) x.c:104 234 {iorsi3} (expr_list:REG_DEAD (reg:QI 289) (expr_list:REG_DEAD (reg:QI 287) (nil LRA emits, in curr_insn_transform(): (set (reg:SI 13423) (subreg:SI (reg:QI 287) 0)) LRA then spills the subreg onto the stack, which gives us: (insn 18631 1099 1100 14 (set (reg:SI 13423) (subreg:SI (mem/c:QI (plus:SI (reg/f:SI 101 %sfp) (const_int -14269 [0xc843])) [0 %sfp+-14269 S1 A8]) 0)) x.c:104 63 {*movsi_insn} (expr_list:REG_DEAD (reg:QI 287) (nil))) And this is where we run into trouble in simplify_operand_subreg(), which seems to force reloads for all SUBREGs of MEM. Normally, if there were no SUBREG here, LRA would run process_address() over the MEMs in this instruction and all would be well. It is also the case that I cannot do anything special in the SPARC move emitter to handle this, as address validization is disabled when lra_in_progress is true.
Re: Converting to LRA (calling all maintainers)
From: Eric Botcazou Date: Fri, 16 Sep 2016 23:43:43 +0200 >> p.s. Are there plans for converting the SPARC port? > > There are more than plans - actual patches by DaveM that were installed at > some point and then reverted quickly because of unexpected fallout. Yeah, sparc64 failed to bootstrap and it was the stage2 that was miscompiled. I lacked the time to debug it properly so we reverted.
Re: Converting to LRA (calling all maintainers)
From: Eric Botcazou Date: Sat, 17 Sep 2016 10:18:23 +0200 >> I lacked the time to debug it properly so we reverted. > > Do you plan to give it a try again in the near future? I was going to work on this over the past summer, but other responsibilities took up all of my time. Probably the earliest I could look into this again would be November.
Re: Converting to LRA (calling all maintainers)
From: Eric Botcazou Date: Tue, 03 Jan 2017 22:22:05 +0100 >> p.s. Are there plans for converting the SPARC port? > > The SPARC port has now been converted. Thanks so much for doing this work, I wish I could have been more helpful.
Re: [sparc64] kernel OOPS with gcc 7.1 / 7.2
From: Anthony Yznaga Date: Tue, 15 Aug 2017 17:45:12 -0700 > I compiled a kernel with gcc 7 and found that the compiler inserted a > call to __multi3() in mq_attr_ok(). The sparc64 implementation of > __multi3() was added by 1b4af13ff2cc specifically for gcc 7 and later, > but it clobbers %g4 and %g5. Not sure if that was intended but it > looks like __multi3() is not safe to call from kernel code. Good catch, we have to redo the register allocation in that routine to fix this.
Re: [sparc64] kernel OOPS with gcc 7.1 / 7.2
From: Anatoly Pugachev Date: Tue, 15 Aug 2017 21:50:45 +0300 > Together with Dmitry (ldv) , we've discovered that running test suite > from strace produces kernel OOPS, when kernel is compiled with gcc 7.1 > or with gcc 7.2 , but not with gcc 6 : Please try this patch: diff --git a/arch/sparc/lib/multi3.S b/arch/sparc/lib/multi3.S index d6b6c97..703127a 100644 --- a/arch/sparc/lib/multi3.S +++ b/arch/sparc/lib/multi3.S @@ -5,26 +5,26 @@ .align 4 ENTRY(__multi3) /* %o0 = u, %o1 = v */ mov %o1, %g1 - srl %o3, 0, %g4 - mulx%g4, %g1, %o1 + srl %o3, 0, %o4 + mulx%o4, %g1, %o1 srlx%g1, 0x20, %g3 - mulx%g3, %g4, %g5 - sllx%g5, 0x20, %o5 - srl %g1, 0, %g4 + mulx%g3, %o4, %g7 + sllx%g7, 0x20, %o5 + srl %g1, 0, %o4 sub %o1, %o5, %o5 srlx%o5, 0x20, %o5 - addcc %g5, %o5, %g5 + addcc %g7, %o5, %g7 srlx%o3, 0x20, %o5 - mulx%g4, %o5, %g4 + mulx%o4, %o5, %o4 mulx%g3, %o5, %o5 sethi %hi(0x8000), %g3 - addcc %g5, %g4, %g5 - srlx%g5, 0x20, %g5 + addcc %g7, %o4, %g7 + srlx%g7, 0x20, %g7 add %g3, %g3, %g3 movcc %xcc, %g0, %g3 - addcc %o5, %g5, %o5 - sllx%g4, 0x20, %g4 - add %o1, %g4, %o1 + addcc %o5, %g7, %o5 + sllx%o4, 0x20, %o4 + add %o1, %o4, %o1 add %o5, %g3, %g2 mulx%g1, %o2, %g1 add %g1, %g2, %g1
Re: [sparc64] kernel OOPS with gcc 7.1 / 7.2
From: Anatoly Pugachev Date: Wed, 16 Aug 2017 11:42:43 +0300 > On Wed, Aug 16, 2017 at 7:30 AM, David Miller wrote: >> From: Anatoly Pugachev >> Date: Tue, 15 Aug 2017 21:50:45 +0300 >> >>> Together with Dmitry (ldv) , we've discovered that running test suite >>> from strace produces kernel OOPS, when kernel is compiled with gcc 7.1 >>> or with gcc 7.2 , but not with gcc 6 : >> >> Please try this patch: > > Dave, > > this patch fixes OOPS, thanks. Tested on ldom (gcc 7.2, git kernel + > patch, git strace). Thanks for testing.
Re: ARM unaligned MMIO access with attribute((packed))
From: Russell King - ARM Linux Date: Wed, 2 Feb 2011 16:37:02 + > 1. there's no way to tell GCC that the inline assembly is a load >instruction and therefore it needs to schedule the following >instructions appropriately. Just add a dummy '"m" (pointer)' asm input argument to the inline asm statement. Just make sure "typeof(pointer)" has a size matching the size of the load your are performing. > 2. GCC will needlessly reload pointers from structures and other such >behaviour because it can't be told clearly what the inline assembly >is doing, so the inline asm needs to have a "memory" clobber. This behavior is correct, and in fact needed. Writing to chip registers can trigger changes to arbitrary main memory locations. > 3. It seems to misses out using the pre-index addressing, prefering to >create add/sub instructions prior to each inline assembly load/store. Yes, this is indeed a problem. But you really need that memory clobber there whether you like it or not, see above.
Re: ARM unaligned MMIO access with attribute((packed))
From: Russell King - ARM Linux Date: Wed, 2 Feb 2011 21:45:22 + > On Wed, Feb 02, 2011 at 01:38:31PM -0800, David Miller wrote: >> From: Russell King - ARM Linux >> Date: Wed, 2 Feb 2011 16:37:02 + >> >> > 1. there's no way to tell GCC that the inline assembly is a load >> >instruction and therefore it needs to schedule the following >> >instructions appropriately. >> >> Just add a dummy '"m" (pointer)' asm input argument to the inline asm >> statement. Just make sure "typeof(pointer)" has a size matching the >> size of the load your are performing. > > That involves this problematical cast from a packed struct pointer to > an unsigned long pointer, which according to the C standard and GCC > folk is undefined. It's alignment may be undefined, but it's size definitely is well defined and that's what matters here. > Practice over the last 15 years on ARM has also shown that this is not > necessary. Sorry oh big super man, little ole' me is only a kernel newbie.
Re: ARM unaligned MMIO access with attribute((packed))
From: Måns Rullgård Date: Wed, 02 Feb 2011 23:08:01 + > David Miller writes: > >> From: Russell King - ARM Linux >> Date: Wed, 2 Feb 2011 16:37:02 + >> >>> 1. there's no way to tell GCC that the inline assembly is a load >>>instruction and therefore it needs to schedule the following >>>instructions appropriately. >> >> Just add a dummy '"m" (pointer)' asm input argument to the inline asm >> statement. Just make sure "typeof(pointer)" has a size matching the >> size of the load your are performing. > > That should be "m"(*pointer). Right, thanks for the correction. >> But you really need that memory clobber there whether you like it or >> not, see above. > > I don't know of any device where the side-effects are not explicitly > indicated by other means in the code triggering them, so it probably > is safe without the clobber as Russel says. You're probably right.
float "op-and-halve"
I'm planning to support some new instructions found in recent sparc cpus, specifically VIS 3.0 adds a series of "X and halve" floating-point instructions where X is one of "add" or "subtract". There are variants which negate the result as well. They operate similar to FMA in that all the operations are performed and then rounding occurs only one time at the very end. The advantage of having these "halve" variants is that since the chip does the calculations with a larger amount of precision internally, the final result cannot overflow. Does any other cpu support this, and in particular in GCC? If not, does anyone have any suggestions on how do you model this? It'd be real disappointing to have to unspec these things and only be able to access them using builtins.
Re: VIS2 pattern review
From: Richard Henderson Date: Wed, 12 Oct 2011 17:49:19 -0700 > There's a code sample 7-1 that illustrates a 16x16 multiply: > > fmul8sux16 %f0, %f1, %f2 > fmul8ulx16 %f0, %f1, %f3 > fpadd16%f2, %f3, %f4 Be wary of code examples that don't even assemble (even numbered float registers are required here). fmul8sux16 basically does, for each element: src1 = (rs1 >> 8) & 0xff; src2 = rs2 & 0x; product = src1 * src2; scaled = (product & 0x0000) >> 8; if (product & 0x80) scaled++; rd = scaled & 0x; fmul8ulx16 does the same except the assignment to src1 is: src1 = rs1 & 0xff; Therefore, I think this "16 x 16 multiply" operation isn't the kind you think it is, and it's therefore not appropriate to use this in the compiler for vector multiplies. Just for shits and grins I tried it and the slp-7 testcase, as expected, fails. The main multiply loop in that test case is compiled to: sethi %hi(.LLC6), %i3 sethi %hi(in2), %g1 ldd [%i3+%lo(.LLC6)], %f22 sethi %hi(.LLC7), %i4 sethi %hi(.LLC8), %i2 sethi %hi(.LLC9), %i3 add %fp, -256, %g2 ldd [%i4+%lo(.LLC7)], %f20 or %g1, %lo(in2), %g1 ldd [%i2+%lo(.LLC8)], %f18 mov %fp, %i5 ldd [%i3+%lo(.LLC9)], %f16 mov %g1, %g4 mov %g2, %g3 .LL10: ldd [%g4+8], %f14 ldd [%g4+16], %f12 fmul8sux16 %f14, %f22, %f26 ldd [%g4+24], %f10 fmul8ulx16 %f14, %f22, %f24 ldd [%g4], %f8 fmul8sux16 %f12, %f20, %f34 fmul8ulx16 %f12, %f20, %f32 fmul8sux16 %f10, %f18, %f30 fpadd16 %f26, %f24, %f14 fmul8ulx16 %f10, %f18, %f28 fmul8sux16 %f8, %f16, %f26 fmul8ulx16 %f8, %f16, %f24 fpadd16 %f34, %f32, %f12 std %f14, [%g3+8] fpadd16 %f30, %f28, %f10 std %f12, [%g3+16] fpadd16 %f26, %f24, %f8 std %f10, [%g3+24] std %f8, [%g3] add %g3, 32, %g3 cmp %g3, %i5 bne,pt %icc, .LL10 add%g4, 32, %g4 and it simply gives the wrong results. The entire out2[] array is all zeros.
Re: VIS2 pattern review
From: David Miller Date: Thu, 13 Oct 2011 14:26:36 -0400 (EDT) > product = src1 * src2; > > scaled = (product & 0x0000) >> 8; > if (product & 0x80) > scaled++; In fact, all of the partitioned multiply instructions scale the result by 8 bits with rounding towards positive infinity. Therefore, we have to use an unspec for all of them.
Re: VIS2 pattern review
From: Richard Henderson Date: Wed, 12 Oct 2011 17:49:19 -0700 > The comment for fpmerge_vis is not correct. > I believe that the operation is representable with > > (vec_select:V8QI > (vec_concat:V8QI > (match_operand:V4QI 1 ...) > (match_operand:V4QI 2 ...) > (parallel [ > 0 4 1 5 2 6 3 7 > ])) > > which can be used as the basis for both of the > > vec_interleave_lowv8qi > vec_interleave_highv8qi > > named patterns. Agreed. > AFAICS, this needs an unspec, like fmul8x16al. > Similarly for fmul8sux16_vis, fmuld8sux16_vis, Yes, as we found all the partitioned multiplies need to be unspecs. >> (define_code_iterator vis3_addsub_ss [ss_plus ss_minus]) >> (define_code_attr vis3_addsub_ss_insn >> [(ss_plus "fpadds") (ss_minus "fpsubs")]) >> >> (define_insn "_vis" >> [(set (match_operand:VASS 0 "register_operand" "=") >> (vis3_addsub_ss:VASS (match_operand:VASS 1 "register_operand" >> "") >> (match_operand:VASS 2 "register_operand" >> "")))] >> "TARGET_VIS3" >> "\t%1, %2, %0") > > These should be exposed as "ssadd3" "sssub3". Agreed. I'm currently regstrapping the patch at the end of this mail and will commit it to trunk if no regressions pop up. I'll look into the rest of your feedback. But I want to look into a more fundamental issue with VIS support before moving much further. I worked for several evenings on adding support for the VIS3 instructions that move directly between float and integer regs. I tried really hard to get the compiler to do something sensible but it's next to impossible for two reasons: 1) We don't represent single entry vectors using vector modes, we just use SImode, DImode etc. I think this is a huge mistake, because the compiler now thinks it can do "SImode stuff" in the float regs. Other backends are able to segregate vector vs. non-vector operations by using the single entry vector modes. 2) In addition to that, because of how we number the registers for allocation on sparc for leaf functions, the compiler starts trying to reload SImode and DImode values into the floating point registers before trying to use the non-leaf integer regs. Because the leaf allocation order is "leaf integer regs", "float regs", "non-leaf integer regs". Even if I jacked up the register move cost for the cases where the VIS3 instructions applied, it still did these kinds of reloads. This also gets reload into trouble because it believes that if it can move an address value (say Pmode == SImode) from one place to another, then a plus on the same operands can be performed (with perhaps minor reloading). But that doesn't work when this "move" is "move float reg to int reg" and therefore the operands are "%f12" and "%g3". It tries to do things like "(plus:SI (reg:SI %f12) (reg:SI %g3))" All of these troubles would be eliminated if we used vector modes for all the VIS operations instead of using SImode and DImode for the single entry vector cases. Unfortunately, that would involve some ABI changes for the VIS builtins. I'm trending towards considering just changing things anyways since the VIS intrinsics were next to unusable beforehand. I've scoured the net for examples of people actually using the GCC intrinsics before all of my recent changes, and they all fall into two categories: 1) they use inline assembler because the VIS intrinsics don't work and 2) they try to use the intrinsics but the code is disabled because it "doesn't work". Fix the RTL of some sparc VIS patterns. * config/sparc/sparc.md (UNSPEC_FPMERGE): Delete. (UNSPEC_MUL16AU, UNSPEC_MUL8, UNSPEC_MUL8SU, UNSPEC_MULDSU): New unspecs. (fpmerge_vis): Remove inaccurate comment, represent using vec_select of a vec_concat. (vec_interleave_lowv8qi, vec_interleave_highv8qi): New insns. (fmul8x16_vis, fmul8x16au_vis, fmul8sux16_vis, fmuld8sux16_vis): Reimplement as unspecs and remove inaccurate comments. (vis3_shift_patname): New code attr. (_vis): Rename to "v3". (vis3_addsub_ss_patname): New code attr. (_vis): Rename to "". diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 017594f..ae36634 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,18 @@ 2011-10-12 David S. Miller + * config/sparc/sparc.md (UNSPEC_FPMERGE): Delete. + (UNSPEC_MUL16AU, UNSPEC_MUL8, UNSPEC_MUL8SU, UNSPEC_MULDSU): New + unspecs. + (fpmerge_vis): Remove inaccurate comment, represent using vec_select + of a vec_concat. + (vec_interleave_lowv8qi, vec_interleave_highv8qi): New insns. + (fmul8x16_vis, fmul8x16au_vis, fmul8sux16_vis, fmuld8sux16_vis): + Reimplement as unspecs and remove inaccurate comments. + (vis3_shift_patname): New code attr. + (_vis): Rename to "v3". + (vis3_addsub_ss_patname): New code attr. + (_vis): Rename to "". +
Re: VIS2 pattern review
From: Richard Henderson Date: Thu, 13 Oct 2011 13:06:19 -0700 > On 10/13/2011 12:55 PM, David Miller wrote: >> -(define_insn "_vis" >> +(define_insn "" > > Missing a "3" on the end. Otherwise these look ok. Thanks for finding that. >> Unfortunately, that would involve some ABI changes for the VIS >> builtins. I'm trending towards considering just changing things >> anyways since the VIS intrinsics were next to unusable beforehand. > > Why? You can do just about anything you like inside the builtin > expander, including frobbing the modes around. Hmmm, ok, I'll look into approaching the change that way. Thanks again Richard.
Re: VIS2 pattern review
From: Eric Botcazou Date: Fri, 14 Oct 2011 00:41:42 +0200 >> Unfortunately, that would involve some ABI changes for the VIS >> builtins. I'm trending towards considering just changing things >> anyways since the VIS intrinsics were next to unusable beforehand. > > Could you elaborate? The calling conventions for vectors (like for the other > classes) shouldn't depend on the mode but only on the type. Right and as Richard said I can munge the modes during expansion of existing builtins when needed.
extending fpmuls
While working on some test cases I noticed that the 'fsmuld' instruction on sparc was not being matched by the combiner for things like: double fsmuld (float a, float b) { return a * b; } Combine does try to match: (set x (float_extend:DF (mul:SF y z))) instead of what backends (and in particular at least Sparc and Alpha) seem to use canonically for this pattern which is: (set x (mul:DF (float_extend:DF y) (float_extend:DF y))) Something similar happens for: double fnsmuld (float a, float b) { return -(a * b); } which combine should match to the *fnsmuld sparc.md pattern, but similar to above combine tries: (set x (float_extend:DF (mul:SF (neg:SF y) z))) instead of: (set x (mul:DF (neg:DF (float_extend:DF y) (float_extend:DF z Which is right? "Canonicalization of Instructions" in the internals documentation doesn't give any guidance :-)
Re: extending fpmuls
From: Jakub Jelinek Date: Tue, 25 Oct 2011 10:00:50 +0200 > I bet > double fsmuld (float a, float b) > { > return (double) a * b; > } > instead will match your pattern, then the operands are first extended > into double and then multiplied into a double product. Right, in existing testcases I've used "(double)a * (double) b" to trigger the pattern. Thanks for explaining.
cprop_reg problem on sparc
Although copy_value() in regcprop.c tries to avoid recording cases where substitutions would be illegal, there are some bad cases it still can let through. On 64-bit sparc, integer regs are 64-bit and float regs are (basically) 32-bit. So HARD_REGNO_NREGS(float_reg, DFmode) is 2, and HARD_REGNO_NREGS(integer_reg, DImode) is 1. cprop sees the sequence: (insn 330 172 230 .. (set (reg:DI %g2) const_int) (insn 171 330 173 .. (set (reg:DF %f10) (reg:DF %g2))) (insn 173 171 222 .. (set (reg:DF %f2) (reg:DF %f10))) (insn 222 173 223 .. (set (MEM:SI ..) (reg:SI %f10))) (insn 223 222 174 .. (set (MEM:SI ..) (reg:SI %f11))) And then it believes that in insn 222 it can replace %f10 with %g2, but this is not a correct transformation. cprop uses hard_regno_nregs[][] to attempt to detect illegal cases like this one, but such checks will not trigger here because hard_regno_nregs[][] is '1' for all of the registers being inspected: hard_regno_nregs[][] (reg:SI f10) 1 hard_regno_nregs[][] (reg:DI g2)1 The (set (reg:DI %g2) const_int) is generated by the *movdf_insn_sp64 insn which in turn triggers a splitter for loading float constants into integer registers. The MEM:SI stores are reloads generated by IRA for a pseudo that has to live across a call. For whatever reason it allocated only a 4-byte aligned stack location, and I suppose that is why the reload is split into 2 SImode pieces. To reproduce build gcc.c-torture/execute/ieee/mzero.c with "-m64 -mcpu=niagara3 -O2" on sparc. I'm suspecting that perhaps cprop is ok, and the real issue is that sparc's definition of CANNOT_CHANGE_MODE_CLASS needs to be adjusted.
Re: cprop_reg problem on sparc
From: Eric Botcazou Date: Thu, 27 Oct 2011 15:17:40 +0200 >> To reproduce build gcc.c-torture/execute/ieee/mzero.c with >> "-m64 -mcpu=niagara3 -O2" on sparc. > > AFAICS there is no such file as gcc.c-torture/execute/ieee/mzero.c. Sorry, the final path component should be "mzero2.c"
Re: cprop_reg problem on sparc
From: Eric Botcazou Date: Thu, 27 Oct 2011 15:17:40 +0200 >> On 64-bit sparc, integer regs are 64-bit and float regs are >> (basically) 32-bit. So HARD_REGNO_NREGS(float_reg, DFmode) is 2, and >> HARD_REGNO_NREGS(integer_reg, DImode) is 1. >> >> cprop sees the sequence: >> >> (insn 330 172 230 .. (set (reg:DI %g2) const_int) >> (insn 171 330 173 .. (set (reg:DF %f10) (reg:DF %g2))) >> (insn 173 171 222 .. (set (reg:DF %f2) (reg:DF %f10))) >> (insn 222 173 223 .. (set (MEM:SI ..) (reg:SI %f10))) >> (insn 223 222 174 .. (set (MEM:SI ..) (reg:SI %f11))) >> >> And then it believes that in insn 222 it can replace %f10 with %g2, >> but this is not a correct transformation. >> >> cprop uses hard_regno_nregs[][] to attempt to detect illegal cases >> like this one, but such checks will not trigger here because >> hard_regno_nregs[][] is '1' for all of the registers being inspected: >> >> hard_regno_nregs[][] (reg:SI f10) 1 >> hard_regno_nregs[][] (reg:DI g2)1 > > There seems to be a hole in the checks, as the number of registers is 2 for > some of the intermediate steps. I think cprop_reg can legitimately only consider the candidate read replacement (reg:SI %f10) with the most recent store to it's equivalent value (reg:DI %g2). In fact, that's exactly what cprop_reg's core algorithm is :-) The issue is how to test properly that accesses to a value in two register accesses is equivalent after a move. This pass is currently using hard_regno_nregs[][] and a paradoxical subreg test to achieve that, but it's obviously not sufficient. Note that it would actually be legal to make the transformation on insn 223, replacing %f11 with %g2. But I'll note that if IRA had properly spilled %f10 to the stack using an 8-byte aligned stack slot and DFmode, we wouldn't even be having to consider this situation. I think cprop_reg should cope with this properly, but I also think that it would be nice if we worked to minimize unnecessary mode changes like those seen here. The const-->DFmode splitter in sparc.md is partly to blame, as is IRA.
Re: cprop_reg problem on sparc
From: Eric Botcazou Date: Thu, 27 Oct 2011 23:11:33 +0200 >> Sorry, the final path component should be "mzero2.c" > > Thanks. I think we that need the same treatment in: ... > as: ... > i.e. we need to bail out if we are narrowing and this is a big-endian target. I quickly tried the patch below, but this does not prevent the transformation. diff --git a/gcc/regcprop.c b/gcc/regcprop.c index ad92a64..54e008b 100644 --- a/gcc/regcprop.c +++ b/gcc/regcprop.c @@ -448,6 +448,14 @@ find_oldest_value_reg (enum reg_class cl, rtx reg, struct value_data *vd) if (hard_regno_nregs[regno][mode] > hard_regno_nregs[regno][vd->e[regno].mode]) return NULL_RTX; + + /* And likewise, if we are narrowing on big endian the transformation +is also invalid. */ + if (hard_regno_nregs[regno][mode] + < hard_regno_nregs[regno][vd->e[regno].mode] + && (GET_MODE_SIZE (vd->e[regno].mode) > UNITS_PER_WORD + ? WORDS_BIG_ENDIAN : BYTES_BIG_ENDIAN)) + return NULL_RTX; } for (i = vd->e[regno].oldest_regno; i != regno; i = vd->e[i].next_regno)
Re: cprop_reg problem on sparc
From: Eric Botcazou Date: Thu, 27 Oct 2011 23:55:00 +0200 >> I quickly tried the patch below, but this does not prevent the >> transformation. > > The quoted code is in copyprop_hardreg_forward_1. Indeed :-) This patch below works for the specific test case, and I'll post to gcc-patches and commit it after regstrapping. Thanks Eric! Fix illegal register substitutions on big-endian during cprop_reg. * regcprop.c (copyprop_hardreg_forward_1): Reject the transformation when we narrow the mode on big endian. --- gcc/ChangeLog |5 + gcc/regcprop.c |8 2 files changed, 13 insertions(+), 0 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 403fb60..54e059e 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2011-10-27 David S. Miller + + * regcprop.c (copyprop_hardreg_forward_1): Reject the + transformation when we narrow the mode on big endian. + 2011-10-27 Jakub Jelinek * config/i386/sse.md (avx_cvtpd2dq256_2, avx_cvttpd2dq256_2, diff --git a/gcc/regcprop.c b/gcc/regcprop.c index ad92a64..b0f0343 100644 --- a/gcc/regcprop.c +++ b/gcc/regcprop.c @@ -824,6 +824,14 @@ copyprop_hardreg_forward_1 (basic_block bb, struct value_data *vd) if (hard_regno_nregs[regno][mode] > hard_regno_nregs[regno][vd->e[regno].mode]) goto no_move_special_case; + + /* And likewise, if we are narrowing on big endian the transformation +is also invalid. */ + if (hard_regno_nregs[regno][mode] + < hard_regno_nregs[regno][vd->e[regno].mode] + && (GET_MODE_SIZE (vd->e[regno].mode) > UNITS_PER_WORD + ? WORDS_BIG_ENDIAN : BYTES_BIG_ENDIAN)) + goto no_move_special_case; } /* If the destination is also a register, try to find a source -- 1.7.6.401.g6a319
scalar vector shift expansion problem on 64-bit
I'm getting an ICE on 64-bit sparc for some vector test cases but I'm not sure where the fix belongs. When the compiler expands a vecor shift by scalar into a vector shift by a vector it uses expand_vector_broadcast(), which has a comment which states: "The mode of OP must be the element mode of VMODE." But no such guarentee exists, and during the compilation of gcc.dg/vect/pr33953.c we end up trying to expand a broadcast where the vector mode is V2SI and the scalar object is of type DImode. The tree node for the shift count that gets expanded by expand_normal() into this DImode rtx looks like: unit size align 32 symtab 0 alias set 2 canonical type 0xf71e63c0 precision 32 min max pointer_to_this > visited var def_stmt GIMPLE_NOP version 12> Anyways, in the sparc backend I made use of the invariant expand_vector_broadcast() mentions, and assumed that the vector inner mode equals the mode of the RTX object being broadcast. This results in a crash because I end up making a emit_move_insn() call where the modes don't match up. So should expand_vector_broadcast() really provide this invariant to the vec_init expander, or does the vec_init expander need to tidy things up with gen_lowpart() etc. calls?
PR c++/39480 not really fixed
g++.dg/init/copy7.C makes sure that memcpy() is not emitted with src and dst equal. The fix installed absolutely relies upon a backend implementing the movmem pattern, and essentially that such a pattern will always succeed to emit for arbitrary circumstances. However 1) not all platforms implement the pattern, it is not required, therefore it cannot be relied upon for correct code generation and 2) even those that implement this pattern have certain restrictions which could legitimately be triggered and thus cause the memcpy() with src and dst equal to still be emitted. Sparc is currently failing this testcase for reason #1, but it would be trivial to make minor modifications to the copy7.C test case to get it to fail on a score of several targets as well (even those that provide a movmem pattern). So we should either install a more complete fix or open the bug back up.
Re: PR c++/39480 not really fixed
From: Richard Guenther Date: Fri, 28 Oct 2011 11:27:25 +0200 > On Fri, Oct 28, 2011 at 9:48 AM, David Miller wrote: >> >> g++.dg/init/copy7.C makes sure that memcpy() is not emitted with >> src and dst equal. > > The testcase is bogus and should be removed. See the patch I posted > (and the PR). Unfortunately the patch didn't get any review yet. Thanks. Which patch and PR are you talking about, 39480? Meanwhile, if the conclusion is that memcpy(x, x, ...) is legal, I disagree. I can think of implementations where this will break. For example, if the memcpy implementation does a cache initializing store on the first cache line of the destination before fetching the beginning of the source, such memcpy() calls with src==dst will produce garbage.
Re: PR c++/39480 not really fixed
From: Richard Guenther Date: Fri, 28 Oct 2011 12:47:30 +0200 > Then we have to fix the middle-end which will happily expand > block-moves to memcpy with exact overlap (a = a is valid in C). > See the PR and the C testcases therein. > > Just trying to avoid this in the C++ frontend is bogus. Agreed. > Of course, as Linus would say, such implementation would be "broken" ;) > (I expect that such implementations can (and would) easily avoid the situation > with almost no cost by doing an early out for src == dst). memcpy is not required to be mindful of overlapping buffers. Otherwise there is no point in having seperate memmove() and memcpy() interfaces in the first place. Even internally inside of GCC there is confusion about the movmem patterns. The documentation explicitly states that movmem need not give special consideration to the posibility that source and destination might overlap. But the comments about the expander in the MIPS backend mention that they might overlap. And reality shows that the middle-end will end up expanding such overlapping cases.
vector shift regression on sparc
gcc.dg/pr48616.c segfaults on sparc as of a day or two ago vectorizable_shift() crashes because op1_vectype is NULL and we hit this code path: /* Vector shifted by vector. */ if (!scalar_shift_arg) { optab = optab_for_tree_code (code, vectype, optab_vector); if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "vector/vector shift/rotate found."); =>if (TYPE_MODE (op1_vectype) != TYPE_MODE (vectype)) dt[1] is vect_external_def and slp_node is non-NULL. Indeed, when the 'dt' arg to vect_is_simple_use_1() is vect_external_def *vectype will be set to NULL. And scalar_shift_arg gets set to false because of this loop check: if (slp_node) { VEC (gimple, heap) *stmts = SLP_TREE_SCALAR_STMTS (slp_node); gimple slpstmt; FOR_EACH_VEC_ELT (gimple, stmts, k, slpstmt) if (!operand_equal_p (gimple_assign_rhs2 (slpstmt), op1, 0)) scalar_shift_arg = false; } Indeed, this crashing test was added by this change: 2011-10-28 Jakub Jelinek * tree-vect-stmts.c (vectorizable_shift): Give up if op1 has different vector mode from vectype's mode.
Re: [PATCH 1/1] sparc leon: add -Aleon architecture to GAS
Please post binutils patches with the binutils development list CC:'d.
Re: [PATCH 1/1] sparc leon: Use -Aleon assembler switch for -mcpu=leon arch
GCC patches are to be posted to gcc-patches, not gcc.
Re: [PATCH 1/1] sparc leon: add -Aleon architecture to GAS
From: Konrad Eisele Date: Tue, 01 Nov 2011 10:19:04 +0100 > David Miller wrote: >> >> Please post binutils patches with the binutils development list CC:'d. >> >> > > Is the binutils development list bug-binut...@gnu.org ? No, it's binut...@sourceware.org
Re: scalar vector shift expansion problem on 64-bit
From: David Miller Date: Fri, 28 Oct 2011 01:05:54 -0400 (EDT) > So should expand_vector_broadcast() really provide this invariant to > the vec_init expander, or does the vec_init expander need to tidy > things up with gen_lowpart() etc. calls? Richard I don't know if you had a chance to look into this at all yet, but I wanted to make a comment about vec_init in general. I've come to find that I want the compiler to do as little as possible with the expressions that get put into vector initializers. I don't want it to modify the mode of the individual inner elements in the assignment. I also don't want it to force mems into registers. In fact, the less it does the better. I want to make use of the special VIS load instructions that can take a QImode or HImode value in memory and load it zero extended into a 64-bit float register. For example: int x; __v8qi test_v8qi(void) { __v8qi ret = { x, x, x, x, x, x, x, x }; return ret; } I want to be able to generate: test_v8qi: sethi %hi(x + 3), %g1 or%g1, %lo(x + 3), %g1 ldda [%g1] ASI_FL8_P, %f2 sethi %hi(0x), %g2 or%g2, %lo(0x), %g2 bmask %g2, %g0, %g0 retl bshuffle %f2, %f2, %f0 but I can't because the vec_init expander sees: (parallel:V8QI [ (reg:QI 110 [ D.2249 ]) (reg:QI 110 [ D.2249 ]) (reg:QI 110 [ D.2249 ]) (reg:QI 110 [ D.2249 ]) (reg:QI 110 [ D.2249 ]) (reg:QI 110 [ D.2249 ]) (reg:QI 110 [ D.2249 ]) (reg:QI 110 [ D.2249 ]) ]) in operands[1].
serious libgcc regression added recently
My sparc-linux-gnu builds with --enable-targets=all is failing with: ../../../../gcc/libgcc/config/sparc/lb1spc.S: Assembler messages: ../../../../gcc/libgcc/config/sparc/lb1spc.S:124: Error: detected global register use not covered by .register pseudo-op ../../../../gcc/libgcc/config/sparc/lb1spc.S:134: Error: detected global register use not covered by .register pseudo-op ... It looks like it's trying to build 32-bit sparc files during the -m64 64-bit multiarch build or similar. Or, alternatively, doing the 32-bit multiarch build with 64-bit options. And others have reported link failures during the testsuite on other targets. Rainer it seems it might be your changes?
Re: serious libgcc regression added recently
From: Joel Sherrill Date: Wed, 2 Nov 2011 16:29:16 -0500 > Is this similar to what I just got for sparc-rtems when compiling > libgcc2 with -mcpu=v8? > > /tmp/cczMc4jN.s: Assembler messages: > /tmp/cczMc4jN.s:16: Error: Hardware capability "mul32" not enabled for > "smul". > /tmp/cczMc4jN.s:18: Error: Hardware capability "mul32" not enabled for > "smul". > /tmp/cczMc4jN.s:22: Error: Hardware capability "mul32" not enabled for > "umul". > > I can prepare a PR if you think it is different. I don't think so. The bug I'm hitting seems to be simply that config/sparc/t-linux64 wasn't migrated up into the libgcc configure area properly the way that config/sparc/t-linux was. I'm working on a fix right now.
Re: serious libgcc regression added recently
From: David Miller Date: Wed, 02 Nov 2011 18:30:56 -0400 (EDT) > From: Joel Sherrill > Date: Wed, 2 Nov 2011 16:29:16 -0500 > >> Is this similar to what I just got for sparc-rtems when compiling >> libgcc2 with -mcpu=v8? >> >> /tmp/cczMc4jN.s: Assembler messages: >> /tmp/cczMc4jN.s:16: Error: Hardware capability "mul32" not enabled for >> "smul". >> /tmp/cczMc4jN.s:18: Error: Hardware capability "mul32" not enabled for >> "smul". >> /tmp/cczMc4jN.s:22: Error: Hardware capability "mul32" not enabled for >> "umul". >> >> I can prepare a PR if you think it is different. > > I don't think so. The bug I'm hitting seems to be simply that > config/sparc/t-linux64 wasn't migrated up into the libgcc configure > area properly the way that config/sparc/t-linux was. Actually the problem is that libgcc/config.host checks ${host} to decide whether to append config/sparc/t-softmul to the tmake variable. But when multilibbing 64-bit libraries on a 32-bit hosted build configured with --enable-targets=all, the host will be sparc-*-linux even when building the 64-bit libgcc. So t-softmul gets appended anyways, and this causes us to try and build config/sparc/lb1spc.S for the 64-bit libgcc which we should never do. And it will do the wrong thing in the opposite case too, when building a 64-bit hosted sparc gcc, host will be sparc64-*-* when building the multilib 32-bit libgcc, and in that cast config.host will not append t-softfp even when it should. I sure wish these changes got a lot more testing before they were installed :-/
Re: serious libgcc regression added recently
From: David Miller Date: Wed, 02 Nov 2011 18:43:52 -0400 (EDT) > So t-softmul gets appended anyways, and this causes us to try and > build config/sparc/lb1spc.S for the 64-bit libgcc which we should > never do. I tried the patch below but it just results in syntax errors in the Makefile. Is this the way differences between multilib cases are going to be handled now in libgcc, with these backtick shell conditionals that (of all things) looks at the destination directory? What if I want to put 64-bit libraries in a different location such as plain 'lib/' to create a 64-bit pure system or similar? I definitely prefer how this stuff worked beforehand wherein we would know the actual "target" we're building for and we bring in the appropriate "target" makefile fragments based upon that "target". Now we just seem to look at the host and essentially include every possible target makefile that could be multilibbed out of that host. diff --git a/libgcc/config.host b/libgcc/config.host index 05f084b..47e0e73 100644 --- a/libgcc/config.host +++ b/libgcc/config.host @@ -1052,7 +1052,8 @@ sparc64-*-freebsd*|ultrasparc-*-freebsd*) ;; sparc64-*-linux*) # 64-bit SPARC's running GNU/Linux extra_parts="$extra_parts crtfastmath.o" - tmake_file="${tmake_file} t-crtfm sparc/t-linux sparc/t-linux64" + tmake_file="${tmake_file} t-crtfm sparc/t-linux sparc/t-linux64 \ +sparc/t-softmul" md_unwind_header=sparc/linux-unwind.h ;; sparc64-*-netbsd*) diff --git a/libgcc/config/sparc/t-softmul b/libgcc/config/sparc/t-softmul index 7142200..5489a37 100644 --- a/libgcc/config/sparc/t-softmul +++ b/libgcc/config/sparc/t-softmul @@ -1,2 +1,4 @@ -LIB1ASMSRC = sparc/lb1spc.S -LIB1ASMFUNCS = _mulsi3 _divsi3 _modsi3 +LIB1ASMSRC = `if test x$$($(CC) -print-multi-os-directory) \ + = x../lib64; then echo sparc/lb1spc.S; fi` +LIB1ASMFUNCS = `if test x$$($(CC) -print-multi-os-directory) \ + = x../lib64; then echo _mulsi3 _divsi3 _modsi3; fi`
Re: serious libgcc regression added recently
From: Andrew Pinski Date: Wed, 2 Nov 2011 16:40:13 -0700 > On Wed, Nov 2, 2011 at 4:28 PM, David Miller wrote: >> +LIB1ASMSRC = `if test x$$($(CC) -print-multi-os-directory) \ >> + = x../lib64; then echo sparc/lb1spc.S; fi` >> +LIB1ASMFUNCS = `if test x$$($(CC) -print-multi-os-directory) \ >> + = x../lib64; then echo _mulsi3 _divsi3 _modsi3; fi` >> > > -print-multi-directory is most likely easier to handle than os-directory. I was just copying the construct Rainer was already using in sparc/t-linux64 :-)
Re: serious libgcc regression added recently
From: "Joseph S. Myers" Date: Thu, 3 Nov 2011 00:22:49 + (UTC) > On Wed, 2 Nov 2011, David Miller wrote: > >> Actually the problem is that libgcc/config.host checks ${host} >> to decide whether to append config/sparc/t-softmul to the tmake >> variable. > > ${host} is the *target* when configuring target libraries. It doesn't represent the 'target' we're generating code for in a multilib instance so we can conditionalize off of it correctly. The only way sparc/t-softfp can be added to tmake is if the host matches sparc-*-* and that's what happens even when we're building the 64-bit libgcc in my case. It's sparc-*-* for both the 32-bit and 64-bit libgcc builds.
Re: serious libgcc regression added recently
From: "Joseph S. Myers" Date: Thu, 3 Nov 2011 01:21:35 + (UTC) > What is new is that you can now put tests in libgcc/configure.ac > such as the "Check 32bit or 64bit for x86." one, and select t-* > files based on those tests - whereas in the gcc/ directory there is > no possibility at all for the choice of t-* files to depend on the > multilib. That works for me, I'm bootstrapping the following: [PATCH] Fix multilib build of libgcc on Linux/sparc. * configure.ac: Set host_address on sparc too. * configure: Regenerate. * config.host: Add sparc/t-linux64 and sparc/t-softmul conditionally based upon host_address. * config/sparc/t-linux64: Set CRTSTUFF_T_CFLAGS unconditionally. --- libgcc/ChangeLog |8 libgcc/config.host| 17 ++--- libgcc/config/sparc/t-linux64 |3 +-- libgcc/configure |7 --- libgcc/configure.ac |7 --- 5 files changed, 31 insertions(+), 11 deletions(-) diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog index 1bbe29a..3944193 100644 --- a/libgcc/ChangeLog +++ b/libgcc/ChangeLog @@ -1,3 +1,11 @@ +2011-11-02 David S. Miller + + * configure.ac: Set host_address on sparc too. + * configure: Regenerate. + * config.host: Add sparc/t-linux64 and sparc/t-softmul conditionally + based upon host_address. + * config/sparc/t-linux64: Set CRTSTUFF_T_CFLAGS unconditionally. + 2011-11-02 Rainer Orth * gthr-single.h, gthr.h: New files. diff --git a/libgcc/config.host b/libgcc/config.host index 05f084b..647c6a1 100644 --- a/libgcc/config.host +++ b/libgcc/config.host @@ -1008,7 +1008,10 @@ sparc-*-elf*) extra_parts="$extra_parts crti.o crtn.o crtfastmath.o" ;; sparc-*-linux*)# SPARC's running GNU/Linux, libc6 - tmake_file="${tmake_file} t-crtfm sparc/t-linux64" + tmake_file="${tmake_file} t-crtfm" + if test "${host_address}" = 64; then + tmake_file="$tmake_file sparc/t-linux64" + fi case ${host} in *-leon*) tmake_file="${tmake_file} t-fdpbit" @@ -1021,7 +1024,9 @@ sparc-*-linux*) # SPARC's running GNU/Linux, libc6 *-leon[3-9]*) ;; *) - tmake_file="$tmake_file sparc/t-softmul" + if test "${host_address}" = 32; then + tmake_file="$tmake_file sparc/t-softmul" + fi ;; esac extra_parts="$extra_parts crtfastmath.o" @@ -1052,7 +1057,13 @@ sparc64-*-freebsd*|ultrasparc-*-freebsd*) ;; sparc64-*-linux*) # 64-bit SPARC's running GNU/Linux extra_parts="$extra_parts crtfastmath.o" - tmake_file="${tmake_file} t-crtfm sparc/t-linux sparc/t-linux64" + tmake_file="${tmake_file} t-crtfm sparc/t-linux" + if test "${host_address}" = 64; then + tmake_file="${tmake_file} sparc/t-linux64" + fi + if test "${host_address}" = 32; then + tmake_file="${tmake_file} sparc/t-softmul" + fi md_unwind_header=sparc/linux-unwind.h ;; sparc64-*-netbsd*) diff --git a/libgcc/config/sparc/t-linux64 b/libgcc/config/sparc/t-linux64 index ca4a892..6583fe2 100644 --- a/libgcc/config/sparc/t-linux64 +++ b/libgcc/config/sparc/t-linux64 @@ -1,2 +1 @@ -CRTSTUFF_T_CFLAGS = `if test x$$($(CC) -print-multi-os-directory) \ - = x../lib64; then echo -mcmodel=medany; fi` +CRTSTUFF_T_CFLAGS = -mcmodel=medany diff --git a/libgcc/configure b/libgcc/configure index 0d91645..0f18037 100644 --- a/libgcc/configure +++ b/libgcc/configure @@ -4609,11 +4609,12 @@ fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $libgcc_cv_cfi" >&5 $as_echo "$libgcc_cv_cfi" >&6; } -# Check 32bit or 64bit for x86. +# Check 32bit or 64bit for x86 and sparc. case ${host} in -i?86*-*-* | x86_64*-*-*) +i?86*-*-* | x86_64*-*-* | sparc*-*-*) cat > conftest.c < conftest.c <
Re: serious libgcc regression added recently
From: Jakub Jelinek Date: Thu, 3 Nov 2011 09:22:51 +0100 > On Wed, Nov 02, 2011 at 11:41:08PM -0400, David Miller wrote: >> --- a/libgcc/configure.ac >> +++ b/libgcc/configure.ac >> @@ -255,11 +255,12 @@ AC_CACHE_CHECK([whether assembler supports CFI >> directives], [libgcc_cv_cfi], >>[libgcc_cv_cfi=yes], >>[libgcc_cv_cfi=no])]) >> >> -# Check 32bit or 64bit for x86. >> +# Check 32bit or 64bit for x86 and sparc. >> case ${host} in >> -i?86*-*-* | x86_64*-*-*) >> +i?86*-*-* | x86_64*-*-* | sparc*-*-*) >>cat > conftest.c <> -#ifdef __x86_64__ >> +#if defined(__x86_64__) || \ >> +(defined(__sparc__) && defined(__arch64__)) >> host_address=64 >> #else >> host_address=32 > > I think much better would be to handle sparc*/s390*/powerpc* differently > here, just using #ifdef __LP64__ test. i?86/x86_64 is different because > of the third weirdo multilib option. Yes, using __LP64__ for non-x86 is much better. Then we can completely remove the ${host} conditional and nobody will have to hack on this piece of configure code ever again. I'll try to find time ot hack this together if nobody beats me to it.
Re: serious libgcc regression added recently
From: Jakub Jelinek Date: Thu, 3 Nov 2011 09:22:51 +0100 > I think much better would be to handle sparc*/s390*/powerpc* differently > here, just using #ifdef __LP64__ test. i?86/x86_64 is different because > of the third weirdo multilib option. I just tested and committed the following, it seemed to make no sense to keep the case statement there and now the host_address should be available to anyone who wants to make use of it in config.host [PATCH] Tweak libgcc configure test for 64-bit. libgcc/ * configure.ac: Test for 64-bit addresses on !x86 using __LP64__. * configure: Rebuild. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@181000 138bc75d-0d04-0410-961f-82ee72b054a4 --- libgcc/ChangeLog|5 + libgcc/configure| 15 +-- libgcc/configure.ac | 15 +-- 3 files changed, 15 insertions(+), 20 deletions(-) diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog index ec06a09..d3f091e 100644 --- a/libgcc/ChangeLog +++ b/libgcc/ChangeLog @@ -1,3 +1,8 @@ +2011-11-04 David S. Miller + + * configure.ac: Test for 64-bit addresses on !x86 using __LP64__. + * configure: Rebuild. + 2011-11-04 Andreas Krebbel * config/s390/t-crtstuff: Add -fPIC to CRTSTUFF_T_CFLAGS_S diff --git a/libgcc/configure b/libgcc/configure index 0f18037..1895a76 100644 --- a/libgcc/configure +++ b/libgcc/configure @@ -4609,21 +4609,16 @@ fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $libgcc_cv_cfi" >&5 $as_echo "$libgcc_cv_cfi" >&6; } -# Check 32bit or 64bit for x86 and sparc. -case ${host} in -i?86*-*-* | x86_64*-*-* | sparc*-*-*) - cat > conftest.c < conftest.c < conftest.c < conftest.c <
bootstrap regression on sparc
While building libstdc++ I get an assertion failure in haifa-sched.c, specifically the assertion on line 3437 is failing: gcc_assert (!jump_p || ((common_sched_info->sched_pass_id == SCHED_RGN_PASS) && IS_SPECULATION_BRANCHY_CHECK_P (insn)) || (common_sched_info->sched_pass_id == SCHED_EBB_PASS)); I haven't looked more deeply at it, but the first recent suspicious change are the basic block handling changes Alan made two days ago: 2011-11-09 Alan Modra * function.c (bb_active_p): Delete. (dup_block_and_redirect, active_insn_between): New functions. (convert_jumps_to_returns, emit_return_for_exit): New functions, split out from.. (thread_prologue_and_epilogue_insns): ..here. Delete shadowing variables. Don't do prologue register clobber tests when shrink wrapping already failed. Delete all last_bb_active code. Instead compute tail block candidates for duplicating exit path. Remove these from antic set. Duplicate tails when reached from both blocks needing a prologue/epilogue and blocks not needing such. * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and HAVE_simple_return. * bb-reorder.c (get_uncond_jump_length): Make global. * bb-reorder.h (get_uncond_jump_length): Declare. * cfgrtl.c (rtl_create_basic_block): Comment typo fix. (rtl_split_edge): Likewise. Warning fix. (rtl_duplicate_bb): New function. (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block. * Makefile.in (function.o): Update dependencies.
Re: bootstrap regression on sparc
From: David Miller Date: Fri, 11 Nov 2011 20:41:23 -0500 (EST) > I haven't looked more deeply at it, but the first recent suspicious change > are the basic block handling changes Alan made two days ago: > > 2011-11-09 Alan Modra > > * function.c (bb_active_p): Delete. ... I've verified that the bootstrap succeeds with this change reverted.
Re: bootstrap regression on sparc
From: Dennis Clarke Date: Sat, 12 Nov 2011 12:51:18 -0500 (EST) >> While building libstdc++ I get an assertion failure in haifa-sched.c, >> specifically the assertion on line 3437 is failing: > > I am seeing no major problems on Sparc at all. What rev of GCC are you > referring to please? As always, with head. I do my builds and bootstraps configured using "--with-cpu=niagara3".
Re: bootstrap regression on sparc
From: Joel Sherrill Date: Sat, 12 Nov 2011 08:34:29 -0600 > From my perspective, the head doesn't look so good. :( I'm extremely disappointed with how the last 2 weeks have gone as well. I can't work on any of the bugs I want to work on because the tree keeps being broken. I guess the end of stage 1 means "dump as much of your half ready not-really-tested crap into the tree as possible". The tree was more stable, and broke less often, while we were still in the midst of stage 1.
Re: bootstrap regression on sparc
From: Hans-Peter Nilsson Date: Sat, 12 Nov 2011 07:25:46 -0500 (EST) > On Fri, 11 Nov 2011, David Miller wrote: >> >> While building libstdc++ I get an assertion failure in haifa-sched.c, >> specifically the assertion on line 3437 is failing: > >> I haven't looked more deeply at it, but the first recent suspicious change >> are the basic block handling changes Alan made two days ago: > > Try <http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html> > and maybe approve or commit the patch there. That patch indeed does fix the problem for me. However, I don't feel qualified enough on CFG issues to approve or commit that patch. Someone else will need give it the OK. Thanks!
ICE in int_mode_for_mode()
For a few days a lot of new testsuite failures have popped up on sparc, wherein int_mode_for_mode() gets called with "VOIDmode" as an argument from extract_bit_field_1 because "op0" is "(const_int 0)" I have a feeling this is a known problem, but I couldn't find any discussions about this. I strongly suspect the following change caused this problem: 2011-11-16 Andreas Krebbel PR middle-end/50325 * expmed.c (store_bit_field_1): Use extract_bit_field on big endian targets if the source cannot be exactly covered by word mode chunks.
Re: Memory corruption due to word sharing
From: Michael Matz Date: Wed, 1 Feb 2012 18:41:05 +0100 (CET) > One problem is that it's not a new problem, GCC emitted similar code since > about forever, and still they turned up only now (well, probably because > ia64 is dead, but sparc64 should have similar problems). Indeed, on sparc64 it does do the silly 64-bit access too: wrong: ldx [%o0+8], %g2 sethi %hi(2147483648), %g1 or %g2, %g1, %g1 jmp %o7+8 stx%g1, [%o0+8] Personally I've avoided C bitfields like the plague in any code I've written.
Re: Preparing to merge ARM/hard_vfp_branch to trunk
From: Eric Botcazou Date: Wed, 5 Aug 2009 17:59:01 +0200 >> I believe that I could legitimately approve that patch myself (it's >> pretty trivial and I didn't author it), but I'd prefer to get approval >> from one of the SPARC maintainers. Here's your chance: >> >> http://gcc.gnu.org/ml/gcc-patches/2009-04/msg01027.html > > OK. It looks fine to me too.
LTO and asm specs...
There is one g++ LTO test case (g++.lto/20090303) that fails on sparc, it compiles the intermediate objects with -fPIC but the final compilation creates an executable. The problem is that when LTO re-instantiates the options for the individual builds, the proper ASM specs of the target are not executed, so in this case "-K PIC" is not passed down to the assembler in response to "-fPIC". As a consequence, relocations against _GLOBAL_OFFSET_TABLE_ in code like this: sethi %hi(_GLOBAL_OFFSET_TABLE_), %g1 use the R_SPARC_HI22 relocation instead of R_SPARC_PC22. Thus the program crashes. I couldn't figure out immediately how to fix this as the way LTO does spec overriding and such looked non-trivial. Thanks.
fixincl 'make check' regressions...
Ever since your changes installed on March 12th, I've been getting fixincludes testsuite failures of the form below. I also notice that none of these changes added ChangeLog entries, and furthermore the SVN commit messages were extremely terse so it was hard to diagnose the intent or reasoning behind your changes. iso/math_c99.h /home/davem/src/GIT/GCC/gcc/fixincludes/tests/base/iso/math_c99.h differ: char 1366, line 52 *** iso/math_c99.h Mon Mar 15 22:55:36 2010 --- /home/davem/src/GIT/GCC/gcc/fixincludes/tests/base/iso/math_c99.h Thu Jan 21 04:06:11 2010 *** *** 49,55 ? __builtin_signbitf(x) \ : sizeof(x) == sizeof(long double) \ ? __builtin_signbitl(x) \ !: __builtin_signbit(x)); #endif /* SOLARIS_MATH_8_CHECK */ --- 49,55 ? __builtin_signbitf(x) \ : sizeof(x) == sizeof(long double) \ ? __builtin_signbitl(x) \ !: __builtin_signbit(x)) #endif /* SOLARIS_MATH_8_CHECK */ There were fixinclude test FAILURES
Re: LTO and asm specs...
From: Richard Henderson Date: Tue, 16 Mar 2010 11:31:44 -0700 > On 03/12/2010 09:33 PM, David Miller wrote: >> I couldn't figure out immediately how to fix this as the >> way LTO does spec overriding and such looked non-trivial. > > It would not be a bad thing, IMO, if the sparc assembler > were extended to be able to emit any reloc directly, without > needing a specific command-line option. Then you'd only > encounter this problem with legacy assemblers. It's not the assemblers fault. We're using %hi() and expecting the assembler to emit a PC relative relcation just because the symbol name happens to be _GLOBAL_OFFSET_TABLE_ And it will do this, but only when -PIC. Changing that is pretty dangerous. But even if we got past that, we need to get the assembler options right in order to enable instruction classes. For example we have to get -Av9a there when using VIS instructions. Other platforms are going to hit things like this too. LTO really needs to evaluate the specs correctly.
Re: LTO and asm specs...
From: Richard Henderson Date: Tue, 16 Mar 2010 12:53:47 -0700 > On 03/16/2010 12:28 PM, David Miller wrote: >> It's not the assemblers fault. >> >> We're using %hi() and expecting the assembler to emit a >> PC relative relcation just because the symbol name happens >> to be _GLOBAL_OFFSET_TABLE_ And it will do this, but only >> when -PIC. Changing that is pretty dangerous. > > It is the assembler's fault because it doesn't provide %pcrelhi() or > some such to allow the compiler (or asm programmer) to emit exactly > the relocation that's desired. There is %pc22() and %pc10. I don't know if it's safe to change gcc to use them in all cases though. >> But even if we got past that, we need to get the assembler options >> right in order to enable instruction classes. For example we have to >> get -Av9a there when using VIS instructions. > > How about ".arch v9a" like other platforms emit? > > Command-line options that control what the assembler emits for > the exact same bit of text are a Really Bad Idea, as we've seen > from other platforms time and time again. I think this distracts from the issue that LTO needs to process specs properly. Are you seriously against fixing that LTO bug?
Re: fixincl 'make check' regressions...
You said you would fix this several nights ago, but I still haven't seen any changes to fixincludes since then. When will you get around to fixing these regressions you introduced? Thank you.
Re: DWARF register numbering discrepancy on SPARC between GCC and GDB
From: Eric Botcazou Date: Wed, 21 Jan 2009 15:22:19 +0100 > > Obviously the GCC folks broke backwards compatibility with themselves. > > So unless we find evidence that contradicts the wiki page you cite, I > > think GCC needs to be fixed. > > Yes, the SVR4 definition used to be masked by that of the sol2.h file on > Solaris and is not anymore. But the SVR4 definition is the one used for > the various BSD variants. Ok, so it seems the fix is to reinstate the override in sol2.h, right?
Re: RFD: simple instruction cache code layout heuristics
From: Ian Lance Taylor Date: Tue, 27 Jan 2009 13:16:31 -0800 > A co-worker of mine at Google did some experiments along those lines > using gold. He was not able to demonstrate any performance > improvements, unfortunately (x86_64 target, proprietary test cases). > He was hacking linker scripts. This sort of optimization helps more on cpus with f.e. software managed TLBs. The higher the TLB miss cost, the more the benefit. This is why Sun's tools have supported this optimization for a long time, since 64-bit sparc has been predominantly using software managed TLB refill. I think it's the "-xlinkopt" options that do this with the SunPRO compiler tools.
Re: The state of glibc libm
From: "Joseph S. Myers" Date: Wed, 29 Feb 2012 17:17:17 + (UTC) Thanks for looking into all of these issues. > (c) Various functions do not set errno correctly (many cases) or raise the > proper floating-point exceptions (a smaller number of cases - both > spurious exceptions where not permitted by ISO C, and failing to raise > required overflow/underflow exceptions). In general this is a separate > bug for each function (filed as many separate bugs in glibc Bugzilla) and > can be fixed by a separate local patch for each function (adding a > testcase, of course - note that glibc's main libm-test.inc presently only > tests invalid and divide-by-zero exceptions, so if working on these error > handling issues it might be useful to extend it to cover other exceptions > as well as errno values). This reminds me that there are math tests local to the powerpc port that therefore only run on powerpc. For example, I've looked at sysdeps/powerpc/test-arith{,f}.c and they don't seem so non-portable that we couldn't run them everywhere with just some small tweaks.
Re: The state of glibc libm
From: Jeff Law Date: Wed, 14 Mar 2012 10:41:27 -0600 > The better performance with potential loss of accuracy is an across > the board request, it's not just a sin/cos issue. All the trig, exp, > pow, log, etc, which I don't think are necessarily covered by using > the old x87 fp unit. This is no surprise to me, vendors have commonly provided some kind of "optimized libm" library that provided better performance with less accuracy. For example, Sun has always had a variant of libm optimized in this way, called "libmopt", with it's SunPRO compilers.
expansion of vector shifts...
On sparc a simple test like (from the PR tree-optimization/53410 testcase): typedef int V __attribute__((vector_size (4 * sizeof (int; typedef unsigned int W __attribute__((vector_size (4 * sizeof (int; void f10 (W *p, W *q) { *p = *p < (((const W) { 1U, 1U, 1U, 1U }) << *q); } aborts in convert_move() because we're trying to move a TImode value into a V2SImode one. How does that happen? On sparc the generic tree vector layer turns the above expression into two V2SImode shifts. The *q parts of each shift are represented as: (subreg:V2SI (reg:TI xxx) 0) (subreg:V2SI (reg:TI xxx) 8) When we get down into expand_shift_1(), that SUBREG is stripped out by the SHIFT_COUNT_TRUNCATED code, and that's how we end up in the crash by the time we reach convert_move() (via expand_binop() --> expand_binop_directly() --> convert_modes() --> convert_move()). Perhaps we should elide the SUBREG stripping if the subreg has a vector mode? Actually, what seems to confuse this code is that we're passing TImode values around for this vector that the target doesn't have direct support for. The SUBREG stripper explicitly checks for INTEGRAL_MODE_P, and indeed TImode is integral.
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
From: Jakub Jelinek Date: Mon, 29 Oct 2012 18:56:42 +0100 > I'd like to close the stage 1 phase of GCC 4.8 development > on Monday, November 5th. If you have still patches for new features you'd > like to see in GCC 4.8, please post them for review soon. Patches > posted before the freeze, but reviewed shortly after the freeze, may > still go in, further changes should be just bugfixes and documentation > fixes. I'd like to get the Sparc cbcond stuff in (3 revisions posted) which is waiting for Eric B. to do some Solaris specific work. I'd also like to enable LRA for at least 32-bit sparc, even if I can't find the time to work on auditing 64-bit completely.
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
From: Eric Botcazou Date: Mon, 29 Oct 2012 20:25:15 +0100 >> I'd like to get the Sparc cbcond stuff in (3 revisions posted) which >> is waiting for Eric B. to do some Solaris specific work. >> >> I'd also like to enable LRA for at least 32-bit sparc, even if I can't >> find the time to work on auditing 64-bit completely. > > End of stage #1 isn't a hard limit for architecture-specific patches, so we > need not make a decision about LRA immediately. I don't think we want to > half > enable it though, so it's all or nothing. Upon further consideration, agreed. I'll only turn this on if I can get the whole backend working. FWIW, I think we should consider delaying stage1 for another reason. A large number of North American developers are about to be hit by a major natural disaster, and may be without power for weeks.
LRA unaligned reloads
On 32-bit sparc with LRA enabled we have the following (this generated for gcc.dg/vect/pr51581-4.c with -flto): (insn 252 142 165 4 (set (reg:HI 234 [ D.1511 ]) (mem/c:HI (plus:SI (reg/f:SI 1307) (const_int 24 [0x18])) [4 b+24 S2 A64])) test.c:23 59 {*movhi_insn} (expr_list:REG_EQUIV (mem/c:HI (plus:SI (reg/f:SI 1307) (const_int 24 [0x18])) [4 b+24 S2 A64]) (expr_list:REG_EQUAL (mem/c:HI (const:SI (plus:SI (symbol_ref:SI ("b.1425") [flags 0x2] <\ var_decl 0xf7519740 b>) (const_int 24 [0x18]))) [4 b+24 S2 A64]) (nil ... (insn 255 221 233 4 (set (reg:SI 1285) (ashift:SI (subreg:SI (reg:HI 234 [ D.1511 ]) 0) (const_int 16 [0x10]))) test.c:23 306 {ashlsi3} (expr_list:REG_DEAD (reg:HI 234 [ D.1511 ]) (nil))) When LRA does reloads for insn 255, it tries to strip the subregs for pseudo 234 and use SImode moves. (insn 1367 1732 1712 4 (set (reg:SI 12 %o4 [1425]) (mem/c:SI (plus:SI (reg/f:SI 3 %g3 [1307]) (const_int 22 [0x16])) [4 b+22 S4 A16])) test.c:23 61 {*movsi_insn} (nil)) (insn 255 1712 1366 4 (set (reg:SI 12 %o4 [1285]) (ashift:SI (reg:SI 12 %o4 [1425]) (const_int 16 [0x10]))) test.c:23 306 {ashlsi3} (expr_list:REG_DEAD (reg:SI 12 %o4 [1425]) (nil))) (insn 1366 255 1711 4 (set (reg:SI 12 %o4 [1285]) (reg:SI 12 %o4 [1285])) test.c:23 61 {*movsi_insn} (expr_list:REG_DEAD (reg:SI 12 %o4 [1285]) (nil))) Unfortunately, this is illegal, because insn 1367 is an SImode load to a non-4-byte-aligned location (%g3 + 22). This does work, and might even be profitable, on non-STRICT_ALIGNMENT targets. But for those which are STRICT_ALIGNMENT, we cannot make transformations like this.
LRA best_reload_nregs
Unlike the other variables that track the state of the current instruction being analyzed by the LRA constraints code, I don't see anything which initializes best_reload_nregs when we start looking at a new instruction.
Re: expansion of vector shifts...
From: Richard Sandiford Date: Mon, 29 Oct 2012 10:14:53 + > ...given that the code is like you say written: > > if (SHIFT_COUNT_TRUNCATED) > { > if (CONST_INT_P (op1) > ... > else if (GET_CODE (op1) == SUBREG > && subreg_lowpart_p (op1) > && INTEGRAL_MODE_P (GET_MODE (SUBREG_REG (op1 > op1 = SUBREG_REG (op1); > } > > INTEGRAL_MODE_P (GET_MODE (op1)) might be better than an explicit > VECTOR_MODE_P check. The code really doesn't make sense for anything > other than integers. > > (It amounts to the same thing in practice, of course...) Agreed, I've just committed the following. Thanks! Fix gcc.c-torture/compile/pr53410-2.c on sparc. * expmed.c (expand_shift_1): Don't strip non-integral SUBREGs. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@193547 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog | 2 ++ gcc/expmed.c | 3 ++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 9abd396..62bde4e 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,7 @@ 2012-11-15 David S. Miller + * expmed.c (expand_shift_1): Don't strip non-integral SUBREGs. + * configure.ac: Add check for assembler SPARC4 instruction support. * configure: Rebuild. diff --git a/gcc/expmed.c b/gcc/expmed.c index 5b697a1..8640427 100644 --- a/gcc/expmed.c +++ b/gcc/expmed.c @@ -2165,7 +2165,8 @@ expand_shift_1 (enum tree_code code, enum machine_mode mode, rtx shifted, % GET_MODE_BITSIZE (mode)); else if (GET_CODE (op1) == SUBREG && subreg_lowpart_p (op1) - && INTEGRAL_MODE_P (GET_MODE (SUBREG_REG (op1 + && INTEGRAL_MODE_P (GET_MODE (SUBREG_REG (op1))) + && INTEGRAL_MODE_P (GET_MODE (op1))) op1 = SUBREG_REG (op1); } -- 1.7.12.2.dirty
var-tracking wrt. leaf regs on sparc
Hello Eric, this is in regards to your HAVE_window_save code in var-tracking.c and elsewhere added for PR target/48220. I'm trying to fix all of the guality failures on sparc and this issue below is the first one I was able to comprehend. All of this special register window debugging handling in var-tracking.c can get tripped up any time we have a leaf function, in fact I would venture that most if not all of the guality failures on sparc are directly or indirectly related to this issue. A good example is pr54200.c, where for function foo: int __attribute__((noinline,noclone)) foo (int z, int x, int b) { if (x == 1) { bar (); return z; } else { int a = (x + z) + b; return a; /* { dg-final { gdb-test 20 "z" "3" } } */ } } with -O1 and higher we end up with a leaf function here. And incoming argument register %o0 is used for computing the function's return value. But var-tracking.c doesn't see these modifications at all. The reason is that the window tracking code converts the incoming arguments %iN registers into the outgoing %oN registers. But this is not the right thing to do because all of the RTL being inspected has not had leaf register remapping applied to it yet. So var-tracking.c sees modifications of %i0 in the RTL but doesn't grok that this is a modification of incoming argument 'z' because it thinks that 'z' lives in %o0. Leaf register remapping of the RTL stream is only done as the very last step in final.c And when debugging information is emitted, the debugging backends perform the leaf register remapping manually so that the emitted debugging regno references match what final.c is going to emit. This essentially means every other part of the compiler should look at the RTL and incoming arguments unmolested. As a quick check I found that if I just remove all of the HAVE_window_save code in var-tracking.c, the pr54200.c test case passes at all optimization levels. Of course there is more to it than this, looking quickly at the var-tracking rtl dump, it seems to now think in this test case that the incoming arguments live in both %oN and %iN. I would need to dig more deeply to figure out why that's happening. The only other alternative I can see would be to get everything in var-tracking.c and the other subsystems it uses to do leaf register remapping, but that seems completely like the wrong way to handle this.
Re: var-tracking wrt. leaf regs on sparc
From: David Miller Date: Tue, 05 Feb 2013 18:18:39 -0500 (EST) > The only other alternative I can see would be to get everything in > var-tracking.c and the other subsystems it uses to do leaf register > remapping, but that seems completely like the wrong way to handle > this. Following up to myself... :-) Now that I understand fully what we're trying to accomplish with the DT_OP_GNU_entry_value and DT_OP_GNU_call_site_parameter extensions, it does in fact seem like we will need to do leaf register remapping in var-tracking.c Here below is a patch I'm playing with. It's a rough draft but it definitely fixes the pr54200.c problem completely. Another way to do this would be to not translate the incoming parameter registers (leave them at %i*) if we don't see the window save. That way we only have to play the regno remapping game for these specific incoming argument pieces, rather than for everything we look at in the RTL stream. diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c index 714acb69..14635b9 100644 --- a/gcc/var-tracking.c +++ b/gcc/var-tracking.c @@ -1057,6 +1057,32 @@ adjust_mem_stores (rtx loc, const_rtx expr, void *data) } } +/* Given a regno from the RTL instruction stream, return the + actual register number that will be used by final and debug + info emission. */ +static unsigned int +real_regno (unsigned int regno) +{ +#ifdef LEAF_REG_REMAP + if (regno < FIRST_PSEUDO_REGISTER + && crtl->uses_only_leaf_regs) +{ + int remapped = LEAF_REG_REMAP (regno); + + if (remapped >= 0) + regno = (unsigned int) remapped; +} +#endif + + return regno; +} + +static unsigned int +real_regno_rtx (rtx reg) +{ + return real_regno (REGNO (reg)); +} + /* Simplify INSN. Remove all {PRE,POST}_{INC,DEC,MODIFY} rtxes, replace them with their value in the insn and add the side-effects as other sets to the insn. */ @@ -1804,12 +1830,12 @@ var_reg_decl_set (dataflow_set *set, rtx loc, enum var_init_status initialized, if (decl_p) dv = dv_from_decl (var_debug_decl (dv_as_decl (dv))); - for (node = set->regs[REGNO (loc)]; node; node = node->next) + for (node = set->regs[real_regno_rtx (loc)]; node; node = node->next) if (dv_as_opaque (node->dv) == dv_as_opaque (dv) && node->offset == offset) break; if (!node) -attrs_list_insert (&set->regs[REGNO (loc)], dv, offset, loc); +attrs_list_insert (&set->regs[real_regno_rtx (loc)], dv, offset, loc); set_variable_part (set, loc, dv, offset, initialized, set_src, iopt); } @@ -1875,7 +1901,7 @@ var_reg_delete_and_set (dataflow_set *set, rtx loc, bool modify, if (initialized == VAR_INIT_STATUS_UNKNOWN) initialized = get_init_value (set, loc, dv_from_decl (decl)); - nextp = &set->regs[REGNO (loc)]; + nextp = &set->regs[real_regno_rtx (loc)]; for (node = *nextp; node; node = next) { next = node->next; @@ -1904,7 +1930,7 @@ var_reg_delete_and_set (dataflow_set *set, rtx loc, bool modify, static void var_reg_delete (dataflow_set *set, rtx loc, bool clobber) { - attrs *nextp = &set->regs[REGNO (loc)]; + attrs *nextp = &set->regs[real_regno_rtx (loc)]; attrs node, next; if (clobber) @@ -2386,7 +2412,7 @@ val_bind (dataflow_set *set, rtx val, rtx loc, bool modified) if (REG_P (loc)) { if (modified) - var_regno_delete (set, REGNO (loc)); + var_regno_delete (set, real_regno_rtx (loc)); var_reg_decl_set (set, loc, VAR_INIT_STATUS_INITIALIZED, dv_from_value (val), 0, NULL_RTX, INSERT); } @@ -2584,7 +2610,7 @@ val_resolve (dataflow_set *set, rtx val, rtx loc, rtx insn) { attrs node, found = NULL; - for (node = set->regs[REGNO (loc)]; node; node = node->next) + for (node = set->regs[real_regno_rtx (loc)]; node; node = node->next) if (dv_is_value_p (node->dv) && GET_MODE (dv_as_value (node->dv)) == GET_MODE (loc)) { @@ -2838,7 +2864,8 @@ variable_union (variable src, dataflow_set *set) { if (!((REG_P (node2->loc) && REG_P (node->loc) -&& REGNO (node2->loc) == REGNO (node->loc)) +&& (real_regno_rtx (node2->loc) +== real_regno_rtx (node->loc))) || rtx_equal_p (node2->loc, node->loc))) { if (node2->init < node->init) @@ -2871,7 +2898,8 @@ variable_union (variable src, dataflow_set *set) for (node = src->var_part[i].loc_chain; node; node = node->next) if (!((REG_P (dstnode->loc) && REG_P (node->loc) -
Re: var-tracking wrt. leaf regs on sparc
From: Eric Botcazou Date: Wed, 06 Feb 2013 11:13:30 +0100 > I think testing crtl->uses_only_leaf_regs is sufficient here (and > while you're at it, you could also test the value of > HAVE_window_save, which can be 0 if -mflat is passed on the SPARC), > so > > #ifdef HAVE_window_save > if (HAVE_window_save && !crtl->uses_only_leaf_regs) > { > > } > #endif Yes, this works perfectly, Jakub any objections? gcc/ 2013-02-06 David S. Miller * var-tracking.c (vt_add_function_parameter): Test the presence of HAVE_window_save properly and do not remap argument registers when we have a leaf function. diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c index 714acb69..0db1562 100644 --- a/gcc/var-tracking.c +++ b/gcc/var-tracking.c @@ -9502,31 +9502,34 @@ vt_add_function_parameter (tree parm) /* DECL_INCOMING_RTL uses the INCOMING_REGNO of parameter registers. If the target machine has an explicit window save instruction, the actual entry value is the corresponding OUTGOING_REGNO instead. */ - if (REG_P (incoming) - && HARD_REGISTER_P (incoming) - && OUTGOING_REGNO (REGNO (incoming)) != REGNO (incoming)) + if (HAVE_window_save && !crtl->uses_only_leaf_regs) { - parm_reg_t p; - p.incoming = incoming; - incoming - = gen_rtx_REG_offset (incoming, GET_MODE (incoming), - OUTGOING_REGNO (REGNO (incoming)), 0); - p.outgoing = incoming; - vec_safe_push (windowed_parm_regs, p); -} - else if (MEM_P (incoming) - && REG_P (XEXP (incoming, 0)) - && HARD_REGISTER_P (XEXP (incoming, 0))) -{ - rtx reg = XEXP (incoming, 0); - if (OUTGOING_REGNO (REGNO (reg)) != REGNO (reg)) + if (REG_P (incoming) + && HARD_REGISTER_P (incoming) + && OUTGOING_REGNO (REGNO (incoming)) != REGNO (incoming)) { parm_reg_t p; - p.incoming = reg; - reg = gen_raw_REG (GET_MODE (reg), OUTGOING_REGNO (REGNO (reg))); - p.outgoing = reg; + p.incoming = incoming; + incoming + = gen_rtx_REG_offset (incoming, GET_MODE (incoming), + OUTGOING_REGNO (REGNO (incoming)), 0); + p.outgoing = incoming; vec_safe_push (windowed_parm_regs, p); - incoming = replace_equiv_address_nv (incoming, reg); + } + else if (MEM_P (incoming) + && REG_P (XEXP (incoming, 0)) + && HARD_REGISTER_P (XEXP (incoming, 0))) + { + rtx reg = XEXP (incoming, 0); + if (OUTGOING_REGNO (REGNO (reg)) != REGNO (reg)) + { + parm_reg_t p; + p.incoming = reg; + reg = gen_raw_REG (GET_MODE (reg), OUTGOING_REGNO (REGNO (reg))); + p.outgoing = reg; + vec_safe_push (windowed_parm_regs, p); + incoming = replace_equiv_address_nv (incoming, reg); + } } } #endif
Re: var-tracking wrt. leaf regs on sparc
From: Jakub Jelinek Date: Wed, 6 Feb 2013 07:56:44 +0100 > so achieving zero failures might be too hard I don't believe this, all the sparc failures I see look like a similar bug just showing up in multiple tests.
Re: var-tracking wrt. leaf regs on sparc
From: Jakub Jelinek Date: Thu, 7 Feb 2013 18:22:32 +0100 > Then supposedly somewhere in dwarf2out we do some adjustment, > but still end up with d/e loclist of: > .LLST2: > .uaxword.LVL0-.Ltext0 ! Location list begin address > (*.LLST2) > .uaxword.LVL1-.Ltext0 ! Location list end address (*.LLST2) > .uahalf 0x6 ! Location expression size > .byte 0x88! DW_OP_breg24 > .byte 0 ! sleb128 0 > .byte 0x89! DW_OP_breg25 > .byte 0 ! sleb128 0 > .byte 0x22! DW_OP_plus > .byte 0x9f! DW_OP_stack_value > .uaxword.LVL1-.Ltext0 ! Location list begin address > (*.LLST2) > .uaxword.LFE0-.Ltext0 ! Location list end address (*.LLST2) > .uahalf 0x1 ! Location expression size > .byte 0x58! DW_OP_reg8 > .uaxword0 ! Location list terminator begin (*.LLST2) > .uaxword0 ! Location list terminator end (*.LLST2) > where I'd expect breg8/breg9 instead. The fix for this is trivial, just a missing leaf renumbering in dwarf2out.c: diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 06cfb18..765d5c5 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -10864,7 +10864,16 @@ based_loc_descr (rtx reg, HOST_WIDE_INT offset, } } - regno = DWARF_FRAME_REGNUM (REGNO (reg)); + regno = REGNO (reg); +#ifdef LEAF_REG_REMAP + if (crtl->uses_only_leaf_regs) +{ + int leaf_reg = LEAF_REG_REMAP (regno); + if (leaf_reg != -1) + regno = (unsigned) leaf_reg; +} +#endif + regno = DWARF_FRAME_REGNUM (regno); if (!optimize && fde && (fde->drap_reg == regno || fde->vdrap_reg == regno))
Re: var-tracking wrt. leaf regs on sparc
From: David Miller Date: Thu, 07 Feb 2013 14:38:18 -0500 (EST) > From: Jakub Jelinek > Date: Thu, 7 Feb 2013 18:22:32 +0100 > >> Then supposedly somewhere in dwarf2out we do some adjustment, >> but still end up with d/e loclist of: >> .LLST2: >> .uaxword.LVL0-.Ltext0 ! Location list begin address >> (*.LLST2) >> .uaxword.LVL1-.Ltext0 ! Location list end address (*.LLST2) >> .uahalf 0x6 ! Location expression size >> .byte 0x88! DW_OP_breg24 >> .byte 0 ! sleb128 0 >> .byte 0x89! DW_OP_breg25 >> .byte 0 ! sleb128 0 >> .byte 0x22! DW_OP_plus >> .byte 0x9f! DW_OP_stack_value >> .uaxword.LVL1-.Ltext0 ! Location list begin address >> (*.LLST2) >> .uaxword.LFE0-.Ltext0 ! Location list end address (*.LLST2) >> .uahalf 0x1 ! Location expression size >> .byte 0x58! DW_OP_reg8 >> .uaxword0 ! Location list terminator begin (*.LLST2) >> .uaxword0 ! Location list terminator end (*.LLST2) >> where I'd expect breg8/breg9 instead. > > The fix for this is trivial, just a missing leaf renumbering in dwarf2out.c: So the combined patch is below, any objections? Here is the testsuite diff: @@ -155,8 +148,8 @@ FAIL: gcc.dg/guality/vla-2.c -O2 -flto === gcc Summary === -# of expected passes 2128 -# of unexpected failures 122 +# of expected passes 2135 +# of unexpected failures 115 # of unexpected successes 31 # of expected failures 17 # of unsupported tests 136 This is undoubtedly an improvement. gcc/ 2013-02-07 David S. Miller * dwarf2out.c (based_loc_descr): Perform leaf register remapping on 'reg'. * var-tracking.c (vt_add_function_parameter): Test the presence of HAVE_window_save properly and do not remap argument registers when we have a leaf function. diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 06cfb18..765d5c5 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -10864,7 +10864,16 @@ based_loc_descr (rtx reg, HOST_WIDE_INT offset, } } - regno = DWARF_FRAME_REGNUM (REGNO (reg)); + regno = REGNO (reg); +#ifdef LEAF_REG_REMAP + if (crtl->uses_only_leaf_regs) +{ + int leaf_reg = LEAF_REG_REMAP (regno); + if (leaf_reg != -1) + regno = (unsigned) leaf_reg; +} +#endif + regno = DWARF_FRAME_REGNUM (regno); if (!optimize && fde && (fde->drap_reg == regno || fde->vdrap_reg == regno)) diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c index 714acb69..0db1562 100644 --- a/gcc/var-tracking.c +++ b/gcc/var-tracking.c @@ -9502,31 +9502,34 @@ vt_add_function_parameter (tree parm) /* DECL_INCOMING_RTL uses the INCOMING_REGNO of parameter registers. If the target machine has an explicit window save instruction, the actual entry value is the corresponding OUTGOING_REGNO instead. */ - if (REG_P (incoming) - && HARD_REGISTER_P (incoming) - && OUTGOING_REGNO (REGNO (incoming)) != REGNO (incoming)) + if (HAVE_window_save && !crtl->uses_only_leaf_regs) { - parm_reg_t p; - p.incoming = incoming; - incoming - = gen_rtx_REG_offset (incoming, GET_MODE (incoming), - OUTGOING_REGNO (REGNO (incoming)), 0); - p.outgoing = incoming; - vec_safe_push (windowed_parm_regs, p); -} - else if (MEM_P (incoming) - && REG_P (XEXP (incoming, 0)) - && HARD_REGISTER_P (XEXP (incoming, 0))) -{ - rtx reg = XEXP (incoming, 0); - if (OUTGOING_REGNO (REGNO (reg)) != REGNO (reg)) + if (REG_P (incoming) + && HARD_REGISTER_P (incoming) + && OUTGOING_REGNO (REGNO (incoming)) != REGNO (incoming)) { parm_reg_t p; - p.incoming = reg; - reg = gen_raw_REG (GET_MODE (reg), OUTGOING_REGNO (REGNO (reg))); - p.outgoing = reg; + p.incoming = incoming; + incoming + = gen_rtx_REG_offset (incoming, GET_MODE (incoming), + OUTGOING_REGNO (REGNO (incoming)), 0); + p.outgoing = incoming; vec_safe_push (windowed_parm_regs, p); - incoming = replace_equiv_address_nv (incoming, reg); + } + else if (MEM_P (incoming) + && REG_P (XEXP (incoming, 0)) + && HARD_REGISTER_P (XEXP (incoming, 0))) + { + rtx reg = XEXP (incoming, 0); + if (OUTGOING_REGNO (REGNO (reg)) != REGNO (reg)) + { + parm_reg_t p; + p.incoming = reg; + reg = gen_raw_REG (GET_MODE (reg), OUTGOING_REGNO (REGNO (reg))); + p.outgoing = reg; + vec_safe_push (windowed_parm_regs, p); + incoming = replace_equiv_address_nv (incoming, reg); + } } } #endif
Re: var-tracking wrt. leaf regs on sparc
From: Jakub Jelinek Date: Thu, 7 Feb 2013 20:43:32 +0100 > This and earlier patch are ok, if it bootstraps/regtests fine, and suitable > ChangeLog entry is provided. > Running gdb testsuite before and after wouldn't hurt though. I've done all of this, and committed to trunk and the gcc-4.7 branch, thanks. In looking at the remaining failures, several have to do with an early clobber if the first incoming argument register. The issue is that this is where return values are placed, so we run into a situation where that incoming argument value can't be reconstituted in any way by the variable tracking code and thus gdb says that it has been optimized out. Many non-x86 cpus are going to run into this problem. For example, from pr36728-1.c: foo: save%sp, -96, %sp add %sp, -40, %sp mov 2, %g2 add %sp, 123, %g1 mov 25, %g4 and %g1, -32, %g1 sethi %hi(b), %g3 st %g2, [%g1] ld [%fp+92], %g2 nop ld [%g1], %i0 add %g2, 14, %g2 and %g2, -8, %g2 sub %sp, %g2, %sp stb %g4, [%sp+96] add %sp, 96, %g2 sethi %hi(a), %g4 nop return %i7+8 nop Here %i0 is written early, and then the tests can't view 'arg1' properly later in the function. Also, I noticed that calculation of the on-stack address of values with alignment regressed in gcc-4.8 vs. gcc-4.7 Again, in pr36728-1.c, 'y' can be printed properly in gcc-4.7 but in gcc-4.8 it cannot. I think it might be getting the base register wrong, I'll look more deeply if I get a chance.
Re: expansion of vector shifts...
From: David Miller Date: Fri, 16 Nov 2012 00:33:05 -0500 (EST) > From: Richard Sandiford > Date: Mon, 29 Oct 2012 10:14:53 + > >> ...given that the code is like you say written: >> >> if (SHIFT_COUNT_TRUNCATED) >> { >> if (CONST_INT_P (op1) >> ... >> else if (GET_CODE (op1) == SUBREG >> && subreg_lowpart_p (op1) >> && INTEGRAL_MODE_P (GET_MODE (SUBREG_REG (op1 >> op1 = SUBREG_REG (op1); >> } >> >> INTEGRAL_MODE_P (GET_MODE (op1)) might be better than an explicit >> VECTOR_MODE_P check. The code really doesn't make sense for anything >> other than integers. >> >> (It amounts to the same thing in practice, of course...) > > Agreed, I've just committed the following. Thanks! > > > Fix gcc.c-torture/compile/pr53410-2.c on sparc. > > * expmed.c (expand_shift_1): Don't strip non-integral SUBREGs. This is broken on sparc again, although I'm confused about how this has happened. The suggestion was to use INTEGRAL_MODE_P as the test, so what's there in expand_shift_1() is: else if (GET_CODE (op1) == SUBREG && subreg_lowpart_p (op1) && INTEGRAL_MODE_P (GET_MODE (SUBREG_REG (op1))) && INTEGRAL_MODE_P (GET_MODE (op1))) op1 = SUBREG_REG (op1); but INTEGRAL_MODE_P accepts vectors. This is really confusing because I was absolutely sure I re-ran the test case with the fix I committed and it didn't crash any more. Maybe what we really mean to do here is check both op1 and SUBREG_REG (op1) against SCALAR_INT_MODE_P instead of INTEGRAL_MODE_P? Something like this: gcc/ 2013-02-12 David S. Miller * expmed.c (expand_shift_1): Only strip scalar integer subregs. diff --git a/gcc/expmed.c b/gcc/expmed.c index 4a6ddb0..954a360 100644 --- a/gcc/expmed.c +++ b/gcc/expmed.c @@ -2116,8 +2116,8 @@ expand_shift_1 (enum tree_code code, enum machine_mode mode, rtx shifted, % GET_MODE_BITSIZE (mode)); else if (GET_CODE (op1) == SUBREG && subreg_lowpart_p (op1) - && INTEGRAL_MODE_P (GET_MODE (SUBREG_REG (op1))) - && INTEGRAL_MODE_P (GET_MODE (op1))) + && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op1))) + && SCALAR_INT_MODE_P (GET_MODE (op1))) op1 = SUBREG_REG (op1); }
Re: expansion of vector shifts...
From: Richard Biener Date: Wed, 13 Feb 2013 12:15:13 +0100 > On Tue, Feb 12, 2013 at 11:31 PM, David Miller wrote: >> Maybe what we really mean to do here is check both op1 and SUBREG_REG >> (op1) against SCALAR_INT_MODE_P instead of INTEGRAL_MODE_P? > > Yes. Ok, I'll commit this after doing some regstraps, thanks. >> Something like this: >> >> gcc/ >> >> 2013-02-12 David S. Miller >> >> * expmed.c (expand_shift_1): Only strip scalar integer subregs. >> >> diff --git a/gcc/expmed.c b/gcc/expmed.c >> index 4a6ddb0..954a360 100644 >> --- a/gcc/expmed.c >> +++ b/gcc/expmed.c >> @@ -2116,8 +2116,8 @@ expand_shift_1 (enum tree_code code, enum machine_mode >> mode, rtx shifted, >>% GET_MODE_BITSIZE (mode)); >>else if (GET_CODE (op1) == SUBREG >>&& subreg_lowpart_p (op1) >> - && INTEGRAL_MODE_P (GET_MODE (SUBREG_REG (op1))) >> - && INTEGRAL_MODE_P (GET_MODE (op1))) >> + && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op1))) >> + && SCALAR_INT_MODE_P (GET_MODE (op1))) >> op1 = SUBREG_REG (op1); >> } >> >> >> >
Re: Linux doesn't follow x86/x86-64 ABI wrt direction flag
From: Aurelien Jarno <[EMAIL PROTECTED]> Date: Wed, 05 Mar 2008 21:52:14 +0100 > H. Peter Anvin a écrit : > > The best would be if this could be controlled by a flag, which we can > > flip once kernel fixes has been around for long enough. > > I have to agree there. Whatever the decision that gcc will take, > distributions will reenable the old behaviour for some time for to allow > upgrades from a previous version. I don't think this approach is tenable. If a distribution should ship with a "fixed" kernel and compiler enabling the new direction flag behavior, any binary you create on that system will be broken on any other existing system. I think we really are stuck with this forever, overwhelming practice over the past 15 years has dictated to us what the real ABI is.
Re: RELEASE BLOCKER: Linux doesn't follow x86/x86-64 ABI wrt direction flag
From: "Richard Guenther" <[EMAIL PROTECTED]> Date: Wed, 5 Mar 2008 22:40:59 +0100 > Right. So this problem is over-exaggerated. It's not like > "any binary you create on that system will be broken on any > other existing system." I will be sure to hunt you down to help debug when someone reports that once every few weeks their multi-day simulation gives incorrect results :-) This is one of those cases where the bug is going to be a huge issue to people who actually hit it, and since we know about the problem, knowingly shipping something in that state is unforgivable.
Re: RELEASE BLOCKER: Linux doesn't follow x86/x86-64 ABI wrt direction flag
From: Michael Matz <[EMAIL PROTECTED]> Date: Wed, 5 Mar 2008 22:43:33 +0100 (CET) > The error is arcane and happens seldomly if at all. And only on > unfixed kernels. Which translates right now into "all kernels."
Re: RELEASE BLOCKER: Linux doesn't follow x86/x86-64 ABI wrt direction flag
From: Adrian Bunk <[EMAIL PROTECTED]> Date: Thu, 6 Mar 2008 00:13:04 +0200 > On Wed, Mar 05, 2008 at 10:59:21PM +0100, Michael Matz wrote: > > The problem is with old kernels, which by definition stay unfixed. > > Compiling older kernels with new gcc versions has never been supported. Adrian we're talking about userland binaries compiled by gcc-4.3, not the kernel. Please follow the discussion if you'd like to contribute. Thanks.
Re: RELEASE BLOCKER: Linux doesn't follow x86/x86-64 ABI wrt direction flag
From: Michael Matz <[EMAIL PROTECTED]> Date: Thu, 6 Mar 2008 00:07:39 +0100 (CET) > The fix lies in the kernel, the work-around in gcc. This depends upon how you interpret this ABI situation. There is at least some agreement that how things have actually been implemented by these kernels for more than 15 years trumps whatever a paper standard states.
Re: US-CERT Vulnerability Note VU#162289
From: Ian Lance Taylor <[EMAIL PROTECTED]> Date: Fri, 11 Apr 2008 11:04:38 -0700 > "Robert C. Seacord" <[EMAIL PROTECTED]> writes: > > >> What you really mean is, > >> "Use an older GCC or some other compiler that is known not to > >> take advantage of this optimization." > >> > > i think we mean what we say, which is "*Avoid newer versions of gcc" > > and *"avoiding the use of gcc versions 4.2 and later." i don't see > > any verbiage that says "use a different compiler". > > I know I'm biased, but I think "use a different compiler" is clearly > implied by the text of the advisory. If the advisory mentioned that > other compilers also implement the same optimization, then that > implication would not be there. I completely agree.
Re: US-CERT Vulnerability Note VU#162289
From: Chad Dougherty <[EMAIL PROTECTED]> Date: Wed, 23 Apr 2008 07:52:26 -0400 > We won't include information about other vendors without either a > statement from them or independent verification of their affectedness. How, may I ask, did that policy apply to the GCC "vendor" when this all got started?
Re: US-CERT Vulnerability Note VU#162289
From: Chad Dougherty <[EMAIL PROTECTED]> Date: Wed, 23 Apr 2008 08:37:11 -0400 > David Miller wrote: > > How, may I ask, did that policy apply to the GCC "vendor" > > when this all got started? > > Our own testing of multiple versions of gcc on multiple platforms and > subsequent confirmation by Mark that it was intentional, desired > behavior. This all occurred prior to even the initial version of the note. CERT is asking these vendors for "approval" for the text they will add mentioning anything about their product. That's the bit I'm talking about. They are getting protection and consideration that was not really afforded to GCC. CERT treated GCC differently.
Re: US-CERT Vulnerability Note VU#162289
From: Joe Buck <[EMAIL PROTECTED]> Date: Wed, 23 Apr 2008 08:24:44 -0700 > If CERT is to maintain its reputation, it needs to do better. The warning > is misdirected in any case; given the very large number of compilers that > these coding practices cause trouble for, you need to focus on the bad > coding practices, not on unfair demonization of new GCC releases. In my opinion CERT's advisory has been nothing but an unfair FUD campaign on compilers, and GCC specifically, and has seriously devalued CERT's advisories, in general, which were already of low value to begin with. It looks similar to a news article run by a newspaper that is losing money and has no real news to write about, but yet they have to write about something. The worst part of this fiasco is that GCCs reputation has been unfairly harmed in one way or another, and there is nothing CERT can do to rectify the damage they've caused.
Re: Should we remove java from the default bootstrap languages?
From: "Steven Bosscher" <[EMAIL PROTECTED]> Date: Sat, 21 Jun 2008 00:09:26 +0200 > What is far more worrying to me, actually, is that libjava grows > bigger and bigger and bigger with every release, so that testing it > costs developers who care zilch about java (i.e. most people) get > penalized more and more with increased bootstrap and test times. I agree and will admit that this is the one thing that has curtailed severely my contributions to GCC in the past 4 to 5 years. I used to be able to bootstrap gcc fully in minutes on average hardware 6 or so years ago. Those days are long gone. On my largest 64 cpu and 128 cpu boxes it takes forever these days. The libjava build is notoriously not helped by parallelization because certain compiles are extremely expensive, which effectively single-threads the build.
Re: Should we remove java from the default bootstrap languages?
From: Ralf Wildenhues <[EMAIL PROTECTED]> Date: Sat, 21 Jun 2008 10:58:55 +0200 > Would that be compiles of object files that end up in libgcj (as opposed > to the link, or stuff that depends on libgcj)? If yes, the lack of > parallelism should be fixable. It's the compilation of the object files, not the linking.
Re: Should we remove java from the default bootstrap languages?
From: Laurent GUERBY <[EMAIL PROTECTED]> Date: Sat, 21 Jun 2008 22:12:24 +0200 > I'm curious at how many GCC developpers use non x86/_64 as their > main development machine (and how many non x86/_64 core they use). I definitely am one. Or, maybe you are asking the wrong question, which likely should be how many maintainers of other cpus backends are in this situation :-)
Re: Should we remove java from the default bootstrap languages?
From: Laurent GUERBY <[EMAIL PROTECTED]> Date: Sun, 22 Jun 2008 09:21:32 +0200 > On Sun, 2008-06-22 at 00:04 -0700, David Miller wrote: > > From: Laurent GUERBY <[EMAIL PROTECTED]> > > Date: Sat, 21 Jun 2008 22:12:24 +0200 > > > > > I'm curious at how many GCC developpers use non x86/_64 as their > > > main development machine (and how many non x86/_64 core they use). > > > > I definitely am one. > > How many core does your main development machine have? 8 cores and 8 cpu threads per core on one, 64 cpus total. 16 cores and 8 cpu threads per core on another, 128 cpus total. It still takes an hour or so to bootstrap on these machines because the individual cpu threads are slow and individual expensive compilations become the bottleneck, especially in the libjava build. And the way the testsuite works it can't even come close to using even a significant fraction of those cpus.
Re: Should we remove java from the default bootstrap languages?
From: Laurent GUERBY <[EMAIL PROTECTED]> Date: Sun, 22 Jun 2008 10:05:26 +0200 > On Sun, 2008-06-22 at 00:45 -0700, David Miller wrote: > > > How many core does your main development machine have? > > > > 8 cores and 8 cpu threads per core on one, 64 cpus total. > > 16 cores and 8 cpu threads per core on another, 128 cpus total. > > > > It still takes an hour or so to bootstrap on these machines because > > the individual cpu threads are slow and individual expensive > > compilations become the bottleneck, especially in the libjava build. > > Did you measure how long insn-attrtab.o generation takes? > See my other email, on a 2.2 GHz barcelona it's about 5mn20s (for each > stage) and it's the main limiting factor (but I assume it's architecture > dependant). On Sparc, which is what these systems are and the cpu I target with gcc, insn-attrtab.o compilation is not expensive.
Re: [10 PATCHES] inline functions to avoid stack overflow
From: Mikulas Patocka <[EMAIL PROTECTED]> Date: Wed, 25 Jun 2008 08:53:10 -0400 (EDT) > Even worse, gcc doesn't use these additional bytes. If you try this: > > extern void f(int *i); > void g() > { > int a; > f(&a); > } > > , it allocates additional 16 bytes for the variable "a" (so there's total > 208 bytes), even though it could place the variable into 48-byte > ABI-mandated area that it inherited from the caller or into it's own > 16-byte padding that it made when calling "f". The extra 16 bytes of space allocated is so that GCC can perform a secondary reload of a quad floating point value. It always has to be present, because we can't satisfy a secondary reload by emitting yet another reload, it's the end of the possible level of recursions allowed by the reload pass. GCC could be smart and eliminate that slot when it's not used, but such a thing is not implemented yet. It would also require quite a bit of new code to determine cases like you mention above, where the incoming arg slots from the caller are unused, assuming this would be legal. And that legality is doubtful. We'd need to be careful because I think the caller is allowed to assume that those slots are untouched by the callee, and thus can be assumed to have whatever values the caller put there even after the callee returns.
Re: [10 PATCHES] inline functions to avoid stack overflow
From: "Bart Van Assche" <[EMAIL PROTECTED]> Date: Thu, 26 Jun 2008 08:32:35 +0200 > On Thu, Jun 26, 2008 at 12:09 AM, David Miller <[EMAIL PROTECTED]> wrote: > > The extra 16 bytes of space allocated is so that GCC can perform a > > secondary reload of a quad floating point value. It always has to be > > present, because we can't satisfy a secondary reload by emitting yet > > another reload, it's the end of the possible level of recursions > > allowed by the reload pass. > > Is there any floating-point code present in the Linux kernel ? Yes, but not coming from C compiled code. Floating point is used in most of the memcpy/memset implementations of the sparc64 kernel. > Would it be a good idea to add an option to gcc that tells gcc that > the compiled code does not contain floating-point instructions, such > that gcc knows that no space has to be provided for a quad floating > point value ? I think it exists already, it's called -mno-fpu :-)
Re: [10 PATCHES] inline functions to avoid stack overflow
From: Mikulas Patocka <[EMAIL PROTECTED]> Date: Wed, 2 Jul 2008 00:39:35 -0400 (EDT) > The ABI is very vague about it. The V9 ABI just displays that 6-word space > in a figure bug doesn't say anything about it's usage. The V8 ABI just > says that "the function may write incoming arguments there". If it may > write anything other, it is unknown --- probably yes, but it is not said > in the document. > > The document nicely specifies who owns which registers, but doesn't say > that about the stack space :-( Actually, I know for a fact that you have to have those slots there. A long time ago in the sparc64 kernel, in the trap entry code, I tried only giving 128 bytes of stack frame as the trap entry called into C code. And it did not work, I had to put the 6 slots there.
Re: Bootstrap failure on sparc-sun-solaris2.10
From: Rainer Orth <[EMAIL PROTECTED]> Date: Mon, 8 Sep 2008 17:18:50 +0200 (MEST) > Eric Botcazou writes: > > > Confirmed (on Solaris 9). Would you mind opening a PR? There is already > > one > > for Linux (37344) but the failure is a little different. Thanks in advance. > > Sure, done: PR bootstrap/37424. BTW, I'm also seeing the sparc-*-linux failure, and it seems the compiler is outputting an unaligned memory access somehow.
Re: Is Sun putting much effort into supporting the gcc/binutils toolchain on sparc64 ?
From: Andrew Walrond <[EMAIL PROTECTED]> Date: Wed, 12 Sep 2007 12:37:03 +0100 > I have to make buying decisions, and having tested a Sun T1000 for a > while I am impressed with Suns hardware. But, we are 100% gnu/linux and > it disturbs me that David Miller seems to be a (very impressive) team of > 1 on the sparclinux ML (My impression; perhaps I am wrong?) > > So I wonder, is Sun putting any more effort into developing and > supporting the gcc/binutils toolchain? They are sending me specs and systems very early these days, so in that perspective it's been great. I do also have contacts which I use to report firmware and hypervisor issues I'd like fixed or clarified, so that's great too. I tend to do the processor specific gcc instruction scheduler and costs entries, and adding the new instruction support to binutils as well. So no, Sun really isn't helping with any actual development.
Re: Is Sun putting much effort into supporting the gcc/binutils toolchain on sparc64 ?
From: Sunil Amitkumar Janki <[EMAIL PROTECTED]> Date: Thu, 13 Sep 2007 11:16:00 +0200 > As far as I know, GCC for SPARC appears to be a front end > to the Sun Studio compiler which happens to translate GCC > flags to ones suitable for their own compiler. That's correct, it's essentially GCC's language front ends plugged into Sun's compiler backend.
Re: Is Sun putting much effort into supporting the gcc/binutils toolchain on sparc64 ?
From: Andrew Walrond <[EMAIL PROTECTED]> Date: Thu, 13 Sep 2007 15:02:08 +0100 > So, Sun really _don't_ give a damn about gnu/linux on sparc64. That is a gross mischaracterization of the situation, don't say this, it isn't true at all. I have a full rack of Niagara systems that proves that Sun cares to some extent. I get early hardware access and documentation access, plus engineers to talk to and ask questions of. That is miles away from what you are claiming. I know what you are after, and it doesn't happen overnight. So don't be spastic and just whine as the only possible effect that will have is to scare them off and put Sun in reverse gear instead of continuing forward.
Re: Is Sun putting much effort into supporting the gcc/binutils toolchain on sparc64 ?
From: Andrew Walrond <[EMAIL PROTECTED]> Date: Fri, 14 Sep 2007 22:43:16 +0100 > And I apologise (to everyone) for any unnecessary rhetoric on my part; I > freely admit that I designed my posts specifically to sparc (ahem) this > debate, but I guess most of you knew that already ;) The big issue you keep missing in all of your rediculious complaints is that it takes a lot of time for a company to invest in future potential revenue. Most of Sun's resources are invested in things that make them money right now and keep the lights on at their buildings and their employee salaries paid. Investing in better Linux Niagara support in the way that you want is a huge risk, and you are only one very vocal customer and that does not, in and of itself, translate into lots of revenue and lots of profit for Sun. In fact, it is not even guarenteed that you yourself will buy lots of Niagara boxes to run Linux on, things change sometimes unexpectedly. So your expectations and judgments of Sun as a company are wholly disconnected from reality. Please stop this banter, and work on things to support what Sun has done so far and things which will encourage them to do more in the future. And _NO_ this does mean continuing to say that Sun isn't doing enough to satisfy you, that discourages rather than encourages in case you haven't gotten that message LOUD and CLEAR by now.