Re: 32 bit jump instruction.
Quoting Steven Bosscher <[EMAIL PROTECTED]>:

> On 12/13/06, Joern Rennecke <[EMAIL PROTECTED]> wrote:
> > In http://gcc.gnu.org/ml/gcc/2006-12/msg00328.html, you wrote:
> > However, because the SH has delayed branches, there is always a guaranteed
> > way to find a register - one can be saved, and then be restored in the
> > delay slot.
>
> Heh, that's an interesting feature :-)
>
> How does that work? I always thought that the semantics of delayed
> insns is that the insn in the delay slot is executed *before* the
> branch. But that is apparently not the case, or the branch register
> would have been over-written before the branch. How does that work on
> SH?

The jump address is calculated, then the delay slot instruction is executed - or sometimes, if the instructions are pairable, the delay slot insn is executed simultaneously with the jump address calculation. Then - or during the delay slot insn execution - the target instruction is fetched, and then executed. You can look into sim/sh/interp.c for a functional model of how this works from the programmer's point of view.
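To make the ordering concrete, here is a minimal sketch of a delayed branch in a toy interpreter. It is not the SH instruction set and not the code in sim/sh/interp.c; the opcodes, the `struct insn` layout and the `run` function are all hypothetical and exist only to show that the jump address is latched before the delay slot instruction runs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy machine, NOT the SH instruction set: four registers and
   three opcodes, just enough to show the ordering of a delayed branch. */
enum opcode { OP_MOVI, OP_JMP_DELAYED, OP_HALT };

struct insn {
    enum opcode op;
    int reg;        /* register operand */
    uint32_t imm;   /* immediate, used by OP_MOVI only */
};

/* Execute one non-branch instruction; anything but OP_MOVI is a no-op here. */
static void exec_simple(const struct insn *i, uint32_t regs[4])
{
    if (i->op == OP_MOVI)
        regs[i->reg] = i->imm;
}

/* Run from index 0 until an OP_HALT is reached; return the index of the halt. */
static uint32_t run(const struct insn *prog, uint32_t n, uint32_t regs[4])
{
    uint32_t pc = 0;
    for (;;) {
        assert(pc < n);
        const struct insn *i = &prog[pc];
        if (i->op == OP_HALT)
            return pc;
        if (i->op == OP_JMP_DELAYED) {
            uint32_t target = regs[i->reg];   /* 1. latch the jump address */
            assert(pc + 1 < n);
            exec_simple(&prog[pc + 1], regs); /* 2. run the delay slot insn */
            pc = target;                      /* 3. continue at the target */
        } else {
            exec_simple(i, regs);
            pc++;
        }
    }
}
```

In this model a program can load the target into a register, jump through it, and overwrite that same register in the delay slot: the jump still goes to the latched target, which is why a scratch register can be restored in the delay slot.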
Re: matching constraints in asm operands question
> static __inline__ void atomic_add(atomic_t *v, int i)
> {
>         __asm__ __volatile__("addl %1,%0" : "+m" (*v) : "d" (i));
> }
>
> Then the compiler complains with:
>
> /asm/atomic.h:33: warning: read-write constraint does not allow a register
>
> So is the warning wrong?

Yes, the warning is wrong, and the text in the manual about '+' is also nonsense. Support for '+' in asms was specifically introduced to make it safe to have read-write memory operands.

Jason, the point of using '+' is that the matched parts start out the same, and the compiler is supposed to keep them the same. We initially did this with shared rtl; IIRC we have changed cse since, but the same premise still holds: the compiler is supposed to keep both parts of the match in sync. We can guarantee this for '+', but we can't for matching constraints, and this was documented properly until you changed extend.texi in December 2003.
Re: matching constraints in asm operands question
Quoting Jason Merrill <[EMAIL PROTECTED]>:

> Well, I assumed the same thing when I started poking at that code, but then
> someone pointed out that it didn't actually work that way, and as I recall
> the code does in fact assume a register. I certainly would not object to
> making '+' work properly for memory operands, but simply asserting that it
> already does is wrong.

The code in reload to make non-matching operands match assumes a register. However, a match from a plus should always be kept in sync (except temporarily half-way through a substitution, since we now unshare). If it isn't, that's a regression. Do you have a testcase, and/or can you point out the code that introduces the inconsistency in the rtl?
Re: diffing directories with merged-as-deleted files?
>> cvs would never do such nonsense. >Absolutely! It would just print all the directory names in the middle of the >diffs. I call that nonsense as well. But the directory names go to stderr. When you redirect stdout to a file, a diff without the directory names is written to that file, while you get an ongoing progress report.
Re: Use of FLAGS_REGNUM clashes with generates insn
Quoting "Paulo J. Matos" :

> That's seriously annoying. The idea was to ditch cc0 and explicitly represent CC in a register to perform optimizations like splitting add and addc for a double word addition. If hiding my flags register means going back to cc0, then that seems to be the only way to go unless I get it to work somehow. If you have anything else in mind to get it to work, let me know.

Hiding the flags register would mean it is not represented in the rtl at all. You can have combined compare-branch instructions. Of course, going that route would mean that the model you present to GCC is even further from the hardware than one that uses cc0.

> What I currently have in mind is to have a backend macro listing all the moves which clobber CC_REG; then, whenever GCC generates a move, it queries the macro to know if the move requires clobbering and emits the clobber if required. However, I am unsure how deep the rabbit hole goes.

Oh, so you do have variants that can do without the clobber. If you can make all the reloads without introducing explicit flag clobbers, then it should work. But you can't just pull a flag clobber out of thin air. You should have some way to generate valid code when the flags register is unavailable / must be saved. Then you can use peephole2 to add flag clobbers where the flags register is available. Or you can use machine_dependent_reorg or another machine-specific pass inserted with the pass manager to rewrite clobber-free instructions into ones that have a hardware equivalent; but you must make sure that your data flow remains sound in the process.
Re: Use of FLAGS_REGNUM clashes with generates insn
Quoting Hans-Peter Nilsson :

> On Fri, 23 Sep 2011, Joern Rennecke wrote:
> > Quoting "Paulo J. Matos" :
> > > My addition instruction sets all the flags. So I have:
> >
> > This is annoying, but can be handled. Been there, done that.
> > dse.c needs a small patch, which I intend to submit sometime in the future.
>
> Could you be persuaded to send it to the list as-is, right now? Be sure to mark it work-in-progress, or someone might feel compelled to review it. :)

Attached. The issue with this patch is that we'll have to check if gen_add3_insn might fail on any target, and we also have to identify on which targets it will create an insn that clobbers potentially live hard registers (like a flags register), and make the substitution fail in this case. I.e. if in doubt, keep the dead store with the auto-increment. But it should not fail for a target that knows what it clobbers in a reload_in_progress / reload_completed add.

For Epiphany, the add will be expanded with a clobber of a fake hard register, and this pattern is subject to various peephole2 patterns which usually find a low-cost substitute. The point of clobbering a fake hard register is to avoid having passes like combine create the pattern with the unresolved flag clobber problem. The unoptimized add expands into a sequentially issued five-instruction sequence in order to save/restore the flags to a temp register, which in turn is saved on the stack. There are peephole2 patterns:
- to use a constant directly rather than loading it into a temp register,
- to clobber the flags register if it is free,
- to move the add before an immediately preceding flag-setting instruction,
- to find a possibility for a dummy post-modify load from / store to the stack (for calculating a stack/frame based address),
- and to scavenge a temp register to save the flags into without needing to save the temp register.

2011-09-18  Joern Rennecke

	* dse.c (emit_inc_dec_insn_before): Use gen_add3_insn / gen_move_insn.
Index: dse.c
===================================================================
--- dse.c	(revision 2071)
+++ dse.c	(revision 2072)
@@ -835,15 +835,15 @@ emit_inc_dec_insn_before (rtx mem ATTRIB
 			  rtx op ATTRIBUTE_UNUSED,
 			  rtx dest, rtx src, rtx srcoff, void *arg)
 {
-  rtx insn = (rtx)arg;
-
-  if (srcoff)
-    src = gen_rtx_PLUS (GET_MODE (src), src, srcoff);
+  rtx insn = (rtx) arg, new_insn;
 
   /* We can reuse all operands without copying, because we are about
      to delete the insn that contained it.  */
-
-  emit_insn_before (gen_rtx_SET (VOIDmode, dest, src), insn);
+  if (srcoff)
+    new_insn = gen_add3_insn (dest, src, srcoff);
+  else
+    new_insn = gen_move_insn (dest, src);
+  emit_insn_before (new_insn, insn);
 
   return -1;
 }
RE: Feature request concerning opcodes in the function prolog
Quoting Stefan Dösinger :

> I talked to Alexandre again, and his main concern wasn't so much the global flag, but that the existence of the push %ebp; mov %esp, %ebp was still up to the feelings of the compiler and the moon phase. So what he wants is something like a msvc_prolog attribute that makes sure that the function starts with the following instructions and bytecode sequence, no matter what -fomit-frame-pointer and friends say:
>
> 8b ff   mov.s %edi, %edi
> 55      push %ebp
> 8b ec   mov.s %esp, %ebp
>
> So we basically need the msvc_prolog to add the "mov.s %edi, %edi" and force the frame pointer on,

You don't need to force the frame pointer on; it is sufficient to say that ebp needs restoring at the end of the function no matter if it looks otherwise used or not - and you have to take the frame size impact of the saved ebp into account.

Moreover, if your prologue begins with an unspec_volatile that emits the three-instruction sequence you want, the optimizers should leave it there at the start of the function. However, it is probably easiest to get debug and unwind information right if you make this three separate unspec_volatile patterns, with their respective REG_FRAME_RELATED_EXPR notes where applicable. I.e. the push ebp saves ebp and changes the stack. The mov.s esp,ebp needs a REG_FRAME_RELATED_EXPR note only if you are actually using a frame pointer.
RE: Feature request concerning opcodes in the function prolog
Quoting Stefan Dösinger :

> If the frame pointer is not needed:
>
> mov.s %edi, %edi
> push %ebp
> mov.s %esp, %ebp
> pop %ebp
> ; Continue normally here.
>
> I think that case can't be improved too much, since the msvc_prolog stuff modifies the base pointer.

If ebp needs to be saved because it contains a user variable, it is better not to pop it in the prologue - pop it in the epilogue instead, and you don't need to have another save/restore.

> Now my problem: If the frame pointer is needed, and the stack realignment is needed:
>
> mov.s %edi, %edi
> push %ebp
> mov.s %esp, %ebp
> pop %ebp
> leal ...
> push %ebp
> mov %esp, %ebp

This can be done with much shorter assembly, at the cost of a bit more logic in your prologue / epilogue expanders:

mov.s %edi, %edi
push %ebp
mov.s %esp, %ebp
leal ... (adjust value to account for the ebp stack slot)

and in the epilogue, after restoring the stack to what it was prior to re-aligning, you do:

pop %ebp

> The REG_FRAME_RELATED_EXPR is set with this, right?
> RTX_FRAME_RELATED_P (insn) = 1;

No, REG_FRAME_RELATED_EXPR is a special kind of note for when the dwarf / unwind code can't figure out the information from the pattern of the instruction. Looking at dwarf2out.c:dwarf2out_frame_debug_expr, I see that you should be able to use a PARALLEL with one part being the operation actually performed, and another part an UNSPEC_VOLATILE to prevent unwanted code motion / deletion etc.

> When RTX_FRAME_RELATED_P is set on an insn, I haven't yet figured out what it does exactly.

When RTX_FRAME_RELATED_P is set on an insn, dwarf call frame information and exception unwinding information will be generated for this instruction (if this kind of information is generated for this translation unit). If a REG_FRAME_RELATED_EXPR note is present, it specifies the cfi information for this instruction; otherwise, dwarf2out_frame_debug_expr will analyze the PATTERN of the instruction itself.
RE: Feature request concerning opcodes in the function prolog
Quoting Stefan Dösinger :

> Here's some code attached that actually works, but is far from perfect. The 'msvc_prologue' attribute is limited to 32 bit. None of the applications that try to place hooks are ported to Win64 yet, so it is impossible to tell what they'll need. Besides, maybe I am lucky that when they appear I can tell their authors to think about Wine.
>
> The first problem I (still) have is passing the msvc_prologue attribute around. I abused some other code to do that. How does the attribute handling work, and what would the preferred way be?

Well, you could query the attribute every time you need it, but if that would cause performance issues or significant code bloat, caching information computed from the attributes in the machine specific struct is fine. If you only need a single bit and accesses are not too frequent, you can also consider making the machine struct member a char or bitfield, so that it can be effectively stored together with other small struct members.

> The 2nd thing I have to figure out is how and when I have to set REG_FRAME_RELATED_EXPR.

It's when there is some operation affecting cfi which is not expressed (in simple enough terms for dwarf2out.c to grok it) in the rtl instruction pattern. Since your nop doesn't affect the call frame or registers, no call frame information needs to be emitted for it.

You'll also have to change the third instruction in your 'magic' sequence so that it is or contains an unspec, to prevent it from going walkabout when optimizations like instruction scheduling are performed. If you make it a parallel where the actual operation is paired with an empty unspec, no REG_FRAME_RELATED_EXPR is needed. If the actual operation is hidden in the RTL, however, you have to add it in a REG_FRAME_RELATED_EXPR. The latter alternative is more complicated. However, there is a benefit to choosing this: in the stack realign or !frame_pointer_needed cases, the (early) move of esp to ebp is not really supposed to establish a frame pointer, and thus you then don't want any cfi information emitted for it. Thus, you can then simply leave out the REG_FRAME_RELATED_EXPR note.

> The msvc_prologue + frame_pointer_needed + !stack_realignment_needed case produces the best possible code. The fp setup from msvc_prologue is used for its purpose.
>
> The msvc_prologue + !frame_pointer_needed case could be optimized, as you said. However, that changes all the stack frame offsets and sizes, and I do not quite understand all the code I have to modify for that. I think this should be a separate patch, although arguably be ready before msvc_prologue is added. I personally don't care too much about this combination of parameters (Wine won't need/use it), so this optimization would get lost otherwise.

The code needed shouldn't be large, but if nobody would use it, it wouldn't be tested either, so even if you got it right initially it would be prone to bitrot. So if you'd need extra code for it but nobody would use it, just add a comment in the code and the option documentation that this is an optimization that could be added / is not implemented.

> With stack_realignment_needed, frame_pointer_needed is on as well, and this code is created (copypasted together by hand, somehow the stack alignment attribute doesn't do anything on my Linux box):
>
> movl.s %edi, %edi
> pushl %ebp
> movl.s %esp, %ebp
> pop %ebp
> lea 0x4(%esp),%ecx
> and $0xfff0,%esp
> pushl -0x4(%ecx)
> push %ebp
> mov %esp,%ebp
>
> If we try to get rid of the pop-push-mov, the following things change:
> *) The value of %ebp

Yes, you have to re-do the move from esp to ebp after stack realignment.

> *) The location of the pushed ebp on the stack

That should be fine if you make your prologue expect it there.

> *) The alignment of %esp after the whole procedure (its alignment+4 before, and the +4 is lost afterwards)

Alignment +0 is actually better - best would be to make the alignment offset so that no alignment padding is done for the first highly-aligned stack slot, but that would really be a general stack size optimization, i.e. it goes beyond the scope of the current problem, and I don't think you want to go into this right now.

Basically, what we do with saving ebp before stack realignment is that we remove the ebp stack slot from the call frame proper. So you just need to say that the size of the saved registers - and thus the total frame size - is 4 bytes less than if the ebp slot was included in the frame. Make sure that the argument pointer still has the right value, and everything should be fine.
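The suggestion above to cache attribute-derived information in a one-bit member of the machine-specific struct can be sketched roughly as follows. This is a standalone toy, not GCC code: `struct toy_function`, its fields and `has_msvc_prologue` are hypothetical names, and the attribute list is modelled as a plain array of strings rather than GCC's attribute trees.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-function record; a stand-in for a port's machine-specific
   struct, not GCC's actual definition. */
struct toy_function {
    const char *const *attrs;     /* NULL-terminated list of attribute names */
    unsigned attr_checked : 1;    /* set once the attribute list was scanned */
    unsigned msvc_prologue : 1;   /* cached result of that scan */
};

/* Scan the attribute list on first use only; later calls read the cached bit. */
static bool has_msvc_prologue(struct toy_function *f)
{
    if (!f->attr_checked) {
        for (const char *const *a = f->attrs; *a != NULL; a++)
            if (strcmp(*a, "msvc_prologue") == 0)
                f->msvc_prologue = 1;
        f->attr_checked = 1;
    }
    return f->msvc_prologue;
}
```

The two bitfields pack into a single storage unit next to other small members, which is the space argument made in the reply; whether the caching is worth it at all depends on how often the query runs.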
Re: New GCC Runtime Library Exception: not fit for purpose
Quoting Ian Lance Taylor :

Joern Rennecke writes:

Quoting Manuel López-Ibáñez :

2009/1/29 Joern Rennecke :

The runtime library license says that you can link libgcc with proprietary code, whether that proprietary code was compiled with gcc or whether it was compiled with some non-gcc proprietary compiler.

No, it says that you can only do that if every file of the proprietary code is written or generated in a high level language, and uses the GCC runtime.

Where does it say that?

Where does it grant any other permission? It allows you to propagate a work of Target code formed by combining the Runtime Library with Independent Modules under certain conditions, but it doesn't give you any permission to propagate a work that also includes code that is neither part of the Runtime Library nor an Independent Module.

I don't think it needs to. Code that is neither Target Code nor an Independent Module is code that has never been involved with gcc, and the license does not cover it. The license does not prohibit combining Target Code or Independent Modules with other code, so it is permitted.

Ian
Re: Pushing the limits on vector modes
Quoting Paulo Matos :

> Hello, I am trying to model a predicate register mode that acts like a vector. We have a few predicate registers that are 8 bits in size, but they are set according to the mode of operation (not necessarily a comparison). Word size is 64.

You need some surgery to the mode generator machinery. I had the same problem with the mxp port, which you can still find in older ARC branches.
Re: Combine pass with reused sources
Quoting "Lu, John" :

> However, if I modify the function so that one of the factors is reused,
>
> long f1(long a, long b, long c)
> {
>   res0=((long long) a)*((long long) b);
>   res1=((long long) c)*((long long) b);
> }
>
> combine will not fuse the reused sign-extension result to generate the mulhizi3 pattern. I am wondering if anyone else has hit this issue or if I have done something wrong in my port. Any help would be greatly appreciated.

You need a matching constraint, as is described in md.texi.
Re: type argument in FUNCTION_ARG macro
Quoting BELBACHIR Selim : Any ideas on how to get around this problem? You can look at the name of library functions.
Re: Will backend ever see an memory operand with address wrap around?
Quoting "H.J. Lu" :

> What is the expected run-time behavior when a + b has overflow/underflow?

The expectation is wrap-around. Note that loop strength reduction can cause assumed wrap-around semantics in RTL for strictly conforming C input where no such wrap-around is in evidence.
Re: Will backend ever see an memory operand with address wrap around?
Quoting "H.J. Lu" :

> What is the run-time result when overflow happens?

Assuming you use a 32 bit unsigned base address, and the space beyond 4G is unmapped, you'll get a SEGV.
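The difference between the two answers in this thread comes down to the width in which the address is formed. The sketch below is plain C arithmetic, not a model of any particular target; the function names `addr_wrapped` and `addr_extended` are made up for illustration, and it assumes the common case where `int` is no wider than 32 bits.

```c
#include <stdint.h>

/* Address formed in 32-bit arithmetic: the sum wraps modulo 2^32. */
static uint32_t addr_wrapped(uint32_t base, uint32_t offset)
{
    return base + offset;
}

/* Same operands zero-extended to 64 bits first: no wrap, so the result can
   lie beyond 4G, where nothing may be mapped. */
static uint64_t addr_extended(uint32_t base, uint32_t offset)
{
    return (uint64_t) base + (uint64_t) offset;
}
```

For base 0xfffffff0 and offset 0x20, the first function yields 0x10, while the second yields 0x100000010, an address above 4G that would fault if that region is unmapped.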
Fwd: Re: [PATCH][4.3] Deprecate -ftrapv
Somehow this got stuck in the spam filter.

----- Forwarded message from [EMAIL PROTECTED] -----
Date: Sat, 01 Mar 2008 09:21:21 -0500
From: Joern Rennecke <[EMAIL PROTECTED]>
Reply-To: Joern Rennecke <[EMAIL PROTECTED]>
Subject: Re: [PATCH][4.3] Deprecate -ftrapv
To: gcc@gcc.gnu.org
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED]

On Fri, 29 Feb 2008, Robert Dewar wrote:
> Well presumably one would want to use target dependent stuff for detecting overflow where it exists (sticky overflow bits on power, O flag on PC, trapping add on MIPS etc).

In fact, when I wrote the original -ftrapv code, it was for the sole purpose of using the trapping add on mips.

On Sat, 1 Mar 2008, Joseph S. Myers wrote:
> The only targets defining the v insn patterns at present appear to be alpha and pa.

Considering the trouble that you get when you try to generate branches in a non-branch expander, we should probably have alternate named patterns to be used in ports to processors that have no conditional trap facility, or where a conditional trap is more expensive than a well predictable conditional branch. We want arithmetic-and-branch-on-overflow patterns for these.

One peculiarity of these patterns would be that they would be required to expand into more than one instruction, since the write of the result must not be in the same instruction as the branch due to reload limitations. Thus the overflow condition in CC0 / other flags register / predicate register has to be actually exposed in rtl to show the dependency between arithmetic and branch. We should document this quirk in the description of these named patterns.

When the machine independent expander machinery wants to expand a trapping arithmetic operation that has no matching named pattern defined by the port, and there is no conditional trap defined, it can then use the arithmetic-and-branch-on-overflow pattern to branch to an abort call if an overflow occurs.
To allow branch inversion to work, we don't need to do anything special if the condition is expressed as a comparison against 0 of an 'integer' flag register or a predicate bit. However, if the condition is in CC0 or a CCmode flags register, we want a way to express the overflow and non-overflow conditions so that reverse_condition or REVERSE_CONDITION can do its work.

I see two possibilities here. For simplicity I will describe them here in terms of CC0, although many target ports would actually use a scheduler-exposed flags register with an appropriate CCmode mode.
- We could have (overflow CC0 0) and (nooverflow CC0 0), where overflow and nooverflow are two new comparison codes, and the trailing 0 is a dummy argument for the sake of consistency with comparison operators.
- We could have (ge CC0 overflow) and (lt CC0 overflow), where overflow is a new one-of-a-kind RTX object.

----- End forwarded message -----
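As a source-level analogy to the arithmetic-and-branch-on-overflow idea in the forwarded message, the sketch below separates the addition, the overflow test and the write of the result. It says nothing about how the proposed named patterns would look in a machine description; `add_checked` is a hypothetical helper, and `__builtin_add_overflow` is the overflow builtin provided by later GCC releases (and Clang), not something that existed when this thread was written.

```c
#include <limits.h>
#include <stdbool.h>

/* Add two ints; on overflow, branch away without writing *result.
   Returns true when the sum was representable. */
static bool add_checked(int a, int b, int *result)
{
    int sum;
    if (__builtin_add_overflow(a, b, &sum))
        return false;   /* overflow path: a -ftrapv style caller would abort */
    *result = sum;      /* the result is written only on the non-overflow path */
    return true;
}
```

A trapping add in the -ftrapv sense would call abort() where this helper returns false; for example, `add_checked(INT_MAX, 1, &r)` takes the overflow path and leaves `r` untouched.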