Re: 32 bit jump instruction.

2006-12-13 Thread amylaar
Quoting Steven Bosscher <[EMAIL PROTECTED]>:

> On 12/13/06, Joern Rennecke <[EMAIL PROTECTED]> wrote:
> > In http://gcc.gnu.org/ml/gcc/2006-12/msg00328.html, you wrote:
> > However, because the SH has delayed branches, there is always a
> > guaranteed way to find a register - one can be saved, and then be
> > restored in the delay slot.
>
> Heh, that's an interesting feature :-)
>
> How does that work?  I always thought that the semantics of delayed
> insns is that the insn in the delay slot is executed *before* the
> branch. But that is apparently not the case, or the branch register
> would have been over-written before the branch. How does that work on
> SH?

The jump address is calculated, then the delay slot instruction is
executed - or sometimes, if the instructions are pairable, the delay
slot insn is executed simultaneously with the jump address calculation.
Then - or during the delay slot insn execution - the target instruction
is fetched, and then executed.  You can look into sim/sh/interp.c for
a functional model of how this works from the programmer's point of view.
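
As a rough illustration of that ordering, here is a toy interpreter.
It is only a sketch: the instruction set, the toy_* names and the register
numbering are invented for this example and are not taken from
sim/sh/interp.c or from the SH encoding.  It shows why a register can be
borrowed for the jump address and restored in the delay slot: the address
is latched before the delay slot instruction runs.

```c
/* Toy instruction set, invented for illustration; not the SH encoding.  */
enum toy_op { TOY_MOV_IMM, TOY_MOV_REG, TOY_JMP_REG, TOY_HALT };

struct toy_insn {
    enum toy_op op;
    int rd;   /* destination register (TOY_MOV_*) */
    int rs;   /* source register (TOY_MOV_REG, TOY_JMP_REG) */
    int imm;  /* immediate (TOY_MOV_IMM) */
};

struct toy_cpu {
    int reg[4];
    int pc;
};

/* Execute one non-branch instruction.  */
static void toy_exec(struct toy_cpu *cpu, const struct toy_insn *insn)
{
    switch (insn->op) {
    case TOY_MOV_IMM: cpu->reg[insn->rd] = insn->imm; break;
    case TOY_MOV_REG: cpu->reg[insn->rd] = cpu->reg[insn->rs]; break;
    default: break;
    }
}

/* Run until TOY_HALT is reached.  */
static void toy_run(struct toy_cpu *cpu, const struct toy_insn *prog)
{
    while (prog[cpu->pc].op != TOY_HALT) {
        const struct toy_insn *insn = &prog[cpu->pc];
        if (insn->op == TOY_JMP_REG) {
            int target = cpu->reg[insn->rs];   /* 1. jump address is calculated */
            toy_exec(cpu, &prog[cpu->pc + 1]); /* 2. delay slot insn executes */
            cpu->pc = target;                  /* 3. target insn is fetched */
        } else {
            toy_exec(cpu, insn);
            cpu->pc++;
        }
    }
}

/* r0 holds a live value; it is saved, borrowed for the jump address,
   and restored in the delay slot.  */
static const struct toy_insn toy_prog[] = {
    { TOY_MOV_REG, 1, 0, 0 },  /* 0: save r0 (a real port would use the stack) */
    { TOY_MOV_IMM, 0, 0, 5 },  /* 1: r0 = address of the far target */
    { TOY_JMP_REG, 0, 0, 0 },  /* 2: jump via r0 */
    { TOY_MOV_REG, 0, 1, 0 },  /* 3: delay slot: restore r0 */
    { TOY_MOV_IMM, 3, 0, 99 }, /* 4: skipped by the jump */
    { TOY_HALT,    0, 0, 0 },  /* 5: jump target */
};
```

Starting with r0 = 42 and pc = 0, toy_run (&cpu, toy_prog) stops at
index 5 with r0 restored and r3 untouched.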


Re: matching constraints in asm operands question

2005-03-04 Thread amylaar
> static __inline__ void atomic_add(atomic_t *v, int i)
> {
>   __asm__ __volatile__("addl %1,%0" : "+m" (*v) : "d" (i));
> }
>
> Then the compiler complains with:
>
> /asm/atomic.h:33: warning: read-write constraint does not allow a register
>
> So is the warning wrong?

Yes, the warning is wrong, and the text in the manual about '+' is also
nonsense.  Support for '+' in asms was specifically introduced to make
it safe to have read-write memory operands.  Jason, the point of using '+'
is that the matched parts start out as the same, and the compiler is
supposed to keep them the same.  We initially did this with shared rtl,
and IIRC we have changed cse since, but the same premise still holds: the
compiler is supposed to keep both parts of the match in sync.  We
can guarantee this for '+', but we can't for matching constraints,
and this was documented properly till you changed extend.texi
in December 2003.
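
For comparison, a minimal sketch of a read-write memory operand written
with '+' next to the older idiom of naming the same lvalue once as "=m"
output and once as "m" input.  Assumptions: an x86 target and GNU C
extended asm; the constraint "r" stands in for the target-specific "d" of
the quoted code, and the function names are made up.  Without a lock
prefix neither variant is atomic on SMP; the sketch is only about operand
constraints.

```c
typedef struct { int counter; } atomic_t;

/* One operand marked '+': the compiler treats it as both read and written.  */
static inline void add_plus(atomic_t *v, int i)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__("addl %1,%0" : "+m" (v->counter) : "r" (i) : "cc");
#else
    v->counter += i;   /* portable fallback so the sketch builds anywhere */
#endif
}

/* Separate output and input operands that happen to name the same lvalue.  */
static inline void add_split(atomic_t *v, int i)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__("addl %2,%0"
                         : "=m" (v->counter)
                         : "m" (v->counter), "r" (i)
                         : "cc");
#else
    v->counter += i;   /* portable fallback so the sketch builds anywhere */
#endif
}
```

With '+' there is a single operand, so there is nothing for the compiler
to keep in sync by accident; in the split form the two operands only
agree because the source happens to spell the same lvalue twice.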



Re: matching constraints in asm operands question

2005-03-05 Thread amylaar
Quoting Jason Merrill <[EMAIL PROTECTED]>:

> Well, I assumed the same thing when I started poking at that code, but then
> someone pointed out that it didn't actually work that way, and as I recall
> the code does in fact assume a register.  I certainly would not object to
> making '+' work properly for memory operands, but simply asserting that it
> already does is wrong.

The code in reload to make non-matching operands match assumes a register.
However, a match from a plus should always be kept in sync (except temporarily
half-way through a substitution, since we now unshare).  If it isn't,
that's a regression.  Do you have a testcase, and/or can point out the code
that introduces the inconsistency in the rtl?


Re: diffing directories with merged-as-deleted files?

2005-11-03 Thread amylaar
>> cvs would never do such nonsense.


>Absolutely! It would just print all the directory names in the middle of the
>diffs. I call that nonsense as well.

But the directory names go to stderr.  When you redirect stdout to a file,
a diff without the directory names is written to that file, while you
get an ongoing progress report.


Re: Use of FLAGS_REGNUM clashes with generates insn

2011-09-23 Thread amylaar

Quoting "Paulo J. Matos" :


That's seriously annoying. The idea was to ditch cc0 and explicitly
represent CC in a register to perform optimizations like splitting add
and addc for a double word addition. If hiding my flags register
means going back to cc0, then that seems the only way to go unless I
get it to work somehow. If you have anything else in mind to get it
to work let me know.


Hiding the flags register would mean it is not represented in the rtl at
all.  You can have combined compare-branch instructions.
Of course, going that route would mean that the model you present to
GCC is even further from the hardware than one that uses cc0.


What I currently have in mind is to have a backend macro listing all
the moves which clobber CC_REG; then whenever GCC generates a
move, it queries the macro to know if the move requires clobbering and
emits the clobber if required. However, I am unsure how deep the rabbit
hole goes.


Oh, so you do have variants that can do without the clobber.
If you can make all the reloads without introducing explicit flag
clobbers, then it should work.
But you can't just pull a flag clobber out of thin air.  You should
have some way to generate valid code when the flags register is
unavailable / must be saved.  Then you can use peephole2 to add
flag clobbers where the flags register is available.

Or you can use machine_dependent_reorg or another machine-specific pass
inserted with the pass manager to rewrite clobber-free instructions into
ones that have a hardware equivalent; but you must make sure that your
data flow remains sound in the process.
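
To see why the data flow matters, here is a toy model of the add / addc
split mentioned in the quoted text.  It is a sketch under assumptions:
the machine has a single carry flag, and all toy_* names are invented.
A flag-clobbering move placed between the two halves, as a reload might
place it if the flags dependency were not visible, silently loses the
carry.

```c
#include <stdint.h>

/* Toy model of a machine whose only flag is a carry bit.  */
struct toy_flags { unsigned carry; };

/* add: sets carry from the unsigned overflow of a + b.  */
static uint32_t toy_add(struct toy_flags *f, uint32_t a, uint32_t b)
{
    uint32_t sum = (uint32_t) (a + b);
    f->carry = sum < a;
    return sum;
}

/* addc: consumes the carry left by the previous add.  The carry-out of
   addc itself is ignored in this sketch.  */
static uint32_t toy_addc(const struct toy_flags *f, uint32_t a, uint32_t b)
{
    return (uint32_t) (a + b + f->carry);
}

/* A move that clobbers the flags as a side effect.  */
static uint32_t toy_move_clobbering(struct toy_flags *f, uint32_t x)
{
    f->carry = 0;
    return x;
}

/* Double-word add split into add + addc.  If reload_between is nonzero,
   a flag-clobbering move is inserted between the two halves.  */
static uint64_t toy_add64(uint64_t a, uint64_t b, int reload_between)
{
    struct toy_flags f = { 0 };
    uint32_t lo = toy_add(&f, (uint32_t) a, (uint32_t) b);
    uint32_t bhi = (uint32_t) (b >> 32);
    uint32_t hi;

    if (reload_between)
        bhi = toy_move_clobbering(&f, bhi);
    hi = toy_addc(&f, (uint32_t) (a >> 32), bhi);
    return ((uint64_t) hi << 32) | lo;
}
```

Adding 0xffffffff and 1 gives 0x100000000 without the intervening move
and 0 with it, which is the kind of breakage an unrepresented clobber
allows.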


Re: Use of FLAGS_REGNUM clashes with generates insn

2011-10-18 Thread amylaar

Quoting Hans-Peter Nilsson :


On Fri, 23 Sep 2011, Joern Rennecke wrote:


Quoting "Paulo J. Matos" :

> My addition instruction sets all the flags. So I have:

This is annoying, but can be handled.  Been there, done that.
dse.c needs a small patch, which I intend to submit sometime in the future.


Could you be persuaded to send it to the list as-is, right now?

Be sure to mark it work-in-progress, or someone might feel
compelled to review it. :)


Attached.

The issue with this patch is that we'll have to check if gen_add3_insn might
fail on any target, and also we have to identify on which targets it will
create an insn that clobbers potentially live hard registers
(like a flags register), and make the substitution fail in this case.
I.e. if in doubt, keep the dead store with the auto-increment.
But not fail for a target that knows what it clobbers in a
reload_in_progress / reload_completed add.

For Epiphany, the add will be expanded with a clobber of a fake hard register,
and this pattern is subject to various peephole2 patterns which usually
find a low-cost substitute.
The point of clobbering a fake hard register is to avoid having passes like
combine creating the pattern with the unresolved flag clobber problem.
The unoptimized add expands into a sequentially issued five-instruction
sequence in order to save/restore the flags to a temp register, which in
turn is saved on the stack.
There are peephole2 patterns to use a constant directly rather than loading
it into a temp register, to clobber the flags register if it is free,
to move the add before an immediately preceding flag-setting instruction,
to find a possibility for a dummy post-modify load from / store to the stack
(for calculating a stack/frame based address), and to scavenge a temp
register to save the flags into without needing to save the temp register.

2011-09-18  Joern Rennecke 

* dse.c (emit_inc_dec_insn_before): Use gen_add3_insn / gen_move_insn.

Index: dse.c
===
--- dse.c   (revision 2071)
+++ dse.c   (revision 2072)
@@ -835,15 +835,15 @@ emit_inc_dec_insn_before (rtx mem ATTRIB
                           rtx op ATTRIBUTE_UNUSED,
                           rtx dest, rtx src, rtx srcoff, void *arg)
 {
-  rtx insn = (rtx)arg;
-
-  if (srcoff)
-    src = gen_rtx_PLUS (GET_MODE (src), src, srcoff);
+  rtx insn = (rtx) arg, new_insn;
 
   /* We can reuse all operands without copying, because we are about
      to delete the insn that contained it.  */
-
-  emit_insn_before (gen_rtx_SET (VOIDmode, dest, src), insn);
+  if (srcoff)
+    new_insn = gen_add3_insn (dest, src, srcoff);
+  else
+    new_insn = gen_move_insn (dest, src);
+  emit_insn_before (new_insn, insn);
 
   return -1;
 }


RE: Feature request concerning opcodes in the function prolog

2009-01-08 Thread amylaar

Quoting Stefan Dösinger :


I talked to Alexandre again, and his main concern wasn't so much the global
flag, but that the existence of the push %ebp; mov %esp, %ebp was still up
to the feelings of the compiler and the moon phase.

So what he wants is something like a msvc_prolog attribute that makes sure
that the function starts with the following instructions and bytecode
sequence, no matter what -fomit-frame-pointer and friends say:

8b ff   mov.s %edi, %edi
55      push %ebp
8b ec   mov.s %esp, %ebp

So we basically need the msvc_prolog to add the "mov.s %edi, %edi" and force
the frame pointer on,


You don't need to force the frame pointer on, it is sufficient to say that
ebp needs restoring at the end of the function no matter if it looks otherwise
used or not - and you have to take the frame size impact of the saved ebp into
account.

Moreover, if your prologue begins with an unspec_volatile that emits the
three-instruction sequence you want, the optimizers should leave it there
at the start of the function.
Although it is probably easiest to get debug and unwind information right
if you make this three separate unspec_volatile patterns, with their
respective REG_FRAME_RELATED_EXPR notes where applicable.
I.e. the push ebp saves ebp and changes the stack.
The mov.s esp,ebp needs a REG_FRAME_RELATED_EXPR note only if you are
actually using a frame pointer.
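
Whatever rtl ends up producing it, the requirement in the quoted request
comes down to five fixed bytes at the function entry.  A minimal sketch
of a check for them, e.g. for use in a test; the function and array names
are made up for this example:

```c
#include <stddef.h>
#include <string.h>

/* Bytes of "mov.s %edi,%edi; push %ebp; mov.s %esp,%ebp" as quoted above.  */
static const unsigned char msvc_prologue_bytes[] = {
    0x8b, 0xff,   /* mov.s %edi, %edi */
    0x55,         /* push %ebp */
    0x8b, 0xec    /* mov.s %esp, %ebp */
};

/* Return nonzero if the code at CODE (LEN bytes available) starts with
   the five-byte sequence.  */
static int starts_with_msvc_prologue(const unsigned char *code, size_t len)
{
    return len >= sizeof msvc_prologue_bytes
           && memcmp(code, msvc_prologue_bytes,
                     sizeof msvc_prologue_bytes) == 0;
}
```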


RE: Feature request concerning opcodes in the function prolog

2009-01-08 Thread amylaar

Quoting Stefan Dösinger :




If the frame pointer is not needed:
mov.s %edi, %edi
push %ebp
mov.s %esp, %ebp
pop %ebp
; Continue normally here. I think that case can't be improved too much,
since the msvc_prolog stuff modifies the base pointer.


If ebp needs to be saved because it contains a user variable, it is better
not to pop it in the prologue - pop it in the epilogue instead, and you don't
need to have another save/restore.


Now my problem: If the frame pointer is needed, and the stack realignment is
needed:
mov.s %edi, %edi
push %ebp
mov.s %esp, %ebp
pop %ebp
leal ...
push %ebp
mov %esp, %ebp


This can be done with much shorter assembly, at the cost of a bit more
logic in your prologue / epilogue expanders:

mov.s %edi, %edi
push %ebp
mov.s %esp, %ebp
leal ... (adjust value to account for the ebp stack slot)

and in the epilogue, after restoring the stack to what it was prior to
re-aligning, you do:

pop %ebp


The REG_FRAME_RELATED_EXPR is set with this, right?
RTX_FRAME_RELATED_P (insn) = 1;


No, REG_FRAME_RELATED_EXPR is a special kind of note for when the dwarf /
unwind code can't figure out the information from the pattern of the
instruction.  Looking at dwarf2out.c:dwarf2out_frame_debug_expr,
I see that you should be able to use a PARALLEL with one part being the
operation actually performed, and another part an UNSPEC_VOLATILE to
prevent unwanted code motion / deletion etc.


I haven't yet figured out what it does exactly.


When RTX_FRAME_RELATED_P is set on an insn, dwarf call frame information
and exception unwinding information will be generated for this instruction
(if this kind of information is generated for this translation unit).
If a REG_FRAME_RELATED_EXPR note is present, it specifies the cfi information
for this instruction; otherwise, dwarf2out_frame_debug_expr will analyze the
PATTERN of the instruction itself.


RE: Feature request concerning opcodes in the function prolog

2009-01-12 Thread amylaar

Quoting Stefan Dösinger :


Here's some code attached that actually works, but is far from perfect.

The 'msvc_prologue' attribute is limited to 32 bit. None of the applications
that try to place hooks are ported to Win64 yet, so it is impossible to tell
what they'll need. Besides, maybe I am lucky that when they appear I can
tell their authors to think about Wine.

The first problem I (still) have is passing the msvc_prologue attribute
around. I abused some other code to do that. How does the attribute handling
work, and what would the preferred way be?


Well, you could query the attribute every time you need it, but if that would
cause performance issues or significant code bloat, caching information
computed from the attributes in the machine specific struct is fine.
If you only need a single bit and accesses are not too frequent / often,
you can also consider making the machine struct member a char or bitfield,
so that it can be effectively stored together with other small struct members.
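
A minimal sketch of such caching, with a pair of one-bit bitfields.  The
toy_* types and functions are hypothetical stand-ins, not GCC's real
declaration type, machine_function struct or lookup_attribute; the
counter exists only to show that the slow path runs once.

```c
#include <string.h>

/* Hypothetical stand-in for a function declaration.  */
struct toy_decl {
    const char *attrs[4];      /* attribute names, NULL-terminated if short */
};

/* Hypothetical stand-in for the machine-specific per-function struct.  */
struct toy_machine_function {
    unsigned msvc_prologue_known : 1;  /* has the attribute been looked up? */
    unsigned msvc_prologue : 1;        /* cached result of the lookup */
};

static int toy_lookup_count;   /* counts slow lookups, for demonstration */

/* The "query the attribute every time" path.  */
static int toy_lookup_attribute(const char *name, const struct toy_decl *decl)
{
    int i;

    toy_lookup_count++;
    for (i = 0; i < 4 && decl->attrs[i]; i++)
        if (strcmp(decl->attrs[i], name) == 0)
            return 1;
    return 0;
}

/* Cached query: the slow lookup runs at most once per function.  */
static int toy_has_msvc_prologue(struct toy_machine_function *m,
                                 const struct toy_decl *decl)
{
    if (!m->msvc_prologue_known) {
        m->msvc_prologue = toy_lookup_attribute("msvc_prologue", decl);
        m->msvc_prologue_known = 1;
    }
    return m->msvc_prologue;
}
```

The two bits pack next to any other small members, so the cache costs no
extra word in the struct.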


The 2nd thing I have to figure out is how and when I have to set
REG_FRAME_RELATED_EXPR.


It's when there is some operation affecting cfi which is not expressed (in
simple enough terms for dwarf2out.c to grok it) in the rtl instruction
pattern.  Since your nop doesn't affect the call frame or registers, no
call frame information needs to be emitted for it.

You'll have to also change the third instruction in your 'magic' sequence
so that it is or contains an unspec, to prevent it from going walkabout
when optimizations like instruction scheduling are performed.
If you make it a parallel where the actual operation is paired with an
empty unspec, no REG_FRAME_RELATED_EXPR is needed.  If the actual operation
is hidden in the RTL, however, you have to add it in a REG_FRAME_RELATED_EXPR.
The latter alternative is more complicated.  However, there is a benefit to
choosing this: in the stack realign or !frame_pointer_needed cases, the
(early) move of esp to ebp is not really supposed to establish a frame
pointer, and thus you then don't want any cfi information emitted for it.
Thus, you can then simply leave out the REG_FRAME_RELATED_EXPR note.



The msvc_prologue + frame_pointer_needed + !stack_realignment_needed case
produces the best possible code. The fp setup from msvc_prologue is used for
its purpose.

The msvc_prologue + !frame_pointer_needed case could be optimized, as you
said. However, that changes all the stack frame offsets and sizes, and I do
not quite understand all the code I have to modify for that. I think this
should be a separate patch, although arguably be ready before msvc_prologue
is added. I personally don't care too much about this combination of
parameters(Wine won't need/use it), so this optimization would get lost
otherwise.


The code needed shouldn't be large, but if nobody would use it, it wouldn't
be tested either, so even if you got it right initially it would be prone
to bitrot.  So if you'd need extra code for it but nobody would use it, just
add a comment in the code and the option documentation that this is an
optimization that could be added / is not implemented.


With stack_realignment_needed frame_pointer_needed is on as well, and this
code is created(copypasted together by hand, somehow the stack alignment
attribute doesn't do anything on my Linux box)

movl.s %edi, %edi
pushl  %ebp
movl.s %esp, %ebp
pop    %ebp
lea    0x4(%esp),%ecx
and    $0xfff0,%esp
pushl  -0x4(%ecx)
push   %ebp
mov    %esp,%ebp

If we try to get rid of the pop-push-mov, the following things change:

*) The value of %ebp


Yes, you have to re-do the move from esp to ebp after stack realignment.


*) The location of the pushed ebp on the stack


That should be fine if you make your prologue expect it there.


*) The alignment of %esp after the whole procedure(its alignment+4 before,
and the +4 is lost afterwards)


Alignment +0 is actually better - best would be to make the alignment offset
so that no alignment padding is done for the first highly-aligned stack slot,
but that would be really a general stack size optimization, i.e. it goes
beyond the scope of the current problem, and I don't think you want to
go into this right now.
Basically, what we do with saving ebp before stack realignment is that we
remove the ebp stack slot from the call frame proper.
So you just need to say that the size of the saved registers - and thus
the total frame size - is  4 bytes less than if the ebp slot was included
in the frame.  Make sure that the argument pointer still has the right value,
and everything should be fine.


Re: New GCC Runtime Library Exception: not fit for purpose

2009-01-29 Thread amylaar

Quoting Ian Lance Taylor :


Joern Rennecke  writes:


Quoting Manuel López-Ibáñez :


2009/1/29 Joern Rennecke :



The runtime library license says that you can link libgcc with
proprietary code, whether that proprietary code was compiled with gcc
or whether it was compiled with some non-gcc proprietary compiler.


No, it says that you can only do that if every file of the proprietary
code is written or generated in a high level language, and uses the
GCC runtime.




Where does it say that?


Where does it grant any other permission?
It allows you to propagate a work of Target code formed by combining
the Runtime Library with Independent Modules under certain conditions,
but it doesn't give you any permission to propagate a work that also
includes code that is neither part of the Runtime Library nor an
Independent Module.


I don't think it needs to.

Code that is neither Target Code nor an Independent Module is code
that has never been involved with gcc, and the license does not cover
it.  The license does not prohibit combining Target Code or
Independent Modules with other code, so it is permitted.

Ian





Re: Pushing the limits on vector modes

2013-05-17 Thread amylaar

Quoting Paulo Matos :


Hello,

I am trying to model a predicate register mode that acts like a
vector. We have a few predicate registers that are 8 bits in size,
but they are set according to the mode of operation (not
necessarily a comparison). Word size is 64.


You need some surgery to the mode generator machinery.  I had the same
problem with the mxp port, which you can still find in older ARC
branches.


Re: Combine pass with reused sources

2013-08-14 Thread amylaar

Quoting "Lu, John" :


However, if I modify the function so that one of the factors is reused,

long f1(long a, long b, long c) {
  res0=((long long) a)*((long long) b);
  res1=((long long) c)*((long long) b);
}

combine will not fuse the reused sign-extension result to generate the
mulhizi3 pattern.

I am wondering if anyone else has hit this issue or if I have done
something wrong in my port.  Any help would be greatly appreciated.


You need a matching constraint, as is described in md.texi.


Re: type argument in FUNCTION_ARG macro

2012-05-04 Thread amylaar

Quoting BELBACHIR Selim :


Any ideas on how to get around this problem?


You can look at the name of library functions.


Re: Will backend ever see an memory operand with address wrap around?

2012-05-13 Thread amylaar

Quoting "H.J. Lu" :


What is the expected run-time behavior when a + b has
overflow/underflow?


The expectation is wrap-around.  Note that loop strength reduction can
cause assumed wrap-around semantics in RTL for strictly conforming C input
where no such wrap-around is in evidence.


Re: Will backend ever see an memory operand with address wrap around?

2012-05-13 Thread amylaar

Quoting "H.J. Lu" :


What is the run-time result when overflow happens?


Assuming you use a 32 bit unsigned base address, and the space beyond 4G
is unmapped, you'll get a SEGV.
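
The difference between the two answers can be shown with plain integer
arithmetic.  This is a sketch under the stated assumption of a 32 bit
unsigned base that is zero-extended and added in a 64 bit address space;
the function names are made up.

```c
#include <stdint.h>

/* Address as 32 bit arithmetic computes it: the sum wraps modulo 2^32.  */
static uint32_t addr_wrapped(uint32_t base, uint32_t offset)
{
    return (uint32_t) (base + offset);
}

/* Address when base and offset are zero-extended and added in 64 bits:
   no wrap, so the result can lie beyond 4G.  */
static uint64_t addr_extended(uint32_t base, uint32_t offset)
{
    return (uint64_t) base + offset;
}
```

For base 0xfffffff0 and offset 0x20, the first gives 0x10 and the second
0x100000010; an access to the latter faults if nothing is mapped beyond
4G.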


Fwd: Re: [PATCH][4.3] Deprecate -ftrapv

2008-03-04 Thread amylaar

Somehow this got stuck in the spam filter.

- Forwarded message from [EMAIL PROTECTED] -
Date: Sat, 01 Mar 2008 09:21:21 -0500
From: Joern Rennecke <[EMAIL PROTECTED]>
Reply-To: Joern Rennecke <[EMAIL PROTECTED]>
Subject: Re: [PATCH][4.3] Deprecate -ftrapv
To: gcc@gcc.gnu.org
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED]

On Fri, 29 Feb 2008, Robert Dewar wrote:

Well presumably one would want to use target dependent stuff for
detecting overflow where it exists (sticky overflow bits on
power, O flag on PC, trapping add on MIPS etc).


In fact, when I wrote the original -ftrapv code, it was for the sole purpose
of using the trapping add on mips.

On Sat, 1 Mar 2008, Joseph S. Myers wrote:

The only targets defining the v insn patterns at present
appear to be alpha and pa.


Considering the trouble that you get when you try to generate branches in
a non-branch expander, we should probably have alternate named patterns
to be used in ports to processors that have no conditional trap facility,
or where a conditional trap is more expensive than a well predictable
conditional branch.
We want arithmetic-and-branch-on-overflow patterns for these.
One peculiarity of these patterns would be that they would be required
to expand into more than one instruction, since the write of the result
must not be in the same instruction as the branch due to reload limitations.
Thus the overflow condition in CC0 / other flags register / predicate
register has to be actually exposed in rtl to show the dependency between
arithmetic and branch.
We should document this quirk in the description of these named patterns.

When the machine independent expander machinery wants to expand a
trapping arithmetic operation that has no matching named pattern defined
by the port, and there is no conditional trap defined, it can then use
the arithmetic-and-branch-on-overflow pattern to branch to an abort call
if an overflow occurs.
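
At the source level, the effect of such an expansion can be sketched in
portable C.  This is only an illustration of the semantics, not of the
rtl: the function names are made up, and because signed overflow is
undefined in C the check is done before the addition, whereas the
proposed pattern would do the arithmetic first and branch on the
resulting flag.

```c
#include <limits.h>
#include <stdlib.h>

/* Arithmetic plus overflow condition.  Returns nonzero if a + b would
   overflow; otherwise stores the sum in *sum and returns 0.  */
static int add_overflows(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 1;
    *sum = a + b;
    return 0;
}

/* Trapping add built from the above: branch to an abort call on overflow.  */
static int trapping_add(int a, int b)
{
    int sum;

    if (add_overflows(a, b, &sum))
        abort();
    return sum;
}
```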

To allow branch inversion to work, we don't need to do anything special
if the condition is expressed as a comparison against 0 of an 'integer'
flag register or a predicate bit.  However, if the condition is in CC0
or a CCmode flags register, we want a way to express the overflow
and non-overflow conditions so that reverse_condition or REVERSE_CONDITION
can do its work.

I see two possibilities here.  For simplicity I will describe them
here in terms of CC0, although many target ports would actually use a
scheduler-exposed flags register with an appropriate CCmode mode.
- We could have (overflow CC0 0) and (nooverflow CC0 0), where
   overflow and nooverflow are two new comparison codes, and the trailing
   0 is a dummy argument for the sake of consistency with comparison
   operators.
- We could have (ge CC0 overflow) and (lt CC0 overflow), where overflow
   is a new one-of-a-kind RTX object.
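
For the first possibility, branch inversion only needs the two new codes
to map onto each other.  A toy sketch, with an invented enum standing in
for the rtx comparison codes (this is not GCC's reverse_condition):

```c
/* Hypothetical comparison codes; TOY_OVERFLOW and TOY_NOOVERFLOW stand
   for the two new codes of the first proposal.  */
enum toy_code {
    TOY_EQ, TOY_NE, TOY_GE, TOY_LT,
    TOY_OVERFLOW, TOY_NOOVERFLOW,
    TOY_UNKNOWN
};

/* Return the code that is true exactly when CODE is false.  */
static enum toy_code toy_reverse_condition(enum toy_code code)
{
    switch (code) {
    case TOY_EQ:         return TOY_NE;
    case TOY_NE:         return TOY_EQ;
    case TOY_GE:         return TOY_LT;
    case TOY_LT:         return TOY_GE;
    case TOY_OVERFLOW:   return TOY_NOOVERFLOW;
    case TOY_NOOVERFLOW: return TOY_OVERFLOW;
    default:             return TOY_UNKNOWN;
    }
}
```

With the second possibility no new case is needed: (ge CC0 overflow) and
(lt CC0 overflow) would already be covered by the existing GE / LT
entries.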


- End forwarded message -