Re: ira.c update_equiv_regs patch causes gcc/testsuite/gcc.target/arm/pr43920-2.c regression

2015-04-20 Thread Shiva Chen
Hi, Jeff

Thanks for your advice.

can_replace_by.patch is the new patch to handle both cases.

pr43920-2.c.244r.jump2.ori is the original  jump2 rtl dump

pr43920-2.c.244r.jump2.patch_can_replace_by is the jump2 rtl dump
after patch  can_replace_by.patch

Could you help me to review the patch?

Thanks again.

Shiva

2015-04-18 0:03 GMT+08:00 Jeff Law :
> On 04/17/2015 03:57 AM, Shiva Chen wrote:
>>
>> Hi,
>>
>> I think the rtl dump in
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64916
>> is not jump2 phase rtl dump.
>>
>> Because jump2 is after ira, the register number should be hardware
>> register number.
>>
>> the jump2 rtl dump should as follow
>>
>> ...
>> 31: NOTE_INSN_BASIC_BLOCK 5
>> 32: [r6:SI]=r4:SI
>>REG_DEAD r6:SI
>>REG_DEAD r4:SI
>> 33: [r5:SI]=r0:SI
>>REG_DEAD r5:SI
>>REG_DEAD r0:SI
>>  7: r0:SI=0
>>REG_EQUAL 0
>> 85: use r0:SI
>> 86:
>> {return;sp:SI=sp:SI+0x18;r3:SI=[sp:SI];r4:SI=[sp:SI+0x4];r5:SI=[sp:SI+0x8];r6:SI=[sp:SI+0xc];r7:SI=[sp:SI+0x10];pc:SI=[sp:SI+0x14];}
>>REG_UNUSED pc:SI
>>REG_UNUSED r3:SI
>>REG_CFA_RESTORE r7:SI
>>REG_CFA_RESTORE r6:SI
>>REG_CFA_RESTORE r5:SI
>>REG_CFA_RESTORE r4:SI
>>REG_CFA_RESTORE r3:SI
>> 77: barrier
>> 46: L46:
>> 45: NOTE_INSN_BASIC_BLOCK 6
>>  8: r0:SI=r4:SI
>>REG_DEAD r4:SI
>>REG_EQUAL 0x
>> 87: use r0:SI
>> 88:
>> {return;sp:SI=sp:SI+0x18;r3:SI=[sp:SI];r4:SI=[sp:SI+0x4];r5:SI=[sp:SI+0x8];r6:SI=[sp:SI+0xc];r7:SI=[sp:SI+0x10];pc:SI=[sp:SI+0x14];}
>>REG_UNUSED pc:SI
>>REG_UNUSED r3:SI
>>REG_CFA_RESTORE r7:SI
>>REG_CFA_RESTORE r6:SI
>>REG_CFA_RESTORE r5:SI
>>REG_CFA_RESTORE r4:SI
>>REG_CFA_RESTORE r3:SI
>> 79: barrier
>> 54: L54:
>> 53: NOTE_INSN_BASIC_BLOCK 7
>>  9: r0:SI=0x <== lost REG_EQUAL after patch
>> 34: L34:
>> 35: NOTE_INSN_BASIC_BLOCK 8
>> 41: use r0:SI
>> 90:
>> {return;sp:SI=sp:SI+0x18;r3:SI=[sp:SI];r4:SI=[sp:SI+0x4];r5:SI=[sp:SI+0x8];r6:SI=[sp:SI+0xc];r7:SI=[sp:SI+0x10];pc:SI=[sp:SI+0x14];}
>>REG_UNUSED pc:SI
>>REG_UNUSED r3:SI
>>REG_CFA_RESTORE r7:SI
>>REG_CFA_RESTORE r6:SI
>>REG_CFA_RESTORE r5:SI
>>REG_CFA_RESTORE r4:SI
>>REG_CFA_RESTORE r3:SI
>> 89: barrier
>
> Intead of the slim dump, can you please include the full RTL dump.  I find
> those much easier to read.
>
>
>
>>
>> Possible patch for  can_replace_by in cfgcleanup.c.
>>
>> -  if (!note1 || !note2 || !rtx_equal_p (XEXP (note1, 0), XEXP (note2, 0))
>> -  || !CONST_INT_P (XEXP (note1, 0)))
>> +
>> +  if (!note1 || !CONST_INT_P (XEXP (note1, 0)))
>>   return dir_none;
>>
>> +  if (note2)
>> +{
>> +  if (!rtx_equal_p (XEXP (note1, 0), XEXP (note2, 0)))
>> +   return dir_none;
>> +}
>> +  else
>> +{
>> +  if (!CONST_INT_P (SET_SRC (s2))
>> + || !rtx_equal_p (XEXP (note1, 0), SET_SRC (s2)))
>> +   return dir_none;
>> +}
>> +
>>
>> I'm not sure the idea is ok or it might crash something.
>> Any suggestion would be very helpful.
>
> Seems like you're on a reasonable path to me.  I suggest you stick with it.
>
> Basically what it appears you're trying to do is unify insns from different
> blocks where one looks like
>
> (set x y)  with an attached REG_EQUAL note
>
> And the other looks like
>
> (set x const_int)
>
> Where the REG_EQUAL note has the same value as the const_int in the second
> set.
>
> I think you'd want to handle both cases i1 has the note i2, no note and i1
> has no note and i2 has a note.
>
> Jeff
>
> jeff


can_replace_by.patch
Description: Binary data


pr43920-2.c.244r.jump2.ori
Description: Binary data


pr43920-2.c.244r.jump2.patch_can_replace_by
Description: Binary data


Changelog.can_replace_by
Description: Binary data


Re: NIOS and atomic primitives

2015-04-20 Thread Jonathan Wakely
On 19 April 2015 at 03:06, Sandra Loosemore wrote:
> Is it one of the standard g++ or libstdc++ test cases that is failing? I
> could check what nios2-elf does with it if I knew what to look for.

Libstdc++ does not use the __sync built-ins in the 4.9 branch, it has
switched to the __atomic ones which should be provided by libatomic,
if that exists for the target.


RE: try_merge_delay_insn with delay list > 1

2015-04-20 Thread BELBACHIR Selim
I've attached the fixed version of the patch. I've tested it on the trunk with 
my private target.

I can't provide a test because apparently no backend (other than my private 
one) uses delay slots with more that 1 slot.
I was also unable to test the behaviour of this patch for an hypothetic target 
providing delay lots with more that 1 slot AND the possibility to annul 
instruction in delay slots.

It seems to me that this patch is a small enhancement anyway.
I hope it's ok for trunk :)

Regards,

Selim

-Message d'origine-
De : Jeff Law [mailto:l...@redhat.com] 
Envoyé : vendredi 17 avril 2015 18:41
À : BELBACHIR Selim; gcc@gcc.gnu.org
Objet : Re: try_merge_delay_insn with delay list > 1

On 03/10/2015 07:40 AM, BELBACHIR Selim wrote:
> Me again :)
>
> I enhanced my patch because it was not generalized for instructions with N 
> delay_slots.
Mostly OK, though there are some formatting nits that need to be corrected.

We have whitespace around arithmetic, logical and comparison operators to 
separate them from their operands.  So instead of

slot_number+j

Use

slot_number + j

Instead of

j=1

Use

j = 1

Lines should be wrapped at 80 columns.  So you end up with something like this

foo (argument1,
  argument2,
  argument3)

ie, when you wrap, the arguments to the call will line up vertically.

It may help wrapping to create a local variable to hold PATTERN (insn). 
  Call it 'pat' :-)

When referring to variables or parameters in a comment, capitalize them.

The patch may need updating for the trunk, please test it with the trunk when 
you ask  for it to be included on the trunk.

These are all fairly minor issues.  The actual change seems reasonable.

What systems do you have that you could do a bootstrap and regression test 
with?  Ideally since you're changing the delay slot branching code it'd be a 
system with delay slots :-)  If that's not possible because you don't have 
access to a bootstrapping system with delay slots, make sure to mention it.

Ideally you'd have a test for this bug. However, with a private port I wouldn't 
consider it a necessity.  However, you may want to go ahead and create one for 
internal uses.  And if you ever submit your port to the offficial sources, you 
can include target specific tests at that time.




try_merge_patch3
Description: try_merge_patch3


How to get anchors to onlinedocs that can be used in external documents?

2015-04-20 Thread Georg-Johann Lay

How to add an anchor to one of the onlinedocs texi documents?

Suppose I'd like to add an HTML anchor to one of the onlinedocs, for example to 
link it from the gcc release notes.


Currently the only anchors are either auto-generated and contain some hashes 
(hence not usable from external documents) or are trivial like:



x86-Windows-Options.html#x86-Windows-Options
Submodel-Options.html#Submodel-Options
x86-transactional-memory-intrinsics.html#x86-transactional-memory-intrinsics
...

and are always pointing to the top of the respective HTML page.

How can I get an anchor to, say, a subsection without fragmenting the 
respective HTML document into dozens of individual documents and thereby 
changing the document layout, which is not what I want.


Examples as listed in the GCC onlindocs "Table of Contents":

- 3.17.1.1 -march and -mcpu Feature Modifiers
- 3.17.5.1 EIND and Devices with More Than 128 Ki Bytes of Flash
- 6.16.4 SPU Named Address Spaces
- 6.43.2.8 x86 Floating-Point asm Operands

All these entries in the TOC don't link to the indicated place but to the
respective document's header, e.g. "x86 Floating-Point asm Operands" does not 
forward to that section but to "Extended-Asm.html#Extended-Asm" which contains 
9 subsections.



Johann


Do we have any plans to un-flatten our header files?

2015-04-20 Thread Martin Jambor
Hi,

because I really dislike the hassle our (almost) flattened header
files cause quite often, I have made a very simple experiment to find
out how the header files really depend on each other.  Some results,
together with a dozen of short paragraphs of relevant text are here:

http://labs.suse.cz/mjambor/150417-headers_in_gcc/headers_in_gcc.html

Short summary: I wrote a script removing an include at a time and
parsing errors and recording such discovered "dependencies."  The page
has text-file reports showing what needs to be included before what,
in which files and because of which error messages.  I have also
plotted a few graphs from these "dependencies."  The pictures look
interesting, I am not sure if they are in any way useful, but have
a look and judge for yourself. 

Regardless of whether or not this experiment had any value.  Do we
plan to do un-flattening of header files in the course of the next
release or two?  The pain the flat header files is real and enduring
it for much longer or indefinitely seems unhealthy to me.

Thanks,

Martin


Re: Do we have any plans to un-flatten our header files?

2015-04-20 Thread Richard Biener
On Mon, Apr 20, 2015 at 3:02 PM, Martin Jambor  wrote:
> Hi,
>
> because I really dislike the hassle our (almost) flattened header
> files cause quite often, I have made a very simple experiment to find
> out how the header files really depend on each other.  Some results,
> together with a dozen of short paragraphs of relevant text are here:
>
> http://labs.suse.cz/mjambor/150417-headers_in_gcc/headers_in_gcc.html
>
> Short summary: I wrote a script removing an include at a time and
> parsing errors and recording such discovered "dependencies."  The page
> has text-file reports showing what needs to be included before what,
> in which files and because of which error messages.  I have also
> plotted a few graphs from these "dependencies."  The pictures look
> interesting, I am not sure if they are in any way useful, but have
> a look and judge for yourself.
>
> Regardless of whether or not this experiment had any value.  Do we
> plan to do un-flattening of header files in the course of the next
> release or two?  The pain the flat header files is real and enduring
> it for much longer or indefinitely seems unhealthy to me.

The goal was to end up with a few aggregating includes so you
just include sth like ssa-pass.h from a tree pass that works on
SSA form.  Or gimple.h for anything that only needs to look at
GIMPLE stmts.

The first step was the flattening to see what dependencies we have
(ok, it's not really good at that...) and to move stuff (mostly prototypes)
between files to reduce dependencies.

It's easy to create a ssa-pass.h file, but it's hard to determine
the minimum stuff that should be in there (no rtl.h for example).

For Andrews project (that tree stuff) the most important thing
is to not have tree(-core?).h leak into files that only use GIMPLE
(and thus the "valid" parts of tree).

Richard.

> Thanks,
>
> Martin


Re: NIOS and atomic primitives

2015-04-20 Thread Joel Sherrill


On 4/20/2015 5:04 AM, Jonathan Wakely wrote:
> On 19 April 2015 at 03:06, Sandra Loosemore wrote:
>> Is it one of the standard g++ or libstdc++ test cases that is failing? I
>> could check what nios2-elf does with it if I knew what to look for.
> Libstdc++ does not use the __sync built-ins in the 4.9 branch, it has
> switched to the __atomic ones which should be provided by libatomic,
> if that exists for the target.
This is with 4.9.2.  This is the only RTEMS target giving this error so
I must
be missing the magic bit of configurery for it.

/users/joel/rtems-4.11-work/rtems-source-builder/rtems/build/nios2-rtems4.11-gcc-4.9.2-newlib-2.2.0.20150323-x86_64-linux-gnu-1/build/nios2-rtems4.11/libstdc++-v3/include/ext/atomicity.h:49:
undefined reference to `__sync_fetch_and_add_4'
/users/joel/rtems-4.11-work/tools/lib/gcc/nios2-rtems4.11/4.9.2/libstdc++.a(locale-inst.o):/users/joel/rtems-4.11-work/rtems-source-builder/rtems/build/nios2-rtems4.11-gcc-4.9.2-newlib-2.2.0.20150323-x86_64-linux-gnu-1/build/nios2-rtems4.11/libstdc++-v3/include/ext/atomicity.h:49:
more undefined references to `__sync_fetch_and_add_4' follow

Where should I be looking?

-- 
Joel Sherrill, Ph.D. Director of Research & Development
joel.sherr...@oarcorp.comOn-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available(256) 722-9985



Re: NIOS and atomic primitives

2015-04-20 Thread Jakub Jelinek
On Mon, Apr 20, 2015 at 11:14:48AM -0500, Joel Sherrill wrote:
> 
> 
> On 4/20/2015 5:04 AM, Jonathan Wakely wrote:
> > On 19 April 2015 at 03:06, Sandra Loosemore wrote:
> >> Is it one of the standard g++ or libstdc++ test cases that is failing? I
> >> could check what nios2-elf does with it if I knew what to look for.
> > Libstdc++ does not use the __sync built-ins in the 4.9 branch, it has
> > switched to the __atomic ones which should be provided by libatomic,
> > if that exists for the target.
> This is with 4.9.2.  This is the only RTEMS target giving this error so
> I must
> be missing the magic bit of configurery for it.
> 
> /users/joel/rtems-4.11-work/rtems-source-builder/rtems/build/nios2-rtems4.11-gcc-4.9.2-newlib-2.2.0.20150323-x86_64-linux-gnu-1/build/nios2-rtems4.11/libstdc++-v3/include/ext/atomicity.h:49:
> undefined reference to `__sync_fetch_and_add_4'
> /users/joel/rtems-4.11-work/tools/lib/gcc/nios2-rtems4.11/4.9.2/libstdc++.a(locale-inst.o):/users/joel/rtems-4.11-work/rtems-source-builder/rtems/build/nios2-rtems4.11-gcc-4.9.2-newlib-2.2.0.20150323-x86_64-linux-gnu-1/build/nios2-rtems4.11/libstdc++-v3/include/ext/atomicity.h:49:
> more undefined references to `__sync_fetch_and_add_4' follow
> 
> Where should I be looking?

Look where your atomicity.h comes from?
On the 4.9 branch, none of the atomicity.h headers included in vanilla
libstdc++-v3 have any __sync_* uses.

Jakub


RE: IRA preferencing issues

2015-04-20 Thread Wilco Dijkstra

Interestingly even when the preferences are accurate, lra_constraints
completely ignores the preferred/allocno class. If the cost of 2 alternatives
is equal in every way (which will be the case if they are both legal matches
as the standard cost functions are not used at all), the wrong one may be 
chosen depending on the order in the MD file. This is particularly bad when the
spill optimization pass later removes some of the spills and correctly allocates
them to their preferred allocno class...

Forcing win = true if the register class of the alternative intersects with 
the preferred class generates significantly better spill code for cases where
the preference is accurate (ie. not just ALL_REGS), resulting in far less
confusion between integer and FP registers. 

So shouldn't get_reg_class return the preference/allocno class like below rather
than NO_REGS?

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 0ddd842..f38914a 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -263,7 +263,10 @@ get_reg_class (int regno)
 }
   if (regno >= new_regno_start)
 return lra_get_allocno_class (regno);
-  return NO_REGS;
+  return reg_preferred_class (regno);
}

Wilco





Re: Do we have any plans to un-flatten our header files?

2015-04-20 Thread Andrew MacLeod

On 04/20/2015 09:18 AM, Richard Biener wrote:

On Mon, Apr 20, 2015 at 3:02 PM, Martin Jambor  wrote:

Hi,

because I really dislike the hassle our (almost) flattened header
files cause quite often, I have made a very simple experiment to find
out how the header files really depend on each other.  Some results,
together with a dozen of short paragraphs of relevant text are here:

http://labs.suse.cz/mjambor/150417-headers_in_gcc/headers_in_gcc.html

Short summary: I wrote a script removing an include at a time and
parsing errors and recording such discovered "dependencies."  The page
has text-file reports showing what needs to be included before what,
in which files and because of which error messages.  I have also
plotted a few graphs from these "dependencies."  The pictures look
interesting, I am not sure if they are in any way useful, but have
a look and judge for yourself.

The problem with this simple approach (which was the original one) is 
that you miss things like target macros and conditional compilation.  
There was a long discussion about why that doesnt work in general, but 
it boils down to constructs like


header.h
#define HAS_LOOP_INSN

file.c
#if HAS_LOOP_INSN
  code;
#end


file.c compiles and executes perfectly fine without header.h...  all you 
get is suboptimal code which would be hellish to find later. The same 
thing applies to header files which may depend on other header files... 
this isnt a compilation-fail dependency   Most of them will miss tm.h in 
particular, but require some closer examination to see if they are 
immune to this kind of thing.

Regardless of whether or not this experiment had any value.  Do we
plan to do un-flattening of header files in the course of the next
release or two?  The pain the flat header files is real and enduring
it for much longer or indefinitely seems unhealthy to me.

The goal was to end up with a few aggregating includes so you
just include sth like ssa-pass.h from a tree pass that works on
SSA form.  Or gimple.h for anything that only needs to look at
GIMPLE stmts.

The first step was the flattening to see what dependencies we have
(ok, it's not really good at that...) and to move stuff (mostly prototypes)
between files to reduce dependencies.

It's easy to create a ssa-pass.h file, but it's hard to determine
the minimum stuff that should be in there (no rtl.h for example).

For Andrews project (that tree stuff) the most important thing
is to not have tree(-core?).h leak into files that only use GIMPLE
(and thus the "valid" parts of tree).

Richard.




I plan to spend some time and get it wrapped it during this stage 
1...It flattened enough to be useful, and we ought to be able to 
product better aggregators.


Andrew



Re: try_merge_delay_insn with delay list > 1

2015-04-20 Thread Jeff Law

On 04/20/2015 05:08 AM, BELBACHIR Selim wrote:

I've attached the fixed version of the patch. I've tested it on the trunk with 
my private target.

I can't provide a test because apparently no backend (other than my private 
one) uses delay slots with more that 1 slot.
I was also unable to test the behaviour of this patch for an hypothetic target 
providing delay lots with more that 1 slot AND the possibility to annul 
instruction in delay slots.

It seems to me that this patch is a small enhancement anyway.
I hope it's ok for trunk :)
Even for small enhancements or bugfixes, we try to at least do some 
basic testing.  Unfortunately with no sparc or mips machines in the 
compile farm, good testing of a reorg.c change is hard.


I built mips-elf cross tools and used those to compile newlib for 
mips-elf.  Then I applied your patch, rebuilt the compiler and used that 
to compile newlib again.  Then I compared all the objects from the two 
copies of newlib and verified the code we generated as identical.  So 
there's at least some degree of confidence we didn't mess anything up in 
the single delay slot case.


I fixed a couple more minor formatting problems and installed your 
change on the trunk.


jeff




Announcing Segher Boessenkool as combine.c maintainer

2015-04-20 Thread Jeff Law
I'm pleased to announce that Segher Boessenkool has been appointed as 
maintainer for instruction combiner (combine.c).


Segher, can you please add yourself to the MAINTAINERS file for the 
additional role.


Thanks,

jeff




Re: IRA preferencing issues

2015-04-20 Thread Vladimir Makarov



On 17/04/15 09:26 AM, Matthew Fortune wrote:

Wilco Dijkstra  writes:

While investigating why the IRA preferencing algorithm often chooses
incorrect preferences from the costs, I noticed this thread:
https://gcc.gnu.org/ml/gcc/2011-05/msg00186.html

I am seeing the exact same issue on AArch64 - during the final
preference selection ira-costs takes the union of any register classes
that happen to have equal cost. As a result many registers get ALL_REGS
as the preferred register eventhough its cost is much higher than either
GENERAL_REGS or FP_REGS. So we end up with lots of scalar SIMD
instructions and expensive int<->FP moves in integer code when register
pressure is high. When the preference is computed correctly as in the
proposed patch (choosing the first class with lowest cost, ie.
GENERAL_REGS) the resulting code is much more efficient, and there are
no spurious SIMD instructions.

Choosing a preferred class when it doesn't have the lowest cost is
clearly incorrect. So is there a good reason why the proposed patch
should not be applied? I actually wonder why we'd ever need to do a
union - if there are 2 classes with equal cost, you'd use the 2nd as the
alternative class.

The other question I had is whether there is a good way to get improve
the preference in cases like this and avoid classes with equal cost
altogether. The costs are clearly not equal: scalar SIMD instructions
have higher latency and require extra int<->FP moves. It is possible to
mark variants in the MD patterns using '?' to discourage them but that
seems like a hack, just like '*'. Is there a general way to say that
GENERAL_REGS is preferred over FP_REGS for SI/DI mode?

MIPS has the same problem here and we have been looking at ways to address
it purely via costings rather than changing IRA. What we have done so
far is to make the cost of a move from GENERAL_REGS to FP_REGS more
expensive than memory if the move has an integer mode. The goal for MIPS
is to never allocate an FP register to an integer mode unless it was
absolutely necessary owing to an integer to fp conversion where the
integer has to be put in an FP register. Ideally I'd like a guarantee
that FP registers will never be used unless a floating point type is
present in the source but I haven't found a way to do that given the
FP-int conversion issue requiring SImode to be allowed in FP regs.

The patch for MIPS is not submitted yet but has eliminated the final
two uses of FP registers when building the whole Linux kernel with
hard-float enabled. I am however still not confident enough to say
you can build integer only code with hard-float and never touch an FP
register.

Since there are multiple architectures suffering from this I guess we
should look at properly addressing it in generic code.


Wilco, Matt, thanks for sharing your problem cases.  It would be nice if 
you provide a small test case and fill PR in GCC bugzilla, or point me 
one if it already exists.


Preferred and alternative classes are from old RA which implemented 
Chow's priority based coloring.  This algorithm has a different coloring 
criteria, generally speaking it can be considered as simultaneous 
coloring and assigning and permits easy usage of several different 
priorities reg classes for a pseudo.


IRA uses Chaitin-Briggs coloring originally with Kempe's criteria which 
is a standard now for industrial optimizing compilers (LLVM is a rare 
exclusion).  It assumes that we have non-intersected reg-classes and 
each pseudo belongs to one class.  New coloring criteria developed on 
ones proposed by Smith's and Holloway were lately added to IRA.  These 
criteria permit pseudo-register classes form tree (one class can fully 
include another class).  Preferred and alternative classes can not be 
integrated to Chaitin-Briggs approach (as to
undeservedly forgotten Ershow's graph coloring based on merging 
non-connected nodes of conflict graph).


So we need to use one class in IRA for a pseudo.  If we have mem-mem 
move through a register and floating point or integer register, it would 
be wrong to choose only one class in case when only one integer or fp 
register pressure is high.


IRA costs has many drawbacks and I planned to improve it.  I believe it 
should choose insn alternatives first and reg classes after that based 
on the chosen alternatives.


Wilco, I might be wrong and the patch you mentioned works well (e.g. 
probability of the above case mem-mem move is small).  If you provide 
data (e.g. on SPEC2000/SPEC2006) for your target how the patch improves 
the code, we could consider it as some temporary (it might become a 
permanent) solution until ira-costs.c is rewritten. I myself could 
measure the patch effect on x86/x86-64 to make a final decision.


Thanks.



Re: PR63633: May middle-end come up width hard regs for insn expanders?

2015-04-20 Thread Vladimir Makarov



On 17/04/15 05:58 AM, Georg-Johann Lay wrote:
I allowed me to CC Vladimir; maybe he can propose how the backend can 
describe an efficient, constraint-based solution.  The problem is 
about expanders producing insns with non-fixed hard-regs as in/out 
operands or clobbers.  This includes move insn from non-generic 
address spaces which require dedicated hard regs. Issue is about 
correctness and efficiency of generated code.


I might be wrong but I think you have a bloated code because you use 
scratches.  I already told several times that usage of scratch is always 
a bad idea.  It was a bad idea for an old RA and is still a bad idea for 
IRA.  The usage of scratches should be prohibited, probably we should 
write it somewhere.  It is better to use just a regular pseudo instead.


Why it is a bad idea?  Because IRA (or the old global RA) does not take 
them into account *at all*.  It means that IRA might think that there 
are enough registers for pseudos but in reality it is wrong because of 
scratches in live range of the pseudos.


If it is not the case I should investigate why you have a bloated code 
and small test would help here.


Thanks,  I hope my comments will be useful.




Re: PR63633: May middle-end come up width hard regs for insn expanders?

2015-04-20 Thread Steven Bosscher
On Mon, Apr 20, 2015 at 10:11 PM, Vladimir Makarov wrote:
> I might be wrong but I think you have a bloated code because you use
> scratches.  I already told several times that usage of scratch is always a
> bad idea.  It was a bad idea for an old RA and is still a bad idea for IRA.
> The usage of scratches should be prohibited, probably we should write it
> somewhere.  It is better to use just a regular pseudo instead.

Thanks Vladimir, I didn't know this.
Does this mean that, for example, extendsidi in i386.md would be
better if it did not use match_scratch?

The expander and the 32-bits version of the insn currently look like this:

(define_expand "extendsidi2"
  [(set (match_operand:DI 0 "register_operand")
(sign_extend:DI (match_operand:SI 1 "register_operand")))]
  ""
{
  if (!TARGET_64BIT)
{
  emit_insn (gen_extendsidi2_1 (operands[0], operands[1]));
  DONE;
}
})

(define_insn "extendsidi2_1"
  [(set (match_operand:DI 0 "nonimmediate_operand" "=*A,r,?r,?*o")
(sign_extend:DI (match_operand:SI 1 "register_operand" "0,0,r,r")))
   (clobber (reg:CC FLAGS_REG))
   (clobber (match_scratch:SI 2 "=X,X,X,&r"))]
  "!TARGET_64BIT"
  "#") // there is a post reg-alloc splitter

If I understand your remark on SCRATCH correctly, then the expander
should use gen_reg_rtx for operand 2 of extendsidi2_1; and the insn
should use match_operand on operand 2 instead of match_scratch.
Correct?


When is a scratch still OK?

The most common pattern in i386 is (clobber (match_scratch ...)))
which probably always results in the need for a register that IRA
doesn't see, so I assume that this is one of the cases where you would
recommend using a pseudo instead...? (A quick grep on config/*.md
shows 952 cases like this, and 113 match_scratch uses of a different
form.)

A match_scratch or (clobber (scratch)) in a define_peephole2 should
also always be fine, because that's just a way to see if there is a
suitable register available to perform the peephole code
transformation (peephole2 runs after reg-alloc).

(mem:BLK (scratch)) is always OK,


> Why it is a bad idea?  Because IRA (or the old global RA) does not take
> them into account *at all*.  It means that IRA might think that there are
> enough registers for pseudos but in reality it is wrong because of
> scratches in live range of the pseudos.

Is there a reason why IRA doesn't replace scratches with pseudos, like
LRA does (and IIRC reload does, also)?

Ciao!
Steven


[wwwdocs] PATCH for Re: [ANN] gcc-lua: Lua plugin for the GNU Compiler Collection

2015-04-20 Thread Gerald Pfeifer
Hi Peter,

On Wed, 31 Oct 2012, Peter Colberg wrote:
> gcc‑lua extends the GNU Compiler Collection with the ability to run Lua 
> scripts. The plugin provides an interface to register callback functions 
> for plugin events, and inspect the abstract syntax tree of a translation 
> unit. The plugin is useful for static C code analysis. gcc‑lua supports 
> GCC 4.6 or 4.7, and, Lua 5.1 or 5.2, or LuaJIT.

I added this to our GCC Extensions page at 
https://gcc.gnu.org/extensions.html with the patch below.

Let us know if you'd like to see this tweaked or changed
otherwise.

Thanks for the heads-up!

Gerald

Index: extensions.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/extensions.html,v
retrieving revision 1.53
diff -u -r1.53 extensions.html
--- extensions.html 16 Apr 2012 12:46:03 -  1.53
+++ extensions.html 20 Apr 2015 22:49:19 -
@@ -26,6 +26,13 @@
   to ease development of GCC plugin-like extensions.
 
 
+http://colberg.org/gcc-lua/";>Lua plugin for GCC
+
+gcc‑lua extends GCC with the ability to run Lua scripts. The plugin
+  provides an interface to register callback functions for plugin events,
+  and inspect the abstract syntax tree of a translation unit.
+
+
 http://runtime.bordeaux.inria.fr/StarPU/";>StarPU
 
 StarPU is a GCC extension and run-time support library for hybrid

Re: PR63633: May middle-end come up width hard regs for insn expanders?

2015-04-20 Thread Vladimir Makarov



On 20/04/15 06:27 PM, Steven Bosscher wrote:

On Mon, Apr 20, 2015 at 10:11 PM, Vladimir Makarov wrote:

I might be wrong but I think you have a bloated code because you use
scratches.  I already told several times that usage of scratch is always a
bad idea.  It was a bad idea for an old RA and is still a bad idea for IRA.
The usage of scratches should be prohibited, probably we should write it
somewhere.  It is better to use just a regular pseudo instead.

Thanks Vladimir, I didn't know this.
Does this mean that, for example, extendsidi in i386.md would be
better if it did not use match_scratch?

The expander and the 32-bits version of the insn currently look like this:

(define_expand "extendsidi2"
   [(set (match_operand:DI 0 "register_operand")
 (sign_extend:DI (match_operand:SI 1 "register_operand")))]
   ""
{
   if (!TARGET_64BIT)
 {
   emit_insn (gen_extendsidi2_1 (operands[0], operands[1]));
   DONE;
 }
})

(define_insn "extendsidi2_1"
   [(set (match_operand:DI 0 "nonimmediate_operand" "=*A,r,?r,?*o")
 (sign_extend:DI (match_operand:SI 1 "register_operand" "0,0,r,r")))
(clobber (reg:CC FLAGS_REG))
(clobber (match_scratch:SI 2 "=X,X,X,&r"))]
   "!TARGET_64BIT"
   "#") // there is a post reg-alloc splitter

If I understand your remark on SCRATCH correctly, then the expander
should use gen_reg_rtx for operand 2 of extendsidi2_1; and the insn
should use match_operand on operand 2 instead of match_scratch.
Correct?

This case is more complicated than one for the AVR patch where scratch 
should always get a register.  I've should not have been too 
unconditional about prohibiting scratches.

When is a scratch still OK?
There are some insns where scratch will rest as scratch after RA. The 
above insn is an example (constraint X).  If it is changed by pseudo in 
RA, the pseudo in such case might get a stack slot. It is not a big deal 
as scratch pseudo are short-live one and probability of stack space 
increase is small.  But for small stackless functions the penalty could 
be big.


For the above case it is also hard for me to say now will the pseudo use 
in IRA improve the allocation because we don't know yet what is the 
final insn alternative.  We might assign a hard reg in IRA and spill 
other pseudos for this, although the final alternative will use 
constraint X.


If scratch should always get a register, that what you describe above 
would be a right thing to do.  In such case using scratch is really a 
bad thing.

The most common pattern in i386 is (clobber (match_scratch ...)))
which probably always results in the need for a register that IRA
doesn't see, so I assume that this is one of the cases where you would
recommend using a pseudo instead...? (A quick grep on config/*.md
shows 952 cases like this, and 113 match_scratch uses of a different
form.)

A match_scratch or (clobber (scratch)) in a define_peephole2 should
also always be fine, because that's just a way to see if there is a
suitable register available to perform the peephole code
transformation (peephole2 runs after reg-alloc).

(mem:BLK (scratch)) is always OK,


Right, there are cases where scratch is useful.

Why it is a bad idea?  Because IRA (or the old global RA) does not take
them into account *at all*.  It means that IRA might think that there are
enough registers for pseudos but in reality it is wrong because of
scratches in live range of the pseudos.

Is there a reason why IRA doesn't replace scratches with pseudos, like
LRA does (and IIRC reload does, also)?

As I remember reload does not do it but of course it takes scratches 
into account.  When reload processes insn, it looks what reload 
registers are needed for operands including scratches.  LRA process all 
pseudos not just reload ones having a bigger picture and of course it 
should looks at scratch too.  All scratches are changed by pseudos to 
simplify implementation of LRA and spilled pseudo are transformed back 
into scratches.


IRA could use the same approach with scratches as LRA but as I wrote 
above for some cases it is hard to say is it a right thing to do as we 
don't know the final insn alternative.  For LRA, changing scratches by 
pseudos is the best approach as we assign hard registers after deciding 
the final insn alternative.


So I should have written in my first email that usage of scratches 
should be deprecated when it has register constraints without X.




Re: ira.c update_equiv_regs patch causes gcc/testsuite/gcc.target/arm/pr43920-2.c regression

2015-04-20 Thread Jeff Law

On 04/20/2015 01:09 AM, Shiva Chen wrote:

Hi, Jeff

Thanks for your advice.

can_replace_by.patch is the new patch to handle both cases.

pr43920-2.c.244r.jump2.ori is the original  jump2 rtl dump

pr43920-2.c.244r.jump2.patch_can_replace_by is the jump2 rtl dump
after patch  can_replace_by.patch

Could you help me to review the patch?

Thanks.  This looks pretty good.

I expanded the comment for the new function a bit and renamed the 
function in an effort to clarify its purpose.  From reviewing 
can_replace_by, it seems it should have been handling this case, but 
clearly wasn't due to implementation details.


I then bootstrapped and regression tested the patch on x86_64-linux-gnu 
where it passed.  I also instrumented that compiler to see how often 
this code triggers.  During a bootstrap it triggers a couple hundred 
times (which is obviously a proxy for cross jumping improvements).  So 
it's triggering regularly on x86_64, which is good.


I also verified that this fixes BZ64916 for an arm-non-eabi toolchain 
configured with --with-arch=armv7.


Installed on the trunk.  No new testcase as it's covered by existing tests.

Thanks,,
jeff



Re: PR63633: May middle-end come up width hard regs for insn expanders?

2015-04-20 Thread Segher Boessenkool
On Tue, Apr 21, 2015 at 12:27:40AM +0200, Steven Bosscher wrote:
> On Mon, Apr 20, 2015 at 10:11 PM, Vladimir Makarov wrote:
> > I might be wrong but I think you have a bloated code because you use
> > scratches.  I already told several times that usage of scratch is always a
> > bad idea.  It was a bad idea for an old RA and is still a bad idea for IRA.
> > The usage of scratches should be prohibited, probably we should write it
> > somewhere.  It is better to use just a regular pseudo instead.
> 
> Thanks Vladimir, I didn't know this.
> Does this mean that, for example, extendsidi in i386.md would be
> better if it did not use match_scratch?

The combiner can add or remove clobbers of scratches whenever needed,
but it cannot do that for clobbers of pseudos.


Segher