ARM's Changing Call Used Registers Causes Weird Bugs

2015-06-08 Thread lin zuojian
Hi,
in arm.c:

static void
arm_conditional_register_usage (void)
...
  if (TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP)
    {
      /* VFPv3 registers are disabled when earlier VFP
         versions are selected due to the definition of
         LAST_VFP_REGNUM.  */
      for (regno = FIRST_VFP_REGNUM;
           regno <= LAST_VFP_REGNUM; ++ regno)
        {
          fixed_regs[regno] = 0;
          call_used_regs[regno] = regno < FIRST_VFP_REGNUM + 16
            || regno >= FIRST_VFP_REGNUM + 32;
        }
    }

These lines change the call-used registers when compiler flags like
-mfpu=neon are used, and that causes weird bugs. Consider the situation on
Android/ARM: I have a shared object that is supposed to run on a NEON CPU,
so -mfpu=neon is added, but the system itself is not compiled with this
flag. So when calling the system's libraries, my code risks using the
clobbered d8-d16.
An example:
while (true) {
struct my_struct s = {0}; // my_struct is 8 bytes long.
call_system_library...
}
In this example, d8 is used to initialize s to zero. The generated code
looks roughly like this (pseudo-assembly):

    vpush {d8}            @ saved once, because d8 is not call-used
    vmov.i32 d8, #0       @ loop header: the zero is materialized once
loop_body:
    vstr d8, [sp]         @ loop body: initialize s from d8
    bl system_library
    b loop_body

d8 is clobbered by the branch-and-link into the system library, so from the
second iteration on the loop initializes s with a garbage value, which
causes a crash.
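
For reference, here is a minimal self-contained sketch of the failing
pattern (my_struct's fields and call_system_library are placeholders; the
real structure and library call are whatever the system provides):

/* Built with -mfpu=neon, the compiler may keep the zero constant in d8
   across the call, trusting that the callee preserves d8-d15 as the AAPCS
   requires.  If the system library does not honour that, d8 comes back
   clobbered.  */
struct my_struct { int a; int b; };                    /* 8 bytes, as in the report */

extern void call_system_library(struct my_struct *s);  /* placeholder */

void loop_forever(void)
{
  for (;;)
    {
      struct my_struct s = { 0 };  /* the zero may be materialized once in d8 */
      call_system_library(&s);     /* d8 is assumed to survive this call      */
    }
}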

So I am forced to remove -mfpu=neon for compatibility. My question is
whether the GCC code shown above conforms to the ARM standard (AAPCS). If
so, why does ARM define such a weird standard?
--
Lin Zuojian


Is that a problem?

2014-08-21 Thread lin zuojian
Hi,
After applying a patch to GCC to make it warn about strict-aliasing
violations, like this:
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index b6ecaa4..95e745c 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -2913,6 +2913,10 @@ setup_one_parameter (copy_body_data *id, tree p, tree value, tree fn,
}
 }
 
+  if (warn_strict_aliasing > 2)
+    if (strict_aliasing_warning (TREE_TYPE (rhs), TREE_TYPE (p), rhs))
+      warning (OPT_Wstrict_aliasing, "during inlining function %s into function %s", fndecl_name (fn), function_name (cfun));
+
Compiling gcc/testsuite/g++.dg/opt/pmf1.C triggers that warning:

gcc/testsuite/g++.dg/opt/pmf1.C: In function 'int main()':
gcc/testsuite/g++.dg/opt/pmf1.C:72:42: warning: dereferencing type-punned pointer will break strict-aliasing rules. With expression: &t, type of expression: struct Container *, type to cast: struct WorldObject * const. [-Wstrict-aliasing]
 t.forward(itemfunptr, &Item::fred, 1);
                                      ^
gcc/testsuite/g++.dg/opt/pmf1.C:72:42: warning: during inlining function void WorldObject::forward(memfunT, arg1T, arg2T) [with memfunT = void (Container::*)(void (Item::*)(int), int); arg1T = void (Item::*)(int); arg2T = int; Derived = Container] into function int main() [-Wstrict-aliasing]

Is that a problem here? We cast type Container to its base type
WorldObject; does that violate strict aliasing? Let's take a look at this.
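
A stripped-down sketch of the pattern the testcase exercises (only
Container and WorldObject come from the warning above; the field and
function names here are invented for illustration):

struct WorldObject { int id; };                   /* base, as in pmf1.C    */
struct Container : WorldObject { int payload; };  /* derived, as in pmf1.C */

int read_id(Container *t)
{
  /* Derived-to-base conversion: the Container object is accessed through a
     pointer to its WorldObject base subobject.  That is well-defined in
     C++, and GCC's TBAA already knows the base/derived relationship, so
     the warning above appears to be a false positive (see the follow-up
     mail).  */
  WorldObject *base = t;
  return base->id;
}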

--
Lin Zuojian 


Re: Is that a problem?

2014-08-21 Thread lin zuojian
Hi,
I know what is going on now. strict_aliasing_warning does not take TBAA
into account. We might want to fix that.
--
Lin Zuojian


record_component_aliases will not record the bases with no field declaration

2014-08-29 Thread lin zuojian
Hi,
record_component_aliases only handles the fields of a record type, and
ignores a base of the record if that base has no field declaration.
Is this a bug, or on purpose?
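
For concreteness, a minimal sketch of the shape I am asking about (the
names are invented):

struct Base { };                   /* a base with no field declaration */
struct Derived : Base { int x; };

/* Question restated: when record_component_aliases processes Derived,
   is Base's alias set ever recorded as a component of Derived's?  */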
Thanks.
--
Lin Zuojian


Re: record_component_aliases will not record the bases with no field declaration

2014-08-29 Thread lin zuojian
Thanks Richard.
---
Lin Zuojian


Visualizing Call Hierarchy In Chromium Browser

2014-09-02 Thread lin zuojian
Hi,
I use chrome://tracing to visualize the call hierarchy of ira-color.c, and
I think it's helpful for understanding the code.
If anyone is interested, visit the project on GitHub:
https://github.com/linzj/gen-trace

It's very easy to use, which is why I don't reuse Chromium's own tracing
code. Just include ctrace.h, add C_TRACE_0 to the functions you are
interested in, and load the resulting trace.json file in chrome://tracing.
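
A hypothetical usage sketch, going only by the description above (the
actual C_TRACE_0 macro in gen-trace may take different arguments):

#include "ctrace.h"               /* from https://github.com/linzj/gen-trace */

static void
color_allocnos (void)
{
  C_TRACE_0 ("color_allocnos");   /* assumed: takes a name for the trace slice */
  /* ... existing function body; the slice ends up in trace.json, which
     can then be loaded in chrome://tracing ... */
}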

Sorry I can't send the png attachment. The graph looks roughly like this
(outer boxes are callers, inner boxes are callees):

  ira color
    ...
      color_allocnos
        form_..forest
        push..
--
Lin Zuojian


df.c Actually Introduces A Way To Think Of Algorithms

2014-10-15 Thread lin zuojian
Hi,
I don't know if this has been mentioned before, but df.c introduces a very
good way to think about algorithms. It brings two effective orders for
handling many problems: DF_FORWARD and DF_BACKWARD. They mean two facts,
respectively: a subproblem will not be visited until all of its parents
have been visited; and a problem will not be visited until all of its
subproblems have been visited.

Single-source shortest path has an O(n) solution (instead of the usual
O(n log n)) when the graph is acyclic and is walked in DF_FORWARD order,
defining df_confluence_function_n like the following:
{
    if (e->weight + e->src->total_weight < e->dest->total_weight)
    {
        e->dest->total_weight = e->weight + e->src->total_weight;
        e->dest->parent = e->src;
    }
}

I think it's really O(n) given such an ordering, right?
That's what I wanted to propose.
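
A hedged, self-contained sketch of that idea (the graph representation and
names are invented; this is just the classic topological-order relaxation,
which is O(V + E) on an acyclic graph):

#include <climits>
#include <vector>

struct Edge { int dest; long weight; };

/* Single-source shortest paths on a DAG: visit nodes in a DF_FORWARD-style
   order (every predecessor before its successors) and relax each edge
   exactly once.  Nodes 0..n-1 are assumed to already be in such an order.  */
void
dag_shortest_paths (int n, int source,
                    const std::vector<std::vector<Edge> > &succs,
                    std::vector<long> &dist)
{
  dist.assign (n, LONG_MAX);
  dist[source] = 0;
  for (int u = 0; u < n; ++u)
    {
      if (dist[u] == LONG_MAX)
        continue;                       /* unreachable so far */
      for (const Edge &e : succs[u])    /* the confluence step from above */
        if (dist[u] + e.weight < dist[e.dest])
          dist[e.dest] = dist[u] + e.weight;
    }
}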
--
Lin Zuojian


A Question About LRA/reload

2014-12-09 Thread lin zuojian
Hi,
I have read the IRA/LRA code for a while, but I still fail to understand
their relationship. The main question is: why does IRA do the coloring so
early? The LRA pass will do the assignment anyway. Sorry if I mix up
coloring and hard register assignment, but I think it's better to get the
job done after LRA elimination, inheritance, ...

Can any professional help me out? Thanks.
---
Lin Zuojian


Re: A Question About LRA/reload

2014-12-09 Thread lin zuojian
Hi Kugan,
I have read these pdfs. My question is LRA will change the insns, so
why brother do the coloring so early. Changing the insns can
generates new pseudo registers, so they needs to re-assign. Is that
correct?
--
Thanks Kugan

On Tue, Dec 09, 2014 at 09:08:46PM +1100, Kugan wrote:
> On 09/12/14 20:37, lin zuojian wrote:
> > Hi,
> > I have read ira/lra code for a while, but still fails to understand
> > their relationship. The main question is why ira do color so early?
> > lra pass will do the assignment anyway. Sorry if I mess up coloring
> > and hard register assignment, but I think it's better to get job
> > done after lra elimiation, inheriation, ...
> 
> 
> IRA does the register allocation and LRA matches insn constraints.
> Therefore IRA has to do the coloring. LRA, in the process matching
> constraints may change some of these assignment. Please look at the
> following links for more info.
> 
> https://ols.fedoraproject.org/GCC/Reprints-2007/makarov-reprint.pdf
> https://gcc.gnu.org/wiki/cauldron2012?action=AttachFile&do=get&target=Local_Register_Allocator_Project_Detail.pdf
> 
> 
> Thanks,
> Kugan
> 


Re: A Question About LRA/reload

2014-12-09 Thread lin zuojian
Moreover, LRA's assignment does not refer directly to IRA's assignment
result. In find_hard_regno_for, the value of hard_regno comes from
ira_class_hard_regs[rclass][i] with the least cost.

On Tue, Dec 09, 2014 at 06:14:29PM +0800, lin zuojian wrote:
> Hi Kugan,
> I have read these pdfs. My question is LRA will change the insns, so
> why brother do the coloring so early. Changing the insns can
> generates new pseudo registers, so they needs to re-assign. Is that
> correct?
> --
> Thanks Kugan
> 
> On Tue, Dec 09, 2014 at 09:08:46PM +1100, Kugan wrote:
> > On 09/12/14 20:37, lin zuojian wrote:
> > > Hi,
> > > I have read ira/lra code for a while, but still fails to understand
> > > their relationship. The main question is why ira do color so early?
> > > lra pass will do the assignment anyway. Sorry if I mess up coloring
> > > and hard register assignment, but I think it's better to get job
> > > done after lra elimiation, inheriation, ...
> > 
> > 
> > IRA does the register allocation and LRA matches insn constraints.
> > Therefore IRA has to do the coloring. LRA, in the process matching
> > constraints may change some of these assignment. Please look at the
> > following links for more info.
> > 
> > https://ols.fedoraproject.org/GCC/Reprints-2007/makarov-reprint.pdf
> > https://gcc.gnu.org/wiki/cauldron2012?action=AttachFile&do=get&target=Local_Register_Allocator_Project_Detail.pdf
> > 
> > 
> > Thanks,
> > Kugan
> > 


Re: A Question About LRA/reload

2014-12-09 Thread lin zuojian
Thanks Vladimir, Jeff & Kugan. Combining the replies, I now have a better
view of the RA problem.
--
Lin Zuojian
On Tue, Dec 09, 2014 at 12:10:29PM -0500, Vladimir Makarov wrote:
> On 12/09/2014 04:37 AM, lin zuojian wrote:
> > Hi,
> > I have read ira/lra code for a while, but still fails to understand
> > their relationship. The main question is why ira do color so early?
> > lra pass will do the assignment anyway. Sorry if I mess up coloring
> > and hard register assignment, but I think it's better to get job
> > done after lra elimiation, inheriation, ...
> >
>   There are two major approaches in RA implementation.  One is division
> of it on global and local passes and another one is to use iterative
> approach with the same coloring algorithm as it is described in most
> books and research articles.  Historically GCC used separate passes
> approach (even more complicated regmove, local RA, then global RA, and
> then reload).  Some industrial compilers to which I had access to the
> sources use this approach too (e.g. pathscale compiler uses global and
> local RA).
> 
>   I believe changing RA in GCC is very challenging task and only
> evolutionary approach could work.   Changing old global/local/regmove by
> IRA took 4-5 years.  Changing reload by LRA took 3 years and is still in
> progress. This is even more challenging task than IRA.  That is why we
> have what we have: IRA (global RA) and LRA (local RA).  The division
> also solves compilation time problem: one expensive global RA IRA (which
> builds conflict graph and dealing with irregular register file
> architectures by dynamically created register classes which is further
> development of Chaitin-Briggs coloring algorithm) and LRA which uses
> simplified coloring without building expensive conflict graph but making
> it several times (in average about 2-4 coloring passes for function). 
> On my estimation, using the same algorithm as in IRA iteratively would
> add 5-10% more to all GCC compilation time.  It would be hard to
> convince people to such approach as GCC compilation speed is a sensitive
> area (especially when we have a strong competitor of GCC as LLVM).  The
> articles about RA never say about RA cycling problem (although Peter
> Bergner who implemented an extension of Chaitin-Brigs allocator
> mentioned this to me).  This problem happens in LRA even with simpler
> coloring algorithm.  It would be worse if we used more complicated one
> from IRA.
>  
>   Saying all that, I would also add that using iteratively the same
> algorithm is appealing idea to me too.  I try to use this approach in
> YARA project (predecessor of IRA) in which I tried to remove all old GCC
> RA (including reload) at once but it was too slow and did not even
> generate the correct code in many cases even for x86.  Jeff Law tried
> IRA coloring reusage too for reload but whole RA became slower (although
> he achieved performance improvements on x86).  As I know, Preston Briggs
> (and his small team) from google tried to implement his iterative
> coloring algorithm only for x86/x86-64 but performance results were not
> better than IRA+reload that time.
> 
>   So many things in IRA/LRA have historical roots.  That is what we
> worked out.  Probably some approaches could work and used too if the
> same performance and compiler speed is achieved for major architectures
> and they generates correct code for all other architectures (supporting
> all of them by RA is complicated task by itself).  Personally, I like
> classical iterative Chaitin-Briggs allocator approach but I failed to
> implement this and probably will avoid to returning to its
> implementation but I encourage anyone to try other approaches and help
> to include them into GCC if the results will be promising.
> 
>   I hope I answer your question.
>  
> 


X86_64 insns combination is not working well

2014-03-02 Thread lin zuojian
Hi,
   I wrote some test code like this:
void foo(int *a)
{
    a[0] = 0xfafafafb;
    a[1] = 0xfafafafc;
    a[2] = 0xfafafafe;
    a[3] = 0xfafafaff;
    a[4] = 0xfafafaf0;
    a[5] = 0xfafafaf1;
    a[6] = 0xfafafaf2;
    a[7] = 0xfafafaf3;
    a[8] = 0xfafafaf4;
    a[9] = 0xfafafaf5;
    a[10] = 0xfafafaf6;
    a[11] = 0xfafafaf7;
    a[12] = 0xfafafaf8;
    a[13] = 0xfafafaf9;
    a[14] = 0xfafafafa;
    a[15] = 0xfafaf0fa;
}
This is what GCC generated:
    movl    $-84215045, (%rdi)
    movl    $-84215044, 4(%rdi)
    movl    $-84215042, 8(%rdi)
    movl    $-84215041, 12(%rdi)
    movl    $-84215056, 16(%rdi)
    ...
and this is what LLVM/clang generated:
    movabsq $-361700855600448773, %rax   # imm = 0xFAFAFAFCFAFAFAFB
    movq    %rax, (%rdi)
    movabsq $-361700842715546882, %rax   # imm = 0xFAFAFAFFFAFAFAFE
    movq    %rax, 8(%rdi)
    movabsq $-361700902845089040, %rax   # imm = 0xFAFAFAF1FAFAFAF0
    movq    %rax, 16(%rdi)
    movabsq $-361700894255154446, %rax   # imm = 0xFAFAFAF3FAFAFAF2
    ...
I ran the code on my i7 machine 100 times. Here is the result:
gcc:
    real    0m50.613s
    user    0m50.559s
    sys     0m0.000s

LLVM/clang:
    real    0m32.036s
    user    0m32.001s
    sys     0m0.000s

That means movabsq does a better job!
Should GCC's peephole pass add such a combination?
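
For illustration, this is what the merged form looks like at the source
level for the first two stores (foo_merged_prefix is a made-up name; the
64-bit constant is just a[1]:a[0] in little-endian order, matching clang's
imm comment, and with optimization it should compile down to a single
movabsq + movq):

#include <cstdint>
#include <cstring>

void foo_merged_prefix(int *a)
{
  /* One 64-bit store instead of two 32-bit ones.  memcpy keeps the access
     free of alignment and aliasing problems.  */
  const std::uint64_t first_two = 0xfafafafcfafafafbULL;
  std::memcpy(a, &first_two, sizeof first_two);
  /* ... the remaining 14 stores could be merged the same way ... */
}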
--
Regards
lin zuojian



Vim format in gcc source?

2014-03-02 Thread lin zuojian
Hi guys,
How do I set up vim's formatting so that my code doesn't look alien?
--
Regards
lin zuojian


Re: Vim format in gcc source?

2014-03-03 Thread lin zuojian
Thanks, Jonathan.
--
Regards
lin zuojian

On Mon, Mar 03, 2014 at 09:37:01AM +, Jonathan Wakely wrote:
> On 3 March 2014 07:00, lin zuojian wrote:
> > Hi guys,
> > How do I set the format of vim,so that my code doen't look alien?
> 
> Do you mean how do you set vim to match the GCC coding style?
> 
> It's not quite right, and it's mostly used for C++, but I use:
> 
> setl formatoptions=croql cindent cinoptions=:0,g0 
> comments=sr:/*,mb:*,el:*/,://
> setl cinoptions+=,{1s,>2s,n-1s
> setl noet


linux says it is a bug

2014-03-03 Thread lin zuojian
Hi,
in include/linux/compiler-gcc.h :

/* Optimization barrier */
/* The "volatile" is due to gcc bugs */
#define barrier() __asm__ __volatile__("": : :"memory")

The Linux comment says this is a GCC bug. But would any sane compiler
optimize the asm away without the "volatile" keyword?
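
A minimal sketch of what the clobber buys in practice (flag and
wait_for_flag are made-up names):

#define barrier() __asm__ __volatile__("" : : : "memory")

static int flag;        /* set from elsewhere, e.g. an interrupt handler */

void wait_for_flag(void)
{
  /* The "memory" clobber tells GCC the asm may read or write any memory,
     so flag has to be reloaded on every iteration instead of being cached
     in a register and turning this into an infinite loop.  */
  while (!flag)
    barrier();
}

(Per the GCC manual, an asm with no output operands is implicitly volatile
anyway, so the explicit __volatile__ here is belt and braces rather than a
functional change.)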

--
Regards
lin zuojian



Re: linux says it is a bug

2014-03-04 Thread lin zuojian
On Tue, Mar 04, 2014 at 12:08:19PM +0100, Richard Biener wrote:
> On Tue, Mar 4, 2014 at 10:33 AM, Hannes Frederic Sowa
>  wrote:
> > On Tue, Mar 04, 2014 at 09:26:31AM +, Andrew Haley wrote:
> >> On 03/04/2014 09:24 AM, Hannes Frederic Sowa wrote:
> >> >> > So the bug was probably fixed more than 15 years ago.
> >> > Probably :)
> >> >
> >> > But the __volatile__ shoud do no harm and shouldn't influence code
> >> > generation in any way, no?
> >>
> >> Of course it will: it's a barrier.
> >
> > Sure. My question was about the volatile marker. asm("":::"memory") should 
> > act
> > as the barrier alone.
> 
> __asm__("":::"memory")
> 
> is a memory barrier
> 
> volatile __asm__("":::"memory")
> 
> is a memory barrier and a barrier for other volatile instructions.
Hi Andrew,
What are volatile instructions? Can you give us an example?

--
Regards
lin zuojian


Please have a look at PR60438

2014-03-06 Thread lin zuojian
Hi,
I have found a crash when compiling, and have narrowed the bug down to the
csa pass. But I am not very familiar with it. Please help.
Cause:
        if (hasRelativeWidth || hasRelativeHeight)
              true /      \ false
                  /        \
                ...        ...
  call availableWidth()    call size()   ; returns a structure, matches "call_pop"
  .cfa_offset 96           .cfa_offset 92
                  \        push edx      ; should carry a REG_ARGS_SIZE 16 here,
                   \                     ; but csa eliminates it
                    \      (inlined functions are expanded here)
                     \      /
                      \    /
                       \  /
             (should all be cfa_offset 96)
                   add 16,%esp
As the CFG above shows, both branches should end with cfa offset 96, but
one branch ends with 92.
That causes an assertion failure at dwarf2cfi.c:2339, which asserts that
all branches must have the same cfa offset when they reach the join point.

--
Regards
lin zuojian


Re: Please have a look at PR60438

2014-03-07 Thread lin zuojian
Hi,
I have found the cause:

Okay, let me sum it up.
At first the code looks like this:
call xxx: .cfa 92
float ops
add sp 12 .cfa 80

Then split2 splits the float ops, and the code looks like this:
call xxx: .cfa 92
push edx
float ops2
add sp 4
...
add sp 12 .cfa 80

Note that the split code adjusts sp but carries no CFA notes.
Then csa decides that is ugly and changes the code to:
call xxx : .cfa 92
push edx
float ops2
...
add sp 16 .cfa 80

Then jump2 finds that another branch also ends with an "add sp 16 .cfa 80",
so the two get cross-jumped:
call xxx :.cfa 92
push edx
float ops2
...
label jump_from_other_branch   ( (hasRelativeWidth || hasRelativeHeight) == true )
add sp 16 .cfa 80


Then dwarf2cfi.c finds that the "add sp 16 .cfa 80" row is reached first
with cfa 92, and then with cfa 96.

Does anybody have any comments on how to fix it?
--
Regards
lin zuojian


Re: linux says it is a bug

2014-03-10 Thread lin zuojian
On Wed, Mar 05, 2014 at 10:39:51AM +0400, Yury Gribov wrote:
> >What is volatile instructions? Can you give us an example?
> 
> Check volatile_insn_p. AFAIK there are two classes of volatile instructions:
> * volatile asm
> * unspec volatiles (target-specific instructions for e.g. protecting
> function prologues)
> 
> -Y
Thanks.


Scheduler:LLVM vs gcc, which is better

2014-03-10 Thread lin zuojian
Hi,
I read LLVM code for a while,and a question raise:Whose scheduler is
better?
LLVM brings in the DAG,and make it look important just like IR or
MachineInst.But is that necessary?I don't see what kind of problem
it tries to solve.
From the pipeline of the compiler, LLVM can not do sched2.Is that
suck?

--
Regards
lin zuojian.



Re: Scheduler:LLVM vs gcc, which is better

2014-03-10 Thread lin zuojian
On Mon, Mar 10, 2014 at 07:11:43PM -0700, Chandler Carruth wrote:
> On Mon, Mar 10, 2014 at 6:59 PM, lin zuojian  wrote:
> >
> > Hi,
> > I read LLVM code for a while,and a question raise:Whose scheduler is
> > better?
> > LLVM brings in the DAG,and make it look important just like IR or
> > MachineInst.But is that necessary?I don't see what kind of problem
> > it tries to solve.
> > From the pipeline of the compiler, LLVM can not do sched2.Is that
> > suck?
> 
> I clearly can't speak for GCC developers, but as an LLVM developer I
> have to say, this seems like a (somewhat rudely phrased) question for
> the LLVM mailing lists where there are people more familiar with the
> LLVM internals. Happy to reply in more depth there (or here if folks
> are actually interested).
Hi,
I just ask for opinions.I think many GCC developers do familiar with
the opponent.If I ask in the LLVM mailing list, I have to worry
about If they are familiar with GCC, too(what's sched2 pass?).
--
Regards 
lin zuojian 


Re: Scheduler:LLVM vs gcc, which is better

2014-03-10 Thread lin zuojian
Hi Chandler,
Thanks a lot for your answer. It was pretty confusing to find out that the
DAG has schedule units.
--
Regards
lin zuojian


Re: Scheduler:LLVM vs gcc, which is better

2014-03-12 Thread lin zuojian
On Tue, Mar 11, 2014 at 11:30:28AM +0800, lin zuojian wrote:
> Hi Chandler,
> Thanks a lot for your answer.It is pretty misleading to find out
> that DAG has schedule unit.
> --
> Regards
> lin zuojian

Hi Chandler,
I have looked into their "Machine Instr Scheduler" and found that LLVM has
not yet enabled it by default. Further testing finds it is still not
working (e.g. -mtune=cortex-a9, a15 and a53 generate the same code).

--
Regards
lin zuojian