question on arm soft-fp function __aeabi_d2uiz

2011-05-07 Thread Amker.Cheng
Hi,
I found that in gcc/config/arm/ieee754-df.S, the function __aeabi_d2uiz
converts a double into an unsigned integer, and that it always returns 0
if the double value is negative. For example, take the following code:
---sample codes--
#include <stdio.h>

unsigned long ul;
double d = -1.1;

int main(void)
{
    ul = (unsigned long)d;
    fprintf(stdout, "ul = 0x%lX\n", ul);

    return 0;
}
The output of __aeabi_d2uiz on ARM soft-fp is 0x0, so
"(unsigned int)(int)d" and "(unsigned int)d" behave differently.

I also tried the code on x86-cygwin, which prints 0x.
I am wondering why __aeabi_d2uiz returns 0 for negative double values.
Is this behavior defined by the ARM FPU, and does it differ from the
x86 FPU implementation?
I have no ARM FPU platform to verify this on and know little about
floating point, so could anyone clarify?
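
For reference, a minimal sketch of the two conversion paths compared above. It relies on the C99 rule (6.3.1.4) that a floating-to-integer conversion is undefined when the truncated value cannot be represented in the destination type, so "(unsigned int)d" with d = -1.1 has no required result, while going through int first is well defined. The helper names via_int and direct are illustrative only and do not come from libgcc.

```c
#include <limits.h>

/* Well-defined path: -1.1 truncates toward zero to the int -1, and
   converting -1 to unsigned int wraps modulo UINT_MAX + 1. */
unsigned int via_int(double d)
{
    return (unsigned int)(int)d;
}

/* Direct path: for d <= -1.0 the truncated value is not representable
   in unsigned int, so C leaves the result undefined. A soft-float
   helper that returns 0 and hardware that returns something else can
   both be conforming. Only call this with values in (-1, UINT_MAX + 1)
   if a portable result is needed. */
unsigned int direct(double d)
{
    return (unsigned int)d;
}
```

With these helpers, via_int(-1.1) gives UINT_MAX on any conforming implementation, whereas direct(-1.1) may give 0, UINT_MAX, or anything else.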

Thanks very much.
-- 
Best Regards.


question on find_if_case_2 in ifcvt.c

2011-09-08 Thread Amker.Cheng
Hi,
In ifcvt.c's function find_if_case_2, it uses cheap_bb_rtx_cost_p to
judge the conversion.

Function cheap_bb_rtx_cost_p checks whether the total insn_rtx_cost on
non-jump insns in
basic block BB is less than MAX_COST.

So the question is: why use cheap_bb_rtx_cost_p at all when we know the
ELSE branch is predicted,
which means the conversion is beneficial anyway?
Second, should cheap_bb_rtx_cost_p be changed to check "whether the
total insn_rtx_cost on
non-jump insns in basic block BB is no larger than MAX_COST", so that
normal instructions are preferred
over a branch even when they have the same cost?
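
To make the second point concrete, here is a toy model of the two comparisons. These are not the GCC functions; cheap_strict and cheap_inclusive are hypothetical names, and the costs are plain integers rather than real insn_rtx_cost values.

```c
#include <stdbool.h>

/* Current wording: the block is cheap only if its total cost is
   strictly less than the limit. */
bool cheap_strict(int total_cost, int max_cost)
{
    return total_cost < max_cost;
}

/* Proposed wording: a block whose cost equals the limit is still
   considered cheap, so a tie favours the non-branch instructions. */
bool cheap_inclusive(int total_cost, int max_cost)
{
    return total_cost <= max_cost;
}
```

The two only differ when total_cost == max_cost, which is exactly the tie case the question is about.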

Any suggestions? Thanks in advance.
-- 
Best Regards.


CFLAGS used in libgcc makefile?

2011-09-13 Thread Amker.Cheng
Hi guys,
Is CFLAGS what libgcc/Makefile.in uses to build libgcc.a?
It seems that if I configure gcc with the CFLAGS="-O0 -g " environment
variable, libgcc is also compiled with the -O0 option.
I'm wondering why CFLAGS_FOR_TARGET is not used
here (CFLAGS->INTERNAL_CFLAGS->gcc_compile_bare->gcc_compile).

Please help, thanks.

-- 
Best Regards.


Question on _GLIBCXX_HOSTED macro libstdc++ and libsupc++

2011-09-23 Thread Amker.Cheng
Hi,

In libstdc++-v3/libsupc++/eh_term_handler.cc, a comment says that by
default the demangler is pulled in,
depending on whether _GLIBCXX_HOSTED is defined. The demangling
terminate handler
is really big, especially for an embedded system.

Secondly, _GLIBCXX_HOSTED is now defined if --enable-hosted-libstdcxx
is given (which it is by default).
This option also controls whether libstdc++.a itself is built for the target system.

So, for an embedded system, how could I get the earlier "silent
death" handler by way of _GLIBCXX_HOSTED,
while still having libstdc++ built?

Any suggestions? Thanks in advance.
FYI, all of the above is about a cross toolchain.

-- 
Best Regards.


Re: Question on _GLIBCXX_HOSTED macro libstdc++ and libsupc++

2011-09-23 Thread Amker.Cheng
> (Any reason this wasn't sent to the libstdc++ list?)
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43852 proposes a "quiet
> mode" which would reduce code size by disabling some of the code in
> eh_term_handler.cc and pure.cc - would that do what you want?
>
> I've not had time to do anything about it, but I think Sebastian
> (CC'd) has a copyright assignment in place now, and he's provided a
> patch implementing it.
>
Sorry for missing the list, cced now.

It is exactly what I meant, thanks very much.

-- 
Best Regards.


missing conditional propagation in cprop.c pass

2011-09-27 Thread Amker.Cheng
Hi,
I ran into a case where conditional (const) propagation is
missed in the cprop pass.
Here is the insn sequence after the cprop1 pass:

(note 878 877 880 96 [bb 96] NOTE_INSN_BASIC_BLOCK)

(insn 882 881 883 96 (set (reg:CC 24 cc)
(compare:CC (reg:SI 684 [ default_num_contexts ])
(const_int 0 [0]))) core_main.c:265 211 {*arm_cmpsi_insn}
 (nil))

(jump_insn 883 882 886 96 (set (pc)
(if_then_else (ne (reg:CC 24 cc)
(const_int 0 [0]))
(label_ref:SI 905)
(pc))) core_main.c:265 223 {*arm_cond_branch}
 (expr_list:REG_DEAD (reg:CC 24 cc)
(expr_list:REG_BR_PROB (const_int 9100 [0x238c])
(nil)))
 -> 905)

(note 886 883 49 97 [bb 97] NOTE_INSN_BASIC_BLOCK)

(insn 49 886 0 97 (set (reg/v:SI 291 [ total_errors ])
(reg:SI 684 [ default_num_contexts ])) core_main.c:265 709
{*thumb2_movsi_insn}
 (expr_list:REG_DEAD (reg:SI 684 [ default_num_contexts ])
(expr_list:REG_EQUAL (const_int 0 [0])
(nil
..

(code_label 905 54 904 47 54 "" [1 uses])

(note 904 905 46 47 [bb 47] NOTE_INSN_BASIC_BLOCK)

(insn 46 904 47 47 (set (reg/v:SI 291 [ total_errors ])
(const_int 0 [0])) core_main.c:265 709 {*thumb2_movsi_insn}
 (nil))


insn 49 should have the conditional constant from insn 882 and
jump_insn 883 propagated into it and be optimized into "r291<-0", as in
the following code, so that pre can then do the redundancy elimination.

(note 878 877 880 96 [bb 96] NOTE_INSN_BASIC_BLOCK)

(insn 882 881 883 96 (set (reg:CC 24 cc)
(compare:CC (reg:SI 684 [ default_num_contexts ])
(const_int 0 [0]))) core_main.c:265 211 {*arm_cmpsi_insn}
 (nil))

(jump_insn 883 882 886 96 (set (pc)
(if_then_else (ne (reg:CC 24 cc)
(const_int 0 [0]))
(label_ref:SI 905)
(pc))) core_main.c:265 223 {*arm_cond_branch}
 (expr_list:REG_DEAD (reg:CC 24 cc)
(expr_list:REG_BR_PROB (const_int 9100 [0x238c])
(nil)))
 -> 905)

(note 886 883 49 97 [bb 97] NOTE_INSN_BASIC_BLOCK)

(insn 49 886 0 97 (set (reg/v:SI 291 [ total_errors ])
(const_int 0 [0])) core_main.c:265 709 {*thumb2_movsi_insn}
 (expr_list:REG_DEAD (reg:SI 684 [ default_num_contexts ])
(expr_list:REG_EQUAL (const_int 0 [0])
(nil
..

(code_label 905 54 904 47 54 "" [1 uses])

(note 904 905 46 47 [bb 47] NOTE_INSN_BASIC_BLOCK)

(insn 46 904 47 47 (set (reg/v:SI 291 [ total_errors ])
(const_int 0 [0])) core_main.c:265 709 {*thumb2_movsi_insn}
 (nil))


The problem is that function one_cprop_pass does the local const/copy
propagation pass first, then the global pass, which only handles
global opportunities.
Though the conditional const information "r684 <- 0" is collected by
find_implicit_sets, it is recorded as local
information of bb 97; it is not recorded in avout of bb 96, and so not
in avin of bb 97 either.

Unfortunately, the global pass, in functions cprop_insn and
find_avail_set, only considers potential opportunities
from the avin of each basic block.

That's why the conditional propagation opportunity in bb 97 is missed.

I have worked out a patch to fix this, and would like to hear more suggestions on this topic.
Is it a bug, or have I missed something important?

Thanks

BTW, I'm using gcc mainline configured for the arm-none-eabi target.


Re: missing conditional propagation in cprop.c pass

2011-09-29 Thread Amker.Cheng
On Tue, Sep 27, 2011 at 4:19 PM, Amker.Cheng  wrote:
> Hi,
> I ran into a case and found conditional (const) propagation is
> mishandled in cprop pass.
> With following insn sequence after cprop1 pass:
> 
> (note 878 877 880 96 [bb 96] NOTE_INSN_BASIC_BLOCK)
>
> (insn 882 881 883 96 (set (reg:CC 24 cc)
>        (compare:CC (reg:SI 684 [ default_num_contexts ])
>            (const_int 0 [0]))) core_main.c:265 211 {*arm_cmpsi_insn}
>     (nil))
>
> (jump_insn 883 882 886 96 (set (pc)
>        (if_then_else (ne (reg:CC 24 cc)
>                (const_int 0 [0]))
>            (label_ref:SI 905)
>            (pc))) core_main.c:265 223 {*arm_cond_branch}
>     (expr_list:REG_DEAD (reg:CC 24 cc)
>        (expr_list:REG_BR_PROB (const_int 9100 [0x238c])
>            (nil)))
>  -> 905)
>
> (note 886 883 49 97 [bb 97] NOTE_INSN_BASIC_BLOCK)
>
> (insn 49 886 0 97 (set (reg/v:SI 291 [ total_errors ])
>        (reg:SI 684 [ default_num_contexts ])) core_main.c:265 709
> {*thumb2_movsi_insn}
>     (expr_list:REG_DEAD (reg:SI 684 [ default_num_contexts ])
>        (expr_list:REG_EQUAL (const_int 0 [0])
>            (nil
> ..
>
> (code_label 905 54 904 47 54 "" [1 uses])
>
> (note 904 905 46 47 [bb 47] NOTE_INSN_BASIC_BLOCK)
>
> (insn 46 904 47 47 (set (reg/v:SI 291 [ total_errors ])
>        (const_int 0 [0])) core_main.c:265 709 {*thumb2_movsi_insn}
>     (nil))
> 
>
> The insn49 should be propagated with conditional const from insn882
> and jump_insn883, optimized into "r291<-0" as following code, then let
> pre do redundancy elimination work.
> 
> (note 878 877 880 96 [bb 96] NOTE_INSN_BASIC_BLOCK)
>
> (insn 882 881 883 96 (set (reg:CC 24 cc)
>        (compare:CC (reg:SI 684 [ default_num_contexts ])
>            (const_int 0 [0]))) core_main.c:265 211 {*arm_cmpsi_insn}
>     (nil))
>
> (jump_insn 883 882 886 96 (set (pc)
>        (if_then_else (ne (reg:CC 24 cc)
>                (const_int 0 [0]))
>            (label_ref:SI 905)
>            (pc))) core_main.c:265 223 {*arm_cond_branch}
>     (expr_list:REG_DEAD (reg:CC 24 cc)
>        (expr_list:REG_BR_PROB (const_int 9100 [0x238c])
>            (nil)))
>  -> 905)
>
> (note 886 883 49 97 [bb 97] NOTE_INSN_BASIC_BLOCK)
>
> (insn 49 886 0 97 (set (reg/v:SI 291 [ total_errors ])
>        (const_int 0 [0])) core_main.c:265 709 {*thumb2_movsi_insn}
>     (expr_list:REG_DEAD (reg:SI 684 [ default_num_contexts ])
>        (expr_list:REG_EQUAL (const_int 0 [0])
>            (nil
> ..
>
> (code_label 905 54 904 47 54 "" [1 uses])
>
> (note 904 905 46 47 [bb 47] NOTE_INSN_BASIC_BLOCK)
>
> (insn 46 904 47 47 (set (reg/v:SI 291 [ total_errors ])
>        (const_int 0 [0])) core_main.c:265 709 {*thumb2_movsi_insn}
>     (nil))
> 
>
> The problem is function one_cprop_pass does local const/copy
> propagation pass first, then the global pass, which only handles
> global opportunities.
> Though conditional const information "r684 <- 0" is collected by
> find_implicit_sets, the conditional information is recorded as local
> information of bb 97, and it is not recorded in avout of bb 96, so not
> in avin of bb 97 either.
>
> Unfortunately, the global pass only considers potential opportunities
> from avin of each basic block in function cprop_insn and
> find_avail_set.
>
> That's why the conditional propagation opportunity in bb 97 is missed.
>
> I worked a patch to fix this, and wanna hear more suggestions on this topic.
> Is it a bug or I missed something important?
>
> Thanks
>
> BTW, I'm using gcc mainline which configured for arm-none-eabi target.
>

No interest? Any tips would be greatly appreciated, thanks.

-- 
Best Regards.


Re: missing conditional propagation in cprop.c pass

2011-09-29 Thread Amker.Cheng
> Unless there's something arch specific related to arm, insn 882 is a
> compare, which won't change r684. Why do you think 0 should
> propagated to r291 if r684 is not zero?
>

Thanks for replying.
Sorry if I misunderstood anything below, and please correct me.

insn 882  : cc <- compare (r684, 0)
jump_insn 883 : if (cc != 0) goto insn 46
insn 49: r291 <- r684
..
insn 46

cc contains the result of subtracting 0 from r684;
control flow goes to insn_49 only if (cc == 0), which implies (r684 == 0).
Then at insn_49 we have the conditional constant "r684 <- 0" available
for propagation. Is that right?
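
A rough C rendering of the same argument, as a sketch only. The control flow after insn 49 is elided in the dump, so the way the two paths join here is an assumption, and the function names before and after are made up for illustration.

```c
/* Shape of the RTL above: the fall-through path is only reached when
   r684 == 0, so the copy from r684 can be replaced by the constant. */
int before(int r684)
{
    int r291;
    if (r684 != 0)
        r291 = 0;       /* insn 46: target of the conditional branch */
    else
        r291 = r684;    /* insn 49: r684 is known to be 0 here */
    return r291;
}

/* Same function after propagating the implicit "r684 == 0". Both arms
   now assign the same constant, which a later pass can merge. */
int after(int r684)
{
    int r291;
    if (r684 != 0)
        r291 = 0;
    else
        r291 = 0;       /* insn 49 after conditional const propagation */
    return r291;
}
```

Both versions return 0 for every input, which is the equivalence the propagation relies on.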

Thanks again.
-- 
Best Regards.


Re: missing conditional propagation in cprop.c pass

2011-09-29 Thread Amker.Cheng
>
> Nobody mentioned this so I might be way off but cc doesn't get (minus
> (reg r684) (const_int 0)). It gets the `condition codes` modification as
> a consequence of the subtraction.
>

Hi Paulo,
According to the section "Comparison Operations" in the GCC internals manual:
"The comparison operators may be used to compare the condition codes (cc0)
against zero, as in (eq (cc0) (const_int 0)). Such a construct
actually refers to
the result of the preceding instruction in which the condition codes were set."

and the result of the preceding instruction here is the result of
(compare r684, 0),
which, according to the definition, is:
"
(compare:m x y)
Represents the result of subtracting y from x for purposes of comparison."

I'm not sure whether I've misunderstood anything, so please comment.

Thanks very much.

-- 
Best Regards.


Re: missing conditional propagation in cprop.c pass

2011-09-29 Thread Amker.Cheng
>>
>> I believe, the optimization you may be referring to is value range
>> propagation which does predication of values based on predicates of
>> conditions. GCC definitely applies VRP at the tree stage, I am not
>> sure if there is an RTL pass to do the same.
> There are also RTL optimizers which perform this kind of constant
> propagation.  See cprop.c (in older versions of gcc this code was in
> gcse.c)
>
Hi Jeff,
This is exactly what I referred to in the first message.
Though the cprop.c pass collects the implicit_set information, it is recorded
as local info of the basic block, and cprop only does global propagation.
The result is that such conditional const propagation opportunities are missed.

The whole process in cprop pass is like:

bb0 : if (x)
then
  bb1
else
  bb2
end

1, the implicit_set from the preceding bb0 is tagged as local in bb1;
2, in compute_local_properties, the implicit_set is recorded in avloc[bb1];
3, in compute_cprop_available, the implicit_set is only recorded in avout[bb1],
not in avin[bb1], where it should be;
4, in cprop_insn and find_avail_set, only info recorded in avin[bb1]
is considered
when trying to do propagation for bb1;

Well, I believe it is a small problem: since the implicit_set is recorded
in avout[bb1],
basic block bb1 is the only one that gets missed in propagation.
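
The steps above can be sketched with a toy solver. This is not the cprop.c code: it assumes the usual forward available-expressions equations (avin is the intersection of the predecessors' avout, avout = avloc | (avin & ~kill)), uses one bit for the implicit set, and hard-codes the three-block CFG from the example. All names are illustrative.

```c
enum { BB0, BB1, BB2, NUM_BB };

/* Single bit standing for the implicit set on the bb0 -> bb1 edge. */
#define IMPLICIT_SET 0x1u

struct avail {
    unsigned avin[NUM_BB];
    unsigned avout[NUM_BB];
};

struct avail solve(void)
{
    /* The implicit set is recorded as locally available in bb1 only. */
    unsigned avloc[NUM_BB] = { 0, IMPLICIT_SET, 0 };
    unsigned kill[NUM_BB]  = { 0, 0, 0 };
    struct avail a = { {0}, {0} };

    /* The CFG is acyclic (bb0 -> bb1, bb0 -> bb2), so a single pass in
       block order reaches the fixed point. */
    for (int b = 0; b < NUM_BB; b++) {
        /* bb0 is the entry; bb1 and bb2 have bb0 as only predecessor. */
        a.avin[b]  = (b == BB0) ? 0u : a.avout[BB0];
        a.avout[b] = avloc[b] | (a.avin[b] & ~kill[b]);
    }
    return a;
}
```

In this model the bit ends up in avout[BB1] but not in avin[BB1], so a consumer that only looks at avin[BB1] never sees it.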

I don't know if I have described the problem clearly, so please comment.

Thanks very much.

-- 
Best Regards.


Re: missing conditional propagation in cprop.c pass

2011-10-10 Thread Amker.Cheng
Hi Jeff, Steven,

I have filed a bug at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50663
Could somebody confirm it?

I have been studying this piece of code and have spent some time on it.
I'm working on a patch and hope it can help with this issue.
Please help me review it later. Thanks.

-- 
Best Regards.


At which pass thing goes wrong for PR43491?

2011-11-25 Thread Amker.Cheng
Hi,
I looked into PR43491 for a while and found that in this case the gimple
generated before pre
looks like:

reg.0_12 = reg
...
c()
reg.0_1 = reg
D.xxx = MEM[reg.0_1 + 8B]

The pre pass transforms it into:

reg.0_12 = reg
...
c()
reg.0_1 = reg.0_12
D.xxx = MEM[reg.0_1 + 8B]

From now on, following passes (like copy_prop) cannot transform it back,
resulting in an additional mov instruction, as the bug reports.

The flow is like:
1, when rewriting gimple into ssa, reg is treated as a memory use;
2, it seems pre notices that reg is const and replaces reg with reg.0_12;
by doing this, pre thinks it has eliminated an additional memory load operation;
3, following passes do not transform it back, either because reg is treated
as a mem use or because the const attribute is ignored.

I think pre does the right thing given the information it has, so I am wondering
at which pass things start going wrong and how this issue could be handled.

Thanks very much
-- 
Best Regards.


Re: At which pass thing goes wrong for PR43491?

2011-12-01 Thread Amker.Cheng
On Sat, Nov 26, 2011 at 3:41 PM, Amker.Cheng  wrote:
> Hi,
> I looked into PR43491 a while and found in this case the gimple
> generated before pre
> is like:
>
> reg.0_12 = reg
> ...
> c()
> reg.0_1 = reg
> D.xxx = MEM[reg.0_1 + 8B]
>
> The pre pass transforms it into:
>
> reg.0_12 = reg
> ...
> c()
> reg.0_1 = reg.0_12
> D.xxx = MEM[reg.0_1 + 8B]
>
> From now on, following passes(like copy_prop) can not transform it back and
> resulting in an additional mov instruction as the bug reported.
>
> The flow is like:
> 1, when rewriting gimple into ssa, reg is treated as a memory use;
> 2, seems pre noticed that reg is const and replace reg with reg.0_12,
>    by this pre thinks it has eliminated an additional memory load operation;
> 3, following passes do not transform it back either because reg is treated
>    as mem use or the const attribute is ignored.
>
> I think pre does the right thing given the information it knows, so wondering
> at which pass thing starts going wrong and how could this issue be handled?
>

Should PRE be made aware of global register variables, so that it does not
do the unnecessary elimination mentioned above?


-- 
Best Regards.


Re: At which pass thing goes wrong for PR43491?

2011-12-06 Thread Amker.Cheng
On Thu, Dec 1, 2011 at 11:45 PM, Richard Guenther
 wrote:

> Well, it's not that easy if you still want to properly do redundant expression
> removal on global registers.

Yes, it might be complicated to make PRE fully aware of global registers.
I also found a comment in is_gimple_reg which says gcc does not do
much optimization with register variables at the tree level for now.

Back to this issue, I think it can be fixed in the following way without hurting
redundancy elimination on global register variables:

After insert() has been called in pre, in function eliminate() we can check for
a single assignment statement from a global register variable to an ssa_name.
If that is the case, we can just skip the elimination operation.

In this way:
1, normal redundancy elimination on global registers will not be hurt,
since sccvn and pre have already detected the true elimination chances
and they will be eliminated afterward in function eliminate;
2, the inserted statements (including PHIs) for global register variables
will not be marked as NECESSARY in function eliminate and will be
deleted in remove_dead_inserted_code;

I attached an example which illustrates that the normal redundancy does
get eliminated.
I will send a patch for review if it is worth discussing. So what do you think?

Thanks

-- 
Best Regards.
/* { dg-do compile } */ 
/* { dg-options "-O2 -fdump-tree-pre-stats" } */
register int data_0 asm("r4");
register int data_3 asm("r5"); 
int motion_test1(int data, int v)
{
	int i;
	int t, u;

	if (data)
		i = data_0 + data_3;
	else {
		v = 2;
		i = 5;
	}
	t = data_0 + data_3;
	u = i;
	return v * t * u;
}
/* We should eliminate one computation of data_0 + data_3 along the 
   main path.  We cannot re-associate v * t * u due to undefined
   signed overflow so we do not eliminate one computation of v * i along
   the main path. */
/* { dg-final { scan-tree-dump-times "Eliminated: 2" 1 "pre" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre" } } */
/* { dg-final { cleanup-tree-dump "pre" } } */


ssa-pre-2.c.093t.crited
Description: Binary data


ssa-pre-2.c.094t.pre.orig
Description: Binary data

;; Function motion_test1 (motion_test1, funcdef_no=0, decl_uid=4055, cgraph_uid=0)

Points-to analysis

Constraints:

ANYTHING = &ANYTHING
READONLY = &READONLY
ESCAPED = *ESCAPED
ESCAPED = ESCAPED + UNKNOWN
*ESCAPED = NONLOCAL
NONLOCAL = &NONLOCAL
NONLOCAL = &ESCAPED
INTEGER = &ANYTHING
data = &NONLOCAL
v = &NONLOCAL
*r4 = NONLOCAL
data_0.0_4 = *r4
*r5 = NONLOCAL
data_3.1_5 = *r5
i_6 = data_0.0_4
i_6 = data_3.1_5
v_1 = v
v_1 = &NONLOCAL
i_2 = i_6
i_2 = &NONLOCAL
data_0.0_10 = *r4
data_3.1_11 = *r5
t_12 = data_0.0_10
t_12 = data_3.1_11
D.4935_14 = v_1
D.4935_14 = t_12
D.4934_15 = i_2
D.4934_15 = D.4935_14
ESCAPED = D.4934_15

Collapsing static cycles and doing variable substitution
Building predecessor graph
Detecting pointer and location equivalences
Rewriting constraints and unifying variables
Uniting pointer but not location equivalent variables
Finding indirect cycles
Solving graph

Points-to sets

ANYTHING = { ANYTHING }
READONLY = { READONLY }
ESCAPED = { ESCAPED NONLOCAL }
NONLOCAL = { ESCAPED NONLOCAL } same as *r4
STOREDANYTHING = { }
INTEGER = { ANYTHING }
data = { NONLOCAL }
v = { NONLOCAL } same as data
data_0.0_4 = { ESCAPED NONLOCAL } same as *r4
*r4 = { ESCAPED NONLOCAL }
data_3.1_5 = { ESCAPED NONLOCAL } same as *r4
*r5 = { ESCAPED NONLOCAL } same as *r4
i_6 = { ESCAPED NONLOCAL } same as *r4
v_1 = { NONLOCAL } same as data
i_2 = { ESCAPED NONLOCAL } same as *r4
data_0.0_10 = { ESCAPED NONLOCAL } same as *r4
data_3.1_11 = { ESCAPED NONLOCAL } same as *r4
t_12 = { ESCAPED NONLOCAL } same as *r4
D.4935_14 = { ESCAPED NONLOCAL } same as *r4
D.4934_15 = { ESCAPED NONLOCAL } same as *r4


Alias information for motion_test1

Aliased symbols

.MEM, UID D.4937, void, is global, default def: .MEM_16(D)
data_0, UID D.4051, int, is global
data_3, UID D.4052, int, is global

Call clobber information

ESCAPED, points-to non-local, points-to vars: { }

Flow-insensitive points-to information


;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 5 3 4
;; 2 succs { 3 5 }
;; 5 succs { 4 }
;; 3 succs { 4 }
;; 4 succs { 1 }
Could not find SSA_NAME representative for expression:{mult_expr,i_6,2}
Created SSA_NAME representative pretmp.5_17 for expression:{mult_expr,i_6,2}
Could not find SSA_NAME representative for expression:{mult_expr,i_6,v_7(D)}
Created SSA_NAME representative pretmp.5_18 for expression:{mult_expr,i_6,v_7(D)}


Symbols to be put in SSA form

{ .MEM }


Incremental SSA update started at block: 0

Number of blocks in CFG: 6
Number of blocks to update: 5 ( 83%)



motion_test1 (int data, int v)
{
  int prephitmp.6;
  int pretmp.5;
  int t;
  int i;
  int D.4935;
  int D.4934;
  int data_3.1;
  int data_0.0;

:
  if (data_3(D) != 0)
goto ;
  else
goto ;

:
  pretmp.5_19 = data_0;
  pretmp.5_21 = data_3;
  pretmp.5_23 = pretmp.5_1

question on behavior of tree-ssa-ccp

2011-12-15 Thread Amker.Cheng
HI,
I encountered a case with the code below:

int data_0;
int motion_test1(int data, int v)
{
    int i;
    int t, u;
    int x;

    if (data)
        i = data_0 + x;
    else {
        v = 2;
        i = 5;
    }
    t = data_0 + x;
    u = i;
    return v * t * u;
}
The dump file for 023t.ccp1 is like:

motion_test1 (int data, int v)
{
  int x;
  int t;
  int D.4723;
  int D.4722;
  int data_0.0;

:
  if (data_3(D) != 0)
goto ;
  else
goto ;

:
  v_8 = 2;

:
  # v_1 = PHI 
  data_0.0_10 = data_0;
  t_11 = data_0.0_10 + x_5(D);
  D.4723_13 = v_1 * t_11;
  D.4722_14 = D.4723_13 * 5;
  return D.4722_14;

}

It seems the result is computed as "v*(data_0+x)*5", which looks wrong.
The question is whether this is a bug or intended behavior due to the
uninitialized "x"?
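
One possible reading, offered as a sketch rather than a confirmed diagnosis: it assumes CCP uses the usual constant-propagation lattice, in which an undefined value merged with a constant yields that constant. Under that assumption "data_0 + x" is undefined on the first path because x is never initialized, so the PHI for i merges an undefined value with 5 and folds to 5. The types and the meet function below are a toy model, not the tree-ssa-ccp.c code.

```c
enum kind { UNDEFINED, CONSTANT, VARYING };

struct lat {
    enum kind kind;
    int value;          /* meaningful only when kind == CONSTANT */
};

/* Lattice meet used when merging PHI arguments in this toy model. */
struct lat meet(struct lat a, struct lat b)
{
    if (a.kind == UNDEFINED)
        return b;                           /* undefined gives way to anything */
    if (b.kind == UNDEFINED)
        return a;
    if (a.kind == VARYING || b.kind == VARYING)
        return (struct lat){ VARYING, 0 };  /* varying absorbs everything */
    if (a.value == b.value)
        return a;                           /* equal constants stay constant */
    return (struct lat){ VARYING, 0 };      /* different constants conflict */
}
```

Merging { UNDEFINED } with { CONSTANT, 5 } gives { CONSTANT, 5 }, which would match the "* 5" in the dump. Since reading an uninitialized variable is undefined in C, such a fold would be permitted.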

Any tips are welcome. Thanks.

-- 
Best Regards.


Re: question on behavior of tree-ssa-ccp

2011-12-15 Thread Amker.Cheng
Forgot the command line:
arm-none-eabi-gcc -O2 -mthumb -mcpu=cortex-m3 -S test.c -o test.S
-fdump-tree-all

gcc is configured as arm-none-eabi, but I think it's independent of the target.

-- 
Best Regards.


RFC: Handle conditional expression in sccvn/fre/pre

2012-01-02 Thread Amker.Cheng
Hi,
Since SCCVN operates on the SSA graph instead of the control flow graph
for the sake of efficiency,
it does not handle or value number the conditional expression of a
GIMPLE_COND statement.
As a result, FRE/PRE does not simplify conditional expressions, as
reported in bug 30997.

Since it would be complicated and difficult to process conditional
expressions in the current SCCVN
algorithm, how about the following method?

STEP1  Before starting FRE/PRE, we can factor out the conditional
expression, e.g. change the following
code:

if (cond_expr)
  goto label_a
else
  goto label_b

into:

tmp = cond_expr
if (tmp == 1)
  goto label_a
else
  goto label_b

STEP2  Let SCCVN/FRE/PRE do its job of value numbering cond_expr and
redundancy elimination;
STEP3  After FRE/PRE, for those "tmp=cond_expr" not used in any
redundancy elimination,
we can forward them to the corresponding GIMPLE_COND statement, just
like tree-ssa-forwprop.c does.

In this way, the conditional expression will be handled like other
expressions and no
redundant assignment is generated.
Most importantly, this does not affect the current implementation of SCCVN/FRE/PRE.

The only problem is that the method cannot handle the inversion of a
conditional expression.
For example:

x = a > 2;
if (a<=2)
  goto label_a
else
  goto label_b
could be optimized as:

x = a > 2
if (x == 0)
  goto label_a
else
  goto label_b
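
As a sanity check of that equivalence, here is a small C sketch. The function names are made up, and the return values 'a' and 'b' simply stand for reaching label_a and label_b.

```c
/* Original form: the branch tests a <= 2, which is the inverse of the
   already computed x = a > 2. */
int branch_original(int a)
{
    int x = a > 2;
    (void)x;                        /* x is computed but not reused */
    return (a <= 2) ? 'a' : 'b';
}

/* Rewritten form: the branch reuses x instead of recomputing the
   inverted comparison. */
int branch_rewritten(int a)
{
    int x = a > 2;
    return (x == 0) ? 'a' : 'b';
}
```

Both functions pick the same label for every a, because "a <= 2" and "(a > 2) == 0" are the same predicate on integers.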

I have worked out a draft patch to do this and would like to hear your
comments on it.

Thanks very much.
-- 
Best Regards.


Re: RFC: Handle conditional expression in sccvn/fre/pre

2012-01-02 Thread Amker.Cheng
Thanks Richard,

On Mon, Jan 2, 2012 at 8:33 PM, Richard Guenther
 wrote:
>
> I've previously worked on changing GIMPLE_COND to no longer embed
> the comparison but carry a predicate SSA_NAME only (this is effectively
> what you do as pre-processing before SCCVN).  It had some non-trivial
> fallout (for example PRE get's quite confused and ends up separating
> conditionals and jumps too far ...) so I didn't finish it.
By changing GIMPLE_COND to no longer embed the comparison,
do you mean only in the fre/pre passes, or in general?
If only in the fre/pre passes, when and how would these changed GIMPLE_CONDs
be changed back to normal ones?
If in general, won't this affect passes working on GIMPLE_COND, like
tree-ssa-forwprop.c?

>
> A subset of all cases can be catched by simply looking up the
> N-ary at eliminate () time and re-writing the GIMPLE_COND to use
> the predicate - which might not actually be beneficial (but forwprop
> will undo not beneficial cases - hopefully).
>
> In the end I'd rather go the way changing the GIMPLE IL to not
> embed the comparison in the GIMPLE_COND - that reduces
> the amount of redundant way we can express the same thing.
Will you try to handle the inverted comparison case mentioned
in my previous message? I guess this needs work in both sccvn and fre/pre.
It would be great to hear your thoughts on this.

Thanks

-- 
Best Regards.


Re: RFC: Handle conditional expression in sccvn/fre/pre

2012-01-02 Thread Amker.Cheng
On Mon, Jan 2, 2012 at 9:37 PM, Richard Guenther
 wrote:

> Well, with
>
> Index: gcc/tree-ssa-pre.c
> ===
> --- gcc/tree-ssa-pre.c  (revision 182784)
> +++ gcc/tree-ssa-pre.c  (working copy)
> @@ -4335,16 +4335,23 @@ eliminate (void)
>             available value-numbers.  */
>          else if (gimple_code (stmt) == GIMPLE_COND)
>            {
> -             tree op0 = gimple_cond_lhs (stmt);
> -             tree op1 = gimple_cond_rhs (stmt);
> +             tree op[2];
>              tree result;
> +             vn_nary_op_t nary;
>
> -             if (TREE_CODE (op0) == SSA_NAME)
> -               op0 = VN_INFO (op0)->valnum;
> -             if (TREE_CODE (op1) == SSA_NAME)
> -               op1 = VN_INFO (op1)->valnum;
> +             op[0] = gimple_cond_lhs (stmt);
> +             op[1] = gimple_cond_rhs (stmt);
> +             if (TREE_CODE (op[0]) == SSA_NAME)
> +               op[0] = VN_INFO (op[0])->valnum;
> +             if (TREE_CODE (op[1]) == SSA_NAME)
> +               op[1] = VN_INFO (op[1])->valnum;
>              result = fold_binary (gimple_cond_code (stmt), boolean_type_node,
> -                                   op0, op1);
> +                                   op[0], op[1]);
> +             if (!result)
> +               result = vn_nary_op_lookup_pieces (2, gimple_cond_code (stmt),
> +                                                  boolean_type_node,
> +                                                  op, &nary);
> +
>              if (result && TREE_CODE (result) == INTEGER_CST)
>                {
>                  if (integer_zerop (result))
> @@ -4354,6 +4361,13 @@ eliminate (void)
>                  update_stmt (stmt);
>                  todo = TODO_cleanup_cfg;
>                }
> +             else if (result && TREE_CODE (result) == SSA_NAME)
> +               {
> +                 gimple_cond_set_code (stmt, NE_EXPR);
> +                 gimple_cond_set_lhs (stmt, result);
> +                 gimple_cond_set_rhs (stmt, boolean_false_node);
> +                 update_stmt (stmt);
> +               }
>            }
>          /* Visit indirect calls and turn them into direct calls if
>             possible.  */
>
> you get the CSE (too simple patch, you need to check leaders properly).
> You can then add similar lookups for an inverted conditional.

Thanks for your explanation. One shortcoming of this method is that it
cannot find/take cond_expr (and the implicitly defined variable) as the
leader in pre. I guess this is why you said it can handle a subset of all
cases in your previous message?

On the other hand, I like this method, especially given its simplicity. :)

-- 
Best Regards.


Re: RFC: Handle conditional expression in sccvn/fre/pre

2012-01-03 Thread Amker.Cheng
On Mon, Jan 2, 2012 at 10:54 PM, Richard Guenther
 wrote:

> Yes.  It won't handle
>
>  if (x > 1)
>   ...
>  tem = x > 1;
>
> or
>
>  if (x > 1)
>   ...
>  if (x > 1)
>
> though maybe we could teach PRE to do the insertion by properly
> putting x > 1 into EXP_GEN in compute_avail (but not into AVAIL_OUT).
> Not sure about this though.  Currently we don't do anything to
> GIMPLE_COND operands (which seems odd anyway, we should
> at least add the operands to EXP_GEN).

I did an experiment which shows that by setting cond_expr in EXP_GEN
properly, PRE could insert the expression in the following case:

//necessary declaration of variable a/b/g
int tmp;
if (x_cond)
  tmp = a > 2;
else
  tmp = b;
if (a > 2)
  g = tmp;

But the problem you mention, "PRE separates conditional expression
and jump too far", still exists in this kind of case.
Now I doubt the benefit of making PRE handle cond_expr, because in the back
end, machines normally have only one flag to store the result.

And for other cases like:
if (a > 2)
...
if (a > 2)

The current insertion logic (in do_regular_insertion) simply won't
insert the expression before the first GIMPLE_COND statement, because it
only considers basic blocks that have multiple predecessors and
expressions that are partially redundant.
Anyway, I think this can be done by implementing a new insertion strategy
for GIMPLE_COND.

-- 
Best Regards.


question on inconsistent generated codes for builtin calls

2012-01-13 Thread Amker.Cheng
Hi,
I noticed gcc generates inconsistent code for the same builtin call.

Compile the following program:
--
#include <math.h>
int a(float x) {
  return sqrtf(x);
}
int b(float x) {
  return sqrtf(x);
}

With command:
arm-none-eabi-gcc -mthumb -mhard-float -mfpu=fpv4-sp-d16
-mcpu=cortex-m4 -O0 -S a.c -o a.S

The generated assembly code looks like:
--
a:
@ args = 0, pretend = 0, frame = 8
@ frame_needed = 1, uses_anonymous_args = 0
push    {r7, lr}
sub     sp, sp, #8
add     r7, sp, #0
fsts    s0, [r7, #4]
flds    s15, [r7, #4]
fsqrts  s15, s15
fcmps   s15, s15
fmstat
beq     .L2
flds    s0, [r7, #4]
bl      sqrtf
fcpys   s15, s0
.L2:
ftosizs s15, s15
fmrs    r3, s15 @ int
mov     r0, r3
add     r7, r7, #8
mov     sp, r7
pop     {r7, pc}
.size   a, .-a
.align  2
.global b
.thumb
.thumb_func
.type   b, %function
b:
@ args = 0, pretend = 0, frame = 8
@ frame_needed = 1, uses_anonymous_args = 0
push    {r7, lr}
sub     sp, sp, #8
add     r7, sp, #0
fsts    s0, [r7, #4]
flds    s0, [r7, #4]
bl      sqrtf
fcpys   s15, s0
ftosizs s15, s15
fmrs    r3, s15 @ int
mov     r0, r3
add     r7, r7, #8
mov     sp, r7
pop     {r7, pc}
.size   b, .-b


The cause is that in function expand_builtin, gcc checks the following conditions:
--
  /* When not optimizing, generate calls to library functions for a certain
     set of builtins.  */
  if (!optimize
      && !called_as_built_in (fndecl)
      && DECL_ASSEMBLER_NAME_SET_P (fndecl)
      && fcode != BUILT_IN_ALLOCA
      && fcode != BUILT_IN_ALLOCA_WITH_ALIGN
      && fcode != BUILT_IN_FREE)
    return expand_call (exp, target, ignore);

The control flow is:
1, DECL_ASSEMBLER_NAME_SET_P (fndecl) is false the first time, when
compiling a;
2, it is then set by later code when expanding the sqrtf call in function a;
3, when compiling function b, gcc checks DECL_ASSEMBLER_NAME_SET_P (fndecl)
 again, and this time it's true;

I am a little confused about why we check DECL_ASSEMBLER_NAME_SET_P here.
Does it have a special meaning?
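
The order dependence can be modelled with a few lines of C. This is a toy, not the expand_builtin code: struct toy_decl, toy_expand_builtin and the enum are hypothetical, and the other conditions of the real check (called_as_built_in and the fcode tests) are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a function declaration; only the lazily
   set flag matters here. */
struct toy_decl {
    bool asm_name_set;
};

enum expansion { EXPAND_INLINE, EXPAND_LIBCALL };

enum expansion toy_expand_builtin(struct toy_decl *fndecl, bool optimize)
{
    /* Early exit modelled on the check quoted above. */
    if (!optimize && fndecl->asm_name_set)
        return EXPAND_LIBCALL;      /* plain call, as in function b */

    /* Side effect of the first expansion: the name gets set. */
    fndecl->asm_name_set = true;
    return EXPAND_INLINE;           /* fsqrts plus fallback, as in function a */
}
```

Calling toy_expand_builtin twice on the same declaration at -O0 yields EXPAND_INLINE first and EXPAND_LIBCALL second, which mirrors the different code for a and b.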

Thanks in advance.

-- 
Best Regards.


Re: question on inconsistent generated codes for builtin calls

2012-01-13 Thread Amker.Cheng
On Fri, Jan 13, 2012 at 5:33 PM, Richard Guenther
 wrote:
>
> No, I think the check is superfluous and should be removed.  I also wonder
> why we exempt BUILT_IN_FREE here ... can you dig in SVN history a bit?
> For both things?

Thanks for clarifying. I will look into it.

-- 
Best Regards.


Re: question on inconsistent generated codes for builtin calls

2012-01-15 Thread Amker.Cheng
On Fri, Jan 13, 2012 at 10:17 PM, Amker.Cheng  wrote:
> On Fri, Jan 13, 2012 at 5:33 PM, Richard Guenther
>  wrote:
>>
>> No, I think the check is superfluous and should be removed.  I also wonder
>> why we exempt BUILT_IN_FREE here ... can you dig in SVN history a bit?
>> For both things?

Hi Richard,
The BUILT_IN_FREE exemption was introduced in r138362 to fix PR36970,
in which gcc gave no warning when freeing non-heap memory, as in this program:

main ()
{
 char array[100];
 free (array);
}

I will run make check to see whether it is OK to drop the
DECL_ASSEMBLER_NAME_SET_P check, and then send a patch.

BTW, should I create a bug for this?

Thanks.


question on bitmap_set_subtract function in pre

2012-02-05 Thread Amker.Cheng
Hi,
In PRE, function compute_antic_aux uses bitmap_set_subtract to compute
value/expression set subtraction.

The comment of bitmap_set_subtract says it subtracts all the values
and expressions contained in ORIG from DEST.

But the implementation is as follows:
---
static bitmap_set_t
bitmap_set_subtract (bitmap_set_t dest, bitmap_set_t orig)
{
  bitmap_set_t result = bitmap_set_new ();
  bitmap_iterator bi;
  unsigned int i;

  bitmap_and_compl (&result->expressions, &dest->expressions,
&orig->expressions);

  FOR_EACH_EXPR_ID_IN_SET (result, i, bi)
{
  pre_expr expr = expression_for_id (i);
  unsigned int value_id = get_expr_value_id (expr);
  bitmap_set_bit (&result->values, value_id);
}

  return result;
}

Doesn't it subtract only the expressions, rather than the values? It then
rebuilds the values from the resulting expressions.
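
To make the behaviour concrete, here is a minimal sketch with 32-bit masks standing in for the bitmaps. It is not the PRE code: struct toy_set, toy_set_subtract and the value_of table are names invented for this illustration. With two expressions a (id 0) and b (id 1) that share value 0, subtracting {a}{0} from {a, b}{0} leaves {b}{0}, whereas subtracting the value masks directly would leave b without its value.

```c
#include <stdint.h>

/* Toy set: one bit per expression id and one bit per value id. */
struct toy_set {
    uint32_t expressions;
    uint32_t values;
};

/* value_of[i] is the value id (0..31) of expression i; hypothetical table. */
struct toy_set toy_set_subtract(struct toy_set dest, struct toy_set orig,
                                const unsigned value_of[32])
{
    struct toy_set result = { 0, 0 };

    /* Like bitmap_and_compl: keep expressions in dest but not in orig. */
    result.expressions = dest.expressions & ~orig.expressions;

    /* Rebuild the value bits from the surviving expressions. */
    for (unsigned i = 0; i < 32; i++)
        if (result.expressions & (UINT32_C(1) << i))
            result.values |= UINT32_C(1) << value_of[i];

    return result;
}
```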

I am a little confused here. Any explanation?

Thanks very much.
-- 
Best Regards.


Re: question on bitmap_set_subtract function in pre

2012-02-07 Thread Amker.Cheng
On Mon, Feb 6, 2012 at 7:28 PM, Richard Guenther
 wrote:
> It's probably to have the SET in some canonical form - the resulting
I am wondering how the canonical form is maintained, since according
to the paper:
For an antileader set, it does not matter which expression represents
a value, as long as that value is live.
Could you show me where the code that maintains this property is?

> values are simply re-computed from the expression subtraction
> (multiple expressions may have the same value, so in
> { a, b } { 0 } - { a } { 0 } you need to either compute { } { } or { b } { 0 }
> neither which you can reach by simply subtracting both bitmaps.
Taking this example, shouldn't the expected result be:
   {b}{0} if a is defined by some known expr;
   {} {} if a is defined by some unknown expr;
which is not what gcc does now. The following words are from the paper:

For an antileader set, it does not matter which expression represents a value,
  so long as that value is live. A temporary potentially in ANTIC_IN becomes
  dead if it is assigned to. If the assignment is from something we can make an
  expression for (as opposed to ?), that expression replaces the temporary as
  the antileader. If the assignment is from ?, then the value is no longer
  represented at all. Furthermore, any other expression that has that (no
  longer represented) value as an operand also becomes dead.

In the expression subtraction above, I don't see how a value that depends
on a tmp defined by an unknown operation, like tmp<-?, is handled.

I am still confused, and most likely I have missed something important.
Please help, thanks very much.

-- 
Best Regards.


Question about the difference between two instruction scheduling passes

2009-08-19 Thread Amker.Cheng
Hi all:
   I'm currently studying implementation of instruction sched in gcc.

it is possible to schedule insns directly from queue in case
there is nothing better to do and there are still vacant dispatch slots
in the current cycle.

Gcc only does this work in the second pass, but what's the point?
Is it wrong or just not necessary  in the first sched pass?

Thanks.
-- 
Best Regards.


Is Non-Blocking cache supported in GCC?

2009-09-17 Thread Amker.Cheng
Hi all:
Recently I found two relatively old papers about non-blocking caches
and related topics, which are:

1) Reducing memory latency via non-blocking and prefetching
caches.  BY Tien-Fu Chen and Jean-Loup Baer.
2) Data Prefetching:A Cost/Performance Analysis   BY Chris Metcalf

It seems this hardware facility does have the potential to improve
performance with the compiler's assistance (especially instruction
scheduling). On the other hand, hoisting load instructions earlier
may increase register pressure.

So I'm wondering:
1, Has anyone among the gcc folks investigated this topic yet, or is
any statistical data based on gcc available?
2, Does GCC (in any release version) support it on any target (such as
mips 24ke) with this hardware feature?
If not currently, is it possible to support it by using target
definition macros and functions?

Any tips will be highly appreciated, thanks.
-- 
Best Regards.


Re: Is Non-Blocking cache supported in GCC?

2009-09-18 Thread Amker.Cheng
On Sat, Sep 19, 2009 at 1:17 AM, Janis Johnson  wrote:
> On Thu, 2009-09-17 at 21:48 -0700, Ian Lance Taylor wrote:
>
> There's also a prefetch built-in function; see
>
> http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#Other-Builtins
>
> It's been in GCC since 3.1.
>
> Janis
>
>
Thank you all, It seems prefetch is more useful than non-blocking, no wonder gcc
takes advantage of prefetch, rather than non-blocking.

-- 
Best Regards.


question about speculative scheduling in gcc

2009-09-19 Thread Amker.Cheng
Hi :
I'm puzzled when looking into speculative scheduling in gcc, the 4.2.4 version.

First, I noticed the document describing the IBM haifa instruction
scheduler (as PowerPC Reference Compiler Optimization Project).

It states that moving an instruction from bb s (dominated by t)
to t is speculative when split_blocks(s, t) is not empty.

Second, there are SCHED_FLAGS such as DO_SPECULATION in the code.

Here are my questions.
1, Does the DO_SPECULATION flag control whether the
speculative motion mentioned above is done or not?
2, For the mips target, which has the DO_SPECULATION bit cleared,
gcc still does speculative motion when scheduling (first pass),
so it seems the answer to question 1 is negative, but then
what is the DO_SPECULATION flag for?

I must have missed something important, Please help out.
Thanks

-- 
Best Regards.


Re: question about speculative scheduling in gcc

2009-09-20 Thread Amker.Cheng
On Sun, Sep 20, 2009 at 3:43 PM, Maxim Kuvyrkov  wrote:
> Amker.Cheng wrote:
>>
>> Hi :
>> I'm puzzled when looking into speculative scheduling in gcc, the 4.2.4
>> version.
>>
>> First, I noticed the document describing IBM haifa instruction
>> scheduler(as PowerPC Reference Compiler Optimization Project).
>>
>> It presents that the instruction motion from bb s(dominated by t)
>> to t is speculative when split_blocks(s, t) not empty.
>>
>> Second, There is SCED_FLAGS like DO_SPECULATION in codes.
>
> These are two different types of speculative optimizations.
>
>>
>> Here goes questions.
>> 1, Does the DO_SPECULATION flag constrol whether do the
>>    mentioned speculative motion or not?
>
> DO_SPECULATION flag controls generation of IA64 data and control speculative
> instructions.  It is not used on other architectures.
>
> Speculative instruction moves from the split blocks are controlled by
> flag_schedule_speculative.
>
> --
> Maxim
>

Yes! I've just found it's used for IA64 and was merged into gcc in
version 4.2.0.

Thanks.


-- 
Best Regards.


what does the calling for min_insn_conflict_delay mean

2009-09-20 Thread Amker.Cheng
Hi :
   In function new_ready, min_insn_conflict_delay is called as
"min_insn_conflict_delay (curr_state, next, next)".
But the function's comment says that it returns the minimal delay of issue of
the 2nd insn after issuing the 1st in the given state.
Why are the last two parameters of the call both "next"?
This seems to conflict with the comment.

Thanks.

-- 
Best Regards.


Re: what does the calling for min_insn_conflict_delay mean

2009-09-23 Thread Amker.Cheng
On Tue, Sep 22, 2009 at 11:50 PM, Vladimir Makarov  wrote:
> Ian Lance Taylor wrote:
>>
>> "Amker.Cheng"  writes:
>>
>>
>>>
>>>   In function new_ready, it calls to min_insn_conflict_delay with
>>> "min_insn_conflict_delay (curr_state, next, next)".
>>> But the function's comments say that it returns minimal delay of issue of
>>> the 2nd insn after issuing the 1st in given state.
>>> Why the last two parameter for the call are both "next"?
>>> seems conflict with the comments.
>>>
>>
>>
>
> Amker, thanks for finding this issue.
It's a great pleasure if I can help with anything.

>>
>> This change dates back to the first DFA scheduler patch.  It does seem a
>> little odd, particularly as the call in new_ready is the only use of
>> min_insn_conflict_delay.  CC'ing vmakarov in case he remembers anything
>> about this old code.
>>
>
> I've not remembered this.  I guess  it was a result of long period of
> transition from the old pipeline hazard recognizier to the DFA one which
> required to rewrite all old pipeline descriptions.
>
> Also after starring at this code for some time,  I don't like this code.
>  Now I'd use min_issue_delay (curr_state, next) which is delay of  issuing
>  next in the current function unit reservation state instead of
>  min_insn_conflict_delay (curr_state, next, next) which is a delay of
> issuing the first insn (next) after issuing the second insn (next) on a free
> processor (when all function units are free).  Probably it was a typo.
>  Although I think that such change (in many other conditions to move insn
> speculatively to the ready list) will not give a visible improvement for
> most processors, I'll try it.
>
> It looks to me that probably I had also some plans for usage of
> min_insn_conflict_delay, but I forgot them because it was long ago.
>
>

So is it the delay of issuing next in the current reservation state that is
expected here?

It seems the call to min_insn_conflict_delay does no harm, except that it
may result in more or fewer speculative motions (which are all valid ones).

-- 
Best Regards.


Problem when computing memory dependencies for scheduling pass1

2009-09-28 Thread Amker.Cheng
Hi all:
   I have found something strange when scheduling instructions.
considering following piece of code:
-c start
int func(float x)
{
  int r = 0;
  r = (*(unsigned int*)&x) >> 23;
  return r;
}
-c end

the return value is different when compiling with or without optimization.
Have tested on 4.2.4 and 4.3.3 on mips, 4.3.2(ubuntu) on x86 and
results are the same.
Is this a bug, or something wrong with the example code?

following is output for mips target, hope it can help.

commands:
mipsel-elf-gcc -march=mips1 -EL -G0 -mabi=32 -S test/dummy.c -o
test/dummy.S -fdump-rtl-all -fsched-verbose=9 -v -O2


the as output is :
-as start
.section .mdebug.abi32
.previous
.text
.align  2
.globl  func
.ent    func
func:
.frame  $sp,0,$31   # vars= 0, regs= 0/0, args= 0, gp= 0
.mask   0x00000000,0
.fmask  0x00000000,0
.set    noreorder
.set    nomacro

lw  $2,0($sp)
sw  $4,0($sp)
j   $31
srl $2,$2,23

.set    macro
.set    reorder
.end    func
.size   func, .-func
.ident  "GCC: (GNU) 4.2.4"
--as end
It seems the load insn is scheduled before the store, which results in
the use of uninitialized data.

following is the dumped rtls before and after sched1:

before sched1:
---

(note 8 2 6 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(insn 6 8 7 2 (set (mem/c/i:SF (reg/f:SI 77 $arg) [3 x+0 S4 A32])
(reg:SF 4 $4 [ x ])) 233 {*movsf_softfloat} (nil)
(expr_list:REG_DEAD (reg:SF 4 $4 [ x ])
(nil)))

(note 7 6 12 2 NOTE_INSN_FUNCTION_BEG)

(insn 12 7 13 2 (set (reg:SI 196)
(mem:SI (reg/f:SI 77 $arg) [4 S4 A32])) 213 {*movsi_internal} (nil)
(nil))
-

after sched1:
---
(insn 12 17 6 2 (set (reg:SI 196)
(mem:SI (reg/f:SI 77 $arg) [4 S4 A32])) 213 {*movsi_internal} (nil)
(nil))

(insn 6 12 20 2 (set (mem/c/i:SF (reg/f:SI 77 $arg) [3 x+0 S4 A32])
(reg:SF 4 $4 [ x ])) 233 {*movsf_softfloat} (nil)
(expr_list:REG_DEAD (reg:SF 4 $4 [ x ])
(nil)))
-

I have checked gcc's source. It seems the two mem operands in insn 12 and insn 6
are put into two different alias sets, so maybe this problem has
something to do with
the forced type cast?
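
If the different alias sets are indeed the cause, one way to read the bits of x without the pointer cast is memcpy. A minimal sketch, assuming float is a 32-bit IEEE 754 type; func_memcpy is a name made up here and is not from the original test case:

```c
#include <stdint.h>
#include <string.h>

/* Same computation as func, but the float's bytes are copied into an
   integer instead of being read through an unsigned int pointer. */
int func_memcpy(float x)
{
    uint32_t bits;

    memcpy(&bits, &x, sizeof bits);   /* assumes sizeof(float) == 4 */
    return (int)(bits >> 23);         /* sign bit plus 8 exponent bits */
}
```

With this form the store to x and the read of its bytes are no longer accesses through two unrelated types, so the scheduler has no licence to reorder them.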

The result is very strange, so please help and any comments will be
highly appreciated.

-- 
Best Regards.


Re: Problem when computing memory dependencies for scheduling pass1

2009-09-28 Thread Amker.Cheng
Thanks Eric Fisher, got the answer, Please ignore this message.

-- 
Best Regards.


Puzzles about implementation of bb-reorder pass

2009-10-28 Thread Amker.Cheng
Hi :
The bb-reorder pass is relatively simple compared with others, but I still
have the following puzzles.
1 : the comment at top of the bb-reorder.c file says that :

   There are two parameters: Branch Threshold and Exec Threshold.
   If the edge to a successor of the actual basic block is lower than
   Branch Threshold or the frequency of the successor is lower than
   Exec Threshold the successor will be the seed in one of the next rounds.

  but when computing which_heap in function "find_traces_1_round", it uses
  push_to_next_round_p to decide whether the successor should go to next round,
  which takes only exec_th as argument, not branch_th.

  Is this  inconsistent ?

2 : when checking for situation :

     A
    /|
   B |
    \|
     C
 gcc uses the condition
EDGE_FREQUENCY (AB) + EDGE_FREQUENCY (BC)
>= EDGE_FREQUENCY (AC).
 What does "EDGE_FREQUENCY (AB) + EDGE_FREQUENCY (BC)"
 stand for? Since block B is dominated by A and C is its only successor,
 the frequency of path(ABC) is less than that of path(AC), I think.

3 : It is possible to merge two traces by copying exactly one basic block.
 gcc uses the following code to take a trace which has only one bb into
 consideration:

if (bbd[e->dest->index].start_of_trace >= 0
    && traces[bbd[e->dest->index].start_of_trace].length
       == 1)
  {
    best = e;
    try_copy = true;
    continue;
  }
Here is the question: what if that trace has already been merged and has
no free successor traces (traces whose start bb is a successor of
the single bb)?
In this situation next_bb is NULL and all we do is copy an
already merged bb.
Is this right?
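
To put numbers on question 2, here is a small sketch. It assumes an edge frequency is the source block's frequency scaled by the edge probability, which is my reading of the EDGE_FREQUENCY macro; PROB_BASE and edge_frequency are illustrative names, and PROB_BASE is assumed to mirror REG_BR_PROB_BASE. With A executed 1000 times and a 60% chance of taking A->B, the model gives AB = 600, AC = 400 and BC = 600, so AB + BC = 1200.

```c
/* Simplified model of an edge frequency: frequency of the source
   block scaled by the branch probability, rounded to nearest. */
#define PROB_BASE 10000   /* assumed to mirror REG_BR_PROB_BASE */

int edge_frequency(int src_frequency, int probability)
{
    return (src_frequency * probability + PROB_BASE / 2) / PROB_BASE;
}
```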

Please correct me and help me out, Thanks.

-- 
Best Regards.


mis-set value for trial in function fill_simple_delay_slots?

2009-11-22 Thread Amker.Cheng
Hi :

In function fill_simple_delay_slots, there is the following code:
>starts here
  /* If there are slots left to fill and our search was stopped by an
 unconditional branch, try the insn at the branch target.  We can
 redirect the branch if it works.

 Don't do this if the insn at the branch target is a branch.  */
  if (slots_to_fill != slots_filled
  && trial
  && JUMP_P (trial)
  && simplejump_p (trial)
  && (target == 0 || JUMP_LABEL (trial) == target)
  && ...)


Question about filling multi delay slots

2009-11-25 Thread Amker.Cheng
Hi All :
   It's possible to define multiple delay slots for branch insns by using
define_delay, and each slot must satisfy its own attribute test "delay-n".

   Here comes the question: function "fill_simple_delay_slots" seems to
use only slots_filled to record how many slots have been filled, and puts
the slot insns already found in delay_list. I can't find any code that
keeps track of which insn in delay_list belongs to which slot (defined in
"define_delay"). So, how does gcc make sure that the insns in delay_list
go into the right delay slots?

Thanks in advance.
-- 
Best Regards.


Re: Question about filling multi delay slots

2009-12-01 Thread Amker.Cheng
On Tue, Dec 1, 2009 at 5:31 AM, Jeff Law  wrote:
> On 11/25/09 07:34, Amker.Cheng wrote:

>
> First, it's worth noting very few targets support multiple delay slots and
> as a result that code isn't tested nearly as well as handling of single
> delay slots.
>
> I'm pretty sure we assume that the first insn we add to the delay list
> always goes in the first slot, 2nd insn in the 2nd slot and so-on.
>
> Jeff
>
>
>
Thanks for the explanation, I will take a closer look at this code.


-- 
Best Regards.


question about replace_in_call_usage in regmove.c

2010-01-01 Thread Amker.Cheng
Hi :
  In regmove.c there is a function "replace_in_call_usage", called in
fixup_match_1.
It replaces the dst register by src in a call_insn. I doubt whether it is
necessary, since the comment on CALL_INSN_FUNCTION_USAGE says that no pseudo
register can appear in it, and src seems to be a pseudo register.

Furthermore, no replacement (dst->src) is done when bootstrapping
gcc-4.2.4, which confirms my understanding.

Is this right, or have I missed something important? Please help.

Thanks in advance.

-- 
Best Regards.


Puzzle about mips pipeline description

2010-03-08 Thread Amker.Cheng
Hi All:
  In the gcc internals manual, section 16.19.8, there is a rule about
"define_insn_reservation":
"`condition` defines what RTL insns are described by this
construction. You should remember that you will be in trouble if
`condition` for two or more different `define_insn_reservation`
constructors is TRUE for an insn".

  In mips.md, however, the pipeline description for each processor is
included along with generic.md, which provides a fallback for processors
without a specific pipeline description.

  Here is the PUZZLE: Won't the `define_insn_reservation` constructors
from both the specific processor's and the generic md file break the rule
mentioned above? For example, it seems the conditions for r3k_load (from
3000.md) and generic_load (from generic.md) are both TRUE for the lw insn.

Furthermore,
the md files for specific processors say that their descriptions are
supposed to override parts of the generic md file, but I don't know how
that works without reading the code in genautomata.c.

Please help me out, Thanks very much.

-- 
Best Regards.


Question on mips multiply patterns in md file

2010-03-15 Thread Amker.Cheng
Hi :
  I am studying the multiply-accumulate patterns for mips
 and noticed there were some changes when IRA was merged.

  There are two patterns which confuse me:

1:  In pattern "*mul_acc_si", there's a constraint like "*?*?".
What is this supposed to do?
I could not connect "*?" with the documentation on constraints
 in the gcc internals manual, and have no idea what it means.


2:  there is a split pattern for "*mul_acc_si" as following:

(define_split
  [(set (match_operand:SI 0 "d_operand")
(plus:SI (mult:SI (match_operand:SI 1 "d_operand")
  (match_operand:SI 2 "d_operand"))
 (match_operand:SI 3 "d_operand")))
   (clobber (match_operand:SI 4 "lo_operand"))
   (clobber (match_operand:SI 5 "d_operand"))]
  "reload_completed"
  [(parallel [(set (match_dup 5)
   (mult:SI (match_dup 1) (match_dup 2)))
  (clobber (match_dup 4))])
   (set (match_dup 0) (plus:SI (match_dup 5) (match_dup 3)))]
  "")

this will generate integer multiply instruction with register write,
but what if the processor has only integer multiply instructions,
which only store results in HILO?

So, any tips? Thanks a lot.

-- 
Best Regards.


Re: Question on mips multiply patterns in md file

2010-03-16 Thread Amker.Cheng
> If you don't know anything about register class preferencing or reload as
> yet, then this is probably not going to make much sense to you, but it isn't
> anything important you need to worry about at this point.  It is a very
> minor performance optimization.
>
It makes sense to me now, though I haven't read codes for IRA and reloads yet.
Thanks for the detailed explanation.
>
> A define_split can only match something generated by a define_insn, and the
> mul_acc_si define_insn is testing "GENERATE_MADD_MSUB && !TARGET_MIPS16"
> so there is no serious problem.  We are just running a define_split that can
> never match anything.  This could be cleaned up a little by adding an
> appropriate condition to the define_split, or by combining the define_insn
> and define_split patterns into a define_insn_and_split pattern.

In other words, you mean that the define_split would only get a chance to
split insns generated by the corresponding pattern
"define_insn \"*mul_acc_si\"", even though the split condition is
rather weak (only "reload_completed"), because that kind of insn would only
be generated by the "define_insn \"*mul_acc_si\"" pattern.
Did I get it right? If so, I'm afraid this is actually not my question.

What I want to know is:
mips processors normally implement the following kinds of mult/mult-acc insns:
mult: HILO <-- s * t
mul : HILO <-- s * t ; d <-- LO
madd  : HILO <-- HILO + s * t
madd2: HILO <-- HILO + s * t ;  d <-- HILO
cut here-
In my understanding, the macro GENERATE_MADD_MSUB is true when the processor has
the madd insn, rather than madd2. And the macro "ISA_HAS_MUL3" is false if it has
no mul insn.
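
The four instruction kinds listed above can be modelled in C as follows. This is a rough sketch rather than a description of any particular core: HI/LO are treated as one 64-bit accumulator, the multiplies are signed, all type and function names are invented here, and for madd2 the 32-bit destination d is assumed to receive the LO half.

```c
#include <stdint.h>

/* HI/LO register pair viewed as one 64-bit accumulator (wraps on overflow). */
typedef struct { uint64_t hilo; } mips_acc;

uint32_t lo_of(const mips_acc *acc) { return (uint32_t)acc->hilo; }
uint32_t hi_of(const mips_acc *acc) { return (uint32_t)(acc->hilo >> 32); }

/* mult: HILO <-- s * t */
void model_mult(mips_acc *acc, int32_t s, int32_t t)
{
    acc->hilo = (uint64_t)((int64_t)s * t);
}

/* mul: HILO <-- s * t ; d <-- LO */
uint32_t model_mul(mips_acc *acc, int32_t s, int32_t t)
{
    model_mult(acc, s, t);
    return lo_of(acc);
}

/* madd: HILO <-- HILO + s * t */
void model_madd(mips_acc *acc, int32_t s, int32_t t)
{
    acc->hilo += (uint64_t)((int64_t)s * t);
}

/* madd2: HILO <-- HILO + s * t ; d <-- low half (assumption, see above) */
uint32_t model_madd2(mips_acc *acc, int32_t s, int32_t t)
{
    model_madd(acc, s, t);
    return lo_of(acc);
}
```

The point relevant to the question is that model_mult and model_madd leave their result only in the accumulator, so a general register result needs an extra move from LO.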

For this kind of processor, gcc will
step 1 : generate an insn using gen_mul3_internal, according to
pattern "mul3";
step 2 : the combiner tries to combine by matching against pattern "*mul_acc_si";
step 3 : it's possible that gcc fails to get the LO register allocated for
the combined "*mul_acc_si" insn;
step 4 : after reload, the combined insn will be split according to
the split pattern listed in the previous mail;
step 5 : the split insn is actually a "mul3_internal", but gets
no LO allocated, which breaks the
constraints in the "mul3_internal" pattern.

So, what should I do to handle this case? I see no way except
adding a new split pattern like:

(define_split
 [(set (match_operand:SI 0 "d_operand")
   (plus:SI (mult:SI (match_operand:SI 1 "d_operand")
 (match_operand:SI 2 "d_operand"))
(match_operand:SI 3 "d_operand")))
  (clobber (match_operand:SI 4 "lo_operand"))
  (clobber (match_operand:SI 5 "d_operand"))]
 "SPECIAL_PROCESSOR && reload_completed"
 [(parallel [(set (match_dup 4)
  (mult:SI (match_dup 1) (match_dup 2)))
 (clobber (match_dup 4))])
  (set (match_dup 5) (match_dup 4))
  (set (match_dup 0) (plus:SI (match_dup 5) (match_dup 3)))]
 "")

Thanks again, looking forward to your further explanation.


-- 
Best Regards.


Re: Question on mips multiply patterns in md file

2010-03-18 Thread Amker.Cheng
> The reasoning here is
> that if splitting will result in worse code, then we shouldn't have
> accepted it in the first place.  If dropping this alternative results in
> register allocator failures for some strange reason, then we accept it
> and generate the 3 instruction sequence with a new define_split.

Thanks Jim.

I could not follow your method well since I don't know much about
the IRA and reload passes. Here comes the question:
Is it possible that the method would ever result in a register
allocator failure?
In my understanding, wouldn't the reload pass do whatever it can to make
all insns' constraints satisfied?


> If dropping this alternative results in the register allocator generating
> worse code for other surrounding operations, then it is better to accept
> it and add the new define_split.

By this, you mean I should go back to the define_split method if dropping
the alternative does result in bad insns generated by RA?


>
> Some experimentation might be necessary to see which change is the
> better solution.
Yes, I profiled MiBench and found gcc generates better code by using
the madd instruction; on the other hand, how bad the code generated by
define_split is has still not been closely checked.

Another thought on this topic: maybe I should make two copies of mul_acc_si
like you said, one keeping the constraints, the other dropping the "*?*?".
Is this the same as the method you suggested?


-- 
Best Regards.


Puzzle about CFG on rtl during delay slot schedule

2010-04-02 Thread Amker.Cheng
Hi :
   I'm wondering whether the cfg is maintained properly during delay slot
scheduling,
because when compiling libgcc/_divsc3.o, the rtl dump in
libgcc2.c.198r.mach has the following lines:

no bb for insn with uid = 293.
deleting insn with uid = 690.
deleting insn with uid = 904.
..

(note 298 905 303 [bb 25] NOTE_INSN_BASIC_BLOCK)

(note 303 298 304 [bb 26] NOTE_INSN_BASIC_BLOCK)
-cut here

After that pass, bb 25 still has il.rtl->head_ == insn_uid_690, which
has already been deleted.
It seems the bb's head_/tail_ are not handled properly.

I traced cc1 and found it deleted insn_690 through function remove_insn.
It seems that at its end the function takes BB_HEAD/BB_END into
consideration, but BLOCK_FOR_INSN(insn_690) is null, which results in
the problem.

BTW, the version I am working on is gcc-4.4.1, mips target.
So, any tips? Thanks very much.

-- 
Best Regards.


Fwd: Puzzle about CFG on rtl during delay slot schedule

2010-04-02 Thread Amker.Cheng
> The CFG is not maintained during delay slot scheduling. This is, in
> fact, a very old and well-known problem. Look for any e-mail on this
> list that mentions reorg.c :-)
>
Thanks. Furthermore, it seems the cfg is not maintained after delay
slot scheduling either.
I also found that problem just before the final pass.


-- 
Best Regards.


Re: Puzzle about CFG on rtl during delay slot schedule

2010-04-03 Thread Amker.Cheng
> Cheng, can you explain what lead you to this "discovery", and what
> you're trying to achieve?

Thanks for all your enthusiastic explanations.
Well, we are now trying to find our processor's critical timing path
by running it at a higher frequency than it was designed for.
One timing problem we found is in the following insn sequence:
insn1 : insn_kind_a
insn2 : memory access

So, in order to find more timing problems, we want to modify gcc
to insert a nop insn between those two insns.
Unfortunately, insn1 could be in a delay slot, so I have to do that
job after delay slot scheduling, which led to the first
message.

BTW, the processor has no pipeline stall when branching,
so I think the nop is really necessary for our purposes.

Thanks again.

-- 
Best Regards.


Problem on handling fall-through edge in bb-reorder

2010-04-05 Thread Amker.Cheng
Hi All:
  I read codes in bb-reorder pass. normally it's fine to take the most
probable basic block as the downward bb.
unfortunately, the processor I'm working on is a little different.
It has no pipeline stall when branches are taken, but does introduce
stall when they are not taken.

take an example code like:
--
statement 0;
if likely(condition)
statement 1;
else
statement 2;

return;

gcc may generate :
---
  statement 0;
  if !(condition) branch to label x;
  statement 1;
  return;
label x:
  statement 2;
  return;

This is less efficient on my processor. I am wondering whether it is possible
to modify the code in bb-reorder to make gcc take the less probable basic
block as the downward bb.
So any tips? Thanks in advance.
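
For anyone who wants to try this, the example can be written as compilable C roughly as follows. This is only a sketch: likely() is a local macro built on GCC's __builtin_expect, and example and result are names invented here to stand in for the three statements.

```c
/* GCC's branch hint; likely() here is a local macro, not a standard one. */
#define likely(x) __builtin_expect(!!(x), 1)

int example(int condition)
{
    int result = 0;        /* statement 0 */

    if (likely(condition))
        result += 1;       /* statement 1: the expected path */
    else
        result += 2;       /* statement 2: the unlikely path */

    return result;
}
```

Compiling this with -O2 -S makes it possible to check which statement ends up on the fall-through path; the exact layout depends on the target and the gcc version, so no expected assembly is shown here.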

-- 
Best Regards.


why mult generated for unsigned int multiply on mips?

2010-04-06 Thread Amker.Cheng
Hi :
  I noticed that on mips, the signed form instruction of multiply is
generated for
unsigned integer multiply operation.
for example, mult is used, rather than multu for following codes:

unsigned int x, y, z;
x = y * z;

Is it reasonable to do so? Thanks.
-- 
Best Regards.


Re: why mult generated for unsigned int multiply on mips?

2010-04-07 Thread Amker.Cheng
found the cause, sorry to disturb, please ignore this message.

-- 
Best Regards.


Re: why mult generated for unsigned int multiply on mips?

2010-04-08 Thread Amker.Cheng
> It would, however, be nice if you actually posted an answer to your
> (now solved) question. That way, any casual reader may learn something
> new.
>
Sorry for the unintentional offense, here comes the method:
for 2's complement binary number x31x30...x0,
unsigned value U = 2^(31)*x31 + 2^(30)*x30 + ... + 2^(0)*x0
signed value S = - 2^(31)*x31 + 2^(30)*x30 + ... + 2^(0)*x0
say V =  2^(30)*x30 + ... + 2^(0)*x0, and s = x31
so,  S = U - 2^(32)*s.

Now consider two numbers U1, U2, whose corresponding signed values are S1, S2.
S1 * S2 = (U1-2^32 *s1 ) * (U2-2^32 *s2)
 =  U1*U2 - 2^32*s2*U1 - 2^32*s1*U2 + 2^64*s1*s2
Every term except U1*U2 is a multiple of 2^32, so the lower 32 bits of
S1*S2 are the same as the lower 32 bits of U1*U2.

Maybe this is the reason gcc can safely use mult for unsigned
multiplication on mips.
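
The argument can also be checked mechanically. The sketch below follows the derivation step by step: signed_value computes S = U - 2^32*s, and the two helpers return the low 32 bits of the signed and the unsigned 64-bit products. All function names are made up for this illustration.

```c
#include <stdint.h>

/* S = U - 2^32 * s, where s is bit 31 of U (the sign bit). */
int64_t signed_value(uint32_t u)
{
    int64_t s = (u >> 31) & 1;

    return (int64_t)u - s * ((int64_t)1 << 32);
}

/* Low 32 bits of the signed product S1 * S2 (what mult leaves in LO). */
uint32_t low32_of_signed_product(uint32_t u1, uint32_t u2)
{
    int64_t p = signed_value(u1) * signed_value(u2);

    return (uint32_t)(uint64_t)p;
}

/* Low 32 bits of the unsigned product U1 * U2 (what multu leaves in LO). */
uint32_t low32_of_unsigned_product(uint32_t u1, uint32_t u2)
{
    uint64_t p = (uint64_t)u1 * (uint64_t)u2;

    return (uint32_t)p;
}
```

For any pair of inputs the two helpers return the same value; only the upper 32 bits of the products can differ, and a 32-bit multiplication such as x = y * z never looks at them.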

Hope this is right and it's hard to edit equations in plain text -_-

-- 
Best Regards.


Puzzle:where does gcc_cv_as come from?

2009-03-02 Thread Amker.Cheng
Hi all:
  Currently I'm building a cross gcc for mips32 on winXp+cygwin.
I tried both gcc 4.2.4 and 4.2.3 and there is a build problem with 4.2.4.

The gcc makefile normally issues the shell command "echo 'exec
$(ORIGINAL_AS_FOR_TARGET) "$$@"' >> as ; \"
at around line 1370, but ORIGINAL_AS_FOR_TARGET, defined several lines
above, is empty.
So I got something like "exec -options..." where it should be "exec
assembler -options...",
and of course this fails.

I checked configure and found following codes and comments:


# ---
# Assembler & linker features
# ---

# Identify the assembler which will work hand-in-glove with the newly
# built GCC, so that we can examine its features.  This is the assembler
# which will be driven by the driver program.
#
# If build != host, and we aren't building gas in-tree, we identify a
# build->target assembler and hope that it will have the same features
# as the host->target assembler we'll be using.
gcc_cv_gas_major_version=
gcc_cv_gas_minor_version=
gcc_cv_as_gas_srcdir=`echo $srcdir | sed -e 's,/gcc$,,'`/gas


if test "${gcc_cv_as+set}" = set; then
  :
else
  #other commands...
fi

ORIGINAL_AS_FOR_TARGET=$gcc_cv_as


Help: does define_peephole still work in gcc-4.2.4

2009-05-11 Thread Amker.Cheng
Hi all:
 Currently I am studying peephole optimization in gcc.
I defined a peephole using "define_peephole", but nothing happened.
It seems gcc does do the pattern match work in codes surrounded by
"HAVE_peephole",
but codes from "out-template" in that "define_peephole" are not
compiled into gcc at all.

I know "define_peephole" is deprecated, so not sure about whether
define_peephole still works in gcc-4.2.4,
or I just missed something important?

Thanks, any tips will be appreciated!
--
Best Regards.


Re: Help: does define_peephole still work in gcc-4.2.4

2009-05-11 Thread Amker.Cheng
It turns out there is a mistake in "out-template" of "define_peephole".
So, Sorry for disturbing!



-- 
Best Regards.


pattern "s_" not used when generating rtl for float comparison on mips?

2010-04-27 Thread Amker.Cheng
Hi :
There is a pattern "define_insn "s_"" in mips md file, like

(define_insn "s_"
  [(set (match_operand:CC 0 "register_operand" "=z")
(swapped_fcond:CC (match_operand:SCALARF 1 "register_operand" "f")
  (match_operand:SCALARF 2 "register_operand" "f")))]
  ""
  "c..\t%Z0%2,%1"
  [(set_attr "type" "fcmp")
   (set_attr "mode" "FPSW")])

I am wondering whether this insn pattern would ever be used when generating
float comparison, Since we use cmp and branch expand to do the job
And comparison operation are normally followed by a branch.
Am i right?
Any idea? Thanks for helping.

-- 
Best Regards.


Re: pattern "s_" not used when generating rtl for float comparison on mips?

2010-04-27 Thread Amker.Cheng
>
> You can get the RTL for these patterns when expanding stores like
>
>   a = (b < c);
>
> In this case, GCC tries to avoid a conditional branch and (I suppose you are
> on GCC <4.5) instead of cmp and b you go through cmp and
> s.  cmp does nothing but stashing away its operands, while
> s expands RTL for both the comparison and the above insn.

Thanks, and yes, I'm using GCC 4.4,
But gcc didn't work in this way for me, I tried piece of code like:

extern float a, b;
extern int c;
int main(void)
{
  c = (a < b);
  return 0;
}

After tracing cc1, I found gcc would also do it with the set/compare/jump/set code at
the end of function do_store_flag, i.e., using the cmp and b sequence.

-- 
Best Regards.


Re: pattern "s_" not used when generating rtl for float comparison on mips?

2010-04-29 Thread Amker.Cheng
> Indeed, looking at GCC 4.5 there's no cstore expander for floating-point
> variables.  Maybe you can make a patch! :-)
>
Yes, it seems gcc always generates the set/compare/jump/set sequence,
then optimizes it away in the if-convert pass. Maybe it was left behind by
early mips1, which has no conditional move instructions.

It is somewhat related to my current work; I'll try to see if I can
help with it after more study.

Thanks.

-- 
Best Regards.


split lui_movf pattern on mips?

2010-04-29 Thread Amker.Cheng
Hi:
   There is a comment on lui_movf in mips.md like the following:

;; because we don't split it.  FIXME: we should split instead.

I can split it into a move and a conditional move (movesi_on_cc) insn, like

(define_split
 [(set (match_operand:CC 0 "d_operand" "")
   (match_operand:CC 1 "fcc_reload_operand" ""))]
 "reload_completed && ISA_HAS_8CC && TARGET_HARD_FLOAT && ISA_HAS_CONDMOVE
 && !CANNOT_CHANGE_MODE_CLASS(CCmode, SImode,
                              REGNO_REG_CLASS(REGNO(operands[0])))"
 [(set (match_dup 2) (match_dup 3))
  (set (match_dup 2)
   (if_then_else:SI
  (eq:SI (match_dup 1)
 (match_dup 4))
  (match_dup 2)
  (match_dup 4)))]
 "
 {
   operands[2] = gen_rtx_REG(SImode, REGNO(operands[0]));
   operands[3] = GEN_INT(0x3f80);
   operands[4] = const0_rtx;
 }
 ")

But I have two questions.

First, the lui_movf pattern is output as
"lui\t%0,0x3f80\n\tmovf\t%0,%.,%1" in mips_output_move.
Why 0x3f80? Is it some kind of magic number, or does it have another meaning?

Second, I have to change the mode of operands[0] to SImode when
splitting; otherwise no insn pattern matches the first insn generated.
Since no new REG is generated, I assume the mode change is OK
here. Any suggestions?

Thanks.
-- 
Best Regards.


Re: split lui_movf pattern on mips?

2010-05-03 Thread Amker.Cheng
> It's the encoding of 1.0f (single precision).  The point is that we want
> something we can safely compare with 0.0f using floating-point instructions.
> "Safe" means "without generating any kind of exception", so a subnormal
> representation like 0x0001 isn't acceptable.  1.0f seems as good a
> value as any.
>

> Yes, this is OK.  Your split looks good, but I don't see any reason
> for the !CANNOT_CHANGE_MODE_CLASS condition.
>
> Couple of minor suggestions:
>
>  - There is no need for the double quotes around the { ... }.
>    Plain { ... } is better.  (Support for plain { ... } was
>    added a few years ago, so you can still see some older code
>    that uses "{ ... }".  But { ... } is better for new code.)
>
>  - It's generally better to restrict match_dups to things
>    that depend on the operands of the original insn.
>    In the above, it'd be better to replace (match_dup 4)
>    with (const_int 0) and then not set operands[4] in the
>    C code.  (match_dup 3) is OK as an exception because
>    read-rtl.c doesn't support hex constants yet...

Thanks, learned a lot from your detailed explanation.
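
For reference, the point about 1.0f can be checked with a small C sketch. It assumes float is a 32-bit IEEE 754 type; the helper names float_bits and lui_value are made up for this illustration and are not GCC functions.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of f.
   Assumption: float is a 32-bit IEEE 754 single, as on MIPS and x86.  */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* bitwise copy, no value conversion */
    return bits;
}

/* Value left in a 32-bit register by "lui reg, imm16": the immediate
   goes into the upper 16 bits and the lower 16 bits are zero.  */
static uint32_t lui_value(uint16_t imm16)
{
    return (uint32_t)imm16 << 16;
}
```

So "lui %0,0x3f80" leaves the bit pattern of 1.0f in the register, which is a normal (not subnormal) value that compares unequal to 0.0f.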

-- 
Best Regards.


a peculiar fpload problem on an inferior processor

2010-05-06 Thread Amker.Cheng
Hi :
   Our processor has an erratum: the direct FPU load does not work correctly,
so I have to substitute the instruction sequence "load_into_gpr ; move_gpr_into_fpr"
for the direct fpload insn.
  So far I have thought of two potential methods:

method 1:
   step1 :  keep a scratch register when expanding fpload;
   step2 :  split insn fpload into "load_into_gpr ; move_gpr_into_fpr"
sequence by using the reserved scratch register;

method 2:
   generate "load_into_gpr ; move_gpr_into_fpr" when expanding directly.

I have only tried the first method, which ended up with the error "insn
does not satisfy its constraints".
After tracing cc1, I found that the problematic insn was generated by
reload, which was trying to spill a float register to memory, and the
spill itself uses a direct fpload.

Here is the question: is it possible to replace all direct fploads
with the "load_into_gpr ; move_gpr_into_fpr"
sequence? I doubt it, since the reload pass might generate direct
fpload insns when spilling FPU registers.

BTW, I prefer to do the replacement in GCC rather than in the assembler,
since doing it in the assembler might produce lots of pipeline stalls.

So, any advice? Thank you all.

-- 
Best Regards.


Re: a peculiar fpload problem on an inferior processor

2010-05-07 Thread Amker.Cheng
>  It is possible.  Your expander can handle it before reload; to handle it
> during and after reload, you need to implement a TARGET_SECONDARY_RELOAD hook.
>
> http://gcc.gnu.org/onlinedocs/gccint/Register-Classes.html#index-TARGET_005fSECONDARY_005fRELOAD-3974
>
Thanks Dave, it works, but I found that reload is not the only pass
that might generate fpload/fpstore instructions.
I am working with GCC 4.4 (MIPS); the function mips_emit_move
is called in many passes after register allocation
and might generate fpload/fpstore.
For example, the pro_and_epilogue pass generates loads/stores for the FPU
registers saved by the function prologue/epilogue.

It seems I have to track down all calls of this function and make sure
they work my way.

Thanks.

-- 
Best Regards.


Re: a peculiar fpload problem on an inferior processor

2010-05-07 Thread Amker.Cheng
>  Ah, I forgot pro/epilogue generation, but I think that's the only other
> thing that happens after reload.  That is a special case: it has to generate
> strict rtl that directly matches the insns it wants.  You'll probably have to
> arrange for it to save at least one GPR early enough in the prologue sequence
> to be able to use it as a temp for your FP moves, and similar in the epilogue
> sequence.

Yes, thanks for your help, Dave.



-- 
Best Regards.


Re: a peculiar fpload problem on an inferior processor

2010-05-10 Thread Amker.Cheng
On Sat, May 8, 2010 at 2:52 PM, Amker.Cheng  wrote:
>>  Ah, I forgot pro/epilogue generation, but I think that's the only other
>> thing that happens after reload.  That is a special case: it has to generate
>> strict rtl that directly matches the insns it wants.  You'll probably have to
>> arrange for it to save at least one GPR early enough in the prologue sequence
>> to be able to use it as a temp for your FP moves, and similar in the epilogue
>> sequence.
>

Sorry to disturb you again. Concerning this problem, there is another case
that has to be handled.
The reload pass also takes care of registers saved around calls by generating
save/restore insns,
which might generate direct fpload/fpstore instructions (in
save_call_clobbered_regs, etc.).

I see no way to reserve a GPR for this case, except using a temporary
register of the ABI,
and it seems safe here since the temp registers are only used
around the call insn.
Actually I am not very sure about this.

Any suggestions? Thanks.

-- 
Best Regards.


Is it safe to use $t0 when handling call clobbered registers (on MIPS)

2010-05-10 Thread Amker.Cheng
Hi :
  I'm working on an FPU whose fpload insns do not work correctly, so I have
to use a GPR
as a temp reg: first load from memory into the GPR, then move the GPR into the FPU register.

I have handled most cases except the one where GCC saves call-clobbered FPU
registers.
Since this happens in the reload pass, I have no available GPR to use.
I'm wondering whether I could use temporary registers such as
$t0...$t9 in this case.

It looks safe as far as I can see, since the save/restore operation is
around the call insn,
and there are MIPS_PROLOGUE_TEMP and MIPS_EPILOGUE_TEMP, which are used
in the prologue/epilogue cases.

But I am not very sure about it. Any suggestions? Thank you all.

-- 
Best Regards.


mips secondary reload question

2010-05-12 Thread Amker.Cheng
Hi:
  As to the page http://gcc.gnu.org/ml/gcc/2010-05/msg00091.html:
if FPU registers cannot be copied to/from memory directly, I have
to use intermediate GPRs.

In fact, I return GP_REGS when copying x to a register in class FP_REGS
in any mode (including CCmode), and this results in infinitely recursive calls
of memory_move_secondary_cost.

After tracing cc1, I found the call sequence is like:

memory_move_secondary_cost (CCmode, ST_REGS, 1)  -->
memory_move_secondary_cost (CCmode, FP_REGS, 1)  -->
memory_move_secondary_cost (CCmode, ST_REGS, 1)  -->
memory_move_secondary_cost (CCmode, FP_REGS, 1)  -->
... infinite recursion

It seems the function default_secondary_reload always uses ST_REGS as the intermediate
register class for FP_REGS:CCmode, according to the reload_incc pattern.
This is all I have found, and I have no idea how the reload
pass works.

Any explanation?

Thanks.

-- 
Best Regards.


GCC4.3.4 downside against GCC3.4.4 on mips?

2010-05-25 Thread Amker.Cheng
Hi all,
  I compared the assembly of a function compiled by GCC 4.3.4 and GCC 3.4.4.
The function focuses on array computation and has no branches or
loop structures.
The command line is like "-march=mips32r2 -O3", and here are the
instruction statistics:

insn     GCC4.3.4   GCC3.4.4
total        1879       1534
addiu           6          6
addu          216        129
jr              1          1
lui             5          5
lw            396        353
madd           41          0
mfhi           80         80
mflo          121         86
move            0         21
mtlo           39          0
mul            85          0
mult           18         80
multu          64          0
or             80         80
sll            80         80
sra            79         47
srl            80         80
subu           80         80
sw            408        406

Considering there are no branches or loops, the result
of GCC 3.4.4 seems
much better, since it generates far fewer instructions.

On the other hand, GCC 4.3.4 does consume less stack (1224 bytes against 1408).

So, any comments? Thanks in advance.
-- 
Best Regards.


Re: GCC4.3.4 downside against GCC3.4.4 on mips?

2010-05-27 Thread Amker.Cheng
> Posting some random numbers without a test-case and precise command line
> parameters for both compilers makes the numbers useless, IMHO. You also
> only mention instruction counts. Have you actually benchmarked the
> resulting code? CPUs are complicated and what you might perceive as worse
> code might actually be superior thanks to scheduling and internal CPU
> parallelism etc.

Thanks for the reminder.
After some investigation, I can demonstrate the issue with the following
piece of code:
-begin here---
extern int *p[5];

# define REAL_RADIX_2 24
# define REAL_MUL_2(x, y) (((long long)(x) * (long long)(y)) >> REAL_RADIX_2)


void func(int *b1, int *b2)
{
  int c0 = p[3][0];
  int c1 = p[3][1];

  b2[0x18] = b1[0x18] + b1[0x1B];
  b2[0x1B] = REAL_MUL_2((b1[0x18] - b1[0x1B]) , c0);

  b2[0x19] = b1[0x19] + b1[0x1A];
  b2[0x1A] = REAL_MUL_2((b1[0x19] - b1[0x1A]) , c1);

  b2[0x1C] = b1[0x1C] + b1[0x1F];
  b2[0x1F] = REAL_MUL_2((b1[0x1F] - b1[0x1C]) , c0);

  b2[0x1D] = b1[0x1D] + b1[0x1E];
  b2[0x1E] = REAL_MUL_2((b1[0x1E] - b1[0x1D]) , c1);
}
-cut here---

It seems GCC4.3.4 always expands the long long multiplication into
three long multiplications, like
-begin here---
#  b2[0x1A] = REAL_MUL_2((b1[0x19] - b1[0x1A]) , c1);

lw      $6,104($4)
lw      $2,100($4)
subu    $2,$2,$6
mult    $11,$2
sra     $6,$2,31
madd    $6,$9
mflo    $6
multu   $2,$9
mfhi    $3
addu    $3,$6,$3
sll     $6,$3,8
mflo    $2
srl     $7,$2,24
or      $7,$6,$7
sw      $7,104($5)
-cut here---

while GCC3.4.4 treats the long long multiplication just like simple
ones, which generates only one
mult insn for each statement, like
-begin here---
#  b2[0x1A] = REAL_MUL_2((b1[0x19] - b1[0x1A]) , c1);

lw      $2,100($4)
lw      $7,104($4)
subu    $3,$2,$7
mult    $3,$9
mflo    $6
mfhi    $25
srl     $15,$6,24
sll     $24,$25,8
or      $14,$15,$24
sw      $14,104($5)
-cut here---

In my understanding, it's not necessary to use three mult insns to implement
the long long mult, since the operands are converted from int type.
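
To make the claim concrete, here is a minimal C sketch (not GCC code) that computes the macro's result from a single signed 32x32->64 multiply, mirroring the mult/mflo/mfhi/srl/sll/or sequence that GCC 3.4.4 emits. The function names are made up for this illustration, and the narrowing conversions to int32_t rely on GCC's wrap-around behaviour, which is implementation-defined in ISO C.

```c
#include <stdint.h>

#define REAL_RADIX_2 24

/* Reference: the REAL_MUL_2 macro from the test case, written as a
   function.  Assumes arithmetic right shift of negative values and
   modulo narrowing to int32_t, as GCC provides.  */
static int32_t real_mul_2(int32_t x, int32_t y)
{
    return (int32_t)(((long long)x * (long long)y) >> REAL_RADIX_2);
}

/* Same value from one widening multiply, combining the two halves
   the way the GCC 3.4.4 assembly does.  */
static int32_t real_mul_2_hilo(int32_t x, int32_t y)
{
    int64_t product = (int64_t)x * (int64_t)y;           /* mult */
    uint32_t lo = (uint32_t)product;                     /* mflo */
    uint32_t hi = (uint32_t)((uint64_t)product >> 32);   /* mfhi */

    /* srl lo,24 ; sll hi,8 ; or */
    return (int32_t)((lo >> REAL_RADIX_2) | (hi << (32 - REAL_RADIX_2)));
}
```

Both functions only ever need the low 32 bits of (product >> 24), which is why one signed widening multiply is enough when both operands start out as int.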

And as before, the compiling options are like "-march=mips32r2  -O3"

Thanks.

-- 
Best Regards.


Puzzle about macro MIPS_PROLOGUE_TEMP_REGNUM

2010-06-04 Thread Amker.Cheng
Hi :
   I found that the temp register used for saving registers when expanding the
prologue is defined by the
macro MIPS_PROLOGUE_TEMP_REGNUM on the MIPS target, like:

#define MIPS_PROLOGUE_TEMP_REGNUM \
  (cfun->machine->interrupt_handler_p ? K0_REG_NUM : GP_REG_FIRST + 3)

I don't understand why it uses registers starting from $3.
In my application, I have to save DFmode FPU regs through GPRs,
that is $3 and $4 in this case,
just like:
mfc1  $3,  $fpr
sw    $3,  addr
mfc1  $4,  $fpr+1
sw    $4,  addr+4

Apparently this would clobber the argument in $4.

Here is the question:
why not use $8 for MIPS_PROLOGUE_TEMP_REGNUM, like EPILOGUE_TEMP?
Or have I done something wrong?

So, any clarification? Thanks in advance.
-- 
Best Regards.


Re: Puzzle about macro MIPS_PROLOGUE_TEMP_REGNUM

2010-06-06 Thread Amker.Cheng
>
> It's not "starting from $3".  It's $3 and nothing else ;-)  It's not
> intended to be used as (MIPS_PROLOGUE_TEMP_REGNUM + N).
>
> $3 was chosen because it's a MIPS16 register, and can therefore
> be used for both MIPS16 and normal-mode code.  $2 used to be the
> static chain register, which left $3 as the only free call-clobbered
Thank all of you for explanation.

> MIPS16 register.  I changed the static chain register to $15 to avoid
> a clash with the MIPS16 gp-load sequence:
>
>    http://gcc.gnu.org/ml/gcc-patches/2008-08/msg00622.html
>
> so $2 is probably free now too.
It seems $2 is used for the gp load in MIPS16, as defined by MIPS16_PIC_TEMP_REGNUM,
which should not conflict with MIPS_PROLOGUE_TEMP_REGNUM either.

The MIPS target uses mips_split_doubleword_move in mips_save_reg
to implement saving of double-float registers.
It seems I have to provide a special pattern that uses exactly
the single (MIPS_PROLOGUE_TEMP_REGNUM) register,
rather than paired registers starting from it.

But more patterns might consume more memory and time.
Since my application is rather unusual (o32 ABI and no MIPS16),
maybe I could use a pair of temporary registers for this purpose,
such as $8-$15 or $24-$25.

Thanks.
-- 
Best Regards.


a typo in ira-emit.c?

2010-06-09 Thread Amker.Cheng
Hi :
I am studying IRA right now. There is the following code in change_loop:

      if (parent_allocno == NULL
          || REGNO (ALLOCNO_REG (parent_allocno)) == REGNO (original_reg))
        {
          if (internal_flag_ira_verbose > 3 && ira_dump_file)
            fprintf (ira_dump_file, "  %i vs parent %i:",
                     ALLOCNO_HARD_REGNO (allocno),
                     ALLOCNO_HARD_REGNO (parent_allocno));
          set_allocno_reg (allocno, create_new_reg (original_reg));
        }
Is it possible that parent_allocno == NULL here? If so, the fprintf might break.

Thanks.

-- 
Best Regards.


Re: a typo in ira-emit.c?

2010-06-09 Thread Amker.Cheng
>
> Yes, I think it can be NULL in some complicated cases when a loop exit edge
> comes not in the parent loop.
By that, do you mean the case where a regno lives on edges that transfer
between adjacent loops
but does not live in the parent loop?
If so, the fprintf would dereference a null pointer in this case.
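
One way to guard the dump could look like the sketch below. It is self-contained and does not use GCC's real types: struct allocno_sketch and format_vs_parent are hypothetical stand-ins for the allocno and the fprintf call, and writing "none" for a missing parent is only my assumption about a reasonable output.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for IRA's allocno; only the field needed here.  */
struct allocno_sketch {
    int hard_regno;
};

/* Format the "N vs parent M:" dump text.  When there is no parent
   allocno, print "none" instead of dereferencing a NULL pointer.
   Returns the result of snprintf.  */
static int format_vs_parent(char *buf, size_t size,
                            const struct allocno_sketch *allocno,
                            const struct allocno_sketch *parent)
{
    if (parent == NULL)
        return snprintf(buf, size, "  %i vs parent none:",
                        allocno->hard_regno);
    return snprintf(buf, size, "  %i vs parent %i:",
                    allocno->hard_regno, parent->hard_regno);
}
```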

Thanks for explanation.
-- 
Best Regards.


subreg against register allocation?

2010-06-14 Thread Amker.Cheng
Hi :
I am studying IRA right now (GCC 4.4.1, mips32 target).
For the following piece of code:

long long func(int a, int b)
{
  long long r = (long long)a * (long long)b;

  return r;
}

the asm generated on mips is like:

mult    $5,$4
mfhi    $5
mflo    $2
j       $31
move    $3,$5   <--unnecessary move insn

Please note the unnecessary move insn.

The RTL before the subreg1 and IRA passes looks like:

before subreg1
(insn 7 4 8 2 mult-problem.c:2 (set (reg:DI 196)
(mult:DI (sign_extend:DI (reg/v:SI 195 [ b ]))
(sign_extend:DI (reg/v:SI 194 [ a ] 50 {mulsidi3_32bit} (nil))

(insn 8 7 12 2 mult-problem.c:2 (set (reg:DI 193 [  ])
(reg:DI 196)) 282 {*movdi_32bit} (nil))

(insn 12 8 18 2 mult-problem.c:6 (set (reg/i:DI 2 $2)
(reg:DI 193 [  ])) 282 {*movdi_32bit} (nil))

before IRA
(insn 7 4 25 2 mult-problem.c:2 (set (reg:DI 196)
(mult:DI (sign_extend:DI (reg:SI 5 $5 [ b ]))
(sign_extend:DI (reg:SI 4 $4 [ a ] 50 {mulsidi3_32bit}
(expr_list:REG_DEAD (reg:SI 5 $5 [ b ])
(expr_list:REG_DEAD (reg:SI 4 $4 [ a ])
(nil

(insn 25 7 26 2 mult-problem.c:6 (set (reg:SI 2 $2)
(subreg:SI (reg:DI 196) 0)) 287 {*movsi_internal} (nil))

(insn 26 25 18 2 mult-problem.c:6 (set (reg:SI 3 $3 [+4 ])
(subreg:SI (reg:DI 196) 4)) 287 {*movsi_internal}
(expr_list:REG_DEAD (reg:DI 196)
(nil)))
---end


It seems the DImode split prevents IRA from allocating $2/$3 directly,
by introducing conflicts between reg 196 and $2/$3 in insns 25/26.

I wonder whether it is possible to handle multi-word modes more accurately,
in either the subreg or the IRA pass.

Thanks in advance.

-- 
Best Regards.


Re: subreg against register allocation?

2010-06-14 Thread Amker.Cheng
Thanks for explanation.

Here are three more questions.
1. If I understand correctly, there are two insns,
   "*mulsi3_1" and "*smulsi3_highpart_insn",
 which set the two parts of the DImode pseudo reg of a DImode mult.

Since both parts of the result are used in the original example,
I am not sure how to write a split pattern to handle this case
without generating two duplicate mult insns in parallel.

2. If I could set the two parts of the result in a parallel insn, I would also have to
handle MIPS-specific constraints in this case, i.e., constraints
for the HI/LO registers.
Unfortunately, there is no "h" constraint now, according to the patch
http://gcc.gnu.org/ml/gcc-patches/2008-05/msg01750.html

It is not possible to write the HI reg without clobbering the LO reg now.
How should I handle this?

3. Since I am studying IRA right now, I am very curious whether it is
possible to solve this in IRA, e.g., by shrinking the live ranges
of multi-word pseudo regs.

PS: maybe I am talking gibberish. Sorry if this is not clear enough.
Thanks.
-- 
Best Regards.


question on function change_loop in IRA

2010-06-22 Thread Amker.Cheng
Hi:
  At the end of function change_loop, GCC tries to change ALLOCNO_REG of
local allocnos.
In the loop, ALLOCNO_SOMEWHERE_RENAMED_P (allocno) is set if the allocno is not
a cap.
I don't understand why the flag is set here. Isn't the flag of every local
allocno set in this
loop? This seems to conflict with function set_allocno_somewhere_renamed_p
and the comments
about that flag in ira-int.h.

Any tips? Thanks in advance.

-- 
Best Regards.


Re: GCC4.3.4 downside against GCC3.4.4 on mips?

2010-07-11 Thread Amker.Cheng
>>>
>>> while GCC3.4.4 treats the long long multiplication just like simple
>>> ones, which generates only one
>>> mult insn for each statement, like
>>>
>>> In my understanding, It‘s not necessary using three mult insn to implement
>>> long long mult, since the operands are converted from int type.
>>
>> This is more helpful.  It is a known case in which GCC 4.x generates worse
>> code.
>
> Should be fixed with 4.6.

Hi, I tested this problem on a GCC 4.6 snapshot, and it works.
But I could not find the specific patch or a record in the bug list.
Could you help? Thanks very much.

-- 
Best Regards.


question about float insns like ceil/floor on mips machine

2010-07-19 Thread Amker.Cheng
Hi:
  I found that although there are standard pattern names such as "ceilm2/floorm2",
there are no insn patterns in mips.md for such float insns on the MIPS target.
Furthermore, there is no ceil/floor RTL code in rtl.def either.

Based on these facts, I assume those float insns are not supported by GCC,
but I don't know why; it does not seem difficult to add such insns.

Did I miss anything important? Please help, thanks.

-- 
Best Regards.


why are multiply-accumulate insns not used when -mfp32 on mips

2010-07-20 Thread Amker.Cheng
Hi:
   I found that mult-acc insns like madd.s/d are only used when -mfp64 is specified.
In the code, the macros are defined as:

#define ISA_HAS_FP4 ((ISA_MIPS4 \
  || (ISA_MIPS32R2 && TARGET_FLOAT64)   \   <--only float 64
  || ISA_MIPS64 \
  || ISA_MIPS64R2)  \
 && !TARGET_MIPS16)

#define ISA_HAS_FP_MADD4_MSUB4  ISA_HAS_FP4

Why not use madd with fp32? Is there anything special about fp32?

Any clarification? Thanks very much.

-- 
Best Regards.


A minor mistake in cse_main?

2010-08-17 Thread Amker.Cheng
Hi :
  In function cse_main, GCC processes the ebb path by path.
First, GCC finds the first bb of a path in the reverse post order queue,
provided that bb has not been visited yet.
Then GCC finds all paths starting with that first bb.

The corresponding code is like:

      do
        {
          bb = BASIC_BLOCK (rc_order[i++]);
        }
      while (TEST_BIT (cse_visited_basic_blocks, bb->index)
             && i < n_blocks); <---i might be equal to n_blocks at last

      while (cse_find_path (bb, &ebb_data, flag_cse_follow_jumps))
      //...other codes

But this code might result in unwanted operations. Looking into one .cse2 dump
file I encountered, the path information is like:
;; Following path with 37 sets: 2
;; Following path with 23 sets: 3
;; Following path with 11 sets: 4 5
;; Following path with 9 sets: 6 7 9
deferring rescan insn with uid = 163.
;; Following path with 8 sets: 6 7 8 <---basic block 8 first handled here
;; Following path with 19 sets: 10 11
;; Following path with 2 sets: 8   <---handled again

Apparently, basic block 8 in the last path has already been
processed (in path 6 7 8).
The problem is that the do-while statement can exit because `i' has reached
`n_blocks' while bb is still a visited block, and GCC
does not break out in that case.

For more information, here is the reverse post order (rc_order) dumped for
that case:
  rc_order [0] = 2
  rc_order [1] = 3
  rc_order [2] = 4
  rc_order [3] = 5
  rc_order [4] = 6
  rc_order [5] = 7
  rc_order [6] = 9
  rc_order [7] = 10
  rc_order [8] = 11
  rc_order [9] = 8 <--- the last basic block is 8


It seems GCC should break after the do-while statement if `i' and `n_blocks' are equal.
Any comments? Thanks.
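
The loop behaviour can be reproduced outside GCC with the small C simulation below. It is only a sketch: plain int block numbers and a bool array stand in for BASIC_BLOCK and cse_visited_basic_blocks, and both function names are made up for this illustration.

```c
#include <stdbool.h>

/* Mirrors the do-while quoted above: advances *i and returns the
   chosen block.  When every remaining block is already visited, it
   still returns the last block in rc_order.  */
static int pick_block_original(const int *rc_order, int n_blocks,
                               const bool *visited, int *i)
{
    int bb;
    do {
        bb = rc_order[(*i)++];
    } while (visited[bb] && *i < n_blocks);
    return bb;
}

/* Variant that returns -1 when no unvisited block remains, so the
   caller can stop instead of starting a path from a visited block.  */
static int pick_block_checked(const int *rc_order, int n_blocks,
                              const bool *visited, int *i)
{
    while (*i < n_blocks) {
        int bb = rc_order[(*i)++];
        if (!visited[bb])
            return bb;
    }
    return -1;
}
```

With the rc_order dumped above, all blocks visited and i == 9, the first function returns block 8 again while the second returns -1. The variant tests the visited flag rather than only comparing `i' with `n_blocks', because the last block in the order may itself be unvisited and still need processing.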

-- 
Best Regards.


question on points-to analysis

2010-09-09 Thread Amker.Cheng
Hi,
I am studying GCC's points-to analysis right now and have run into a question.
In the paper "Off-line Variable Substitution for Scaling Points-to
Analysis", section 3.2,
it says that we should not substitute a variable with another if its
address is taken.
But GCC's implementation unites pointer-equivalent but not location-equivalent
variables
in function unite_pointer_equivalences.
I am puzzled why GCC does this and how GCC keeps the points-to
information accurate after doing it.

Furthermore, I did not find anything about this in the paper
"Exploiting Pointer and Location Equivalence to Optimize Pointer
Analysis", which,
according to the comments in GCC, is the basis of GCC's implementation.

Any tips? Thanks in advance.

-- 
Best Regards.


Re: question on points-to analysis

2010-09-11 Thread Amker.Cheng
> In theory, this is true, but a lot of the optimizations decrease
> accuracy at a cost of making the problem solvable in a reasonable
> amount of time.
> By performing it after building initial points-to sets, the amount of
> accuracy loss is incredibly small.
> The only type of constraint that will generate inaccuracy at that
> point is a complex address taken with offset one, which is pretty
> rare.
> On the other hand, *not* doing it will make the problem take forever to solve 
> :)
>
> What's better, something that gives correct but slightly conservative
> answers in 10s, or something that gives correct and 1% less
> conservative answers in 200s?
>

Got it. Thanks for Richard's quick reply and Daniel's detailed explanation.
I need to dig deeper to understand the code.


-- 
Best Regards.


question on ssa representation of aggregates

2010-10-22 Thread Amker.Cheng
Hi :
   In paper "Memory SSA-A Unified Approach for Sparsely Representing
Memory Operations",
section 2.2, it says :

"Whenever possible, compiler will create symbolic names to represent distinct
regions inside aggregates(called structure field tags or SFT). For instance,
in Figure 2(b), GCC will create three SFT symbols for this structure, namely
SFT.0 for A.x, SFT.1 for A.b and SFT.2 for A.a"

I tried GCC 4.4.1 (MIPS target) with the following piece of code:
---start
struct tag_1
{
  int *i;
  int *j;
  int *x;
  int y;
}a;
struct tag_2
{
  struct tag_1 t1[100];
  int x[200];
  int *y;
}s;
int func(int **p)
{
int *c = *p;
if (a.y > 0)
  s.y = *p1;
else
  *c = *s.y;

  return 0;
}
---end
The "055t.alias" dump looks like:
---start
func (int * * p)
{
  int * c;
  int * gp.2;
  int g.1;
  int D.1352;
  int * D.1351;
  int * D.1349;
  int * * p1.0;
  int D.1345;

:
  # VUSE 
  c_2 = *p_1(D);
  # VUSE 
  D.1345_3 = a.y;
  if (D.1345_3 > 0)
goto ;
  else
goto ;

:
  # VUSE 
  p1.0_4 = p1;
  # VUSE 
  D.1349_5 = *p1.0_4;
  # s_18 = VDEF 
  s.y = D.1349_5;
  goto ;

:
  # VUSE 
  D.1351_6 = s.y;
  # VUSE 
  D.1352_7 = *D.1351_6;
  # g_21 = VDEF 
  # a_22 = VDEF 
  # s_23 = VDEF 
  # SMT.14_24 = VDEF 
  *c_2 = D.1352_7;
---end.

It seems structures a and s are treated as array variables; no SFT is created.

Did I miss anything, or is the implementation different? Thanks.
-- 
Best Regards.


Re: question on ssa representation of aggregates

2010-10-22 Thread Amker.Cheng
> The implementation of this stuff changes fairly regularly.  The people
> who like this kind of thing are still honing in on the best way to
> handle aliasing information.  Richard Guenther is the main guy working
> in this area today.

Thanks very much for the clarification.


-- 
Best Regards.