Re: prevent optimisation of variables on SSA

2008-08-18 Thread Martin Schindewolf

Hi,
I will try to explain in more detail:
I am trying to implement support for transactions in GCC, in particular 
for a Software Transactional Memory library called tinySTM. Since 
transactions have the nice property that they can abort and execute 
again, we need some kind of checkpointing at the beginning of a 
transaction. The checkpointing is twofold: on the one hand, setjmp is 
used to capture the register state; on the other hand, temporary variables 
are introduced by GCC to capture the values of the variables live into the 
transaction and play those values back to the original variables in case 
of an abort. In terms of IR code, the begin of a transaction is translated 
into this:
 
 txn_handle.12_4 = __builtin_stm_new ();

 jmp_buf.13_5 = __builtin_stm_get_env (txn_handle.12_4);
 ssj_value.14_6 = _setjmp (jmp_buf.13_5);
 if (ssj_value.14_6 == 0)
   goto <bb 3> ();
 else
   goto <bb 4> ();
 # SUCC: 4 (false) 3 (true)

 # BLOCK 3
 # PRED: 2 (true)
<bb 3>:;
 __builtin_printf ("SAVE_BB\n");
 txn_save_m.19_15 = m_1;
 txn_save_a.20_20 = a_2;
 txn_save_b.21_25 = b_3;
 fake_var.18_13 = __builtin_stm_dummy ("RECOVER_BB\n");
 if (fake_var.18_13)
   goto <bb 4> ();
 else
   goto <bb 5> ();
 # SUCC: 4 (true) 5 (false)

 # BLOCK 4
 # PRED: 2 (false) 3 (true)
 # txn_save_b.21_26 = PHI 
 # txn_save_a.20_21 = PHI 
 # txn_save_m.19_16 = PHI 
<bb 4>:;
 __builtin_printf ("RECOVER_BB\n");
 m_17 = txn_save_m.19_16;
 a_22 = txn_save_a.20_21;
 b_27 = txn_save_b.21_26;
 # SUCC: 5 (fallthru)

 # BLOCK 5
 # PRED: 4 (fallthru) 3 (false)
 # m_18 = PHI 
 __builtin_stm_start (txn_handle.12_4, jmp_buf.13_5, &0);


The first part acquires a handle for the transaction from the STM; then 
a jump buffer is created by the STM and the setjmp is executed. Since 
the longjmp executed by the STM (in case of an abort) returns to this 
location, we simply evaluate the return value of the setjmp call and 
know what to execute. When returning directly from the call to setjmp, 
we want to capture the values of the live-in variables, which are a, b 
and m in this example. When returning from a longjmp, we want to restore 
the values from the temporary variables (txn_save_a/b/m) to the original 
variables (a/b/m).
My problem with the SSA optimisations is that (after the SSA 
final_cleanup pass) in this example the initialisation of the variables 
a and b is moved into the SAVE_BB and RECOVER_BB basic blocks - both 
look like this:

 a = 5;
 b = 10;

and the txn_save_* variables are optimised "away". Is there any way to 
prevent this kind of optimisation on SSA form for a particular group of 
variables? Here I would like the txn_save_* variables to behave as 
containers for the values of the real variables, and to be sure that 
these variables are not touched by any optimisations.

Thank you very much for your ideas!

Regards,
Martin




Re: prevent optimisation of variables on SSA

2008-08-18 Thread Richard Guenther
On Mon, Aug 18, 2008 at 1:15 PM, Martin Schindewolf <[EMAIL PROTECTED]> wrote:
> Hi,
> I will try to explain in more detail:
> I am trying to implement support for transactions in GCC. In particular for
> a Software Transactional Memory library called tinySTM. Since transactions
> have the nice property that they can abort and execute again, we need some
> kind of checkpointing at the beginning of a transaction. The checkpointing
> is done two fold: on one hand setjmp is used to capture the register state
> on the other hand temporary variables are introduced by GCC to capture the
> value of live-in variables to the transaction and play the values back to
> the original variables in case of an abort. In terms of IR code a begin of a
> transaction is translated into this:
>   txn_handle.12_4 = __builtin_stm_new ();
>  jmp_buf.13_5 = __builtin_stm_get_env (txn_handle.12_4);
>  ssj_value.14_6 = _setjmp (jmp_buf.13_5);
>  if (ssj_value.14_6 == 0)
>   goto  ();
>  else
>   goto  ();
>  # SUCC: 4 (false) 3 (true)
>
>  # BLOCK 3
>  # PRED: 2 (true)
> :;
>  __builtin_printf ("SAVE_BB\n");
>  txn_save_m.19_15 = m_1;
>  txn_save_a.20_20 = a_2;
>  txn_save_b.21_25 = b_3;
>  fake_var.18_13 = __builtin_stm_dummy ("RECOVER_BB\n");
>  if (fake_var.18_13)
>   goto  ();
>  else
>   goto ;
>  # SUCC: 4 (true) 5 (false)
>
>  # BLOCK 4
>  # PRED: 2 (false) 3 (true)
>  # txn_save_b.21_26 = PHI 
>  # txn_save_a.20_21 = PHI 
>  # txn_save_m.19_16 = PHI 
> :;
>  __builtin_printf ("RECOVER_BB\n");
>  m_17 = txn_save_m.19_16;
>  a_22 = txn_save_a.20_21;
>  b_27 = txn_save_b.21_26;
>  # SUCC: 5 (fallthru)
>
>  # BLOCK 5
>  # PRED: 4 (fallthru) 3 (false)
>  # m_18 = PHI 
>  __builtin_stm_start (txn_handle.12_4, jmp_buf.13_5, &0);
> 
>
> the first part is acquiring a handle for the transactions from the STM, then
> a jump buffer is created by the STM and the setjmp is executed. Since the
> longjmp executed by the STM (in case of an abort) returns to this location,
> we simply evaluate the return value of the setjmp call and know what to
> execute. In case of returning from the call to setjmp we want to capture the
> values of the live-in variables, which are a,b and m in this example. In
> case we returned from a longjmp we want to restore the values from the
> temporary variables (txn_save_a/b/m) to the original variables (a/b/m).
> My problem with the SSA optimisations is that (after the SSA final_cleanup
> pass) in this example the initialisation of the variables a and b is moved
> into the SAVE_BB and RESTORE_BB basic blocks - both look like this:
>  a = 5;
>  b = 10;
> and the txn_save_* variables are optimised "away". Is there any way to
> prevent this kind of optimisations on SSA form for a particular group of
> variables? Here I would like to see the txn_save_* variables behave as a
> container for the values of the real variables and be sure that these
> variables are not touched by any optimisations.

Well, if you are inserting this abnormal control flow after going into SSA
form, then you need to make sure all affected SSA names are marked
as SSA_NAME_OCCURS_IN_ABNORMAL_PHI, and that SSA names from a
single base variable do not have overlapping live ranges.

This means what you are doing is not really going to be easy in
SSA form, and you should rather try to do it before going into SSA.
I would also suggest that you use the exception handling machinery of
GCC rather than inserting setjmp/longjmp calls yourself or trying to
track the complete program state.

Richard.

> Thank you very much for your ideas!
>
> Regards,
> Martin
>
>
>


Creating own Backend: Segmentation fault in mark_jump_label_1

2008-08-18 Thread Balthasar Biedermann
Hello everybody,

I am creating a GCC backend. I have already created the *.md, *.h and *.c
files and compiled a gcc that includes this backend.

But when I try to compile a simple C file with my gcc, I get a
segmentation fault. I tried to debug it, but I cannot pinpoint the cause.

The error occurs in mark_jump_label_1, which is called with a null pointer
as the rtx x.

gdb told me:

Program received signal SIGSEGV, Segmentation fault.
0x081c5d48 in mark_jump_label_1 (x=0x0, insn=0xb7b77118, in_mem=0 '\0',
is_target=0 '\0') at ../.././gcc/jump.c:987
987   RTX_CODE code = GET_CODE (x);
(gdb) where
#0  0x081c5d48 in mark_jump_label_1 (x=0x0, insn=0xb7b77118, in_mem=0
'\0', is_target=0 '\0') at ../.././gcc/jump.c:987
#1  0x081c60e0 in mark_jump_label_1 (x=0xb7b70e28, insn=0xb7b77118,
in_mem=0 '\0', is_target=0 '\0') at ../.././gcc/jump.c:1108
#2  0x081c5d2f in mark_jump_label (x=0xb7b70e28, insn=0xb7b77118,
in_mem=0) at ../.././gcc/jump.c:974
#3  0x081c4af4 in mark_all_labels (f=0xb7bdb120) at ../.././gcc/jump.c:194
#4  0x081c48f7 in rebuild_jump_labels (f=0xb7bdb120) at
../.././gcc/jump.c:86
#5  0x08418fdf in tree_expand_cfg () at ../.././gcc/cfgexpand.c:1935
#6  0x08203198 in execute_one_pass (pass=0x856a640) at
../.././gcc/passes.c:1122
#7  0x082032e4 in execute_pass_list (pass=0x856a640) at
../.././gcc/passes.c:1175
#8  0x082ca0fd in tree_rest_of_compilation (fndecl=0xb7bd7a10) at
../.././gcc/tree-optimize.c:404
#9  0x083db7bd in cgraph_expand_function (node=0xb7b73280) at
../.././gcc/cgraphunit.c:1157
#10 0x083da461 in cgraph_assemble_pending_functions () at
../.././gcc/cgraphunit.c:523
#11 0x083da72d in cgraph_finalize_function (decl=0xb7bd7a10, nested=0
'\0') at ../.././gcc/cgraphunit.c:641
#12 0x0805a470 in finish_function () at ../.././gcc/c-decl.c:6803
#13 0x080a7f9b in c_parser_declaration_or_fndef (parser=0xb7be2b28,
fndef_ok=1 '\001', empty_ok=1 '\001', nested=0 '\0', start_attr_ok=1 '\001')
at ../.././gcc/c-parser.c:1424
#14 0x080a7920 in c_parser_external_declaration (parser=0xb7be2b28) at
../.././gcc/c-parser.c:1179
#15 0x080a762c in c_parser_translation_unit (parser=0xb7be2b28) at
../.././gcc/c-parser.c:1081
#16 0x080b44c1 in c_parse_file () at ../.././gcc/c-parser.c:8011
#17 0x0809b675 in c_common_parse_file (set_yydebug=0) at
../.././gcc/c-opts.c:1291
#18 0x082861fa in compile_file () at ../.././gcc/toplev.c:1042
#19 0x08287bc4 in do_compile () at ../.././gcc/toplev.c:2247
#20 0x08287c26 in toplev_main (argc=2, argv=0xbfa608c4) at
../.././gcc/toplev.c:2279
#21 0x080bba42 in main (argc=Cannot access memory at address 0x0
) at ../.././gcc/main.c:35


Which target macro or which part of the machine description could cause such an error?

Thank you for your help!

Balthasar





GCC 4.3.2 Status Report (2008-08-18)

2008-08-18 Thread Joseph S. Myers
Status
======

Although the number of open regression PRs is increasing, probably as
more people start to use 4.3 branch, there are currently no open P1
PRs and it does not look like there are any serious PRs open for
regressions relative to 4.3.0 or 4.3.1.  Thus, I propose to make
4.3.2-rc1 tomorrow, with a release or -rc2 following about a week
later.  The process of fixing regressions can continue for subsequent
4.3 releases, which I expect at approximately two-month intervals
until 4.4.0 is released.

Starting 17:00 UTC tomorrow, all changes to 4.3 branch will need to be
approved by a release manager until the 4.3.2 release is out.  (Fixes
that might be appropriate while in this state would include those for
wrong-code or rejects-valid regressions relative to 4.3.0 or 4.3.1, or
fixes for such regressions relative to previous releases that can be
seen to be particularly safe.)

Quality Data
============

Priority      #   Change from Last Report
--------    ---   -----------------------
P1            0   -  3
P2          116   +  3
P3            7   +  7
--------    ---   -----------------------
Total       123   +  7

Previous Report
===============

http://gcc.gnu.org/ml/gcc/2008-08/msg00132.html

I will send the next report for the 4.3 branch at the time of the
4.3.2 release or -rc2.

-- 
Joseph S. Myers
[EMAIL PROTECTED]


vectorizer question

2008-08-18 Thread VandeVondele Joost


The attached testcase yields (on a core2 duo, gcc trunk):


gfortran -O3 -ftree-vectorize -ffast-math -march=native test.f90
time ./a.out

real    0m3.414s


ifort -xT -O3  test.f90
time ./a.out

real    0m1.556s

The assembly contains:

         ifort   gfortran
mulpd      140          0
mulsd        0        280

so the reason seems to be that ifort vectorizes the following code (full 
testcase attached):


SUBROUTINE collocate_core_6(res,coef_xyz,pol_x,pol_y,pol_z,cmax,kg,jg)

 IMPLICIT NONE
 INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND ( 14, 200 )
 integer, PARAMETER :: lp=6
real(wp), INTENT(OUT):: res
integer, INTENT(IN) :: cmax,kg,jg
real(wp), INTENT(IN):: pol_x(0:lp,-cmax:cmax)
real(wp), INTENT(IN):: pol_y(1:2,0:lp,-cmax:0)
real(wp), INTENT(IN):: pol_z(1:2,0:lp,-cmax:0)
real(wp), INTENT(IN):: coef_xyz(((lp+1)*(lp+2)*(lp+3))/6)
real(wp) ::  coef_xy(2,(lp+1)*(lp+2)/2)
real(wp) ::  coef_x(4,0:lp)

[...]
coef_x(1:2,4)=coef_x(1:2,4)+coef_xy(1:2,12)*pol_y(1,1,jg)
coef_x(3:4,4)=coef_x(3:4,4)+coef_xy(1:2,12)*pol_y(2,1,jg)
coef_x(1:2,5)=coef_x(1:2,5)+coef_xy(1:2,13)*pol_y(1,1,jg)
coef_x(3:4,5)=coef_x(3:4,5)+coef_xy(1:2,13)*pol_y(2,1,jg)
coef_x(1:2,0)=coef_x(1:2,0)+coef_xy(1:2,14)*pol_y(1,2,jg)
coef_x(3:4,0)=coef_x(3:4,0)+coef_xy(1:2,14)*pol_y(2,2,jg)
coef_x(1:2,1)=coef_x(1:2,1)+coef_xy(1:2,15)*pol_y(1,2,jg)
coef_x(3:4,1)=coef_x(3:4,1)+coef_xy(1:2,15)*pol_y(2,2,jg)
coef_x(1:2,2)=coef_x(1:2,2)+coef_xy(1:2,16)*pol_y(1,2,jg)
coef_x(3:4,2)=coef_x(3:4,2)+coef_xy(1:2,16)*pol_y(2,2,jg)
coef_x(1:2,3)=coef_x(1:2,3)+coef_xy(1:2,17)*pol_y(1,2,jg)
coef_x(3:4,3)=coef_x(3:4,3)+coef_xy(1:2,17)*pol_y(2,2,jg)
coef_x(1:2,4)=coef_x(1:2,4)+coef_xy(1:2,18)*pol_y(1,2,jg)
coef_x(3:4,4)=coef_x(3:4,4)+coef_xy(1:2,18)*pol_y(2,2,jg)
coef_x(1:2,0)=coef_x(1:2,0)+coef_xy(1:2,19)*pol_y(1,3,jg)
coef_x(3:4,0)=coef_x(3:4,0)+coef_xy(1:2,19)*pol_y(2,3,jg)
[...]

Either ifort is able to interpret the short array sections as vectors 
directly, or it realizes that these very short implicit loops are 
nevertheless profitable to vectorize.


Is there a trick to get gcc to vectorize these loops, or is there some 
technology missing for this?


Should I file a PR for this (this is somewhat similar to PR31079 and 
PR31021)?


Thanks in advance,

Joost
SUBROUTINE collocate_core_6(res,coef_xyz,pol_x,pol_y,pol_z,cmax,kg,jg)

 IMPLICIT NONE
 INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND ( 14, 200 )
 integer, PARAMETER :: lp=6
real(wp), INTENT(OUT):: res
integer, INTENT(IN) :: cmax,kg,jg
real(wp), INTENT(IN):: pol_x(0:lp,-cmax:cmax)
real(wp), INTENT(IN):: pol_y(1:2,0:lp,-cmax:0)
real(wp), INTENT(IN):: pol_z(1:2,0:lp,-cmax:0)
real(wp), INTENT(IN):: coef_xyz(((lp+1)*(lp+2)*(lp+3))/6)
real(wp) ::  coef_xy(2,(lp+1)*(lp+2)/2)
real(wp) ::  coef_x(4,0:lp)

coef_xy=0.0_wp
coef_xy(:,1)=coef_xy(:,1)+coef_xyz(1)*pol_z(:,0,kg)
coef_xy(:,2)=coef_xy(:,2)+coef_xyz(2)*pol_z(:,0,kg)
coef_xy(:,3)=coef_xy(:,3)+coef_xyz(3)*pol_z(:,0,kg)
coef_xy(:,4)=coef_xy(:,4)+coef_xyz(4)*pol_z(:,0,kg)
coef_xy(:,5)=coef_xy(:,5)+coef_xyz(5)*pol_z(:,0,kg)
coef_xy(:,6)=coef_xy(:,6)+coef_xyz(6)*pol_z(:,0,kg)
coef_xy(:,7)=coef_xy(:,7)+coef_xyz(7)*pol_z(:,0,kg)
coef_xy(:,8)=coef_xy(:,8)+coef_xyz(8)*pol_z(:,0,kg)
coef_xy(:,9)=coef_xy(:,9)+coef_xyz(9)*pol_z(:,0,kg)
coef_xy(:,10)=coef_xy(:,10)+coef_xyz(10)*pol_z(:,0,kg)
coef_xy(:,11)=coef_xy(:,11)+coef_xyz(11)*pol_z(:,0,kg)
coef_xy(:,12)=coef_xy(:,12)+coef_xyz(12)*pol_z(:,0,kg)
coef_xy(:,13)=coef_xy(:,13)+coef_xyz(13)*pol_z(:,0,kg)
coef_xy(:,14)=coef_xy(:,14)+coef_xyz(14)*pol_z(:,0,kg)
coef_xy(:,15)=coef_xy(:,15)+coef_xyz(15)*pol_z(:,0,kg)
coef_xy(:,16)=coef_xy(:,16)+coef_xyz(16)*pol_z(:,0,kg)
coef_xy(:,17)=coef_xy(:,17)+coef_xyz(17)*pol_z(:,0,kg)
coef_xy(:,18)=coef_xy(:,18)+coef_xyz(18)*pol_z(:,0,kg)
coef_xy(:,19)=coef_xy(:,19)+coef_xyz(19)*pol_z(:,0,kg)
coef_xy(:,20)=coef_xy(:,20)+coef_xyz(20)*pol_z(:,0,kg)
coef_xy(:,21)=coef_xy(:,21)+coef_xyz(21)*pol_z(:,0,kg)
coef_xy(:,22)=coef_xy(:,22)+coef_xyz(22)*pol_z(:,0,kg)
coef_xy(:,23)=coef_xy(:,23)+coef_xyz(23)*pol_z(:,0,kg)
coef_xy(:,24)=coef_xy(:,24)+coef_xyz(24)*pol_z(:,0,kg)
coef_xy(:,25)=coef_xy(:,25)+coef_xyz(25)*pol_z(:,0,kg)
coef_xy(:,26)=coef_xy(:,26)+coef_xyz(26)*pol_z(:,0,kg)
coef_xy(:,27)=coef_xy(:,27)+coef_xyz(27)*pol_z(:,0,kg)
coef_xy(:,28)=coef_xy(:,28)+coef_xyz(28)*pol_z(:,0,kg)
coef_xy(:,1)=coef_xy(:,1)+coef_xyz(29)*pol_z(:,1,kg)
coef_xy(:,2)=coef_xy(:,2)+coef_xyz(30)*pol_z(:,1,kg)
coef_xy(:,3)=coef_xy(:,3)+coef_xyz(31)*pol_z(:,1,kg)
coef_xy(:,4)=coef_xy(:,4)+coef_xyz(32)*pol_z(:,1,kg)
coef_xy(:,5)=coef_xy(:,5)+coef_xyz(33)*pol_z(:,1,kg)
coef_xy(:,6)=coef_xy(:,6)+coef_xyz(34)*pol_z(:,1,kg)
coef_xy(:,8)=coef_xy(:,8)+coef_xyz(35)*pol_z(:,1,kg)
coef_xy(:,9)=coef_xy(:,9)+coef_xyz(36)*pol_z(:,1,kg)

Re: vectorizer question

2008-08-18 Thread Richard Guenther
2008/8/18 VandeVondele Joost <[EMAIL PROTECTED]>:
>
> The attached testcase yields (on a core2 duo, gcc trunk):
>
>> gfortran -O3 -ftree-vectorize -ffast-math -march=native test.f90
>> time ./a.out
>
> real0m3.414s
>
>> ifort -xT -O3  test.f90
>> time ./a.out
>
> real0m1.556s
>
> The assembly contains:
>
>ifort   gfortran
> mulpd 140  0
> mulsd   0280
>
> so the reason seems that ifort vectorizes the following code (full testcase
> attached):
>
> SUBROUTINE collocate_core_6(res,coef_xyz,pol_x,pol_y,pol_z,cmax,kg,jg)
>
>  IMPLICIT NONE
>  INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND ( 14, 200 )
>  integer, PARAMETER :: lp=6
>real(wp), INTENT(OUT):: res
>integer, INTENT(IN) :: cmax,kg,jg
>real(wp), INTENT(IN):: pol_x(0:lp,-cmax:cmax)
>real(wp), INTENT(IN):: pol_y(1:2,0:lp,-cmax:0)
>real(wp), INTENT(IN):: pol_z(1:2,0:lp,-cmax:0)
>real(wp), INTENT(IN):: coef_xyz(((lp+1)*(lp+2)*(lp+3))/6)
>real(wp) ::  coef_xy(2,(lp+1)*(lp+2)/2)
>real(wp) ::  coef_x(4,0:lp)
>
> [...]
>coef_x(1:2,4)=coef_x(1:2,4)+coef_xy(1:2,12)*pol_y(1,1,jg)
>coef_x(3:4,4)=coef_x(3:4,4)+coef_xy(1:2,12)*pol_y(2,1,jg)
>coef_x(1:2,5)=coef_x(1:2,5)+coef_xy(1:2,13)*pol_y(1,1,jg)
>coef_x(3:4,5)=coef_x(3:4,5)+coef_xy(1:2,13)*pol_y(2,1,jg)
>coef_x(1:2,0)=coef_x(1:2,0)+coef_xy(1:2,14)*pol_y(1,2,jg)
>coef_x(3:4,0)=coef_x(3:4,0)+coef_xy(1:2,14)*pol_y(2,2,jg)
>coef_x(1:2,1)=coef_x(1:2,1)+coef_xy(1:2,15)*pol_y(1,2,jg)
>coef_x(3:4,1)=coef_x(3:4,1)+coef_xy(1:2,15)*pol_y(2,2,jg)
>coef_x(1:2,2)=coef_x(1:2,2)+coef_xy(1:2,16)*pol_y(1,2,jg)
>coef_x(3:4,2)=coef_x(3:4,2)+coef_xy(1:2,16)*pol_y(2,2,jg)
>coef_x(1:2,3)=coef_x(1:2,3)+coef_xy(1:2,17)*pol_y(1,2,jg)
>coef_x(3:4,3)=coef_x(3:4,3)+coef_xy(1:2,17)*pol_y(2,2,jg)
>coef_x(1:2,4)=coef_x(1:2,4)+coef_xy(1:2,18)*pol_y(1,2,jg)
>coef_x(3:4,4)=coef_x(3:4,4)+coef_xy(1:2,18)*pol_y(2,2,jg)
>coef_x(1:2,0)=coef_x(1:2,0)+coef_xy(1:2,19)*pol_y(1,3,jg)
>coef_x(3:4,0)=coef_x(3:4,0)+coef_xy(1:2,19)*pol_y(2,3,jg)
> [...]
>
> either it is able to interpret the short vectors as such, or it realizes
> that these very short implicit loops are nevertheless favourable for
> vectorization.
>
> Is there a trick to get gcc vectorize these loops, or is there some
> technology missing for this ?
>
> Should I file a PR for this (this is somewhat similar to PR31079 and
> PR31021)?

It would be nice to have a stand-alone testcase for this, so please
file a bugreport.

Thanks,
Richard.


Re: GCC 4.3.2 Status Report (2008-08-18)

2008-08-18 Thread Eric Botcazou
> Although the number of open regression PRs is increasing, probably as
> more people start to use 4.3 branch, there are currently no open P1
> PRs and it does not look like there are any serious PRs open for
> regressions relative to 4.3.0 or 4.3.1.

PR rtl-opt/36998 is marked "blocker" though.

-- 
Eric Botcazou


Re: GCC 4.3.2 Status Report (2008-08-18)

2008-08-18 Thread Richard Guenther
On Mon, Aug 18, 2008 at 5:30 PM, Eric Botcazou <[EMAIL PROTECTED]> wrote:
>> Although the number of open regression PRs is increasing, probably as
>> more people start to use 4.3 branch, there are currently no open P1
>> PRs and it does not look like there are any serious PRs open for
>> regressions relative to 4.3.0 or 4.3.1.
>
> PR rtl-opt/36998 is marked "blocker" though.

But is "fixed" on the branch and the trunk.

Richard.


Re: vectorizer question

2008-08-18 Thread VandeVondele Joost


It would be nice to have a stand-alone testcase for this, so please
file a bugreport.



I've opened PR37150 for this.

Thanks,

Joost


Re: GCC 4.3.2 Status Report (2008-08-18)

2008-08-18 Thread Eric Botcazou
> But is "fixed" on the branch and the trunk.

Then it should probably not be "blocker" anymore.

-- 
Eric Botcazou


Re: GCC 4.3.2 Status Report (2008-08-18)

2008-08-18 Thread Richard Guenther
On Mon, Aug 18, 2008 at 5:39 PM, Eric Botcazou <[EMAIL PROTECTED]> wrote:
>> But is "fixed" on the branch and the trunk.
>
> Then it should probably not be "blocker" anymore.

"blocker" doesn't have any meaning for the release management process,
but yes, feel free to lower that.

Richard.


Re: [PATCH]: GCC Scheduler support for R10000 on MIPS

2008-08-18 Thread David Daney

Kumba wrote:

Richard Sandiford wrote:


OK otherwise.  Do you have a copyright assignment on file?


Nope.  Is there something I need to fill out and e-mail to someone?



Yes there is.  I'm not sure if Richard can cause them to be sent to you, but certainly requesting copyright assignment documents on gcc@gcc.gnu.org would work.  It can often take many weeks to get them processed, so starting as soon as possible would be a good idea. 

Do I need to put my name and the name of the author of the very original 
gcc-3.0 patch in this file as well?


It would depend on whether any of the original patch code remains.  If so, a 
copyright assignment from the original author would probably be required as 
well (at least that is my understanding).


David Daney



Re: [PATCH][RFT] Optimization pass-pipeline re-organization [3/n]

2008-08-18 Thread Jeff Law

Richard Guenther wrote:


The most interesting pass change is the removal of the first
DOM/phi-cprop pair.  DOM mostly deals with jump-threading at this
place and for tramp3d catches 473 threads on top of the 2555 ones
performed by the VRP pass that runs right before the first DOM.
  
Ultimately, jump threading is all I really want DOM to do long term 
anyway -- the redundancy elimination performed by DOM can be better 
handled elsewhere.  The interesting question is whether there is any 
redundancy elimination left in DOM that is unique and, if not, whether 
we can get the same overall effect if we remove the redundancy 
elimination from DOM and instead use existing passes for redundancy 
elimination.  Same overall effect doesn't mean identical code, but 
reasonably close and without incurring a significant compile-time hit.


Regardless, I fully support reducing the number of calls into DOM as 
well as trimming the set of transformations performed by DOM.


jeff



Re: [PATCH][RFT] Optimization pass-pipeline re-organization [3/n]

2008-08-18 Thread Richard Guenther
On Mon, 18 Aug 2008, Jeff Law wrote:

> Richard Guenther wrote:
> > 
> > The most interesting pass change is the removal of the first
> > DOM/phi-cprop pair.  DOM mostly deals with jump-threading at this
> > place and for tramp3d catches 473 threads on top of the 2555 ones
> > performed by the VRP pass that runs right before the first DOM.
> >   
> Ultimately, jump threading is all I really want DOM to do long term anyway --

Yes, I agree ..

> the redundancy elimination performed by DOM can be better handled elsewhere.
> The interesting question is there redundancy elimination left in DOM that is
> unique and if not, can we get the same overall effect if we remove the
> redundancy elimination from DOM and instead using existing passes for
> redundancy elimination.  Same overall effect doesn't mean identical code, but

... though I am not sure (I didn't investigate) how much of the
redundancy elimination code feeds the jump threading parts.  I was
thinking of moving the jump threading parts over to SCCVN instead, given
that VRP lacks capabilities regarding symbolic conditions.

> reasonably close and without incurring a significant compile-time hit.
> 
> Regardless, I fully support eliminating the number of calls into DOM as well
> as trimming the set of transformations performed by DOM.

Thanks.  It seems the last DOM pass doesn't do very much either, so
I'll be playing with removing it as a last step of cleaning up the
pipeline, even if that might require another run of FRE instead of DOM.

Richard.


[graphite] Get graphite backend working again

2008-08-18 Thread Tobias Grosser
Hi Sebastian, hi Jan, hi graphities,

While trying to work on graphite I saw that the last changes made
code generation fail in every case.

1. Cloog-PPL creates many unnecessary conditional statements.
=============================================================

Problem since: cloog-ppl change.

At the moment Cloog-PPL generates at least one unnecessary condition in 
every loop:

For example in (scop-0.c):

for (s_1=0;s_1<=T_4-1;s_1++) {
  if ((s_1 >= 0) && (s_1 <= T_4-1)) {
for (s_3=0;s_3<=199;s_3++) {
  if ((s_1 >= 0) && (s_1 <= T_4-1) && (s_3 >= 0) && (s_3 <= 199)) {
if ((s_1 >= 0) && (s_1 <= T_4-1) && (s_3 >= 0) && (s_3 <= 199)) {
  if ((s_1 >= 0) && (s_1 <= T_4-1) && (s_3 >= 0) && (s_3 <= 199)) {
S5(s_1,s_3) ;
  }
}
  }
}
  }
}


As the current code generation does not support conditional statements,
we fail to generate code for them.

Who will work on that one?
Sebastian, do you think Cloog-PPL can be fixed easily? Or can we support
conditional statements (guards) in the graphite codegen? It seems to me
that most of the infrastructure is already available.

2. Gloog expects that only one edge leads to the SCoP.
======================================================

Problem since: scop_limit_scops(), which fixed parameter detection.

As at the moment every SCoP is surrounded by a loop:
--
We limit all SCoPs to those that are completely surrounded by a loop.

   Example:

   for (i  |
 { |
   for (j  |  SCoP 1
   for (k  |
 } |

   * SCoP frontier, as this line is not surrounded by any loop. *

   for (l  |  SCoP 2

   This is necessary as scalar evolution and parameter detection need an
   outermost loop to initialize parameters correctly.
--

So the SCoP entry is always a loop header that has more than one
predecessor edge, and again we fail to generate code, as graphite
currently expects only a single entry edge and a single exit edge per SCoP.

My thoughts about this representation:

I understand that a SCoP should be single entry/single exit, but I am
not sure whether the representation with edges is the right one.

As far as I can see, we have something like this.

The original code:

1 3  <- 3 = difficult bb
|/
2<- 2 = condition header   -
|\ |
4 5  <- 5 = loop header|
| |\   |   SCoP
| |/   |   2 -> (8)
| 6|
| ||
| 7-
|/
8<- 8 = difficult bb

So we have {2, 4, 5, 6, 7} in our SCoP.

The single entry point is "bb_2". This means in the original code, all
entry edges {1->2, 3->2} lead to "bb_2". The single exit point is
"bb_8", as in the original code all edges leaving the SCoP {4->8, 7->8}
lead to "bb_8".

In graphite, however, we freely move around all bbs in the SCoP and
generate a new CFG part.


20  <- new generated to regenerate control flow
|\
| 5
| |\
| |/
| 6
| |
| 7
|/
21 <- new generated to regenerate control flow
|\
| 4
|/
2

So how do we connect this? We need to know all bbs that lead to the
first bb in the new CFG part (here bb_20). These are scop->entry->preds
(here bb_2->preds = {bb_1, bb_3}).

And we have to know what follows the last bbs of the new CFG. The next
bb will be scop->exit (here bb_8).

So I think a single edge is not sufficient to represent the SCoP entry.
We could use a VEC of entry edges (here {1->2, 3->2}), or a VEC of
bbs leading to the SCoP (here {1, 3}), or we keep the current solution
and use the first bb in the original code (here bb_2) and its list
of predecessors (here bb_2->preds = {1->2, 3->2}).

The exit seems to be easy: we just need a basic block to jump to, so we
keep this bb in scop->exit (here bb_8).

Is someone working on this one? Otherwise I will start thinking about it.

See you
Tobias





RE: [graphite] Get graphite backend working again

2008-08-18 Thread Sjodin, Jan
> As the current code generation does not support conditional 
> statements,
> we fail to generate code for them.
> 
> Who will work on that one?

I am working on this.

- Jan





Re: [PATCH]: GCC Scheduler support for R10000 on MIPS

2008-08-18 Thread Kumba

David Daney wrote:

Yes there is.  I'm not sure if Richard can cause them to be sent to you, 
but certainly requesting copyright assignment documents on 
gcc@gcc.gnu.org would work.  It can often take many weeks to get them 
processed, so starting as soon as possible would be a good idea.


I'll submit a request outside of this thread later on then.  Thanks for the 
info!

As far as processing time goes, will that impact getting this patch committed? 
My understanding is that the deadline for new features for gcc-4.4 is the end 
of August.



It would depend on if any of the original patch code remains.  If so, 
probably a copyright assignment for the original author would be 
required as well (at least that is my understanding).


That's an interesting thought, then.  I largely left his code intact while I used 
this patch throughout gcc-3.2, 3.3, and 3.4 (I simply updated it to apply to the 
subsequent releases in the 3.x series).  I only converted it to DFA in gcc-4.0, 
which was largely a complete rewrite, using the original patch as a guide to 
learn how DFA changed things around.  Plus, the core information comes from the 
Vr1 manual anyway, available from www.necel.com.  Considering that the very 
original patch was posted to gcc-patches years ago for gcc-3.0 consideration, 
I would be surprised if the e-mail address from that submission still reaches him.


Thoughts?

--
Joshua Kinard
Gentoo/MIPS
[EMAIL PROTECTED]

"The past tempts us, the present confuses us, the future frightens us.  And our 
lives slip away, moment by moment, lost in that vast, terrible in-between."


--Emperor Turhan, Centauri Republic


RE: [graphite] Get graphite backend working again [scalar variable handling]

2008-08-18 Thread Tobias Grosser
Hi Jan,

On Mon, 2008-08-18 at 17:37 -0500, Sjodin, Jan wrote: 
> > As the current code generation does not support conditional 
> > statements,
> > we fail to generate code for them.
> > 
> > Who will work on that one?
> 
> I am working on this.

Great! 

I would like to improve the way we handle scalar variables and ivs
during the graphite transformation.
I am not sure if I got it right, nor where to put my code in the backend,
so it would be great if you could read over my ideas.

The problem:

In graphite we represent the complete loop structure using polyhedra to
apply our transformations.
To go back to gimple we have to generate new loop structures and delete
the old ones.
One step is to remove the old ivs and generate new ones.


Example:

1. original bb:

D.1918_7 = a[i_22];
D.1919_8 = D.1918_7 + 1;
a[j_21] = D.1919_8;
j_9 = j_21 + 1;



2. Add new ivs

graphiteIV_1 = PHI (0, graphiteIV_2)
   
D.1918_7 = a[i_22];
D.1919_8 = D.1918_7 + 1;
a[j_21] = D.1919_8;
j_9 = j_21 + 1;

graphiteIV_2 = graphiteIV_1 + 1


3. Move ivs usages to the ivs.

graphiteIV_1 = PHI (0, graphiteIV_2)
 
D.1918_7 = a[i_22];
D.1919_8 = D.1918_7 + 1;
a[graphiteIV_1] = D.1919_8;
j_9 = j_21 + 1;

graphiteIV_2 = graphiteIV_1 + 1


4. remove ivs (Here {j_9, j_21})


D.1918_7 = a[i_22];
D.1919_8 = D.1918_7 + 1;
a[graphiteIV_1] = D.1919_8;


The problem seems to become more complicated if there exist two or more
ivs.

How I would like to solve it:
=

During the gimple->graphite transformation we ignore all scalar variables
that we can represent using scev (these include all the ivs).

Here {D.1918_7, D.1919_8} are not representable, as they depend on the
value of a[i_22]; therefore we cannot ignore them.
But {j_9, j_21} are representable, so we ignore them.

During graphite we have all ivs virtually removed.

The next step happens during code generation:
we remove all the scalar variables we ignored before, and recreate, at the
places where they were used, new scalar variables that depend on the
new ivs.

Why remove all and not just the ivs?


There are two points.

1. Detecting ivs is a lot of work, so I prefer this general algorithm, as
I hope it is easier to implement and runs faster.

2. If we can ignore the scalar variables, we can also ignore their
dependencies, so we have more freedom to schedule the bbs.


See you
Tobi

P.S.: To start working on my code I would like to get block-0.c running.
Jan, can I help you with anything?