[RFC]Better support for big-endian targets in GCC vectorizer

2014-05-27 Thread Bin.Cheng
Hi,

To attract more eyes, I should have used a scarier subject like "GCC's
vectorizer is heading in the wrong direction on big-endian targets".

The idea came from a simple vectorization case I ran into and a
discussion with Richard.  Given a simple case like:

typedef signed short *__restrict__ pRSINT16;
void check_vector_S16 (pRSINT16 a, pRSINT16 b);
int foo (void)
{
  int i;
  signed char input1_S8[16] =
    { 127, 126, 125, 124, 123, 122, 121, 120,
      119, 118, 117, 116, 115, 114, 113, 112 };
  signed char input2_S8[16] =
    { 127, 125, 123, 121, 119, 117, 115, 113,
      111, 109, 107, 105, 103, 101, 99,  97 };
  signed short output_S16[16];
  signed short expected_S16[] =
    { 16129, 15750, 15375, 15004, 14637, 14274, 13915, 13560,
      13209, 12862, 12519, 12180, 11845, 11514, 11187, 10864 };

  for (i = 0; i < 16; i++)
    output_S16[i] = (signed short) input1_S8[i] * input2_S8[i];

  check_vector_S16 (expected_S16, output_S16);
  return 0;
}
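The test case only declares check_vector_S16.  A minimal hypothetical definition is sketched here, assuming the usual testsuite contract of aborting on the first mismatch, together with a hypothetical scalar reference, widen_mult_half, for one half of the loop.  Neither name's body comes from the thread.  The sketch shows that expected_S16 is just the elementwise widened product, and that elements 0..7 and 8..15 are the two halves the vectorizer computes with separate WIDEN_MULT statements:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef signed short *__restrict__ pRSINT16;

/* Hypothetical definition: the test case only declares check_vector_S16.
   Assumed contract: report and abort on the first mismatching element.  */
void check_vector_S16 (pRSINT16 a, pRSINT16 b)
{
  for (int i = 0; i < 16; i++)
    if (a[i] != b[i])
      {
        fprintf (stderr, "element %d: expected %d, got %d\n", i, a[i], b[i]);
        abort ();
      }
}

/* Hypothetical scalar reference for one half of the loop: for the eight
   elements starting at FIRST, widen each signed char before multiplying.
   FIRST == 0 gives the half stored at output_S16, FIRST == 8 the half
   stored at output_S16 + 16B.  */
void widen_mult_half (const signed char *x, const signed char *y,
                      signed short *out, int first)
{
  assert (first == 0 || first == 8);
  for (int k = 0; k < 8; k++)
    out[first + k] = (signed short) x[first + k] * y[first + k];
}
```

For instance, 127 * 127 = 16129 starts the first half and 119 * 111 = 13209 starts the second; what differs between the two dumps below is only which of LO/HI is used to name each half.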


It is vectorized at -O3, and the optimized GIMPLE dump
for big-endian aarch64 looks like:

;; Function foo (foo, funcdef_no=0, decl_uid=2564, symbol_order=0)

foo ()
{
  vector(8) short int vect_patt_77.9;
  vector(16) signed char vect__54.8;
  vector(16) signed char vect__52.5;
  short int expected_S16[16];
  short int output_S16[16];
  signed char input2_S8[16];
  signed char input1_S8[16];

  <bb 2>:
  MEM[(signed char[16] *)&input1_S8] = { 127, 126, 125, 124, 123, 122,
121, 120, 119, 118, 117, 116, 115, 114, 113, 112 };
  MEM[(signed char[16] *)&input2_S8] = { 127, 125, 123, 121, 119, 117,
115, 113, 111, 109, 107, 105, 103, 101, 99, 97 };
  MEM[(short int *)&expected_S16] = { 16129, 15750, 15375, 15004,
14637, 14274, 13915, 13560 };
  MEM[(short int *)&expected_S16 + 16B] = { 13209, 12862, 12519,
12180, 11845, 11514, 11187, 10864 };
  vect__52.5_2 = MEM[(signed char[16] *)&input1_S8];
  vect__54.8_73 = MEM[(signed char[16] *)&input2_S8];
  vect_patt_77.9_72 = WIDEN_MULT_HI_EXPR <vect__52.5_2, vect__54.8_73>;
  vect_patt_77.9_71 = WIDEN_MULT_LO_EXPR <vect__52.5_2, vect__54.8_73>;
  MEM[(short int *)&output_S16] = vect_patt_77.9_72;
  MEM[(short int *)&output_S16 + 16B] = vect_patt_77.9_71;
  check_vector_S16 (&expected_S16, &output_S16);
  input1_S8 ={v} {CLOBBER};
  input2_S8 ={v} {CLOBBER};
  output_S16 ={v} {CLOBBER};
  expected_S16 ={v} {CLOBBER};
  return 0;

}


while the dump for little-endian aarch64 looks like:

;; Function foo (foo, funcdef_no=0, decl_uid=2564, symbol_order=0)

foo ()
{
  vector(8) short int vect_patt_77.9;
  vector(16) signed char vect__54.8;
  vector(16) signed char vect__52.5;
  short int expected_S16[16];
  short int output_S16[16];
  signed char input2_S8[16];
  signed char input1_S8[16];

  <bb 2>:
  MEM[(signed char[16] *)&input1_S8] = { 127, 126, 125, 124, 123, 122,
121, 120, 119, 118, 117, 116, 115, 114, 113, 112 };
  MEM[(signed char[16] *)&input2_S8] = { 127, 125, 123, 121, 119, 117,
115, 113, 111, 109, 107, 105, 103, 101, 99, 97 };
  MEM[(short int *)&expected_S16] = { 16129, 15750, 15375, 15004,
14637, 14274, 13915, 13560 };
  MEM[(short int *)&expected_S16 + 16B] = { 13209, 12862, 12519,
12180, 11845, 11514, 11187, 10864 };
  vect__52.5_2 = MEM[(signed char[16] *)&input1_S8];
  vect__54.8_73 = MEM[(signed char[16] *)&input2_S8];
  vect_patt_77.9_72 = WIDEN_MULT_LO_EXPR <vect__52.5_2, vect__54.8_73>;
  vect_patt_77.9_71 = WIDEN_MULT_HI_EXPR <vect__52.5_2, vect__54.8_73>;
  MEM[(short int *)&output_S16] = vect_patt_77.9_72;
  MEM[(short int *)&output_S16 + 16B] = vect_patt_77.9_71;
  check_vector_S16 (&expected_S16, &output_S16);
  input1_S8 ={v} {CLOBBER};
  input2_S8 ={v} {CLOBBER};
  output_S16 ={v} {CLOBBER};
  expected_S16 ={v} {CLOBBER};
  return 0;

}


It's clear that WIDEN_MULT_LO_EXPR and WIDEN_MULT_HI_EXPR are swapped
depending on whether the target is big- or little-endian.  The code doing
the swap is in tree-vect-stmts.c:supportable_widening_operation:

  if (BYTES_BIG_ENDIAN && c1 != VEC_WIDEN_MULT_EVEN_EXPR)
    {
      enum tree_code ctmp = c1;
      c1 = c2;
      c2 = ctmp;
    }
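One way to read this swap is modelled below.  The model rests on an assumption the thread does not state: that "low" and "high" name the less and more significant halves of the vector register, so that on big-endian the "high" half holds the elements at the lowest memory addresses.  The enum and both functions are hypothetical stand-ins, not GCC code.  Under that assumption the swap makes c1 always cover memory elements 0..7, which is consistent with both dumps storing the first statement's result at &output_S16:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for VEC_WIDEN_MULT_LO_EXPR / VEC_WIDEN_MULT_HI_EXPR.  */
enum widen_code { WIDEN_MULT_LO, WIDEN_MULT_HI };

/* Assumption: "low"/"high" name the less/more significant halves of the
   register.  On little-endian the low half holds the elements at the
   lowest addresses; on big-endian the high half does.
   Returns the first memory-order element that CODE operates on.  */
int first_element (enum widen_code code, bool big_endian, int nunits)
{
  assert (nunits > 0 && nunits % 2 == 0);
  bool high = (code == WIDEN_MULT_HI);
  if (big_endian)
    return high ? 0 : nunits / 2;
  return high ? nunits / 2 : 0;
}

/* Mirrors the swap quoted above: c1 produces the result stored first
   (at output_S16), c2 the result stored after it.  */
void pick_codes (bool big_endian, enum widen_code *c1, enum widen_code *c2)
{
  *c1 = WIDEN_MULT_LO;
  *c2 = WIDEN_MULT_HI;
  if (big_endian)
    {
      enum widen_code ctmp = *c1;
      *c1 = *c2;
      *c2 = ctmp;
    }
}
```

In this model the endianness knowledge lives entirely in first_element; the RFC's point is that the vectorizer itself should arguably not need the BYTES_BIG_ENDIAN test at all.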


There are some other similar cases in the vectorizer, and all of them look
suspicious since, intuitively, the vectorizer should neither care about
target endianness nor do such a shuffle.  Anyway, this is how we do
vectorization currently.

Sticking to the test case, the following insns are expanded for the
WIDEN_MULT pair on big-endian:

;;WIDEN_MULT_HI_EXPR part
(insn:TI 39 50 16 (set (reg:V8HI 36 v4 [orig:106 vect_patt_77.9 ] [106])
        (mult:V8HI (sign_extend:V8HI (vec_select:V8QI (reg:V16QI 33 v1 [82])
                    (parallel:V16QI [
                            (const_int 8 [0x8])
                            (const_int 9 [0x9])
                            (const_int 10 [0xa])
                            (const_int 11 [0xb])
                            (const_int 12 [0xc])
                            (const_int 13 [0xd])
                            (const_int 14 [0xe])
                            (const_int 15 [0xf])
                        ])))
            (sign_extend:V8HI (vec_select:V8QI (reg:V16QI 32 v0 [87])
  

Re: [RFC]Better support for big-endian targets in GCC vectorizer

2014-05-27 Thread Richard Biener
On Tue, May 27, 2014 at 10:24 AM, Bin.Cheng  wrote:

GCC 4.7.4 Status Report (2014-05-27)

2014-05-27 Thread Richard Biener

Status
======

As GCC 4.9.0 is out, it is time to retire the GCC 4.7 branch.  Following
the established process we will do a last release from the branch and
close it afterwards.  The plan is to do a release candidate for GCC 4.7.4
early next week.

The GCC 4.7 branch remains in regression and documentation fixes only mode
until I announce GCC 4.7.4 RC1 at which point the branch will be frozen.


Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1                1   +   1
P2               88   +   2
P3               44   +  36
--------        ---   -----------------------
Total           133   +  39


Previous Report
===============

https://gcc.gnu.org/ml/gcc/2013-04/msg00121.html


Re: How can I generate a new function at compile time?

2014-05-27 Thread Benedikt Huber
(Sorry for the duplicate.)

I managed to pass the needed parameters to the generated function.
However I cannot pin down the reason why the compilation fails.
It seems that the cfg is somehow broken, but I cannot tell how.
Do you have any debugging hints?

As far as I can tell, the CFG is changed during the generation of the
preheader (I do this to find the entry block easily) and in the function
move_sese_region_to_fn.

I noticed that after pass 058t.copyrename2 the original function bar disappears
and the new function is replaced by _GLOBAL__N_bar.constprop, could this have
anything to do with the problem?

The pass runs just after CFG construction (outline.c.011t.cfg).

/home/bhuber/sandbox/install/bin/gcc -O3 -I /home/bhuber/sandbox/src -c 
-fdump-tree-all-details -fdump-ipa-all-details -fdump-rtl-all-details 
-funinline-innermost-loops -Wall -Wextra /home/bhuber/sandbox/try/outline.c
/home/bhuber/sandbox/try/outline.c: In function '_GLOBAL__N_bar.constprop':
/home/bhuber/sandbox/try/outline.c:3:1: internal compiler error: in 
purge_dead_edges, at cfgrtl.c:3183
bar (int s, int r, unsigned * t, int * k, int * p, int * l)
^
0x67e7c4 purge_dead_edges(basic_block_def*) 
   ../../src/gcc/cfgrtl.c:3183 
0xe5a0d6 find_bb_boundaries
   ../../src/gcc/cfgbuild.c:522
0xe5a0d6 find_many_sub_basic_blocks(simple_bitmap_def*)
   ../../src/gcc/cfgbuild.c:604
0x66c0f5 execute
   ../../src/gcc/cfgexpand.c:5873
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

I attach the transformation pass and the small example program.

Thank you again for the help,
Benedikt

P.s. I am aware that this transformation is not safe in general,
however in this case it should work.


outline.c


uninline-innermost-loops.c


Re: How can I generate a new function at compile time?

2014-05-27 Thread Richard Biener
On Tue, May 27, 2014 at 5:03 PM, Benedikt Huber
 wrote:
> (Sorry for the duplicate.)
>
> I managed to pass the needed parameters to the generated function.
> However I cannot pin down the reason why the compilation fails.
> It seems that the cfg is somehow broken, but I cannot tell how.
> Do you have any debugging hints?
>
> As far as I can tell, the cfg is changed during the generation of the 
> preheader
> (I do this to find the entry block easily.)
> and in the function move_sese_region_to_fn.
>
> I noticed that after pass 058t.copyrename2 the original function bar 
> disappears
> and the new function is replaced by _GLOBAL__N_bar.constprop, could this have
> anything to do with the problem?

Unlikely.  You can disable that by using -fno-ipa-cp.

> The pass runs just after the construction of cfg,  outline.c.011t.cfg.
>
> /home/bhuber/sandbox/install/bin/gcc -O3 -I /home/bhuber/sandbox/src -c 
> -fdump-tree-all-details -fdump-ipa-all-details -fdump-rtl-all-details 
> -funinline-innermost-loops -Wall -Wextra /home/bhuber/sandbox/try/outline.c
> /home/bhuber/sandbox/try/outline.c: In function '_GLOBAL__N_bar.constprop':
> /home/bhuber/sandbox/try/outline.c:3:1: internal compiler error: in 
> purge_dead_edges, at cfgrtl.c:3183

The line doesn't match anything that would ICE on current trunk, but I suppose
it's the single_succ_p assert that triggers?

Either you really got as far as RTL generation, or somehow the cfgrtl CFG hooks
are still active while you are working in your pass.

Richard.



Re: How can I generate a new function at compile time?

2014-05-27 Thread Benedikt Huber

On 27 May 2014, at 17:09, Richard Biener  wrote:

> On Tue, May 27, 2014 at 5:03 PM, Benedikt Huber
>  wrote:
>> (Sorry for the duplicate.)
>> 
>> I managed to pass the needed parameters to the generated function.
>> However I cannot pin down the reason why the compilation fails.
>> It seems that the cfg is somehow broken, but I cannot tell how.
>> Do you have any debugging hints?
>> 
>> As far as I can tell, the cfg is changed during the generation of the 
>> preheader
>> (I do this to find the entry block easily.)
>> and in the function move_sese_region_to_fn.
>> 
>> I noticed that after pass 058t.copyrename2 the original function bar 
>> disappears
>> and the new function is replaced by _GLOBAL__N_bar.constprop, could this have
>> anything to do with the problem?
> 
> Unlikely.  You can disable that by using -fno-ipa-cp.
> 
>> The pass runs just after the construction of cfg,  outline.c.011t.cfg.
>> 
>> /home/bhuber/sandbox/install/bin/gcc -O3 -I /home/bhuber/sandbox/src -c 
>> -fdump-tree-all-details -fdump-ipa-all-details -fdump-rtl-all-details 
>> -funinline-innermost-loops -Wall -Wextra /home/bhuber/sandbox/try/outline.c
>> /home/bhuber/sandbox/try/outline.c: In function '_GLOBAL__N_bar.constprop':
>> /home/bhuber/sandbox/try/outline.c:3:1: internal compiler error: in 
>> purge_dead_edges, at cfgrtl.c:3183
> 
> the line doesn't match anything that would ICE on current trunk, but I suppose
> it's the single_succ_p assert that triggers?
Yes, that is right, it is

gcc_assert (single_succ_p (bb));

> 
> Either you really got until RTL generation or somehow cfgrtl cfg hooks are
> still active while you are working in your pass.

According to the dump files, the pass that fails is outline.c.174r.expand,
so it is already generating RTL.
My problem is that there are so many passes in
between that I do not know where to start looking.
Any idea?




Re: How can I generate a new function at compile time?

2014-05-27 Thread Richard Biener
On Tue, May 27, 2014 at 5:17 PM, Benedikt Huber
 wrote:
>
> On 27 May 2014, at 17:09, Richard Biener  wrote:
>
>> On Tue, May 27, 2014 at 5:03 PM, Benedikt Huber
>>  wrote:
>>> (Sorry for the duplicate.)
>>>
>>> I managed to pass the needed parameters to the generated function.
>>> However I cannot pin down the reason why the compilation fails.
>>> It seems that the cfg is somehow broken, but I cannot tell how.
>>> Do you have any debugging hints?
>>>
>>> As far as I can tell, the cfg is changed during the generation of the 
>>> preheader
>>> (I do this to find the entry block easily.)
>>> and in the function move_sese_region_to_fn.
>>>
>>> I noticed that after pass 058t.copyrename2 the original function bar 
>>> disappears
>>> and the new function is replaced by _GLOBAL__N_bar.constprop, could this 
>>> have
>>> anything to do with the problem?
>>
>> Unlikely.  You can disable that by using -fno-ipa-cp.
>>
>>> The pass runs just after the construction of cfg,  outline.c.011t.cfg.
>>>
>>> /home/bhuber/sandbox/install/bin/gcc -O3 -I /home/bhuber/sandbox/src -c 
>>> -fdump-tree-all-details -fdump-ipa-all-details -fdump-rtl-all-details 
>>> -funinline-innermost-loops -Wall -Wextra /home/bhuber/sandbox/try/outline.c
>>> /home/bhuber/sandbox/try/outline.c: In function '_GLOBAL__N_bar.constprop':
>>> /home/bhuber/sandbox/try/outline.c:3:1: internal compiler error: in 
>>> purge_dead_edges, at cfgrtl.c:3183
>>
>> the line doesn't match anything that would ICE on current trunk, but I 
>> suppose
>> it's the single_succ_p assert that triggers?
> Yes, that is right, it is
>
> gcc_assert (single_succ_p (bb));
>
>>
>> Either you really got until RTL generation or somehow cfgrtl cfg hooks are
>> still active while you are working in your pass.
>
> The pass that fails, according to the dump files is outline.c.174r.expand
> So it already tries to generate RTL.
> My problem is that there are so many passes in
> between, that I do not know where to start looking.
> Any idea?

What code base are you developing on?  Do you build with checking
enabled (--enable-checking, the default on trunk but not on release branches)?

Richard.



Re: How can I generate a new function at compile time?

2014-05-27 Thread Benedikt Huber

On 27 May 2014, at 17:25, Richard Biener  wrote:

> On Tue, May 27, 2014 at 5:17 PM, Benedikt Huber
>  wrote:
>> 
>> On 27 May 2014, at 17:09, Richard Biener  wrote:
>> 
>>> On Tue, May 27, 2014 at 5:03 PM, Benedikt Huber
>>>  wrote:
>>>> (Sorry for the duplicate.)
>>>> 
>>>> I managed to pass the needed parameters to the generated function.
>>>> However I cannot pin down the reason why the compilation fails.
>>>> It seems that the cfg is somehow broken, but I cannot tell how.
>>>> Do you have any debugging hints?
>>>> 
>>>> As far as I can tell, the cfg is changed during the generation of the 
>>>> preheader
>>>> (I do this to find the entry block easily.)
>>>> and in the function move_sese_region_to_fn.
>>>> 
>>>> I noticed that after pass 058t.copyrename2 the original function bar 
>>>> disappears
>>>> and the new function is replaced by _GLOBAL__N_bar.constprop, could this 
>>>> have
>>>> anything to do with the problem?
>>> 
>>> Unlikely.  You can disable that by using -fno-ipa-cp.
>>> 
>>>> The pass runs just after the construction of cfg,  outline.c.011t.cfg.
>>>> 
>>>> /home/bhuber/sandbox/install/bin/gcc -O3 -I /home/bhuber/sandbox/src -c 
>>>> -fdump-tree-all-details -fdump-ipa-all-details -fdump-rtl-all-details 
>>>> -funinline-innermost-loops -Wall -Wextra /home/bhuber/sandbox/try/outline.c
>>>> /home/bhuber/sandbox/try/outline.c: In function '_GLOBAL__N_bar.constprop':
>>>> /home/bhuber/sandbox/try/outline.c:3:1: internal compiler error: in 
>>>> purge_dead_edges, at cfgrtl.c:3183
>>> 
>>> the line doesn't match anything that would ICE on current trunk, but I 
>>> suppose
>>> it's the single_succ_p assert that triggers?
>> Yes, that is right, it is
>> 
>> gcc_assert (single_succ_p (bb));
>> 
>>> 
>>> Either you really got until RTL generation or somehow cfgrtl cfg hooks are
>>> still active while you are working in your pass.
>> 
>> The pass that fails, according to the dump files is outline.c.174r.expand
>> So it already tries to generate RTL.
>> My problem is that there are so many passes in
>> between, that I do not know where to start looking.
>> Any idea?
> 
> What code-base are you developing on?  Do you build with checking
> enabled (--enable-checking, the default on trunk but not on release branches).

It is a Linaro branch, but I am going to port the pass to FSF trunk and see
whether the behaviour changes.

Best regards,
Benedikt




Re: Darwin bootstrap failure following wide int merge (was: we are starting the wide int merge)

2014-05-27 Thread Mike Stump
Ping?

Or, I can ask, any objections?  In https://gcc.gnu.org/PR61146 it is stated 
that GMP removed the casts in 2005.

On May 26, 2014, at 4:26 AM, FX  wrote:
>> So changing just 2 of them doesn't feel right to me…
> 
> [Again, with the patch actually attached… sorry]
> 
> Here’s a patch that removes all the casts on output operands in x86/x86_64 
> code in longlong.h. Again bootstrapped on x86_64-apple-darwin13, passing both 
> stage1 (system compiler) and stages 2-3 (gcc). OK to commit?
> 
> Other archs which have such code are arc, arm, hppa, m32r, mc68000, mc68020, 
> ns32000, ibm032, sparc, and vax.
> Since I don’t have any of those, I can’t test it there. If you or 
> another global reviewer indicate that a patch extending the attached work to 
> these other archs is suitable, I’m willing to do the tedious work of 
> proposing a full patch, but I won’t be able to test it (and I didn’t want to 
> do it if it had no chance of being accepted).
> 
> Thanks,
> FX



longlong.diff


longlong.ChangeLog


Re: Darwin bootstrap failure following wide int merge (was: we are starting the wide int merge)

2014-05-27 Thread FX
> Or, I can ask, any objections?  In https://gcc.gnu.org/PR61146 it is stated 
> that GMP removed the casts in 2005.

Among the many, many versions of longlong.h that one can find around the web, 
many don’t have these casts, including GMP and coreutils 
(http://code.metager.de/source/xref/gnu/coreutils/src/longlong.h).

FX

Re: Darwin bootstrap failure following wide int merge (was: we are starting the wide int merge)

2014-05-27 Thread Jakub Jelinek
On Mon, May 26, 2014 at 08:36:31AM -0700, Mike Stump wrote:
> On May 26, 2014, at 2:22 AM, FX  wrote:
> >> This causes GCC bootstrap to fail on Darwin systems (whose system compiler 
> >> is clang-based). Since PR 61146 was resolved as INVALID (but I’m not sure 
> >> it’s the right call, see below), I’ve filed a separate report for the 
> >> bootstrap issue (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61315).
> > 
> > Since my PR has been closed twice by Andrew Pinski (“it’s clang’s fault, 
> > bouh ouh”), I’d ask the maintainers to step in. Can we please provide a GCC 
> > that works for the default darwin setup? Or at least drop darwin as 
> > secondary target and document the failure?
> 
> The best course of action: post a patch, have it reviewed and put in. 
> Current status: a patch has been posted and the review is outstanding. I’d
> like to see it put in, though I am curious why the casts were there in
> the first place.

Note, I haven't added them there, but from what I can test, the casts
can serve as a compile-time check that the right type is used, e.g.
unsigned long i;

void
foo (void)
{
  asm volatile ("# %0 %1" : "=r" ((unsigned long long) i)
                          : "0" ((unsigned long long) 0));
}

errors out on x86_64 -m32 but compiles fine with -m64, because in the
latter case the type has the correct size, while in the former case it
doesn't.  So, perhaps instead of removing the casts we should replace them
with some kind of static assertion, whether
  extern char foobar[sizeof (arg) == sizeof (UDItype) && __builtin_classify_type (arg) == 1 ? 1 : -1];
or __builtin_types_compatible_p, or C++ templates for C++, ...

Jakub
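Jakub's suggestion of replacing the output-operand casts with a static assertion can be sketched as follows.  This swaps his negative-array-size trick for C11 _Static_assert and checks only the size (not the __builtin_classify_type integer test).  UDItype is a local stand-in typedef and CHECK_UDItype_SIZE and copy_checked are hypothetical names; the actual longlong.h macros are not shown in the thread:

```c
#include <assert.h>

/* Local stand-in for the 64-bit type longlong.h calls UDItype.  */
typedef unsigned long long UDItype;

/* Hypothetical macro: instead of casting an asm output operand to
   UDItype, reject at compile time any operand whose size differs from
   UDItype.  As with the cast, `unsigned long' passes on LP64 (-m64)
   and is rejected on ILP32 (-m32).  */
#define CHECK_UDItype_SIZE(x)                                  \
  _Static_assert (sizeof (x) == sizeof (UDItype),              \
                  "operand " #x " must have the size of UDItype")

UDItype copy_checked (UDItype v)
{
  UDItype out;
  CHECK_UDItype_SIZE (out);   /* compiles: out has type UDItype */
  out = v;
  return out;
}
```

Using __builtin_types_compatible_p (__typeof__ (x), UDItype) instead would be stricter: it rejects unsigned long even on LP64 targets, because unsigned long and unsigned long long are distinct types despite having the same size there.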