how can I write a right V32QI Unpack Low Data insn pattern?

2011-02-27 Thread Liu
Hi all,
I wrote a V16HI-mode Unpack Low Data insn pattern and it works. The V8SI
and V4DI versions work, too.
But the V32QI-mode Unpack Low Data insn pattern fails with errors like:
../../gcc-4.5.1/gcc/config/mips/hr.md:509: error: expected identifier
or ‘(’ before ‘goto’
../../gcc-4.5.1/gcc/config/mips/hr.md:511: error: expected ‘=’, ‘,’,
‘;’, ‘asm’ or ‘__attribute__’ before ‘:’ token
Can anyone tell me what's wrong with my code?

;; V16HI Unpack Low Data
(define_insn "vec_punpcklhw"
  [(set (match_operand:V16HI 0 "register_operand" "=Z")
        (vec_select:V16HI
          (vec_concat:V32HI
            (match_operand:V16HI 1 "register_operand" "Z")
            (match_operand:V16HI 2 "register_operand" "Z"))
          (parallel [(const_int 0) (const_int 16)
                     (const_int 1) (const_int 17)
                     (const_int 2) (const_int 18)
                     (const_int 3) (const_int 19)
                     (const_int 4) (const_int 20)
                     (const_int 5) (const_int 21)
                     (const_int 6) (const_int 22)
                     (const_int 7) (const_int 23)])))]
  "TARGET_VECTORS"
  "vpunpcklhw\t%0,%1,%2"
  [(set_attr "type" "fadd")])

;; V32QI Unpack Low Data
(define_insn "vec_punpcklbh"
  [(set (match_operand:V32QI 0 "register_operand" "=Z")
        (vec_select:V32QI
          (vec_concat:V64QI
            (match_operand:V32QI 1 "register_operand" "Z")
            (match_operand:V32QI 2 "register_operand" "Z"))
          (parallel [(const_int 0) (const_int 32)
                     (const_int 1) (const_int 33)
                     (const_int 2) (const_int 34)
                     (const_int 3) (const_int 35)
                     (const_int 4) (const_int 36)
                     (const_int 5) (const_int 37)
                     (const_int 6) (const_int 38)
                     (const_int 7) (const_int 39)
                     (const_int 8) (const_int 40)
                     (const_int 9) (const_int 41)
                     (const_int 10) (const_int 42)
                     (const_int 11) (const_int 43)
                     (const_int 12) (const_int 44)
                     (const_int 13) (const_int 45)
                     (const_int 14) (const_int 46)
                     (const_int 15) (const_int 47)])))]
  "TARGET_VECTORS"
  "vpunpcklbh\t%0,%1,%2"
  [(set_attr "type" "fadd")])


Re: how can I write a right V32QI Unpack Low Data insn pattern?

2011-03-01 Thread Liu
On Tue, Mar 1, 2011 at 7:29 AM, Ian Lance Taylor  wrote:
> Liu  writes:
>
>> I write a v16hi mode Unpack Low Data insn pattern and it is OK. v8si
>> and v4di modes are OK, too.
>> But the v32qi mode Unpack Low Data insn pattern get error like:
>> ../../gcc-4.5.1/gcc/config/mips/hr.md:509: error: expected identifier
>> or '(' before 'goto'
>> ../../gcc-4.5.1/gcc/config/mips/hr.md:511: error: expected '=', ',',
>> ';', 'asm' or '__attribute__' before ':' token
>> anyone will tell me what's wrong with my code?
>
> Looks like something in a .h file has #define'd something that your .md
> file is using.  I can't tell what it is from the code fragment here.  If
> it's not obvious, you are going to have to look at the generated file,
> and possibly even the preprocessed version of the generated file.
>
> Ian
>

Hi Ian, thank you for the reply.

I didn't find anything in a .h file that #defines something used by my .md file,
but I did find the error in insn-recog.c.
When I delete all the "#line xxx" directives in insn-recog.c and compile it with:
gcc -c  -g -O0 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE  -W -Wall
-Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes
-Wmissing-format-attribute -pedantic -Wno-long-long
-Wno-variadic-macros -Wno-overlength-strings -Wold-style-definition
-Wc++-compat   -DHAVE_CONFIG_H -I. -I. -I../../gcc-4.5.1/gcc
-I../../gcc-4.5.1/gcc/. -I../../gcc-4.5.1/gcc/../include
-I../../gcc-4.5.1/gcc/../libcpp/include -I/cross-tools/include
-I/cross-tools/include  -I../../gcc-4.5.1/gcc/../libdecnumber
-I../../gcc-4.5.1/gcc/../libdecnumber/dpd -I../libdecnumber
-I/cross-tools/include  -I/cross-tools/include -DCLOOG_PPL_BACKEND
insn-recog.c -o insn-recog.o

I get a lot of errors like this:
insn-recog.c: In function 'recog_21':
insn-recog.c:24754: error: expected expression before '{' token
insn-recog.c:24760: error: expected expression before '|' token
insn-recog.c:24766: error: expected expression before '}' token
insn-recog.c:24766: error: expected ')' before '}' token
insn-recog.c:24766: error: expected ')' before '}' token
insn-recog.c:24766: error: expected ';' before '}' token
insn-recog.c:24108: error: label 'ret0' used but not defined
insn-recog.c:24104: error: label 'L9397' used but not defined
insn-recog.c: At top level:
insn-recog.c:24766: error: expected identifier or '(' before ']' token
insn-recog.c:24767: error: expected identifier or '(' before 'if'
insn-recog.c:24769: error: expected identifier or '(' before 'goto'
insn-recog.c:24771: error: expected '=', ',', ';', 'asm' or
'__attribute__' before ':' token
insn-recog.c:24773: error: expected identifier or '(' before 'if'
insn-recog.c:24775: error: expected identifier or '(' before 'goto'
insn-recog.c:24777: error: expected '=', ',', ';', 'asm' or
'__attribute__' before ':' token
insn-recog.c:24778: error: stray '\177' in program
insn-recog.c:24779: error: expected identifier or '(' before 'if'
insn-recog.c:24781: error: expected identifier or '(' before 'goto'
insn-recog.c:24783: error: expected '=', ',', ';', 'asm' or
'__attribute__' before ':' token
insn-recog.c:24784: error: stray '\200' in program
insn-recog.c:24785: error: expected identifier or '(' before 'if'
insn-recog.c:24791: error: expected identifier or '(' before 'goto'
insn-recog.c:24793: error: expected '=', ',', ';', 'asm' or
'__attribute__' before ':' token
insn-recog.c:24795: error: expected identifier or '(' before 'if'
insn-recog.c:24797: error: expected identifier or '(' before 'goto'
insn-recog.c:24799: error: expected '=', ',', ';', 'asm' or
'__attribute__' before ':' token
insn-recog.c:24809: error: expected identifier or '(' before 'goto'
insn-recog.c:24811: error: expected '=', ',', ';', 'asm' or
'__attribute__' before ':' token
insn-recog.c:24813: error: expected identifier or '(' before 'if'
insn-recog.c:24818: error: expected identifier or '(' before 'goto'
insn-recog.c:24820: error: expected '=', ',', ';', 'asm' or
'__attribute__' before ':' token
insn-recog.c:24822: error

Re: how can I write a right V32QI Unpack Low Data insn pattern?

2011-03-02 Thread Liu
On Wed, Mar 2, 2011 at 11:14 PM, Ian Lance Taylor  wrote:
> Dave Korn  writes:
>
>> On 02/03/2011 07:56, Liu wrote:
>>
>>> The wrong code is :
>>>  L9284: ATTRIBUTE_UNUSED_LABEL
>>>   x3 = XEXP (x2, {);
>>>   if (x3 == const_int_rtx[MAX_SAVED_CONST_INT + (13)])
>>>     goto L9285;
>>>   goto ret0;
>>
>>   Well, that's coming from here:
>>
>>       else
>>       printf ("%sx%d = XEXP (x%d, %c);\n",
>>               indent, depth + 1, depth, newpos[depth]);
>>       ++depth;
>
> Interesting.  Looks you have a define_insn which has too many entries.
> It can only have 26 elements, but, annoyingly, genrecog doesn't check
> for that.
>
> It's a bit odd to have more than 26 elements.  Do you have any
> incredibly large define_insn patterns?
>
Yes, I have some 80-line define_insn patterns. Are they incredibly large?

>> in the MATCH_OPERAND case of the big switch in add_to_sequence().  Possibly
>> change_state needs to do something like
>>
>>       printf ("%sx%d = XEXP (x%d, %d);\n",
>>               indent, depth + 1, depth,
>>               newpos[depth] > 'a'
>>                 ? newpos[depth] - 'a'
>>                 : newpos[depth] - '0');
>
> No, change_state uses upper case and lower case letters to mean
> different things.  In this case it's meant to be a lower case letter but
> it has overrun.
>
> This patch should at least cause genrecog to crash for you rather than
> generating bogus output.  I've verified that this patch bootstraps on
> x86_64 and makes no difference in the generated insn-recog.c.  Can you
> see whether this gives you a crash?  Any opinion on whether I should
> commit this to mainline?
>
> Ian
>
>
I tried your patch, but I still get the same error.



Dave Korn, thank you all the same.
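
For readers following the diagnosis: the odd characters in the generated
file are consistent with a position encoding that uses one lower-case
letter per vector element and has run past 'z'. The sketch below is not
genrecog code. position_letter and position_fits are hypothetical helpers
that only reproduce the character arithmetic, assuming an ASCII host.

```c
#include <assert.h>

/* Hypothetical helper, not part of genrecog: map a vector element index
   to the character a letter-based position encoding would produce.
   Only indexes 0..25 yield a valid lower-case letter.  */
static unsigned char
position_letter (int index)
{
  assert (index >= 0);
  return (unsigned char) ('a' + index);
}

/* Return nonzero when INDEX still fits in the range 'a'..'z'.  */
static int
position_fits (int index)
{
  return index >= 0 && index < 26;
}
```

With ASCII, index 26 gives '{', 27 gives '|', 28 gives '}', 30 gives
octal 177 and 31 gives octal 200. That matches the "XEXP (x2, {)" line
and the stray '\177' and '\200' errors reported above. The 32-element
parallel in the V32QI pattern needs indexes 0..31.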


Re: how can I write a right V32QI Unpack Low Data insn pattern?

2011-03-02 Thread Liu
On Thu, Mar 3, 2011 at 11:09 AM, Ian Lance Taylor  wrote:
> Liu  writes:
>
>>> It's a bit odd to have more than 26 elements.  Do you have any
>>> incredibly large define_insn patterns?
>>>
>> Yes, I have some 80 lines define_insn patterns, are they incredibly large?
>
> An 80 line pattern is incredibly large, yes.  The size of the overall
> define_insn doesn't matter, just the size of the pattern.
>
>> I try your patch, but it get the same error still.
>
> Bother.  Are you sure that genrecog ran again?  Can you send us an
> example of a very large define_insn pattern?
>
> Ian
>

I'm not sure what "the size of the pattern" means; I think it is the
number of "const_int" elements.

This is a HADD insn pattern:
(define_insn "xx_vphaddv16hi"
  [(set (match_operand:V16HI 0 "register_operand" "=Z")
        (vec_concat:V16HI
          (vec_concat:V8HI
            (vec_concat:V4HI
              (vec_concat:V2HI
                (plus:HI
                  (vec_select:HI
                    (match_operand:V16HI 1 "register_operand" "Z")
                    (parallel [(const_int 0)]))
                  (vec_select:HI (match_dup 1) (parallel [(const_int 1)])))
                (plus:HI
                  (vec_select:HI (match_dup 1) (parallel [(const_int 2)]))
                  (vec_select:HI (match_dup 1) (parallel [(const_int 3)]))))
              (vec_concat:V2HI
                (plus:HI
                  (vec_select:HI (match_dup 1) (parallel [(const_int 4)]))
                  (vec_select:HI (match_dup 1) (parallel [(const_int 5)])))
                (plus:HI
                  (vec_select:HI (match_dup 1) (parallel [(const_int 6)]))
                  (vec_select:HI (match_dup 1) (parallel [(const_int 7)])))))
            (vec_concat:V4HI
              (vec_concat:V2HI
                (plus:HI
                  (vec_select:HI (match_dup 1) (parallel [(const_int 8)]))
                  (vec_select:HI (match_dup 1) (parallel [(const_int 9)])))
                (plus:HI
                  (vec_select:HI (match_dup 1) (parallel [(const_int 10)]))
                  (vec_select:HI (match_dup 1) (parallel [(const_int 11)]))))
              (vec_concat:V2HI
                (plus:HI
                  (vec_select:HI (match_dup 1) (parallel [(const_int 12)]))
                  (vec_select:HI (match_dup 1) (parallel [(const_int 13)])))
                (plus:HI
                  (vec_select:HI (match_dup 1) (parallel [(const_int 14)]))
                  (vec_select:HI (match_dup 1) (parallel [(const_int 15)]))))))
          (vec_concat:V8HI
            (vec_concat:V4HI
              (vec_concat:V2HI
                (plus:HI
                  (vec_select:HI
                    (match_operand:V16HI 2 "register_operand" "Z")
                    (parallel [(const_int 0)]))
                  (vec_select:HI (match_dup 2) (parallel [(const_int 1)])))
                (plus:HI
                  (vec_select:HI (match_dup 2) (parallel [(const_int 2)]))
                  (vec_select:HI (match_dup 2) (parallel [(const_int 3)]))))
              (vec_concat:V2HI
                (plus:HI
                  (vec_select:HI (match_dup 2) (parallel [(const_int 4)]))
                  (vec_select:HI (match_dup 2) (parallel [(const_int 5)])))
                (plus:HI
                  (vec_select:HI (match_dup 2) (parallel [(const_int 6)]))
                  (vec_select:HI (match_dup 2) (parallel [(const_int 7)])))))
            (vec_concat:V4HI
              (vec_concat:V2HI
                (plus:HI
                  (vec_select:HI (match_dup 2) (parallel [(const_int 8)]))
                  (vec_select:HI (match_dup 2) (parallel [(const_int 9)])))
                (plus:HI
                  (vec_select:HI (match_dup 2) (parallel [(const_int 10)]))
                  (vec_select:HI (match_dup 2) (parallel [(const_int 11)]))))
              (vec_concat:V2HI
                (plus:HI
                  (vec_select:HI (match_dup 2) (parallel [(const_int 12)]))
                  (vec_select:HI (match_dup 2) (parallel [(const_int 13)])))
                (plus:HI
                  (vec_select:HI (match_dup 2) (parallel [(const_int 14)]))
                  (vec_select:HI (match_dup 2) (parallel [(const_int 15)]))))))))]
  "TARGET_XX_VECTORS"
  "vphaddh\t%0,%1,%2"
  [(set_attr "type" "vadd")])

And this is a MADD insn pattern:
(define_insn "xx_vpmaddubsh"
  [(set (match_operand:V16HI 0 "register_operand" "=Z")
(ss_plus:V16HI
  (mult:V16HI
(zero_extend:V16HI
  (vec_select:V8QI
(match_operand:V32QI 1 "register_operand" "Z")
(parallel [(const_int 0)
   (const_int 2)
   (const_int 4)
   (const_int 6)
   (const_int 8)
   (const_int 10)
  

Re: how can I write a right V32QI Unpack Low Data insn pattern?

2011-03-03 Thread Liu
On Thu, Mar 3, 2011 at 11:09 AM, Ian Lance Taylor  wrote:
> Liu  writes:
>
>>> It's a bit odd to have more than 26 elements.  Do you have any
>>> incredibly large define_insn patterns?
>>>
>> Yes, I have some 80 lines define_insn patterns, are they incredibly large?
>
> An 80 line pattern is incredibly large, yes.  The size of the overall
> define_insn doesn't matter, just the size of the pattern.
>
>> I try your patch, but it get the same error still.
>
> Bother.  Are you sure that genrecog ran again?  Can you send us an
> example of a very large define_insn pattern?
>
> Ian
>

Oh, I get it! Your patch did it! Thank you, Ian!


how can I split 1 mov insn into 2 sub_mov and 1 combine?

2011-03-16 Thread Liu
Hi all,
Our processor has an outrageous load insn, and I have to make gcc
support it. I have tried several approaches, but they all failed.
Where we would like a single load such as:
load_256  $z1, 16($fp)  ; load 256 bits into a 256-bit-wide register
we have to split it into:
load_low_128  $z1, 16($fp)  ; load 128 bits into the low 128 bits of a 256-bit-wide register
load_low_128  $z2, 32($fp)  ; load 128 bits into the low 128 bits of a 256-bit-wide register
combine_2_to_1  $z1, $z1, $z2, 0x20  ; combine them together

In mips_output_move, I can return a string such as "load_256  %0, %1",
but I can't return "load_low_128  %0, %1\n
 load_low_128  %2, %3\n
 combine_2_to_1  %0, %0, %2, 0x20"
where %3 is %1 plus a 16-byte offset, %0 and %2 are 256-bit-wide registers,
and 0x20 is a constant.

In define_insn "mov_internal", I can't use emit_insn (gen_xxx ()).

When I use emit_insn (gen_xxx ()) in define_expand "mov", I get an
error like:
root@localhost:~/# mips64el-unknown-linux-gnu-gcc -S -march=xx xx-simd.c
xx-simd.c: In function 'test_vpaddd_u':
xx-simd.c:33:1: error: unrecognizable insn:
(insn 31 30 32 3 hr-simd.c:28 (set (mem/c/i:V4DI (reg/f:DI 253
virtual-stack-vars) [0 s+0 S32 A256])
(reg:V4DI 257 [ D.5235 ])) -1 (nil))
xx-simd.c:33:1: internal compiler error: in extract_insn, at recog.c:2103
Please submit a full bug report,

Please point me in the right direction. Thank you very much!

-Liu
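
To make the intended data movement concrete, here is a toy model in plain
C of the three-instruction sequence described above. It is only a sketch:
the reg256 type and the function bodies are illustrative, and the meaning
of the 0x20 selector (low half of the first source to the low half, low
half of the second source to the high half) is an assumption, not
something stated in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a 256-bit register as 32 bytes; byte 0 is the lowest.  */
typedef struct { uint8_t byte[32]; } reg256;

/* Model of load_low_128: fill the low 16 bytes, leave the high half alone.  */
static void
load_low_128 (reg256 *dst, const uint8_t *mem)
{
  memcpy (dst->byte, mem, 16);
}

/* Model of combine_2_to_1 with selector 0x20, ASSUMED to mean: low half
   of A goes to the low half, low half of B goes to the high half.  */
static void
combine_2_to_1 (reg256 *dst, const reg256 *a, const reg256 *b)
{
  reg256 tmp;
  memcpy (tmp.byte, a->byte, 16);
  memcpy (tmp.byte + 16, b->byte, 16);
  *dst = tmp;   /* copy last, so DST may alias A or B */
}

/* The three-instruction sequence standing in for a single load_256.  */
static void
load_256 (reg256 *z1, const uint8_t *mem)
{
  reg256 z2;
  assert (z1 != NULL && mem != NULL);
  memset (&z2, 0, sizeof z2);
  load_low_128 (z1, mem);          /* memory bytes 0..15  */
  load_low_128 (&z2, mem + 16);    /* memory bytes 16..31 */
  combine_2_to_1 (z1, z1, &z2);
}
```

Note that the sequence needs a second 256-bit register ($z2 in the
assembly above) in addition to the destination.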


Re: how can I split 1 mov insn into 2 sub_mov and 1 combine?

2011-03-29 Thread Liu
On Thu, Mar 17, 2011 at 2:11 PM, WANG.Jiong  wrote:
> define_split should be the correct way to handle this.
> You should first use define_split to break up your 256-bit pattern and generate
> a legitimized 128-bit rtl pattern sequence for your processor.
> mips_output_move should only be used to handle the legitimized ones.
>
>  256-bit rtl pattern  --->  define_split
>                                  |
>                                  V
>                             rtl pattern 1  --->  mips_output_move ()  --->  corresponding instruction
>                             rtl pattern 2  ...
>                             rtl pattern 3  ...
>
> And I have doubts about your pattern:
>
> (insn 31 30 32 3 hr-simd.c:28 (set (mem/c/i:V4DI (reg/f:DI 253
> virtual-stack-vars) [0 s+0 S32 A256])
>        (reg:V4DI 257 [ D.5235 ])) -1 (nil))
>
>
> You set the memory from a register?
>
>
>
> On 03/17/2011 12:59 PM, Liu wrote:
>>
>> hi all
>> Our processor have a outrageous load insn, so I have to make gcc
>> support it. But when I tried some way, I failed.
>> When we suppose a load should be:
>> load_256      $z1, 16($fp)          ;load 256bits to a 256bits-wide
>> register.
>> we have to split it into:
>> load_low_128      $z1, 16($fp)          ;load 128bits to the low
>> 128bits of a 256bits-wide register.
>> load_low_128      $z2, 32($fp)          ;load 128bits to the low
>> 128bits of a 256bits-wide register.
>> combine_2_to_1      $z1, $z1, $z2, 0x20          ;combine them together.
>>
>> in mips_output_move, I can return a string such like "load_256      %0,
>> %1",
>> but I can't return "load_low_128      %0, %1\n
>>                              load_low_128      %2, %3\n
>>                              combine_2_to_1      %0, %0, %2, 0x20"
>> %3 is %1+16bytes offset, %0 and %2 are 256bits-wide registers, 0x20 is a
>> const.
>>
>> in define_insn "mov_internal", I can't using emit_insn(gen_xxx()).
>>
>> when I using emit_insn(gen_xxx()) in define_expand "mov" I get a
>> error like:
>> root@localhost:~/# mips64el-unknown-linux-gnu-gcc -S -march=xx xx-simd.c
>> xx-simd.c: In function 'test_vpaddd_u':
>> xx-simd.c:33:1: error: unrecognizable insn:
>> (insn 31 30 32 3 hr-simd.c:28 (set (mem/c/i:V4DI (reg/f:DI 253
>> virtual-stack-vars) [0 s+0 S32 A256])
>>         (reg:V4DI 257 [ D.5235 ])) -1 (nil))
>> xx-simd.c:33:1: internal compiler error: in extract_insn, at recog.c:2103
>> Please submit a full bug report,
>>
>> Please show me a path, thank you very much!
>>
>> -Liu
>>
>
>

Sorry, WANG,
another toolchain took me several days.

I split the 256-bit move like this:
(define_split
  [(set (match_operand:ZDSOTDWHB 0 "nonimmediate_operand")
(match_operand:ZDSOTDWHB 1 "move_operand"))]
  "TARGET_1 && reload_completed"
  [(const_int 0)]
{
  mips_split_octupleword_move (operands[0], operands[1]);
  DONE;
})

mips_split_octupleword_move is:
void
mips_split_octupleword_move (rtx dest, rtx src)
{
  rtx addr, tmp_reg, imm8;
  enum machine_mode mode;
  enum rtx_code dest_code, src_code;

  dest_code = GET_CODE (dest);
  src_code = GET_CODE (src);
  mode = GET_MODE (dest);

  if (dest_code == REG && ZZ_REG_P (REGNO(dest)) && src_code == MEM)
    {
      mips_emit_move (dest, src);
      addr = plus_constant (src, 16);
      if (GET_MODE (dest) == V32QImode)
        tmp_reg = gen_reg_rtx (V32QImode);
      else if (GET_MODE (dest) == V16HImode)
        tmp_reg = gen_reg_rtx (V16HImode);
      else if (GET_MODE (dest) == V8SImode)
        tmp_reg = gen_reg_rtx (V8SImode);
      else if (GET_MODE (dest) == V4DImode)
        tmp_reg = gen_reg_rtx (V4DImode);
      else if (GET_MODE (dest) == V2TImode)
        tmp_reg = gen_reg_rtx (V2TImode);
      else if (GET_MODE (dest) == OImode)
        tmp_reg = gen_reg_rtx (OImode);
      else if (GET_MODE (dest) == V8SFmode)
        tmp_reg = gen_reg_rtx (V8SFmode);
      else if (GET_MODE (dest) == V4DFmode)
        tmp_reg = gen_reg_rtx (V4DFmode);
      else
        printf("Can't alloc Pseudo-Register for vload. \n");
      mips_emit_move (tmp_reg, addr);
      imm8 = GEN_INT(0x20);
      emit_insn (gen_vpermutqi (dest, dest, tmp_reg, imm8));
    }
  else if (src_code == REG && ZZ_REG_P (REGNO(src)) && dest_code == MEM)
    {
      mips_emit_move (dest, src);
      addr = plus_constant (src, 16);
      if (GET_MODE (dest) == V32QImode)
        tmp_reg = gen_reg_rtx (V32QImode);
      else if (GET_MODE (dest) == V16HImode)
        tmp_reg = gen_r

Re: how can I split 1 mov insn into 2 sub_mov and 1 combine?

2011-03-30 Thread Liu
On Wed, Mar 30, 2011 at 11:45 PM, Ian Lance Taylor  wrote:
> Liu  writes:
>
>>       if (GET_MODE (dest) == V32QImode)
>>         tmp_reg = gen_reg_rtx (V32QImode);
>
>> vpaddd.c:33:1: internal compiler error: in gen_reg_rtx, at emit-rtl.c:863
>> Please submit a full bug report,
>> with preprocessed source if appropriate.
>> See <http://gcc.gnu.org/bugs.html> for instructions.
>>
>> emit-rtl.c:863 is  gcc_assert (can_create_pseudo_p ());
>
> You can only call gen_reg_rtx if can_create_pseudo_p returns true.  It
> will return false during and after register allocation.  Your code is
> being called at that time somehow, probably during reload.  You have to
> either ensure that that does not happen, or, more likely, you have to
> arrange to use an existing register rather than create a new one.  For
> example, look for uses of can_create_pseudo_p in mips.c.
>
> Ian
>

Thank you very much, Ian!

Would a SCRATCH be OK?

And I will look into can_create_pseudo_p().

--Liu
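
The rule Ian describes can be illustrated with a small stand-alone model.
This is not GCC source. The two flags and pick_temp_reg are hypothetical
stand-ins, and can_create_pseudo is only assumed to mirror the intent of
can_create_pseudo_p (no new pseudo registers once register allocation has
started).

```c
#include <assert.h>

/* Stand-ins for GCC's pass-state flags; in this toy they are plain ints.  */
static int reload_in_progress;
static int reload_completed;

/* Assumed to mirror the intent of can_create_pseudo_p: new pseudo
   registers are only allowed before register allocation starts.  */
static int
can_create_pseudo (void)
{
  return !reload_in_progress && !reload_completed;
}

static int next_pseudo = 1000;   /* hypothetical first free pseudo number */

/* Pick a temporary: a fresh pseudo when that is allowed, otherwise the
   register the caller already owns (for example the destination).  */
static int
pick_temp_reg (int existing_reg)
{
  assert (existing_reg >= 0);
  return can_create_pseudo () ? next_pseudo++ : existing_reg;
}
```

The define_split shown earlier in this thread is conditioned on
reload_completed, so under this model its helper would always take the
second branch and would have to reuse an existing register.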


Re: how can I split 1 mov insn into 2 sub_mov and 1 combine?

2011-03-31 Thread Liu
On Thu, Mar 31, 2011 at 2:57 PM, Ian Lance Taylor  wrote:
> Liu  writes:
>
>> On Wed, Mar 30, 2011 at 11:45 PM, Ian Lance Taylor  wrote:
>>> Liu  writes:
>>>
>>>>       if (GET_MODE (dest) == V32QImode)
>>>>         tmp_reg = gen_reg_rtx (V32QImode);
>>>
>>>> vpaddd.c:33:1: internal compiler error: in gen_reg_rtx, at emit-rtl.c:863
>>>> Please submit a full bug report,
>>>> with preprocessed source if appropriate.
>>>> See <http://gcc.gnu.org/bugs.html> for instructions.
>>>>
>>>> emit-rtl.c:863 is  gcc_assert (can_create_pseudo_p ());
>>>
>>> You can only call gen_reg_rtx if can_create_pseudo_p returns true.  It
>>> will return false during and after register allocation.  Your code is
>>> being called at that time somehow, probably during reload.  You have to
>>> either ensure that that does not happen, or, more likely, you have to
>>> arrange to use an existing register rather than create a new one.  For
>>> example, look for uses of can_create_pseudo_p in mips.c.
>>>
>>> Ian
>>>
>>
>> Thank you very much Ian!
>>
>> Does SCRATCH will be OK?
>
> Not directly, no.  You can use match_scratch as part of a secondary
> reload, though.  I don't know whether you need a secondary reload here
> or not.
>
> Ian
>

Thanks.
I'll look into can_create_pseudo_p, and maybe reload.c; it is really complex.


What is the type of imm16 in builtin-func?

2011-04-25 Thread Liu
Hi all,

I wrote a pattern like this:
(define_insn "extrv4di"
  [(set (match_operand:V4DI 0 "register_operand" "=Z")
        (unspec:V4DI
          [(match_operand:V4DI 1 "register_operand" "Z")
           (match_operand:SI 2 "immediate_operand" "")]
          UNSPEC_EXTR))]
  "TARGET_VECTORS"
  "extrd\t%0,%1,%2"
  [(set_attr "type" "vadd")])

and this code in mips.c:
#define CODE_FOR_extrd CODE_FOR_extrv4di
  XX_BUILTIN (extrd, MIPS_V4DI_FTYPE_V4DI_INT),

I defined a constant in mips.md:
   (UNSPEC_EXTR 821)

and this in xx.h:
__extension__ static __inline int64x4_t __attribute__ ((__always_inline__))
extrd (int64x4_t s, const int i)
{
  return __builtin_extrd (s, i);
}

When I compile a testcase like:
int64x4_t vec_vpextrd (int64x4_t s, const int t)
{
  int64x4_t r;
  r = vpextrd (s, t);
  return r;
}

I get an error:
/opt/cross-tools/bin/../lib/gcc/mips64el-unknown-linux-gnu/4.5.1/include/xx.h:1535:31:
error: invalid argument to built-in function

What should I do? What is the type of an imm8/imm16 operand in a builtin function?

Thanks.

--Liu


Re: What is the type of imm16 in builtin-func?

2011-04-25 Thread Liu
On Tue, Apr 26, 2011 at 1:44 AM, Ian Lance Taylor  wrote:
> Liu  writes:
>
>> I get a error:
>> /opt/cross-tools/bin/../lib/gcc/mips64el-unknown-linux-gnu/4.5.1/include/xx.h:1535:31:
>> error: invalid argument to built-in function
>
> That is an error from the MIPS backend.
>
> In this case it seems to mean that the mode for the type int64x4_t does
> not match the mode for the insn in the .md file.  I don't know where
> int64x4_t is defined, so I don't know what its mode is.  It looks like
> you want the mode to be V4DImode; is it?
>
> Ian
>

Thank you, Ian.

Yes, I want V4DImode, and in the xx.md file I wrote match_operand:V4DI
in this pattern.
The type int64x4_t is defined in xx.h:

typedef int64_t int64x4_t __attribute__((vector_size (32)));

This type works in other builtin functions; only the builtin functions
with immediate operands get this error. So I'm confused.

--Liu
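
One possibility, not confirmed anywhere in this thread, is that the
immediate argument is not a compile-time constant when the builtin is
expanded: in the testcase, t is an ordinary function parameter. The
sketch below only illustrates that distinction in portable GCC C.
imm_fits_unsigned and LITERAL_IMM_OK are hypothetical names, and
__builtin_constant_p is a GCC extension.

```c
#include <assert.h>

/* Hypothetical helper: does VALUE fit in an unsigned immediate field of
   BITS bits (for example 8 for imm8, 16 for imm16)?  */
static int
imm_fits_unsigned (long value, unsigned bits)
{
  assert (bits > 0 && bits < 32);
  return value >= 0 && (unsigned long) value < (1UL << bits);
}

/* Nonzero only when X is a compile-time constant at the point of use
   and fits in BITS bits.  */
#define LITERAL_IMM_OK(x, bits) \
  (__builtin_constant_p (x) && imm_fits_unsigned ((x), (bits)))
```

Passing a literal such as extrd (s, 1) keeps the immediate visible as a
constant. Whether that resolves the error here depends on how the port
checks the operand.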


Re: What is the type of imm16 in builtin-func?

2011-04-27 Thread Liu
On Tue, Apr 26, 2011 at 1:44 AM, Ian Lance Taylor  wrote:
> Liu  writes:
>
>> I get a error:
>> /opt/cross-tools/bin/../lib/gcc/mips64el-unknown-linux-gnu/4.5.1/include/xx.h:1535:31:
>> error: invalid argument to built-in function
>
> That is an error from the MIPS backend.
>
> In this case it seems to mean that the mode for the type int64x4_t does
> not match the mode for the insn in the .md file.  I don't know where
> int64x4_t is defined, so I don't know what its mode is.  It looks like
> you want the mode to be V4DImode; is it?
>
> Ian
>

Now I have modified the pattern to:
(define_insn "hr_vpextrv4di"
  [(set (match_operand:V4DI 0 "register_operand" "=Z")
        (unspec:V4DI
          [(match_operand:V4DI 1 "register_operand" "Z")
           (match_operand:SI 2 "const_int_operand" "")]
          UNSPEC_HR_VPEXTR))]
  "TARGET_HR_VECTORS"
  "vpextrd\t%0,%1,%2"
  [(set_attr "type" "vadd")])

Then I compile the testcase:
mips64el-unknown-linux-gnu-gcc -march=hr1 -S extrd-func.c -fdump-rtl-all

I only get the RTL dump for the expand pass, and in that dump the insn
does not match the pattern at all!

How can I fix this bug?


What should I do to make gcc support PIC code?

2011-06-07 Thread Liu
Hi all,

If I want to make a GNU toolchain support PIC code and dynamic linking,
do I need to do some work on gcc?
If so, what should I do?

Thanks
--Liu


Re: What should I do to make gcc support PIC code?

2011-06-07 Thread Liu
On Tue, Jun 7, 2011 at 9:20 PM, Ian Lance Taylor  wrote:
> Liu  writes:
>
>> If I want make a GNU Toolchain support PIC code and Dynamic link,
>> do I need do some work on gcc?
>> If I do need. What should I do?
>
> The GNU toolchain supports PIC and dynamic linking by default.  Are you
> talking about some new gcc target?  If so, you need to give us more
> details.
>
> Ian
>

Thank you for the reply, Ian.
Yes, I am working on a new gcc target. It is almost finished except for PIC and
dynamic linking.
They want me to make the toolchain support PIC and dynamic linking. I'm
not sure what I should do. Could you point me in the right direction?

Thanks again,

--Liu


Re: What should I do to make gcc support PIC code?

2011-06-07 Thread Liu
On Tue, Jun 7, 2011 at 10:52 PM, Ian Lance Taylor  wrote:
> Liu  writes:
>
>> Yes, I am working on a new gcc target, it almost finished but PIC and
>> dynamic linking.
>> They want me make the toolchain support PIC and dynamic linking. I'm
>> not sure what should I do, will you show me a path?
>
> [ Sorry for my earlier reply, I see now that you did also reply to the
>  mailing list. ]
>
> The first and most important thing you need to do is work out the ABI
> for position independent code and dynamic linking.  This is not a gcc
> issue.  It will depend entirely on how your processor works.  You need
> to design code sequences for position independent function calls and
> access to global variables.  I recommend reading one of the ELF
> processor supplements--several are available online--to see how they
> generally work.  There is also a short introduction to these ideas at
> http://www.airs.com/blog/archives/41 .
>
> Once you've figured out what code you need to generate, then you can
> think about how to represent it in gcc.
>
> Ian
>

Thank you, Ian.
I'll read your blog and find a simple, complete port to study.

Liu
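
The kind of code sequence Ian mentions (position-independent function
calls and access to global variables) can be pictured with a toy model in
plain C. It is a sketch of the general idea of going through a table of
addresses and not any particular ABI. The table names and
toy_dynamic_link are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static int counter = 41;                          /* a "global variable" */
static int add_one (int x) { return x + 1; }      /* a "global function" */

/* Toy address tables: the only places that hold absolute addresses.
   In a real system a dynamic linker would fill these in at load time.  */
enum { DATA_COUNTER, DATA_SLOTS };
enum { FUNC_ADD_ONE, FUNC_SLOTS };
static int *data_table[DATA_SLOTS];
static int (*func_table[FUNC_SLOTS]) (int);

/* Stand-in for the dynamic linker resolving symbols.  */
static void
toy_dynamic_link (void)
{
  data_table[DATA_COUNTER] = &counter;
  func_table[FUNC_ADD_ONE] = add_one;
}

/* Position-independent style access: the code only knows table slots.  */
static int
bump_counter_pic (void)
{
  int *p = data_table[DATA_COUNTER];        /* load the address from the table */
  assert (p != NULL && func_table[FUNC_ADD_ONE] != NULL);
  *p = func_table[FUNC_ADD_ONE] (*p);       /* indirect call through the table */
  return *p;
}
```

The design question for a new port is which instructions perform the
table load and the indirect call, and how the table itself is located at
run time.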


Re: How effect the OpenSource EKOPath the GCC ?

2011-06-18 Thread Liu
2011/6/18 theUser BL :
>
> Hi!
>
> I have not found anything about this on the mailing list so far, so I will ask:
> how does the open-source EKOPath affect GCC?
>
> Have a look at the latest press news from PathScale:
> http://www.pathscale.com/taxonomy/term/27
>
> Also have a look at these Phoronix articles:
> http://www.phoronix.com/scan.php?page=article&item=pathscale_ekopath4_open&num=1
> http://www.phoronix.com/scan.php?page=news_item&px=OTU2OA
>
>
> The important things:
> EKOPath will be opened step by step. The first published open code can be found at
> https://github.com/path64
>
> At
> http://www.pathscale.com/ekopath-compiler-suite
> you can download nightly builds.
> To compile a helloworld.c program type
> $ ~/ekopath-4.0.10/bin/pathcc helloworld.c -o helloworld
>
> You can use different benchmarks. To me it seems that programs compiled with
> EKOPath are a lot faster than those compiled with GCC.
>
> The only disadvantage is that EKOPath currently exists only for 64-bit
> systems with x86_64 CPUs.
>
> But how will the open-source EKOPath affect GCC?
> Can GCC make use of some EKOPath code? (EKOPath itself uses parts of GCC.)
> Or can GCC learn from it by studying it?
>
> Greetings
> theuserbl
>
>
>

Hi theuserbl

Is Chris your boss?

I know that EKOPath is much better than open64, but open64 can
compile nothing but spec2000.
So do you still want to know about the effect?

GCC is a real compiler that works! GCC can bootstrap, can compile the
Linux kernel, can compile GNU libc, can compile almost everything.
Can open64? Can open64 really be used, or is it just a toy with
really bad code?

GCC is the only choice in the real world apart from LLVM, but never open64.

--Liu


Re: How effect the OpenSource EKOPath the GCC ?

2011-06-18 Thread Liu
2011/6/19 theUser BL :
>
> Hi Liu
>
>> Chris is your boss?
>
> No. Who is Chris?
>
>> I know that EKOPath is much more better than open64,
>
> And could its code be useful for GCC or not?
>
>> but open64 can compile nothing but spec2000.
>
> Open64? I have googled it. Do you mean the one at
> http://www.open64.net/
>
> I was talking about Path64:
> https://github.com/path64
>
> But you are right; at
> http://www.open64.net/about-open64.html
> it says that PathScale uses it for EKOPath.
>
> And interestingly, Open64 is a port of the old SGI compiler.
> So Irix was written with MIPSPro, and Open64 is a derived work of it? That's
> nice.
>
>> So, you want something about Effect, still?
>
> Yes. And I am looking at whether there are other open-source compilers besides
> GCC that could be better in some areas.
>
>> GCC is the real compiler that can work! GCC can bootstrap, can compile
>> Linux Kernel, can compile GNU LibC, can compile almost everything.
>
>> Will open64? Does open64 really can be used? or just a tony with a
>> really suck codes?
>
> Why do you talk about open64 every time, when I talk about Path64?
> Open64 is the old one. I think that if there were nice code in it, GCC would
> already be using it.
> Path64 is the new one, which is being opened.
>
>> GCC is the only choice in the real world except LLVM, but never open64.
>
> Can you describe that a little more clearly?
> What are the disadvantages of Open64, and what are the disadvantages of
> EKOPath/Path64?
>
> And if it has only disadvantages compared with GCC, why are people still
> working on Open64 and EKOPath/Path64?
> As I see, the newest version of Open64 was released April 13th, 2011.
> And EKOPath is now being opened and is getting a lot of updates.
>
> Open64 is under the GPL (similar to GCC).
> But if GCC is the better one, why does PathScale use Open64 as the base for
> EKOPath and ENZO?
>
> But as I said before, it looks like PathScale additionally uses a lot of parts
> of GCC:
> https://github.com/path64/compiler/tree/master/GCC
> https://github.com/path64/compiler/tree/master/gcc_incl
>
> Greetings
> theuserbl
>

Hi theuserbl

What I mean is that pathcc is much better than open64, but still not as
good as gcc, because pathcc and open64 share the same code base!

Why did PathScale choose it? Fred Chow!

Open64 has been saying "we will replace gcc" for almost 20 years... and it
still only plays with spec2000!


--Liu


Re: I am work with lm32 and want to help with the lm32 target in gcc

2011-08-31 Thread Liu
On Thu, Sep 1, 2011 at 11:00 AM, Xiangfu Liu  wrote:
> Hi
>
> I am work with lm32 and want to help with the lm32 target in gcc.
> the device name is milkymist one. with FPGA software CPU core lm32.
>
> what is the first step I should do for help with lm32 target in gcc?
> I have read this http://gcc.gnu.org/contribute.html#legal
> I have to start with small contributions. should I do this legal stuff?
>
> thanks
>

Ask for an assignment form, sign it, send it back to the FSF, and submit your patch.

--Liu


Re: I am work with lm32 and want to help with the lm32 target in gcc

2011-09-04 Thread Liu
On Thu, Sep 1, 2011 at 2:58 PM, Xiangfu Liu  wrote:
> Hi
>
> can you send me the copyright assignment forms.
> it should be an assignment for all future changes, right?
>
> thanks for reply
>
> On 09/01/2011 11:32 AM, Liu wrote:
>>
>> On Thu, Sep 1, 2011 at 11:00 AM, Xiangfu Liu
>>  wrote:
>>>
>>> Hi
>>>
>>> I am work with lm32 and want to help with the lm32 target in gcc.
>>> the device name is milkymist one. with FPGA software CPU core lm32.
>>>
>>> what is the first step I should do for help with lm32 target in gcc?
>>> I have read this http://gcc.gnu.org/contribute.html#legal
>>> I have to start with small contributions. should I do this legal stuff?
>>>
>>> thanks
>>>
>>
>> ask for a assignment, sign it, send it back to FSF, and summit your patch.
>>
>> --Liu
>
>

Please email the following information to ass...@gnu.org , and we
will send you the assignment form for your past and future changes.

Please use your full legal name (in ASCII characters) as the subject
line of the message.
--
REQUEST: SEND FORM FOR PAST AND FUTURE CHANGES


[What is the name of the program or package you're contributing to?]


[Did you copy any files or text written by someone else in these changes?
Even if that material is free software, we need to know about it.]


[Do you have an employer who might have a basis to claim to own
your changes?  Do you attend a school which might make such a claim?]


[For the copyright registration, what country are you a citizen of?]


[What year were you born?]


[Please write your email address here.]


[Please write your postal address here.]


[Which files have you changed so far, and which new files have you written
so far?]


Re: **Help I love GCC

2011-10-13 Thread Liu
2011/10/14 Ian Lance Taylor :
> "花儿对我笑" <870523...@qq.com> writes:
>
>> Please see the whole E-mail  Please send a GCC 
>> for windows.   Language:Chinese or English.   I'm a Chinese student,now 
>> I'm studing C++.I want a GCC(For Windows,Chinese),but my English isn't very 
>> good,and I can't find GCC. So,please send me a GCC,for tomorrow of the wold. 
>>Write (Send)to me soon.  E-MAIL ADDRESS:870523...@qq.com
>> --
>
> This message should have been sent to gcc-h...@gcc.gnu.org, not
> gcc@gcc.gnu.org.  Please send any followups to gcc-help.  Thanks.
>
> For gcc for Windows see http://cygwin.com or http://mingw.org .
>
> Ian
>

And if you want an IDE, google mingw+eclipse.
If you need anything more, let me know.

Liu


How can I write a builtin func?

2010-12-09 Thread Liu
Hi all,
  I'm porting gcc to a MIPS-based DSP and need to write some builtin
functions for some insns, but I can't find any documentation. Can anyone help me?
Could you show me an example, please? Thanks very much.

Liu.


Re: How can I write a builtin func?

2010-12-09 Thread Liu
2010/12/10 WANG.Jiong :
> On 12/10/2010 02:17 PM, Liu wrote:
>
> Hi all,
>   I'm porting gcc to a MIPS-based DSP and need to write some builtin
> functions for some insns, but I can't find any documentation. Can anyone help me?
> Could you show me an example, please? Thanks very much.
>
> Liu.
>
>
> Maybe you should at least implement the following two hooks:
>
> TARGET_INIT_BUILTINS
>   ---> to do some initialization
>
> TARGET_EXPAND_BUILTIN
>   ---> to expand builtin related tree to rtl
>   I suggest you define some UNSPEC rtl,  like:
>  (define_insn "xxx"
>  [set (match_operand: .)
>  (unspec: [...] UNSPEC_XXX)]
>   ..
>     )
>
>   and then expand builtin tree to these rtl by calling the
> related gen_xxx
>
> I suggest looking at s390's implementation of these hooks.
>
> --
> Best,
> Wong.KwongYuan

Dear Wong.KwongYuan,
  Thank you very much! Your answer is very clear!
  I'll look into it. I may ask for more help if I run into more problems.
Thanks again!

Liu.


How can I add 256bits register file to a MIPS port?

2010-12-20 Thread Liu
Hi all
  I need to add 256-bit register support for our MIPS-based
processor, so I added some code.
  When I build gcc and test it, I get the error "unable to find a
register to spill in class 'XX_REGS'".
  Can you tell me how to add a 256-bit register file to a MIPS port?

Thanks!

codes:
gcc/config/mips/constraints.md :
(define_register_constraint "Z" "XX_REGS"
  "@internal")

gcc/config/mips/mips-ftypes.def :
DEF_MIPS_FTYPE (2, (UV32QI, UV32QI, UV32QI))
DEF_MIPS_FTYPE (2, (V32QI, V32QI, V32QI))

gcc/config/mips/mips.h :
#define FIXED_REGISTERS
  /* XX registers */   \
 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,   \
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,   \
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1\

#define CALL_USED_REGISTERS
  /* XX registers */   \
 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,   \
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,   \
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1\

#define CALL_REALLY_USED_REGISTERS
  /* XX registers */   \
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,   \
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,   \
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,   \
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0\

#define XX_REG_FIRST 188
#define XX_REG_LAST  251
#define XX_REG_NUM   (XX_REG_LAST - XX_REG_FIRST + 1)

#define XX_REG_P(REGNO) \
  ((unsigned int) ((int) (REGNO) - XX_REG_FIRST) < XX_REG_NUM)

enum reg_class
  XX_REGS,

#define REG_CLASS_NAMES 
  XX_REGS,

#define REG_CLASS_CONTENTS add bits into it.

#define REG_ALLOC_ORDER
188,189,190,191,192,193,194,195,196,197,\
 198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,   \
 214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,   \
 230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,   \
  246,247,248,249,250,251

#define REGISTER_NAMES
  "$z0", "$z1", "$z2", "$z3", "$z4", "$z5", "$z6", "$z7", "$z8", "$z9", \
  "$z10", "$z11", "$z12", "$z13", "$z14", "$z15", "$z16", "$z17", "$z18", "$z19", \
  "$z20", "$z21", "$z22", "$z23", "$z24", "$z25", "$z26", "$z27", "$z28", "$z29", \
  "$z30", "$z31", "$z32", "$z33", "$z34", "$z35", "$z36", "$z37", "$z38", "$z39", \
  "$z40", "$z41", "$z42", "$z43", "$z44", "$z45", "$z46", "$z47", "$z48", "$z49", \
  "$z50", "$z51", "$z52", "$z53", "$z54", "$z55", "$z56", "$z57", "$z58", "$z59", \
  "$z60", "$z61", "$z62", "$z63" }

gcc/config/mips/mips-modes.def :
/* XX Vec modes */
VECTOR_MODES (INT, 32);/*   V32QI V16HI V8SI V4DI */

gcc/config/mips/xx.h :
vpaddb_u (uint8x32_t s, uint8x32_t t)
{
  return __builtin_xx_vpaddb_u (s, t);
}
__extension__ static __inline int8x32_t __attribute__ ((__always_inline__))
vpaddb_s (int8x32_t s, int8x32_t t)
{
  return __builtin_xx_vpaddb_s (s, t);
}

gcc/config/mips/xx.md :
(define_mode_iterator ZB [V32QI])
(define_mode_iterator ZH [V16HI])
(define_mode_iterator ZW [V8SI])
(define_mode_iterator ZD [V4DI])
(define_mode_iterator ZHB [V16HI V32QI])
(define_mode_iterator ZWH [V8SI V16HI])
(define_mode_iterator ZDW [V4DI V8SI])
(define_mode_iterator ZWHB [V8SI V16HI V32QI])
(define_mode_iterator ZDWH [V4DI V8SI V16HI])
(define_mode_iterator ZDWHB [V4DI V8SI V16HI V32QI])
(define_mode_attr Z_suffix [(V4DI "d") (V8SI "w") (V16HI "h") (V32QI "b")])

(define_insn "xx_vpadd<mode>"
  [(set (match_operand:ZDWHB 0 "register_operand" "=Z")
        (plus:ZDWHB (match_operand:ZDWHB 1 "register_operand" "Z")
                    (match_operand:ZDWHB 2 "register_operand" "Z")))]
  "TARGET_HARD_FLOAT && TARGET_XX_VECTORS"
  "vpadd<Z_suffix>\t%0,%1,%2"
  [(set_attr "type" "fadd")])

gcc/config/mips/mips.c :
  if (TARGET_XX_VECTORS
  && (mode == V32QImode
  || mode == V16HImode
  || mode == V8SImode
  || mode == V4DImode))
return true;

case V32QImode:
case V16HImode:
case V8SImode:
case V4DImode:
 return TARGET_XX_VECTORS;

#define CODE_FOR_xx_vpaddb CODE_FOR_xx_vpaddv32qi

  XX_BUILTIN_SUFFIX (vpaddb, u, MIPS_UV32QI_FTYPE_UV32QI_UV32QI),
  XX_BUILTIN_SUFFIX (vpaddb, s, MIPS_V32QI_FTYPE_V32QI_V32QI),

#define MIPS_ATYPE_V32QI mips_builtin_vector_type (intQI_type_node, V32QImode)
#define MIPS_ATYPE_V16HI mips_builtin_vector_type (intHI_type_node, V16HImode)
#define MIPS_ATYPE_V8SI mips_builtin_vector_type (intSI_type_node, V8SImode)
#define MIPS_ATYPE_V4DI mips_builtin_vector_type (intDI_type_node, V4DImode)
#define MIPS_ATYPE_UV32QI 
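
The XX_REG_P macro in the mips.h excerpt above uses a common single-comparison range check: subtracting the lower bound and comparing as unsigned folds "REGNO >= XX_REG_FIRST && REGNO <= XX_REG_LAST" into one test, because a register number below the range yields a negative difference that wraps to a very large unsigned value. A minimal standalone sketch, reusing the constants from the post (the helper name xx_reg_p is made up for illustration and is not part of GCC):

```c
#include <assert.h>

/* Constants copied from the mips.h excerpt in the post.  */
#define XX_REG_FIRST 188
#define XX_REG_LAST  251
#define XX_REG_NUM   (XX_REG_LAST - XX_REG_FIRST + 1)

/* Same test as XX_REG_P, written as a function.  One unsigned
   comparison covers both the lower and the upper bound.  */
static int
xx_reg_p (int regno)
{
  assert (regno >= 0);  /* register numbers are never negative */
  return (unsigned int) (regno - XX_REG_FIRST) < (unsigned int) XX_REG_NUM;
}
```

With these constants, xx_reg_p accepts exactly the 64 register numbers 188 through 251.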

Re: How can I add 256bits register file to a MIPS port?

2010-12-21 Thread Liu
2010/12/21 Ian Lance Taylor :
> Liu  writes:
>
>>       I need add 256bits-register support for our MIPS-based
>> processor, so I add some codes.
>>       When I build gcc and test it, get a error "unable to find a
>> register to spill in class 'XX_REGS'"
>>       can you tell me how to add 256bits register file to a MIPS port?
>>
>> Thanks!
>>
>> codes:
>> gcc/config/mips/constraints.md :
>> (define_register_constraint "Z" "XX_REGS"
>>   "@internal")
>>
>> gcc/config/mips/mips-ftypes.def :
>> DEF_MIPS_FTYPE (2, (UV32QI, UV32QI, UV32QI))
>> DEF_MIPS_FTYPE (2, (V32QI, V32QI, V32QI))
>>
>> gcc/config/mips/mips.h :
>> #define FIXED_REGISTERS
>>   /* XX registers */                                                   \
>>  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,                      \
>>   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,                     \
>>   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,                     \
>>   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1                      \
>
> Setting them all as 1 in FIXED_REGISTERS means that gcc can't use them.
>
> Ian
>

Thank you for the reply.
I changed the FIXED_REGISTERS entries to 0, the CALL_USED_REGISTERS entries to 1,
and the CALL_REALLY_USED_REGISTERS entries to 1.

I rebuilt and tested it, but I still get the same error.
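
For reference, the GCC internals manual describes FIXED_REGISTERS as marking registers the allocator must never use, and CALL_USED_REGISTERS as a superset that additionally marks registers clobbered across calls. The sketch below models the two tables for a hypothetical four-register file (the array and function names are illustrative only, not GCC code) and checks the two properties relevant to this thread: how many registers remain allocatable, and whether every fixed register is also call-used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 4-register file; 1 = flag set, as in the target macros.  */
static const char fixed_regs[]     = { 0, 0, 0, 1 };
static const char call_used_regs[] = { 1, 1, 0, 1 };

#define NUM_REGS (sizeof fixed_regs / sizeof fixed_regs[0])

/* Count registers the allocator may use: those not marked fixed.  */
static size_t
count_allocatable (const char *fixed, size_t n)
{
  size_t count = 0;

  assert (fixed != NULL);
  for (size_t i = 0; i < n; i++)
    if (!fixed[i])
      count++;
  return count;
}

/* Return 1 if every fixed register is also marked call-used.  */
static int
tables_consistent (const char *fixed, const char *call_used, size_t n)
{
  assert (fixed != NULL && call_used != NULL);
  for (size_t i = 0; i < n; i++)
    if (fixed[i] && !call_used[i])
      return 0;
  return 1;
}
```

With every entry set to 1, as in the original post, count_allocatable would return 0, which matches Ian's remark that gcc cannot use the registers.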


Re: How can I add 256bits register file to a MIPS port?

2010-12-22 Thread Liu
Thanks for the pointer. I'll try my best.

2010/12/21 Ian Lance Taylor :
> Liu  writes:
>
>> I changed FIXED_REGISTERS into 0, CALL_USED_REGISTERS into 1,
>> CALL_REALLY_USED_REGISTERS into 1.
>>
>> build and test it, but I still get the same error.
>
> Have you updated HARD_REGNO_MODE_OK?
>
> I encourage you to carefully read the gcc internals manual.  It's pretty
> good on this kind of thing.
>
> Ian
>


Re: vcond implementation in altivec

2007-03-05 Thread Sa Liu
David Edelsohn <[EMAIL PROTECTED]> wrote on 02.03.2007 19:10:58:

> > Devang Patel writes:
> 
> >> Is there a reason why op0 is V4SF
> Devang> It is destination so, yes this is wrong.
> 
> >> and op1 is V4SI (and not V8HI)?
> 
> Devang> condition should be v4si, but it is not op1. So this is also
> not correct.
> 
> >> And also, why not use if_then_else instead of unspec (in all 
vcond's)?
> 
> Devang> I did not try that path. May be I did not know about it at that 
time.
> 
> 
>Patches welcome.
> 
> David
> 

I am working on the patch and will submit it soon.

Sa


-fprofile-arcs changes the structure of basic blocks

2005-06-23 Thread Liu Haibin
Hi,

I want to use profiling information. I know there are two relevant
fields in each basic block, count and frequency. I want to use
frequency because the compiled program is for another architecture, so
it cannot run on the host.

I use -fprofile-arcs, and I can see the frequency value when I debug
cc1. But I noticed that adding -fprofile-arcs changes
the whole structure of the basic blocks. I compared the vcg output
files with and without -fprofile-arcs and found they are totally
different.

My question is: why is that? I want the profiling info, but if
the profiling info I get is for a different basic block
structure, it's useless to me.

Where did I go wrong here? Which option is suitable in this case?
The gcc version is 3.3.3. Thanks.


Regards,
Timothy


Re: -fprofile-arcs changes the structure of basic blocks

2005-06-23 Thread Liu Haibin
Then I think I shouldn't use -fprofile-arcs. The reason why I used
-fprofile-arcs is when I debugged a program without any flags, I saw
the frequency was zero. When I added this flag, I saw frequency with
values.

I checked the frequency after life_analysis and before
combine_instructions. I used

FOR_EACH_BB(bb) {
 // some code
}

and checked the bb->frequency.

So now the question is how I can see the frequency without any flags.
The following was the small program I used to check the frequency.

int foo(int i)
{
    if (i < 2)
        return 2;
    else
        return 0;
}
int main()
{
    int i;

    i = 0;
    if (i < 100)
        i = 3;
    else
        i = foo(i);

    return 0;
}




On 6/24/05, Daniel Berlin <[EMAIL PROTECTED]> wrote:
> On Thu, 23 Jun 2005, Liu Haibin wrote:
> 
> > Hi,
> >
> > I want to use profiling information. I know there're two relevant
> > fields in each basic block, count and frequency. I want to use
> > frequency because the compiled program is for another architecture so
> > it cannot run on the host.
> 
> Besides the fact that, as Zdenek has pointed out, this is not a useful
> situation for -fprofile-arcs, ...
> >
> > My question is why it is so? I want to know the profiling info, but if
> > profiling info I get is for another different structure of basic
> > block, it's useless to me.
> >
> 
> This is because it's inserting profiling code.
> 
> This isn't magic, it's inserting code to do the profiling, which
> necessarily changes the basic blocks.
> The profiling info you get is for the original set of basic blocks.
> 
>


Cross compiler

2005-06-24 Thread Eric Liu
Hi, all:

I need a gcc cross compiler under Cygwin. Is there any step-by-step document
on how to make the cross compiler? Any help would be appreciated very much!

Thanks and regards
Eric



Re: -fprofile-arcs changes the structure of basic blocks

2005-06-27 Thread Liu Haibin
I found that the optimization must be on in order to see the frequency.


Timothy

On 6/24/05, Liu Haibin <[EMAIL PROTECTED]> wrote:
> Then I think I shouldn't use -fprofile-arcs. The reason why I used
> -fprofile-arcs is when I debugged a program without any flags, I saw
> the frequency was zero. When I added this flag, I saw frequency with
> values.
> 
> I checked the frequency after life_analysis and before
> combine_instructions. I used
> 
> FOR_EACH_BB(bb) {
>  // some code
> }
> 
> and checked the bb->frequency.
> 
> So now the question is how I can see the frequency without any flags.
> The following was the small program I used to check the frequency.
> 
> int foo(int i)
> {
> if (i < 2)
> return 2;
> else
> return 0;
> }
> int main()
> {
> int i;
> 
> i = 0;
> if (i < 100)
> i = 3;
> else
> i = foo(i);
> 
> return 0;
> }
> 
> 
> 
> 
> On 6/24/05, Daniel Berlin <[EMAIL PROTECTED]> wrote:
> > On Thu, 23 Jun 2005, Liu Haibin wrote:
> >
> > > Hi,
> > >
> > > I want to use profiling information. I know there're two relevant
> > > fields in each basic block, count and frequency. I want to use
> > > frequency because the compiled program is for another architecture so
> > > it cannot run on the host.
> >
> > Besides the fact that, as Zdenek has pointed out, this is not a useful
> > situation for -fprofile-arcs, ...
> > >
> > > My question is why it is so? I want to know the profiling info, but if
> > > profiling info I get is for another different structure of basic
> > > block, it's useless to me.
> > >
> >
> > This is because it's inserting profiling code.
> >
> > This isn't magic, it's inserting code to do the profiling, which
> > necessarily changes the basic blocks.
> > The profiling info you get is for the original set of basic blocks.
> >
> >
>


on nios2 difine_insn indirect_call

2005-07-17 Thread Liu Haibin
Hi, 

The nios2.md has a define_insn "indirect_call"

(define_insn "indirect_call"
  [(call (mem:QI (match_operand:SI 0 "register_operand" "r"))
 (match_operand 1 "" ""))
   (clobber (reg:SI RA_REGNO))]
  ""
  "callr\\t%0"
  [(set_attr "type" "control")])

But in test.c.26.flow2 I find the following code.

(call_insn 41 37 42 1 0x101e17b0 (parallel [
(call (mem:QI (reg/f:SI 3 r3 [58]) [0 S1 A8])
(const_int 0 [0x0]))
(clobber (reg:SI 31 ra))
]) 41 {indirect_call} (insn_list 40 (insn_list 39 (nil)))
(expr_list:REG_DEAD (reg:SI 4 r4)
(expr_list:REG_DEAD (reg/f:SI 3 r3 [58])
(expr_list:REG_UNUSED (reg:SI 31 ra)
(nil))))
(expr_list (use (reg:SI 4 r4))
(nil)))

Why is there a "parallel" for indirect_call in .26.flow2 but no
"parallel" in define_insn indirect_call? Does it mean that the
"define_insn indirect_call" in the md file implicitly has a "parallel"
surrounding it?


Regards,
Timothy


on define_peephole2

2005-07-21 Thread Liu Haibin
Hi, 

I have a problem with define_peephole2. In nios2.md, there is the following
define_insn:

(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand"  "=r,r")
(plus:SI (match_operand:SI 1 "register_operand" "%r,r")
 (match_operand:SI 2 "arith_operand" "r,I")))]
  ""
  "add%i2\\t%0, %1, %z2"
  [(set_attr "type" "alu")])

I defined a peephole2 to replace this instruction.

(define_peephole2
  [(set (match_operand:SI 0 "register_operand" "=r")
(plus:SI (match_operand:SI 1 "register_operand" "%r")
;(match_operand:SI 2 "arith_operand" "r")))]
 (match_operand:SI 2 "register_operand" "r")))]
  ""
  [(set (match_operand:SI 0 "register_operand" "=r")
(unspec_volatile:SI [(match_operand:SI 4 "custom_insn_opcode" "N")
(match_operand:SI 1 "register_operand" "r")
(match_operand:SI 2 "register_operand" "r")] 
CUSTOM_INII))]
  "
{
operands[4] = const0_rtx;
}")

Because operand 2 in the replacement instruction must be a register,
I changed "arith_operand" to "register_operand", hoping that it would
only replace something like add r1, r2, r3 and not addi r1, r2, 9.

I did a test with a file, which contains

(insn/f 106 73 107 0 0x0 (set:SI (reg/f:SI 27 sp)
(plus:SI (reg/f:SI 27 sp)
(const_int -16 [0xfff0]))) -1 (nil)
(nil))

and it seems that it did try to replace it with the new instruction. And
I got the following error:

isqrt.c:65: error: unrecognizable insn:
(insn 123 73 107 0 0x0 (set (reg/f:SI 27 sp)
(unspec_volatile:SI [
(const_int 0 [0x0])
(reg/f:SI 27 sp)
(const_int -16 [0xfff0])
] 117)) -1 (nil)
(nil))
isqrt.c:65: internal compiler error: in extract_insn, at recog.c:2175

Any ideas why it still tries to replace it even when it's obviously
not a register (const_int -16)? Thanks.


Regards,
Timothy


Re: on define_peephole2

2005-07-21 Thread Liu Haibin
On 7/21/05, Liu Haibin <[EMAIL PROTECTED]> wrote:
> Hi,
> 
> I have a problem on the define_peephole2. In nios2.md, there's such a
> define_insn
> 
> (define_insn "addsi3"
>   [(set (match_operand:SI 0 "register_operand"  "=r,r")
> (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
>  (match_operand:SI 2 "arith_operand" "r,I")))]
>   ""
>   "add%i2\\t%0, %1, %z2"
>   [(set_attr "type" "alu")])
> 
> I defined a peephole2 to replace this instruction.
> 
> (define_peephole2
>   [(set (match_operand:SI 0 "register_operand" "=r")
> (plus:SI (match_operand:SI 1 "register_operand" "%r")
> ;(match_operand:SI 2 "arith_operand" "r")))]
>  (match_operand:SI 2 "register_operand" "r")))]
>   ""
>   [(set (match_operand:SI 0 "register_operand" "=r")
> (unspec_volatile:SI [(match_operand:SI 4 "custom_insn_opcode" "N")

My mistake: this should be match_operand:SI 3 here. With that fixed, there is no more error.

> (match_operand:SI 1 "register_operand" "r")
> (match_operand:SI 2 "register_operand" "r")] 
> CUSTOM_INII))]
>   "
> {
> operands[4] = const0_rtx;
> }")
> 
> Because the operand 2 in the replacing instruction must be a register,
> I changed the "arith_operand" to "register_operand", hoping that it
> only replaces something like, add r1, r2, r3 instead of addi r1, r2, 9
> 
> I did a test with a file, which contains
> 
> (insn/f 106 73 107 0 0x0 (set:SI (reg/f:SI 27 sp)
> (plus:SI (reg/f:SI 27 sp)
> (const_int -16 [0xfff0]))) -1 (nil)
> (nil))
> 
> and it seems that it did try to replace it with the new instruct. And
> I got the following error:
> 
> isqrt.c:65: error: unrecognizable insn:
> (insn 123 73 107 0 0x0 (set (reg/f:SI 27 sp)
> (unspec_volatile:SI [
> (const_int 0 [0x0])
> (reg/f:SI 27 sp)
> (const_int -16 [0xfff0])
> ] 117)) -1 (nil)
> (nil))
> isqrt.c:65: internal compiler error: in extract_insn, at recog.c:2175
> 
> Any ideas why it still tries to replace it even when it's obviously
> not a register (const_int -16)? Thanks.
> 
> 
> Regards,
> Timothy
>


How can I create a const rtx other than 0, 1, 2

2005-07-22 Thread Liu Haibin
Hi,

There's const0_rtx, const1_rtx and const2_rtx. How can I create a
const rtx other than 0, 1, 2? I want to use it in md file, like

operand[1] = 111.

I know I must use const rtx here. How can I do it? A simple question,
but just no idea where to find the answer.


Regards,
Timothy


how to write a define_peephole2 that uses custom registers in nios2

2005-07-26 Thread Liu Haibin
Hi,

nios2 has a set of custom registers for custom instructions. They all
start with "c", like

custom 1 c4, c2, c0

I want to define a peephole to replace a sequence of code with the
custom instruction above.

The custom instruction is defined as follows in nios2.md:

(define_insn "custom_inii"
  [(set (match_operand:SI 0 "register_operand"   "=r")
(unspec_volatile:SI [(match_operand:SI 1 "custom_insn_opcode" "N")
  (match_operand:SI 2 "register_operand"   "r")
  (match_operand:SI 3 "register_operand"   "r")] CUSTOM_INII))]
  ""
  "custom\\t%1, %0, %2, %3"
  [(set_attr "type" "custom")])

But the problem is that it uses normal registers, like r8 and r9. How can I
write the define_peephole2 so that it uses custom registers?


Thanks
Haibin


Re: how to write a define_peephole2 that uses custom registers in nios2

2005-07-28 Thread Liu Haibin
Thanks. I modified the related macros, like reg_class,
REG_CLASS_FROM_LETTER(CHAR) and so on. But I have a problem on
define_peephole2.

After I modified the related macros, I replaced the "r" in
"custom_inii" with "c".

(define_insn "custom_inii"
  [(set (match_operand:SI 0 "register_operand"   "=c")
(unspec_volatile:SI [(match_operand:SI 1 "custom_insn_opcode" "N")
  (match_operand:SI 2 "register_operand"   "c")
  (match_operand:SI 3 "register_operand"   "c")] CUSTOM_INII))]
  ""
  "custom\\t%1, %0, %2, %3"
  [(set_attr "type" "custom")])

And I defined the peephole as 

(define_peephole2
  [(set (match_operand:SI 0 "register_operand"  "")
(plus:SI (match_operand:SI 1 "register_operand" "")
 (match_operand:SI 2 "arith_operand" "")))]
  "REG_P(operands[2])"
  [(set (match_operand:SI 0 "register_operand" "=c")
(unspec_volatile:SI [(match_operand:SI 3 "custom_insn_opcode" "N")
(match_operand:SI 1 "register_operand" "c")
(match_operand:SI 2 "register_operand" "c")] 
CUSTOM_INII))]
  "
{
operands[3] = GEN_INT(100);
}")

I encounter the following error

isqrt.c: In function `usqrt':
isqrt.c:65: error: insn does not satisfy its constraints:
(insn 118 41 36 1 0x1002f390 (set (reg/v:SI 4 r4 [82])
(unspec_volatile:SI [
(const_int 100 [0x64])
(reg:SI 4 r4 [86])
(reg:SI 3 r3 [88])
] 117)) 75 {custom_inii} (nil)
(expr_list:REG_DEAD (reg:SI 3 r3 [88])
(nil)))
isqrt.c:65: internal compiler error: in build_def_use, at regrename.c:782
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://www.altera.com/mysupport> for instructions.

I think the reason it failed is there's no more define_insn
"custom_inii" with general registers because I already changed the "r"
to "c". However, it seems very difficult here. The old insn patterns
are all general registers, but the new insn patterns are defined as
custom registers.

Can I use something like

operands[0] = gen_rtx_REG (DImode, REGNO(operands[0]));

here to force all the operands to be a different kind? Or how can I
define the peephole?










On 7/28/05, James E Wilson <[EMAIL PROTECTED]> wrote:
> Liu Haibin wrote:
> >   (match_operand:SI 2 "register_operand"   "r")
> > But the problem is it uses normal register, like r8, r9. How can I
> > write the define_peephole2 so that it uses custom registers?
> 
> See the "Constraints" section of the documentation.  "r" means a general
> register.  If you want a custom register, then you need to use a
> constraint letter that maps to a custom register.
> 
> If the port does not already support custom registers, then you need to
> modify many of the register allocation related macros to add support for
> the custom registers.  See the "Registers" and "Register Classes"
> sections of the documentation.
> --
> Jim Wilson, GNU Tools Support, http://www.specifix.com
>


some seemingly redundant register uses in nios gcc compiled assembly code

2005-09-06 Thread Liu Haibin
Hi,

I compiled the following code using nios gcc -da -O3 (gcc version 3.3.3)

#include <math.h>
#define PI (4*atan(1))

double rad2deg(double rad)
{
  return (180.0 * rad / (PI));
}

In the .s file, there is code like this:


        mov     r4, zero
        movhi   r5, %hiadj(1072693248)
        addi    r5, r5, %lo(1072693248)
        mov     r16, r2
        mov     r17, r3
        call    atan
        mov     r5, r3
        mov     r4, r2
        mov     r6, zero
        movhi   r7, %hiadj(1074790400)
        addi    r7, r7, %lo(1074790400)
        call    __muldf3
        mov     r10, r2
        mov     r5, r17
        mov     r6, r10
        mov     r7, r3
        mov     r4, r16


In .c.26.flow2 file, 

(call_insn 23 19 28 0 0x0 (parallel [
(set (reg:DF 2 r2)
(call (mem:QI (symbol_ref:SI ("atan")) [0 S1 A8])
(const_int 0 [0x0])))
(clobber (reg:SI 31 ra))
]) 44 {*call_value} (insn_list 21 (nil))
(expr_list:REG_DEAD (reg:DF 4 r4)
(expr_list:REG_UNUSED (reg:SI 31 ra)
(nil)))
(expr_list (use (reg:DF 4 r4))
(nil)))

.

(call_insn/u 31 30 36 0 0x0 (parallel [
(set (reg:DF 2 r2)
(call (mem:QI (symbol_ref:SI ("__muldf3")) [0 S1 A8])
(const_int 0 [0x0])))
(clobber (reg:SI 31 ra))
]) 44 {*call_value} (insn_list 27 (insn_list 29 (nil)))
(expr_list:REG_DEAD (reg:DF 4 r4)
(expr_list:REG_DEAD (reg:DF 6 r6)
(expr_list:REG_UNUSED (reg:SI 31 ra)
(expr_list:REG_EH_REGION (const_int -1 [0x])
(nil)))))
(expr_list (use (reg:DF 6 r6))
(expr_list (use (reg:DF 4 r4))
(nil))))

From the RTL we can see that these two calls don't use r5, so why do
both the assembly code and the RTL contain code involving r5, like

        movhi   r5, %hiadj(1072693248)
        addi    r5, r5, %lo(1072693248)    (move a 32-bit constant into a register)
and
        mov     r5, r3

In nios2, r2 and r3 are for the return value; r4, r5, r6 and r7 are for
register arguments.

Does the following rtl implicitly indicate that r5 is used?

(expr_list (use (reg:DF 6 r6))
(expr_list (use (reg:DF 4 r4))

Thanks.


Regards,
Haibin


arguements used in .c.26.flow2 are not used in assembly codes

2005-09-22 Thread Liu Haibin
Hi,

I compiled the following code using nios gcc -da -O3 (gcc version 3.3.3)

#include <math.h>
#define PI (4*atan(1))

double rad2deg(double rad)
{
 return (180.0 * rad / (PI));
}

The beginning of the .s file is
rad2deg:
        addi    sp, sp, -16
        stw     fp, 8(sp)
        mov     r6, zero
        mov     fp, sp
        movhi   r7, %hiadj(1080459264)
        addi    r7, r7, %lo(1080459264)
        stw     ra, 12(sp)
        stw     r16, 4(sp)
        stw     r17, 0(sp)
        call    __muldf3
        mov     r4, zero
        movhi   r5, %hiadj(1072693248)
        addi    r5, r5, %lo(1072693248)
        mov     r16, r2
        mov     r17, r3
        call    atan
..

The corresponding rtl to "call __muldf3" in .c.26.flow2 file is

(call_insn/u 17 16 21 0 0x0 (parallel [
(set (reg:DF 2 r2)
(call (mem:QI (symbol_ref:SI ("__muldf3")) [0 S1 A8])
(const_int 0 [0x0])))
(clobber (reg:SI 31 ra))
]) 44 {*call_value} (insn_list 15 (nil))
(expr_list:REG_DEAD (reg:DF 4 r4)
(expr_list:REG_DEAD (reg:DF 6 r6)
(expr_list:REG_UNUSED (reg:SI 31 ra)
(expr_list:REG_EH_REGION (const_int -1 [0x])
(nil)))))
(expr_list (use (reg:DF 6 r6))
(expr_list (use (reg:DF 4 r4))
(nil))))

According to the RTL, it uses r4, r5, r6 and r7 as arguments. But the
assembly code shows that neither r4 nor r5 is ever set before "call __muldf3".
Any idea why that is?


Thanks
Haibin


how to add source or header file in gcc

2005-12-22 Thread Liu Haibin
Hi,

I'd like to add some source and header files into gcc. I think I
probably need to make some change in Makefile.in. But the Makefile.in
looks very complicated. Could anyone give some advice on this?


Regards,
Haibin


on data depenence

2005-12-28 Thread Liu Haibin
Hi,

I got a dump of sha.c.27.flow2 from gcc 3.4.1. I don't quite
understand the LOG_LINKS of insn 498. LOG_LINKS in insn 498 shows that
it has a data dependence (a read after write dependence) with insn 3.
Why is it so? I don't see any dependence between "mov r14 r4" and
"addi r3, r4, 28". The bottom is the whole dump of the basic block.

(insn 3 4 11 0 (set (reg/v/f:SI 14 r14 [orig:46 sha_info ] [46])
(reg:SI 4 r4 [ sha_info ])) 8 {movsi_internal} (nil)
(expr_list:REG_DEAD (reg:SI 4 r4 [ sha_info ])
(nil)))



(insn 498 375 560 0 (set (reg/f:SI 3 r3 [235])
(plus:SI (reg/v/f:SI 14 r14 [orig:46 sha_info ] [46])
(const_int 28 [0x1c]))) 20 {addsi3} (insn_list 3 (nil))
(nil))





;; Start of basic block 0, registers live: 4 [r4] 16 [r16] 17 [r17] 18
[r18] 19 [r19] 27 [sp] 31 [ra]
(note 289 2 597 0 [bb 0] NOTE_INSN_BASIC_BLOCK)

(insn/f 597 289 598 0 (set:SI (reg/f:SI 27 sp)
(plus:SI (reg/f:SI 27 sp)
(const_int -336 [0xfeb0]))) -1 (nil)
(nil))

(insn/f 598 597 599 0 (set:SI (mem:SI (plus:SI (reg/f:SI 27 sp)
(const_int 332 [0x14c])) [0 S4 A32])
(reg:SI 16 r16)) -1 (nil)
(expr_list:REG_DEAD (reg:SI 16 r16)
(nil)))

(insn/f 599 598 600 0 (set:SI (mem:SI (plus:SI (reg/f:SI 27 sp)
(const_int 328 [0x148])) [0 S4 A32])
(reg:SI 17 r17)) -1 (nil)
(expr_list:REG_DEAD (reg:SI 17 r17)
(nil)))

(insn/f 600 599 601 0 (set:SI (mem:SI (plus:SI (reg/f:SI 27 sp)
(const_int 324 [0x144])) [0 S4 A32])
(reg:SI 18 r18)) -1 (nil)
(expr_list:REG_DEAD (reg:SI 18 r18)
(nil)))

(insn/f 601 600 602 0 (set:SI (mem:SI (plus:SI (reg/f:SI 27 sp)
(const_int 320 [0x140])) [0 S4 A32])
(reg:SI 19 r19)) -1 (nil)
(expr_list:REG_DEAD (reg:SI 19 r19)
(nil)))

(note 602 601 4 0 NOTE_INSN_PROLOGUE_END)

(note 4 602 3 0 NOTE_INSN_FUNCTION_BEG)

(insn 3 4 11 0 (set (reg/v/f:SI 14 r14 [orig:46 sha_info ] [46])
(reg:SI 4 r4 [ sha_info ])) 8 {movsi_internal} (nil)
(expr_list:REG_DEAD (reg:SI 4 r4 [ sha_info ])
(nil)))

(insn 11 3 375 0 (set (reg/v:SI 5 r5 [orig:47 i ] [47])
(const_int 0 [0x0])) 8 {movsi_internal} (nil)
(nil))

(insn 375 11 498 0 (set (reg/s:SI 6 r6 [54])
(const_int 15 [0xf])) 8 {movsi_internal} (nil)
(expr_list:REG_EQUIV (const_int 15 [0xf])
(nil)))

(insn 498 375 560 0 (set (reg/f:SI 3 r3 [235])
(plus:SI (reg/v/f:SI 14 r14 [orig:46 sha_info ] [46])
(const_int 28 [0x1c]))) 20 {addsi3} (insn_list 3 (nil))
(nil))

(insn 560 498 12 0 (set (reg/f:SI 4 r4 [266])
(reg/f:SI 27 sp)) 8 {movsi_internal} (nil)
(nil))
;; End of basic block 0, registers live:
 3 [r3] 4 [r4] 5 [r5] 6 [r6] 14 [r14] 27 [sp] 31 [ra]



Regards,
Haibin
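
Reading the RTL in the message above, insn 3 writes r14 (and reads r4), while insn 498 reads r14 (and writes r3), so the link appears to come from r14 rather than r4. A plain insn_list entry in LOG_LINKS, without a REG_DEP_ANTI or REG_DEP_OUTPUT tag, denotes a true (read-after-write) dependence. The following is a minimal standalone sketch of that classification; the types and function names are hypothetical and are not GCC's own data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Kinds of register dependence between an earlier and a later insn.  */
enum dep_kind { DEP_NONE, DEP_TRUE, DEP_ANTI, DEP_OUTPUT };

/* Registers written (def) and read (use) by one insn, as bit masks
   indexed by hard register number.  */
struct insn_regs
{
  uint32_t def;
  uint32_t use;
};

/* Bit for hard register REGNO; only 32 registers are modelled.  */
static uint32_t
reg_bit (unsigned int regno)
{
  assert (regno < 32);
  return UINT32_C (1) << regno;
}

/* Classify how LATER depends on EARLIER.  True dependence is checked
   first because it is the strongest constraint.  */
static enum dep_kind
classify_dep (struct insn_regs earlier, struct insn_regs later)
{
  if (earlier.def & later.use)
    return DEP_TRUE;    /* read after write */
  if (earlier.use & later.def)
    return DEP_ANTI;    /* write after read */
  if (earlier.def & later.def)
    return DEP_OUTPUT;  /* write after write */
  return DEP_NONE;
}
```

Filling in insn 3 as { def r14, use r4 } and insn 498 as { def r3, use r14 } gives DEP_TRUE, which agrees with the untagged insn_list 3 entry in the dump.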


extract register input, output and operator from rtl right before peepholes

2005-12-29 Thread Liu Haibin
Hi,

I'm doing some coding right before the peephole2 pass. I'd like to have a
function that takes rtl as input and returns the values of register
inputs, register output and operator. For example,

input:
(insn 496 34 29 1 (set (reg/f:SI 3 r3 [235])
(plus:SI (reg/f:SI 3 r3 [235])
(const_int 4 [0x4]))) 20 {addsi3} (insn_list:REG_DEP_ANTI 28 (nil))
(nil))
returns:
inputs: r3, 4. output: r3. operator: plus.

I know sched_analyze() in sched-deps.c builds the dependencies in
basic blocks, and I hope I can find some useful functions there. I
went through the code roughly but didn't really understand it.

Because the rtl's are right before peephole2, they're much processed,
which makes things easier. I hope I can find some existing function to
use instead of using something like REGNO(XEXP(SET_SRC(PATTERN(x)),
0)). I believe sched-deps.c has something useful. Can someone help on
this?


Regards,
Haibin


about REG_DEP_OUTPUT dependence

2006-01-03 Thread Liu Haibin
Hi,

Can someone help explain why there's a REG_DEP_OUTPUT (write
after write) dependence between jump_insn 547 and insn 82?

(insn 82 543 478 3 (set (mem/s:SI (reg/f:SI 6 r6 [224]) [4 W S4 A32])
(reg:SI 2 r2 [95])) 8 {movsi_internal} (insn_list 81 (nil))
(expr_list:REG_DEAD (reg:SI 2 r2 [95])
(nil)))
(insn 478 82 547 3 (set (reg/f:SI 6 r6 [224])
(plus:SI (reg/f:SI 6 r6 [224])
(const_int 4 [0x4]))) 20 {addsi3} (insn_list:REG_DEP_ANTI
65 (insn_list:REG_DEP_ANTI 66 (insn_list:REG_DEP_ANTI 73
(insn_list:REG_DEP_ANTI 80 (insn_list:REG_DEP_ANTI 82 (nil))))))
(nil))
(jump_insn 547 478 93 3 (set (pc)
(if_then_else (ne:SI (reg/v:SI 7 r7 [orig:270 i ] [270])
(const_int 0 [0x0]))
(label_ref 88)
(pc))) 61 {*cbranch} (insn_list 543
(insn_list:REG_DEP_OUTPUT 82 (nil)))
(expr_list:REG_BR_PROB (const_int 9844 [0x2674])
(nil)))


Regards,
Haibin



Re: Bug 85667 - (x86_64) ms_abi rules aren't followed when returning short structs with float values

2018-09-19 Thread Liu Hao

On 2018/9/19 16:52, lokesh janghel wrote:

Hi,

I am starting to look into this issue.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85667#c0




We did some analysis of that issue and found that the problem may
lie in the code generation phase of GCC.
The ABI says that a struct type is returned in an integer register (as clang
does) or in memory, not in the SSE registers.

Before we go ahead and fix this issue, we would like to know the
community's views/comments on it.




This happens not only on x64, but on x86 too:

```
typedef struct
{
  float x;
} Float;

#ifdef _MSC_VER
#  define MSABI
#else
#  define MSABI __attribute__((ms_abi))
#endif

Float MSABI fn1()
{
  Float v;
  v.x = 3.145;
  return v;
}
```


```
E:\Desktop>cl /nologo /c /O2 /Fo:test-cl.obj test.c && objdump -dMintel-mneomic test-cl.obj


test-cl.obj: file format pe-i386


Disassembly of section .text$mn:

 <_fn1>:
   0:   51                      push   ecx
   1:   c7 04 24 ae 47 49 40    mov    DWORD PTR [esp],0x404947ae
   8:   8b 04 24                mov    eax,DWORD PTR [esp]
   b:   59                      pop    ecx
   c:   c3                      ret

E:\Desktop>gcc -c -O2 -o test-gcc.o test.c && objdump -dMintel-mneomic 
test-gcc.o


test-gcc.o: file format pe-i386


Disassembly of section .text:

 <_fn1>:
   0:   d9 05 00 00 00 00       fld    DWORD PTR ds:0x0
   6:   c3                      ret
   7:   90                      nop

E:\Desktop>
```


--
Best regards,
LH_Mouse



Re: warning: conversion from ‘int’ to ‘char’ may change value

2018-09-20 Thread Liu Hao
On 2018/9/20 22:08, Vincent Lefevre wrote:
>> In C++, declaring n1 const avoids the warning regardless of
>> optimization levels.
> 
> If the constant propagation is done at -O0, this could explain
> the behavior.
> 
> Or do you mean that GCC remembers the type the data come from,
> i.e. assuming char is signed, if n1 is of type char, then ~n1
> is necessarily representable in a char, thus can be regarded
> as of being of type char in its analysis?
> 

In C++ adding the `const` qualifier makes `~n1` a constant expression. 
In C it never is, regardless of qualifiers.
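A minimal sketch of that point, assuming a hypothetical mask named `n1` (the original test case is not shown in this thread): in C++ a `const char` initialized from a constant can be used in a constant expression, so `~n1` can be checked at compile time, and the compound assignment then only keeps the bits that fit into a `char`.

```cpp
#include <cassert>

// Hypothetical mask; the name n1 follows the discussion, the value is assumed.
const char n1 = 0x0f;

// In C++ a const char with a constant initializer is usable in constant
// expressions. ~n1 is computed after promotion to int, so its value is -16.
static_assert(~n1 == -16, "~n1 is an int-valued constant expression");

// Compound AND-and-ASSIGN on a char object: only the low bits matter.
char clear_low_nibble(char c)
{
    c &= ~n1;
    return c;
}
```

Calling `clear_low_nibble(0x7f)` yields `0x70`; the high bits of the promoted `int` never reach the `char` result.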

BTW, I am quite disappointed with such 'false' warnings, because by 
performing a compound AND-and-ASSIGN operation on a `char` object I have 
no interest in bits that don't fit into a `char`, be they ones or 
zeroes. Perhaps there are scenarios where they shouldn't be ignored, but 
I can't think of any.


-- 
Best regards,
LH_Mouse


Re: [9/10 Regression] [PR87833] Intel MIC (emulated) offloading still broken (was: GCC 9.0.1 Status Report (2019-04-25))

2019-05-07 Thread Hongtao Liu
On Tue, Apr 30, 2019 at 7:31 PM Jakub Jelinek  wrote:
>
> On Tue, Apr 30, 2019 at 01:02:40PM +0200, Thomas Schwinge wrote:
> > Hi Jakub!
> >
> > On Tue, 30 Apr 2019 12:56:52 +0200, Jakub Jelinek  wrote:
> > > On Tue, Apr 30, 2019 at 12:47:54PM +0200, Thomas Schwinge wrote:
> > > > Email to  apparently is no longer gets delivered.
> > > > Is there anyone else from Intel who'd take over maintenance?
> > >
> > > As your patch is to LTO option handling, I think you want a review from
> > > Honza.
> >
> > Well, I'm actually not asking for review of the WIP patch, but rather
> > looking for someone to take on ownership/maintenance of the functionality
> > of Intel MIC offloading.
>
> That would be indeed greatly appreciated.
>
> Jakub

I don't know this guy ilya.ver...@intel.com.
Do you know him/her, H.J?

-- 
BR,
Hongtao


Re: Usage of C11 Annex K Bounds-checking interfaces on GCC

2019-12-15 Thread Liu Hao
On 2019/12/16 4:00, Jeffrey Walton wrote:
> 
> If RTFM was going to work, then it would have happened in the last 50
> years or so.
> 
> If error free programming was going to happen, then it would have
> happened in the last 50 years or so.
> 
> Come back to reality.
> 

What's your point? Don't RTFM then don't code, period.

> 
> Microsoft calls them "safer" functions. They are "safer" than the
> original C functions they are supplementing. For completeness,
> Microsoft does not claim they are completely safe.
> 

They are of course not 'safer' for two reasons:

One is that by having an additional parameter you ask for an additional
size argument, but it is still possible that the user passed a wrong
size, such as when you want the number of `wchar_t`s but your user
supplied the number of bytes, which you have no clue about. The best
advice would be using C++ templates to deduce the size of output buffer,
but it doesn't work in C, and even in C++ it works only when the
argument is an array, string, vector, etc. It doesn't work if the
argument is a pointer, in which case you still have to pass the size
yourself.

The other reason is that by requiring more arguments you increase the
probability of bugs. Let's say there is a 1% chance that you pass a
wrong argument. Then if there is 1 argument, the probability that you do
everything right is 99%. If there are 2 arguments, it is 98.01%. If
there are 3 arguments, it is 97.0299%. If there are 100 arguments, it
is about 36.6%. It is not something we would like.
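The figures above follow from raising 0.99 to the number of arguments. A small sketch, assuming the 1% error rate is independent per argument (the function name `all_correct` is made up for illustration):

```cpp
#include <cassert>
#include <cmath>

// Probability that every one of n arguments is passed correctly, assuming
// each argument is wrong independently with probability p_wrong.
double all_correct(int n, double p_wrong = 0.01)
{
    return std::pow(1.0 - p_wrong, n);
}
```

With the default 1% rate this gives 0.99 for one argument, 0.9801 for two, 0.970299 for three, and roughly 0.366 for one hundred.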


> Hugh? Are you begging the argument:
> 
> char* ptr = malloc (50);
> 
> And then claiming you don't know the size?
> 


Why don't you use Java which keeps track of allocated arrays and
throws exceptions in case of out-of-bound access?

> Developer training does not work. If it was going to work, then it
> would have happened in the last 50 years or so.
> 
> Microsoft recognized the fact years ago. You have to force developers
> to use something safer.
> 


Let's take a C++ example. A lot of people still prefer `operator[]` to
`.at()`, which is prone to errors; but they have been taught and have
got used to that for decades. It is not developer training that does not
work. It is /bad/ developer training that does not work.


> 
> I don't think a known dangerous and banned function is a good example.
> strcpy() is banned by both Microsoft and Apple. Only Linux still
> embraces strcpy(). strcpy() still suffers the problem that is trying
> to be corrected.
> 

I don't think `strcpy()` is unsafe. The only real issue of it is that it
copies strings without returning its length. So in order to get the
length you have to scan the copied string again. `stpcpy()` would be a
much better alternative.

> 
> I generally consider the Glibc folks better trained in C and more
> knowledgeable of the C standard than me. If the Glibc folks are making
> the mistakes, then there is no hope in practice for folks like me or
> those who are just starting in C. There are too many sharp edges.
> 

Yes, yes, why don't you use Java? If you write C you are supposed to have
been well educated ('well educated' means at least you should RTFM
before asking). C is not for beginners.


-- 
Best regards,
LH_Mouse





Re: signed int performance question

2017-02-28 Thread Liu Hao

On 2017/2/28 14:38, Eyal Itkin wrote:

Hello,

I wanted to ask a question regarding the compilation of code samples
like the following:
"
int a = fetch_value();
int b = fetch_value();
int c = SOME_BIG_CONSTANT;

if ( c - b < a)
{
... 
}

pass_value(a + b);
return a + b;
"
The value "a + b" is used 3 times in this snippet, while the first use
is hidden in the if check. It seems that changing the if check to "if
( c < a + b ) " will allow the compiler to generate a more efficient
assembly code: 1 addition, used 3 times. However in all of my checks,
all GCC versions will prefer to calculate both the subtraction and the
addition, and to only use the addition's result 2 times.
I am not sure but I believe that on some targets if (a + b) does 
overflow you get an interrupt/exception. So these two expressions aren't 
equivalent.



I know that the case with unsigned integers is that all operations are
done modulo 2^32, and so changing the condition will change the
logical meaning of the check.
However, in signed integers, the logical meaning of any relation check
is only the theoretical meaning of the order relation between the
numbers in the group Z. Meaning that in a purely theoretical manner "a
+ b < c" is a relation order that is equivalent to "a < c - b" or even
" 0 < c - b - a".
What does 'in a purely theoretical manner' mean? In purely mathematical 
manner it may make sense, but in purely C programmatical manner it 
doesn't make any sense at all.



The only exception here is about any possible
integer overflow (above MAX_INT) or underflow (below MIN_INT), however
such cases are specified to be undefined in the C standard, and should
not harm the possible efficiency of the code generation.
The original code doesn't contain any UB provided that (c - b) does not 
underflow. You are not willing to transform code that doesn't contain UB 
into code that contains UB, are you?



Since a C programmer that follows the standard, and writes such a code
check, means only to check relations between 3 signed numbers in the
group Z, is there a reason why not to update the if check and to
generate a more efficient code?

Just because we might be more efficient doesn't mean we should introduce UB.
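To make the objection concrete, here is a hedged sketch with inputs for which the original check is well defined but the rewritten one would overflow. It relies on `__builtin_add_overflow` and `__builtin_sub_overflow`, which are GCC/Clang extensions rather than standard C++; the helper names are invented for illustration.

```cpp
#include <cassert>
#include <climits>

// True when a + b cannot be represented in int (GCC/Clang builtin).
bool sum_overflows(int a, int b)
{
    int result;
    return __builtin_add_overflow(a, b, &result);
}

// True when c - b cannot be represented in int (GCC/Clang builtin).
bool difference_overflows(int c, int b)
{
    int result;
    return __builtin_sub_overflow(c, b, &result);
}

// The check as written in the original code; the caller must ensure that
// c - b is representable.
bool original_check(int a, int b, int c)
{
    return c - b < a;
}
```

For `a = INT_MAX`, `b = 1`, `c = 100`, the subtraction `c - b` is fine and `original_check` returns true, while `a + b` is not representable, so `c < a + b` would be undefined behavior for the same inputs.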


In addition, is there a way to
optionally raise warning in similar cases, so to warn against possible
signed integer overflows, in case the programmer is not aware of the
dangers in his code.
I am not sure what you want but there is an option `-Wstrict-overflow` 
that you may be interested in. Care must be taken that 
`-Wstrict-overflow=3` or above generates a lot of false warnings.


--
Best regards,
LH_Mouse



How to create new functions with a gcc plugin?

2017-06-07 Thread Benxi Liu
Hi all,

I'm using a gcc plugin to do some instrumentation during compilation.

Instrumenting inside functions is simple. But how can I create new functions
and append them to executables? I want to instrument in this way:
create new functions, add my code to them, then insert
calls to them.

foo:
call to new_function;  //instrument a call

new_function: //created function
instrument  codes here

I think the most difficult part is to create functions. If it's
possible to do so, can I create functions at any phase during
compilation (with a GIMPLE or an RTL pass)?

 Any tips?


What kind of data would be put into code section?

2017-06-27 Thread Benxi Liu
Hello everyone,
I'm using GCC 5.4.0.  I know that in some situations, GCC will put
data into .text section, to improve performance. I know one case is
jump table, but I'm still curious about other cases. What kind of data
will be put into executable sections? Is there any way to avoid this?
Any ideas?


Re: What kind of data would be put into code section?

2017-06-28 Thread Benxi Liu
Hi R0b0t1,
Thanks for your reply!
That helps me a lot, and now I know it's a more complicated question
than I've thought.
I'm using GCC on x86_64, more specifically, on Linux x86_64. I also find
that when compiling with -O2, GCC emits some data (like const
strings or const ints) into .text. I wonder whether I could forbid this by
setting some GCC optimization options? I want to eliminate such data
from the code sections and put it into data sections.

2017-06-28 12:40 GMT+08:00 R0b0t1 :
> On Tue, Jun 27, 2017 at 11:00 PM, Benxi Liu  wrote:
>> Hello everyone,
>> I'm using GCC 5.4.0.  I know that in some situations, GCC will put
>> data into .text section, to improve performance. I know one case is
>> jump table, but I'm still curious about other cases. What kind of data
>> will be put into executable sections? Is there any way to avoid this?
>> Any ideas?
>
> This is rather hard to answer because what .text and .data actually
> are depends very heavily on the target architecture. Except for very
> specific optimizations it doesn't matter. When it does, the compiler
> knows better than you.
>
> On von Neumann machines there is effectively no difference between
> .text and .data (or .bss) so the location of information is simply a
> nicety for the programmer. As far as optimizations go you could put
> data into .text when you need to ensure that it is very close in
> memory to the code that operates on it, but on modern machines
> instruction and data caches are separate. The vast majority of
> optimizations rely on reducing the number of comparisons and ensuring
> execution is as linear as possible. Where memory is located matters
> far less than what you are doing with it and how you are doing it.
>
> On Harvard architecture machines .text and .data are different and
> usually wildly so. Most simple microcontrollers treat .data in a
> special way - on the device it exists in the program memory, but the
> standard library loads it in to RAM at runtime. It is common to want
> more information available than can readily be loaded into memory.
> This is accomplished by marking the relevant variables with
> __attribute__((section(".rodata"))), __ATTR_PROGMEM__, PROGMEM, etc
> (implementation dependent). They must be swapped into and out of RAM
> manually using special instructions for reading the program memory.
> These instructions may have special forms for reading sequential
> blocks of memory, and the memory controller may perform best when
> reading sequentially. In these cases how you organize your data
> matters, but reading program memory with the relevant instructions is
> still separate (always, as far as I know) from the instruction fetcher
> that is always reading program memory for the processor, so there's no
> inherent benefit to interleaving code and data.
>
> R0b0t1.


Re: Optimization breaks inline asm code w/ptrs

2017-08-15 Thread Liu Hao

On 2017/8/14 20:41, Alan Modra wrote:

On Sun, Aug 13, 2017 at 10:25:14PM +0930, Alan Modra wrote:

On Sun, Aug 13, 2017 at 03:35:15AM -0700, David Wohlferd wrote:

Using "m"(*pStr) as an (unused) input parameter has no effect.


Use "m" (*(const void *)pStr) and ignore the warning, or use
"m" (*(const struct {char a; char x[];} *) pStr).


or even better "m" (*(const char (*)[]) pStr).



This should work in the sense that GCC now thinks bytes adjacent to 
`pStr` are subject to modification by the asm statement.


But I just tried GCC 7.2 and it seems that even if such a "+m" 
constraint is the only output parameter of an asm statement and there is 
no `volatile` or the "memory" clobber, GCC optimizer will not optimize 
the asm statement away, which is the case if a plain `"+m"(*pStr)` is used.



The issue is one of letting gcc know what memory is accessed by the
asm, if you don't want to use a "memory" clobber.  And there are very
good reasons to avoid clobbering all memory.

"m"(*pStr) ought to work IMO, but apparently just tells gcc you are
only interested in the first character.  Of course that is exactly
what *pStr is, but in this context it would be nicer if it meant the
entire array.


I take that back.  The relatively simple cast to differentiate a
pointer to a char from a pointer to an indeterminate length char array
makes it quite unnecessary for "m"(*pStr) to be treated as an array
reference.

I've opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81845 to
track the lack of documentation.



Yes. I hope there will be a memory-range constraint in the future.



--
Best regards,
LH_Mouse



TLS details on Linux for x86 and x64

2017-12-02 Thread Liu Hao
Dear x86 and x64 developers,

JonY and I are working on an implementation of native TLS support for x86
and x64 on Windows. As on Linux, the address of a thread-local
object on Windows is calculated via indirection from the FS or GS
segment register, but it requires more than one step. We have trouble
comprehending how TLS works on Linux at the moment:

Compiling this program:
```
int a = 12345;
int get_a(void){ return a; }

__thread int b = 67890;
int get_b(void){ return b; }
```

with `gcc -O2 -S` on x64 results in:
```
        .file   "test.c"
        .text
        .p2align 4,,15
        .globl  get_a
        .type   get_a, @function
get_a:
.LFB0:
        .cfi_startproc
        movl    a(%rip), %eax
        ret
        .cfi_endproc
.LFE0:
        .size   get_a, .-get_a
        .p2align 4,,15
        .globl  get_b
        .type   get_b, @function
get_b:
.LFB1:
        .cfi_startproc
        movl    %fs:b@tpoff, %eax
        ret
        .cfi_endproc
.LFE1:
        .size   get_b, .-get_b
        .globl  b
        .section        .tdata,"awT",@progbits
        .align 4
        .type   b, @object
        .size   b, 4
b:
        .long   67890
        .globl  a
        .data
        .align 4
        .type   a, @object
        .size   a, 4
a:
        .long   12345
        .ident  "GCC: (Debian 6.3.0-18) 6.3.0 20170516"
        .section        .note.GNU-stack,"",@progbits

```

The questions are:

0) What is the magical `@tpoff` suffix supposed to do? The `@ntpoff` and
`@dtpoff` things are documented in System V ABI but there doesn't seem
to be anything about `@tpoff`.
1) How does LD tell that `b` (a thread-local integer) is different from
`a` (a static integer)? `a` is apparently offset from RIP, but what
thing is `b` offset from?
2) TLS initializers are placed into specially named sections. The
sections will have the names like `.tls$XXX` where `$XXX` is used to
sort these sections and discarded thereafter. How is LD supposed to
associate the section containing the initializer with the symbol of
object being initialized, without disordering?

Any help will be appreciated.

-- 
Best regards,
LH_Mouse



Re: TLS details on Linux for x86 and x64

2017-12-03 Thread Liu Hao
On 2017/12/3 1:00, Andrew Haley wrote:
> Have you read
> 
> https://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt
> 
> ?
> 

No. Well, will do. Thanks for the information.


-- 
Best regards,
LH_Mouse



Re: TLS details on Linux for x86 and x64

2017-12-03 Thread Liu Hao
On 2017/12/3 18:50, Jakub Jelinek wrote:
> Well, for GNU TLS (rather than GNU2) you want to read
> https://www.akkadia.org/drepper/tls.pdf
> 
>   Jakub
> 

Thank you too. I am thinking about migrating the technique used by GCC
on Linux to Windows, minimizing modification by our side, since it
should be considered to be essentially bugfree and stable.


-- 
Best regards,
LH_Mouse



Re: Can GCC generate totally native Microsoft Windows binaries as Visual Studio?

2018-01-05 Thread Liu Hao
On 2018/1/5 15:30, timofonic timofonic wrote:
> Hello.
> 
> Excuse me my ignorance, but that's what people say me.
> 
> GCC can compile to Microsoft Windows platforms, I understand it. But
> people say me it uses a "shim" between *nix and native Microsoft
> Windows API.
> 

There are three platforms on Windows: Cygwin, the original mingw32, and
mingw-w64.

It is only Cygwin that relies on the emulation layer. mingw32 and
mingw-w64 both make use of system DLLs directly, at the cost of lacking
quite a bit of infrastructure mandated by POSIX (e.g. signals other
than those in standard C, the fork syscall, terminal support,
filesystem, etc). Programs that require no such support (by having some
features disabled or worked around) should work fine.

> Some developers said me GCC on Windows is a "toy compiler".
> 
> Is this right?
> 

Over-subjective opinions like that often start *worthless*
wars. You learn *nothing* from the answer. Judge wisely.

> Kind regards.
> 


-- 
Best regards,
LH_Mouse



Re: C++ - compilation error for all implicit conversion

2018-03-05 Thread Liu Hao
On 2018-03-05 15:42, Satya Prakash Prasad wrote:
> Is there a compiler flag that logs warning / error in case of any implicit
> conversions - like int32_t to uint32_t.
> 
> #include <cstdint>
> #include <iostream>
> 
> using ::std::int32_t;
> using ::std::uint32_t;
> 
> int main(int argc, char *argv[])
> {
>int32_t x = 9;
> uint32_t i = x;
>uint32_t i1 = socketread(...); // returns  int32_t  -1 on error and >0
> on success
> std::cout << " " << x << " " << i << std::flush << std::endl;
> return 0;
> }
> c++ -std=c++11 -Wall -Wconversion -Wpedantic cast.cpp
> 
> I get no issues / warning / error during compilation - is there a way it
> can achieved.
> 

Try `-Wsign-conversion` in addition to `-Wconversion`.
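A minimal sketch of the kind of conversion this flag reports. In C++, `-Wconversion` does not imply `-Wsign-conversion` (it does in C), so the flag has to be given explicitly. `fake_read` is a hypothetical stand-in for the `socketread(...)` call in the question, and it is assumed to return -1 on error.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for socketread(): returns -1 to signal an error.
std::int32_t fake_read()
{
    return -1;
}

// Compiling with -Wsign-conversion should flag the initialization below,
// because a signed result is implicitly converted to an unsigned type.
std::uint32_t implicit_result()
{
    std::uint32_t n = fake_read();
    return n;
}
```

The error value -1 silently becomes 4294967295, which is exactly the kind of mistake the warning is meant to surface.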


-- 
Best regards,
LH_Mouse


Re: C++ - compilation error for all implicit conversion

2018-03-05 Thread Liu Hao
On 2018-03-05 16:17, Satya Prakash Prasad wrote:
> It still does not throw an error:
> (... abridged ...)
> 
> /c/tools/mingw64/bin/c++ -std=c++11 -Wall -Wconversion -Wpedantic
> -Wextra -w -Wsign-compare -Wnarrowing -Wreturn-type -Wno-int-conversion
> -Wtype-limits -Wuseless-cast -Wsign-conversion -Wextra -Wsign-conversion
> cast.cpp
> 
> 

Your `-w` option suppressed all warnings.


-- 
Best regards,
LH_Mouse


Queries on GSoC project on OMPD interface

2018-03-12 Thread #LIU SIYUAN#
Dear all,

Hi, my name is Siyuan and this is my first time using the GCC mailing list!

I’m a senior CS student from Nanyang Technological University, Singapore who is 
interested in the GSOC project related to OMPD.

I personally have experience in C/C++ (development and performance 
optimization). I have also taken compiler course and implemented a 
mini-compiler (https://github.com/koallen/mini-go).

I participated in student cluster competitions so I also have experience in 
parallel computing (OpenMP, CUDA, MPI, etc.). And that’s why I’m particularly 
interested in the OMPD project.

Could you give me further information if I’m intending to apply for GSoC for 
this project?

Here are a few more links that you may find useful:
- My homepage & blog (https://shawnliu.me)
- My resume (https://shawnliu.me/files/siyuan_resume.pdf)
- My GitHub (https://github.com/koallen)

Regards,
Siyuan


Re: GSoC (Improvements to GCC on Windows)

2018-04-01 Thread Liu Hao
On 2018/4/2 13:54, Ko Phyo wrote:
> Thanks for your valuable information. I couldn't make it for GSoC 2018 because
> my proposal application was delayed by my university's first-semester
> examinations. If you allow me to work on this kind of project in the future
> (not as a GSoC applicant), I would happily take part in it. Thanks in
> advance.
> 
> On Apr 1, 2018 9:39 PM, "JonY" <10wa...@gmail.com> wrote:
>> Hi, I have been away for work this past month.
>>
>> There is a demand for a proper C++11 threading implementation for
>> Windows using native Windows APIs (post Vista APIs), I have not looked
>> too much into it, nor do I know if the scope is too big or too small for
>> a summer project. Liu Hao has worked on it, and should be familiar with
>> the scope, I have CC'ed him.
>>
>> Never having run any gsoc programs, are there any specific procedures to
>> get it started?
>>
>>

Well I thought I was almost forgotten.  XD

You might want to have a look at my benchmarking result at
<https://github.com/lhmouse/mcfgthread/wiki/Benchmarking>. Prebuilt
binaries are available at <https://gcc-mcf.lhmouse.com/>.

I haven't heard of anyone using this new thread model in production
environments. So it would be very kind of you to advertise it.

-- 
Best regards,
LH_Mouse



Re: 【GCC version can not be changed】

2018-05-06 Thread Liu Hao

On 2018/5/5 20:13, 夏晗 wrote:

root@Xia-Ubuntu:/usr/bin# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
Target: x86_64-pc-linux-gnu
Configured with: ../configure -enable-checking=release -enable-languages=c,c++ 
-disable-multilib
Thread model: posix
gcc version 6.2.0 (GCC)
I have tried many methods, such as 'ln' and changing priorities, but 'gcc -v' still 
reports '6.2.0'...



If you are using Ubuntu, the command `gcc` is a symlink to whichever 
version selected by your Ubuntu release and is the one used to build all 
system packages. Consequently, using a different target might result in 
binary incompatibility and is not recommended.


If you would like to invoke a different version of GCC, append the 
version number to it. This is true for all official releases and PPA 
packages.


For example, to invoke GCC 7 explicitly, you have to ensure it is 
installed by running `sudo apt-get install gcc-7`. The command `gcc-7` 
will be available thereafter and can be invoked either directly or 
indirectly by setting the `CC` environment variable.


--
Best regards,
LH_Mouse



GCC 8 Ada bootstrap failure on mingw-w64

2018-06-02 Thread Liu Hao

Dear developers,

(This issue is originally reported at 
.)


On mingw-w64, bootstrapping GCC 8 with Ada enabled results in the 
following error after stage 3:


```
GNATLINK 8.1.1 20180602
Copyright (C) 1995-2018, Free Software Foundation, Inc.
xgcc -c -gnatA -gnatWb -gnatiw -B../../ -I- -I../rts -I. 
-I/e/GitHub/MINGW-packages/mingw-w64-gcc-git/src/gcc/gcc/ada -gnatws 
E:\GitHub\MINGW-packages\mingw-w64

-gcc-git\src\build-x86_64-w64-mingw32\gcc\ada\tools\b~gnatdll.adb
checking for fptrap.h... 
E:/GitHub/MINGW-packages/mingw-w64-gcc-git/src/build-x86_64-w64-mingw32/gcc/xg++.exe 
b~gnatdll.o ../link.o ../targext.o ../../ggc-none.
o ../rts\ada.o ../rts\a-charac.o ../rts\a-chlat1.o ../rts\gnat.o 
../rts\interfac.o ../rts\system.o ../rts\s-addope.o ../rts\s-atocou.o 
../rts\s-casuti.o ../rts\
s-imgboo.o ../rts\s-imgint.o ../rts\s-io.o ../rts\s-parame.o 
../rts\s-crtl.o ../rts\i-cstrea.o ../rts\s-stoele.o ../rts\s-stache.o 
../rts\s-strhas.o ../rts\s-ht
able.o ../rts\g-htable.o ../rts\s-string.o ../rts\g-string.o 
../rts\s-traent.o ../rts\s-unstyp.o ../rts\s-imguns.o ../rts\s-wchcon.o 
../rts\s-wchjis.o ../rts\s-
wchcnv.o ../rts\s-carun8.o ../rts\s-conca2.o ../rts\s-traceb.o 
../rts\a-exctra.o ../rts\s-exctab.o ../rts\a-ioexce.o ../rts\a-string.o 
../rts\a-contai.o ../rts\
s-except.o ../rts\s-soliin.o ../rts\s-soflin.o ../rts\s-secsta.o 
../rts\s-excdeb.o ../rts\s-exctra.o ../rts\s-memory.o ../rts\s-wchstw.o 
../rts\s-valuti.o ../rt
s\s-valllu.o ../rts\s-vallli.o ../rts\s-win32.o ../rts\s-mmosin.o 
../rts\s-mmap.o ../rts\s-os_lib.o ../rts\s-bitops.o ../rts\a-stmaco.o 
../rts\a-chahan.o ../rts
\s-excmac.o ../rts\a-elchha.o ../rts\s-addima.o ../rts\s-boustr.o 
../rts\s-stalib.o ../rts\s-dwalin.o ../rts\i-c.o ../rts\a-strmap.o 
../rts\s-trasym.o ../rts\a-
except.o ../rts\s-objrea.o ../rts\a-comlin.o ../rts\a-strsea.o 
../rts\a-strfix.o ../rts\a-tags.o ../rts\a-stream.o ../rts\g-os_lib.o 
../rts\s-ficobl.o ../rts\s-
finroo.o ../rts\a-finali.o ../rts\s-fileio.o ../rts\s-stopoo.o 
../rts\s-finmas.o ../rts\s-stposu.o ../rts\s-spsufi.o ../rts\s-stratt.o 
../rts\a-strunb.o ../rts\
s-valuns.o ../rts\s-valint.o ../rts\a-textio.o ../rts\g-dirope.o 
../rts\s-assert.o ../rts\s-pooglo.o ../rts\s-regexp.o ../rts\g-regexp.o 
../rts\g-comlin.o .\deb
ug.o .\types.o .\alloc.o .\gnatvsn.o .\hostparm.o .\output.o .\rident.o 
.\tree_io.o .\opt.o .\csets.o .\table.o .\widechar.o .\namet.o .\fmap.o 
.\targparm.o .\o
sint.o .\sdefault.o .\mdll-fil.o .\mdll-utl.o .\mdll.o .\switch.o 
.\gnatdll.o ../../libcommon-target.a ../../libcommon.a 
../../../libcpp/libcpp.a ../rts/libgnat
.a C:/MinGW/MSYS2/mingw64/lib/libiconv.a 
../../../libbacktrace/.libs/libbacktrace.a 
../../../libiberty/libiberty.a -no-pie -o ../../gnatdll.exe -L../rts\ -L.\ -
LE:/GitHub/MINGW-packages/mingw-w64-gcc-git/src/gcc/gcc/ada\ 
-L/mingw64/lib/gcc/x86_64-w64-mingw32/8.1.1/adalib/ 
E:\GitHub\MINGW-packages\mingw-w64-gcc-git\src\
build-x86_64-w64-mingw32\gcc\ada\rts\libgnat.a -Wl,--stack=0x200 
-B/e/GitHub/MINGW-packages/mingw-w64-gcc-git/src/build-x86_64-w64-mingw32/./gcc/ 
-nostdinc+
+ -nostdinc++ 
-I/e/GitHub/MINGW-packages/mingw-w64-gcc-git/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libstdc++-v3/include/x86_64-w64-mingw32 
-I/e/GitHub/M
INGW-packages/mingw-w64-gcc-git/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libstdc++-v3/include 
-I/e/GitHub/MINGW-packages/mingw-w64-gcc-git/src/gcc/libstd
c++-v3/libsupc++ 
-I/e/GitHub/MINGW-packages/mingw-w64-gcc-git/src/gcc/libstdc++-v3/include/backward 
-I/e/GitHub/MINGW-packages/mingw-w64-gcc-git/src/gcc/libstdc
++-v3/testsuite/util 
-L/e/GitHub/MINGW-packages/mingw-w64-gcc-git/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libstdc++-v3/src 
-L/e/GitHub/MINGW-packages/mi
ngw-w64-gcc-git/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libstdc++-v3/src/.libs 
-L/e/GitHub/MINGW-packages/mingw-w64-gcc-git/src/build-x86_64-w64-mingw32
/x86_64-w64-mingw32/libstdc++-v3/libsupc++/.libs 
-B/e/GitHub/MINGW-packages/mingw-w64-gcc-git/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libstdc++-v3/src/.
libs 
-B/e/GitHub/MINGW-packages/mingw-w64-gcc-git/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libstdc++-v3/libsupc++/.libs 
-L/mingw64/x86_64-w64-mingw32/lib
 -L/mingw64/lib -isystem /mingw64/x86_64-w64-mingw32/include -isystem 
/mingw64/include -B/mingw64/x86_64-w64-mingw32/bin/ 
-B/mingw64/x86_64-w64-mingw32/lib/ -is
ystem /mingw64/x86_64-w64-mingw32/include -isystem 
/mingw64/x86_64-w64-mingw32/sys-include -static-libstdc++ -static-libgcc 
-static-libstdc++ -static-libgcc
xg++.exe: fatal error: -fuse-linker-plugin, but liblto_plugin-0.dll not 
found

compilation terminated.
gnatlink: error when calling 
E:/GitHub/MINGW-packages/mingw-w64-gcc-git/src/build-x86_64-w64-mingw32/gcc/xg++.exe

make[3]: *** [../gcc-interface/Makefile:2238: ../../gnatdll.exe] Error 4
make[3]: *** Waiting for unfinished jobs
```

It was configured with:
```
  $ ../

Re: GCC 8 Ada bootstrap failure on mingw-w64

2018-06-03 Thread Liu Hao

On 2018/6/2 18:11, Eric Botcazou wrote:

Any ideas about how to resolve this?


Compare with a known working version (e.g. GCC 7) and find the discrepancy.



Hmm there would be a huge amount of code to check. It seems to me that 
it is GNATLINK that causes the error. As I have absolutely no knowledge 
about Ada stuff I need some information about how it is built and how it 
is invoked.


BTW it is quite strange that both stage2 and stage3 didn't fail and the 
comparison was successful.


--
Best regards,
LH_Mouse



Re: __builtin_isnormal question

2018-06-04 Thread Liu Hao

On 2018/6/5 4:44, Steve Ellcey wrote:

Is there a bug in __builtin_isnormal or am I just confused as to what it
means?  There doesn't seem to be any actual definition/documentation for
the function.  __builtin_isnormal(0.0) is returning false.  That seems
wrong to me, 0.0 is a normal (as opposed to a denormalized) number isn't
it?  Or is zero special?



It is documented here (this block of text is copied from GCC 7 manual):

```
6.59 Other Built-in Functions Provided by GCC
GCC provides built-in versions of the ISO C99 floating-point comparison 
macros ... . In the same fashion, GCC provides fpclassify, isfinite, 
isinf_sign, **isnormal** and signbit built-ins used with __builtin_ 
prefixed. The isinf and isnan built-in functions appear both with and 
without the __builtin_ prefix.

```

ISO C says:
```
Description
2 The isnormal macro determines whether its argument value is normal 
(neither zero, subnormal, infinite, nor NaN). First, an argument 
represented in a format wider than its semantic type is converted to its 
semantic type. Then determination is based on the type of the argument.

```

`isnormal(x)` is roughly equivalent to `fpclassify(x) == FP_NORMAL`. 
When `x` is `0.0`, `fpclassify(x)` yields `FP_ZERO`, so the result is 
`false`.
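The classification can be checked directly with the standard `<cmath>` facilities. The sketch below uses a made-up helper, `classify`, that maps `std::fpclassify` results to strings.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <string>

// Maps the result of std::fpclassify to a readable category name.
std::string classify(double x)
{
    switch (std::fpclassify(x)) {
    case FP_ZERO:      return "zero";
    case FP_SUBNORMAL: return "subnormal";
    case FP_NORMAL:    return "normal";
    case FP_INFINITE:  return "infinite";
    case FP_NAN:       return "nan";
    default:           return "other";
    }
}
```

Zero has its own category, so `std::isnormal(0.0)` is false even though zero is not a subnormal number either.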



--
Best regards,
LH_Mouse



Re: About a error reported by gcc

2018-07-04 Thread Liu Hao

On 2018/7/5 9:14, snow_xmas wrote:

Hello.
 The source in the attachment cannot be compiled, because a variable 
captured in the lambda-introducer has no copy constructor but does have a 
move constructor. Whenever a function object is constructed from a lambda-expression 
like this, the compiler reports this error. I have tried GCC 
versions 5.4, 7.3 and 8.1, and all of them report the same error. 
However, the lambda-expression can be moved normally into std::async and 
std::packaged_task, just not into std::function. So I believe it is a bug in GCC.

 Classes that prohibit copying but allow moving are necessary 
in some situations, like unique_ptr. Please give me a solution for this error.





The constructor of `std::function` in question requires the argument 
object to be CopyConstructible:


  ISO/IEC WG21 N1750
  Working Draft, Standard for Programming Language C++

  23.14.13.2.1 function construct/copy/destroy [func.wrap.func.con]

template<class F> function(F f);

  7 Requires: F shall be CopyConstructible.

, which, recursively, requires everything in your lambda that is 
captured by value to be CopyConstructible.


As an alternative, using `uptr = std::make_shared<Test>("c++17")` in 
place of `uptr = Test("c++17")` will overcome this problem.
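A sketch of both halves of that answer, assuming a hypothetical move-only `Test` class (the attachment from the original report is not available here) and C++14 or later for the init-capture. The `static_assert` shows that a closure capturing a move-only object by value is not CopyConstructible, which is why `std::function` rejects it; capturing a `std::shared_ptr` instead keeps the closure copyable.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical move-only class standing in for the reporter's type.
struct Test {
    explicit Test(std::string s) : name(std::move(s)) {}
    Test(Test&&) = default;
    Test(const Test&) = delete;
    std::string name;
};

// A closure that captures a move-only object by value.
inline auto make_move_only_lambda()
{
    return [t = Test("c++17")] { return t.name.size(); };
}

// Such a closure cannot be copied, so it cannot be stored in std::function.
static_assert(
    !std::is_copy_constructible<decltype(make_move_only_lambda())>::value,
    "a closure with a move-only capture is not CopyConstructible");

// Workaround from the reply: capture a shared_ptr, which is copyable.
std::size_t name_length_via_function()
{
    auto uptr = std::make_shared<Test>("c++17");
    std::function<std::size_t()> f = [uptr] { return uptr->name.size(); };
    return f();
}
```

The move-only closure can still be called directly or moved into `std::async` and `std::packaged_task`; only type-erasing wrappers that require copyability are affected.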


--
Best regards,
LH_Mouse



An issue on loop optimization/vectorization

2018-07-11 Thread jiangning liu
For the case below, the code generated by “gcc -O3” is very ugly.



char g_d[1024], g_s1[1024], g_s2[1024];

void test_loop(void)
{
    char *d = g_d, *s1 = g_s1, *s2 = g_s2;

    for( int y = 0; y < 128; y++ )
    {
        for( int x = 0; x < 16; x++ )
            d[x] = s1[x] + s2[x];
        d += 16;
    }
}



If we change “for( int x = 0; x < 16; x++ )” to be like “for( int x = 0; x
< 32; x++ )”, very beautiful vectorization code would be generated,



test_loop:
.LFB0:
        .cfi_startproc
        adrp    x2, g_s1
        adrp    x3, g_s2
        add     x2, x2, :lo12:g_s1
        add     x3, x3, :lo12:g_s2
        adrp    x0, g_d
        adrp    x1, g_d+2048
        add     x0, x0, :lo12:g_d
        add     x1, x1, :lo12:g_d+2048
        ldp     q1, q2, [x2]
        ldp     q3, q0, [x3]
        add     v1.16b, v1.16b, v3.16b
        add     v0.16b, v0.16b, v2.16b
        .p2align 3,,7
.L2:
        str     q1, [x0]
        str     q0, [x0, 16]!
        cmp     x0, x1
        bne     .L2
        ret


The code generated for " for( int x = 0; x < 8; x++ )" is also very ugly.


It looks like GCC has potential bugs in loop vectorization. Any ideas?



Thanks,

-Jiangning


Re: GCC 8 Ada bootstrap failure on mingw-w64

2018-07-29 Thread Liu Hao
It has been long since this was reported. Today Alexey said he had found 
the reason. He believed that 899af040b0 was causing the failure [1]. 
After reverting it the x64 bootstrap succeeded, however the x86 
bootstrap failed with an ICE [2].


So now we have two issues here. Should we file a new PR for the first 
one? It seems a regression however. I am not familiar with Ada so I am 
looking forward to your help. Thanks.


[1] 
https://github.com/Alexpux/MINGW-packages/pull/3877#issuecomment-408651809
[2] 
https://github.com/Alexpux/MINGW-packages/pull/3877#issuecomment-408667559



--
Best regards,
LH_Mouse



Re: [llvm-dev] GCC 5 and -Wstrict-aliasing in JSON.h

2018-08-09 Thread Liu Hao

On 2018-08-10 06:20, Kim Gräsman wrote:

On Fri, Aug 10, 2018 at 12:02 AM, Jonathan Wakely 
wrote:


If GCC 4.9.3 thinks there's an aliasing violation it might
misoptimise. It doesn't matter if it's right or not, it matters if it
treats the code as undefined or not.

And apparently GCC does think there's a violation, because it warns.

Unless you're sure that not only is the code OK, but GCC is just being
noisy and doesn't misoptimise, then I think using -fno-strict-aliasing
is safer than just suppressing the warning.


Good point, I can see how that would play out nicer.

So this would probably need to be addressed in the LLVM build system, I'll
try and work up a patch tomorrow.



When I used to do such type punning in C, I got similar warnings. Then I 
looked for some solutions... I can't recall the principle now and I fail 
to find it in the C or C++ standard. Despite that, the solution is simple:


Only an lvalue of a pointer to (possibly CV-qualified) `void` or a 
pointer to a character type (in C) / any of `char`, `unsigned char` or 
`std::byte` (in C++) can alias objects.


That is to say, in order to eliminate the aliasing problem an 
intermediate lvalue pointer is required.


Hence, altering
```
return *reinterpret_cast<T *>(Union.buffer);
```
to
```
auto p = static_cast<void *>(Union.buffer);
return *static_cast<T *>(p);
```
will probably resolve this problem.



Thanks,
- Kim




--
Best regards,
LH_Mouse



Re: [llvm-dev] GCC 5 and -Wstrict-aliasing in JSON.h

2018-08-10 Thread Liu Hao

On 2018-08-10 18:53, Kim Gräsman wrote:

I'm worried that this might only serve to trick the compiler.



It shouldn't. If it was merely a trick then `std::aligned_storage` would 
be completely unusable.



Explicitly using `-fno-strict-aliasing` for GCC < 6 would seem more
direct to me -- as Jonathan says, if the compiler classifies a strict
aliasing rule violation as undefined behavior, and that is further
used to optimize in an unexpected manner, it doesn't matter whether it
warns or not. Then again, I guess disabling strict aliasing would also
disable optimizations that are generally useful for LLVM as a whole.

I'm reading up on safe aliasing techniques, but so far nothing stands
out to me as applicable in this scenario.



The C++ standard requires objects created in such storage to be created with 
new expressions (a.k.a. placement new). Although [intro.object]-3 only 
defines /provides storage/ for arrays of a character type or 
`std::byte`, the specification of `aligned_storage` and `aligned_union` 
in [meta.trans.other] doesn't require the type used as uninitialized 
storage to be an array of a character type or `std::byte` - in fact it 
cannot be, because alignment information is not part of the nominal type 
system of C and will be lost when obtaining the `type` member.


Focusing on the cast: As long as the compiler is unable to know whether 
a placement new has been made on the union (i.e. whether it is providing 
storage for another object), I don't think a standard-conforming 
compiler is ever allowed to ignore such possibility.
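
A minimal sketch of the pattern under discussion, not the LLVM code itself: `Value`, `Storage`, `emplace` and `as_value` are hypothetical names. An object is created in raw storage with placement new and read back through an intermediate `void *`, as proposed earlier in this thread. Note that C++17 also offers `std::launder` for obtaining a pointer to such an object; this sketch does not use it.

```cpp
#include <cassert>
#include <new>

// Hypothetical payload type used only for this illustration.
struct Value {
    int tag;
    double number;
};

// Raw, suitably aligned storage, similar in spirit to a union buffer.
struct Storage {
    alignas(Value) unsigned char buffer[sizeof(Value)];
};

// Create a Value inside the storage with placement new.
inline Value *emplace(Storage &s, int tag, double number) {
    return ::new (static_cast<void *>(s.buffer)) Value{tag, number};
}

// Read it back through an intermediate `void *` instead of reinterpret_cast.
inline Value &as_value(Storage &s) {
    void *p = static_cast<void *>(s.buffer);
    return *static_cast<Value *>(p);
}

// Usage: the reference refers to the object that placement new created.
inline void check_storage() {
    Storage s;
    Value *v = emplace(s, 7, 1.5);
    assert(&as_value(s) == v && as_value(s).tag == 7);
}
```

`Value` is trivially destructible, so the sketch does not need an explicit destructor call before the storage goes away.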



- Kim




--
Best regards,
LH_Mouse



Can offsetting a non-null pointer result in a null one?

2018-08-14 Thread Liu Hao

Dear GCC people,

At the moment, with GCC 8.2, I compile the program

```
int foo(const char *p)
{
    if(p == 0)
        return 2;
    const char *q = p + 1;
    if(q == 0)
        return 1;
    return 0;
}
```

using

```
gcc-8 test.c -Wall -Wextra -Wpedantic -O3 -S
```

and get the following assembly (with irrelevant directives stripped out):

```
foo:
        testq   %rdi, %rdi
        je      .L3
        xorl    %eax, %eax
        cmpq    $-1, %rdi
        sete    %al
        ret
.L3:
        movl    $2, %eax
        ret

My question is: when the first `if` is not taken, i.e. when `p` is 
not null, is it possible that adding 1 to `p` results in a 
null `q`?  Clang has been assuming that the result can't be null and 
optimizing out the second `if` statement for years, but GCC is still 
emitting a check there. Are there any special reasons that prevent GCC 
from optimizing code this way?


--
Best regards,
LH_Mouse



Re: Can offsetting a non-null pointer result in a null one?

2018-08-14 Thread Liu Hao

On 2018-08-15 12:48, Jeff Law wrote:

I just don't think anyone's ever bothered to catch this case.  I believe
there is a BZ which touches on this issue.



Yes, here it is: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78655

This PR uses a placement new as the example, which in GCC 8 assumes the 
pointer argument to `operator new()` is never null. But for a general 
case it is still not fixed.


So it looks like nobody has been working on this. I will just fix my 
code then.



Jeff




--
Best regards,
LH_Mouse



Re: Can offsetting a non-null pointer result in a null one?

2018-08-20 Thread Liu Hao
On 2018-08-20 16:27, Richard Biener wrote:
> Btw, I can't find wording in the standards that nullptr + 1 is
> invoking undefined behavior,
> that is, that pointer arithmetic is only allowed on pointers pointing
> to a valid object.
> Any specific pointers?
> 

The C standard only defines addition and subtraction for pointers 
pointing to elements and past-the-end positions of arrays [1]. As a null 
pointer points to nothing, the 'otherwise' clauses apply, rendering the 
behavior undefined.

The C++ standard has a couple of similar paragraphs [2]. In addition to 
that, there is a special case for null pointers and the integer zero 
[3]. There is no special case for null pointers and integers other than 
zero.

[1] ISO/IEC WG14 N1570, 6.5.6 Additive operators, 8, 9.
[2] ISO/IEC WG21 N4750, 8.5.6 Additive operators, 4, 5
[3] ISO/IEC WG21 N4750, 8.5.6 Additive operators, 7
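
A minimal sketch of the cases the rules above do define, written in C++. The function names `count_nonzero` and `null_plus_zero` are illustrative only:

```cpp
#include <cassert>
#include <cstddef>

// Walk an array using only pointers to its elements or one past its end,
// which is the range in which pointer arithmetic is defined.
inline std::size_t count_nonzero(const char *first, std::size_t n) {
    const char *last = first + n;   // one past the end: allowed
    std::size_t count = 0;
    for (const char *p = first; p != last; ++p)
        if (*p != 0)
            ++count;
    return count;
}

// Adding the integer zero to a null pointer is the special case that the
// C++ standard defines; the result is again a null pointer.
inline const char *null_plus_zero() {
    const char *p = nullptr;
    return p + 0;
}

// Usage: both functions stay within the defined cases.
inline void check_pointer_rules() {
    const char buf[4] = {1, 0, 2, 0};
    assert(count_nonzero(buf, 4) == 2);
    assert(null_plus_zero() == nullptr);
}
```

Neither function ever forms `nullptr + 1`, which is the case with no defined behavior according to the paragraphs cited above.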


-- 
Best regards,
LH_Mouse


RTEMS Port of GCJ Progress Report

2011-07-08 Thread Jie Liu
Hi,

This is the second report after “GCJ Porting for RTEMS Status
Report”[1]. During this time, I have been
--- Focusing on running the testsuite and fixing the problems encountered
--- Submitting patches to the related communities

In detail, I have got the testsuite results for boehm-gc, libffi and
libjava, which can be seen in the rtemsgcj project’s trunk[2]. The
problems encountered in boehm-gc were fixed with Ivan’s help, and they
can be seen on the mailing list[3] (just search for rtems for
related information). For the problems encountered in libjava, I have sent
a mail to java-patches[4]; some problems have been fixed while others
are still being fixed.

The patches have been sent to the related communities, although the libjava
patch still needs modification. For boehm-gc, the patch can be seen on the
java-patches mailing list[5]. For RTEMS, the patch can be seen on the
RTEMS mailing list[6]. For libjava, the patch can also be seen on the
java-patches mailing list[7]. libffi already works on RTEMS/i386, so
there is no patch for it.

Next step, I will
+++ Make as many libjava test cases PASS as possible and get the
libjava patch merged
+++ Move to a new architecture and repeat the porting process

If you have any ideas on this project, please do not hesitate to contact me. :)

[1] http://www.rtems.org/pipermail/rtems-users/2011-June/008574.html
[2] http://rtemsgcj.googlecode.com/svn/trunk/gcjtest/testsuite_out
[3] http://news.gmane.org/gmane.comp.programming.garbage-collection.boehmgc
[4] http://gcc.gnu.org/ml/java-patches/2011-q3/msg00016.html
[5] http://gcc.gnu.org/ml/java-patches/2011-q2/msg00067.html
[6] http://www.rtems.org/pipermail/rtems-users/2011-June/008573.html
[7] http://gcc.gnu.org/ml/java-patches/2011-q3/msg1.html

Best Regards,
Jie


RE: [RFC] Add middle end hook for stack red zone size

2011-07-25 Thread Jiangning Liu
Hi,

One month ago, I sent out this RFC to the *gcc-patches* mailing list, but I 
haven't received any response yet. So I'm forwarding this mail to the *gcc* 
mailing list. Can anybody here give me some feedback?

Appreciate your help in advance!

-Jiangning

-Original Message-
From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org] 
Sent: Tuesday, July 19, 2011 6:18 PM
To: Jiangning Liu
Cc: gcc-patc...@gcc.gnu.org; vmaka...@redhat.com; dje@gmail.com; Richard 
Henderson; Ramana Radhakrishnan
Subject: Re: [RFC] Add middle end hook for stack red zone size

2011/7/19 Jiangning Liu :
>
> I see a lot of feedbacks on other posts, but mine is still with ZERO
> response in the past 3 weeks, so I'm wondering if I made any mistake in my
> process? Who can help me?

It would be worth CC'ing the other relevant target maintainers as well
to get some feedback since the patch touches ARM, x86 and Powerpc.
I've added the maintainers for i386 and PPC to the CC list using the
email addresses from the MAINTAINERS file.

Thanks,
Ramana

>
> Thanks,
> -Jiangning
>
> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> On Behalf Of Jiangning Liu
> Sent: Tuesday, July 05, 2011 8:32 AM
> To: gcc-patc...@gcc.gnu.org; rgue...@gcc.gnu.org
> Subject: RE: [RFC] Add middle end hook for stack red zone size
>
> PING...
>
> I just merged with the latest code base and generated new patch as attached.
>
> Thanks,
> -Jiangning
>
>> -Original Message-
>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>> ow...@gcc.gnu.org] On Behalf Of Jiangning Liu
>> Sent: June 28, 2011 4:38 PM
>> To: gcc-patc...@gcc.gnu.org
>> Subject: [RFC] Add middle end hook for stack red zone size
>>
>> This patch is to fix PR38644, which is a bug with long history about
>> stack red zone access, and PR30282 is correlated.
>>
>> Originally red zone concept is not exposed to middle-end, and back-end
>> uses special logic to add extra memory barrier RTL and help the
>> correct dependence in middle-end. This way different back-ends must
>> handle red zone problem by themselves. For example, X86 target
>> introduced function
>> ix86_using_red_zone() to judge red zone access, while POWER introduced
>> offset_below_red_zone_p() to judge it. Note that they have different
>> semantics, but the logic in caller sites of back-end uses them to
>> decide whether adding memory barrier RTL or not. If back-end
>> incorrectly handles this, bug would be introduced.
>>
>> Therefore, the correct method should be middle-end handles red zone
>> related things to avoid the burden in different back-ends. To be
>> specific for PR38644, this middle-end problem causes incorrect
>> behavior for ARM target.
>> This patch exposes red zone concept to middle-end by introducing a
>> middle-end/back-end hook TARGET_STACK_RED_ZONE_SIZE defined in
>> target.def, and by default its value is 0. Back-end may redefine this
>> function to provide concrete red zone size according to specific ABI
>> requirements.
>>
>> In middle end, scheduling dependence is modified by using this hook
>> plus checking stack frame pointer adjustment instruction to decide
>> whether memory references need to be all flushed out or not. In
>> theory, if TARGET_STACK_RED_ZONE_SIZE is defined correctly, back-end
>> would not be required to specially handle this scheduling dependence
>> issue by introducing extra memory barrier RTL.
>>
>> In back-end, the following changes are made to define the hook,
>> 1) For X86, TARGET_STACK_RED_ZONE_SIZE is redefined to be
>> ix86_stack_red_zone_size() in i386.c, which is an newly introduced
>> function.
>> 2) For POWER, TARGET_STACK_RED_ZONE_SIZE is redefined to be
>> rs6000_stack_red_zone_size() in rs6000.c, which is also a newly
>> defined function.
>> 3) For ARM and others, TARGET_STACK_RED_ZONE_SIZE is defined to be
>> default_stack_red_zone_size in targhooks.c, and this function returns
>> 0, which means ARM eabi and others don't support red zone access at all.
>>
>> In summary, the relationship between ABI and red zone access is like
>> below,
>>
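>> ------------------------------------------------------------------
>> |     ARCH      |  ARM  |      X86      |     POWER     | others |
>> |---------------|-------|---------------|---------------|--------|
>> |      ABI      | EABI  | MS_64 | other |  AIX   |  V4  |        |
>> |---------------|-------|-------|-------|--------|------|--------|
>> |   RED ZONE    |  No   |  YES  |  No   |  YES   |  No  |   No   |
>> |---------------|-------|-------|-------|--------|------|--------|
>> | RED ZONE SIZE |   0   |  128  |   0   |220/288 |   0  |   0    |
>> ------------------------------------------------------------------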
>>
>> Thanks,
>> -Jiangning
>
>
>
>






RE: [RFC] Add middle end hook for stack red zone size

2011-07-31 Thread Jiangning Liu
Joern,

Thanks for your valuable feedback! This is only an RFC, and I will send out a 
formal patch along with a ChangeLog later on. 

Basically, my patch only adds a new dependence in the scheduler, and it only 
blocks some instruction movements, so it poses NO RISK to compiler correctness. 
For whatever stack pointer changes you gave in different scenarios, the current 
code base should already work. My patch intends neither to replace the old 
dependences, nor to maximize the scheduler's capability given the existence of 
the red zone in the stack. It only blocks memory accesses from moving across a 
stack pointer adjustment if the distance is beyond the red zone size, which is 
an OS requirement due to the existence of interrupts. 

Stack adjustment in the epilogue is a very general usage of the stack frame. 
It's quite necessary to solve the general problem in the middle-end rather than 
in the back-end. Also, the old patch you attached solves the data dependence 
between two memory accesses, but the stack pointer doesn't really have a data 
dependence with a memory access that doesn't use the stack pointer, so they are 
different stories. An alternative solution that avoids a blunt scheduling 
barrier is to insert an independent pass before the scheduler to create RTL 
barriers by using the same interface stack_red_zone_size, but it would really 
be over-design to add a new pass only for this *small* functionality.

In my patch, the *abs* of the offset is used, so you are right that it's 
possible to get false positives and be too conservative, but there won't be any 
false negatives, because my code only adds new dependences. 
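
A rough sketch of why taking the absolute value can only err on the conservative side. This is NOT the code from the patch: `must_block_reordering` is a hypothetical predicate, and the thread does not spell out exactly how the compared distance is computed, so `distance` is simply assumed to be a signed byte count.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical predicate: should a memory access be kept from moving across
// a stack pointer adjustment, given a signed distance and the red zone size?
inline bool must_block_reordering(long distance, long red_zone_size) {
    // The absolute value ignores the direction of stack growth. Compared to
    // a direction-aware test it can only block more cases, never fewer.
    return std::labs(distance) > red_zone_size;
}

// Usage: with a 128-byte red zone, both directions beyond 128 are blocked;
// with a red zone size of 0, every non-zero distance is blocked.
inline void check_red_zone_predicate() {
    assert(must_block_reordering(-256, 128));
    assert(must_block_reordering(256, 128));    // possibly a false positive
    assert(!must_block_reordering(-64, 128));
    assert(must_block_reordering(8, 0));
}
```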

Since compilation is done per function, it would be OK if the red zone size 
varies due to different ABIs. Could you please tell me exactly on which system 
supported by GCC the red zone size can be different for incoming and 
outgoing? And also, how does the scheduler guarantee correctness in the current 
code base? Anyway, I don't think my patch will break the original solution.

Thanks,
-Jiangning

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> On Behalf Of Joern Rennecke
> Sent: Tuesday, July 26, 2011 10:33 AM
> To: Jiangning Liu
> Cc: gcc@gcc.gnu.org; gcc-patc...@gcc.gnu.org; vmaka...@redhat.com;
> dje@gmail.com; Richard Henderson; Ramana Radhakrishnan; 'Ramana
> Radhakrishnan'
> Subject: RE: [RFC] Add middle end hook for stack red zone size
> 
> Quoting Jiangning Liu :
> 
> > Hi,
> >
> > One month ago, I sent out this RFC to *gcc-patches* mail list, but I
> >  didn't receive any response yet. So I'm forwarding this mail to
> > *gcc* mail list. Can anybody here really give feedback to me?
> 
> Well, I couldn't approve any patch, but I can point out some issues with your 
> patch.
> 
> First, it's missing a ChangeLog, and you don't state how you have tested it.
> And regarding the code in sched_analyze_1, I think you'll get false positives 
> with
> alloca, and false negatives when registers are involved to compute offsets or 
> to
> restore the stack pointer from.
> 
> FWIW, I think generally blunt scheduling barriers should be avoided, and 
> instead
> the dependencies made visible to the scheduler.
> E.g., I've been working with another architecture with a redzone, where at 
> -fno-
> omit-frame-pointer, the prologue can put pretend_args into the redzone, then 
> after
> stack adjustment and frame allocation, these arguments are accessed via the 
> frame
> pointer.
> 
> With the attached patchlet, alias analysis works for this situation too, so 
> no blunt
> scheduling block is required.
> 
> Likewise, with stack adjustments, they should not affect scheduling in 
> general, but
> be considered to clobber the part of the frame that is being exposed to 
> interrupt
> writes either before or after the adjustment.
> At the moment, each port that wants to have such selective scheduling 
> blockages
> has to define a stack_adjust pattern with a memory clobber in a parallel, 
> with a
> memref that shows the high water mark of possible interrupt stack writes.
> Prima facia it would seem convenient if you only had to tell the middle-end 
> about
> the redzone size, and it could figure out the implicit clobbers when the 
> stack is
> changed.  However, when a big stack adjustment is being made, or the stack is
> realigned, or restored from the frame pointer / another register where it was
> saved due to realignment, the adjustment is not so obvious.  I'm not sure if 
> you can
> actually create an robust interface that's simpler to use than putting the 
> right
> memory clobber in the stack adjust pattern.  Note also that the redzone size 
> can
> vary from function to function depending on ABI-altering attributes, in 
> particular
> for interrupt functions, which can also have different incoming and outgoing
> redzone sizes.  Plus, you can have an NMI / reset handler which can use the 
> stack
> like an ordinary address register.





RE: [RFC] Add middle end hook for stack red zone size

2011-08-01 Thread Jiangning Liu
The answer is ARM can. However, if you look into the bugs PR30282, PR38644 
and PR44199, you will find that historically there have been several different 
cases in different ports reporting similar failures, covering x86, PowerPC and 
ARM. You are right, they were all fixed in back-ends in the past, but we should 
fix the bug in a general way to make the GCC infrastructure stronger, rather 
than fixing the problem target-by-target and case-by-case! If you look further 
into the back-end fixes for x86 and PowerPC, you will find they look 
quite similar. 

Thanks,
-Jiangning

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> On Behalf Of Jakub Jelinek
> Sent: Monday, August 01, 2011 5:12 PM
> To: Jiangning Liu
> Cc: 'Joern Rennecke'; gcc@gcc.gnu.org; gcc-patc...@gcc.gnu.org;
> vmaka...@redhat.com; dje@gmail.com; Richard Henderson; Ramana
> Radhakrishnan; 'Ramana Radhakrishnan'
> Subject: Re: [RFC] Add middle end hook for stack red zone size
> 
> On Mon, Aug 01, 2011 at 11:44:04AM +0800, Jiangning Liu wrote:
> > It's quite necessary to solve the general problem in middle-end rather
than in
> back-end.
> 
> That's what we disagree on.  All back-ends but ARM are able to handle it
> right, why can't ARM too?  The ABI rules for stack handling in the
epilogues
> are simply too diverse and complex to be handled easily in the scheduler.
> 
>   Jakub






A case that PRE optimization hurts performance

2011-08-01 Thread Jiangning Liu
Hi,

For the following simple test case, PRE optimization hoists computation
(s!=1) into the default branch of the switch statement, and finally causes
very poor code generation. This problem occurs in both X86 and ARM, and I
believe it is also a problem for other targets. 

int f(char *t) {
    int s=0;

    while (*t && s != 1) {
        switch (s) {
        case 0:
            s = 2;
            break;
        case 2:
            s = 1;
            break;
        default:
            if (*t == '-') 
                s = 1;
            break;
        }
        t++;
    }

    return s;
}

Taking X86 as an example, with option "-O2" you may find 52 instructions
generated like below,

<f>:
   0:   55                      push   %ebp
   1:   31 c0                   xor    %eax,%eax
   3:   89 e5                   mov    %esp,%ebp
   5:   57                      push   %edi
   6:   56                      push   %esi
   7:   53                      push   %ebx
   8:   8b 55 08                mov    0x8(%ebp),%edx
   b:   0f b6 0a                movzbl (%edx),%ecx
   e:   84 c9                   test   %cl,%cl
  10:   74 50                   je     62
  12:   83 c2 01                add    $0x1,%edx
  15:   85 c0                   test   %eax,%eax
  17:   75 23                   jne    3c
  19:   8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
  20:   0f b6 0a                movzbl (%edx),%ecx
  23:   84 c9                   test   %cl,%cl
  25:   0f 95 c0                setne  %al
  28:   89 c7                   mov    %eax,%edi
  2a:   b8 02 00 00 00          mov    $0x2,%eax
  2f:   89 fb                   mov    %edi,%ebx
  31:   83 c2 01                add    $0x1,%edx
  34:   84 db                   test   %bl,%bl
  36:   74 2a                   je     62
  38:   85 c0                   test   %eax,%eax
  3a:   74 e4                   je     20
  3c:   83 f8 02                cmp    $0x2,%eax
  3f:   74 1f                   je     60
  41:   80 f9 2d                cmp    $0x2d,%cl
  44:   74 22                   je     68
  46:   0f b6 0a                movzbl (%edx),%ecx
  49:   83 f8 01                cmp    $0x1,%eax
  4c:   0f 95 c3                setne  %bl
  4f:   89 df                   mov    %ebx,%edi
  51:   84 c9                   test   %cl,%cl
  53:   0f 95 c3                setne  %bl
  56:   89 de                   mov    %ebx,%esi
  58:   21 f7                   and    %esi,%edi
  5a:   eb d3                   jmp    2f
  5c:   8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
  60:   b0 01                   mov    $0x1,%al
  62:   5b                      pop    %ebx
  63:   5e                      pop    %esi
  64:   5f                      pop    %edi
  65:   5d                      pop    %ebp
  66:   c3                      ret
  67:   90                      nop
  68:   b8 01 00 00 00          mov    $0x1,%eax
  6d:   5b                      pop    %ebx
  6e:   5e                      pop    %esi
  6f:   5f                      pop    %edi
  70:   5d                      pop    %ebp
  71:   c3                      ret

But with command line option "-O2 -fno-tree-pre", there are only 12
instructions generated, and the code would be very clean like below,

<f>:
   0:   55                      push   %ebp
   1:   31 c0                   xor    %eax,%eax
   3:   89 e5                   mov    %esp,%ebp
   5:   8b 55 08                mov    0x8(%ebp),%edx
   8:   80 3a 00                cmpb   $0x0,(%edx)
   b:   74 0e                   je     1b
   d:   80 7a 01 00             cmpb   $0x0,0x1(%edx)
  11:   b0 02                   mov    $0x2,%al
  13:   ba 01 00 00 00          mov    $0x1,%edx
  18:   0f 45 c2                cmovne %edx,%eax
  1b:   5d                      pop    %ebp
  1c:   c3                      ret

Do you have any idea about this?

Thanks,
-Jiangning





RE: [RFC] Add middle end hook for stack red zone size

2011-08-01 Thread Jiangning Liu
Hi Jakub,

I appreciate your valuable comments!

I think SPARC V9 ABI doesn't have red zone defined, right? So
stack_red_zone_size should be defined as zero by default, the scheduler
would block moving memory accesses across stack adjustment no matter what
the offset is. I don't see any risk here. Also, in my patch function *abs*
is being used to avoid the opposite stack direction issue as you mentioned.

Some people like you insist on the ABI diversity, and actually I agree with
you on this. But part of the ABI definition is general for all targets. The
point here is memory access beyond stack red zone should be avoided, which
is the general part of ABI that compiler should guarantee. For this general
part, middle end should take the responsibility.

Thanks,
-Jiangning

> -Original Message-
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Monday, August 01, 2011 6:31 PM
> To: Jiangning Liu
> Cc: 'Joern Rennecke'; gcc@gcc.gnu.org; gcc-patc...@gcc.gnu.org;
> vmaka...@redhat.com; dje@gmail.com; Richard Henderson; Ramana
> Radhakrishnan; 'Ramana Radhakrishnan'
> Subject: Re: [RFC] Add middle end hook for stack red zone size
> 
> On Mon, Aug 01, 2011 at 06:14:27PM +0800, Jiangning Liu wrote:
> > ARM. You are right, they were all fixed in back-ends in the past, but
> we
> > should
> > fix the bug in a general way to make GCC infrastructure stronger,
> rather
> > than fixing the problem target-by-target and case-by-case! If you
> further
> > look into the back-end fixes in x86 and PowerPC, you may find they
> looks
> > quite similar in back-ends.
> >
> 
> Red zone is only one difficulty, your patch is e.g. completely ignoring
> existence of biased stack pointers (e.g. SPARC -m64 has them).
> Some targets have stack growing in opposite direction, etc.
> We have really a huge amount of very diverse ABIs and making the
> middle-end
> grok what is an invalid stack access is difficult.
> 
>   Jakub






I am work with lm32 and want to help with the lm32 target in gcc

2011-08-31 Thread Xiangfu Liu

Hi

I am working with lm32 and want to help with the lm32 target in gcc.
The device is the Milkymist One, which has an lm32 soft CPU core on an FPGA.

What is the first step I should take to help with the lm32 target in gcc?
I have read http://gcc.gnu.org/contribute.html#legal
I have to start with small contributions. Should I do this legal stuff?

thanks


Re: I am work with lm32 and want to help with the lm32 target in gcc

2011-08-31 Thread Xiangfu Liu

Hi

Can you send me the copyright assignment forms?
It should be an assignment for all future changes, right?

Thanks for the reply.

On 09/01/2011 11:32 AM, Liu wrote:

On Thu, Sep 1, 2011 at 11:00 AM, Xiangfu Liu  wrote:

Hi

I am work with lm32 and want to help with the lm32 target in gcc.
the device name is milkymist one. with FPGA software CPU core lm32.

what is the first step I should do for help with lm32 target in gcc?
I have read this http://gcc.gnu.org/contribute.html#legal
I have to start with small contributions. should I do this legal stuff?

thanks



Ask for an assignment, sign it, send it back to the FSF, and submit your patch.

--Liu




Is VRP is too conservative to identify boolean value 0 and 1?

2011-09-01 Thread Jiangning Liu
Hi,

For the following small case,

int f(int i, int j)
{
    if (i==1 && j==2)
        return i;
    else
        return j;
}

with -O2 option, GCC has vrp2 dump like below,

==

Value ranges after VRP:

i_1: VARYING
i_2(D): VARYING
D.1249_3: [0, +INF]
j_4(D): VARYING
D.1250_5: [0, +INF]
D.1251_6: [0, +INF]
j_10: [2, 2]  EQUIVALENCES: { j_4(D) } (1 elements)


Removing basic block 3
f (int i, int j)
{
  _Bool D.1251;
  _Bool D.1250;
  _Bool D.1249;

:
  D.1249_3 = i_2(D) == 1;
  D.1250_5 = j_4(D) == 2;
  D.1251_6 = D.1250_5 & D.1249_3;
  if (D.1251_6 != 0)
goto ;
  else
goto ;

:

:
  # i_1 = PHI <1(3), j_4(D)(2)>
  return i_1;

}



Variables D.1249_3, D.1250_5 and D.1251_6 should be boolean values, so
their value ranges should be

D.1249_3: [0, 1]
D.1250_5: [0, 1]
D.1251_6: [0, 1]

So why can't the current VRP find these value ranges?

I'm asking this question because the optimizations in back-end need this
info to do advanced optimization.

Thanks,
-Jiangning




RE: Is VRP is too conservative to identify boolean value 0 and 1?

2011-09-02 Thread Jiangning Liu
Andrew,

I realize I don't need a back-end solution for my case at all, and in the 
middle end I can directly use the _Bool type info! I appreciate your reply!

Thanks,
-Jiangning


> -Original Message-
> From: Andrew Pinski [mailto:pins...@gmail.com]
> Sent: Friday, September 02, 2011 2:27 PM
> To: Jiangning Liu
> Cc: gcc@gcc.gnu.org
> Subject: Re: Is VRP is too conservative to identify boolean value 0 and
> 1?
> 
> On Thu, Sep 1, 2011 at 10:58 PM, Jiangning Liu 
> wrote:
> > D.1249_3: [0, 1]
> > D.1250_5: [0, 1]
> > D.1251_6: [0, 1]
> 
> Those are equivalent to [0, MAX] as _Bool only has two different
> values, 0 and 1 (MAX).  Can you explain more about the optimization
> which you are working on that needs the ranges as (int)[0,1] rather
> than (_Bool)[0,MAX] ?
> 
> Thanks,
> Andrew Pinski
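
A small sketch of the point made above, written in C++ where `bool` plays the role of C's `_Bool`. The function `both` is a hypothetical stand-in for the expression in the test case:

```cpp
#include <cassert>
#include <limits>

// For bool, the maximum representable value is `true`, i.e. 1, so the range
// [0, MAX] of a boolean is the same set of values as [0, 1].
static_assert(std::numeric_limits<bool>::max() == true, "MAX of bool is 1");
static_assert(std::numeric_limits<bool>::min() == false, "MIN of bool is 0");

// The combined comparison from the test case, converted to int.
inline int both(int i, int j) {
    bool a = (i == 1);
    bool b = (j == 2);
    return static_cast<int>(a & b);
}

// Usage: whatever the inputs, the result is 0 or 1.
inline void check_bool_range() {
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 4; ++j)
            assert(both(i, j) == 0 || both(i, j) == 1);
}
```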





Re: I am work with lm32 and want to help with the lm32 target in gcc

2011-09-07 Thread Xiangfu Liu

On 09/04/2011 03:36 PM, Liu wrote:

Please email the following information to ass...@gnu.org, and we
will send you the assignment form for your past and future changes.

Please use your full legal name (in ASCII characters) as the subject
line of the message.
--
REQUEST: SEND FORM FOR PAST AND FUTURE CHANGES


thanks Proljc

email sent out


Re: [HELP] Fwd: Mail delivery failed: returning message to sender

2011-09-08 Thread Xiangfu Liu

On 09/08/2011 12:11 PM, Joe Buck wrote:

On Wed, Sep 07, 2011 at 08:08:01PM -0700, Xiangfu Liu wrote:

>  Hi
>
>  I got the pdf file. and I also sent out the papers by postal mail.
>  where is the pdf file I should send to?
>
>  I have tried:
>  copyright-cl...@fsf.org  ass...@gnu.org
>
>  and I don't know Donald R. Robertson's email address

copyright-cl...@fsf.org  should be correct.  Maybe it was bounced
because of a file size limit or some configuration issue?  I suggest
seeing if you can send a shorter message.


thanks Joe

the pdf size is 4MB. maybe that is the problem.


Re: [HELP] Fwd: Mail delivery failed: returning message to sender

2011-09-08 Thread Xiangfu Liu

On 09/08/2011 05:38 PM, Jonathan Wakely wrote:


>
>  the pdf size is 4MB. maybe that is the problem.

Please use some common sense before forwarding a 100KB message to the
mailing list, where it gets sent to hundreds of people.  It would only
have taken you a few seconds to remove the base64-encoded attachment
before sending it to the list.


sorry about that, thanks for notice.


RE: A case that PRE optimization hurts performance

2011-09-15 Thread Jiangning Liu
Hi Richard,

I slightly changed the case to be like below,

int f(char *t) {
    int s=0;

    while (*t && s != 1) {
        switch (s) {
        case 0:   /* path 1 */
            s = 2;
            break;
        case 2:   /* path 2 */
            s = 3; /* changed */
            break;
        default:  /* path 3 */
            if (*t == '-') 
                s = 2;
            break;
        }
        t++;
    }

    return s;
}

"-O2" is still worse than "-O2 -fno-tree-pre". 

"-O2 -fno-tree-pre" result is 

f:
        pushl   %ebp
        xorl    %eax, %eax
        movl    %esp, %ebp
        movl    8(%ebp), %edx
        movzbl  (%edx), %ecx
        jmp     .L14
        .p2align 4,,7
        .p2align 3
.L5:
        movl    $2, %eax
.L7:
        addl    $1, %edx
        cmpl    $1, %eax
        movzbl  (%edx), %ecx
        je      .L3
.L14:
        testb   %cl, %cl
        je      .L3
        testl   %eax, %eax
        je      .L5
        cmpl    $2, %eax
        .p2align 4,,5
        je      .L17
        cmpb    $45, %cl
        .p2align 4,,5
        je      .L5
        addl    $1, %edx
        cmpl    $1, %eax
        movzbl  (%edx), %ecx
        jne     .L14
        .p2align 4,,7
        .p2align 3
.L3:
        popl    %ebp
        .p2align 4,,2
        ret
        .p2align 4,,7
        .p2align 3
.L17:
        movb    $3, %al
        .p2align 4,,3
        jmp     .L7

While "-O2" result is 

f:
        pushl   %ebp
        xorl    %eax, %eax
        movl    %esp, %ebp
        movl    8(%ebp), %edx
        pushl   %ebx
        movzbl  (%edx), %ecx
        jmp     .L14
        .p2align 4,,7
        .p2align 3
.L5:
        movl    $1, %ebx
        movl    $2, %eax
.L7:
        addl    $1, %edx
        testb   %bl, %bl
        movzbl  (%edx), %ecx
        je      .L3
.L14:
        testb   %cl, %cl
        je      .L3
        testl   %eax, %eax
        je      .L5
        cmpl    $2, %eax
        .p2align 4,,5
        je      .L16
        cmpb    $45, %cl
        .p2align 4,,5
        je      .L5
        cmpl    $1, %eax
        setne   %bl
        addl    $1, %edx
        testb   %bl, %bl
        movzbl  (%edx), %ecx
        jne     .L14
        .p2align 4,,7
        .p2align 3
.L3:
        popl    %ebx
        popl    %ebp
        ret
        .p2align 4,,7
        .p2align 3
.L16:
        movl    $1, %ebx
        movb    $3, %al
        jmp     .L7

You may notice that register ebx is introduced, and some more instructions
around ebx are generated as well. i.e.

        setne   %bl
        testb   %bl, %bl

I agree with you that in theory PRE does the right thing to minimize the
computation cost on gimple level. However, the problem is the cost of
converting comparison result to a bool value is not considered, so it
actually makes binary code worse. For this case, as I summarized below, to
complete the same functionality "With PRE" is worse than "Without PRE" for
all three paths,

* Without PRE,

Path1:
        movl    $2, %eax
        cmpl    $1, %eax
        je      .L3

Path2:
        movb    $3, %al
        cmpl    $1, %eax
        je      .L3

Path3:
        cmpl    $1, %eax
        jne     .L14

* With PRE,

Path1:
        movl    $1, %ebx
        movl    $2, %eax
        testb   %bl, %bl
        je      .L3

Path2:
        movl    $1, %ebx
        movb    $3, %al
        testb   %bl, %bl
        je      .L3

Path3:
        cmpl    $1, %eax
        setne   %bl
        testb   %bl, %bl
        jne     .L14
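
The effect can be pictured at source level with the sketch below. It is a hand-written approximation, not compiler output: `f_pre` and the flag `not_one` are made-up names, and the flag plays the role of the value kept in `%bl`. The loop test `s != 1` becomes a known constant on paths 1 and 2 and is only computed on path 3.

```cpp
#include <cassert>

// The modified test case from this mail.
inline int f(const char *t) {
    int s = 0;
    while (*t && s != 1) {
        switch (s) {
        case 0:  s = 2; break;                  /* path 1 */
        case 2:  s = 3; break;                  /* path 2 */
        default: if (*t == '-') s = 2; break;   /* path 3 */
        }
        t++;
    }
    return s;
}

// Approximation of the loop after PRE: each path provides (s != 1) itself.
inline int f_pre(const char *t) {
    int s = 0;
    bool not_one = true;            /* value of (s != 1) on loop entry */
    while (*t && not_one) {
        switch (s) {
        case 0:  s = 2; not_one = true; break;
        case 2:  s = 3; not_one = true; break;
        default:
            if (*t == '-') { s = 2; not_one = true; }
            else           { not_one = (s != 1); }
            break;
        }
        t++;
    }
    return s;
}

// Usage: both versions return the same state for a few sample inputs.
inline void check_same_results() {
    const char *inputs[] = {"", "a", "ab", "abc", "ab-", "a-b-"};
    for (const char *in : inputs)
        assert(f(in) == f_pre(in));
}
```

The extra flag is what costs the additional register and the `setne`/`testb` pair in the listing above.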

Do you have any more thoughts?

Thanks,
-Jiangning

> -Original Message-
> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> Sent: Tuesday, August 02, 2011 5:23 PM
> To: Jiangning Liu
> Cc: gcc@gcc.gnu.org
> Subject: Re: A case that PRE optimization hurts performance
> 
> On Tue, Aug 2, 2011 at 4:37 AM, Jiangning Liu 
> wrote:
> > Hi,
> >
> > For the following simple test case, PRE optimization hoists
> computation
> > (s!=1) into the default branch of the switch statement, and finally
> causes
> > very poor code generation. This problem occurs in both X86 and ARM,
> and I
> > believe it is also a problem for other targets.
> >
> > int f(char *t) {
> >    int s=0;
> >
> >    while (*t && s != 1) {
> >        switch (s) {
> >        case 0:
> >            s = 2;
> >            break;
> >        case 2:
> >            s = 1;
> >            break;
> >        default:
> >            if (*t == '-')
> >                s = 1;
> >            break;
> >        }
> >        t++;
> >    }
> >
> >    return s;
> > }
> >
> > Taking X86 as an example, with option "-O2" you may find 52
> instructions
> > generated like below,
> >
> >  :
> 

A question about detecting array bounds for case Warray-bounds-3.c

2011-09-21 Thread Jiangning Liu
Hi,

For case gcc/testsuite/gcc.dg/Warray-bounds-3.c, obviously it is an invalid
C program, because the last iteration of each loop accesses an element
beyond the declared size of the corresponding array. The upper-bound check
should use "<" rather than "<=".

Right now GCC doesn't report any warning for this case. Is this a bug in
both the test case and the compiler?

But looking at http://gcc.gnu.org/PR31227 , it seems this test case is
designed like this on purpose. Can anybody explain this?

The case is like below,

/* { dg-do compile } */
/* { dg-options "-O2 -Warray-bounds" } */
/* based on PR 31227 */

struct S
{
  const char *abday[7];
  const char *day[7];
  const char *abmon[12];
  const char *mon[12];
  const char *am_pm[2];
};

...

  for (cnt = 0; cnt <= 7; ++cnt)
{
  iov[2 + cnt].iov_base = (void *) (time->abday[cnt] ?: "");
  iov[2 + cnt].iov_len = strlen (iov[2 + cnt].iov_base) + 1;
}

  for (; cnt <= 14; ++cnt)
{
  iov[2 + cnt].iov_base = (void *) (time->day[cnt - 7] ?: "");
  iov[2 + cnt].iov_len = strlen (iov[2 + cnt].iov_base) + 1;
}

  for (; cnt <= 26; ++cnt)
{
  iov[2 + cnt].iov_base = (void *) (time->abmon[cnt - 14] ?: "");
  iov[2 + cnt].iov_len = strlen (iov[2 + cnt].iov_base) + 1;
}

  for (; cnt <= 38; ++cnt)
{
  iov[2 + cnt].iov_base = (void *) (time->mon[cnt - 26] ?: "");
  iov[2 + cnt].iov_len = strlen (iov[2 + cnt].iov_base) + 1;
}

  for (; cnt <= 40; ++cnt)
{
  iov[2 + cnt].iov_base =  (void *) (time->am_pm[cnt - 38] ?: "");
  iov[2 + cnt].iov_len = strlen (iov[2 + cnt].iov_base) + 1;
}
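
For illustration only (this is not part of the test case), a minimal sketch
of the off-by-one being asked about, using just the abday and day members of
struct S. The ARRAY_LEN macro and both function names are assumptions made
for the example. It also shows that, in the usual layout where the two
arrays are adjacent, the element one past the end of abday sits at the
offset of day[0]; that does not make abday[7] a valid C access.

```c
#include <stddef.h>

struct S
{
  const char *abday[7];
  const char *day[7];
};

/* Hypothetical helper: number of elements in an array. */
#define ARRAY_LEN(a) (sizeof (a) / sizeof ((a)[0]))

/* Count the non-null abday entries with an in-bounds loop:
   "<" visits indices 0..6, whereas "<=" would also touch abday[7]. */
size_t
count_abday (const struct S *s)
{
  size_t n = 0;
  for (size_t cnt = 0; cnt < ARRAY_LEN (s->abday); ++cnt)
    if (s->abday[cnt])
      ++n;
  return n;
}

/* Offset one past the end of abday.  On typical ABIs this equals
   offsetof (struct S, day), since no padding is needed between
   two arrays of the same pointer type. */
size_t
abday_end_offset (void)
{
  return offsetof (struct S, abday) + sizeof (((struct S *) 0)->abday);
}
```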

Thanks,
-Jiangning





RE: A question about detecting array bounds for case Warray-bounds-3.c

2011-09-26 Thread Jiangning Liu
PING...

> -Original Message-
> From: Jiangning Liu [mailto:jiangning@arm.com]
> Sent: Thursday, September 22, 2011 10:19 AM
> To: gcc@gcc.gnu.org
> Cc: 'ja...@gcc.gnu.org'; 'muel...@gcc.gnu.org'; 'rgue...@gcc.gnu.org';
> Matthew Gretton-Dann
> Subject: A question about detecting array bounds for case Warray-
> bounds-3.c
> 
> Hi,
> 
> For case gcc/testsuite/gcc.dg/Warray-bounds-3.c, obviously it is an
> invalid C program, because the last iterations of all the loops cause
> the access of arrays is beyond the max size of corresponding array
> declarations. The condition of checking upper bound should be "<"
> rather than "<=".
> 
> Right now, GCC compiler doesn't report any warning messages for this
> case, should it be a bug in both test case and compiler?
> 
> But looking at http://gcc.gnu.org/PR31227 , it seems this test case is
> designed to be like this on purpose. Anybody can explain about this?
> 
> The case is like below,
> 
> /* { dg-do compile } */
> /* { dg-options "-O2 -Warray-bounds" } */
> /* based on PR 31227 */
> 
> struct S
> {
>   const char *abday[7];
>   const char *day[7];
>   const char *abmon[12];
>   const char *mon[12];
>   const char *am_pm[2];
> };
> 
> ...
> 
>   for (cnt = 0; cnt <= 7; ++cnt)
> {
>   iov[2 + cnt].iov_base = (void *) (time->abday[cnt] ?: "");
>   iov[2 + cnt].iov_len = strlen (iov[2 + cnt].iov_base) + 1;
> }
> 
>   for (; cnt <= 14; ++cnt)
> {
>   iov[2 + cnt].iov_base = (void *) (time->day[cnt - 7] ?: "");
>   iov[2 + cnt].iov_len = strlen (iov[2 + cnt].iov_base) + 1;
> }
> 
>   for (; cnt <= 26; ++cnt)
> {
>   iov[2 + cnt].iov_base = (void *) (time->abmon[cnt - 14] ?: "");
>   iov[2 + cnt].iov_len = strlen (iov[2 + cnt].iov_base) + 1;
> }
> 
>   for (; cnt <= 38; ++cnt)
> {
>   iov[2 + cnt].iov_base = (void *) (time->mon[cnt - 26] ?: "");
>   iov[2 + cnt].iov_len = strlen (iov[2 + cnt].iov_base) + 1;
> }
> 
>   for (; cnt <= 40; ++cnt)
> {
>   iov[2 + cnt].iov_base =  (void *) (time->am_pm[cnt - 38] ?: "");
>   iov[2 + cnt].iov_len = strlen (iov[2 + cnt].iov_base) + 1;
> }
> 
> Thanks,
> -Jiangning





RE: A case that PRE optimization hurts performance

2011-09-26 Thread Jiangning Liu
> > * Without PRE,
> >
> > Path1:
> >        movl    $2, %eax
> >        cmpl    $1, %eax
> >        je      .L3
> >
> > Path2:
> >        movb    $3, %al
> >        cmpl    $1, %eax
> >        je      .L3
> >
> > Path3:
> >        cmpl    $1, %eax
> >        jne     .L14
> >
> > * With PRE,
> >
> > Path1:
> >        movl    $1, %ebx
> >        movl    $2, %eax
> >        testb   %bl, %bl
> >        je      .L3
> >
> > Path2:
> >        movl    $1, %ebx
> >        movb    $3, %al
> >        testb   %bl, %bl
> >        je      .L3
> >
> > Path3:
> >        cmpl    $1, %eax
> >        setne   %bl
> >        testb   %bl, %bl
> >        jne     .L14
> >
> > Do you have any more thoughts?
> 
> It seems to me that with PRE all the testb %bl, %bl
> should be evaluated at compile-time considering the
> preceding movl $1, %ebx.  Am I missing something?
> 

Yes. Can this be done by PRE or any other optimizations in middle end?

Thanks,
-Jiangning

> Richard.
> 






RE: A case that PRE optimization hurts performance

2011-09-26 Thread Jiangning Liu


> -Original Message-
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Tuesday, September 27, 2011 12:43 AM
> To: Richard Guenther
> Cc: Jiangning Liu; gcc@gcc.gnu.org
> Subject: Re: A case that PRE optimization hurts performance
> 
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> On 09/26/11 05:00, Richard Guenther wrote:
> > On Mon, Sep 26, 2011 at 9:39 AM, Jiangning Liu
> >  wrote:
> >>>> * Without PRE,
> >>>>
> >>>> Path1:
> >>>>        movl    $2, %eax
> >>>>        cmpl    $1, %eax
> >>>>        je      .L3
> >>>>
> >>>> Path2:
> >>>>        movb    $3, %al
> >>>>        cmpl    $1, %eax
> >>>>        je      .L3
> >>>>
> >>>> Path3:
> >>>>        cmpl    $1, %eax
> >>>>        jne     .L14
> >>>>
> >>>> * With PRE,
> >>>>
> >>>> Path1:
> >>>>        movl    $1, %ebx
> >>>>        movl    $2, %eax
> >>>>        testb   %bl, %bl
> >>>>        je      .L3
> >>>>
> >>>> Path2:
> >>>>        movl    $1, %ebx
> >>>>        movb    $3, %al
> >>>>        testb   %bl, %bl
> >>>>        je      .L3
> >>>>
> >>>> Path3:
> >>>>        cmpl    $1, %eax
> >>>>        setne   %bl
> >>>>        testb   %bl, %bl
> >>>>        jne     .L14
> >>>>
> >>>> Do you have any more thoughts?
> >>>
> >>> It seems to me that with PRE all the testb %bl, %bl should be
> >>> evaluated at compile-time considering the preceding movl $1,
> >>> %ebx.  Am I missing something?
> >>>
> >>
> >> Yes. Can this be done by PRE or any other optimizations in middle
> >> end?
> >
> > Hm, the paths as you quote them are obfuscated by missed
> > jump-threading. On the tree level we have
> >
> > # s_2 = PHI <2(5), 3(4), 2(6), s_25(7)>
> > # prephitmp.6_1 = PHI <1(5), 1(4), 1(6), prephitmp.6_3(7)>
> > :
> > t_14 = t_24 + 1;
> > D.2729_6 = MEM[base: t_14, offset: 0B];
> > D.2732_7 = D.2729_6 != 0;
> > D.2734_9 = prephitmp.6_1 & D.2732_7;
> > if (D.2734_9 != 0)
> >
> > where we could thread the cases with prephitmp.6_1 == 1,
> > ultimately removing the & and forwarding the D.2729_6 != 0 test.
> > Which would of course cause some code duplication.
> >
> > Jeff, you recently looked at tree jump-threading, can you see if
> > we can improve things on this particular testcase?
> There's nothing threading can do here because it doesn't know anything
> about the value MEM[t14].
> 

Jeff, 

Could you please explain more about this? What information does jump
threading need to know about MEM[t14]? Do you mean it's hard to duplicate that
basic block for some reason?

Thanks,
-Jiangning

> 
> Jeff
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
> 
> iQEcBAEBAgAGBQJOgKuLAAoJEBRtltQi2kC75aIH/iikuOQXrMrQJFbQw0COXznB
> OGq8iXdGwTJGH13vxdItTE0upJp7RgUVLzuhdqj1elTLHv/ujYygMsQRNGKcc8tb
> GMLECmWDhZqQTFXcTJCgJNZiv7MH1PNELXSdIkkSnxY+pwyn9AX5D3+HcTSjGU6B
> 51AdUNVph/VSaVboAgcrFpu9S0pX9HVTqFy4JI83Lh613zDVSmPo14DDy7vjBvE9
> 2Srlvlw0srYup97bGmRqN8wT4ZLLlyYSB2rjEFc6jmgXVncxiteQYIUZpy0lcC0M
> q3j80aXjZ57/iWyAbqDr1jI5tbVKDBkRa9LL1jvn9534adiG4GrnSMPhoog0ibA=
> =azr5
> -END PGP SIGNATURE-






FW: How to let Linux kernel Makefile generate intermediate *.i files ? It doesn't work to add "EXTRA_CFLAGS += -save-temps" in Makefile and gets "cc: warning: -pipe ignored because -save-temps specif

2011-10-17 Thread Liu Wang


-Original Message-
From: Liu Wang 
Sent: Saturday, October 15, 2011 5:42 PM
To: 'gcc-h...@gcc.gnu.org'
Subject: How to let Linux kernel Makefile generate intermediate *.i files ? It 
doesn't work to add "EXTRA_CFLAGS += -save-temps" in Makefile and gets "cc: 
warning: -pipe ignored because -save-temps specified."

Sir/Madam,

I would appreciate your help with the following.

How can I make the Linux kernel Makefile generate intermediate *.i files?
Adding "EXTRA_CFLAGS += -save-temps" to the Makefile doesn't work; it only
gives "cc: warning: -pipe ignored because -save-temps specified."

Sincerely,
Liu Wang



A question about redundant load elimination

2011-11-14 Thread Jiangning Liu
Hi,

For this test case,

int x;
extern void f(void);

void g(int *a)
{
a[x] = 1;
if (x == 100)
f();
a[x] = 2;
}

For trunk, the x86 assembly code is like below,

movl    x, %eax
movl    16(%esp), %ebx
movl    $1, (%ebx,%eax,4)
movl    x, %eax   // Is this a redundant one?
cmpl    $100, %eax
je      .L4
movl    $2, (%ebx,%eax,4)
addl    $8, %esp
.cfi_remember_state
.cfi_def_cfa_offset 8
popl    %ebx
.cfi_restore 3
.cfi_def_cfa_offset 4
ret
.p2align 4,,7
.p2align 3
.L4:
.cfi_restore_state
call    f
movl    x, %eax
movl    $2, (%ebx,%eax,4)
addl    $8, %esp
.cfi_def_cfa_offset 8
popl    %ebx
.cfi_restore 3
.cfi_def_cfa_offset 4
ret

Is the 2nd "movl x, %eax" redundant under a single-threaded programming
model? If yes, can it be optimized away?
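
One thing that may matter here, as far as I can tell: the store a[x] = 1
goes through an int pointer, so the compiler may have to assume it can
modify the global x itself, and the call to f() may change x as well. Below
is a hand-written sketch of the source with a single load kept on the path
that does not call f(). The name g_cached, the counter f_calls and the body
of f are hypothetical, and the sketch is only valid under the assumption
that a[] never overlaps x.

```c
int x;
int f_calls;            /* hypothetical: counts calls for the example */

/* Hypothetical body; the test case only declares f. */
void f(void)
{
    f_calls++;
    x = 3;              /* a callee is free to change the global */
}

/* Assumes a[] never overlaps x, which the compiler cannot know in g. */
void g_cached(int *a)
{
    int idx = x;        /* the only load on the path without the call */

    a[idx] = 1;
    if (idx == 100) {
        f();
        idx = x;        /* a reload is still required after the call */
    }
    a[idx] = 2;
}
```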

Thanks,
-Jiangning





RE: Is VRP is too conservative to identify boolean value 0 and 1?

2011-11-20 Thread Jiangning Liu


> -Original Message-
> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> Sent: Friday, September 02, 2011 5:07 PM
> To: Jiangning Liu
> Cc: gcc@gcc.gnu.org
> Subject: Re: Is VRP is too conservative to identify boolean value 0 and
> 1?
> 
> On Fri, Sep 2, 2011 at 7:58 AM, Jiangning Liu 
> wrote:
> > Hi,
> >
> > For the following small case,
> >
> > int f(int i, int j)
> > {
> >        if (i==1 && j==2)
> >                return i;
> >        else
> >                return j;
> > }
> >
> > with -O2 option, GCC has vrp2 dump like below,
> >
> > ==
> >
> > Value ranges after VRP:
> >
> > i_1: VARYING
> > i_2(D): VARYING
> > D.1249_3: [0, +INF]
> > j_4(D): VARYING
> > D.1250_5: [0, +INF]
> > D.1251_6: [0, +INF]
> > j_10: [2, 2]  EQUIVALENCES: { j_4(D) } (1 elements)
> >
> >
> > Removing basic block 3
> > f (int i, int j)
> > {
> >  _Bool D.1251;
> >  _Bool D.1250;
> >  _Bool D.1249;
> >
> > :
> >  D.1249_3 = i_2(D) == 1;
> >  D.1250_5 = j_4(D) == 2;
> >  D.1251_6 = D.1250_5 & D.1249_3;
> >  if (D.1251_6 != 0)
> >    goto ;
> >  else
> >    goto ;
> >
> > :
> >
> > :
> >  # i_1 = PHI <1(3), j_4(D)(2)>
> >  return i_1;
> >
> > }
> >
> > 
> >
> > Variables D.1249_3, D.1250_5 and D.1251_6 should be boolean values, so
> > their value ranges should be
> >
> > D.1249_3: [0, 1]
> > D.1250_5: [0, 1]
> > D.1251_6: [0, 1]
> >
> > So why can't the current VRP find this value range?
> 
> It does - it just prints it as [0, +INF], they are bools with
> TYPE_MAX_VALUE
> == 1 after all.
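
As a side note, a hand-written C rendering of the GIMPLE above (f_flat and
the local names are made up for illustration) shows why [0, 1] is all these
temporaries can hold: each one is the result of a comparison, or the AND of
two such results.

```c
#include <stdbool.h>

int f_flat(int i, int j)
{
    bool c1 = (i == 1);     /* D.1249_3 */
    bool c2 = (j == 2);     /* D.1250_5 */
    bool both = c2 & c1;    /* D.1251_6 */

    /* i_1 = PHI <1, j>: i is known to be 1 when both tests hold. */
    return both ? 1 : j;
}
```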

Richard,

May I use REG_EXPR(rtx of D.1249_3) in an xxx.md file to detect whether
D.1249_3 is a bool or not?

Some comments in GCC say REG_EXPR may be lost in the back-end. Is that true?

If we do have REG_EXPR info for some cases in the back-end, is it guaranteed
to be correct? May I implement a back-end peephole optimization that depends
on REG_EXPR?

Thanks,
-Jiangning

> 
> Richard.
> 
> >
> > I'm asking this question because the optimizations in back-end need
> this
> > info to do advanced optimization.
> >
> > Thanks,
> > -Jiangning
> >
> >
> >






A case exposing code sink issue

2011-11-23 Thread Jiangning Liu
Hi,

For this small test case,

int a[512] ;
int b[512] ;
int *a_p ;
int *b_p ;
int i ;
int k ;

int f(void)
{
for( k = 1 ; k <= 9 ; k++)
{
for( i = k ; i < 512 ; i += k)
{
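a_p = &a[i + (1<<k)] ;
b_p = &b[i + (1<<k)] ;
*a_p = 7 ;
*b_p = 7 ;
}
}
}

Before sink pass we have,

f ()
{
  int pretmp.11;
  int k.7;
  int i.6;
  int * b_p.3;
  int * a_p.2;
  int D.2248;
  int i.1;
  int D.2246;
  int k.0;

: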
  k = 1;

:
  # k.0_9 = PHI 
  i = k.0_9;
  if (k.0_9 <= 511)
goto ;
  else
goto ;

:
Invalid sum of incoming frequencies 900, should be 81
  goto ;

:
  pretmp.11_19 = 1 << k.0_9;

:
  # i.1_34 = PHI 
  D.2246_5 = pretmp.11_19;
  D.2248_7 = i.1_34 + D.2246_5;
  a_p.2_8 = &a[D.2248_7];
  a_p = a_p.2_8;
  b_p.3_13 = &b[D.2248_7];
  b_p = b_p.3_13;
  MEM[(int *)&a][D.2248_7] = 7;
  MEM[(int *)&b][D.2248_7] = 7;
  i.6_18 = k.0_9 + i.1_34;
  i = i.6_18;
  if (i.6_18 <= 511)
goto ;
  else
goto ;

:
  goto ;

:
Invalid sum of incoming frequencies 81, should be 900
  k.7_20 = k.0_9 + 1;
  k = k.7_20;
  if (k.7_20 <= 9)
goto ;
  else
goto ;

:
  goto ;

:
  return;

}

Can the following statements be sunk out of the loop? I don't see this
optimization happen in trunk. The consequence is that register pressure
increases and a spill/fill occurs in RA.

  a_p.2_8 = &a[D.2248_7];
  a_p = a_p.2_8;
  b_p.3_13 = &b[D.2248_7];
  b_p = b_p.3_13;

I know the sinking would happen in the sink pass if a_p and b_p were local
variables.

If this is the root cause, which optimization pass in GCC is responsible for
sinking them out of the loop? How should we get it fixed?
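
At the source level, the transformation being asked about would look roughly
like the hand-written sketch below (not compiler output). The names f_orig,
f_sunk and last are made up, and the arrays are enlarged to 1024 elements as
an assumption so that i + (1<<k) stays in bounds for every k, which the
512-element arrays of the test case do not guarantee.

```c
/* Arrays enlarged (assumption) so that i + (1 << k) <= 1016 is in bounds. */
int a[1024];
int b[1024];
int *a_p;
int *b_p;
int i;
int k;

/* Same loop nest as the test case, with the address stores in the loop. */
void f_orig(void)
{
    for (k = 1; k <= 9; k++) {
        for (i = k; i < 512; i += k) {
            a_p = &a[i + (1 << k)];
            b_p = &b[i + (1 << k)];
            *a_p = 7;
            *b_p = 7;
        }
    }
}

/* Hand-sunk variant: only the last index is remembered inside the loop,
   and the two addresses are formed once after it. */
void f_sunk(void)
{
    for (k = 1; k <= 9; k++) {
        int last = -1;

        for (i = k; i < 512; i += k) {
            last = i + (1 << k);
            a[last] = 7;
            b[last] = 7;
        }
        if (last >= 0) {        /* inner loop ran at least once */
            a_p = &a[last];
            b_p = &b[last];
        }
    }
}
```

Both versions leave a_p, b_p, i and k with the same final values; the second
one keeps a single index live across the loop instead of two addresses.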

Thanks,
-Jiangning





RE: A case exposing code sink issue

2011-11-23 Thread Jiangning Liu


> -Original Message-
> From: Andrew Pinski [mailto:pins...@gmail.com]
> Sent: Thursday, November 24, 2011 12:15 PM
> To: Jiangning Liu
> Cc: gcc@gcc.gnu.org
> Subject: Re: A case exposing code sink issue
> 
> On Wed, Nov 23, 2011 at 8:05 PM, Jiangning Liu 
> wrote:
> > If this is the root cause, which optimization pass in GCC take the
> role to
> > sink them out of loop? How should we get it fixed?
> 
> lim1 handles the case just fine for me.  lim1 is the first loop pass.
> 
> After lim1 I get:
> 
> :
>   # i.1_34 = PHI 
>   D.2934_5 = pretmp.11_33;
>   D.2936_7 = i.1_34 + D.2934_5;
>   a_p.2_8 = &a[D.2936_7];
>   a_p_lsm.13_37 = a_p.2_8;
>   b_p.3_13 = &b[D.2936_7];
>   b_p_lsm.14_38 = b_p.3_13;
>   MEM[(int *)&a][D.2936_7] = 7;
>   MEM[(int *)&b][D.2936_7] = 7;
>   i.6_18 = k.0_9 + i.1_34;
>   i_lsm.12_39 = i.6_18;
>   if (i.6_18 <= 511)
> goto ;
>   else
> goto ;
> 
> :
>   goto ;
> 

&a[D.2936_7] and &b[D.2936_7] are not loop invariants, so it seems lim1 
shouldn't be able to sink them, right? Do I misunderstand this optimization?

Thanks,
-Jiangning

> 
> 
> 
> Thanks,
> Andrew Pinski






RE: A case exposing code sink issue

2011-11-23 Thread Jiangning Liu
Sorry, I realize we can't do that optimization because a_p may have
dependence upon other memory accesses like MEM[(int *)&a][D.2248_7].

For example, if it happens a_p equals &a_p, that optimization would be
wrong.

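But can alias analysis solve the problem if we can guarantee (i+(1<<k)) is
less than the upbound of array a's definition?

Or is there any GCC command line switch assuming no array bound overflow?
That way we can do more aggressive optimizations, right?

Thanks,
-Jiangning

> -Original Message-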
> From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of
> Jiangning Liu
> Sent: Thursday, November 24, 2011 12:05 PM
> To: gcc@gcc.gnu.org
> Subject: A case exposing code sink issue
> 
> Hi,
> 
> For this small test case,
> 
> int a[512] ;
> int b[512] ;
> int *a_p ;
> int *b_p ;
> int i ;
> int k ;
> 
> int f(void)
> {
> for( k = 1 ; k <= 9 ; k++)
> {
> for( i = k ; i < 512 ; i += k)
> {
> a_p = &a[i + (1<<k)] ;
> b_p = &b[i + (1<<k)] ;
> *a_p = 7 ;
> *b_p = 7 ;
> }
> }
> }
> 
> Before sink pass we have,
> 
> f ()
> {
>   int pretmp.11;
>   int k.7;
>   int i.6;
>   int * b_p.3;
>   int * a_p.2;
>   int D.2248;
>   int i.1;
>   int D.2246;
>   int k.0;
> 
> :
>   k = 1;
> 
> :
>   # k.0_9 = PHI 
>   i = k.0_9;
>   if (k.0_9 <= 511)
> goto ;
>   else
> goto ;
> 
> :
> Invalid sum of incoming frequencies 900, should be 81
>   goto ;
> 
> :
>   pretmp.11_19 = 1 << k.0_9;
> 
> :
>   # i.1_34 = PHI 
>   D.2246_5 = pretmp.11_19;
>   D.2248_7 = i.1_34 + D.2246_5;
>   a_p.2_8 = &a[D.2248_7];
>   a_p = a_p.2_8;
>   b_p.3_13 = &b[D.2248_7];
>   b_p = b_p.3_13;
>   MEM[(int *)&a][D.2248_7] = 7;
>   MEM[(int *)&b][D.2248_7] = 7;
>   i.6_18 = k.0_9 + i.1_34;
>   i = i.6_18;
>   if (i.6_18 <= 511)
> goto ;
>   else
> goto ;
> 
> :
>   goto ;
> 
> :
> Invalid sum of incoming frequencies 81, should be 900
>   k.7_20 = k.0_9 + 1;
>   k = k.7_20;
>   if (k.7_20 <= 9)
> goto ;
>   else
> goto ;
> 
> :
>   goto ;
> 
> :
>   return;
> 
> }
> 
> Can the following statements be sinked out of loop? I don't see this
> optimization happen in trunk. The consequence is register pressure
> increased
> and a spill/fill occurs in RA.
> 
>   a_p.2_8 = &a[D.2248_7];
>   a_p = a_p.2_8;
>   b_p.3_13 = &b[D.2248_7];
>   b_p = b_p.3_13;
> 
> I know the sink would happen in sink pass if a_p and b_p are local
> variables.
> 
> If this is the root cause, which optimization pass in GCC take the role
> to
> sink them out of loop? How should we get it fixed?
> 
> Thanks,
> -Jiangning
> 
> 
> 






RE: A case exposing code sink issue

2011-11-23 Thread Jiangning Liu
One more question...

Can " i = i.6_18;" be sunk out of the loop, because it doesn't have a memory
dependence on the others?

Thanks,
-Jiangning

> -Original Message-
> From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of
> Jiangning Liu
> Sent: Thursday, November 24, 2011 2:57 PM
> To: gcc@gcc.gnu.org
> Subject: RE: A case exposing code sink issue
> 
> Sorry, I realize we can't do that optimization because a_p may have
> dependence upon other memory accesses like MEM[(int *)&a][D.2248_7].
> 
> For example, if it happens a_p equals &a_p, that optimization would be
> wrong.
> 
> But can alias analysis solve the problem if we can guarantee (i+(1<<k))
> is less than the upbound of array a's definition?
> 
> Or is there any GCC command line switch assuming no array bound
> overflow?
> That way we can do more aggressive optimizations, right?
> 
> Thanks,
> -Jiangning
> 
> > -Original Message-
> > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf
> Of
> > Jiangning Liu
> > Sent: Thursday, November 24, 2011 12:05 PM
> > To: gcc@gcc.gnu.org
> > Subject: A case exposing code sink issue
> >
> > Hi,
> >
> > For this small test case,
> >
> > int a[512] ;
> > int b[512] ;
> > int *a_p ;
> > int *b_p ;
> > int i ;
> > int k ;
> >
> > int f(void)
> > {
> > for( k = 1 ; k <= 9 ; k++)
> > {
> > for( i = k ; i < 512 ; i += k)
> > {
> > a_p = &a[i + (1<<k)] ;
> > b_p = &b[i + (1<<k)] ;
> > *a_p = 7 ;
> > *b_p = 7 ;
> > }
> > }
> > }
> >
> > Before sink pass we have,
> >
> > f ()
> > {
> >   int pretmp.11;
> >   int k.7;
> >   int i.6;
> >   int * b_p.3;
> >   int * a_p.2;
> >   int D.2248;
> >   int i.1;
> >   int D.2246;
> >   int k.0;
> >
> > :
> >   k = 1;
> >
> > :
> >   # k.0_9 = PHI 
> >   i = k.0_9;
> >   if (k.0_9 <= 511)
> > goto ;
> >   else
> > goto ;
> >
> > :
> > Invalid sum of incoming frequencies 900, should be 81
> >   goto ;
> >
> > :
> >   pretmp.11_19 = 1 << k.0_9;
> >
> > :
> >   # i.1_34 = PHI 
> >   D.2246_5 = pretmp.11_19;
> >   D.2248_7 = i.1_34 + D.2246_5;
> >   a_p.2_8 = &a[D.2248_7];
> >   a_p = a_p.2_8;
> >   b_p.3_13 = &b[D.2248_7];
> >   b_p = b_p.3_13;
> >   MEM[(int *)&a][D.2248_7] = 7;
> >   MEM[(int *)&b][D.2248_7] = 7;
> >   i.6_18 = k.0_9 + i.1_34;
> >   i = i.6_18;
> >   if (i.6_18 <= 511)
> > goto ;
> >   else
> > goto ;
> >
> > :
> >   goto ;
> >
> > :
> > Invalid sum of incoming frequencies 81, should be 900
> >   k.7_20 = k.0_9 + 1;
> >   k = k.7_20;
> >   if (k.7_20 <= 9)
> > goto ;
> >   else
> > goto ;
> >
> > :
> >   goto ;
> >
> > :
> >   return;
> >
> > }
> >
> > Can the following statements be sinked out of loop? I don't see this
> > optimization happen in trunk. The consequence is register pressure
> > increased
> > and a spill/fill occurs in RA.
> >
> >   a_p.2_8 = &a[D.2248_7];
> >   a_p = a_p.2_8;
> >   b_p.3_13 = &b[D.2248_7];
> >   b_p = b_p.3_13;
> >
> > I know the sink would happen in sink pass if a_p and b_p are local
> > variables.
> >
> > If this is the root cause, which optimization pass in GCC take the
> role
> > to
> > sink them out of loop? How should we get it fixed?
> >
> > Thanks,
> > -Jiangning
> >
> >
> >
> 
> 
> 
> 






RE: A case exposing code sink issue

2011-11-27 Thread Jiangning Liu


> -Original Message-
> From: Michael Matz [mailto:m...@suse.de]
> Sent: Friday, November 25, 2011 11:23 PM
> To: Jiangning Liu
> Cc: gcc@gcc.gnu.org
> Subject: RE: A case exposing code sink issue
> 
> Hi,
> 
> On Thu, 24 Nov 2011, Jiangning Liu wrote:
> 
> > One more question...
> >
> > Can " i = i.6_18;" be sinked out of loop, because it doesn't have
> memory
> > dependence with others?
> 
> With current trunk the stores to i, a_p, b_p and k are sunken after the
> loop.  (There are no aliasing problems because the decls can't
> conflict).
> 
> What isn't sunken is the calculation of the &a[D.2248_7] expression.
> First, the number of iterations of the inner loop can't be determined
> by
> current code (replacing i+=k with e.g. i++ could be handled for
> instance).

Hi Michael,

Do you know what the essential problem is in the case of loop iteration
uncertainty? I thought it was still an aliasing problem.

Thanks,
-Jiangning

> Then this code could be handled by final value replacement, but isn't
> because interpret_rhs_expr doesn't deal with ADDR_EXPR of ARRAY_REFs.
> 
> 
> Ciao,
> Michael.






RE: A case exposing code sink issue

2011-12-01 Thread Jiangning Liu


> -Original Message-
> From: Michael Matz [mailto:m...@suse.de]
> Sent: Monday, November 28, 2011 9:07 PM
> To: Jiangning Liu
> Cc: gcc@gcc.gnu.org
> Subject: RE: A case exposing code sink issue
> 
> Hi,
> 
> On Mon, 28 Nov 2011, Jiangning Liu wrote:
> 
> > > > One more question...
> > > >
> > > > Can " i = i.6_18;" be sinked out of loop, because it doesn't have
> > > memory
> > > > dependence with others?
> > >
> > > With current trunk the stores to i, a_p, b_p and k are sunken after
> the
> > > loop.  (There are no aliasing problems because the decls can't
> > > conflict).
> > >
> > > What isn't sunken is the calculation of the &a[D.2248_7] expression.
> > > First, the number of iterations of the inner loop can't be
> determined
> > > by
> > > current code (replacing i+=k with e.g. i++ could be handled for
> > > instance).
> >
> > Hi Michael,
> >
> > Do you know what the essential problem is in the case of loop
> iteration
> > uncertainty?
> 
> Yes, the number of iterations of the i loop simply is too difficult for
> our loop iteration calculator to comprehend:
> 
>   for (i=k; i<500; i+=k)
> 
> iterates for roundup((500-k)/k) time.  In particular if the step is
> non-constant our nr-of-iteration calculator gives up.
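
For reference, the iteration count quoted above can be checked with a small
sketch. The function names are made up, roundup is taken to mean rounding up
to the next integer, and k is assumed to be positive.

```c
/* Count iterations of "for (i = k; i < 500; i += k)" by running the loop.
   k > 0 is assumed; otherwise the loop would not terminate normally. */
int trips_by_loop(int k)
{
    int n = 0;

    for (int i = k; i < 500; i += k)
        n++;
    return n;
}

/* Closed form: ceil((500 - k) / k) for 0 < k < 500, and 0 for k >= 500. */
int trips_by_formula(int k)
{
    if (k >= 500)
        return 0;
    return (500 - k + k - 1) / k;   /* integer ceiling of (500 - k) / k */
}
```

The division by k is the part that matters here: with a constant step the
count is a constant expression, with a variable step it is not.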

So do you think this can be improved somewhere?

For this case, looking at the result in the middle end, "a_p.2_8 =
&a[D.2248_7];" should be able to be sunk out of the loop. That way the
computation of &a[D.2248_7] would be saved inside the loop, although the
consequence is that the live range of D.2248_7 becomes longer and it needs to
live out of the loop. But anyway the register pressure would decrease within
the loop, and we would be less likely to have spill/fill code. This is what I
want.

I think we can simply use loop induction variable analysis to solve this
problem. Do you think so?

Thanks,
-Jiangning

> 
> > I thought it was still an aliasing problem.
> 
> No.  All accesses are resolved to final objects (i.e. no pointers), and
> hence can be trivially disambiguated.
> 
> 
> Ciao,
> Michael.






RE: A case exposing code sink issue

2011-12-22 Thread Jiangning Liu
> Yes, the number of iterations of the i loop simply is too difficult for
> our loop iteration calculator to comprehend:
> 
>   for (i=k; i<500; i+=k)
> 
> iterates for roundup((500-k)/k) time.  In particular if the step is
> non-constant our nr-of-iteration calculator gives up.
> 

I'm trying to give an even smaller case,

int a[512] ;
int *a_p ;

int f(int k)
{
int i ;

for(i=0; i:
  # i_13 = PHI 
  # ivtmp.10_9 = PHI 
  a_p_lsm.6_4 = &a[i_13];
  ivtmp.10_1 = ivtmp.10_9 + 4;
  D.4085_16 = (void *) ivtmp.10_1;
  MEM[base: D.4085_16, offset: 0B] = 7;
  i_6 = i_13 + 1;
  if (i_6 != k_3(D))
goto ;
  else
goto ;

:
  # a_p_lsm.6_11 = PHI 
  a_p = a_p_lsm.6_11;
  goto ;

Why can't we still sink &a[i_13] out of the loop? For example, I expect to
generate code like below,

:
  # i_13 = PHI 
  # ivtmp.10_9 = PHI 
  i_14 = i_13;
  ivtmp.10_1 = ivtmp.10_9 + 4;
  D.4085_16 = (void *) ivtmp.10_1;
  MEM[base: D.4085_16, offset: 0B] = 7;
  i_6 = i_13 + 1;
  if (i_6 != k_3(D))
goto ;
  else
goto ;

:
  # a_p_lsm.6_11 = PHI 
  a_p_lsm.6_4 = &a[i_14];
  a_p = a_p_lsm.6_11;
  goto ;

This way the computation of &a[i] would be saved within the loop. 
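
At the source level, this corresponds roughly to the hand-written sketch
below. f_in_loop is only an assumed shape for the test case, inferred from
the dump (a loop from 0 to k that stores 7 through a_p), and both function
names are made up.

```c
int a[512];
int *a_p;

/* Assumed shape of the test case, inferred from the dump. */
void f_in_loop(int k)
{
    for (int i = 0; i < k; i++) {
        a_p = &a[i];
        *a_p = 7;
    }
}

/* Address computed once after the loop from the last value of i. */
void f_after_loop(int k)
{
    int i;

    for (i = 0; i < k; i++)
        a[i] = 7;
    if (k > 0)
        a_p = &a[i - 1];    /* i == k here, so this is &a[k - 1] */
}
```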

Any idea?

Thanks,
-Jiangning





RE: A case exposing code sink issue

2011-12-27 Thread Jiangning Liu
> 
> The job to do this is final value replacement, not sinking (we do not
> sink non-invariant expressions - you'd have to translate them through
> the loop-closed SSA exit PHI node, certainly doable, patches
> welcome ;)).
> 

Richard,

In final value replacement, expression "&a + D." can be figured out,
while "&a[i_xxx]" fails to be CHRECed, so I'm wondering if we should lower
&a[i_xxx] to "&a + unitsize(a) * i_xxx" first? It seems GCC intends to keep
&a[i_xxx] until the cfgexpand pass. Or do we have to modify the CHREC
algorithm directly to get it calculated?
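
To spell out the lowering in question, here is a small sketch (function
names made up) of the two equivalent ways to form the address. The second
has the base-plus-stride shape that a recurrence with base &a and step
sizeof(int) would describe.

```c
#include <stddef.h>

int a[512];

/* Address kept as an array reference, as in &a[i_xxx]. */
int *addr_indexed(size_t i)
{
    return &a[i];
}

/* Same address lowered to base + element size * index. */
int *addr_lowered(size_t i)
{
    return (int *) ((char *) a + sizeof a[0] * i);
}
```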

I appreciate your kind help in advance!

Thanks,
-Jiangning




