[Bug gcov-profile/95348] GCC records zero functions and modules in the profiling data file, ICC does NOT

2020-05-29 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95348

--- Comment #13 from Qing Zhao  ---
> The only exception is a cross-profiling:
> https://gcc.gnu.org/onlinedocs/gcc/Cross-profiling.html
> 
> where one can use GCOV_PREFIX environment variable to save .gcda files to a
> separate location.
> 
> Do you use it? Or do you use any of -fprofile-dir options?

Yes,  We use -fprofile-dir=pd%p when compile the modules.

[Bug gcov-profile/95348] GCC records zero functions and modules in the profiling data file, ICC does NOT

2020-05-29 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95348

--- Comment #14 from Qing Zhao  ---
> 
> -fprofile-dir=gcc_prof_dir/%p"
> 
> So my recommendation would be not to use it and let GCOV run-time library 
> merge
> the profiles. Of course, I'll be interested in `perf report` of such a
> instrumented run..

For our application, all processes generating profiling feedback data to a
single directory seems is not a choice. 
We chose -fprofile-dir=%p and then use gcov-merge to merge them.

[Bug gcov-profile/95348] GCC records zero functions and modules in the profiling data file, ICC does NOT

2020-06-01 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95348

--- Comment #20 from Qing Zhao  ---
Created attachment 48653
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48653&action=edit
A.data

--- Comment #21 from Qing Zhao  ---
Created attachment 48654
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48654&action=edit
B.data

--- Comment #22 from Qing Zhao  ---
Created attachment 48655
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48655&action=edit
C.data

[Bug gcov-profile/95348] GCC records zero functions and modules in the profiling data file, ICC does NOT

2020-06-01 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95348

--- Comment #19 from Qing Zhao  ---
Hi, Martin,

I attached 3 profiling data files with this email (among over 5000 files under
one typical directory), 
Hope this is helpful.

Thanks.

Qing

[Bug gcov-profile/95348] GCC records zero functions and modules in the profiling data file, ICC does NOT

2020-06-03 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95348

--- Comment #26 from Qing Zhao  ---
> --- Comment #25 from Martin Liška  ---
>> I will try to get more data on our real application. 
>> 
>> one question: why not just delete the entire records whose counter is zero
>> and delete the entire file whose counter is zero?
> 
> Because we need to distinguish in between situations where a function was
> really not executed (counter == 0) and the situation where we miss profile for
> a function.
Understood.  However, is it possible to just provide an option for the user to
choose
to just delete all the zero records and files in order to save more space?
> 
> How exactly do you merge profiles? Do you run parallel invocation which can
> take log2(N)?
We run serial merge adding one at a time right now. 
Is it possible for gcov-merge to add a new functionality to automatically merge
complete
Set of subdirectories?

[Bug gcov-profile/95348] GCC records zero functions and modules in the profiling data file, ICC does NOT

2020-06-10 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95348

--- Comment #29 from Qing Zhao  ---
> 
> And you still haven't replied to my essential question: Why can't you merge
> profiles into one directory during run? Or at least merge to a reasonable
> number of folders that you'll merge later?
Comment #18 has my answer to the above question.

[Bug gcov-profile/95348] GCC records zero functions and modules in the profiling data file, ICC does NOT

2020-06-10 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95348

--- Comment #31 from Qing Zhao  ---
> The explanation is not sufficient.
You mean the following explanation: (in comment 18)

we tried the scheme that all the processes generate profiling feedback data to
the single directory, 
but looks like a lot of profiling info got lost, resulting bad performance
effect. 
We thought this might relate to the parallel running environment of our
application, 
then we switched to this current approach. 

Per the documentation of -fprofile-dir=path at:
https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html


"When an executable is run in a massive parallel environment, it is recommended
to save profile to different folders.
 That can be done with variables in path that are exported during run-time:
%p
process ID.

%q{VAR}
value of environment variable VAR"


> You mentioned that merging takes 24 hours, so I would expect that directing
> merging will same you quite some time.
Yes, that will help.
But at the same time, if the profiling feedback size can be reduced, then the
time for merging will be reduced too. 

> I'm still missing information like:
> - how long does it take the training run?
I have asked the question too (I don’t have permission to run that real app, so
I need to collect such info from other engineer), but no answer so far.
> - how many parallel runs do you have?
Over 1 processes are running at the same time. 
> - what's the average duration of a process?
I have asked the question too, no answer so far.

[Bug gcov-profile/95348] GCC records zero functions and modules in the profiling data file, ICC does NOT

2020-06-10 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95348

--- Comment #32 from Qing Zhao  ---
> I would be more interested in overall statistics for your training scenario.
> How much can you get from ~1TB of data?

The profile directory generated by the new executable compiled with this patch
was 112G  in size, a lot smaller than previous 1TB. 
Though still bigger than what ICC generated.

[Bug gcov-profile/95348] GCC records zero functions and modules in the profiling data file, ICC does NOT

2020-06-11 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95348

--- Comment #34 from Qing Zhao  ---
> 
>> Though still bigger than what ICC generated.
> 
> Yep, but we should be only 2x bigger right now?
Yes, around 2-3 times bigger, much better now.
> 
> Can you please test the parallel merging script? I can merge 10GB gcov files
> (5000 runs with 264 files each) in about 50s.

I will make the request soon (I don’t have the permission to do this). Might
might take some time for others to do this.

[Bug middle-end/79538] missing -Wformat-overflow with %s and non-member array arguments

2017-12-14 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79538

--- Comment #4 from Qing Zhao  ---
fixed in

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=255654

[Bug middle-end/79538] missing -Wformat-overflow with %s and non-member array arguments

2017-12-14 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79538

Qing Zhao  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Qing Zhao  ---
this bug has been fixed in upstream. and is decided not back port to Gcc 7.
so close it as resolved

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-12-14 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #22 from Qing Zhao  ---
https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00962.html
2nd patch

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-01-19 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #23 from Qing Zhao  ---
I have an implementation for the part C of this task in my private space:

 part C: for strcmp (s1, s2), strncmp (s1, s2, n):

  if the result is NOT used to do simple equality test against zero, one of 
"s1" or "s2" is a small constant string, n is a constant, and the Min value of 
the length of the constant string and "n" is smaller than a predefined 
threshold T, 
  inline the call by a byte-to-byte comparision sequence to avoid calling 
overhead. 


with this implementation, I was able to measure the performance impact of the
inlining transformation on different value of "n", n is the length of the
string need to be compared. In order to decide the following two concerns:
A. what's the default value of n.
B. on a platform that support string compare hardware insns (for exmaple,
X86), which should be done first for a call to strcmp/strncmp, the inline or
the hardware insns?


On both aarch64 and X86, I tried the following small testing case for the
performance experiments:

qinzhao@gcc116:~/Bugs/78809/const_cmp$ cat t_p.c
#include 

char array[]= "fishi";

#define NUM 10
int __attribute__ ((noinline))
cmp2 (const char *p)
{
  int result = 0;
  int i;
  for (i=0; i< NUM; i++) {
result |=  strcmp (p, "fishi"); 
  }
  return result;
}

int result = 0;

int main()
{
  for (int i = 0; i < 25; i++)
 result += cmp2 (array);
  return 0;
}

and the option I used was:  -O -fno-tree-loop-im and the corresponding option
to enable or disable the added inlining, the following is the performance
result:

aarch64  strcmp

n=  3   4   5   6   10  20

inline  31  41  62  72  114 242
no-inline   229 229 229 229 272 333

aarch64  strncmp

n=  3   4   5   6   10  20

inline  41  62  62  76  125 250
no-inline   291 291 291 291 364 427


X86  strcmp

n=  4   5   6   10  20

inline  21  25  31  42  163
no-inline   445 461 488 529 672


X86  strncmp

n=  4   5   6   10  20

inline  21  25  28  43  77
no-inline   412 435 442 495 638

From the above, we can see:

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-01-19 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #24 from Qing Zhao  ---
From the above, we can see:
even with n is as big as 20, inlined version is much faster than the
non-inlined version, both on aarch64 (no hardware string compare insn provided)
and X86 (hardware string compare insn provided)

So, it's reasonable to do the inline as much as possible.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-01-23 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

Qing Zhao  changed:

   What|Removed |Added

  Attachment #42449|0   |1
is obsolete||

--- Comment #26 from Qing Zhao  ---
Created attachment 43222
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43222&action=edit
the run-time performance data for inlining str(n)cmp

This is the performance data with my private str(n)cmp implementation on both
X86 and AARCH64 platform.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-01-23 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #28 from Qing Zhao  ---
> 
> I don't think this is a good test.  Repeatedly calling strcmp with the same
> inputs is not something real code does, especially when the string matches
> exactly.  Why would it do that?  The consequence is that you are probably
> optimizing for a very unlikely scenario.  Instead you need to look at traces
> from real code to understand how strcmp is called in practice.  Nothing else
> will give you a defensible answer.

Please see the latest attachment to this bug, it addressed your above concerns.
let me know if you have more concerns with the new testing cases or any
concrete
suggestion to improve the testing case. 
The next step will do performance testing with SPEC2006 or SPEC2017 to further
decide the default threshold.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-01-25 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #30 from Qing Zhao  ---
(in reply to Wilco from comment #29)
> 
> The new test is better, however it uses i % 15 which means an expensive
> division by constant every loop iteration. It's best to change to i & 15. Also
> using an array of string pointers means you get something like:
> 
> result += strcmp (p[i & 15], "abc");
> 
> Using this I get ~80% speedup for n=3 on AArch64, similar to your set 2.
I will try with these modification. 
> 
> As for benchmarking, I'm not so sure that SPEC2006 or SPEC2017 call strcmp 
> with
> constant strings.
do you have any suggestion on other real applications?

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-01-30 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #32 from Qing Zhao  ---
Created attachment 43298
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43298&action=edit
code size impact from inlining str(n)cmp with different n on X86

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-01-30 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #33 from Qing Zhao  ---
with SPEC2017 on X86, I didn't see any run-time performance change as expected.
due to the date I collected for code size change, looks like that n=3 or n=4
might be a reasonable default. n=3 might be safer.

[Bug gcov-profile/47618] Collecting multiple profiles and using all for PGO

2019-05-02 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47618

--- Comment #27 from Qing Zhao  ---
> --- Comment #26 from Martin Liška  ---
> 
>> 2. Intel compiler (icc)'s profmerge is able to merge all the .dyn files
>> under one directory, does gcc have such functionality currently?
> 
> We have folder-base merging where we search for all .gcda files and we merge
> them to a destination folder.

could you please point me which command does this? thanks.

[Bug middle-end/86467] inlining strcmp with small known length array

2018-07-11 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86467

--- Comment #2 from Qing Zhao  ---
> --- Comment #1 from Richard Biener  ---
> I think there's a duplicate report.

you mean another similar PR existing?

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-07-13 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #39 from Qing Zhao  ---
> --- Comment #38 from Wilco  ---
> This uses signed char while the C standard says the comparison is done on
> unsigned chars.
> 

during my implementation, I did some research on whether I should use “unsigned
char” or “signed char”
for the comparison.  what I checked was man page of strcmp, memcmp, (I don’t
have C standard in hand).
in the manpage of memcmp, it clearly and explicitly mentioned that the chars
are interpreted as unsigned char;
however, in the manpage of strcmp/strncmp, it’s not mentioned at all.  So, I
thought that for strcmp/strncmp,
I should use signed char.  but for memcmp, I used unsigned char.

since I don’t have a C standard, could you please point me the corresponding
section for this?
thanks.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-07-13 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #41 from Qing Zhao  ---
> --- Comment #40 from Wilco  ---
> See eg. http://www.iso-9899.info/n1570.html section 7.24.4:
> 
> "The sign of a nonzero value returned by the comparison functions memcmp,
> strcmp, and strncmp is determined by the sign of the difference between the
> values of the first pair of characters (both interpreted as unsigned char) 
> that
> differ in the objects being compared."

Thanks. I will provide a small patch to fix this issue soon.

[Bug tree-optimization/86526] [9 Regression] ICE in builtin_memcpy_read_str, at builtins.c:3017

2018-07-16 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86526

--- Comment #4 from Qing Zhao  ---
> On Jul 16, 2018, at 5:01 AM, jakub at gcc dot gnu.org 
>  wrote:
> --- Comment #3 from Jakub Jelinek  ---
> As the patch contains a lot of formatting fixes (Qing, please watch out
> formatting of your patches more carefully, there should be no whitespace at 
> the
> end of lines, tabs should be used rather than 8 spaces, we write type
> *var_or_arg
> rather than type* var_or_arg, etc.), here is just the important part from diff
> -upb:

Thank you, Jakub for the quick patch.
I will watch the format issues in my later patch more carefully.
do you plan to check in this patch very soon?

[Bug testsuite/86519] [9 Regression] New test case gcc.dg/strcmpopt_6.c fails with its introduction in r262636

2018-07-16 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86519

--- Comment #3 from Qing Zhao  ---
> --- Comment #2 from seurer at gcc dot gnu.org ---
> What system are you testing on?  I've seen the same failure on power 8 and
> power 9 LE systems and power 7 and power 8 BE systems.

I am using the GCC farm machine gcc110:
[qinzhao@gcc1-power7 ~]$ uname -a
Linux gcc1-power7.osuosl.org 3.10.0-514.26.2.el7.ppc64 #1 SMP Mon Jul 10
02:26:53 GMT 2017 ppc64 ppc64 ppc64 GNU/Linux
let me know this is not the right machine to repeat the error.

[Bug testsuite/86519] [9 Regression] New test case gcc.dg/strcmpopt_6.c fails with its introduction in r262636

2018-07-16 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86519

--- Comment #5 from Qing Zhao  ---
> --- Comment #4 from seurer at gcc dot gnu.org ---
> I also just tried it on gcc110 and indeed it does not fail there.  However, it
> does fail on gcc112.
thanks a lot. will try on gcc112.

[Bug testsuite/86519] [9 Regression] New test case gcc.dg/strcmpopt_6.c fails with its introduction in r262636

2018-07-23 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86519

--- Comment #9 from Qing Zhao  ---
> --- Comment #8 from Martin Sebor  ---
> FWIW, it would be safer and more deterministic to fold these invalid calls to
> some non-zero value that it is to emit the invalid library call.
could you please provide more details on this?  what kind of non-zero value
should be 
assigned to these invalid calls?

[Bug testsuite/86519] [9 Regression] New test case gcc.dg/strcmpopt_6.c fails with its introduction in r262636

2018-07-23 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86519

--- Comment #11 from Qing Zhao  ---
> to reply: Comment #10 from Martin Sebor  —
thanks for the details.
However, I do not feel comfortable for the compiler to change an undefined
buggy code.
I think that it’s better to let the user to correct his/her own buggy code.
What the compiler should do
is just tell the user that his/her code is wrong, and why it’s wrong. the user
should know better how
to correct his code.

[Bug middle-end/78809] Inline strcmp with small constant strings

2018-07-23 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #44 from Qing Zhao  ---
> (In reply to wilco from comment #43)
will provide a simple patch for this issue.

[Bug testsuite/86519] [9 Regression] New test case gcc.dg/strcmpopt_6.c fails with its introduction in r262636

2018-07-27 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86519

--- Comment #13 from Qing Zhao  ---
> --- Comment #12 from seurer at gcc dot gnu.org ---
> gcc.dg/strcmpopt_6.c was recently updated in r263028 but it still fails albeit
> differently:
this is expected, I will provide a separate fix for this one.

[Bug target/81800] [8 regression] on aarch64 ilp32 lrint should not be inlined as two instructions

2017-09-07 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81800

--- Comment #3 from Qing Zhao  ---
I can repeat this with the latest upstream gcc on aarch64 machine.

the inlining happens when -fno-math-errno is specified. 
and it should be only inlined when -fno-trapping-math is specified.

[Bug target/81800] [8 regression] on aarch64 ilp32 lrint should not be inlined as two instructions

2017-09-11 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81800

--- Comment #4 from Qing Zhao  ---
since I cannot change the Assignee field, in order to avoid redundant work:
I am currently studying this bug and try to fix it.

[Bug target/81422] [aarch64] internal compiler error: in update_equiv_regs, at ira.c:3425

2017-09-12 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81422

Qing Zhao  changed:

   What|Removed |Added

 CC||qing.zhao at oracle dot com

--- Comment #2 from Qing Zhao  ---
I can repeat this bug on aarch64 machine with the latest gcc.
I am taking a deeper study on this one now. let me know if you are currently
working on the fix of this one.

[Bug target/81422] [aarch64] internal compiler error: in update_equiv_regs, at ira.c:3425

2017-09-18 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81422

--- Comment #3 from Qing Zhao  ---
if the DEST is NOT a REG (sometimes it's a SUBREG, for example in the testing
case of this Bug), the REG_EQUIV notes should NOT be added.

[Bug middle-end/80295] [7/8 Regression] ICE in __builtin_update_setjmp_buf expander

2017-09-18 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80295

Qing Zhao  changed:

   What|Removed |Added

 CC||qing.zhao at oracle dot com

--- Comment #3 from Qing Zhao  ---
confirmed with the latest upstream GCC. 
-mabi=ilp32 triggers the failure.

[Bug middle-end/80295] [7/8 Regression] ICE in __builtin_update_setjmp_buf expander

2017-09-19 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80295

--- Comment #4 from Qing Zhao  ---
working on a fix to this bug.
let me know if you are working on it too.

[Bug target/80266] ICE in store_pairsi condition with -mabi=ilp32

2017-09-25 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80266

Qing Zhao  changed:

   What|Removed |Added

 CC||qing.zhao at oracle dot com

--- Comment #3 from Qing Zhao  ---
This is a very similar bug as PR80295.
I have had a fix for 80295 already. hopefully that fix should fix this bug too.

since I cannot build gnat on the available machines, I cannot confirm this on
my side.

[Bug target/80266] ICE in store_pairsi condition with -mabi=ilp32

2017-09-26 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80266

--- Comment #5 from Qing Zhao  ---
Okay, will let you know when the patch is put into upstream.

Qing
> On Sep 26, 2017, at 5:56 AM, wilco at gcc dot gnu.org 
>  wrote:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80266
> 
> --- Comment #4 from Wilco  ---
> (In reply to Qing Zhao from comment #3)
>> This is a very similar bug as PR80295.
>> I have had a fix for 80295 already. hopefully that fix should fix this bug
>> too.
>> 
>> since I cannot build gnat on the available machines, I cannot confirm this
>> on my side.
> 
> When you have the patch ready we'll try it. Based on how many issues I found
> there are multiple places in the mid-end that need to be fixed...
> 
> -- 
> You are receiving this mail because:
> You are on the CC list for the bug.

[Bug target/81357] Extra mov for zero extend of add

2017-09-28 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81357

Qing Zhao  changed:

   What|Removed |Added

 CC||qing.zhao at oracle dot com

--- Comment #1 from Qing Zhao  ---
one question to the submitter:
   what optimization level you used to show this issue?

I tried-O0, -O1, -O2, and -O3.  I did not see this issue with -O0 and -O1.
starting to see the issue from -O2 and above. (I am using the upstream GCC on
9/27/2017).

the assembly sequence I observed with -O1 (GOOD one) is:
test1:
add w0, w0, 1
uxtwx2, w0
adrpx1, d
str x2, [x1, #:lo12:d]
ret

the assembly sequence I observed with -O2 (BAD one) is:
test1:
adrpx2, d
add w1, w0, 1
mov w0, w1
str x1, [x2, #:lo12:d]
ret

which shows the issue that mentioned by Andrew Pinski (i.e the "mov" insn might
be
unnecessary if the "add" insn is "add w0, w0, 1".

[Bug target/81357] Extra mov for zero extend of add

2017-09-28 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81357

--- Comment #3 from Qing Zhao  ---
the zero extension "uxtw" insn is generated even without any optimiation, the
additional "mov" insn generated in -O2 is introduced by -fschedule-insns,
please see the following:

***/home/qinzhao/Install/latest/bin/gcc t.c
test1:
sub sp, sp, #16
str w0, [sp, 12]
ldr w0, [sp, 12]
add w0, w0, 1
uxtwx1, w0
adrpx0, d
add x0, x0, :lo12:d
str x1, [x0]
ldr w0, [sp, 12]
add w0, w0, 1
add sp, sp, 16
ret
***/home/qinzhao/Install/latest/bin/gcc -O t.c
test1:
add w0, w0, 1
uxtwx2, w0
adrpx1, d
str x2, [x1, #:lo12:d]
ret
***/home/qinzhao/Install/latest/bin/gcc -O -fschedule-insns t.c
test1:
add w1, w0, 1
adrpx2, d
mov w0, w1
uxtwx1, w1
str x1, [x2, #:lo12:d]
ret

So, 
1. the zero extension comes from the language standard naturally. for aarch64,
due to the fact that the register W0 to X0 implicitly zero extension, the
explicitly zero extension insn can be optimized by some peephole optimization I
think.

2. will study why insn scheduler introduced the additional mov insns.

[Bug target/81357] Extra mov for zero extend of add

2017-09-28 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81357

--- Comment #5 from Qing Zhao  ---
Hi, wilco,

thanks for the comments.
see me reply below:

> On Sep 28, 2017, at 2:13 PM, wilco at gcc dot gnu.org 
>  wrote:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81357
> 
> --- Comment #4 from Wilco  ---
> (In reply to Qing Zhao from comment #3)
>> 1. the zero extension comes from the language standard naturally. for
>> aarch64, due to the fact that the register W0 to X0 implicitly zero
>> extension, the explicitly zero extension insn can be optimized by some
>> peephole optimization I think.
>> 
>> 2. will study why insn scheduler introduced the additional mov insns.
> 
> The scheduler didn't introduce an additional move. There are 2 distinct values
> here, the 32-bit value fParm + 1 and the 64-bit zero-extended version of it.
> Your example shows it perfectly:
> 
>add w1, w0, 1
>mov w0, w1
>uxtwx1, w1
> 
> The mov and uxtw are in fact the same instruction, however they are different
> in the intermediate code. Due to being different operations the register
> allocator needs 2 registers.

don’t quite understand your above. what’s your mean by “the mov and uxtw are in
fact the same instruction”?

from my understanding:

the “mov” insn that is added by -O2 is mainly for returning an unsigned int 
from the function. since by ABI, the register
w0 should hold the result that will be return. 

at the same time, the “uxtw” insn is there because of the explicitly conversion
from “unsigned int to unsigned long long”. 

If the first “add” insn keeps as “W0” as the destination (i.e. add w0, w0,1),
then the “mov” insn is NOT needed at all.

Not sure why when -fschedule-insns is ON, the destination of the add insns
becomes W1?

Qing



> 
> -- 
> You are receiving this mail because:
> You are the assignee for the bug.

[Bug target/81357] Extra mov for zero extend of add

2017-09-28 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81357

--- Comment #8 from Qing Zhao  ---
> 
> I don't think there is an easy fix for this example. The compiler believes
> there are 2 distinct values so it uses 2 registers irrespectively of the order
> of the mov and uxtw.

then, why when there is NO -fschedule-insns, i.e, when compiled with -O, the
assembly
does not have the additional “mov” insn:
***/home/qinzhao/Install/latest/bin/gcc -O t.c
test1:
add w0, w0, 1
uxtwx2, w0
adrpx1, d
str x2, [x1, #:lo12:d]
ret

[Bug target/81357] Extra mov for zero extend of add

2017-10-02 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81357

--- Comment #10 from Qing Zhao  ---
the following is my conclusion on this bug based on previous discussion and
study, for this testing case:

  1. due to the fact that  "mov" and "uxtw" are the same instruction, the
assembly generated by -O1 and -O2 are exactly the same except 
 A. the order of the instructions (this is due to the instruction
scheduling applied in -O2). 
 B. the registers used in difference instructions. 

  2. I agree with Wilco's comments in comment 7:
"The compiler believes there are 2 distinct values so it uses 2 registers
irrespectively of the order"

i.e, for the testing case:

 1 unsigned long long d;
 2 unsigned int test1(unsigned int fParm)
 3 {
 4   d = fParm + 1;
 5   return fParm + 1;
 6 }

at line 4 and 5,  the result of fparm + 1 has two different usages:
* one is at line 4,  convert to unsigned long long first,  and then
assign to the global d;
* the other is at line 5, directly return as the return result of the
routine. 

the compiler has to use 2 different registers for these two different values. 

So, I think that the compiler does NOT do anything wrong for this testing case.
the additional "mov" or "uxtw" instruction that is claimed in comment 1
actually is necessary and should NOT be deleted. 

I think that this bug could be closed as not a bug.

[Bug target/81357] Extra mov for zero extend of add

2017-10-02 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81357

--- Comment #12 from Qing Zhao  ---
> Well it is not wrong, just non-optimal. It is possible to use a single 
> register
> here but it means teaching GCC that these values are identical, which is
> non-trivial as it likely affects various places in the mid-end (this issue is
> target-independent).

does “these values are identical” mean that the value “fParm + 1” and its
zero_extension “unsigned long long (fParm +1)”
are identical?
I don’t think so.
However, I do agree that there is opportunity for more compact code:
these two values  can be stored in the same register on some of the target.
(for example, aarch64 or ia64).
but Not on all the targets (for example, all 32bit targets).

So, if we really want this additional optimization, this looks like a
target-dependent one. we might add a new target-depend
hook on whether two values with different width can be stored in the same
register (one is the other one’s zero-extension), and
then enhance register allocation with this hook?

[Bug target/81357] Extra mov for zero extend of add

2017-10-03 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81357

--- Comment #13 from Qing Zhao  ---
I checked the same testing case on X86, SPARC in addition to aarch64, all have
the same issue:  (compile with gcc -O, the default is -m64)

***x86:
test1:
.LFB0:
.cfi_startproc
leal1(%rdi), %eax
movl%eax, %edx
movq%rdx, d(%rip)
ret
.cfi_endproc
in the above, "movl  %eax, %edx" is the insn that can be eliminated.

***SPARC:
test1:
sethi   %hi(d), %g1
add %o0, 1, %o0
srl %o0, 0, %o0
jmp %o7+8
 stx%o0, [%g1+%lo(d)]

in the above, "srl  %o0, 0, %o0" is the insn that can be eliminated.

so, this seems a common issue on -m64 targets.

(NOTE, this issue does NOT exist on -m32 targets).

[Bug target/81356] __builtin_strcpy is not good for copying an empty string on aarch64

2017-10-03 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81356

Qing Zhao  changed:

   What|Removed |Added

 CC||qing.zhao at oracle dot com

--- Comment #4 from Qing Zhao  ---
the issue is confirmed on aarch64.

In addition to X86, testing the same testing case on SPARC, we see No such
issue:

***SPARC: (sparc use multiple stores when compile with -O):
f:
mov 72, %g1
stb %g1, [%o0]
mov 105, %g1
stb %g1, [%o0+1]
mov 33, %g1
stb %g1, [%o0+2]
jmp %o7+8
 stb%g0, [%o0+3]

[Bug target/81356] __builtin_strcpy is not good for copying an empty string on aarch64

2017-10-03 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81356

--- Comment #5 from Qing Zhao  ---
the following code in config/aarch64/aarch64.c cause such behavior:

14143 static bool
14144 aarch64_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size,
14145 unsigned int align,
14146 enum by_pieces_operation op,
14147 bool speed_p)
14148 {
14149   /* STORE_BY_PIECES can be used when copying a constant string, but
14150  in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
14151  For now we always fail this and let the move_by_pieces code copy
14152  the string from read-only memory.  */
14153   if (op == STORE_BY_PIECES)
14154 return false;

when deleting line 14153 and 14154. and use this compiler to build the testing
case, I got:

f:
mov w1, 26952
movkw1, 0x21, lsl 16
str w1, [x0]
ret

looks like exactly we want.

[Bug target/81356] __builtin_strcpy is not good for copying an empty string on aarch64

2017-10-03 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81356

--- Comment #6 from Qing Zhao  ---
just found that a similar fix have been submitted 2 weeks ago to gcc_patches:

https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg173652.html

[Bug rtl-optimization/82396] [8 Regression] qsort comparator non-negative on sorted output: 4 in ready_sort_real in haifa scheduler

2017-10-05 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82396

Qing Zhao  changed:

   What|Removed |Added

 CC||qing.zhao at oracle dot com

--- Comment #11 from Qing Zhao  ---
I am seeing the exactly same issue on aarch64 today (10/5/2017) as comment 10.

[Bug middle-end/80295] [7/8 Regression] ICE in __builtin_update_setjmp_buf expander

2017-10-06 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80295

--- Comment #7 from Qing Zhao  ---
thank you, Wilco.
> Wilco  changed:
> 
>   What|Removed |Added
> 
> CC||wilco at gcc dot gnu.org
> 
> --- Comment #6 from Wilco  ---
> The implementation of __builtin_update_setjmp_buf is incorrect, it takes a
> pointer (ie. SImode in ILP32) and treats the result of the add as Pmode
> (DImode). So it's missing a cast to Pmode.

Yes, this was what I thought too after more study.

However, I am still not very sure about the current implementation of
-mabi=ilp32 on aarch64
for example:

qinzhao@gcc116:~/Bugs/80295$ cat t1.c
void f (void *b) { __builtin_return(b); }
qinzhao@gcc116:~/Bugs/80295$ sh t1
/home/qinzhao/Install/latest/bin/gcc -mabi=ilp32 -fdump-rtl-all -fdump-tree-all
t1.c

in t1.c.231r.expand, I see the following stack pushing RTL insn:

(insn 2 4 3 2 (set (mem/f/c:SI (plus:DI (reg/f:DI 68 virtual-stack-vars)
(const_int -4 [0xfffc])) [1 b+0 S4 A32])
(reg:SI 0 x0 [ b ])) "t1.c":1 -1
 (nil))

the above looks like that ilp32 treats the stack pushing as 32 bit (not 64 bits
as aarch64 LP64 mode).
is this right?

[Bug middle-end/80295] [7/8 Regression] ICE in __builtin_update_setjmp_buf expander

2017-10-06 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80295

--- Comment #9 from Qing Zhao  ---
Created attachment 42319
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42319&action=edit
proposed patch

[Bug middle-end/80295] [7/8 Regression] ICE in __builtin_update_setjmp_buf expander

2017-10-06 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80295

--- Comment #10 from Qing Zhao  ---
Comment on attachment 42319
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42319
proposed patch

The implementation of __builtin_update_setjmp_buf is incorrect. It takes a
pointer
as an operand and treats the Mode of the pointer as Pmode, which is not
correct.

a conversion from ptr_mode to Pmode is needed.

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-13 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

Qing Zhao  changed:

   What|Removed |Added

 CC||qing.zhao at oracle dot com

--- Comment #3 from Qing Zhao  ---
with the latest upstream GCC, we got the following assembly for the testing
case with -O2 on aarch64:

t1:
adrpx1, .LC0
add x1, x1, :lo12:.LC0
b   strcmp

t2:
adrpx1, .LC0
add x1, x1, :lo12:.LC0
b   strcmp


As I checked on X86, the same testing case has the following assembly with -O2:
t1:

movq%rdi, %rsi
movl$2, %ecx
movl$.LC0, %edi
repz; cmpsb
seta%al
setb%dl
subl%edx, %eax
movsbl  %al, %eax
ret

t2:
movq%rdi, %rsi
movl$2, %ecx
movl$.LC0, %edi
repz; cmpsb
seta%al
setb%dl
subl%edx, %eax
movsbl  %al, %eax
ret

on X86, both routines have the exact assembly and both are inlined by GCC.

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-23 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #5 from Qing Zhao  ---
Created attachment 42449
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42449&action=edit
Run time performance testing case on aarch64

this is for testing the run-time performance of inlined version of strcmp and
non-inlined version of strcmp on aarch64. 
to run the test:
   0. on an aarch64 machine;
   1. untar the tar file
   2. cd perf_78809
   3. sh t_p

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-23 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #6 from Qing Zhao  ---
(A correction to comment 4:

when adding #include , the call to strcmp(s, "a") was inlined by
glibc in the header

with the latest upstream GCC
)


with attached testing case, on GCC farm machine gcc116, I got the following
run-time performance data:

qinzhao@gcc116:~/Bugs/78809/const_cmp$ sh t_p
/home/qinzhao/Install/latest/bin/gcc -O2 t_p_1.c t_p.c -DINLINED
inlined version
41.67user 0.00system 0:41.67elapsed 99%CPU (0avgtext+0avgdata 360maxresident)k
0inputs+0outputs (0major+135minor)pagefaults 0swaps
/home/qinzhao/Install/latest/bin/gcc -O2 t_p_1.c t_p.c
non-inlined version
20.84user 0.00system 0:20.83elapsed 100%CPU (0avgtext+0avgdata 360maxresident)k
0inputs+0outputs (0major+135minor)pagefaults 0swaps

>From the data, we can see the inlined version of strcmp (by glibc) is much
slower than the direct call to strcmp.  (this is for size 2)
I am using GCC farm machine gcc116:

qinzhao@gcc116:~/Bugs/78809/const_cmp$ uname -a
Linux gcc116 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:45:34 UTC 2016
aarch64 aarch64 aarch64 GNU/Linux

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-23 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #7 from Qing Zhao  ---
I have studied the inlining of memcmp and str(n)cmp in GCC, the following are a
summary of my study so far:

1. memcmp is different from str(n)cmp as the following:

• strcmp compares null-terminated C strings
• strncmp compares at most N characters of null-terminated C strings
• memcmp compares binary byte buffers of N bytes.

among these three, both strcmp and strncmp might early stop at NULL terminator
of the compared strings.  as a result, we have to compare the char one by one
for str(n)cmp.

on the other hand, memcmp will NOT early stop, it will compare exactly N bytes
of both buffers. As a result, the compiler can compare multiple bytes at one
time.  

So, memcpy should be much easier for the compiler to optimize than str(n)cmp. 

2. when the memcpy’s result is only compared against zero, it's easier
to be optimized than its result compared with other values. 

3. the implementation of
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52171 have been in latest GCC, and
it optimizes the following
cases on ALL platforms:

memcmp (p, "fishi", 5) != 0
__builtin_memcmp (p, "fish", 8) == 0

and the implementation is a target independent one. when the length of constant
string is multiple word size, it is optimized by “strlen” pass,
when it’s NOT multiple word size, it is optimized by “expand” pass. 

4. However, this implementation to PR52171 does NOT optimize the
following cases:

A.   memcmp’s result is NOT compared to zero

memcmp (p, "fish", 4);   
__builtin_memcmp (p, “fishi”, 5);   

B.  strncmp and strcmp, when the result is or is NOT compared to zero
strncmp (p, "fi", 2) != 0
__builtin_strncmp (p, "fi", 2) != 0

strcmp (p, "fish") != 0
__builtin_strcmp (p, "fish") != 0

strncmp (p, "fi", 2) 
__builtin_strncmp (p, "fi", 2)

strcmp (p, "fish”)   
__builtin_strcmp (p, "fish”)   

Per the reason I mentioned in 1.  strncmp and strcmp can ONLY compare the char
one by one,  the implementation of PR52171 can NOT be extended 
for str(n)cmp.  


5. currently,  glibc optimizes strcmp with constant string up to size 3
in the header as Wilco mentioned in the Description part of the PR. the
optimization is as following:

strcmp (p, “fis”)

will be transformed to:

D.2234 = __s2_len = 3;, __s2_len <= 3; ? TARGET_EXPR  1 && __result == 0)
  {
__result = (int) *(__s1 + 2) - (int) *((const unsigned char *)
"fis" + 2);
if (__s2_len > 2 && __result == 0)
  {
__result = (int) *(__s1 + 3) - (int) *((const unsigned char
*) "fis" + 3);
  }
  }
  }
  }
  D.2233 = __result;
}> : __builtin_strcmp ((const char *) p, (const char *) "fis");

I.e, each character is compared one by one, and the result of the previous
comparison is used to control whether the next char should be compared or not,
as a result,  comparing of one char needs one load, one compare and one branch.
 not very cheap operations. therefore, we cannot apply the above optimization
on longer strings even though its constant string.

The Request of this PR is to move the above transformation from Glibc into GCC.

However, my run-time performance data showed in Comment 5 and 6, shows:

the glibc optimization is SLOWER than the direct call to strcmp on aarch64.

  6.  On the other hand, on a platform that has hardward strcmp insn,  for
example, X86 platform, expand the str(n)cmp to use the hardware strcmp insn
will be very fast.  GCC currently has done this for several platforms.  

  However, for a platform that does NOT have hardware strcmp insns, for
example, aarch64,  inline str(n)cmp might NOT be a good idea, I think.

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-24 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #10 from Qing Zhao  ---
>> From the data, we can see the inlined version of strcmp (by glibc) is much
>> slower than the direct call to strcmp.  (this is for size 2)
>> I am using GCC farm machine gcc116:
> 
> This result doesn't make sense - it looks like GCC is moving the strcmp call 
> in
> the 2nd case as a loop invariant, so you're just measuring a loop with just a
> subtract and orr instruction…

Yes, Wilco is right here.  -ftree-loop-im moves the call to strcmp out of the
loop.
in order to avoid this issue, I changed the options to

-O -fno-tree-loop-im

and checked the assembly of the routine “cmp2” for the INLINED and Non-INLINED
version.

Inlined version:
cmp2:
mov x4, x0
mov w2, 51712
movkw2, 0x3b9a, lsl 16
mov w0, 0
mov w3, 102
b   .L3
.L2:
neg w1, w1
orr w0, w0, w1
subsw2, w2, #1
beq .L5
.L3:
ldrbw1, [x4]
subsw1, w3, w1
bne .L2
ldrbw1, [x4, 1]
neg w1, w1
b   .L2
.L5:
ret

Non-inlined version:
cmp2:
stp x29, x30, [sp, -48]!
add x29, sp, 0
stp x19, x20, [sp, 16]
stp x21, x22, [sp, 32]
mov x22, x0
mov w19, 51712
movkw19, 0x3b9a, lsl 16
mov w20, 0
adrpx21, .LC0
add x21, x21, :lo12:.LC0
.L2:
mov x1, x21
mov x0, x22
bl  strcmp
orr w20, w20, w0
subsw19, w19, #1
bne .L2
mov w0, w20
ldp x19, x20, [sp, 16]
ldp x21, x22, [sp, 32]
ldp x29, x30, [sp], 48
ret

Then, the run-time performance data is:

qinzhao@gcc116:~/Bugs/78809/const_cmp/perf$ sh t_p
/home/qinzhao/Install/latest/bin/gcc -O -fno-tree-loop-im t_p_1.c t_p.c
-DINLINED
inlined version
34.73user 0.00system 0:34.73elapsed 99%CPU (0avgtext+0avgdata 360maxresident)k
0inputs+0outputs (0major+135minor)pagefaults 0swaps
/home/qinzhao/Install/latest/bin/gcc -O -fno-tree-loop-im t_p_1.c t_p.c
non-inlined version
138.79user 0.00system 2:18.77elapsed 100%CPU (0avgtext+0avgdata
356maxresident)k
0inputs+0outputs (0major+135minor)pagefaults 0swaps

Yes, looks like that the inlined version is much faster than the non-inlined
version on aarch64 platform.

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-24 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #11 from Qing Zhao  ---
(In reply to Wilco from comment #9)

> str(n)cmp with a constant string can be changed into memcmp if the string has 
> a
> known alignment or is an array of known size. We should check the common cases
> are implemented.

Please provide an example in which a str(n)cmp with a constant string can be
changed into
memcmp. 

(From my understanding, for the strcmp (p, “fish”),  since we don’t know what
will be in the string
pointed by “p”, and there might be NULL_terminator in any of the place of p, we
have to compare
each char in “p” one by one, do I miss anything here?)

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-24 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #13 from Qing Zhao  ---
(In reply to Richard Earnshaw from comment 12)
> That's not entirely correct.  Notionally memcmp needs to return a value
> representing the relative difference of the first different byte in the
> compared areas of memory; any later bytes are irrelevant.  Yes the compiler 
> can
> compare multiple bytes at the same time and it does not have to worry about
> page faulting, but it does have to keep track of where the first difference
> occurs.
> 
> Of course, the compiler can see how the result is used to optimize things
> further; a simple equality test will allow the compiler to generate a simpler
> sequence that could access all bytes and accumulate the overall result.

Yes.  this should be the reason why currently in GCC, only when the result of 
memcpy is compared to zero, it is optimized by “strlen” pass and “expand” pass.

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-24 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #15 from Qing Zhao  ---
(In reply to Wilco from comment 14)
> The only reason we have to do a character by character comparison is because 
> we
> cannot read beyond the end of a string. However when we know the size or
> alignment we can safely process a string one word at a time.

is it possible that “NULL_terminator” is in the middle of the string even
though we 
know the size or alignment? for example:

const char s[8] = “abcd\0abc”;  // null byte in the middle of the string
int f2(void) { return __builtin_strcmp(s, "abc") != 0; }
int f3(void) { return __builtin_strcmp(s, “abc”); }

can either of the above f2 or f3 been optimized to memcmp? seems not.

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-24 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #17 from Qing Zhao  ---
(In reply to Wilco from comment #16)

>> const char s[8] = “abcd\0abc”;  // null byte in the middle of the string
>> int f2(void) { return __builtin_strcmp(s, "abc") != 0; }
>> int f3(void) { return __builtin_strcmp(s, “abc”); }
>> 
>> can either of the above f2 or f3 been optimized to memcmp? seems not.
> 
> You never get that to the null byte as the memcmp only compares 
> strlen("abc"+1)
> characters.

Yes, this is correct for memcmp, but not for strcmp.  strcmp will get to the
null byte. 
as a result,  

const char s[8] = “abcd\0abc”;
strcmp (s, “abc”) != 0.   // s = “abcd", which is != “abc"
strncmp (s, “abc”, 3) == 0
memcmp(s, “abc”, 3) == 0

So, strcmp cannot optimized to memcmp 

> However do you mean an input string which is shorter than the
> constant string? That's fine as this will compare not-equal in the memcmp.

for the input string is shorter than the constant string, for example: 

const char s[8] = “ab\0\0abcd”;
strcmp (s, “abc”) != 0
strncmp (s, “abc”, 3) != 0
memcmp (s, “abc”,3) != 0

In a summary, since it’s NOT easy for the compiler to know where is the
“Null_terminator” 
in the string, strcmp is NOT reasonable to be optimized to memcmp whenever its
result is 
used to compare with zero or not.

But for strncmp, if the result is to compare with zero, it might be reasonable
to optimized it
to the corresponding memcmp, i.e

strncmp (s, “abc”, 3) != 0

could be optimized to

memcmp (s, “abc”, 3) != 0

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-11-06 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

--- Comment #20 from Qing Zhao  ---
design doc is at:
https://www.mail-archive.com/gcc@gcc.gnu.org/msg83822.html

[Bug tree-optimization/83026] missing strlen optimization for strcmp of unequal strings

2017-11-20 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83026

Qing Zhao  changed:

   What|Removed |Added

 CC||qing.zhao at oracle dot com

--- Comment #2 from Qing Zhao  ---
This is reasonable. I will add the support for this into my implementation for
PR78809.

[Bug middle-end/79538] missing -Wformat-overflow with %s and non-member array arguments

2017-11-20 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79538

Qing Zhao  changed:

   What|Removed |Added

 CC||qing.zhao at oracle dot com

--- Comment #2 from Qing Zhao  ---
this is because the routine "get_range_strlen" misses the handling of regular
arrays. adding the support should fix this issue.

[Bug tree-optimization/83026] missing strlen optimization for strcmp of unequal strings

2017-12-05 Thread qing.zhao at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83026

--- Comment #4 from Qing Zhao  ---
Note, this optimization is only valid when the result of the strcmp is used to
compare with zero.

[Bug rtl-optimization/99421] ICE:in code_motion_process_successors, at sel-sched.c:6389 on aarch64

2021-03-09 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99421

--- Comment #8 from Qing Zhao  ---
> On Mar 8, 2021, at 11:58 AM, marxin at gcc dot gnu.org 
>  wrote:
> 
> Sure. I used C-Vise:
> https://github.com/marxin/cvise

>From my understanding, cvise is similar as creduce, but faster.
So, I am still wondering why creduce cannot reduce testing cases when using
profiling feedback due to
Profiling mismatch? But cvise can do this? Does Cvise reduce source code and
.gcda file at the same 
time?

[Bug target/97715] [11 Regression] ICE in insn_default_length, at config/i386/i386.md:15325 since r11-4578-gd10f3e900b0377b4

2020-11-04 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97715

--- Comment #10 from Qing Zhao  ---
> On Nov 4, 2020, at 9:45 AM, ubizjak at gmail dot com 
>  wrote:
>> fixed registers should already be excluded from zeroing. 
>> are ST registers considered FIXED registers when -mno-80387 is specified?
> 
> They used to be,

I used the following in middle end to exclude fixed registers from being
zeroed:
  if (fixed_regs[regno])
continue;
So, looks like that ST registers are not included in fixed_regs when
!TARGET_80387. 
Is this a bug in gcc?

> but now they are cleared from accessible_reg_set.
> 
>  /* If the FPU is disabled, disable the registers.  */
>  if (! (TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387))
>accessible_reg_set &= ~reg_class_contents[FLOAT_REGS];

The above means that ST registers are excluded from the accessible_reg_set,
i.e, they cannot be used in the function body anymore? Then, no need to zero
them?

Is this information available to middle end?
Or I have to delete them from the register set in i386 backend?

[Bug target/97715] [11 Regression] ICE in insn_default_length, at config/i386/i386.md:15325 since r11-4578-gd10f3e900b0377b4

2020-11-04 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97715

--- Comment #13 from Qing Zhao  ---
> On Nov 4, 2020, at 10:04 AM, jakub at gcc dot gnu.org 
>  wrote:
> --- Comment #11 from Jakub Jelinek  ---
> I think you should do:
> --- gcc/function.c  2020-10-31 17:41:19.756740009 +0100
> +++ gcc/function.c  2020-11-04 17:02:51.199298173 +0100
> @@ -5871,6 +5871,8 @@ gen_call_used_regs_seq (rtx_insn *ret, u
>continue;
>   if (fixed_regs[regno])
>continue;
> +  if (!TEST_HARD_REG_BIT (accessible_reg_set, regno))
> +   continue;
>   if (REGNO_REG_SET_P (live_out, regno))\

Yes, this looks like a better solution. I will add this fix.

[Bug target/97715] [11 Regression] ICE in insn_default_length, at config/i386/i386.md:15325 since r11-4578-gd10f3e900b0377b4

2020-11-04 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97715

--- Comment #17 from Qing Zhao  ---
> On Nov 4, 2020, at 10:08 AM, ubizjak at gmail dot com 
>  wrote:
>> 
>> I used the following in middle end to exclude fixed registers from being
>> zeroed:
>>  if (fixed_regs[regno])
>>continue;
>> So, looks like that ST registers are not included in fixed_regs when
>> !TARGET_80387. 
>> Is this a bug in gcc?
> 
> fixed_regs members are only set with -ffixed-REG, in addition to members,
> initialized by the target-dependent initializer. It is not a bug.
Yes.

Then, we need to check whether the register is in “accessible_reg_set” to
exclude it from being
Zeroed as Jacub suggested in  Comment #14.  That looks like a better fix.

[Bug target/97715] [11 Regression] ICE in insn_default_length, at config/i386/i386.md:15325 since r11-4578-gd10f3e900b0377b4

2020-11-04 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97715

--- Comment #20 from Qing Zhao  ---
> On Nov 4, 2020, at 11:48 AM, jakub at gcc dot gnu.org 
>  wrote:
> 
> --- Comment #18 from Jakub Jelinek  ---
> --- gcc/function.c.jj   2020-10-31 17:41:19.756740009 +0100
> +++ gcc/function.c  2020-11-04 17:56:53.790403571 +0100
> @@ -5871,6 +5871,8 @@ gen_call_used_regs_seq (rtx_insn *ret, u
>continue;
>   if (fixed_regs[regno])
>continue;
> +  if (!TEST_HARD_REG_BIT (accessible_reg_set, regno))
> +   continue;
>   if (REGNO_REG_SET_P (live_out, regno))
>continue;
>   if (only_gpr
> --- gcc/testsuite/gcc.target/i386/zero-scratch-regs-32.c.jj 2020-11-04
> 17:59:19.063795531 +0100
> +++ gcc/testsuite/gcc.target/i386/zero-scratch-regs-32.c2020-11-04
> 17:59:01.956984879 +0100
> @@ -0,0 +1,5 @@
> +/* PR target/97715 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -mno-80387" } */
> +
> +#include "../../c-c++-common/zero-scratch-regs-10.c"
> 
> isn't enough though, because while the st to st(7) registers are inaccessible
> with -mno-80387, the MMX registers are accessible, so I think the
> zero_all_st_registers change is needed too.

Yeh, I noticed the same issue too.
I will add the fix you proposed previously.

[Bug target/97715] [11 Regression] ICE in insn_default_length, at config/i386/i386.md:15325 since r11-4578-gd10f3e900b0377b4

2020-11-04 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97715

--- Comment #21 from Qing Zhao  ---
> --- Comment #19 from Jakub Jelinek  ---
> And, actually the function.c change is probably unnecessary, because the
>  if (!crtl->abi->clobbers_full_reg_p (regno))
>continue;
>  if (!fixed_regs[i])
>continue;
> should already rule them out.
>  operand_reg_set &= accessible_reg_set;
> ...
>  /* If a register is too limited to be treated as a register operand,
> then it should never be allocated to a pseudo.  */
>  if (!TEST_HARD_REG_BIT (operand_reg_set, i))
>fixed_regs[i] = 1;

So, fixed_regs should already include the information of accessible_reg_set, no
need to check accessible_reg_set anymore.

[Bug testsuite/97680] [11 Regression] new test case c-c++-common/zero-scratch-regs-10.c in r11-4578 has excess errors

2021-02-26 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97680

--- Comment #10 from Qing Zhao  ---
> but it will still fail on all targets but x86_64 (and now powerpc).  Qinzhao,
> what's the list of targets this is supported?

I believe that the targets that currently support this feature are:
x86-64
aarch64
sparc

The original patch supported x86-64 and aarch64, later the following patch
https://gcc.gnu.org/pipermail/gcc-cvs/2020-December/338342.html


Support sparc

[Bug tree-optimization/107952] tree-object-size: inconsistent size for flexible arrays nested in structs

2023-01-24 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107952

--- Comment #10 from Qing Zhao  ---
> --- Comment #9 from Richard Biener  ---
> 
> GCC handles for example
> 
> struct A { char data[1]; };
> struct B { int n; struct A a; };
> 
> as if the a.data[] array is a flex-array.

Okay. Then the maximum size of __builtin_object_size for it should be -1,
right?

>  It also handles
> 
> struct C { int n; struct A a; int x; };
> 
> as if a.data[] can be up to 4 elements large (we allow an array to extend
> flexibly to padding - but only if it is trailing).

A little confused here:  
when the structure with a trailing flexible-array member is a middle
field of 
an outer structure, GCC extension will treat it as a flexible-array
too? But the
maximum size of this flexible-array will be bounded by the size of the
padding
of that field? 
Is the above understanding correct?

>  I see that's not
> consistently handled though.
> 
> I think the [] syntax should follow the C standard as what testcases are
> accepted/rejected by the frontend and any extensions there should be
> documented

Agreed, usually where these extension should be documented?

> (those are separate from what the former array_at_struct_end_p
> allowed semantically

So, your mean to leave such extension out of “array_at_struct_end_p” (the
current “array_ref_flexible_size_p”)? 
Handle them separately instead?

[Bug tree-optimization/107952] tree-object-size: inconsistent size for flexible arrays nested in structs

2023-01-25 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107952

--- Comment #13 from Qing Zhao  ---
> On Jan 25, 2023, at 2:32 AM, rguenther at suse dot de 
>  wrote:
>> 
>> A little confused here:  
>>when the structure with a trailing flexible-array member is a middle
>> field of 
>>an outer structure, GCC extension will treat it as a flexible-array
>> too? But the
>>maximum size of this flexible-array will be bounded by the size of the
>> padding
>>of that field? 
>> Is the above understanding correct?
> 
> That's correct - at least when using the get_ref_base_and_extent
> API.  I see that when using array_ref_flexible_size_p it doesn't
> return true (but it's technically not _flexible_, we just treat it with
> a bound size that doesn't match the declared bound).
For the current array_ref_flexible_size_p, we include the following 3 cases:

   A. a ref to a flexible array member at the end of a structure;
   B. a ref to an array with a different type against the original decl;
  for example:

   short a[16] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
   (*((char(*)[16])&a[0]))[i+8]

   C. a ref to an array that was passed as a parameter;
  for example:

   int test (uint8_t *p, uint32_t t[1][1], int n) {
   for (int i = 0; i < 4; i++, p++)
 t[i][0] = …;

It basically mean: return true if REF is to an array whose actual size might be
larger than its upper bound implies. 

I feel that the case "when the structure with a trailing flexible-array member
is a middle field of an outer structure” still fit here, but not very sure,
need more thinking...

> 
> The first is handled by the function just fine,

No, even the first case is not recognized by the current
“array_ref_flexible_size_p”, it’s not been identified as a flexible array right
now.
Shall we include this case into “array_ref_flexible_size_p”?  (It’s a GCC
extension).

> it's the second with the bound size that's not and that isn't a good fit 
> there,
> though we do handle padding to the end of a declaration where
> we return true.
> 
>> Handle them separately instead?
> 
> I'm not sure how important it is to hande the middle array
> extending to padding, ISTR there was some real world code
> "miscompiled" when treating the array domain as written.

We can leave the 2nd case in a later time to resolve -:)
>

[Bug tree-optimization/107952] tree-object-size: inconsistent size for flexible arrays nested in structs

2023-01-25 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107952

--- Comment #15 from Qing Zhao  ---
> On Jan 25, 2023, at 11:12 AM, siddhesh at gcc dot gnu.org 
>  wrote:
>> 
>>> The first is handled by the function just fine,
>> 
>> No, even the first case is not recognized by the current
>> “array_ref_flexible_size_p”, it’s not been identified as a flexible array
>> right now.
>> Shall we include this case into “array_ref_flexible_size_p”?  (It’s a GCC
>> extension).
> 
> In the first case, array_ref_flexible_size_p recognizes S2.flex.data as having
> flexible size.  The tests in my patch[1] for this bug checks for this.
Oh, yes. That’s right.
> 
> However, array_ref_flexible_size_p does not recognize S2.flex as having
> flexible size.  It might make sense to support that, i.e. any struct or union
> with the last element as a flex array should be recognized as having flexible
> size.

Since S2.flex is not an “array_ref”, it’s correct for array_ref_fleixble_size_p
to return false for it, I think. 
We might add a new utility routine to determine whether a ref to a struct or
union have flexible array?

[Bug middle-end/107411] trivial-auto-var-init=zero invalid uninitialized variable warning

2023-02-16 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107411

--- Comment #8 from Qing Zhao  ---
> On Feb 16, 2023, at 2:35 AM, rguenther at suse dot de 
>  wrote:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107411
> 
> --- Comment #7 from rguenther at suse dot de  ---
> On Wed, 15 Feb 2023, qinzhao at gcc dot gnu.org wrote:
> 
> 
> Hmm, I don't think so.  So this is indeed expected behavior since the
> frontend IL doesn't have variable definitions with initializers but
> instead just (immediately following) assignments.

Then, if that’s the case, it also is correct to add the .DEFERRED_INIT to them
during gimplification?

[Bug middle-end/104550] bogus warning from -Wuninitialized + -ftrivial-auto-var-init=pattern

2022-02-21 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104550

--- Comment #20 from Qing Zhao  ---
> On Feb 21, 2022, at 1:56 AM, rguenth at gcc dot gnu.org 
>  wrote:
>> 
>> my question is, is the "info" in __builtin_clear_padding(&info) a REAL use
>> of "info"? is it correct to report the uninitialized use message for it?
> 
> As said I think reads emitted from __builtin_clear_padding lowering should
> not be considered for (uninit) diagnostics.
Okay.

[Bug tree-optimization/102586] [12 Regression] ICE in clear_padding_type, at gimple-fold.c:4798 since r12-3433-ga25e0b5e6ac8a77a

2022-02-11 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102586

--- Comment #23 from Qing Zhao  ---
> On Feb 11, 2022, at 9:29 AM, jason at gcc dot gnu.org 
>  wrote:
> 
> I wonder why -fauto-var-init uses builtin_clear_padding instead of just
> zero-initializing the whole object before normal initialization, as with
> value-initialization?  With a new object we don't need to get clever.

In the initial several versions of the implementation, I didn’t use
builtin_clear_padding, other that
that, I just zero-initialized the whole object before normal initialization.
However, multiple people
suggested to use builtin_clear_padding instead. Then in the later
implementations, I used
builtin_clear_padding for the padding initialization.

[Bug tree-optimization/102586] [12 Regression] ICE in clear_padding_type, at gimple-fold.c:4798 since r12-3433-ga25e0b5e6ac8a77a

2022-02-11 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102586

--- Comment #27 from Qing Zhao  ---
> On Feb 11, 2022, at 11:47 AM, jakub at gcc dot gnu.org 
>  wrote:
> But if I do:
> struct C { char a; int b; char c; long d; C () : a (42), b (42), c (42), d 
> (42)
> {} };
> void bar (C *);
> void
> foo ()
> {
>  C c;
>  bar (&c);
> }
> then *.gimple is:
>  c = .DEFERRED_INIT (24, 1, &"c"[0]);
>  __builtin_clear_padding (&c, 0B, 1);
>  C::C (&c);
>  bar (&c);
> ...
> void C::C (struct C * const this)
> {
>  *this = {CLOBBER};
>  {
>this->a = 42;
>this->b = 42;
>this->c = 42;
>this->d = 42;
>  }
> }
> After einline this is:
>  c = .DEFERRED_INIT (24, 1, &"c"[0]);
>  MEM  [(struct C *)&c + 1B] = {};
>  MEM  [(struct C *)&c + 9B] = {};
>  c ={v} {CLOBBER};
>  c.a = 42;
>  c.b = 42;
>  c.c = 42;
>  c.d = 42;
>  bar (&c);
> and that keeps until dse1 which optimizes that out:
>  c ={v} {CLOBBER};
>  c.a = 42;
>  c.b = 42;
>  c.c = 42;
>  c.d = 42;
>  bar (&c);
> so there is no zero padding initialization at all.

Does this issue only exist with -flifetime-dse=2?
When -flifetime-dse=2, the call to __builtin_clear_padding should be inserted
AFTER the
start point of the constructor of the object, otherwise it’s dead and will be
eliminated by DSE. 
And with -flifetime-dse=2, the padding initialization should be done by C++ FE
instead of middle end.
Is this understanding correct?

[Bug middle-end/104550] bogus warning from -Wuninitialized + -ftrivial-auto-var-init=pattern

2022-02-15 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104550

--- Comment #5 from Qing Zhao  ---
> On Feb 15, 2022, at 3:38 PM, pinskia at gcc dot gnu.org 
>  wrote:
> Maybe __builtin_clear_padding lowering should mark the load "MEM[(struct
> vx_audio_level *)&info]" as not needing a warning.
> 
This sound like a good idea, I will study a little bit more on this direction.

[Bug middle-end/104550] bogus warning from -Wuninitialized + -ftrivial-auto-var-init=pattern

2022-02-17 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104550

--- Comment #14 from Qing Zhao  ---
> On Feb 17, 2022, at 1:54 AM, rguenther at suse dot de 
>  wrote:
>> 
>> If padding clearing is exposed too late till RTL expansion, some tree
>> optimization will not be able to be applied on the expanded stores? 
> 
> Doesn't the same argument apply to .DEFERRED_INIT itself?  Dependent
> on the .DEFERRED_INIT expansion strathegy the padding clearing might
> be unneccessary (for example when using memset())?

Yes, because we use memset to initialize auto-var when it’s in memory, we do
not insert a call to
__builtin_clear_padding to zero initialization.  We only insert
__builtin_clear_padding to pattern
Initialization to set the padding to zeroes. That’s why this bug only exposed
with pattern init.
> 
>> the approach to mark the load "MEM" as not needing a warning might be better?
> 
> It's probably a good thing anyway, the 'R' in the RMW cycle isn't really
> a use.
If it’s not a real use, should we exclude this case from emitting the
uninitialized warning directly?

[Bug middle-end/103127] ICE in fold_convert_loc with __vector_quad and -ftrivial-auto-var-init=pattern on powerpc64*

2021-11-08 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103127

--- Comment #9 from Qing Zhao  ---
> On Nov 8, 2021, at 1:56 PM, bergner at gcc dot gnu.org 
>  wrote:
> 
> 
> So this then?

This is better, I think.
You can send a patch review request to gcc-patch alias for more comments or
approval.

[Bug target/103271] ICE in assign_stack_temp_for_type with -ftrivial-auto-var-init=pattern and VLAs and -mno-strict-align on riscv64

2021-11-23 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103271

--- Comment #4 from Qing Zhao  ---
> 
> You should be able to simply do
> 
> ../configure --target=riscv64-linux
> make all-gcc
> 
> and use the built gcc/cc1 to debug such ICEs.
> 

Thanks. Yes, I can repeat the ICE with this gcc even though “make install”
still failed with the following error:

make[2]: Leaving directory '/home/opc/Work/GCC/crossbuild/libiberty'
/bin/sh: line 3: cd: ./c++tools: No such file or directory
make[1]: *** [Makefile:12252: install-c++tools] Error 1
make[1]: Leaving directory '/home/opc/Work/GCC/crossbuild'
make: *** [Makefile:2538: install] Error 2

Anyway, I can debug this current bug with this gcc now.

[Bug middle-end/102285] New flag -ftrivial-auto-var-init=zero causes many crashes in the testsuite

2021-09-10 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102285

--- Comment #4 from Qing Zhao  ---
> On Sep 10, 2021, at 5:34 PM, pinskia at gcc dot gnu.org 
>  wrote:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102285
> 
> --- Comment #3 from Andrew Pinski  ---
> I wonder if most of these were fixed by r12-3447-g79f488de3036a

There are two patches by Richard Biener last night to fix two bugs:

commit 79f488de3036a4a4be08df2a782e6eb02419db19
Author: Richard Biener 
Date:   Fri Sep 10 12:28:09 2021 +0200

middle-end/102273 - avoid ICE with auto-init and nested functions

This refactors expansion to consider non-decl LHS.  I suspect
the is_val argument is not needed.

commit 1dae802b685937b1dc52e49d0641c75f3186ba14
Author: Richard Biener 
Date:   Fri Sep 10 10:17:24 2021 +0200

middle-end/102269 - avoid auto-init of empty types

This avoids initializing empty types for which we'll eventually
leave a .DEFERRED_INIT call without a LHS.

I checked the failed testing cases listed by David, many of them have empty
types.

So, I guess that the above two patches might already fix these failures.


> 
> -- 
> You are receiving this mail because:
> You are on the CC list for the bug.

[Bug sanitizer/102317] signed integer overflow sanitizer cannot work well with -fno-strict-overflow

2021-09-13 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102317

--- Comment #5 from Qing Zhao  ---
> On Sep 13, 2021, at 4:45 PM, pinskia at gcc dot gnu.org 
>  wrote:
> 
>> is it possible to make -fsanitize=signed-integer-overflow work with -fwrapv?
> 
> Why would it? they conflict.

This is a feature that is requested by Kees Cook for kernel security usage. 

"the kernel builds with -fno-strict-overflow which removes
possible undefined behavior, but I still want the sanitizer to catch
this case.”

[Bug middle-end/102285] New flag -ftrivial-auto-var-init=zero causes crash in pr82421.c

2021-10-01 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102285

--- Comment #16 from Qing Zhao  ---
> On Oct 1, 2021, at 1:51 AM, rguenth at gcc dot gnu.org 
>  wrote:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102285
> 
> --- Comment #13 from Richard Biener  ---
> Because the variable doesn't need to be addressable.

One question:

For the following statement in the routine “fold_builtin_alloca_with_align” in
tree-ssa-ccp.c:


  /* Fold alloca to the address of the array.  */
  return fold_convert (TREE_TYPE (lhs), build_fold_addr_expr (var));

Clearly, we build a add_expr for “var”, but why not “mark_addressable” for
“var” since its address is taken?

[Bug target/102683] [12 Regression] ICE in set_min_and_max_values_for_integral_type, at stor-layout.c:2851

2021-10-11 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102683

--- Comment #5 from Qing Zhao  ---
> --- Comment #4 from Richard Biener  ---
> But we should avoid
> the .DEFERRED_INIT here.  The GENERIC is
> 
>struct C y;
>  <  (void) (y = n == 1 ? TARGET_EXPR  : TARGET_EXPR  {.c=__complex__ (3.0e+0, 3.0e+0)}>) >;
> 
> so we have first
> 
>stmt 
>side-effects arg:0 
>t.i:10:5 start: t.i:10:5 finish: t.i:10:5>
> 
> and then
> 
>stmt  void>
>side-effects
>arg:0  void>
>side-effects
>arg:0  0x76558f18 void>
>side-effects
>arg:0  0x76663d20 C>
>side-effects arg:0 
>arg:1  0x76663d20 C>
> ...
> 
> so this is an INIT_EXPR which never(?) requires .DEFERRED_INIT?  Of course
> we're instrumenting the DECL_EXPR, not the INIT_EXPR and at that point
> we didn't see the INIT_EXPR.

Currently, in simplification phase, for each DECL_EXPR, we check:

  if (VAR_P (decl) && !DECL_EXTERNAL (decl))
{
  tree init = DECL_INITIAL (decl);

  If (!init. && is_var_need_auto_init (decl))
gimple_add_init_for_auto_var (decl…)


We assume that FE will put the explicit initializer of a DECL to its
DECL_INITIAL. 
Clearly for this testing case, this is not the case.

I am wondering why FE does not put the initializer of this DECL to its
DECL_INITIAL for this case?

> 
> I wonder if we want a bit on DECL_EXPR denoting whether the decl is
> fully initialized?

Yes, if FE can mark this bit, it will be easier and simpler for the
implementation.

[Bug c++/102281] -ftrivial-auto-var-init=zero causes ice

2021-10-11 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102281

--- Comment #10 from Qing Zhao  ---
> On Oct 11, 2021, at 3:29 PM, jakub at gcc dot gnu.org 
>  wrote:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102281
> 
> --- Comment #8 from Jakub Jelinek  ---
> (In reply to qinzhao from comment #7)
>> I agree that the above additional check  is necessary.
>> 
>> one question here is, can I use the routine "bool
>> clear_padding_type_may_have_padding_p (tree type)" in gimple-fold.c instead
>> of the above new function for this purpose?
> 
> Sure, you can, but just note that it is conservative and fast, it will return
> true even for types that don't have any padding.
> But for double, _Complex double, int etc. it will quickly return false.
> I guess I should use it in c-omp.c too.

Yes, the routine “clear_padding_type_may_have_padding_p” is quicker when
returning FALSE. 
When it return TRUE, might not be very accurate, at this time, we can further
call the new routine 
to make it accurate. 
Another question here, what’s the purpose for the routine
“clear_type_padding_in_mask”?

> But also note that it will return true for x86 long double/_Complex long
> double,
> and if you have a variable that isn't addressable, you need to figure out what
> to do.
Yes, we should resolve this issue too. 

>  I think it might be easiest not to clear anything,

Do you mean not to call clear padding for any variable here? Or only for
variables with type long double/_Complex long double?

> another option is to
> create a temporary, store the var value in there, clear padding and copy back,
> but in the end the padding bits will probably not stay cleared.

Why not?

> After all, e.g. on x86-64 -m64 the return value will be either in %st or
> %st/%st(1) pair and the padding isn't present there

You mean for variables with type long double/_Complex long doubles, if they are
in register (Not-addressable), they don’t have
Padding, so, don’t need to clear padding at all? (But this is only true for
x86-64)

> (but e.g. for ia32 _Complex
> long double is returned through invisible reference and padding is there,

If they are returned through invisible reference, does this mean they are
addressable?
> but
> you'd need to perform clear padding like operation during expansion).
> 
> -- 
> You are receiving this mail because:
> You are on the CC list for the bug.

[Bug rtl-optimization/108132] Wrong instruction scheduling around function call with longjmp on aarch64 at O2

2022-12-15 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108132

--- Comment #7 from Qing Zhao  ---
> On Dec 15, 2022, at 2:33 PM, pinskia at gcc dot gnu.org 
>  wrote:
> 
> 
> There is a patch in PR 57067 even which you could try to port to a newer
> compiler version and fix up.

Okay, will check that patch.
thanks.
>

[Bug middle-end/105539] -ftrivial-auto-var-init=zero happening too late?

2022-05-11 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105539

--- Comment #9 from Qing Zhao  ---
> It's a conditional uninit use which we do not warn on early by design
> (for the fear of too many false positives, that is).
Okay. 
> 
>> 2. the ccp optimization deletes the if (z) statement completely:
>> 
>> [opc@qinzhao-aarch64-ol8 105539]$ cat t.c.034t.ccp1
>> 
>> ;; Function x (x, funcdef_no=0, decl_uid=3591, cgraph_uid=1, symbol_order=0)
>> 
>> Removing basic block 3
>> Merging blocks 2 and 4
>> int x (int z)
>> {
>>  int y;
>> 
>>   :
>>  return 10;
>> 
>> }
> 
> That's expected from optimistic lattice propagation which merges TOP with
> the constant.

If y is initialized to 0 at declaration site, the “if z” statement is NOT
deleted by CCP. 

However, per design of -ftrivial-auto-var-init, we intent to not treat

y = .DEFERRED_INIT (4, 2, &"y"[0]);
Same as
y= 0 

In order to keep -Wuninitialized warning analysis working as expected.

So, yes, I agree that this issue is unavoided based on the current design. 

> 
>> are the above two bugs?
> 
> Well, it works as designed.  But sure, that we fail to diagnose the uninit
> use is unfortunate and there exist bugs for this specific issue already.

[Bug target/101891] Adjust -fzero-call-used-regs to always use XOR

2022-05-24 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101891

--- Comment #8 from Qing Zhao  ---
> On May 24, 2022, at 11:42 AM, arjan at linux dot intel.com 
>  wrote:
> 
> --- Comment #7 from Arjan van de Ven  ---
> from a performance angle, the xor-only sequence is not so great at all; modern
> CPUs are really good at eliminating mov's so that's why the code originally 
> was
> added to do a combo of xor and mov..

Are you saying that the Xor-only sequence is slower than the previous XOR + MOV
sequence?
If so, can you explain a little bit more on why? And do you have any data for
this claim?

Thanks.

[Bug middle-end/101836] __builtin_object_size(P->M, 1) where M is an array and the last member of a struct fails

2022-06-14 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836

--- Comment #27 from Qing Zhao  ---
> On Jun 14, 2022, at 11:39 AM, siddhesh at gcc dot gnu.org 
>  wrote:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836
> 
> --- Comment #26 from Siddhesh Poyarekar  ---
> (In reply to qinzhao from comment #25)
>> So, based on all the discussion so far, how about the following:
>> 
>> ** add the following gcc option:
>> 
>> -fstrict-flex-arrays=[0|1|2|3]
>> 
>> when -fstrict-flex-arrays=0:
>> treat all trailing arrays as flexible arrays. the default behavior;
> 
> Wouldn't this be -fno-strict-flex-arrays, i.e. the current behaviour?

Yes, it’s the same.  =0 is aliased with -fno-strict-flex-arrays.

The point is, the larger the value of LEVEL, the stricter with treating the
flexing array.

i.e, 0 is the least strict, and 3 is the strictest mode.

But we can delete the level 0 if not necessary.
> 
>> when -fstrict-flex-arrays=1:
>> Only treating [], [0], and [1] as flexible array;
>> 
>> when -fstrict-flex-arrays=2:
>> Only treating [] and [0] as flexible array;
>> 
>> when -fstrict-flex-arrays=3:
>> Only treating [] as flexible array; The strictest level.
> 
> If yes, then you end up having:
> 
> -fstrict-flex-arrays=[1|2|3]
> 
> with, I suppose, 1 as the default based on Jakub's comment about maximum
> compatibility support.
Yes.  And 3 is the one Kees requested for kernel usage.

[Bug target/106270] [Aarch64] -mlong-calls should be provided on aarch64 for users with large applications

2022-07-12 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106270

--- Comment #4 from Qing Zhao  ---
> On Jul 12, 2022, at 1:02 PM, wilco at gcc dot gnu.org 
>  wrote:
> 
> Note that GCC could split huge .text sections automatically to allow insertion
> of linker veneers every 128MB.

Does GCC do this by default? Any option is needed for this functionality?

[Bug middle-end/101586] ICE:in clear_padding_type, at gimple-fold.c:4783 with call to __builtin_clear_padding for C++

2021-07-23 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101586

--- Comment #7 from Qing Zhao  ---
> On Jul 23, 2021, at 10:10 AM, jakub at gcc dot gnu.org 
>  wrote:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101586
> 
> --- Comment #6 from Jakub Jelinek  ---
> It is related to the weird FIELD_DECLs the C++ FE creates for the virtual
> inheritence, there is e.g. a FIELD_DECL with B type (where B has 16 byte size
> and includes the 8 byte virtual pointer and 1 byte A), but the size of the
> FIELD_DECL is just 8 bytes, which is something the clear padding code didn't
> expect to see.

So, is such type information generated by C++FE correct?
> 
> BTW, if you are using the clear padding code for -ftrivial*, unless it is
> clear_type_padding_in_mask, it can error e.g. on flexible array members, which
> is fine for the builtin, but probably not fine for -ftrivial*.

I noticed this,  and then I fixed this by adding a third argument for
__builtin_clear_padding to indicate whether it’s for auto init or not.
If for auto init, then not emit the error, is this fix good?

(BTW, what’s clear_type_padding_in_mask try to do? Should I use it instead?)

[Bug middle-end/101586] ICE:in clear_padding_type, at gimple-fold.c:4783 with call to __builtin_clear_padding for C++

2021-07-23 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101586

--- Comment #9 from Qing Zhao  ---
> On Jul 23, 2021, at 10:34 AM, jakub at gcc dot gnu.org 
>  wrote:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101586
> 
> --- Comment #8 from Jakub Jelinek  ---
> (In reply to Qing Zhao from comment #7)
>> So, is such type information generated by C++FE correct?
> 
> Yes.  It is needed that way to follow the Itanium C++ ABI.

Okay. I see.
Then with such type info, can clear_type_padding fill all the paddings for such
type?
> 
>>> BTW, if you are using the clear padding code for -ftrivial*, unless it is
>>> clear_type_padding_in_mask, it can error e.g. on flexible array members, 
>>> which
>>> is fine for the builtin, but probably not fine for -ftrivial*.
>> 
>> I noticed this,  and then I fixed this by adding a third argument for
>> __builtin_clear_padding to indicate whether it’s for auto init or not.
>> If for auto init, then not emit the error, is this fix good?
>> 
>> (BTW, what’s clear_type_padding_in_mask try to do? Should I use it instead?)
> 
> I'd need to see a patch.  

Do you want me send my current patch to you for some comments for this part?
Then I can email to you with a separate email.
> The builtin itself has a single argument and
> shouldn't accept more than one, gimple_fold_builtin_clear_padding has also 
> just
> a single argument.

Currently, in gimplification phase, we already added one more argument to this
call: (gimplify.c)


  case BUILT_IN_CLEAR_PADDING:
if (call_expr_nargs (*expr_p) == 1)
  {
/* Remember the original type of the argument in an internal
   dummy second argument, as in GIMPLE pointer conversions are
   useless.  */
p = CALL_EXPR_ARG (*expr_p, 0);
*expr_p
  = build_call_expr_loc (EXPR_LOCATION (*expr_p), fndecl, 2, p,
 build_zero_cst (TREE_TYPE (p)));
return GS_OK;
=

So, I just added one more argument in the above to distinguish the auto init.
>  clear_type_padding_in_mask doesn't emit code to clear any
> memory, but fills in an array with bitmask which bits contain padding and 
> which
> don't.
> 
> -- 
> You are receiving this mail because:
> You reported the bug.

[Bug c/116016] enhancement: add __builtin_set_counted_by(P->FAM, COUNT) or equivalent

2024-07-30 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116016

--- Comment #25 from Qing Zhao  ---
> --- Comment #24 from Martin Uecker  ---
> 
> A builtin that returns the name or an lvalue for the member would seem to be
> more useful, but then harder to ignore when there is no attribute.

You mean, instead of providing __builtin_set_counted_by (P->FAM, COUNT),
providing the following:

__builtin_get_counted_by (P->FAM)

To return the counted_by field P->count, then let the user to add the
assignment in the source code. 

i.e,  in the source code level, instead of

__builtin_set_counted_by (P->FAM, COUNT);

The source code need to be:

If (__builtin_get_counted_by (P->FAM))
  __builtin_get_counted_by (P->FAM) = COUNT;

Yes, I agree that this is good too for the original purpose. And also even
simpler and more flexible.
Kees might have more comments here. (Not sure any other impact on handling the
original problem he had with the new __builtin_get_counted_by).

> 
> But I wonder why a new assignment is needed. Shouldn't there be an assignment
> anyway?

I think the assignment could be easily to be added in the source code.

[Bug c/116016] enhancement: add __builtin_set_counted_by(P->FAM, COUNT) or equivalent

2024-07-30 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116016

--- Comment #27 from Qing Zhao  ---
--- Comment #26 from Siddhesh Poyarekar  ---
> (In reply to Qing Zhao from comment #25)
>> If (__builtin_get_counted_by (P->FAM))
>>  __builtin_get_counted_by (P->FAM) = COUNT;
> 
> Do we have language constructs where the existence of an identifier/expression
> (and not its value) is evaluated in a condition like this?
Good point (I am not sure here). Any suggestion on what’s the prototype of this
new __builtin_get_counted_by, and how to 
use it in the source code to resolve the original problem?

[Bug middle-end/116933] various issues of -ftrivial-auto-var-init=zero with Ada

2024-10-04 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116933

--- Comment #11 from Qing Zhao  ---
> On Oct 4, 2024, at 14:03, ebotcazou at gcc dot gnu.org 
>  wrote:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116933
> 
> --- Comment #7 from Eric Botcazou  ---
> Another installment in the series "How come this worked before?":

We added one more argument for __builtin_clear_padding to distinguish whether
this call is for AUTO_INIT or not. 
> 
> diff --git a/gcc/tree.cc b/gcc/tree.cc
> index bc50afca9a3..095c02c5474 100644
> --- a/gcc/tree.cc
> +++ b/gcc/tree.cc
> @@ -9848,7 +9848,6 @@ build_common_builtin_nodes (void)
>   ftype = build_function_type_list (void_type_node,
>ptr_type_node,
>ptr_type_node,
> -   integer_type_node,

This integer_type_node is for the new argument.

[Bug middle-end/116933] various issues of -ftrivial-auto-var-init=zero with Ada

2024-10-07 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116933

--- Comment #18 from Qing Zhao  ---
> On Oct 7, 2024, at 11:34, ebotcazou at gcc dot gnu.org 
>  wrote:
> I see, thanks for investigation!  This was overlooked because the C family of
> compiler do not use the declaration built in common_builtin_nodes, but rather
> that derived from builtins.def:
> 
> DEF_GCC_BUILTIN(BUILT_IN_CLEAR_PADDING, "clear_padding",
> BT_FN_VOID_VAR, ATTR_NOTHROW_NONNULL_TYPEGENERIC_LEAF)
> 
> which accepts any number of arguments.

Oh, I see. That’s the reason this issue was just exposed now..
Thank you for fixing this issue.

[Bug tree-optimization/116585] [12/13/14 Regression] SSA corruption with -O3

2024-09-26 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116585

--- Comment #7 from Qing Zhao  ---
> 
> Yes, I'll pick it up after some soaking on trunk during my next backporting
> round.  If you want to do the work of cherry-picking and regtesting feel free
> to do this earlier - it's been a week on trunk and a quite safe change.
> 

Sure, I can do the backporting to GCC12, 13,and 14.

  1   2   >