Gcc profile questions

2019-02-19 Thread Qing Zhao
Hi,

Suppose we have a program called foo which is built with gcc 
-fprofile-generate.  Now, when foo is executed, a bunch of .gcda files are 
created.  

What happens when foo is executed more than once?  Are the .gcda files updated 
with each execution, or are they overwritten with new .gcda files 
(and thus contain profile info only for the last execution)?

Thanks for the info.

Qing

A bug in vrp_meet?

2019-02-28 Thread Qing Zhao
Hi,

I have been debugging a runtime error caused by value range propagation, and 
finally traced it to the following GCC routine:

vrp_meet_1 in gcc/tree-vrp.c


/* Meet operation for value ranges.  Given two value ranges VR0 and
   VR1, store in VR0 a range that contains both VR0 and VR1.  This
   may not be the smallest possible such range.  */

static void 
vrp_meet_1 (value_range *vr0, const value_range *vr1)
{
  value_range saved;

  if (vr0->type == VR_UNDEFINED)
    {
      set_value_range (vr0, vr1->type, vr1->min, vr1->max, vr1->equiv);
      return;
    }

  if (vr1->type == VR_UNDEFINED)
    {
      /* VR0 already has the resulting range.  */
      return;
    }


In the above, when one of vr0 or vr1 is VR_UNDEFINED, the meet result of the 
two is simply the other value range. 

This does not seem correct to me. 

For example, the following shows the incorrect value range propagation 
(excerpt from the dump file *.181t.dom3):


Visiting PHI node: i_83 = PHI <_152(20), 0(22)>
Argument #0 (20 -> 10 executable)
_152: UNDEFINED
Argument #1 (22 -> 10 executable)
0: [0, 0]
Meeting
  UNDEFINED
and
  [0, 0]
to
  [0, 0]
Intersecting
  [0, 0]
and
  [0, 65535]
to
  [0, 0]



In the above, “i_83” is defined as PHI <_152(20), 0(22)>.  The 1st argument, 
_152, is UNDEFINED at this point (but its value range is definitely NOT [0,0]), 
and the 2nd argument is 0.

“vrp_meet” generates a VR_RANGE of [0,0] for “i_83” based on the current 
algorithm.  Obviously, this resulting VR_RANGE of [0,0] does NOT 
contain the value range of _152. 

So the result of “vrp_meet” is not correct, and this incorrect value range 
ultimately caused the runtime error. 
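
For illustration only, a hypothetical source shape (NOT the actual failing 
test case, which could not be reduced) that gives rise to this kind of PHI 
node between a MAX_EXPR result and the constant 0:

/* Purely illustrative: "i" becomes a PHI of a MAX_EXPR result and 0,
   similar to i_83 = PHI <_152(20), 0(22)> above.  */
extern unsigned short ufcMSR;

int
foo (int cond)
{
  int i;
  if (cond)
    {
      int n = (int) ufcMSR;
      i = n > 0 ? n : 0;    /* can be simplified to MAX_EXPR <n, 0>  */
    }
  else
    i = 0;
  return i;
}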

I’d like to modify vrp_meet_1 as follows:


static void 
vrp_meet_1 (value_range *vr0, const value_range *vr1)
{
  value_range saved;

  if (vr0->type == VR_UNDEFINED)
    {
      /* VR0 already has the resulting range.  */
      return;
    }

  if (vr1->type == VR_UNDEFINED)
    {
      set_value_range_to_undefined (vr0);
      return;
    }
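
(In other words, with the current code vrp_meet (UNDEFINED, [0,0]) yields 
[0,0], while with this change the meet of UNDEFINED with anything stays 
UNDEFINED.)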


let me know your opinion.

thanks a lot for the help.

Qing




Re: A bug in vrp_meet?

2019-03-01 Thread Qing Zhao
Jeff,

thanks a lot for the reply.

this is really helpful.

I double checked the dumped intermediate file for pass “dom3”, and located the 
following for _152:

BEFORE the pass “dom3”, there is no _152; the corresponding block looks 
like:

   <bb 4> [local count: 12992277]:
  _98 = (int) ufcMSR_52(D);
  k_105 = (sword) ufcMSR_52(D);
  i_49 = _98 > 0 ? k_105 : 0;

***During the pass “dom3”, _152 is generated as follows:

Optimizing block #4
….
Visiting statement:
i_49 = _98 > 0 ? k_105 : 0;
Meeting
  [0, 65535]
and
  [0, 0]
to
  [0, 65535]
Intersecting
  [0, 65535]
and
  [0, 65535]
to
  [0, 65535]
Optimizing statement i_49 = _98 > 0 ? k_105 : 0;
  Replaced 'k_105' with variable '_98'
gimple_simplified to _152 = MAX_EXPR <_98, 0>;
i_49 = _152;
  Folded to: i_49 = _152;
LKUP STMT i_49 = _152
 ASGN i_49 = _152

then bb 4 becomes:

   <bb 4> [local count: 12992277]:
  _98 = (int) ufcMSR_52(D);
  k_105 = _98;
  _152 = MAX_EXPR <_98, 0>;
  i_49 = _152;

and all the uses of i_49 are replaced with _152. 

However, the value range info for _152 does not reflect the one for i_49; it 
stays UNDEFINED. 

Is this the root problem?  

thanks a lot.

Qing

> On Feb 28, 2019, at 1:54 PM, Jeff Law  wrote:
> 
> On 2/28/19 10:05 AM, Qing Zhao wrote:
>> Hi,
>> 
>> I have been debugging a runtime error caused by value range propagation. and 
>> finally located to the following gcc routine:
>> 
>> vrp_meet_1 in gcc/tree-vrp.c
>> 
>> 
>> /* Meet operation for value ranges.  Given two value ranges VR0 and
>>   VR1, store in VR0 a range that contains both VR0 and VR1.  This
>>   may not be the smallest possible such range.  */
>> 
>> static void 
>> vrp_meet_1 (value_range *vr0, const value_range *vr1)
>> {
>>  value_range saved;
>> 
>>  if (vr0->type == VR_UNDEFINED)
>>{
>>  set_value_range (vr0, vr1->type, vr1->min, vr1->max, vr1->equiv);
>>  return;
>>}
>> 
>>  if (vr1->type == VR_UNDEFINED)
>>{
>>  /* VR0 already has the resulting range.  */
>>  return;
>>}
>> 
>> 
>> In the above, when one of vr0 or vr1 is VR_UNDEFINED,  the meet result of 
>> these two will be  the other VALUE. 
>> 
>> This seems not correct to me. 
> That's the optimistic nature of VRP.  It's generally desired behavior.
> 
>> 
>> For example, the following is the located incorrect value range propagation: 
>>  (portion from the dump file *.181t.dom3)
>> 
>> 
>> Visiting PHI node: i_83 = PHI <_152(20), 0(22)>
>>Argument #0 (20 -> 10 executable)
>>_152: UNDEFINED
>>Argument #1 (22 -> 10 executable)
>>0: [0, 0]
>> Meeting
>>  UNDEFINED
>> and
>>  [0, 0]
>> to
>>  [0, 0]
>> Intersecting
>>  [0, 0]
>> and
>>  [0, 65535]
>> to
>>  [0, 0]
>> 
>> 
>> 
>> In the above, “i_83” is defined as PHI <_152(20), 0(22)>,   the 1st argument 
>> is UNDEFINED at this time(but its value range definitely is NOT [0,0]),
>> and the 2nd argument is 0.
> If its value is undefined then it can be any value we want.  We choose
> to make it equal to the other argument.
> 
> If VRP later finds that _152 changes, then it will go back and
> reevaluate the PHI.  That's one of the basic design principles of the
> optimistic propagators.
> 
>> 
>> “vrp_meet” generate a VR_RANGE with [0,0] for “i_83” based on the current 
>> algorithm.  Obviously, this result VR_RANGE with [0,0] does NOT 
>> contain the value ranges for _152. 
>> 
>> the result of “vrp_meet” is Not correct.  and this incorrect value range 
>> result finally caused the runtime error. 
>> 
>> I ‘d like to modify the vrp_meet_1 as following:
>> 
>> 
>> static void 
>> vrp_meet_1 (value_range *vr0, const value_range *vr1)
>> {
>>  value_range saved;
>> 
>>  if (vr0->type == VR_UNDEFINED)
>>{
>>  /* VR0 already has the resulting range. */
>>  return;
>>}
>> 
>>  if (vr1->type == VR_UNDEFINED)
>>{
>>  set_value_range_to_undefined (vr0)
>> return;
>>}
>> 
>> 
>> let me know your opinion.
>> 
>> thanks a lot for the help.
> I think we (Richi and I) went through this about a year ago and the
> conclusion was we should be looking at how you're getting into the
> vrp_meet with the VR_UNDEFINED.
> 
> If it happens because the user's code has an undefined use, then, the
> consensus has been to go ahead and optimize it.
> 
> If it happens for any other reason, then it's likely a bug in GCC.  We
> had a couple of these when we made EVRP a re-usable module and started
> exploiting its data in the jump threader.
> 
> So you need to work backwards from this point to figure out how you got
> here.
> 
> jeff



Re: A bug in vrp_meet?

2019-03-01 Thread Qing Zhao


> On Mar 1, 2019, at 1:25 PM, Richard Biener  wrote:
> 
> On March 1, 2019 6:49:20 PM GMT+01:00, Qing Zhao <qing.z...@oracle.com> wrote:
>> Jeff,
>> 
>> thanks a lot for the reply.
>> 
>> this is really helpful.
>> 
>> I double checked the dumped intermediate file for pass “dom3", and
>> located the following for _152:
>> 
>> BEFORE the pass “dom3”, there is no _152, the corresponding Block
>> looks like:
>> 
>>  [local count: 12992277]:
>> _98 = (int) ufcMSR_52(D);
>> k_105 = (sword) ufcMSR_52(D);
>> i_49 = _98 > 0 ? k_105 : 0;
>> 
>> ***During the pass “doms”,  _152 is generated as following:
>> 
>> Optimizing block #4
>> ….
>> Visiting statement:
>> i_49 = _98 > 0 ? k_105 : 0;
>> Meeting
>> [0, 65535]
>> and
>> [0, 0]
>> to
>> [0, 65535]
>> Intersecting
>> [0, 65535]
>> and
>> [0, 65535]
>> to
>> [0, 65535]
>> Optimizing statement i_49 = _98 > 0 ? k_105 : 0;
>> Replaced 'k_105' with variable '_98'
>> gimple_simplified to _152 = MAX_EXPR <_98, 0>;
>> i_49 = _152;
>> Folded to: i_49 = _152;
>> LKUP STMT i_49 = _152
>>  ASGN i_49 = _152
>> 
>> then bb 4 becomes:
>> 
>>  [local count: 12992277]:
>> _98 = (int) ufcMSR_52(D);
>> k_105 = _98;
>> _152 = MAX_EXPR <_98, 0>;
>> i_49 = _152;
>> 
>> and all the i_49 are replaced with _152. 
>> 
>> However, the value range info for _152 doesnot reflect the one for
>> i_49, it keeps as UNDEFINED. 
>> 
>> is this the root problem?  
> 
> It looks like DOM fails to visit stmts generated by simplification. Can you 
> open a bug report with a testcase?

The problem is, it took me quite some time to try to come up with a small and 
independent testcase for this problem; a little bit of change makes the error 
disappear.  

Do you have any suggestion on this?  Or can you give me some hint on how to fix 
this in DOM?  Then I can try the fix on my side.

Thanks a lot.

Qing


> 
> Richard. 
> 



Re: A bug in vrp_meet?

2019-03-04 Thread Qing Zhao
Richard,

thanks a lot for your suggested fix. 

I will try it.

Qing
> On Mar 4, 2019, at 5:45 AM, Richard Biener  wrote:
> 
> On Fri, Mar 1, 2019 at 10:02 PM Qing Zhao  wrote:
>> 
>> 
>> On Mar 1, 2019, at 1:25 PM, Richard Biener  
>> wrote:
>> 
>> On March 1, 2019 6:49:20 PM GMT+01:00, Qing Zhao  
>> wrote:
>> 
>> Jeff,
>> 
>> thanks a lot for the reply.
>> 
>> this is really helpful.
>> 
>> I double checked the dumped intermediate file for pass “dom3", and
>> located the following for _152:
>> 
>> BEFORE the pass “dom3”, there is no _152, the corresponding Block
>> looks like:
>> 
>>  [local count: 12992277]:
>> _98 = (int) ufcMSR_52(D);
>> k_105 = (sword) ufcMSR_52(D);
>> i_49 = _98 > 0 ? k_105 : 0;
>> 
>> ***During the pass “doms”,  _152 is generated as following:
>> 
>> Optimizing block #4
>> ….
>> Visiting statement:
>> i_49 = _98 > 0 ? k_105 : 0;
>> Meeting
>> [0, 65535]
>> and
>> [0, 0]
>> to
>> [0, 65535]
>> Intersecting
>> [0, 65535]
>> and
>> [0, 65535]
>> to
>> [0, 65535]
>> Optimizing statement i_49 = _98 > 0 ? k_105 : 0;
>> Replaced 'k_105' with variable '_98'
>> gimple_simplified to _152 = MAX_EXPR <_98, 0>;
>> i_49 = _152;
>> Folded to: i_49 = _152;
>> LKUP STMT i_49 = _152
>>  ASGN i_49 = _152
>> 
>> then bb 4 becomes:
>> 
>>  [local count: 12992277]:
>> _98 = (int) ufcMSR_52(D);
>> k_105 = _98;
>> _152 = MAX_EXPR <_98, 0>;
>> i_49 = _152;
>> 
>> and all the i_49 are replaced with _152.
>> 
>> However, the value range info for _152 doesnot reflect the one for
>> i_49, it keeps as UNDEFINED.
>> 
>> is this the root problem?
>> 
>> 
>> It looks like DOM fails to visit stmts generated by simplification. Can you 
>> open a bug report with a testcase?
>> 
>> 
>> The problem is, It took me quite some time in order to come up with a small 
>> and independent testcase for this problem,
>> a little bit change made the error disappear.
>> 
>> do you have any suggestion on this?  or can you give me some hint on how to 
>> fix this in DOM?  then I can try the fix on my side?
> 
> I remember running into similar issues in the past where I tried to
> extract temporary nonnull ranges from divisions.
> I have there
> 
> @@ -1436,11 +1436,16 @@ dom_opt_dom_walker::before_dom_children
>   m_avail_exprs_stack->pop_to_marker ();
> 
>   edge taken_edge = NULL;
> -  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> -{
> -  evrp_range_analyzer.record_ranges_from_stmt (gsi_stmt (gsi), false);
> -  taken_edge = this->optimize_stmt (bb, gsi);
> -}
> +  gsi = gsi_start_bb (bb);
> +  if (!gsi_end_p (gsi))
> +while (1)
> +  {
> +   evrp_range_analyzer.record_def_ranges_from_stmt (gsi_stmt (gsi), 
> false);
> +   taken_edge = this->optimize_stmt (bb, &gsi);
> +   if (gsi_end_p (gsi))
> + break;
> +   evrp_range_analyzer.record_use_ranges_from_stmt (gsi_stmt (gsi));
> +  }
> 
>   /* Now prepare to process dominated blocks.  */
>   record_edge_info (bb);
> 
> OTOH the issue in your case is that fold emits new stmts before gsi but the
> above loop will never look at them.  See tree-ssa-forwprop.c for code how
> to deal with this (setting a pass-local flag on stmts visited and walking back
> to unvisited, newly inserted ones).  The fold_stmt interface could in theory
> also be extended to insert new stmts on a sequence passed to it so the
> caller would be responsible for inserting them into the IL and could then
> more easily revisit them (but that's a bigger task).
> 
> So, does the following help?
> 
> Index: gcc/tree-ssa-dom.c
> ===
> --- gcc/tree-ssa-dom.c  (revision 269361)
> +++ gcc/tree-ssa-dom.c  (working copy)
> @@ -1482,8 +1482,25 @@ dom_opt_dom_walker::before_dom_children
>   edge taken_edge = NULL;
>   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> {
> +  gimple_stmt_iterator pgsi = gsi;
> +  gsi_prev (&pgsi);
>   evrp_range_analyzer.record_ranges_from_stmt (gsi_stmt (gsi), false);
>   taken_edge = this->optimize_stmt (bb, gsi);
> +  gimple_stmt_iterator npgsi = gsi;
> +  gsi_prev (&npgsi);
> +  /* Walk new stmts eventually inserted by DOM.  gsi_stmt (gsi) itself
> +while it may be changed should not have gotten a new definition.  */
> +  if (gsi_stmt (pgsi) != gsi_stmt (npgsi))
> +   do
> + {
> +   if (gsi_end_p (pgsi))
> + pgsi = gsi_start_bb (bb);
> +   else
> + gsi_next (&pgsi);
> +   evrp_range_analyzer.record_ranges_from_stmt (gsi_stmt (pgsi),
> +false);
> + }
> +   while (gsi_stmt (pgsi) != gsi_stmt (gsi));
> }
> 
>   /* Now prepare to process dominated blocks.  */
> 
> 
> Richard.
> 
>> Thanks a lot.
>> 
>> Qing
>> 
>> 
>> 
>> Richard.
>> 
>> 



Re: A bug in vrp_meet?

2019-03-04 Thread Qing Zhao
Hi, Richard,

> On Mar 4, 2019, at 5:45 AM, Richard Biener  wrote:
>> 
>> It looks like DOM fails to visit stmts generated by simplification. Can you 
>> open a bug report with a testcase?
>> 
>> 
>> The problem is, It took me quite some time in order to come up with a small 
>> and independent testcase for this problem,
>> a little bit change made the error disappear.
>> 
>> do you have any suggestion on this?  or can you give me some hint on how to 
>> fix this in DOM?  then I can try the fix on my side?
> 
> I remember running into similar issues in the past where I tried to
> extract temporary nonnull ranges from divisions.
> I have there
> 
> @@ -1436,11 +1436,16 @@ dom_opt_dom_walker::before_dom_children
>   m_avail_exprs_stack->pop_to_marker ();
> 
>   edge taken_edge = NULL;
> -  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> -{
> -  evrp_range_analyzer.record_ranges_from_stmt (gsi_stmt (gsi), false);
> -  taken_edge = this->optimize_stmt (bb, gsi);
> -}
> +  gsi = gsi_start_bb (bb);
> +  if (!gsi_end_p (gsi))
> +while (1)
> +  {
> +   evrp_range_analyzer.record_def_ranges_from_stmt (gsi_stmt (gsi), 
> false);
> +   taken_edge = this->optimize_stmt (bb, &gsi);
> +   if (gsi_end_p (gsi))
> + break;
> +   evrp_range_analyzer.record_use_ranges_from_stmt (gsi_stmt (gsi));
> +  }
> 
>   /* Now prepare to process dominated blocks.  */
>   record_edge_info (bb);
> 
> OTOH the issue in your case is that fold emits new stmts before gsi but the
> above loop will never look at them.  See tree-ssa-forwprop.c for code how
> to deal with this (setting a pass-local flag on stmts visited and walking back
> to unvisited, newly inserted ones).  The fold_stmt interface could in theory
> also be extended to insert new stmts on a sequence passed to it so the
> caller would be responsible for inserting them into the IL and could then
> more easily revisit them (but that's a bigger task).
> 
> So, does the following help?

Yes, this change fixed the error on my side.  Now, in the dump file for pass 
dom3:


Visiting statement:
i_49 = _98 > 0 ? k_105 : 0;
Meeting
  [0, 65535]
and
  [0, 0]
to
  [0, 65535]
Intersecting
  [0, 65535]
and
  [0, 65535]
to
  [0, 65535]
Optimizing statement i_49 = _98 > 0 ? k_105 : 0;
  Replaced 'k_105' with variable '_98'
gimple_simplified to _152 = MAX_EXPR <_98, 0>;
i_49 = _152;
  Folded to: i_49 = _152;
LKUP STMT i_49 = _152
 ASGN i_49 = _152

Visiting statement:
_152 = MAX_EXPR <_98, 0>;

Visiting statement:
i_49 = _152;
Intersecting
  [0, 65535]  EQUIVALENCES: { _152 } (1 elements)
and
  [0, 65535]
to
  [0, 65535]  EQUIVALENCES: { _152 } (1 elements)


We can clearly see from the above that all the new stmts generated by fold are 
visited now. 

It is also confirmed that the runtime error caused by this bug is gone with 
this fix.

So, what’s the next step for this issue?

Will you commit this fix to GCC 9 and GCC 8 (we need it in GCC 8)?

Or should I test this fix on my side and commit it to both GCC 9 and GCC 8?

thanks.

Qing

> 
> Index: gcc/tree-ssa-dom.c
> ===
> --- gcc/tree-ssa-dom.c  (revision 269361)
> +++ gcc/tree-ssa-dom.c  (working copy)
> @@ -1482,8 +1482,25 @@ dom_opt_dom_walker::before_dom_children
>   edge taken_edge = NULL;
>   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> {
> +  gimple_stmt_iterator pgsi = gsi;
> +  gsi_prev (&pgsi);
>   evrp_range_analyzer.record_ranges_from_stmt (gsi_stmt (gsi), false);
>   taken_edge = this->optimize_stmt (bb, gsi);
> +  gimple_stmt_iterator npgsi = gsi;
> +  gsi_prev (&npgsi);
> +  /* Walk new stmts eventually inserted by DOM.  gsi_stmt (gsi) itself
> +while it may be changed should not have gotten a new definition.  */
> +  if (gsi_stmt (pgsi) != gsi_stmt (npgsi))
> +   do
> + {
> +   if (gsi_end_p (pgsi))
> + pgsi = gsi_start_bb (bb);
> +   else
> + gsi_next (&pgsi);
> +   evrp_range_analyzer.record_ranges_from_stmt (gsi_stmt (pgsi),
> +false);
> + }
> +   while (gsi_stmt (pgsi) != gsi_stmt (gsi));
> }
> 
>   /* Now prepare to process dominated blocks.  */
> 
> 
> Richard.
> 
>> Thanks a lot.
>> 
>> Qing
>> 
>> 
>> 
>> Richard.
>> 
>> 



How to build gcc with address sanitizer?

2019-12-09 Thread Qing Zhao
Hello,

When using GCC 8.2.1 to build one application, it runs out of memory during 
“cc1”.  We suspect that there is a memory leak in “cc1”, and therefore tried to 
build GCC with AddressSanitizer in order to detect the memory leak during 
compilation. 

However, I have spent a lot of time trying to build GCC (or just “cc1”) with 
AddressSanitizer and still cannot do it successfully. 

Has anyone done this before?  Is there any documentation on how to do this?

Thanks a lot for any help.

Qing

Does gcc automatically lower optimization level for very large routines?

2019-12-19 Thread Qing Zhao
Hi,

When using GCC to compile a very large routine with -O2, it failed with out of 
memory while the compiler was running (-O1 is okay).

As I checked within gdb, when “cc1” was consuming around 95% of the memory, it 
was at:

(gdb) where 
#0  0x00ddbcb3 in df_chain_create (src=0x631006480f08, 
dst=0x63100f306288) at ../../gcc-8.2.1-20180905/gcc/df-problems.c:2267 
#1  0x001a in df_chain_create_bb_process_use ( 
local_rd=0x7ffc109bfaf0, use=0x63100f306288, top_flag=0) 
at ../../gcc-8.2.1-20180905/gcc/df-problems.c:2441 
#2  0x00dde5a7 in df_chain_create_bb (bb_index=16413) 
at ../../gcc-8.2.1-20180905/gcc/df-problems.c:2490 
#3  0x00ddeaa9 in df_chain_finalize (all_blocks=0x63100097ac28) 
at ../../gcc-8.2.1-20180905/gcc/df-problems.c:2519 
#4  0x00dbe95e in df_analyze_problem (dflow=0x60600027f740, 
blocks_to_consider=0x63100097ac28, postorder=0x7f23761f1800, 
n_blocks=40768) at ../../gcc-8.2.1-20180905/gcc/df-core.c:1179 
#5  0x00dbedac in df_analyze_1 () 
….

The routine being compiled is very big, about 119258 lines of code. 
I suspect that GCC’s data flow analysis might not handle very large routines 
well and consumes too much memory, therefore running out of memory for very 
big routines. 

Currently, I found one GCC source-level pragma, 

#pragma GCC optimize ("O1") 

and added it before the large routine (plus a #pragma GCC reset_options after 
the routine); this works around the out-of-memory issue for now.
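
For reference, a minimal sketch of this workaround (the routine name here is 
hypothetical):

#pragma GCC optimize ("O1")

/* The very large routine is compiled at -O1 because of the pragma above.  */
void huge_routine (void)
{
  /* ... about 119258 lines of code ...  */
}

#pragma GCC reset_options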

However, manually locating large routines is time consuming.  I am wondering 
whether GCC can automatically detect large routines and lower the optimization 
level for those routines.  Or are there internal parameters inside GCC’s data 
flow analysis that compute the complexity of a routine and, if it is very big, 
turn off the aggressive analysis automatically?  Or is there any option 
provided to the end user to control the aggressive data flow analysis manually?


Thanks a lot for any help.

Qing

Re: Does gcc automatically lower optimization level for very large routines?

2019-12-19 Thread Qing Zhao
Hi, Dmitry,

Thanks for the response. 

Yes, routine size alone cannot determine the complexity of the routine. 
Different compiler analyses might have different formulas with multiple 
parameters to compute complexity. 

However, the common issue is: when the complexity of a specific routine for a 
specific compiler analysis exceeds a threshold, the compiler might consume all 
the available memory and abort the compilation. 

Therefore, in order to avoid failed compilations due to out of memory, some 
compilers set a threshold for the complexity of a specific compiler analysis 
(for example, the more aggressive data flow analysis); when the threshold is 
exceeded, the aggressive analysis is turned off for that routine, or the 
optimization level is lowered for that routine (with a warning issued at 
compile time about the adjustment).  

I am wondering whether GCC has such a capability, or any option to increase or 
decrease the threshold for some of the common analyses (for example, data 
flow)?

Thanks.

Qing

> On Dec 19, 2019, at 4:50 PM, Dmitry Mikushin  wrote:
> 
> This issue is well-known in research/scientific software. The problem of
> compiler hang or RAM overconsumption is actually not about the routine
> size, but about too complicated control flow. When optimizing, the compiler
> traverses the control flow graph, which may have the misfortune to explode
> in terms of complexity. So you may want to check whether your routine
> heavily deploys nested cascades of "if ... else" or goto-s. That is, the
> routine size is not a good metric to catch this behavior. GCC may rather
> attempt "reversible" strategy of optimizations to stop and undo those that
> get beyond a certain threshold.
> 
> Kind regards,
> - Dmitry.
> 
> 
> Thu, Dec 19, 2019 at 17:38, Qing Zhao :
> 
>> Hi,
>> 
>> When using GCC to compile a very large routine with -O2, it failed with
>> out of memory during run time.  (O1 is Okay)
>> 
>> As I checked within gdb,  when “cc1” was consuming around 95% of the
>> memory,  it’s at :
>> 
>> (gdb) where
>> #0  0x00ddbcb3 in df_chain_create (src=0x631006480f08,
>>dst=0x63100f306288) at ../../gcc-8.2.1-20180905/gcc/df-problems.c:2267
>> #1  0x001a in df_chain_create_bb_process_use (
>>local_rd=0x7ffc109bfaf0, use=0x63100f306288, top_flag=0)
>>at ../../gcc-8.2.1-20180905/gcc/df-problems.c:2441
>> #2  0x00dde5a7 in df_chain_create_bb (bb_index=16413)
>>at ../../gcc-8.2.1-20180905/gcc/df-problems.c:2490
>> #3  0x00ddeaa9 in df_chain_finalize (all_blocks=0x63100097ac28)
>>at ../../gcc-8.2.1-20180905/gcc/df-problems.c:2519
>> #4  0x00dbe95e in df_analyze_problem (dflow=0x60600027f740,
>>blocks_to_consider=0x63100097ac28, postorder=0x7f23761f1800,
>>n_blocks=40768) at ../../gcc-8.2.1-20180905/gcc/df-core.c:1179
>> #5  0x00dbedac in df_analyze_1 ()
>> ….
>> 
>> The routine that was compiled is very big, has about 119258 lines of code.
>> I suspected that GCC’s data flow analysis might not handle very large
>> routine very well, consume too much memory, therefore out of memory for
>> very big routines.
>> 
>> Currently, I found one GCC’s source level pragma,
>> 
>> #pragma GCC optimize ("O1”)
>> 
>> And added it before the large routine (also added another one #pragma GCC
>> reset_options after the routine), this workaround the out of memory issue
>> for now.
>> 
>> However, manually locating large routines is time consuming, I am
>> wondering whether GCC can automatically detect large routines and lower the
>> optimization for those
>> Routines automatically? Or is there any internal parameters inside GCC’s
>> data flow analysis that compute the complexity of the routine, if it’s very
>> big, then will turn off
>> The aggressive analysis automatically?  Or any option provided to end user
>> to control the aggressive data flow manually ?
>> 
>> 
>> Thanks a lot for any help.
>> 
>> Qing



Re: Does gcc automatically lower optimization level for very large routines?

2019-12-20 Thread Qing Zhao
Thanks a lot for all this help.

So, currently, if GCC compilation aborts for this reason, what’s the best way 
for the user to resolve it? 
I added #pragma GCC optimize ("O1") to the large routine in order to work 
around this issue.  
Is there a better way to do it?
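
For reference, the same per-routine effect can also be written with GCC's 
optimize function attribute instead of the pragma pair (a sketch; the routine 
name is hypothetical):

__attribute__ ((optimize ("O1")))
void huge_routine (void)
{
  /* ...  */
}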

Is GCC planning to handle such issues better in the future?

thanks.

Qing

> On Dec 20, 2019, at 5:13 AM, Richard Biener  
> wrote:
> 
> On December 20, 2019 1:41:19 AM GMT+01:00, Jeff Law <l...@redhat.com> wrote:
>> On Thu, 2019-12-19 at 17:06 -0600, Qing Zhao wrote:
>>> Hi, Dmitry,
>>> 
>>> Thanks for the responds. 
>>> 
>>> Yes, routine size only cannot determine the complexity of the
>> routine. Different compiler analysis might have different formula with
>> multiple parameters to compute its complexity. 
>>> 
>>> However, the common issue is: when the complexity of a specific
>> routine for a specific compiler analysis exceeds a threshold, the
>> compiler might consume all the available memory and abort the
>> compilation. 
>>> 
>>> Therefore,  in order to avoid the failed compilation due to out of
>> memory, some compilers might set a threshold for the complexity of a
>> specific compiler analysis (for example, the more aggressive data flow
>> analysis), when the threshold is met, the specific aggressive analysis
>> will be turned off for this specific routine. Or the optimization level
>> will be lowered for the specific routine (and given a warning during
>> compilation time for such adjustment).  
>>> 
>>> I am wondering whether GCC has such capability? Or any option
>> provided to increase or decrease the threshold for some of the common
>> analysis (for example, data flow)?
>>> 
>> There are various places where if we hit a limit, then we throttle
>> optimization.  But it's not done consistently or pervasively.
>> 
>> Those limits are typically around things like CFG complexity.
> 
> Note we also have (not consistently used) -Wmissed-optimizations which is 
> supposed to warn when we run into this kind of limiting telling the user 
> which knob he might be able to tune. 
> 
> Richard. 
> 
>> We do _not_ try to recover after an out of memory error, or anything
>> like that.
>> 
>> jeff



Please review writeup for fixing PR 78809 (inline strcmp for small constant strings)

2017-11-03 Thread Qing Zhao
Hi, 

This is the first time I am asking for a design review for fixing a GCC 
enhancement request; let me know if I need to send this email to other mailing 
lists as well.

I have been studying PR 78809 for some time:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809

With a lot of help from Wilco and other people, and after a detailed study of 
the previous discussion and current GCC behavior, I was able to come up with 
the following writeup (which basically serves as a design doc) and am ready 
for implementation. 

Please take a look at it, and let me know any comments and suggestions:

thanks a lot.

Qing


str(n)cmp and memcmp optimization in gcc
-- A design document for PR78809

11/01/2017

Qing Zhao
===

0. Summary:

   For PR 78809 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809),
   the following str(n)cmp and memcmp optimizations will be added to GCC 8
   (a short source-level illustration follows the list below):

   A. for strncmp (s1, s2, n) 
  if one of "s1" or "s2" is a constant string, and "n" is a constant larger 
than the length of the constant string:
  change strncmp (s1, s2, n) to strcmp (s1, s2);

   B. for strncmp (s1, s2, n) (!)= 0 or strcmp (s1, s2) (!)= 0
  if the result is ONLY used to do a simple equality test against zero, one 
of "s1" or "s2" is a small constant string, n is a constant, and reading the 
non-constant string is guaranteed to not go beyond the end of its object:
  change strncmp (s1, s2, n) or strcmp (s1, s2) to the corresponding memcmp 
(s1, s2, n); 

  (NOTE: currently, memcmp (s1, s2, N) (!)= 0 is already optimized in GCC to a 
simple sequence that accesses all bytes and accumulates the overall result, by 
   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52171
  ) 
  as a result, such str(n)cmp calls will then be replaced by simple sequences 
that access all bytes and accumulate the overall result. 

   C. for strcmp (s1, s2), strncmp (s1, s2, n), and memcmp (s1, s2, n)
  if the result is NOT used to do a simple equality test against zero, one of 
"s1" or "s2" is a small constant string, n is a constant, and the minimum of 
the length of the constant string and "n" is smaller than a predefined 
threshold T: 
  inline the call as a byte-to-byte comparison sequence to avoid the calling 
overhead. 
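
   To illustrate A and B at the source level (a sketch only; the variable and 
function names below are made up for illustration):

      #include <string.h>

      char buf[100];

      /* A: n (5) is larger than the length of "abc" (3), so strncmp can
         never compare past the NUL of the constant string, and  */
      int f1 (const char *s) { return strncmp (s, "abc", 5); }
      /* can be changed to  */
      int f1_a (const char *s) { return strcmp (s, "abc"); }

      /* B: the result is only an equality test against zero, and buf is
         known to have at least strlen ("abc") + 1 == 4 accessible bytes,
         so  */
      int f2 (void) { return strcmp (buf, "abc") != 0; }
      /* can be changed to  */
      int f2_b (void) { return memcmp (buf, "abc", 4) != 0; }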

   A should go in the gimple fold phase (in routine 
"gimple_fold_builtin_string_compare" of gimple-fold.c).
   B should go in the strlen phase (add new "handle_builtin_str(n)cmp" 
routines in tree-ssa-strlen.c).
   C should go in the expand phase (tree-ssa-strlen.c or builtins.c): 
run-time performance testing is needed to decide the predefined threshold T. 

   The reasons to put the optimizations in the above order are:

   * A needs only very simple source-level information, and A should run BEFORE 
B to simplify the algorithm in B; the gimple-fold phase is a good place for it. 
(Currently, similar transformations, such as replacing strncat with strcat, are 
done in this phase.)
   * B needs type information (size and alignment) for the safety checking, and 
B should run BEFORE the expand phase to make use of the memcmp optimization 
available there; the strlen phase is a good place for it. 
   * C replaces the call with a byte-to-byte comparison; since we want the 
high-level optimizations to be applied to the calls first, the inlining of the 
call is better done in a late stage of the tree optimization pipeline.  The 
expand phase is a good place for it. 
 
   These 3 optimizations can be implemented separately; 3 patches might be 
needed. 
 
The following are details to explain the above.

1. some background:

   #include <string.h>
   int strcmp(const char *s1, const char *s2);
   int strncmp(const char *s1, const char *s2, size_t n);
   int memcmp(const void *s1, const void *s2, size_t n);

   • strcmp compares null-terminated C strings
   • strncmp compares at most N characters of null-terminated C strings
   • memcmp compares binary byte buffers of N bytes.

The major common part among these three is:

   * they all return an integer less than, equal to, or greater than zero if s1 
is found, respectively, to be less than, to match, or be greater than s2.

The major difference among these three is:

   * both strcmp and strncmp may stop early at the NUL terminator of the 
compared strings, but memcmp will NOT stop early; it compares exactly N bytes 
of both buffers.
   * strcmp compares the whole string, but strncmp only compares the first n 
chars (or fewer, if the string ends sooner) of the string.  
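
A small, self-contained example of this difference (illustrative only):

   #include <stdio.h>
   #include <string.h>

   int main (void)
   {
     char a[4] = { 'a', 'b', '\0', 'x' };
     char b[4] = { 'a', 'b', '\0', 'y' };
     /* strncmp stops at the NUL, so the trailing bytes do not matter.  */
     printf ("%d\n", strncmp (a, b, 4) == 0);   /* prints 1 */
     /* memcmp compares all 4 bytes, so byte 3 makes them unequal.  */
     printf ("%d\n", memcmp (a, b, 4) == 0);    /* prints 0 */
     return 0;
   }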

So, when optimizing memcmp and str(n)cmp, we need to consider the following:

   * The compiler can compare multiple bytes at the same time and doesn't have 
to worry about beyond the end of a string for memcmp, but hav

Re: Please review writeup for fixing PR 78809 (inline strcmp for small constant strings)

2017-11-16 Thread Qing Zhao
Hi, Jeff, 
Thanks a lot for your comments.  Please see my replies below:


> On Nov 16, 2017, at 12:47 PM, Jeff Law  wrote:
> 
>> 
>>   B. for strncmp (s1, s2, n) (!)= 0 or strcmp (s1, s2) (!)= 0
>>  if the result is ONLY used to do a simple equality test against zero, 
>> one of "s1" or "s2" is a small constant string, n is a constant, and the 
>> other non-constant string is guaranteed to not read beyond the end of the 
>> string:
>>  change strncmp (s1, s2, n) or strcmp (s1, s2) to corresponding memcmp 
>> (s1, s2, n); 
> So how do you determine the non-constant string is long enough to avoid
> reading past its end?  I guess you might get that from range
> information.  But when it applies it seems reasonable.  Again, this
> could be considered a canonicalization step.

In my current local implementation, I used the following routine to get the 
range info (and used MINMAXLEN[1]+1 as the length of the non-constant 
string):

/* Determine the minimum and maximum value or string length that ARG
   refers to and store each in the first two elements of MINMAXLEN.
   For expressions that point to strings of unknown lengths that are
   character arrays, use the upper bound of the array as the maximum
   length.  For example, given an expression like 'x ? array : "xyz"'
   and array declared as 'char array[8]', MINMAXLEN[0] will be set
   to 3 and MINMAXLEN[1] to 7, the longest string that could be
   stored in array.
   Return true if the range of the string lengths has been obtained
   from the upper bound of an array at the end of a struct.  Such
   an array may hold a string that's longer than its upper bound
   due to it being used as a poor-man's flexible array member.  */

bool
get_range_strlen (tree arg, tree minmaxlen[2])
{
}

However, this routine currently misses a very obvious case like the following:

char s[100] = {'a','b','c','d'};

__builtin_strcmp (s, "abc") != 0

So, I have to change this routine to handle such a common case.  

Do you think using this routine is a good approach, or do you have other 
suggestions (since I am still not very familiar with the internals of GCC, I 
might not have found the best available one yet…)?

> 
> 
>> 
>>   C. for strcmp (s1, s2), strncmp (s1, s2, n), and memcmp (s1, s2, n)
>>  if the result is NOT used to do simple equality test against zero, one 
>> of "s1" or "s2" is a small constant string, n is a constant, and the Min 
>> value of the length of the constant string and "n" is smaller than a 
>> predefined threshold T, 
>>  inline the call by a byte-to-byte comparision sequence to avoid calling 
>> overhead. 
> Also seems reasonable.
>> 
>>   A would like to be put in gimple fold phase (in routine 
>> "gimple_fold_builtin_string_compare" of gimple-fold.c

> OK.  Note that various optimizations can expose N or one of the strings
> to be a constant.  So having it as part of the folders makes a lot of
> sense.

I have finished this part of the change and already sent the patch to the 
gcc-patches list: 
https://patchwork.ozlabs.org/patch/838200/ 


Do you think it’s necessary to add the same functionality at other places, such 
as tree-ssa-strlen.c, in order to catch more cases?

> 
>>   B would like to be put in strlen phase (add new "handle_builtin_str(n)cmp" 
>> routines in tree-ssa-strlen.c)
> Which is where you're most likely to have some kind of range information
> which ISTM you need to prove the non-constant string is large enough
> that you're not reading past its end.
Yes, that’s the main reason.  Another reason is that the memcmp != 0 
optimization is implemented at -O2; if we want to use this available work, 
implementing B in the strlen phase is better.

> 
>>   C would like to be put in expand phase (tree-ssa-strlen.c or builtins.c): 
>> run-time performance testing is needed to decide the predefined threshold T. 
> If you wanted to put it into expansion, I wouldn't object -- we could
> always do experiments to see if there's any value in moving it early
> (expansion to byte comparisons could possible expose other optimizations).
Earlier to where?  Do you have any suggestions?  I can definitely do some 
experiments. 
> 
> In general I like what you're suggesting.  And on a higher level I like
> that we're looking to rationalize where these kinds of things happen
> (compiler vs library).  It's something I've wanted to see happen for a
> long time.
Parts of A and C were previously implemented in glibc; Wilco removed them from 
glibc at
  https://sourceware.org/git/?p=glibc.git;a=commit;h=f7db120f67d853e0cfa2 


as I wrote in the writeup:

The reasons to delete it from glibc and move to GCC are:
 ** when doing it in glibc, the user cannot disable it by using 
-fno-builtin-strcmp/strncmp;
 ** __builtin_strncmp cannot be replaced similarly;
 ** when compiling C++, or when not using glibc, the transformation is not applied.

Thanks a lot.

Qing
> 
> 
> Je

Re: Please review writeup for fixing PR 78809 (inline strcmp for small constant strings)

2017-11-17 Thread Qing Zhao

> On Nov 16, 2017, at 5:55 PM, Martin Sebor  wrote:
>> 
>>   A. for strncmp (s1, s2, n)
>>  if one of "s1" or "s2" is a constant string, "n" is a constant, and 
>> larger than the length of the constant string:
>>  change strncmp (s1, s2, n) to strcmp (s1, s2);
> 
> Here and I think in some (all?) the other cases below, N doesn't
> strictly have to be constant but can be in a range whose lower
> bound is greater than the string length.
Yes, I agree.  If the value range info is available, we can relax N to be an 
expression 
whose value can be determined to be larger than the length of the constant 
string.

Do you know when the value range info is available in GCC, and the interface to 
use it?
> 
> FWIW, I also recently noticed bug 82950 in a related area.

I also noticed the opportunities mentioned in PR 82950 during my 
implementation of B in tree-ssa-strlen.c.  The major thing missing there is 
that the “content” of the string is NOT recorded in the strinfo, only the 
“length” of the string, so the information is not completely available now to 
support this opportunity. 

However, for PR 83026, the information available in tree-ssa-strlen.c should 
be enough; I can add support for this by using the length info of the string 
recorded in the strinfo. 


> 
>> 
>>   B. for strncmp (s1, s2, n) (!)= 0 or strcmp (s1, s2) (!)= 0
>>  if the result is ONLY used to do a simple equality test against zero, 
>> one of "s1" or "s2" is a small constant string, n is a constant, and the 
>> other non-constant string is guaranteed to not read beyond the end of the 
>> string:
>>  change strncmp (s1, s2, n) or strcmp (s1, s2) to corresponding memcmp 
>> (s1, s2, n);
>> 
> 
> It's probably not important but I'm not sure I understand what you
> mean by
> 
>  "the other non-constant string is guaranteed to not read beyond
>  the end of the string.”

We need to be sure that we can read valid memory from the non-constant string 
past its null character when changing the call to the corresponding memcmp. 
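
For illustration (a hypothetical 2-byte buffer, not from the test case):

   /* strcmp stops at the NUL of t, so it is well defined; the transformed
      memcmp would read 4 bytes from a 2-byte object, i.e. past its end.  */
   char t[2] = "a";

   int f (void)
   {
     int ok  = __builtin_strcmp (t, "abc") != 0;    /* well defined  */
     int bad = __builtin_memcmp (t, "abc", 4) != 0; /* out-of-bounds read  */
     return ok + bad;
   }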

> 
>>  (NOTE, currently, memcmp(s1, s2, N) (!)=0 has been optimized to a 
>> simple sequence to access all bytes and accumulate the overall result in GCC 
>> by
>>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52171
>>  )
>>  as a result, such str(n)cmp call would like to be replaced by simple 
>> sequences to access all types and accumulate the overall results.
> 
> Thinking about this case made me realize there's another opportunity
> here that could be exploited in tree-ssa-strlen even for non-constant
> strings by relying on its knowledge of string lengths.   I opened bug
> 83026 with the details.

Yes, I agree, and I will add this to my implementation of part B.  It should be 
very straightforward to implement. 

> 
>> 
>>   C. for strcmp (s1, s2), strncmp (s1, s2, n), and memcmp (s1, s2, n)
>>  if the result is NOT used to do simple equality test against zero, one 
>> of "s1" or "s2" is a small constant string, n is a constant, and the Min 
>> value of the length of the constant string and "n" is smaller than a 
>> predefined threshold T,
>>  inline the call by a byte-to-byte comparision sequence to avoid calling 
>> overhead.
> 
> The str{,n}cmp and memcmp cases must be treated differently because
> the former compares up to N bytes while the latter exactly N bytes.
> With that, I'm wondering if the precondition "one of "s1" or "s2"
> is a small constant string" is necessary for memcmp.  I.e., why
> not inline it regardless of whether s1 or s2 are constant?

I agree.  For memcmp we only need to check “n”: if it is small enough, the call 
should be inlined.  

In addition, for strncmp, even if both s1 and s2 are NOT constant, if “n” is 
small enough we can inline it too, right? 
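
For illustration only, a sketch (not the actual GCC expansion; the function 
name is made up) of what inlining memcmp (p, q, 2) could look like when the 
full ordering result is needed:

   /* Byte-by-byte comparison as unsigned char, mirroring memcmp's result:
      negative, zero, or positive according to the first differing byte.  */
   static int
   inlined_memcmp2 (const char *p, const char *q)
   {
     int r = (unsigned char) p[0] - (unsigned char) q[0];
     if (r == 0)
       r = (unsigned char) p[1] - (unsigned char) q[1];
     return r;
   }
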
> 
>> 
>>   A would like to be put in gimple fold phase (in routine 
>> "gimple_fold_builtin_string_compare" of gimple-fold.c)
> 
> Except for constant strings gimple-fold doesn't know about string
> lengths so it will only be able to do so much.  I'm wondering what
> the benefits are of doing the transformation there instead of just
> in tree-ssa-strlen (along with B).

As Jeff mentioned in another email, 

“Most passes call into the folder when they make noteable changes to a
statement.  So if a later pass exposes one argument as a string constant
your code for "A" should get a chance to fold the call.”

We can see whether this is enough or not.  If not, adding A to B as well is a 
very easy thing to do.

thanks a lot for your comments and suggestions.

Qing




Re: Please review writeup for fixing PR 78809 (inline strcmp for small constant strings)

2017-11-17 Thread Qing Zhao

> On Nov 16, 2017, at 6:24 PM, Martin Sebor  wrote:
>> 
>> In my current local implementation, I used the following routine to get the 
>> range info:  (and use the MINMAXLEN[1]+1 for the length of the non-constant 
>> string)
>> 
>> /* Determine the minimum and maximum value or string length that ARG
>>   refers to and store each in the first two elements of MINMAXLEN.
>>   For expressions that point to strings of unknown lengths that are
>>   character arrays, use the upper bound of the array as the maximum
>>   length.  For example, given an expression like 'x ? array : "xyz"'
>>   and array declared as 'char array[8]', MINMAXLEN[0] will be set
>>   to 3 and MINMAXLEN[1] to 7, the longest string that could be
>>   stored in array.
>>   Return true if the range of the string lengths has been obtained
>>   from the upper bound of an array at the end of a struct.  Such
>>   an array may hold a string that's longer than its upper bound
>>   due to it being used as a poor-man's flexible array member.  */
>> 
>> bool
>> get_range_strlen (tree arg, tree minmaxlen[2])
>> {
>> }
>> 
>> However, this routine currently miss a very obvious case as the following:
>> 
>> char s[100] = {'a','b','c','d’};
>> 
>> __builtin_strcmp(s, "abc") != 0
>> 
>> So, I have to change this routine to include such common case.
> 
> There was a discussion some time ago about converting CONSTRUCTOR
> trees emitted for array initializers like the above to STRING_CST
> (see bug 71625 for some background).  I think that would still be
> the ideal solution.  Then you wouldn't have to change
> get_range_strlen.

Thanks for the info, Martin.

In my case, the problem is that the size of 100 is not collected into 
MINMAXLEN[1] for the string “s”. 

I need to make sure that the size of the variable string s is larger than the 
size of the constant string “abc” to guarantee the safety of the transformation.

Currently, “get_range_strlen” cannot handle a simple VAR_DECL with array type 
to determine the maximum size of the string. 

Qing
> 
> Martin



Re: Please review writeup for fixing PR 78809 (inline strcmp for small constant strings)

2017-11-17 Thread Qing Zhao
Hi, Jeff,

> On Nov 16, 2017, at 7:14 PM, Jeff Law  wrote:
>> 
>> In my current local implementation, I used the following routine to get
>> the range info:  (and use the MINMAXLEN[1]+1 for the length of the
>> non-constant string)
>> 
>> /* Determine the minimum and maximum value or string length that ARG
>>refers to and store each in the first two elements of MINMAXLEN.
>>For expressions that point to strings of unknown lengths that are
>>character arrays, use the upper bound of the array as the maximum
>>length.  For example, given an expression like 'x ? array : "xyz"'
>>and array declared as 'char array[8]', MINMAXLEN[0] will be set
>>to 3 and MINMAXLEN[1] to 7, the longest string that could be
>>stored in array.
>>Return true if the range of the string lengths has been obtained
>>from the upper bound of an array at the end of a struct.  Such
>>an array may hold a string that's longer than its upper bound
>>due to it being used as a poor-man's flexible array member.  */
>> 
>> bool
>> get_range_strlen (tree arg, tree minmaxlen[2])
>> {
>> }
>> 
>> However, this routine currently miss a very obvious case as the following:
>> 
>> char s[100] = {'a','b','c','d’};
>> 
>> __builtin_strcmp(s, "abc") != 0
>> 
>> So, I have to change this routine to include such common case.  
>> 
>> do you think using this routine is good? or do you have other
>> suggestions (since I am still not very familiar with the internals of
>> GCC, might not find the best available one now…)
> The range information attached to an SSA_NAME is global data.  ie, it
> must hold at all locations where the object in question might be
> referenced.  This implies that it will sometimes (often?) be less
> precise than you might like.

Do you mean the “value_range” attached to the SSA_NAME?

For my purpose, I’d like to determine that the maximum length of the char 
array s[100] is 100, which is larger than the size of the constant string 
“abc”, so that I can safely apply the transformation to memcmp. 

Can the “value_range” info serve this purpose?

> 
> I am currently working towards an embeddable context sensitive range
> analyzer that in theory could be used within tree-ssa-strlen pass to
> give more precise range information.  I'm hoping to wrap that work up in
> the next day or so so that folks can use it in gcc-8.

Such context-sensitive range info should be useful when we relax the constant 
“N” to an expression whose minimum value is larger than the length of the 
constant string; with it, we can catch more opportunities. 
Let me know when this info is available.

thanks.

Qing

> 
> 
> 
> 
>> 
>>> 
>>> 
 
   C. for strcmp (s1, s2), strncmp (s1, s2, n), and memcmp (s1, s2, n)
  if the result is NOT used to do simple equality test against
 zero, one of "s1" or "s2" is a small constant string, n is a
 constant, and the Min value of the length of the constant string and
 "n" is smaller than a predefined threshold T, 
  inline the call by a byte-to-byte comparision sequence to avoid
 calling overhead. 
>>> Also seems reasonable.
 
   A would like to be put in gimple fold phase (in routine
 "gimple_fold_builtin_string_compare" of gimple-fold.c
>> 
 OK.  Note that various optimizations can expose N or one of the strings
>>> to be a constant.  So having it as part of the folders makes a lot of
>>> sense .
>> 
>> I have finished this part of change and sent the patch to gcc-patch
>> alias already. 
>> https://patchwork.ozlabs.org/patch/838200/
>> 
>> Do you think it’s necessary to add the same functionality at other
>> places, such as tree-ssa-strlen.c, in order to catch more cases?
> For "A"?  No, I think having it in the gimple folder is fine.  Most
> passes call into the folder when they make noteable changes to a
> statement.  So if a later pass exposes one argument as a string constant
> your code for "A" should get a chance to fold the call.
> 
>>> 
   B would like to be put in strlen phase (add new
 "handle_builtin_str(n)cmp" routines in tree-ssa-strlen.c)
>>> Which is where you're most likely to have some kind of range information
>>> which ISTM you need to prove the non-constant string is large enough
>>> that you're not reading past its end.
>> Yes,that’s the main reason. another reason is: memcmp != 0 optimization
>> is implemented at -O2,  if we want to use this available work,
>> implement B at strlen phase is better.
> Noted.
> 
>> 
>>> 
   C would like to be put in expand phase (tree-ssa-strlen.c or
 builtins.c): run-time performance testing is needed to decide the
 predefined threshold T. 
>>> If you wanted to put it into expansion, I wouldn't object -- we could
>>> always do experiments to see if there's any value in moving it early
>>> (expansion to byte comparisons could possible expose other optimizations).
>> earlier to where? do you have any suggestion?  I can definitely do some
>> experiments. 
> The biggest question in my mind is what se

Re: Please review writeup for fixing PR 78809 (inline strcmp for small constant strings)

2017-11-17 Thread Qing Zhao

 
   A would like to be put in gimple fold phase (in routine
 "gimple_fold_builtin_string_compare" of gimple-fold.c
>> 
 OK.  Note that various optimizations can expose N or one of the strings
>>> to be a constant.  So having it as part of the folders makes a lot of
>>> sense .
>> 
>> I have finished this part of change and sent the patch to gcc-patch
>> alias already. 
>> https://patchwork.ozlabs.org/patch/838200/
>> 
>> Do you think it’s necessary to add the same functionality at other
>> places, such as tree-ssa-strlen.c, in order to catch more cases?
> For "A"?  No, I think having it in the gimple folder is fine.  Most
> passes call into the folder when they make noteable changes to a
> statement.  So if a later pass exposes one argument as a string constant
> your code for "A" should get a chance to fold the call.

Okay. 
> 
>>> 
   B would like to be put in strlen phase (add new
 "handle_builtin_str(n)cmp" routines in tree-ssa-strlen.c)
>>> Which is where you're most likely to have some kind of range information
>>> which ISTM you need to prove the non-constant string is large enough
>>> that you're not reading past its end.
>> Yes,that’s the main reason. another reason is: memcmp != 0 optimization
>> is implemented at -O2,  if we want to use this available work,
>> implement B at strlen phase is better.
> Noted.
> 
>> 
>>> 
   C would like to be put in expand phase (tree-ssa-strlen.c or
 builtins.c): run-time performance testing is needed to decide the
 predefined threshold T. 
>>> If you wanted to put it into expansion, I wouldn't object -- we could
>>> always do experiments to see if there's any value in moving it early
>>> (expansion to byte comparisons could possible expose other optimizations).
>> earlier to where? do you have any suggestion?  I can definitely do some
>> experiments. 
> The biggest question in my mind is what secondary opportunities arise
> when we expose the byte comparisons.  So things like if-conversion
> if-combination, propagation of equality, particularly in the single byte
> case, etc.

Makes sense to me.

> 
> The difficulty is tracking when exposure leads to these secondary
> opportunities.  I often end up looking for this kind of stuff by first
> identifying many source files where the transformation applies.  Then I
> generate dumps & assembly code for the transformation in each candidate
> location.  I first analyze the assembly files for differences.  If an
> assembly file shows a difference, then I look more closely to try and
> characterize the difference and if it looks interesting, then I work
> backwards to the dump files for more details.

Usually, what kind of benchmarks or test cases do you recommend? 
I used to work a lot with SPEC; I'm not sure whether the same applies when 
working on GCC.

>>> 
>>> In general I like what you're suggesting.  And on a higher level I like
>>> that we're looking to rationalize where these kinds of things happen
>>> (compiler vs library).  It's something I've wanted to see happen for a
>>> long time.
>> Part of A and C has been implemented in glibc previously,  Wilco removed
>> them from Glibc  at
>>   https://sourceware.org/git/?p=glibc.git;a=commit;h=f7db120f67d853e0cfa2
> I know :-)  I loosely watch the glibc lists too and have expressed
> support for the glibc's team's effort to move transformations to the
> points where they make the most sense.

Yes, moving them to GCC makes good sense.

thanks.

Qing
> 
> jeff



Re: Please review writeup for fixing PR 78809 (inline strcmp for small constant strings)

2017-11-17 Thread Qing Zhao

> On Nov 17, 2017, at 1:50 AM, Jakub Jelinek  wrote:
> 
> On Thu, Nov 16, 2017 at 06:14:35PM -0700, Jeff Law wrote:
>>> However, this routine currently miss a very obvious case as the following:
>>> 
>>> char s[100] = {'a','b','c','d’};
>>> 
>>> __builtin_strcmp(s, "abc") != 0
>>> 
>>> So, I have to change this routine to include such common case.  
>>> 
>>> do you think using this routine is good? or do you have other
>>> suggestions (since I am still not very familiar with the internals of
>>> GCC, might not find the best available one now…)
>> The range information attached to an SSA_NAME is global data.  ie, it
>> must hold at all locations where the object in question might be
>> referenced.  This implies that it will sometimes (often?) be less
>> precise than you might like.
>> 
>> I am currently working towards an embeddable context sensitive range
>> analyzer that in theory could be used within tree-ssa-strlen pass to
>> give more precise range information.  I'm hoping to wrap that work up in
>> the next day or so so that folks can use it in gcc-8.
> 
> Well, one thing is a range of integral SSA_NAME at a certain point, the
> other is the length of the C string pointed by a certain pointer (that is
> something the strlen pass tracks), and another case is the content of that
> string, which you'd presumably need to optimize the strcmp at compile time.
> I think current strlen pass ought to find out that strlen (s) is 4 at that
> point, if it doesn't, please file a PR with a testcase.

For the safety checking purpose, when we try to convert

__builtin_strcmp (s, "abc") != 0

to 

__builtin_memcmp (s, "abc", 4) != 0

we have to make sure that the size of the variable “s” is larger than “4”. 

If “s” is declared as

char s[100];

currently “get_range_strlen” cannot determine that its maximum length is 100 
(it just returns UNKNOWN).

So I have to update “get_range_strlen” for such a simple case. 

This does provide the information I want.  However, the routine 
“get_range_strlen” is also used in other places; for example, in 
gimple-ssa-sprintf.c, the implementation of the sprintf overflow warning uses 
“get_range_strlen” to decide the string’s maximum size and buffer size. 

My change in “get_range_strlen” triggered some new warnings for 
-Werror=format-overflow (from gimple-ssa-sprintf.c, mentioned above), as 
follows:

qinzhao@gcc116:~/Bugs/warning$ cat t.c
#include <stdio.h>

void foo(const char *macro)
{
  char buf1[256], buf2[256];
  sprintf (buf1, "%s=%s", macro, buf2);
  return;
}

with my private GCC:

qinzhao@gcc116:~/Bugs/warning$ /home/qinzhao/Install/latest/bin/gcc t.c 
-Werror=format-overflow -S
t.c: In function ‘foo’:
t.c:6:18: error: ‘sprintf’ may write a terminating nul past the end of the 
destination [-Werror=format-overflow=]
   sprintf (buf1, "%s=%s", macro, buf2);
  ^~~
t.c:6:3: note: ‘sprintf’ output 2 or more bytes (assuming 257) into a 
destination of size 256
   sprintf (buf1, "%s=%s", macro, buf2);
   ^~~~
cc1: some warnings being treated as errors

At this time, I am really not sure whether it’s good to expose such new 
warnings with my change.  Even though, after some study, I think these new 
warnings are correct, so maybe we should keep them? 

let me know your comments and suggestions.

thanks a lot.

Qing

>   Jakub



Re: Please review writeup for fixing PR 78809 (inline strcmp for small constant strings)

2017-11-20 Thread Qing Zhao

> On Nov 17, 2017, at 5:54 PM, Martin Sebor  wrote:
> 
>> 
>> for the safety checking purpose, when we try to convert
>> 
>> __builtin_strcmp(s, "abc") != 0
>> 
>> to
>> 
>> __builtin_memcmp (s, “abc”, 4) != 0
>> 
>> we have to make sure that the size of variable “s” is larger than “4”.
> 
> Presumably you mean "is at least 4?”

Yes.:-)
> 
>> 
>> if  “s” is declared as
>> 
>> char s[100];
>> 
>> currently,  the “get_range_strlen” cannot determine its maximum length is 
>> 100. (it just return UNKNOWN).
>> 
>> so, I have to update “get_range_strlen” for such simple case.
>> 
>> this does provide the information I want.  However, since the routine 
>> “get_range_strlen” is also used in other places,
>> for example, in gimple-ssa-sprintf.c,  the implementation of the sprintf 
>> overflow warning uses the routine “get_range_strlen”
>> to decide the string’s maximum size and buffer size.
>> 
>> my change in “get_range_strlen” triggered some new warnings for  
>> -Werror=format-overflow (from gimple-ssa-sprintf.c
>> mentioned above) as following:
>> 
>> qinzhao@gcc116:~/Bugs/warning$ cat t.c
>> #include 
>> 
>> void foo(const char *macro)
>> {
>>  char buf1[256], buf2[256];
>>  sprintf (buf1, "%s=%s", macro, buf2);
>>  return;
>> }
>> 
>> with my private GCC:
>> 
>> qinzhao@gcc116:~/Bugs/warning$ /home/qinzhao/Install/latest/bin/gcc t.c 
>> -Werror=format-overflow -S
>> t.c: In function ‘foo’:
>> t.c:6:18: error: ‘sprintf’ may write a terminating nul past the end of the 
>> destination [-Werror=format-overflow=]
>>   sprintf (buf1, "%s=%s", macro, buf2);
>>  ^~~
>> t.c:6:3: note: ‘sprintf’ output 2 or more bytes (assuming 257) into a 
>> destination of size 256
>>   sprintf (buf1, "%s=%s", macro, buf2);
>>   ^~~~
>> cc1: some warnings being treated as errors
> 
> When the length of one or more of the strings referenced by
> the argument passed to get_range_strlen() is unknown
> the -Wformat-overflow checker uses get_range_strlen() to compute
> the length of the longest string that fits in an array reference
> by the subexpression (i.e., sizeof array - 1) and uses it to
> issue warnings.  This works with member arrays but because of
> a bug/limitation it doesn't work for non-member arrays.  Bug
> 79538 tracks this.  So the warning above suggests your change
> has fixed the problem -- good work! :)

Thanks a lot for the info and the bug id.

I just checked the two test cases in PR 79538 with my private GCC; both 
warnings are reported. 
I am assigning this bug to myself too.

Qing
> 
> Martin



Re: Please review writeup for fixing PR 78809 (inline strcmp for small constant strings)

2017-11-20 Thread Qing Zhao

> On Nov 17, 2017, at 7:32 PM, Jeff Law  wrote:
> 
> On 11/17/2017 03:45 PM, Qing Zhao wrote:
>>>> do you think using this routine is good? or do you have other
>>>> suggestions (since I am still not very familiar with the internals of
>>>> GCC, might not find the best available one now…)
>>> The range information attached to an SSA_NAME is global data.  ie, it
>>> must hold at all locations where the object in question might be
>>> referenced.  This implies that it will sometimes (often?) be less
>>> precise than you might like.
>> 
>> do you mean the “value_range” attached to SSA_NAME?
>> 
>> For my purpose, I’d like to get the maximum length of char array s[100] is 
>> 100, which is larger than the size of constant string “abc”, then
>> I can safely apply the transformation to memcmp. 
>> 
>> can “value_range” info serve this purpose?
> No it can't.  Sorry for leading you the wrong direction.  What you're
> looking for is the object size interfaces.
> 
> See tree-object-size.[ch]
> 
> That's a pass that tries to compute the sizes of various objects
> referenced by the IL.
> 
> Note that the object size is different than say the length of a string
> stored in an object for which you'll probably be looking at
> tree-ssa-strlen's interfaces.

thanks for the info. 

Yes, during my current implementation for B, I tried the “tree-object-size” 
interface, i.e., “compute_builtin_object_size”, to decide the maximum length 
of the arrays.  I noticed that it did not provide the information for the 
simple cases, so I switched to “get_range_strlen”.

I will double check on this. 
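
As a concrete illustration of that difference (my own example, not from the 
thread):

#include <stddef.h>

void g (void)
{
  char buf[100] = "abc";
  size_t objsz = __builtin_object_size (buf, 0);  /* 100: the storage size (tree-object-size) */
  size_t len   = __builtin_strlen (buf);          /* 3: the string length (tree-ssa-strlen) */
  (void) objsz; (void) len;
}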

> Ranges are more for integer objects.  ie, i has the value [0,25] or ~[0,0].

Okay, I see.
> 
>>> 
>>> I am currently working towards an embeddable context sensitive range
>>> analyzer that in theory could be used within tree-ssa-strlen pass to
>>> give more precise range information.  I'm hoping to wrap that work up in
>>> the next day or so so that folks can use it in gcc-8.
>> 
>> such context sensitive range info should be useful when we relax the 
>> constant “N” to be an expression whose Min value is larger than the length
>> of constant string, with it, we can catch more opportunities. 
>> let me know when this info is available.
> Hoping to have the basics into the trunk within the next few days as
> reviews flow in.

thanks.

Qing
> 
> jeff



Re: Please review writeup for fixing PR 78809 (inline strcmp for small constant strings)

2017-11-20 Thread Qing Zhao

> On Nov 17, 2017, at 7:39 PM, Jeff Law  wrote:
> 
>>> 
>> 
>> thanks for the info, Martin.
>> 
>> In my case, it’s the size of “100” cannot be collected in the
>> MINMAXLEN[1] for the string “s”. 
>> 
>> I need to make sure that the size of variable string s is larger than
>> the size of constant string “abc” to guarantee the safety of the
>> transformation.
>> 
>> currently, “get_range_strlen” cannot identify the simple VAR_DECL with
>> array_type to determine the maximum size of the string. 
> It sounds more like you want the object_size interfaces.  See
> tree-object-size.[ch]

As I mentioned in the other email, I tried the object_size interface, i.e., 
“compute_builtin_object_size”, to see whether it can provide good info on 
several simple examples.  At that time, I did not see that it could provide 
good info for very simple cases, so I switched to “get_range_strlen” and 
modified it to serve my purpose. 

I will double check on this.

Qing
> Jeff


Question on -fopt-info-inline

2018-07-03 Thread Qing Zhao
Hi,

In order to collect complete information on all the inlining transformations 
that GCC applies to a given program, I searched online and found that the 
option -fopt-info-inline might be the right one to use:

https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html 


in which, it mentioned:

"As another example,
gcc -O3 -fopt-info-inline-optimized-missed=inline.txt
outputs information about missed optimizations as well as optimized locations 
from all the inlining passes into inline.txt. 

“

Then I checked a very small testcase with GCC 9, as follows:

[qinzhao@localhost inline_report]$ cat inline_1.c
static int foo (int a)
{
  return a + 10;
}

static int bar (int b)
{
  return b - 20;
}

static int boo (int a, int b)
{
  return foo (a) + bar (b);
}

extern int v_a, v_b;
extern int result;

int compute ()
{
  result = boo (v_a, v_b);
  return result; 
}

[qinzhao@localhost inline_report]$ /home/qinzhao/Install/latest/bin/gcc -O3 
-fopt-info-inline-optimized-missed=inline.txt inline_1.c -S
[qinzhao@localhost inline_report]$ ls -l inline.txt
-rw-rw-r--. 1 qinzhao qinzhao 0 Jul  3 11:25 inline.txt
[qinzhao@localhost inline_report]$ cat inline_1.s
.file   "inline_1.c"
.text
.p2align 4,,15
.globl  compute
.type   compute, @function
compute:
.LFB3:
.cfi_startproc
movlv_a(%rip), %edx
movlv_b(%rip), %eax
leal-10(%rdx,%rax), %eax
movl%eax, result(%rip)
ret
.cfi_endproc
.LFE3:
.size   compute, .-compute
.ident  "GCC: (GNU) 9.0.0 20180702 (experimental)"
.section.note.GNU-stack,"",@progbits

From the above, we can see:
1. the call chain “compute” -> “boo” -> “foo”, “bar” is completely inlined 
into “compute”;
2. however, NO inline information at all is dumped into “inline.txt”.


So, my questions are:

1. Is the option -fopt-info-inline the right option to use to get the 
complete inlining transformation info from GCC?
2. Is it a bug that the current -fopt-info-inline cannot dump anything for 
this test case?


Thanks a lot for your help.

Qing

Re: Question on -fopt-info-inline

2018-07-03 Thread Qing Zhao


> On Jul 3, 2018, at 11:48 AM, Richard Biener  
> wrote:
> 
> On July 3, 2018 6:01:19 PM GMT+02:00, Qing Zhao <qing.z...@oracle.com> wrote:
>> Hi,
>> 
>> In order to collect complete information on all the inlining
>> transformation that GCC applies on a given program,
>> I searched online, and found that the option -fopt-info-inline might be
>> the right option to use:
>> 
>> https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html 
>> 
>> in which, it mentioned:
>> 
>> "As another example,
>> gcc -O3 -fopt-info-inline-optimized-missed=inline.txt
>> outputs information about missed optimizations as well as optimized
>> locations from all the inlining passes into inline.txt. 
>> 
>> “
>> 
>> Then I checked a very small testcase with GCC9 as following:
>> 
>> [qinzhao@localhost inline_report]$ cat inline_1.c
>> static int foo (int a)
>> {
>> return a + 10;
>> }
>> 
>> static int bar (int b)
>> {
>> return b - 20;
>> }
>> 
>> static int boo (int a, int b)
>> {
>> return foo (a) + bar (b);
>> }
>> 
>> extern int v_a, v_b;
>> extern int result;
>> 
>> int compute ()
>> {
>> result = boo (v_a, v_b);
>> return result; 
>> }
>> 
>> [qinzhao@localhost inline_report]$ /home/qinzhao/Install/latest/bin/gcc
>> -O3 -fopt-info-inline-optimized-missed=inline.txt inline_1.c -S
>> [qinzhao@localhost inline_report]$ ls -l inline.txt
>> -rw-rw-r--. 1 qinzhao qinzhao 0 Jul  3 11:25 inline.txt
>> [qinzhao@localhost inline_report]$ cat inline_1.s
>>  .file   "inline_1.c"
>>  .text
>>  .p2align 4,,15
>>  .globl  compute
>>  .type   compute, @function
>> compute:
>> .LFB3:
>>  .cfi_startproc
>>  movlv_a(%rip), %edx
>>  movlv_b(%rip), %eax
>>  leal-10(%rdx,%rax), %eax
>>  movl%eax, result(%rip)
>>  ret
>>  .cfi_endproc
>> .LFE3:
>>  .size   compute, .-compute
>>  .ident  "GCC: (GNU) 9.0.0 20180702 (experimental)"
>>  .section.note.GNU-stack,"",@progbits
>> 
>> From the above, we can see:
>> 1. the call chains to —>“boo”->”foo”, “bar” in the routine “compute”
>> are completely inlined into “compute”;
>> 2. However, there is NO any inline information is dumped into
>> “inline.txt”.
>> 
>> 
>> So, My questions are:
>> 
>> 1. Is the option -fopt-info-inline  the right option to use to get the
>> complete inlining transformation info from GCC?
>> 2. is this a bug that the current -fopt-info-inline cannot dump
>> anything for this testing case?
> 
> I think the early inliner doesn't use opt-info yet. 

So, shall we add opt-info support to the early inliner?

Qing
> 
> Richard. 
> 
>> 
>> Thanks a lot for your help.
>> 
>> Qing



Re: Question on -fopt-info-inline

2018-07-03 Thread Qing Zhao


>> 
>>> 
>>> In order to collect complete information on all the inlining
>>> transformation that GCC applies on a given program,
>>> I searched online, and found that the option -fopt-info-inline might be
>>> the right option to use:
>>> 
>>> https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html 
>>> 
>>> in which, it mentioned:
>>> 
>>> "As another example,
>>> gcc -O3 -fopt-info-inline-optimized-missed=inline.txt
>>> outputs information about missed optimizations as well as optimized
>>> locations from all the inlining passes into inline.txt. 
>>> 
>>> “
>>> 
>>> Then I checked a very small testcase with GCC9 as following:
>>> 
>>> [qinzhao@localhost inline_report]$ cat inline_1.c
>>> static int foo (int a)
>>> {
>>> return a + 10;
>>> }
>>> 
>>> static int bar (int b)
>>> {
>>> return b - 20;
>>> }
>>> 
>>> static int boo (int a, int b)
>>> {
>>> return foo (a) + bar (b);
>>> }
>>> 
>>> extern int v_a, v_b;
>>> extern int result;
>>> 
>>> int compute ()
>>> {
>>> result = boo (v_a, v_b);
>>> return result; 
>>> }
>>> 
>>> [qinzhao@localhost inline_report]$ /home/qinzhao/Install/latest/bin/gcc
>>> -O3 -fopt-info-inline-optimized-missed=inline.txt inline_1.c -S
>>> [qinzhao@localhost inline_report]$ ls -l inline.txt
>>> -rw-rw-r--. 1 qinzhao qinzhao 0 Jul  3 11:25 inline.txt
>>> [qinzhao@localhost inline_report]$ cat inline_1.s
>>> .file   "inline_1.c"
>>> .text
>>> .p2align 4,,15
>>> .globl  compute
>>> .type   compute, @function
>>> compute:
>>> .LFB3:
>>> .cfi_startproc
>>> movlv_a(%rip), %edx
>>> movlv_b(%rip), %eax
>>> leal-10(%rdx,%rax), %eax
>>> movl%eax, result(%rip)
>>> ret
>>> .cfi_endproc
>>> .LFE3:
>>> .size   compute, .-compute
>>> .ident  "GCC: (GNU) 9.0.0 20180702 (experimental)"
>>> .section.note.GNU-stack,"",@progbits
>>> 
>>> From the above, we can see:
>>> 1. the call chains to —>“boo”->”foo”, “bar” in the routine “compute”
>>> are completely inlined into “compute”;
>>> 2. However, there is NO any inline information is dumped into
>>> “inline.txt”.
>>> 
>>> 
>>> So, My questions are:
>>> 
>>> 1. Is the option -fopt-info-inline  the right option to use to get the
>>> complete inlining transformation info from GCC?
>>> 2. is this a bug that the current -fopt-info-inline cannot dump
>>> anything for this testing case?
>> 
>> I think the early inliner doesn't use opt-info yet. 
> 
> so, shall we add the opt-info support to early inliner?

I just created the following PR to record this work:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86395 


let me know if I missed anything.

thanks.

Qing




Re: Question on -fopt-info-inline

2018-07-05 Thread Qing Zhao


> On Jul 3, 2018, at 7:19 PM, Jeff Law  wrote:
> 
> On 07/03/2018 12:28 PM, Qing Zhao wrote:
>> 
>>>> 
>>>>> 
>>>>> In order to collect complete information on all the inlining
>>>>> transformation that GCC applies on a given program,
>>>>> I searched online, and found that the option -fopt-info-inline might be
>>>>> the right option to use:
>>>>> 
>>>>> in which, it mentioned:
>>>>> 
>>>>> "As another example,
>>>>> gcc -O3 -fopt-info-inline-optimized-missed=inline.txt
>>>>> outputs information about missed optimizations as well as optimized
>>>>> locations from all the inlining passes into inline.txt. 
>>>>> 
>>>>> “
>>>>> 
>>>>> Then I checked a very small testcase with GCC9 as following:
>>>>> 
>>>>> [qinzhao@localhost inline_report]$ cat inline_1.c
>>>>> static int foo (int a)
>>>>> {
>>>>> return a + 10;
>>>>> }
>>>>> 
>>>>> static int bar (int b)
>>>>> {
>>>>> return b - 20;
>>>>> }
>>>>> 
>>>>> static int boo (int a, int b)
>>>>> {
>>>>> return foo (a) + bar (b);
>>>>> }
>>>>> 
>>>>> extern int v_a, v_b;
>>>>> extern int result;
>>>>> 
>>>>> int compute ()
>>>>> {
>>>>> result = boo (v_a, v_b);
>>>>> return result; 
>>>>> }
>>>>> 
>>>>> [qinzhao@localhost inline_report]$ /home/qinzhao/Install/latest/bin/gcc
>>>>> -O3 -fopt-info-inline-optimized-missed=inline.txt inline_1.c -S
>>>>> [qinzhao@localhost inline_report]$ ls -l inline.txt
>>>>> -rw-rw-r--. 1 qinzhao qinzhao 0 Jul  3 11:25 inline.txt
>>>>> [qinzhao@localhost inline_report]$ cat inline_1.s
>>>>>   .file   "inline_1.c"
>>>>>   .text
>>>>>   .p2align 4,,15
>>>>>   .globl  compute
>>>>>   .type   compute, @function
>>>>> compute:
>>>>> .LFB3:
>>>>>   .cfi_startproc
>>>>>   movlv_a(%rip), %edx
>>>>>   movlv_b(%rip), %eax
>>>>>   leal-10(%rdx,%rax), %eax
>>>>>   movl%eax, result(%rip)
>>>>>   ret
>>>>>   .cfi_endproc
>>>>> .LFE3:
>>>>>   .size   compute, .-compute
>>>>>   .ident  "GCC: (GNU) 9.0.0 20180702 (experimental)"
>>>>>   .section.note.GNU-stack,"",@progbits
>>>>> 
>>>>> From the above, we can see:
>>>>> 1. the call chains to —>“boo”->”foo”, “bar” in the routine “compute”
>>>>> are completely inlined into “compute”;
>>>>> 2. However, there is NO any inline information is dumped into
>>>>> “inline.txt”.
>>>>> 
>>>>> 
>>>>> So, My questions are:
>>>>> 
>>>>> 1. Is the option -fopt-info-inline  the right option to use to get the
>>>>> complete inlining transformation info from GCC?
>>>>> 2. is this a bug that the current -fopt-info-inline cannot dump
>>>>> anything for this testing case?
>>>> 
>>>> I think the early inliner doesn't use opt-info yet. 
>>> 
>>> so, shall we add the opt-info support to early inliner?
>> 
>> I just created the following PR to record this work:
>> 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86395 
>> 
>> let me know if I missed anything.
> I'm hoping that the work David is doing WRT optimization information
> will be usable for the inliner as well.

Where can I find more details of David’s work?

>  In fact, inlining and
> vectorization are the two use cases we identified internally as the
> first targets.

Actually, during my study I noticed that vectorization, some loop 
transformations, and OMP all generate information for -fopt-info (it might 
not be complete yet, though), but the inliner generates nothing for 
-fopt-info.

Qing
> 
> 
> jeff



Re: Question on -fopt-info-inline

2018-07-10 Thread Qing Zhao
Hi, David,

Thanks a lot for your information; it is very helpful.

Specifically, I am mostly interested in the inline-report part of opt-info:

1. What is the current status of the inlining report through opt-info?  
(With the upstream GCC from last week, -fopt-info-inline reports nothing.)
2. What is the plan for opt-info-inline?  When will it be available?
3. Is there any design available for the messages from opt-info-inline?  Will 
the call-chain info and profiling feedback info be available in the inlining 
report?

thanks.

Qing

> On Jul 5, 2018, at 3:55 PM, David Malcolm  wrote:
>>> 
>>> where can I find more details of David’s work?
>> 
>> I don't have pointers to all the discussion handy.  BUt here's one of
>> the early messages:
>> 
>> https://gcc.gnu.org/ml/gcc-patches/2018-05/msg01675.html
> 
> I'm currently attacking the problem of "better optimization
> information" from two directions:
> 
> (a) More destinations for the existing optimization reports: being able
> to send them through the diagnostics subsystem, and to be able to save
> them in a machine-readable format from which reports can be generated
> (e.g. prioritized by code hotness). The initial patch kit Jeff linked
> to above introduced a new API for doing that, but I'm no longer doing
> that, instead working on using the existing "dump_*" API in dumpfile.h.
> Some of this work is now in trunk: dump messages are now tagged with
> metadata about the hotness of the code being optimized, and where in
> GCC's own code the messages was emitted from ...but this new metadata
> is being dropped on the floor in dumpfile.c right now.  The latest
> version of the patch kit for (a) is awaiting review at:
>  "[PATCH 0/2] v4: optinfo framework and remarks"
> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg00066.html 
> 
> 
> (b) I'm looking at new, improved optimization reports for
> vectorization, by capturing higher-level information about why a loop
> can't be vectorized, in a form that hopefully is useful to an end-user. 
> See a (very rough) prototype here:
> 
>  * "[PATCH] [RFC] Higher-level reporting of vectorization problems"
>* https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01462.html 
> 
> 
> I'm working on a less rough version of (b) and hope to post it to gcc-
> patches soon.
> 
> Hope this sounds sane
> Dave



Re: Question on -fopt-info-inline

2018-07-10 Thread Qing Zhao


> On Jul 10, 2018, at 11:32 AM, Richard Biener  
> wrote:
> 
> On July 10, 2018 5:42:40 PM GMT+02:00, Qing Zhao  wrote:
>> Hi, David,
>> 
>> thanks a lot for your information. very helpful.
>> 
>> specifically, I am mostly interested in the inline report part of the
>> opt-info:
>> 
>> 1. what’s the current status of inlining report through opt-info? 
>> (with the upstream GCC last week,
>> the -fopt-info-inline report nothing)
>> 2. what’s the plan for opt-info-inline? when will it be available?
> 
> Just implement it yourself? There is already Winline you can look at. 

Yes, I can definitely do that.  I just want to be consistent with the current 
work David is doing; are there any specific requirements?
I also want to make sure no duplicated effort is spent on this work. :-)

> 
>> 3. is there any design available for the messages from opt-info-inline?
>> will the call-chain info, profiling 
>> feedback info be available in the inlining report?
> 
> What do you need the report for? With C++ too much inlining happens to be 
> interesting. You can also always look at the ipa-inline and einline dumps. 

We are mostly interested in the C inlining report, mainly to find all the 
callers of a callee that is inlined. 

I will first check the dump files of ipa-inline and einline to see whether 
they can be filtered to get enough of the info we want.
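
For reference, one possible way (my own example; the dump pass numbers in the 
file names vary between GCC versions) to get those dumps for the earlier 
inline_1.c testcase:

gcc -O3 -fdump-tree-einline-details -fdump-ipa-inline-details inline_1.c -S
# produces inline_1.c.*t.einline and inline_1.c.*i.inline dump files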

thanks.

Qing
> 
> Richard. 



Re: [wish] Flexible array members in unions

2023-05-15 Thread Qing Zhao via Gcc


> On May 12, 2023, at 2:16 AM, Richard Biener via Gcc  wrote:
> 
> On Thu, May 11, 2023 at 11:14 PM Kees Cook via Gcc  wrote:
>> 
>> On Thu, May 11, 2023 at 08:53:52PM +, Joseph Myers wrote:
>>> On Thu, 11 May 2023, Kees Cook via Gcc wrote:
>>> 
 On Thu, May 11, 2023 at 06:29:10PM +0200, Alejandro Colomar wrote:
> On 5/11/23 18:07, Alejandro Colomar wrote:
> [...]
>> Would you allow flexible array members in unions?  Is there any
>> strong reason to disallow them?
 
 Yes please!! And alone in a struct, too.
 
 AFAICT, there is no mechanical/architectural reason to disallow them
 (especially since they _can_ be constructed with some fancy tricks,
 and they behave as expected.) My understanding is that it's disallowed
 due to an overly strict reading of the very terse language that created
 flexible arrays in C99.
>>> 
>>> Standard C has no such thing as a zero-size object or type, which would
>>> lead to problems with a struct or union that only contains a flexible
>>> array member there.
>> 
>> Ah-ha, okay. That root cause makes sense now.
> 
> Hmm. but then the workaround
> 
> struct X {
>  int n;
>  union u {
>  char at_least_size_one;
>  int iarr[];
>  short sarr[];
>  };
> };
> 
> doesn't work either.  We could make that a GNU extension without
> adverse effects?

I think that this might be a very nice extension: it addresses standard C’s 
restriction on zero-size objects and also resolves the kernel’s need (and 
perhaps other users’ similar programming needs).  Maybe it is also possible 
to add such an extension to standard C later.

Similar to a flexible array member in standard C, we should limit such a 
union to being the last field of another structure (since this union can 
basically be treated as a flexible array member).
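
For illustration, a minimal sketch (the type and field names are invented) of 
how such a union might look under the proposed extension:

struct msg {
  int len;                      /* number of elements in the payload */
  union {
    unsigned char  bytes[];     /* raw byte view */
    unsigned short words[];     /* 16-bit view of the same storage */
  } payload;                    /* only allowed as the last member */
};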

Qing

> 
> Richard.
> 
>> Why are zero-sized objects missing in Standard C? Or, perhaps, the better
>> question is: what's needed to support the idea of a zero-sized object?
>> 
>> --
>> Kees Cook



Re: [wish] Flexible array members in unions

2023-05-18 Thread Qing Zhao via Gcc


> On May 18, 2023, at 12:25 PM, Martin Uecker via Gcc  wrote:
> 
> 
> 
>> On Thu, May 11, 2023 at 11:14 PM Kees Cook via Gcc  wrote:
>>> 
>>> On Thu, May 11, 2023 at 08:53:52PM +, Joseph Myers wrote:
 On Thu, 11 May 2023, Kees Cook via Gcc wrote:
 
> On Thu, May 11, 2023 at 06:29:10PM +0200, Alejandro Colomar wrote:
>> On 5/11/23 18:07, Alejandro Colomar wrote:
>> [...]
>>> Would you allow flexible array members in unions?  Is there any
>>> strong reason to disallow them?
> 
> Yes please!! And alone in a struct, too.
> 
> AFAICT, there is no mechanical/architectural reason to disallow them
> (especially since they _can_ be constructed with some fancy tricks,
> and they behave as expected.) My understanding is that it's disallowed
> due to an overly strict reading of the very terse language that created
> flexible arrays in C99.
 
 Standard C has no such thing as a zero-size object or type, which would
 lead to problems with a struct or union that only contains a flexible
 array member there.
> 
> (I think it is fundamentally not too problematic to have zero-size
> objects, although it would take some work to specify the semantics
> exactly.)
> 
> But my preference would be to make structs / unions with FAM an
> incomplete type which would then restrict their use (for the cases
> now supported we would need backwards compatible exceptions).
> We could then allow such a struct / union as the last member
> of another struct / union which would make this an incomplete
> type too.

Yes, I like this approach. 
We can make them GCC extensions first and promote them to the C standard later.

My proposed patch sets (originally targeting GCC 13, now probably needing to 
target GCC 14) will make one part of the above a GCC extension:
allowing a struct with a FAM as the last member of another struct.  (See 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832)
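
For concreteness, a minimal sketch (the type names are invented) of the shape 
these patches are meant to allow:

struct inner {
  int  len;
  char data[];          /* flexible array member */
};

struct outer {
  long tag;
  struct inner tail;    /* struct with a FAM, allowed only as the last member */
};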

https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614794.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614793.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614790.html

I’d like these changes going into GCC first to improve this area.

> 
> We then would need a special macro (based on a builtin) instead
> of sizeof to get the size, but this would be safer anyway.
> 
> In principle, an even better solution would be to allow dynamic
> arrays because then it has a dynamic bound where the type with
> the bound could propagate to some user. Bounds checking would
> work as expected and more cases.
> 
> struct foo {
>  int len;
>  char buf[.len];
> };
> 
> But this takes a bit more work to get right.
> 
>>> 
>>> Ah-ha, okay. That root cause makes sense now.
>> 
>> Hmm. but then the workaround
>> 
>> struct X {
>>  int n;
>>  union u {
>>  char at_least_size_one;
>>  int iarr[];
>>  short sarr[];
>>  };
>> };
>> 
>> doesn't work either.  We could make that a GNU extension without
>> adverse effects?
> 
> I think we could allow this even without the "at_least_size_one"
> without a problem when allowing the use of such unions only as
> a last member of some structure. Allowing it elsewhere seems
> questionable anyway.

Yes, such a union can be treated as a flexible array member 
(just multiple flexible arrays sharing the same storage).  Therefore it’s 
reasonable to only allow it as the last field of a structure. 

thanks.

Qing.

> 
>> Richard.
>> 
>>> Why are zero-sized objects missing in Standard C? Or, perhaps, the better
>>> question is: what's needed to support the idea of a zero-sized object?
> 
> Probably a lot of convincing that it actually does not cause problems,
> and is useful. Also a lot of work in making sure the standard is revised
> everywhere where it is necessary. I think zero sized objects and
> especially arrays are very useful also to avoid special code for corner
> cases in numerical algorithms. But I think here some restrictions on
> the use of the FAM will do.
> 
> 
> Martin



Questions on parallel processing and multi-threading support in GCC's profiling collection

2023-07-17 Thread Qing Zhao via Gcc
Hi, Jan,

I did a little searching online and in GCC’s documentation, and found the 
following options for supporting parallel processing and multi-threading 
during profile collection:

https://gcc.gnu.org/onlinedocs/gcc-10.5.0/gcc/Instrumentation-Options.html

-fprofile-dir=path and 
-fprofile-reproducible=[multithreaded|parallel-runs|serial]

===
-fprofile-dir=path
…..
When an executable is run in a massive parallel environment, it is recommended 
to save profile to different folders. That can be done with variables in path 
that are exported during run-time:

%p
process ID.

%q{VAR}
value of environment variable VAR

-fprofile-reproducible=[multithreaded|parallel-runs|serial]
….

With -fprofile-reproducible=parallel-runs collected profile stays reproducible 
regardless the order of streaming of the data into gcda files. This setting 
makes it possible to run multiple instances of instrumented program in parallel 
(such as with make -j). This reduces quality of gathered data, in particular of 
indirect call profiling.

==
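
For example (the command lines and paths below are my own, based on the 
documented %p and %q{VAR} expansions), separate profile output directories 
can be requested per process or per run:

gcc -O2 -fprofile-generate -fprofile-dir='/tmp/prof/%p' foo.c -o foo
gcc -O2 -fprofile-generate -fprofile-dir='/tmp/prof/%q{RUN_ID}' foo.c -o foo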

I have the following questions on GCC’s profile collection in a parallel 
processing or multi-threading environment: 

1. In addition to the above two options, are there any other changes in GCC 
to support parallel processing or multi-threading?
2. Is there any documentation on how to collect and use profile feedback 
safely and accurately in a parallel processing or multi-threading environment?
3. I noted that -fprofile-dir=path was added in GCC 9 and 
-fprofile-reproducible was added in GCC 10; is it doable to backport these 
two options to GCC 8 to support parallel processing and multi-threading 
profile collection? 

Thanks a lot for your help.

Qing

Re: Should GCC warn about sizeof(flexible_struct)?

2023-08-14 Thread Qing Zhao via Gcc


> On Aug 14, 2023, at 2:41 AM, Richard Biener via Gcc  wrote:
> 
> On Fri, Aug 11, 2023 at 8:30 PM Alejandro Colomar via Gcc
>  wrote:
>> 
>> Hi!
>> 
>> Structures with flexible array members have restrictions about being
>> used in arrays or within other structures, as the size of the enclosing
>> aggregate type would be... inconsistent.
>> 
>> In general, sizeof(flexible_struct) is a problematic thing that rarely
>> means what programmers think it means.  It is not the size of the
>> structure up to the flexible array member; or expressed using C,
>> the following can be true:
>> 
>>sizeof(s) != offsetof(s, fam)
>> 
>> See the program at the bottom that demonstrates how this is problematic.
>> 
>> It's true that if one uses
>> 
>>malloc(offseof(s, fam) + sizeof_member(s, fam[0]) * N);
>> 
>> and N is very small (0 or 1 usually), the allocation would be smaller
>> than the object size, which for GCC seems to be fine, but I'm worried the
>> standard is not clear enough about its validity[1].
>> 
>> [1]: 
>> 
>> To avoid having UB there, pedantically one would need to call
>> 
>>malloc(MAX(sizeof(s), offseof(s, fam) + sizeof_member(s, fam[0]) * 
>> N));
>> 
>> But I think that's the only correct use of sizeof() with structures
>> containing flexible array members.  So it seems sizeof() by itself is
>> a valid thing, but when adding it to something else to get the total size,
>> or doing any arithmetic with it, that's dubious code.
>> 
>> How about some -Wfam-sizeof-arithmetic that would not warn about taking
>> sizeof(s) but would warn if that sizeof is used in any arithmetic?
> 
> There are probably many ways sizeof() plus arithmetic can yield a correct
> size for allocation.  After all _all_ uses of FAM requires allocation
> and there's
> no convenient standard way of calculating the required size (sizeof
> (fam-type[n])?).
> 
> Iff we want to diagnose anything then possibly a computation that looks like
> a size computation but that's actually smaller than required,

Yes, I think that a warning on insufficient allocation might be useful in 
general; it might be combined with Martin’s previously proposed patch, 
-Walloc_type?
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625172.html

Another thing: is there a place in GCC’s documentation where we can add some 
clarification or suggestions to users on how to allocate space for structures 
with a FAM?
This seems to be a very confusing area.

thanks.

Qing

> but
> other than that - what
> would you suggest to fix such reported warnings?
> 
> Richard.
> 
>> Cheers,
>> Alex
>> 
>> ---
>> 
>> $ cat off.c
>> #include <err.h>
>> #include <stddef.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> 
>> 
>> struct s {
>>int   i;
>>char  c;
>>char  fam[];
>> };
>> 
>> 
>> static inline void *xmalloc(size_t size);
>> 
>> 
>> int
>> main(void)
>> {
>>char  *p;
>>struct s  *s;
>> 
>>printf("sizeof: %zu\n", sizeof(struct s));
>>printf("offsetof: %zu\n", offsetof(struct s, fam));
>> 
>>puts("\nWith sizeof():");
>> 
>>s = xmalloc(sizeof(struct s) + sizeof("Hello, sizeof!"));
>>strcpy(s->fam, "Hello, sizeof!");
>>p = (char *) s + sizeof(struct s);
>>puts(p);
>>free(s);
>> 
>>puts("\nWith offsetof(3):");
>> 
>>s = xmalloc(offsetof(struct s, fam) + sizeof("Hello, offsetof!"));
>>strcpy(s->fam, "Hello, offsetof!");
>>p = (char *) s + offsetof(struct s, fam);
>>puts(p);
>>free(s);
>> 
>>exit(EXIT_SUCCESS);
>> }
>> 
>> 
>> static inline void *
>> xmalloc(size_t size)
>> {
>>void  *p;
>> 
>>p = malloc(size);
>>if (p == NULL)
>>err(EXIT_FAILURE, "malloc");
>>return p;
>> }
>> $ gcc-13 -Wall -Wextra -Wpadded -fanalyzer off.c
>> off.c:12:1: warning: padding struct size to alignment boundary with 3 bytes 
>> [-Wpadded]
>>   12 | };
>>  | ^
>> 
>> 
>> The only warning I know that is triggered in the code above is -Wpadded,
>> which is related to this problem, but I think there should be something
>> to warn about sizeof() in this context.
>> 
>> 
>> --
>> 
>> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5



Re: Handling C2Y zero-length operations on null pointers

2024-10-07 Thread Qing Zhao via Gcc


> On Oct 7, 2024, at 11:22, Jakub Jelinek  wrote:
> 
> On Mon, Oct 07, 2024 at 03:14:22PM +0000, Qing Zhao wrote:
>>> Consider the qsort case.  My understanding was that the paper is making
>>> typedef int (*cmpfn) (const void *, const void *);
>>> qsort (NULL, 0, 1, (cmpfn) NULL);
>>> valid (but is
>>> qsort (NULL, 1, 0, (cmpfn) NULL);
>>> still invalid?).
>>> How do you express that with access attribute, which can only have 1 size
>>> argument?  The accessed memory for the read/write pointee of the first
>>> argument has nmemb * size parameter bytes size.
>> 
>> For the other attribute “alloc_size”, we have two forms, 
>> A. alloc_size (position)
>> and
>> B. alloc_size (position-1, position-2)
>> 
>> The 2nd form is used to represent nmemb * size. 
>> 
>> Is it possible that we extend the attribute “access” similarly? 
>> 
>> Then we can use the attribute “access” consistently for this purpose?
> 
> We could do that and express the array pointer of qsort/bsearch that way.
> But there is also the function pointer case, there we don't access any bytes
> (and what exactly a function pointer means depends on architecture, can be
> code pointer, or can be pointer to function descriptor etc.), so we really
> need to express this pointer must be non-NULL if some other argument (or
> their product?) is non-0.

I have a question here (maybe a stupid question :-)): do we really need to 
express such a situation for function pointers? 
I cannot construct a use case for this…

Qing

> If one passes constant(s) to those arguments, then such checking can be
> done through a warning like we warn for passing NULL to nonnull attributed
> parameters right now (or not if 0), if it is non-constant, then it can be
> diagnosed in sanitizers.
> 
> Jakub
> 



Re: Handling C2Y zero-length operations on null pointers

2024-10-07 Thread Qing Zhao via Gcc


> On Oct 7, 2024, at 10:13, Jakub Jelinek via Gcc  wrote:
> 
> On Fri, Oct 04, 2024 at 12:42:24AM +0200, Florian Weimer wrote:
>> * Joseph Myers:
>> 
>>> The real question is how to achieve optimal warnings in the absence of the 
>>> attribute.  Should we have a variant of the nonnull attribute that warns 
>>> for NULL arguments but without optimizing based on them?
>> 
>> I think attribute access already covers part of it:
>> 
>> #include 
>> void read_array (void *, size_t) __attribute__ ((access (read_only, 1, 2)));
>> void
>> f (void)
>> {
>>  read_array (NULL, 0); // No warning.
>>  read_array (NULL, 1); // Warning.
>> }
>> 
>> It does not work for functions like strndup that support both string
>> arguments (of any length) and array arguments of a specified size.
>> The read_only variant requires an initialized array of the specified
>> length.
> 
> access attribute can't deal with various other things.
> 
> Consider the qsort case.  My understanding was that the paper is making
> typedef int (*cmpfn) (const void *, const void *);
> qsort (NULL, 0, 1, (cmpfn) NULL);
> valid (but is
> qsort (NULL, 1, 0, (cmpfn) NULL);
> still invalid?).
> How do you express that with access attribute, which can only have 1 size
> argument?  The accessed memory for the read/write pointee of the first
> argument has nmemb * size parameter bytes size.

For the other attribute “alloc_size”, we have two forms, 
A. alloc_size (position)
and
B. alloc_size (position-1, position-2)

The 2nd form is used to represent nmemb * size. 

Is it possible that we extend the attribute “access” similarly? 

Then we can use the attribute “access” consistently for this purpose?
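
For reference, the two documented alloc_size forms look like this in 
declarations (the function names are mine), together with a sketch of the 
kind of extended “access” form being asked about (purely hypothetical, not an 
existing attribute form):

#include <stddef.h>

void *my_alloc (size_t size)
        __attribute__ ((alloc_size (1)));     /* form A: allocated size is argument 1 */
void *my_calloc (size_t nmemb, size_t size)
        __attribute__ ((alloc_size (1, 2)));  /* form B: allocated size is arg 1 * arg 2 */

/* Hypothetical extension, not implemented: an access attribute with two size
   arguments, e.g. __attribute__ ((access (read_write, 1, 2, 3))), meaning
   argument 1 points to nmemb * size accessible bytes.  */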

Qing

> And using access attribute for function pointers doesn't work, there is
> no data to be read/written there, just code.
> 
> Guess some of the nonnull cases could be replaced by access attribute
> if we clarify the documentation that if SIZE_INDEX is specified and that
> argument is non-zero then the pointer has to be non-NULL, and teach
> sanitizers etc. to sanitize those.
> 
> For the rest, perhaps we need some nonnull_if_nonzero argument
> which requires that the parameter identified by the first attribute
> argument must be pointer which is non-NULL if the parameter identified
> by the second attribute argument is non-zero.
> And get clarified the qsort/bsearch cases whether it is about just
> nmemb == 0 or nmemb * size == 0.
> 
> Jakub
> 



Re: Handling C2Y zero-length operations on null pointers

2024-11-13 Thread Qing Zhao via Gcc


> On Nov 12, 2024, at 01:51, Martin Uecker  wrote:
> 
> Am Montag, dem 07.10.2024 um 15:14 + schrieb Qing Zhao:
>> 
>>> On Oct 7, 2024, at 10:13, Jakub Jelinek via Gcc  wrote:
>>> 
>>> On Fri, Oct 04, 2024 at 12:42:24AM +0200, Florian Weimer wrote:
>>>> * Joseph Myers:
>>>> 
>>>>> The real question is how to achieve optimal warnings in the absence of 
>>>>> the 
>>>>> attribute.  Should we have a variant of the nonnull attribute that warns 
>>>>> for NULL arguments but without optimizing based on them?
>>>> 
>>>> I think attribute access already covers part of it:
>>>> 
>>>> #include 
>>>> void read_array (void *, size_t) __attribute__ ((access (read_only, 1, 
>>>> 2)));
>>>> void
>>>> f (void)
>>>> {
>>>> read_array (NULL, 0); // No warning.
>>>> read_array (NULL, 1); // Warning.
>>>> }
>>>> 
>>>> It does not work for functions like strndup that support both string
>>>> arguments (of any length) and array arguments of a specified size.
>>>> The read_only variant requires an initialized array of the specified
>>>> length.
>>> 
>>> access attribute can't deal with various other things.
>>> 
>>> Consider the qsort case.  My understanding was that the paper is making
>>> typedef int (*cmpfn) (const void *, const void *);
>>> qsort (NULL, 0, 1, (cmpfn) NULL);
>>> valid (but is
>>> qsort (NULL, 1, 0, (cmpfn) NULL);
>>> still invalid?).
>>> How do you express that with access attribute, which can only have 1 size
>>> argument?  The accessed memory for the read/write pointee of the first
>>> argument has nmemb * size parameter bytes size.
>> 
>> For the other attribute “alloc_size”, we have two forms, 
>> A. alloc_size (position)
>> and
>> B. alloc_size (position-1, position-2)
>> 
>> The 2nd form is used to represent nmemb * size. 
>> 
>> Is it possible that we extend the attribute “access” similarly? 
>> 
>> Then we can use the attribute “access” consistently for this purpose?
> 
> We also miss sanitizer support.
> 
> How about letting "access" only be about access range
> and instead have separate attribute that can be used to
> express more complicated preconditions?

Sounds reasonable to me. 
Yes, it’s not a good idea to mix them together with one attribute. 

Qing
> 
> void* foo(void *p, size_t mmemb, size_t size)
> [[precondition((p == NULL) == (mmemb * size == 0)]];
> 
> (not saying this is the right condition for any function
> in the standard library)
> 
> Martin
> 
>> 
>> Qing
>> 
>>> And using access attribute for function pointers doesn't work, there is
>>> no data to be read/written there, just code.
>>> 
>>> Guess some of the nonnull cases could be replaced by access attribute
>>> if we clarify the documentation that if SIZE_INDEX is specified and that
>>> argument is non-zero then the pointer has to be non-NULL, and teach
>>> sanitizers etc. to sanitize those.
>>> 
>>> For the rest, perhaps we need some nonnull_if_nonzero argument
>>> which requires that the parameter identified by the first attribute
>>> argument must be pointer which is non-NULL if the parameter identified
>>> by the second attribute argument is non-zero.
>>> And get clarified the qsort/bsearch cases whether it is about just
>>> nmemb == 0 or nmemb * size == 0.
>>> 
>>> Jakub