Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

Richard Biener Wed, 25 Oct 2023 03:25:45 -0700

> Am 25.10.2023 um 10:16 schrieb Martin Uecker <uec...@tugraz.at>:
> 
> Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
>> 
>>>> Am 24.10.2023 um 22:38 schrieb Martin Uecker <uec...@tugraz.at>:
>>> 
>>> Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
>>>> Hi, Sid,
>>>> 
>>>> Really appreciate for your example and detailed explanation. Very helpful.
>>>> I think that this example is an excellent example to show (almost) all the 
>>>> issues we need to consider.
>>>> 
>>>> I slightly modified this example to make it to be compilable and run-able, 
>>>> as following: 
>>>> (but I still cannot make the incorrect reordering or DSE happening, 
>>>> anyway, the potential reordering possibility is there…)
>>>> 
>>>> 1 #include <malloc.h>
>>>> 2 struct A
>>>> 3 {
>>>> 4  size_t size;
>>>> 5  char buf[] __attribute__((counted_by(size)));
>>>> 6 };
>>>> 7 
>>>> 8 static size_t
>>>> 9 get_size_from (void *ptr)
>>>> 10 {
>>>> 11  return __builtin_dynamic_object_size (ptr, 1);
>>>> 12 }
>>>> 13 
>>>> 14 void
>>>> 15 foo (size_t sz)
>>>> 16 {
>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * 
>>>> sizeof(char));
>>>> 18  obj->size = sz;
>>>> 19  obj->buf[0] = 2;
>>>> 20  __builtin_printf (“%d\n", get_size_from (obj->buf));
>>>> 21  return;
>>>> 22 }
>>>> 23 
>>>> 24 int main ()
>>>> 25 {
>>>> 26  foo (20);
>>>> 27  return 0;
>>>> 28 }
>>>> 
>>>> With my GCC, it was compiled and worked:
>>>> [opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
>>>> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
>>>> 20
>>>> Situation 1: With O1 and above, the routine “get_size_from” was inlined 
>>>> into “foo”, therefore, the call to __bdos is in the same routine as the 
>>>> instantiation of the object, and the TYPE information and the attached 
>>>> counted_by attribute information in the TYPE of the object can be USED by 
>>>> the __bdos call to compute the final object size. 
>>>> 
>>>> [opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
>>>> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
>>>> -1
>>>> Situation 2: With O0, the routine “get_size_from” was NOT inlined into 
>>>> “foo”, therefore, the call to __bdos is Not in the same routine as the 
>>>> instantiation of the object, As a result, the TYPE info and the attached 
>>>> counted_by info of the object can NOT be USED by the __bdos call. 
>>>> 
>>>> Keep in mind of the above 2 situations, we will refer them in below:
>>>> 
>>>> 1. First,  the problem we are trying to resolve is:
>>>> 
>>>> (Your description):
>>>> 
>>>>> the reordering of __bdos w.r.t. initialization of the size parameter but 
>>>>> to also account for DSE of the assignment, we can abstract this problem 
>>>>> to that of DFA being unable to see implicit use of the size parameter in 
>>>>> the __bdos call.
>>>> 
>>>> basically is correct.  However, with the following exception:
>>>> 
>>>> The implicit use of the size parameter in the __bdos call is not always 
>>>> there, it ONLY exists WHEN the __bdos is able to evaluated to an 
>>>> expression of the size parameter in the “objsz” phase, i.e., the 
>>>> “Situation 1” of the above example. 
>>>> In the “Situation 2”, when the __bdos does not see the TYPE of the real 
>>>> object,  it does not see the counted_by information from the TYPE, 
>>>> therefore,  it is not able to evaluate the size of the object through the 
>>>> counted_by information.  As a result, the implicit use of the size 
>>>> parameter in the __bdos call does NOT exist at all.  The optimizer can 
>>>> freely reorder the initialization of the size parameter with the __bdos 
>>>> call since there is no data flow dependency between these two. 
>>>> 
>>>> With this exception in mind, we can see that your proposed “option 2” 
>>>> (making the type of size “volatile”) is too conservative, it will  disable 
>>>> many optimizations  unnecessarily, even though it’s safe and simple to 
>>>> implement. 
>>>> 
>>>> As a compiler optimization person for many many years, I really don’t want 
>>>> to take this approach at this moment.  -:)
>>>> 
>>>> 2. Some facts I’d like to mention:
>>>> 
>>>> A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE 
>>>> optimization stage. During RTL stage,  the __bdos call has already been 
>>>> replaced by an expression of the size parameter or a constant, the data 
>>>> dependency is explicitly in the IR already.  I believe that the data 
>>>> analysis in RTL stage should pick up the data dependency correctly, No 
>>>> special handling is needed in RTL.
>>>> 
>>>> B. If the __bdos call cannot see the real object , it has no way to get 
>>>> the “counted_by” field from the TYPE of the real object. So, if we try to 
>>>> add the implicit use of the “counted_by” field to the __bdos call, the 
>>>> object instantiation should be in the same routine as the __bdos call.  
>>>> Both the FE and the gimplification phase are too early to do this work. 
>>>> 
>>>> 2. Then, what’s the best approach to resolve this problem:
>>>> 
>>>> There were several suggestions so far:
>>>> 
>>>> A.  Add an additional argument, the size parameter,  to __bdos, 
>>>>     A.1, during FE;
>>>>     A.2, during gimplification phase;
>>>> B.  Encode the implicit USE  in the type of size, to make the size 
>>>> “volatile”;
>>>> C.  Encode the implicit USE  in the type of buf, then update the 
>>>> optimization passes to use this implicit USE encoded in the type of buf.
>>>> 
>>>> As I explained in the above, 
>>>> ** Approach A (both A.1 and A.2) does not work;
>>>> ** Approach B will have big performance impact, I’d prefer not to take 
>>>> this approach at this moment.
>>>> ** Approach C will be a lot of change in GCC, and also not very necessary 
>>>> since the ONLY implicit use of the size parameter is in the __bdos call 
>>>> when __bdos can see the real object.
>>>> 
>>>> So, all the above proposed approaches, A, B, C, are not very good. 
>>>> 
>>>> Then, maybe the following might work better?
>>>> 
>>>> In the tree optimization stage, 
>>>>   * After the inlining transformation applied,  
>>>> +  * Before the data-flow related optimization happens, 
>>>> +  * when the data flow analysis is constructed, 
>>>> 
>>>> For each call to __bdos, add the implicit use of size parameter. 
>>>> 
>>>> Is this doable? 
>>> 
>>> Here is another proposal:  Add a new builtin function
>>> 
>>> __builtin_with_size(x, size)
>>> 
>>> that return x but behaves similar to an allocation
>>> function in that BDOS can look at the size argument
>>> to discover the size.
>>> 
>>> The FE insers this function when the field is accessed:
>> 
>> When it’s set I suppose.  Turn
>> 
>> X.l = n;
>> 
>> Into
>> 
>> X.l = __builtin_with_size (x.buf, n);
> 
> It would turn 
> 
> some_variable = (&) x.buf
> 
> into 
> 
> some_variable = __builtin_with_size ( (&) x.buf. x.len)

Unless you use the address of x.Len this will not work when len is initialized 
after buf.  And the address will not have a meaningful data dependence.
> 
> So the later access to x.buf and not the initialization
> of a member of the struct (which is too early).

>> 
>> And indeed we need sth like a fat pointer to reliably solve all the issues.
> 
> What happens for other languages such as FORTRAN 
> and ADA do?  Are those pointers lowered in the FE?

Yes

> To me it seems there are two sound ways to introduce
> such information:
> 
> - either by using the type system.  This works in
> the FE in C using variably modified types
> 
> char buf[n];
> __auto_type p = &buf;
> 
> ... = sizeof (*p);
> 
> But if I understand Jakob's comment to some PR 
> correctly the size information in the TREE_TYPE
> is not processed correctly anymore in the
> middle-end. 

The type based info is lowered during gimplification and in particular for 
pointer types the middle-end quickly loses track of the original type.

Richard 

> 
> - or one injects the information via some
> tree node or builtin at certain points in
> time as suggested here, and the compiler
> derives the information from these points 
> as tree-object-size does.  
> 
> 
> The use of attributes seems fragile and - looking
> at the access attribute also overly complex.  And 
> we somehow support this only for function types
> and not elsewhere and also this then gets lost
> during  inlining.   So I think for all this stuff
> (nonnull, access, counted_by) I think a better
> approach is needed.
> 
> 
> Martin
> 
> 
>> 
>> Richard 
> 
> 
> 
> 
>> 
>>> __builtin_with_size(x.buf, x.L);
>>> 
>>> 
>>> Martin
>>> 
>>> 
>>> 
>>>> 
>>>> Otherwise, we might need to take the “volatile” approach. 
>>>> 
>>>> Let me know your suggestion and comment.
>>>> 
>>>> Thanks a lot.
>>>> 
>>>> Qing
>>>> 
>>>> 
>>>>> __bdos is the one such implicit user of the size parameter and you're 
>>>>> proposing to solve this by encoding the relationship between buffer and 
>>>>> size at the __bdos call site.  But what about the case when the 
>>>>> instantiation of the object is not at the same place as the __bdos call 
>>>>> site, i.e. the DFA is unable to make that relationship?
>>>>> 
>>>>> The example Martin showed where the subobject gets "hidden" behind a 
>>>>> pointer was a trivial one where DFA *may* actually work in practice 
>>>>> (because the object-size pass can thread through these assignments) but 
>>>>> think about this one:
>>>>> 
>>>>> struct A
>>>>> {
>>>>> size_t size;
>>>>> char buf[] __attribute__((counted_by(size)));
>>>>> }
>>>>> 
>>>>> static size_t
>>>>> get_size_of (void *ptr)
>>>>> {
>>>>> return __bdos (ptr, 1);
>>>>> }
>>>>> 
>>>>> void
>>>>> foo (size_t sz)
>>>>> {
>>>>> struct A *obj = __builtin_malloc (sz);
>>>>> obj->size = sz;
>>>>> 
>>>>> ...
>>>>> __builtin_printf ("%zu\n", get_size_of (obj->array));
>>>>> ...
>>>>> }
>>>>> 
>>>>> Until get_size_of is inlined, no DFA can see the __bdos call in the same 
>>>>> place as the point where obj is allocated.  As a result, the assignment 
>>>>> to obj->size could get reordered (or the store eliminated) w.r.t. the 
>>>>> __bdos call until the inlining happens.
>>>>> 
>>>>> As a result, the relationship between buf and size established by the 
>>>>> attribute needs to be encoded into the type somehow.  There are two 
>>>>> options:
>>>>> 
>>>>> Option 1: Encode the relationship in the type of buf
>>>>> 
>>>>> This is kinda what you end up doing with component_ref_has_counted_by and 
>>>>> it does show the relationship if one is looking (through that call), but 
>>>>> nothing more that can be used to, e.g. prevent reordering or tell the 
>>>>> optimizer that the reference to the buf member may imply a reference to 
>>>>> the size member as well.  This could be remedied by somehow encoding the 
>>>>> USES relationship for size into the type of buf that the optimization 
>>>>> passes can see.  I feel like this may be a bit convoluted to specify in a 
>>>>> future language extension in a way that will actually be well understood 
>>>>> by developers, but it will likely generate faster runtime code.  This 
>>>>> will also likely require a bigger change across passes.
>>>>> 
>>>>> Option 2: Encode the relationship in the type of size
>>>>> 
>>>>> The other option is to enhance the type of size somehow so that it 
>>>>> discourages reordering and store elimination, basically pessimizing code. 
>>>>>  I think volatile semantics might be the way to do this and may even be 
>>>>> straightforward to specify in the future language extension given that it 
>>>>> builds on a known language construct and is thematically related.  
>>>>> However it does pessimize output for code that implements __counted_by__.
>>>>> 
>>>>> Thanks,
>>>>> Sid
>>>> 
>>> 
> 
> -- 
> Univ.-Prof. Dr. rer. nat. Martin Uecker
> Graz University of Technology
> Institute of Biomedical Imaging
> 
>
Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

Reply via email to