Am Mittwoch, dem 25.10.2023 um 12:25 +0200 schrieb Richard Biener:
> 
> > Am 25.10.2023 um 10:16 schrieb Martin Uecker <uec...@tugraz.at>:
> > 
> > Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
> > > 
> > > > > Am 24.10.2023 um 22:38 schrieb Martin Uecker <uec...@tugraz.at>:
> > > > 
> > > > Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
> > > > > Hi, Sid,
> > > > > 
> > > > > Really appreciate for your example and detailed explanation. Very 
> > > > > helpful.
> > > > > I think that this example is an excellent example to show (almost) 
> > > > > all the issues we need to consider.
> > > > > 
> > > > > I slightly modified this example to make it to be compilable and 
> > > > > run-able, as following: 
> > > > > (but I still cannot make the incorrect reordering or DSE happening, 
> > > > > anyway, the potential reordering possibility is there…)
> > > > > 
> > > > > 1 #include <malloc.h>
> > > > > 2 struct A
> > > > > 3 {
> > > > > 4  size_t size;
> > > > > 5  char buf[] __attribute__((counted_by(size)));
> > > > > 6 };
> > > > > 7 
> > > > > 8 static size_t
> > > > > 9 get_size_from (void *ptr)
> > > > > 10 {
> > > > > 11  return __builtin_dynamic_object_size (ptr, 1);
> > > > > 12 }
> > > > > 13 
> > > > > 14 void
> > > > > 15 foo (size_t sz)
> > > > > 16 {
> > > > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * 
> > > > > sizeof(char));
> > > > > 18  obj->size = sz;
> > > > > 19  obj->buf[0] = 2;
> > > > > 20  __builtin_printf (“%d\n", get_size_from (obj->buf));
> > > > > 21  return;
> > > > > 22 }
> > > > > 23 
> > > > > 24 int main ()
> > > > > 25 {
> > > > > 26  foo (20);
> > > > > 27  return 0;
> > > > > 28 }
> > > > > 
> > > > > With my GCC, it was compiled and worked:
> > > > > [opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
> > > > > [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> > > > > 20
> > > > > Situation 1: With O1 and above, the routine “get_size_from” was 
> > > > > inlined into “foo”, therefore, the call to __bdos is in the same 
> > > > > routine as the instantiation of the object, and the TYPE information 
> > > > > and the attached counted_by attribute information in the TYPE of the 
> > > > > object can be USED by the __bdos call to compute the final object 
> > > > > size. 
> > > > > 
> > > > > [opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
> > > > > [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> > > > > -1
> > > > > Situation 2: With O0, the routine “get_size_from” was NOT inlined 
> > > > > into “foo”, therefore, the call to __bdos is Not in the same routine 
> > > > > as the instantiation of the object, As a result, the TYPE info and 
> > > > > the attached counted_by info of the object can NOT be USED by the 
> > > > > __bdos call. 
> > > > > 
> > > > > Keep in mind of the above 2 situations, we will refer them in below:
> > > > > 
> > > > > 1. First,  the problem we are trying to resolve is:
> > > > > 
> > > > > (Your description):
> > > > > 
> > > > > > the reordering of __bdos w.r.t. initialization of the size 
> > > > > > parameter but to also account for DSE of the assignment, we can 
> > > > > > abstract this problem to that of DFA being unable to see implicit 
> > > > > > use of the size parameter in the __bdos call.
> > > > > 
> > > > > basically is correct.  However, with the following exception:
> > > > > 
> > > > > The implicit use of the size parameter in the __bdos call is not 
> > > > > always there, it ONLY exists WHEN the __bdos is able to evaluated to 
> > > > > an expression of the size parameter in the “objsz” phase, i.e., the 
> > > > > “Situation 1” of the above example. 
> > > > > In the “Situation 2”, when the __bdos does not see the TYPE of the 
> > > > > real object,  it does not see the counted_by information from the 
> > > > > TYPE, therefore,  it is not able to evaluate the size of the object 
> > > > > through the counted_by information.  As a result, the implicit use of 
> > > > > the size parameter in the __bdos call does NOT exist at all.  The 
> > > > > optimizer can freely reorder the initialization of the size parameter 
> > > > > with the __bdos call since there is no data flow dependency between 
> > > > > these two. 
> > > > > 
> > > > > With this exception in mind, we can see that your proposed “option 2” 
> > > > > (making the type of size “volatile”) is too conservative, it will  
> > > > > disable many optimizations  unnecessarily, even though it’s safe and 
> > > > > simple to implement. 
> > > > > 
> > > > > As a compiler optimization person for many many years, I really don’t 
> > > > > want to take this approach at this moment.  -:)
> > > > > 
> > > > > 2. Some facts I’d like to mention:
> > > > > 
> > > > > A.  The incorrect reordering (or CSE) potential ONLY exists in the 
> > > > > TREE optimization stage. During RTL stage,  the __bdos call has 
> > > > > already been replaced by an expression of the size parameter or a 
> > > > > constant, the data dependency is explicitly in the IR already.  I 
> > > > > believe that the data analysis in RTL stage should pick up the data 
> > > > > dependency correctly, No special handling is needed in RTL.
> > > > > 
> > > > > B. If the __bdos call cannot see the real object , it has no way to 
> > > > > get the “counted_by” field from the TYPE of the real object. So, if 
> > > > > we try to add the implicit use of the “counted_by” field to the 
> > > > > __bdos call, the object instantiation should be in the same routine 
> > > > > as the __bdos call.  Both the FE and the gimplification phase are too 
> > > > > early to do this work. 
> > > > > 
> > > > > 2. Then, what’s the best approach to resolve this problem:
> > > > > 
> > > > > There were several suggestions so far:
> > > > > 
> > > > > A.  Add an additional argument, the size parameter,  to __bdos, 
> > > > >     A.1, during FE;
> > > > >     A.2, during gimplification phase;
> > > > > B.  Encode the implicit USE  in the type of size, to make the size 
> > > > > “volatile”;
> > > > > C.  Encode the implicit USE  in the type of buf, then update the 
> > > > > optimization passes to use this implicit USE encoded in the type of 
> > > > > buf.
> > > > > 
> > > > > As I explained in the above, 
> > > > > ** Approach A (both A.1 and A.2) does not work;
> > > > > ** Approach B will have big performance impact, I’d prefer not to 
> > > > > take this approach at this moment.
> > > > > ** Approach C will be a lot of change in GCC, and also not very 
> > > > > necessary since the ONLY implicit use of the size parameter is in the 
> > > > > __bdos call when __bdos can see the real object.
> > > > > 
> > > > > So, all the above proposed approaches, A, B, C, are not very good. 
> > > > > 
> > > > > Then, maybe the following might work better?
> > > > > 
> > > > > In the tree optimization stage, 
> > > > >   * After the inlining transformation applied,  
> > > > > +  * Before the data-flow related optimization happens, 
> > > > > +  * when the data flow analysis is constructed, 
> > > > > 
> > > > > For each call to __bdos, add the implicit use of size parameter. 
> > > > > 
> > > > > Is this doable? 
> > > > 
> > > > Here is another proposal:  Add a new builtin function
> > > > 
> > > > __builtin_with_size(x, size)
> > > > 
> > > > that return x but behaves similar to an allocation
> > > > function in that BDOS can look at the size argument
> > > > to discover the size.
> > > > 
> > > > The FE insers this function when the field is accessed:
> > > 
> > > When it’s set I suppose.  Turn
> > > 
> > > X.l = n;
> > > 
> > > Into
> > > 
> > > X.l = __builtin_with_size (x.buf, n);
> > 
> > It would turn 
> > 
> > some_variable = (&) x.buf
> > 
> > into 
> > 
> > some_variable = __builtin_with_size ( (&) x.buf. x.len)
> 
> Unless you use the address of x.Len this will not work when len is 
> initialized after buf.  And the address will not have a meaningful data 
> dependence.
> > 

It would be a semantic requirement for this feature that
x.len needs to be initialized before x.buf is accessed.  

Otherwise, I am not sure how to define the time point 
at which x.len should be evaluated. 

> > So the later access to x.buf and not the initialization
> > of a member of the struct (which is too early).
> 
> > > 
> > > And indeed we need sth like a fat pointer to reliably solve all the 
> > > issues.
> > 
> > What happens for other languages such as FORTRAN 
> > and ADA do?  Are those pointers lowered in the FE?
> 
> Yes
> 
> > To me it seems there are two sound ways to introduce
> > such information:
> > 
> > - either by using the type system.  This works in
> > the FE in C using variably modified types
> > 
> > char buf[n];
> > __auto_type p = &buf;
> > 
> > ... = sizeof (*p);
> > 
> > But if I understand Jakob's comment to some PR 
> > correctly the size information in the TREE_TYPE
> > is not processed correctly anymore in the
> > middle-end. 
> 
> The type based info is lowered during gimplification and in particular for 
> pointer types the middle-end quickly loses track of the original type.
> 

Would it work if we make sure that we find a suitable
type? Or in other words, are the (non-constant) size 
expressions inside it still useful in later passes? 

Martin


> Richard 
> 
> > 
> > - or one injects the information via some
> > tree node or builtin at certain points in
> > time as suggested here, and the compiler
> > derives the information from these points 
> > as tree-object-size does.  
> > 
> > 
> > The use of attributes seems fragile and - looking
> > at the access attribute also overly complex.  And 
> > we somehow support this only for function types
> > and not elsewhere and also this then gets lost
> > during  inlining.   So I think for all this stuff
> > (nonnull, access, counted_by) I think a better
> > approach is needed.
> > 
> > 
> > Martin
> > 
> > 
> > > 
> > > Richard 
> > 
> > 
> > 
> > 
> > > 
> > > > __builtin_with_size(x.buf, x.L);
> > > > 
> > > > 
> > > > Martin
> > > > 
> > > > 
> > > > 
> > > > > 
> > > > > Otherwise, we might need to take the “volatile” approach. 
> > > > > 
> > > > > Let me know your suggestion and comment.
> > > > > 
> > > > > Thanks a lot.
> > > > > 
> > > > > Qing
> > > > > 
> > > > > 
> > > > > > __bdos is the one such implicit user of the size parameter and 
> > > > > > you're proposing to solve this by encoding the relationship between 
> > > > > > buffer and size at the __bdos call site.  But what about the case 
> > > > > > when the instantiation of the object is not at the same place as 
> > > > > > the __bdos call site, i.e. the DFA is unable to make that 
> > > > > > relationship?
> > > > > > 
> > > > > > The example Martin showed where the subobject gets "hidden" behind 
> > > > > > a pointer was a trivial one where DFA *may* actually work in 
> > > > > > practice (because the object-size pass can thread through these 
> > > > > > assignments) but think about this one:
> > > > > > 
> > > > > > struct A
> > > > > > {
> > > > > > size_t size;
> > > > > > char buf[] __attribute__((counted_by(size)));
> > > > > > }
> > > > > > 
> > > > > > static size_t
> > > > > > get_size_of (void *ptr)
> > > > > > {
> > > > > > return __bdos (ptr, 1);
> > > > > > }
> > > > > > 
> > > > > > void
> > > > > > foo (size_t sz)
> > > > > > {
> > > > > > struct A *obj = __builtin_malloc (sz);
> > > > > > obj->size = sz;
> > > > > > 
> > > > > > ...
> > > > > > __builtin_printf ("%zu\n", get_size_of (obj->array));
> > > > > > ...
> > > > > > }
> > > > > > 
> > > > > > Until get_size_of is inlined, no DFA can see the __bdos call in the 
> > > > > > same place as the point where obj is allocated.  As a result, the 
> > > > > > assignment to obj->size could get reordered (or the store 
> > > > > > eliminated) w.r.t. the __bdos call until the inlining happens.
> > > > > > 
> > > > > > As a result, the relationship between buf and size established by 
> > > > > > the attribute needs to be encoded into the type somehow.  There are 
> > > > > > two options:
> > > > > > 
> > > > > > Option 1: Encode the relationship in the type of buf
> > > > > > 
> > > > > > This is kinda what you end up doing with 
> > > > > > component_ref_has_counted_by and it does show the relationship if 
> > > > > > one is looking (through that call), but nothing more that can be 
> > > > > > used to, e.g. prevent reordering or tell the optimizer that the 
> > > > > > reference to the buf member may imply a reference to the size 
> > > > > > member as well.  This could be remedied by somehow encoding the 
> > > > > > USES relationship for size into the type of buf that the 
> > > > > > optimization passes can see.  I feel like this may be a bit 
> > > > > > convoluted to specify in a future language extension in a way that 
> > > > > > will actually be well understood by developers, but it will likely 
> > > > > > generate faster runtime code.  This will also likely require a 
> > > > > > bigger change across passes.
> > > > > > 
> > > > > > Option 2: Encode the relationship in the type of size
> > > > > > 
> > > > > > The other option is to enhance the type of size somehow so that it 
> > > > > > discourages reordering and store elimination, basically pessimizing 
> > > > > > code.  I think volatile semantics might be the way to do this and 
> > > > > > may even be straightforward to specify in the future language 
> > > > > > extension given that it builds on a known language construct and is 
> > > > > > thematically related.  However it does pessimize output for code 
> > > > > > that implements __counted_by__.
> > > > > > 
> > > > > > Thanks,
> > > > > > Sid
> > > > > 
> > > > 
> > 
> > -- 
> > Univ.-Prof. Dr. rer. nat. Martin Uecker
> > Graz University of Technology
> > Institute of Biomedical Imaging
> > 
> > 

-- 
Univ.-Prof. Dr. rer. nat. Martin Uecker
Graz University of Technology
Institute of Biomedical Imaging


Reply via email to