[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

Chuanqi Xu via Phabricator via cfe-commits Tue, 06 Jul 2021 18:52:49 -0700

ChuanqiXu added a comment.

In D97915#2860916 <https://reviews.llvm.org/D97915#2860916>, @ychen wrote:


> In D97915#2859237 <https://reviews.llvm.org/D97915#2859237>, @ChuanqiXu wrote:
>
>> In D97915#2848816 <https://reviews.llvm.org/D97915#2848816>, @ychen wrote:
>>
>>>> Thanks for clarifying. Let's solve the semantics problem first.
>>>> With the introduction about 'raw frame', I think it's necessary to 
>>>> introduce this concept in the section 'Switched-Resume Lowering' or even 
>>>> the section 'Introduction' in the document. Add a section to tell the 
>>>> terminology is satisfied too.
>>>
>>> Done.
>>>
>>>> Then why we defined both 'llvm.coro.raw.frame.ptr.offset' and 
>>>> 'llvm.coro.raw.frame.ptr.addr' together? It looks like refer to the same 
>>>> value finally. It looks like 'llvm.coro.raw.frame.ptr.offset' are trying 
>>>> to solve the problem about memory leak. But I think we could use 
>>>> llvm.coro.raw.frame.ptr.addr directly instead of traversing the frame 
>>>> (Maybe we need to add an intrinsic `llvm.coro.raw.size`).  Then we can 
>>>> omit a field in the frame to save space.
>>>
>>> ("llvm.coro.raw.frame.ptr.offset" is an offset from coroutine frame address 
>>> instead of raw frame pointer)
>>>
>>> Apologies for the confusion. I've briefly explained it here 
>>> https://reviews.llvm.org/D102145#2752445 I think it is not clear. 
>>> "llvm.coro.raw.frame.ptr.addr" is conceptually "the address of a coroutine 
>>> frame field storing the `raw frame pointer`" only after `insertSpills` in 
>>> CoroFrame.cpp. Before that, "llvm.coro.raw.frame.ptr.addr" is actually an 
>>> alloca storing the `raw frame pointer` (try grepping "alloc.frame.ptr" in 
>>> this review page). Using  "llvm.coro.raw.frame.ptr.offset" instead of  
>>> "llvm.coro.raw.frame.ptr.addr" is doable which looks like below, please 
>>> check line 31. The downside is that the write to coroutine frame is not 
>>> through an alloca but a direct write. It is unusual because all fields in 
>>> the frame are stored as 1. special/header fields 2. alloca 3. splills. 
>>> Doing the write indirectly as Alloca makes me comfortable. The tradeoff is 
>>> one extra intrinsic "llvm.coro.raw.frame.ptr.addr". What do you think?
>>>
>>>   19 coro.alloc.align:                                 ; preds = 
>>> %coro.alloc.check.align
>>>   20   %3 = sub nsw i64 64, 16
>>>   21   %4 = add i64 128, %3
>>>   22   %call1 = call noalias nonnull i8* @_Znwm(i64 %4) #13
>>>   23   %mask = sub i64 64, 1
>>>   24   %intptr = ptrtoint i8* %call1 to i64
>>>   25   %over_boundary = add i64 %intptr, %mask
>>>   26   %inverted_mask = xor i64 %mask, -1
>>>   27   %aligned_intptr = and i64 %over_boundary, %inverted_mask
>>>   28   %diff = sub i64 %aligned_intptr, %intptr
>>>   29   %aligned_result = getelementptr inbounds i8, i8* %call1, i64 %diff
>>>   30   call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, i64 
>>> 64) ]
>>>   31   store i8* %call1, i8** %alloc.frame.ptr, align 8                     
>>>   
>>>        ; Replace line 31 with below, and must makes sure line 46~line 48 is 
>>> skipped.
>>>        ; %poff = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
>>>        ; %addr = getelementptr inbounds i8, i8* %aligned_result, i32 %poff
>>>        ; %addr1 = bitcast i8* %addr to i8**
>>>        ; store i8* %call1, i8** %addr1, align 8
>>>   
>>>   
>>>   32   br label %coro.init.from.coro.alloc.align
>>>   33
>>>   34 coro.init.from.coro.alloc.align:                  ; preds = 
>>> %coro.alloc.align
>>>   35   %aligned_result.coro.init = phi i8* [ %aligned_result, 
>>> %coro.alloc.align ]
>>>   36   br label %coro.init
>>>   37
>>>   38 coro.init:                                        ; preds = 
>>> %coro.init.from.entry, %coro.init.from.coro.alloc.align, %cor
>>>      o.init.from.coro.alloc
>>>   39   %5 = phi i8* [ %.coro.init, %coro.init.from.entry ], [ 
>>> %call.coro.init, %coro.init.from.coro.alloc ], [ %aligned_result
>>>      .coro.init, %coro.init.from.coro.alloc.align ]
>>>   40   %FramePtr = bitcast i8* %5 to %f0.Frame*
>>>   41   %resume.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
>>> %FramePtr, i32 0, i32 0
>>>   42   store void (%f0.Frame*)* @f0.resume, void (%f0.Frame*)** 
>>> %resume.addr, align 8
>>>   43   %6 = select i1 true, void (%f0.Frame*)* @f0.destroy, void 
>>> (%f0.Frame*)* @f0.cleanup
>>>   44   %destroy.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
>>> %FramePtr, i32 0, i32 1
>>>   45   store void (%f0.Frame*)* %6, void (%f0.Frame*)** %destroy.addr, 
>>> align 8
>>>   46   %7 = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, 
>>> i32 2
>>>   47   %8 = load i8*, i8** %alloc.frame.ptr, align 8
>>>   48   store i8* %8, i8** %7, align 8
>>>   49   br label %AllocaSpillBB
>>>   50
>>>   51 AllocaSpillBB:                                    ; preds = %coro.init
>>>   52   %.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
>>> %FramePtr, i32 0, i32 4
>>>   53   %ref.tmp.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
>>> %FramePtr, i32 0, i32 5
>>>   54   %agg.tmp.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
>>> %FramePtr, i32 0, i32 6
>>>   55   %ref.tmp5.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
>>> %FramePtr, i32 0, i32 7
>>>   56   %agg.tmp8.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* 
>>> %FramePtr, i32 0, i32 8
>>>   57   %__promise.reload.addr = getelementptr inbounds %f0.Frame, 
>>> %f0.Frame* %FramePtr, i32 0, i32 10
>>>   58   br label %PostSpill
>>>
>>>
>>>
>>>
>>>> Then I am a little confused for the design again, since we would treat the 
>>>> value for CoroBegin as the address of coroutine frame in the past and it 
>>>> looks like to be the raw frame now. Let me reconsider if it is OK.
>>>
>>> The returned value of CoroBegin is still coroutine frame not a raw frame 
>>> even if the frame is overaligned. You could check the above code.
>>
>> Thanks for clarifying!
>>
>> I don't understand why we need to store the address for coroutine raw frame 
>> in the coroutine frame. For example, `%call1` in your example marks the 
>> address for the raw frame. Then can we use the value `%call1` in every place 
>> where we want to use the address for coroutine frame?
>> If yes, I think we could emit an intrinsic called 'llvm.coro.raw.frame' in 
>> the frontend if we need to use the address for the raw frame. Then in the 
>> middle end, we could replace `llvm.coro.raw.frame` with `%call1` simply. 
>> Similarly, we could define intrinsic `llvm.coro.raw.frame.size`. As far as I 
>> know from the codes, the address for the coroutine frame is mainly used for 
>> deallocation. So it should be fine I guess.
>>
>> ---
>>
>> Then the code generated now looks roughly like:
>>
>>   if (should over align) {
>>      /// ...
>>      mem = ...
>>   } else {
>>      /// ...
>>      mem = ...
>>   }
>>   coro.begin(id, mem);
>>
>> It looks redundant since the `then` part and `else` part looks  very 
>> similar. I understand it would be eliminated in the middle end. But another 
>> problem is that the redundant implementation in clang. Maybe we could solve 
>> it by refactoring.
>> But I am wondering if it is possible to use another pattern (assume 
>> `llvm.coro.alloc` returns true):
>>
>>   %raw.frame.ptr = new(call @llvm.coro.raw.frame.size())
>>   %true.frame.ptr = call @llvm.coro.frame(%raw.frame.ptr, NEW_ALIGN) ; we 
>> need a better name
>>   call @llvm.coro.begin(coro.id, %true.frame.ptr)
>>
>> Then for `llvm.coro.frame`, we could return `@raw.frame.ptr ` simply if the 
>> alignment could be satisfied (alignment needed is less than NEW_ALIGN). Or 
>> we could do simply to align up for the coroutine frame. There are many APIs 
>> in Align.h.
>> And for the destruction, we could emit:
>>
>>   call @delete(%raw.frame.ptr, call @llvm.coro.raw.frame.size()) 
>>
>> In this way, I guess we would get simpler implementation and generated codes.
>>
>> BTW, if we choose to do so, the semantics for llvm.coro.raw.frame.ptr and 
>> llvm.coro.size would change slightly. They would stands for the address and 
>> size for the coroutine frame if we don't need over alignment.
>>
>> How do you think about this?
>
> I was confused by this and @rjmccall explained it here 
> https://reviews.llvm.org/D97915/new/#2604871. Basically, we could not recover 
> "raw frame pointer" (`%call1`) from coroutine frame pointer statically at 
> deallocation time.

Oh, I understand why we need to store the address for raw frame now. Another 
question is that how do you think combine the pattern:

  if (should over align) {
     /// ...
     mem = ...
  } else {
     /// ...
     mem = ...
  }
  coro.begin(id, mem);

into this one:

  %true.frame.ptr = call @llvm.coro.create.frame(new(call 
@llvm.coro.raw.frame.size()), NEW_ALIGN) ; we need a better name
                                                                                
                                                                                
           ; It would be lowered to store the address of the raw frame to the 
alloca in the middle end if needed
  call @llvm.coro.begin(coro.id, %true.frame.ptr)

and this one:

  call @delete(call @llvm.coro.raw.frame.ptr(), call 
@llvm.coro.raw.frame.size())                                         ; Then use 
 `llvm.coro.raw.frame.ptr()` and `llvm.coro.raw.frame.size()` directly whenever 
we want.

It looks like we could generate the same code in the front for normal and over 
aligned coroutine.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

Reply via email to